COLLABORATIVE FILTERING RECOMMENDATION ALGORITHM TOWARDS INTELLIGENT COMMUNITY

Abstract. The collaborative filtering recommendation algorithm is a successful and widely used recommendation method in recommender systems. In the collaborative filtering recommendation algorithm, the key step is to find the nearest neighbors. Combined with the application scenario of the intelligent community, the Pearson correlation coefficient is introduced to improve the accuracy of similarity calculation. At the same time, considering that the residents are relatively fixed, the K-means clustering algorithm can be combined with the user-based collaborative filtering recommendation algorithm to alleviate the sparsity of the matrix and improve the speed of recommendation. Validation results on the MovieLens dataset show that, compared with the traditional collaborative filtering recommendation algorithm, the collaborative filtering recommendation algorithm integrating the K-means clustering algorithm and community factors predicts the actual user rating in the community application scenario more effectively and improves both recommendation accuracy and recommendation speed.

wealth to guide the community in offering better public services and better community autonomy [14]. The information mined from the intelligent community platform can be matched more accurately to the corresponding population, so as to provide new services, such as personalized recommendation services for residents, promoting communication between residents, and supporting government decision-making based on residents' demands.
Residents of the same community may differ in personal preferences, but their habits tend to converge and their needs for goods and services are similar. For example, plumber A, who works near community M, provides good service to a customer living in community M and receives praise from that customer. Based on the existing evaluation data, the intelligent community platform will use the recommendation algorithm to recommend plumber A to users living in community M in priority, and will not recommend plumber B, who works in another area or far away from community M [4]. This is a unique feature of recommendation in the intelligent community. We therefore make the similarity calculation community-based, and the recommendation algorithm gives priority to users in the same community or in a similar community.
The traditional collaborative filtering algorithm considers the similarity between users over all items; its coverage is broad, but it ignores the individual needs of some users. It is based on user rating data and suffers from data matrix sparsity and the cold start problem. When user similarity is calculated, the samples are often insufficient, so the similarity is inaccurate and the recommendation results are unsatisfactory. Therefore, this paper adopts a similarity calculation method based on community factors to replace the traditional similarity calculation methods. Taking into account that the number of community residents is relatively fixed, we introduce the K-means clustering algorithm, which can effectively alleviate the data matrix sparsity problem and improve the recommendation speed. This paper proposes a collaborative filtering recommendation algorithm integrating the K-means clustering algorithm and community factors, which provides a personalized recommendation service for community residents.
2. Related works. At present, there is a lot of research on recommendation algorithms. The main research directions include collaborative filtering recommendation, content-based recommendation, association rule-based recommendation and hybrid recommendation algorithms [5]. The collaborative filtering recommendation algorithm is the most popular recommendation algorithm. It is based on the hypothesis that birds of a feather flock together: if user A is similar to user B, then A is likely to be interested in the things that interest B. In this algorithm, the preference of a user for an item is described directly by the item rating, so the quality of the recommendation is closely related to the quality of the user rating data. It is generally used in systems that can collect user rating data. Collaborative filtering recommendation algorithms can be divided into two types: item-based collaborative filtering recommendation and user-based collaborative filtering recommendation. The general ideas of the two algorithms are the same: we look for the nearest neighbors and use their ratings to predict the target user's rating of an item. This classic collaborative filtering recommendation algorithm can be used directly for personalized recommendation in the intelligent community. However, the potential problem of data sparsity means that the algorithm cannot fully adapt to the intelligent community. We need to improve it, in combination with the actual community scene, to offer community residents a better personalized recommendation experience.
Before the appearance of recommendation engines, the content-based recommendation algorithm had already been widely used. It does not depend on user ratings of items. The core idea of the algorithm is to start from the characteristics of the items that the user has purchased or has explicitly expressed a preference for, and to analyze the relevance of these items in order to predict the items to recommend. In a content-based recommendation system, the content of the items is described and defined by mining the relevant attributes. Based on the evaluation characteristics of users, decision tree algorithms, vector-based representation algorithms and neural network algorithms are used to establish the item data model. The system learns the user's interests and finally examines the degree of matching between the item and the user data. Because the content-based recommendation algorithm is based on the characteristics of the item, it can recommend matching items to users with special preferences and can also recommend new or unpopular items, so it avoids the cold start problem for new items. The disadvantage of this algorithm is that it requires the item content to be highly structured, so that valuable features are easy to extract, and it requires that the user's preferences can be expressed in terms of content characteristics.
In the association rule-based recommendation algorithm, association rules can be regarded as the relevance between items. On the basis of this relevance, the purchased items serve as the rule header and the recommended object serves as the rule body [2]. The association rule-based recommendation algorithm stems from the idea that "users who buy some items tend to buy something else". Mining association rules is one of the most critical and time-consuming steps in the algorithm. Its purpose is to find the potential relevance between items in a collection that are purchased at the same time. Since the user has frequently purchased some items in the collection, the user is more likely to buy the remaining items of the collection. Therefore, the conversion rate of a recommendation system based on association rules is high, but the amount of computation is huge. Because the algorithm also relies on user data, it also suffers from problems such as cold start and data sparsity.
Since each algorithm has its own advantages and limitations, hybrid recommendation is often adopted in practice [11]. The combination of collaborative filtering recommendation and content-based recommendation has been the most researched and applied. The easiest way is to let a collaborative filtering recommendation algorithm and a content-based recommendation algorithm each produce prediction results, and then combine the results by some method. Although there are in theory many ways to build a hybrid recommendation method, they are not always effective for a specific problem. One of the most important principles of hybrid recommendation is to avoid or compensate for the weaknesses of each individual recommendation algorithm. Common hybridization strategies include weighting, switching, feature combination, cascade and feature augmentation.
3. Improvement of similarity calculation method. Nearest neighbor query is the core of collaborative filtering recommendation algorithm, and its result and efficiency largely determine the effectiveness and efficiency of the proposed algorithm [15]. The nearest neighbor query relies on the similarity calculation method, so the similarity calculation method has a direct impact on the quality of the recommendation algorithm.
3.1. Traditional similarity calculation method. The traditional similarity calculation methods include the Euclidean metric, cosine similarity and Pearson correlation similarity. All three are based on the user-item rating matrix. The Euclidean distance method is the simplest and easiest to understand of these methods. Based on the user-item rating matrix, it treats each user's ratings of all items as a point in the evaluation space, and the straight-line distance between the points of two users expresses the similarity between them. The Euclidean distance $d(x, y)$ is calculated by the following equation:
$$d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2} \tag{1}$$
where $d(x, y)$ represents the Euclidean distance between user $x$ and user $y$, $x_i$ represents the rating of user $x$ on item $i$, and $y_i$ represents the rating of user $y$ on item $i$. The calculated Euclidean distance is a non-negative number that grows as users become less similar. To make it reflect the similarity between users, we map it into $(0, 1]$ by taking $1/(1 + d)$, so that $sim(x, y)$ is calculated by the following equation:
$$sim(x, y) = \frac{1}{1 + d(x, y)} \tag{2}$$
where $sim(x, y)$ represents the similarity between user $x$ and user $y$. Euclidean distance reflects the absolute difference of individual numerical characteristics, so it is mostly used to analyze differences in the magnitude of the data along each dimension. For example, we can use user behavior indicators to analyze the similarities or differences in user value.
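As a minimal sketch of equations (1) and (2), the following Python function maps the Euclidean distance between two rating vectors to a similarity in (0, 1]. The function name `euclidean_similarity` and the assumption that both users are represented by dense, equal-length rating lists are illustrative choices, not part of the original method description.

```python
import math


def euclidean_similarity(x, y):
    """Return 1 / (1 + d), where d is the Euclidean distance between
    two equal-length rating vectors (assumed dense, one entry per item)."""
    if len(x) != len(y):
        raise ValueError("rating vectors must have the same length")
    d = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    return 1.0 / (1.0 + d)
```

Identical rating vectors give a similarity of exactly 1, and the value approaches 0 as the distance grows.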
The cosine similarity method uses the cosine of the angle between two vectors in vector space as the measure of the difference between two individuals. It focuses on the difference in direction of the two vectors, not on their distance or length. Based on the user-item rating matrix, each user's item ratings are regarded as a rating vector, and the cosine of the angle between the rating vectors of two users represents their similarity. $sim(u_i, u_j)$ is calculated by the following equation:
$$sim(u_i, u_j) = \cos(r_{u_i}, r_{u_j}) = \frac{r_{u_i} \cdot r_{u_j}}{\lVert r_{u_i} \rVert \, \lVert r_{u_j} \rVert} \tag{3}$$
where $r_{u_i}$ and $r_{u_j}$ are the rating vectors of users $u_i$ and $u_j$. Cosine similarity distinguishes differences by direction and is insensitive to absolute values. We use user ratings to distinguish similarities and differences in interest. Because cosine similarity is insensitive to absolute values, it also reduces the effect of users applying the rating scale inconsistently.
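The cosine measure described above can be sketched in a few lines of Python. The function name `cosine_similarity` and the convention of returning 0 when either user has an all-zero rating vector are assumptions made for this illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumed convention: a user with no ratings is similar to nobody.
        return 0.0
    return dot / (norm_u * norm_v)
```

Two users whose ratings differ only by a constant scale factor, such as [1, 2, 3] and [2, 4, 6], point in the same direction and therefore receive a similarity of 1, which illustrates the insensitivity to absolute values.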
Pearson correlation similarity is based on the Pearson correlation coefficient. The Pearson correlation coefficient reflects the linear correlation between two variables and lies in $[-1, 1]$. The stronger the linear relationship between the two variables, the closer the correlation coefficient is to 1 or -1. When one variable increases and the other also increases, they are positively correlated and the correlation coefficient is larger than 0. If one variable increases while the other decreases, they are negatively correlated and the correlation coefficient is less than 0. If the correlation coefficient is equal to 0, there is no linear correlation between them. The Pearson correlation coefficient is equal to the covariance of the two variables divided by the product of their standard deviations. The equation for calculating the Pearson similarity measure is as follows:
$$sim(u_i, u_j) = \frac{\sum_{c} (R_{c,u_i} - \bar{R}_{u_i})(R_{c,u_j} - \bar{R}_{u_j})}{\sqrt{\sum_{c} (R_{c,u_i} - \bar{R}_{u_i})^2} \, \sqrt{\sum_{c} (R_{c,u_j} - \bar{R}_{u_j})^2}} \tag{4}$$
where the sums run over the items $c$ rated by both users, $R_{c,u_i}$ represents the rating of user $u_i$ on item $c$, and $R_{c,u_j}$ represents the rating of user $u_j$ on item $c$. $\bar{R}_{u_i}$ represents the average rating of user $u_i$ over all items, and $\bar{R}_{u_j}$ represents the average rating of user $u_j$ over all items.
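The Pearson similarity measure can be illustrated with the following sketch. It assumes each user's ratings are stored as a dictionary from item identifier to rating, that the sums run over the items rated by both users, and that each user's mean is taken over all items that user has rated; the function name and the choice to return 0 when the measure is undefined are likewise assumptions.

```python
import math


def pearson_similarity(ratings_i, ratings_j):
    """Pearson similarity between two users.

    ratings_i, ratings_j: dicts mapping item id -> rating.
    """
    common = set(ratings_i) & set(ratings_j)
    if not common:
        return 0.0  # assumed convention: no co-rated items, no evidence
    mean_i = sum(ratings_i.values()) / len(ratings_i)
    mean_j = sum(ratings_j.values()) / len(ratings_j)
    num = sum((ratings_i[c] - mean_i) * (ratings_j[c] - mean_j) for c in common)
    den_i = math.sqrt(sum((ratings_i[c] - mean_i) ** 2 for c in common))
    den_j = math.sqrt(sum((ratings_j[c] - mean_j) ** 2 for c in common))
    if den_i == 0 or den_j == 0:
        return 0.0  # a user with constant ratings has no defined correlation
    return num / (den_i * den_j)
```

Users whose ratings rise and fall together obtain a value near 1, while users with opposite tastes obtain a value near -1.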
3.2. Similarity computation by introducing community factors. The community makes the social relations and living environment of its residents stable, so their living habits and the items they need become similar. At the same time, differences in community factors lead to differences in habits and needs among residents, so community factors can play a key role in personalized recommendation for the intelligent community. We introduce the community factor into the traditional similarity calculation method to improve the accuracy of user similarity calculation.
3.2.1. Extraction of community factor. As a fixed place for people's daily living, intelligent community has a unique factor that can influence the personalized service of community residents.
City of community: factors such as climate and customs have led to different consumer habits in different urban communities, so the consumption items are different. The cities are marked as {1, 2, 3, ..., N }.
Community level: to some extent, differences in community level affect the needs of community residents. Ordinary, mid-range and high-end communities are marked as {1, 2, 3}.
Community user gender: gender differences directly lead to differences in demand for special items. Male and female are marked as {1, 2}.
Community user age: the age of the user directly reflects the user's experience and way of dealing with people. In this paper, the age of community users is extracted by segmentation into nine age levels: under 10, 10 to 14, 15 to 18, 19 to 25, 26 to 35, 36 to 45, 46 to 55, 56 to 65, and over 65 years. They are marked as {1, 2, 3, 4, 5, 6, 7, 8, 9}.
The degree of community users' education: there are differences in the educational level of the users in the community. Attitudes towards life, ways of dealing with affairs and life pursuits differ among users with different educational backgrounds. In this paper, according to academic qualifications, the educational level is divided into seven levels: primary school, junior high school, high school, college, undergraduate, master, doctor. They are marked as {1, 2, 3, 4, 5, 6, 7}.
Community user occupation: the user's occupation largely affects their living habits. Salary level also leads to differences in the items the user needs. The occupations are marked as {1, 2, 3, ..., N }.
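The encoding of the six community factors can be sketched as follows. The age boundaries and education levels are taken from the text; the helper names (`age_level`, `education_level`, `encode_user`), the assumption that ages are whole years, and the assumption that city and occupation arrive already numbered from 1 to N are illustrative only.

```python
from bisect import bisect_left

# Inclusive upper bounds (in whole years) of age levels 1..8; level 9 is "over 65".
AGE_UPPER_BOUNDS = [9, 14, 18, 25, 35, 45, 55, 65]

EDUCATION_LEVELS = [
    "primary school", "junior high school", "high school",
    "college", "undergraduate", "master", "doctor",
]


def age_level(age):
    """Map an age in whole years to one of the nine age levels (1..9)."""
    return bisect_left(AGE_UPPER_BOUNDS, age) + 1


def education_level(name):
    """Map an education label to its level (1..7)."""
    return EDUCATION_LEVELS.index(name) + 1


def encode_user(city, community_level, gender, age, education, occupation):
    """Build one row of community factors for a user.

    city and occupation are assumed to be pre-assigned integer codes.
    """
    return (city, community_level, gender,
            age_level(age), education_level(education), occupation)
```

For instance, a 30-year-old college-educated resident falls into age level 5 and education level 4.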

3.2.2. Extraction of community factor. Combined with the application scenario of the intelligent community, the improved similarity calculation method is based on the Pearson similarity calculation method, into which the community factor is introduced. The steps of the improved similarity calculation method are as follows.
1. Build the community factor matrix $C_{n \times f}$ based on the community user space, where $n$ represents the number of users in the user space and $f$ represents the number of community factors.
2. Use the following formula to calculate the similarity of community factors:
$$sim_0(u_i, u_j) = \frac{1}{f} \sum_{k=1}^{f} isSimilar_k(u_i, u_j) \tag{5}$$
where $sim_0(u_i, u_j)$ represents the community factor similarity between user $u_i$ and user $u_j$, $isSimilar_k(u_i, u_j)$ indicates whether the $k$-th community factor of the two users is similar (1 if similar, 0 otherwise), and $f$ is the total number of community factors.
3. Introduce the similarity of community factors into the Pearson similarity measurement formula.
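A minimal sketch of the community factor similarity in step 2 is given below. It assumes that the similarity is the fraction of the f community factors on which two users agree, and that "similar" means the two users have the same code for a factor; both the reading of the formula and the function name are assumptions, since the text does not spell out the test used by isSimilar. How this value is then combined with the Pearson measure in step 3 is not reconstructed here.

```python
def community_factor_similarity(factors_i, factors_j):
    """Fraction of community factors on which two users agree.

    factors_i, factors_j: equal-length sequences of factor codes,
    for example (city, level, gender, age level, education, occupation).
    """
    if len(factors_i) != len(factors_j) or len(factors_i) == 0:
        raise ValueError("factor vectors must be non-empty and of equal length")
    # Assumed isSimilar test: exact equality of the two factor codes.
    matches = sum(1 for a, b in zip(factors_i, factors_j) if a == b)
    return matches / len(factors_i)
```

Two residents who share city, community level, age level and occupation but differ in gender and education would obtain a similarity of 4/6 under these assumptions.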
4. Collaborative filtering recommendation algorithm based on K-means cluster integrating with community factor. Combined with the application scenario of the intelligent community, the community factor is introduced into the collaborative filtering recommendation algorithm and integrated with the K-means clustering algorithm [8,24]. This paper proposes a recommendation algorithm suitable for the intelligent community scene: a collaborative filtering recommendation algorithm based on K-means clustering integrating community factors. The design process of the algorithm is shown in Figure 1.
1. Based on user rating data, the user-item rating matrix is constructed.
2. $k$ users are randomly selected as initial cluster centroids in the user space covered by the rating matrix, marked as $CC = \{cc_1, cc_2, ..., cc_j, ..., cc_k\}$. Then we calculate the Euclidean distance between each user and the cluster centroids using the following formula:
$$d(u_i, cc_j) = \lVert r_{u_i} - r_{cc_j} \rVert = \sqrt{\sum_{c=1}^{m} (r_{u_i,c} - r_{cc_j,c})^2} \tag{7}$$
where $d(u_i, cc_j)$ represents the Euclidean distance between user $u_i$ and the cluster centroid $cc_j$, $r_{u_i}$ represents the rating vector of user $u_i$ over the items, and $r_{cc_j}$ represents the rating vector of the cluster centroid $cc_j$ over the items. Following the idea that the smaller the Euclidean distance, the more similar the user ratings, each user in the user space is assigned to the cluster of the nearest centroid.
3. In each cluster, the average rating of each item is calculated, and a new cluster centroid is obtained. The clustering steps are repeated until the cluster centroids no longer change. The final cluster centroids form the $k \times m$ virtual user-item rating matrix $R_v$, where $k$ represents the number of virtual users, namely the number of cluster centroids, and $m$ represents the number of items.
4. Based on the virtual user-item rating matrix $R_v$, we use the Pearson similarity measure with community factors to calculate the user similarity, and the nearest neighbor set is constructed according to the similarity. Based on the rating data of the nearest neighbor set, we predict the rating of the target user for each new item and finally select the TOP-N recommended items. The predicted item rating $P_{u,i}$ is calculated by the following equation:
$$P_{u,i} = \bar{R}_u + \frac{\sum_{u_j} sim(u, u_j) \, (R_{u_j,i} - \bar{R}_{u_j})}{\sum_{u_j} \lvert sim(u, u_j) \rvert} \tag{8}$$
where the sums run over the users $u_j$ in the nearest neighbor set, $P_{u,i}$ represents the predicted rating of target user $u$ for item $i$, $R_{u_j,i}$ represents the rating of user $u_j$ for item $i$, $\bar{R}_u$ represents the average rating of user $u$ over all items, and $\bar{R}_{u_j}$ represents the average rating of user $u_j$ over all items.
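The clustering and prediction steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the implementation used in the experiments: the rating matrix is assumed to be dense (one number per user and item), the names `kmeans_virtual_users` and `predict_rating` and the `seed` and `max_iter` parameters are hypothetical, and the prediction uses the common mean-centred weighted average with absolute similarities in the denominator. The similarity values are passed in by the caller, so the sketch does not fix how community factors enter the similarity.

```python
import math
import random


def kmeans_virtual_users(ratings, k, seed=0, max_iter=100):
    """Cluster user rating vectors; return the k centroids, i.e. the
    rows of the virtual user-item rating matrix."""
    rng = random.Random(seed)
    # Step 2: pick k users at random as the initial centroids.
    centroids = [list(row) for row in rng.sample(ratings, k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for row in ratings:
            nearest = min(range(k), key=lambda j: math.dist(row, centroids[j]))
            clusters[nearest].append(row)
        # Step 3: the new centroid is the per-item average rating in the cluster.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def predict_rating(target_mean, neighbors, item):
    """Step 4: mean-centred weighted prediction for one item.

    neighbors: list of (similarity, ratings) pairs, where ratings maps
    item id -> rating for one neighbor.
    """
    num = 0.0
    den = 0.0
    for sim, ratings in neighbors:
        if item not in ratings:
            continue
        neighbor_mean = sum(ratings.values()) / len(ratings)
        num += sim * (ratings[item] - neighbor_mean)
        den += abs(sim)
    if den == 0:
        return target_mean  # no usable neighbor: fall back to the user's mean
    return target_mean + num / den
```

Because the centroids can be computed off-line, only the k virtual users need to be searched when a recommendation is requested, which is the source of the speed-up discussed in the experimental section.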

5. Experimental evaluation.
5.1. Experimental data sets and experimental scheme. The MovieLens data set is one of the most widely used data sets in recommendation algorithm experiments. It is a film rating data set formed by the GroupLens project team at the University of Minnesota after years of investigation and collection. The GroupLens project team provides the MovieLens data set in three sizes: 100K, 1M and 10M. Each size contains details of user information, details of film information, film rating data, etc. All three are open data sets. This experiment uses the standard 100K MovieLens data set as the data sample for testing.
At present, the main measurement indexes of recommendation algorithms are prediction accuracy and classification accuracy. Prediction accuracy measures how close the predicted rating is to the actual rating. Its evaluation indicators include MAE, MSE, RMSE, etc. Classification accuracy measures the proportion of recommended items that users really like. Its indicators include precision, recall, etc. In this paper, we choose MAE and precision as the measures of the experiment. Since the K-means clustering algorithm is introduced to improve the speed of the recommendation algorithm, we also need to check the running time of the improved algorithm.
The MAE is used to measure how close the predicted rating is to the actual rating. The smaller the MAE value is, the higher the accuracy of the proposed algorithm is. The MAE formula is as follows:
$$MAE = \frac{\sum_{i \in I} \lvert p_{u,i} - r_{u,i} \rvert}{\lvert I \rvert} \tag{9}$$
where $p_{u,i}$ represents the predicted rating for item $i$, $r_{u,i}$ represents the user's actual rating for item $i$, and $I$ represents the collection of items used to predict scores.
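As a short illustration of the MAE measure, the function below averages the absolute prediction error over the items for which both a predicted and an actual rating are available. Storing the ratings as dictionaries keyed by item identifier, and the function name itself, are assumptions of this sketch.

```python
def mean_absolute_error(predicted, actual):
    """MAE over the items present in both dicts (item id -> rating)."""
    items = set(predicted) & set(actual)
    if not items:
        raise ValueError("no items with both a predicted and an actual rating")
    return sum(abs(predicted[i] - actual[i]) for i in items) / len(items)
```

For predictions of 4.0 and 2.5 against actual ratings of 5 and 2, the errors are 1 and 0.5, giving an MAE of 0.75.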
Precision is used to measure the percentage of recommended items that users actually prefer. The formula is as follows:
$$precision = \frac{N_{lr}}{N_{lr} + N_{dr}} \tag{10}$$
where $N_{lr}$ represents the number of recommended items that users really like and $N_{dr}$ represents the number of recommended items that users dislike.
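The precision measure can be sketched as follows, assuming that every recommended item is either liked or disliked, so that the denominator equals the number of recommended items. The function name and the choice to return 0 for an empty recommendation list are illustrative assumptions.

```python
def precision(recommended, liked):
    """Share of recommended items that the user actually likes.

    recommended: list of recommended item ids (for example a TOP-N list).
    liked: set of item ids the user really likes.
    """
    if not recommended:
        return 0.0  # assumed convention for an empty recommendation list
    n_lr = sum(1 for item in recommended if item in liked)
    return n_lr / len(recommended)
```

If two of four recommended items are among the items the user likes, the precision is 0.5.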
In order to compare and analyze the results of the later experiments, this paper takes the standard 100K MovieLens data as samples, and the data selection scheme is as follows. From the MovieLens data set u.base, six sets of data are randomly selected as input data, namely the training set. They are: 100 users × 100 movies, 100 users × 200 movies, 300 users × 100 movies, 300 users × 200 movies, 500 users × 100 movies, and 500 users × 200 movies. From the MovieLens data set u.test, 20 users are selected randomly as the target users, namely the test set. The number of clusters is 9, in accordance with the number of user age levels. The number of neighbor users in the nearest neighbor set is 10, and the N value of the TOP-N recommended items is 10.
In this paper, the improved recommendation algorithm and the traditional user-based collaborative filtering recommendation algorithm are tested using the above data selection scheme. The performance of the improved algorithm is analyzed by comparing the MAE, precision and computing speed of the two algorithms.

Figure 2. MAE value of the recommendation algorithm

5.2. Experimental results. In this experiment, the two algorithms are tested using the experimental scheme. The MAE value, precision and elapsed time of the traditional collaborative filtering recommendation algorithm and of the collaborative filtering algorithm integrating community factors are plotted as line charts in Figures 2, 3 and 4.
A smaller MAE value indicates that the predicted results are closer to the actual ratings. As Figure 2 shows, the MAE values of both algorithms grow as the amount of input data increases. This is because, as the data set grows, the sparsity of user ratings also increases. However, the MAE value of the K-means clustering collaborative filtering algorithm integrating community factors increases more slowly as the data set grows. The difference between the two algorithms in MAE value and growth rate is largest when the input data is 500×200. This is because the community-based K-means clustering collaborative filtering recommendation algorithm incorporates community factor information to improve the accuracy of user similarity calculation. According to Figure 3, the K-means clustering collaborative filtering algorithm integrating community factors performs better than the traditional user-based collaborative filtering recommendation algorithm in terms of precision. Adding the community factor makes the user similarity calculation more comprehensive and improves the user rating density, so that the user similarity calculation is more accurate and the recommended items are closer to the actual results.
As Figure 4 shows, the K-means clustering collaborative filtering algorithm integrating community factors is faster than the traditional user-based collaborative filtering recommendation algorithm for every input size. As the data set grows, the running time of the traditional algorithm increases markedly. The improved K-means clustering algorithm integrating community factors can generate the clusters off-line, so the nearest neighbor space for the online search is reduced to the virtual user space. The amount of data in the virtual user space is much smaller than the entire user space searched by the traditional algorithm. The running time of the improved algorithm is therefore greatly shortened, which relieves the bottleneck in recommendation speed.
6. Conclusion. In order to offer a personalized recommendation service to community residents in the intelligent community, this paper proposes a K-means clustering collaborative filtering recommendation algorithm integrating community factors. In the nearest neighbor search, the community factor is integrated into the user similarity calculation, which improves its accuracy. Considering that community residents are relatively fixed, K-means clustering is introduced into the algorithm to alleviate the sparsity of the matrix and improve the recommendation speed. Finally, the superiority of the improved algorithm over the traditional collaborative filtering recommendation algorithm is verified on the MovieLens data set.
Next, we are going to apply the algorithm to the intelligent community platform, and we hope to introduce a feedback mechanism to optimize the recommendation algorithm dynamically according to the user's actual feedback on the recommendation results. At the same time, we hope to optimize the off-line K-means clustering computation in order to make the algorithm more efficient.