Time aware topic based recommender system

News recommender systems efficiently handle the overwhelming number of news articles, simplify navigation, and retrieve relevant information. Many conventional news recommender systems use collaborative filtering to make recommendations based on the behavior of users in the system. In this approach, the introduction of new users or new items can cause the cold start problem, as there is insufficient data on these new entries for collaborative filtering to draw any inferences. Content-based news recommender systems emerged to address the cold start problem. However, many content-based news recommender systems treat documents as a bag of words, neglecting the hidden themes of the news articles. In this paper, we propose a news recommender system that leverages topic models and the time spent on each article. We build an automated recommender system that filters news articles and makes recommendations based on users' preferences. We use topic models to identify the thematic structure of the corpus. These themes are incorporated into a content-based recommender system to filter out news articles whose themes are of less interest to users and to recommend articles that are thematically similar to users' preferences. Our experimental studies show that utilizing topic modeling and the time spent on each article can outperform state-of-the-art recommendation techniques. The resulting recommender system based on the proposed method is currently operational at The Globe and Mail (http://www.theglobeandmail.com/).

The increasing number of electronic news articles requires better tools for searching, exploring, and organizing news article collections. Previously, news articles were collected and stored in large text repositories and retrieved by a set of keywords. News articles were seldom analyzed by theme, because very few technologies existed to extract their thematic structure. Moreover, newspaper companies typically do not require users to subscribe and create user profiles, so users read news articles anonymously. Therefore, news recommender systems have to make recommendations without clear user profiles. In addition, many recommendation techniques face the cold start problem, which occurs when there is insufficient data to draw any inferences for new users or items [4].
To remedy the situation, in this paper we design a news recommender system that eases reading and navigation through online newspapers. In essence, the recommender system acts as a filter, delivering only news articles that can be considered relevant to a user. The resulting recommender system based on the proposed method is currently operational at The Globe and Mail. The Globe and Mail offers the most authoritative news in Canada, featuring national and international news.
The major contributions of this paper are as follows:
• Inferring users' profiles and predicting users' preferences by analyzing the contents of large collections of news articles.
• Designing a news recommender system based on both the content of news articles and users' time spent on each article.
• Conducting experimental studies on a news corpus, in which the proposed approach outperforms baseline recommendation approaches in terms of precision, recall, and F-measure.
The structure of this paper is as follows. In Section 2, the related literature is reviewed. Section 3 describes the main objectives of a news recommender system. Section 4 presents our proposed content-based news recommender system. In Section 5, we demonstrate the effectiveness of our approach through experiments. Section 6 concludes the paper and discusses future work.
2. Related work. All of the known recommender techniques have strengths and weaknesses. In this section, we briefly survey the different recommender techniques, the data that they support, and the algorithms they employ [5,6].
On this basis, the following three recommender techniques are distinguished: Collaborative filtering-based, Content-based, and Hybrid-based.
Collaborative filtering-based recommender systems make recommendations based on the behavior of other users in the system. Intuitively, these systems assume that if users agree about the quality of some items, then they will likely agree about other items [9]. For example, if a group of users has tastes similar to Mary's, then Mary is likely to like the things the group likes that she has not seen yet. However, in this approach the introduction of new users or new items can cause the cold start problem, as there is insufficient data on these new entries for collaborative filtering to draw any inferences. The system requires a substantial number of users to show interest in a new item before that item can be recommended [4,6]. Addressing the cold start problem can be important for a new user's engagement and is therefore of critical significance in commercial applications.
Content-based recommender systems recommend items similar to items a user preferred in the past [1]. For example, a content-based news recommender system observes the collection of news articles a user prefers and reads frequently. Then, only the news articles that have a high degree of similarity to the user's read articles are recommended. The greatest strength of this approach is that it only considers the properties of an item, i.e. the content of news articles, and accordingly makes recommendations. Therefore, in this approach, once a new user is introduced to the system, as soon as they read their first article, the content-based recommender system starts by recommending articles similar to the read article. Thus, this approach does not cause the cold start problem mentioned in collaborative recommender systems. The weakness of this approach is that users are limited to being recommended news articles that are similar to their read history.
Hybrid recommender systems generate recommendations by combining the above two recommendation techniques, thus, maximizing the benefits and minimizing the disadvantages of them [1]. For example, a hybrid recommendation system that combines content-based and collaborative recommendation systems considers both the content of news articles and a user's demographic information to issue recommendations. Given the fact that this approach contains collaborative recommender systems, it contains the disadvantages of such systems. Therefore, this approach also suffers from the cold start problem.
Because our news application domain is textual and we want to avoid the cold start problem, we focus on content-based recommender systems. Most existing content-based news recommender systems are based on keywords; that is, they represent the content of news articles using a set of keywords, neglecting the thematic structure of the articles. We apply topic models to discover the hidden themes of the news articles, and we incorporate these themes into a content-based recommender system. We employ topic models in news recommender systems for the following reasons. Firstly, topic models yield great insight into the different themes of a newspaper article. Secondly, topic models capture the probabilities of assigning different themes to newspaper articles. Thirdly, topic models provide a generative probabilistic model for the themes. As a consequence, topic models can assign probabilities to an unseen document. Our experimental studies show that the proposed recommender system yields more accurate results than its counterparts.
3. Problem statement. News recommender systems arise to efficiently handle the overwhelming number of news articles, simplify navigation, and retrieve relevant information. Formally, the recommendation problem can be formulated as follows. Let $U = \{u_1, u_2, \cdots, u_{|U|}\}$ be the collection of $|U|$ users, and let $C = D \cup Q$ represent all the news articles, where $D = \{d_1, d_2, \cdots, d_M\}$ is the collection of read articles, that is, all news articles that have been read by at least one user, and $Q = \{q_1, q_2, \cdots, q_N\}$ is the collection of non-read articles, that is, all the latest articles published daily that have not yet been read and are to be recommended. Note that our news recommender system is capable of personalizing the collection of non-read articles ($Q$) for each user.
Let $f$ be a utility function that measures the usefulness of a news article $c \in C$ to a user $u_l \in U$, i.e., $f : U \times C \to R$, where $R$ is a totally ordered set (e.g., non-negative integers or real numbers within a certain range). Then, for each user $u_l \in U$, we want to choose the news article $c \in C$ that maximizes the user's utility. More formally:
$$\forall u_l \in U, \quad c'_{u_l} = \arg\max_{c \in C} f(u_l, c).$$
In recommender systems, the sets $U$ and $C$ are usually defined by several characteristics [1]. Similarly, in our work, each user $u_l \in U$ is defined by a unique identifier, such as a user ID. Each article in the collection $C$ is defined by a unique article identifier and the article content. In addition, we represent the utility of a news article by the amount of time a user spends on the article, which indicates how interesting the news article is to the user. For example, user $u_0$ spent two minutes (out of five minutes) on the news article "$d_0$: SpaceX launches fifth official mission". In our recommender system, the amount of time spent on the collection of non-read articles ($Q$) is not available. Thus, the fundamental issue of our recommender system is that the utility function $f$ is not defined on the whole $U \times C$ space, but only on the $U \times D$ space. This means $f$ needs to be extrapolated to the space $U \times Q$. Therefore, the goal of our news recommender system is to estimate the time each user would spend on the non-read news articles and to issue appropriate recommendations based on these estimates.
4. The time aware topic based recommender system. In this section, we propose a time aware content-based news recommender system that employs Latent Dirichlet Allocation (LDA). We use LDA-based topic modeling approaches to measure the similarity between read news articles and non-read news articles. LDA-based approaches elicit a topic model from the collection of news articles. The topic model represents each news article as a multinomial distribution over topics, where each topic is a multinomial distribution over words.
Then, given the time a user has spent on read news articles and the topic model of the collection of news articles, the time the user would spend on non-read news articles is estimated.
4.1. LDA-based topic models. Latent Dirichlet Allocation (LDA), proposed by Blei et al. [3], is a generative probabilistic model for collections of discrete data such as text corpora. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. LDA also assumes that a corpus is a collection of $D$ documents. Let $D = \{w_1, w_2, \cdots, w_N\}$ represent a corpus of length $N$, resulting from the concatenation of the $D$ documents, which contain $N$ words in total, where each word $w_i$ belongs to a vocabulary of $V$ unique words. LDA assumes that each word $w_i \in D$ is associated with a latent topic variable $z_i$, where $i \in \{1, 2, \cdots, N\}$. Each of the topics $t = 1, \cdots, K$ is associated with a multinomial distribution $\Phi_t$ over the $V$ vocabulary words, such that $p(w_i \mid z_i = t) = \Phi_{z_i, w_i}$. Each $\Phi_t$ is generated from a Dirichlet distribution with prior $\beta$. Also, each document $d$ is associated with a multinomial distribution $\Theta_d$ over the $K$ topics, such that $p(z_i = t \mid d) = \Theta_{d, z_i}$, generated from a Dirichlet distribution with prior $\alpha$. To discover the set of topics used in the corpus $D$, the objective is (1) to obtain an estimate of $\Phi$, where $\Phi = \{\Phi_t\}_{t=1}^{K}$, that is, the term distribution for each topic, and (2) to obtain an estimate of $\Theta$, where $\Theta = \{\Theta_d\}_{d=1}^{D}$, that is, the topic distribution for each document. LDA is one model that provides such estimates.
In LDA, each document $d$ is generated by first drawing a distribution over the $K$ topics with parameters $\Theta_d$ from a Dirichlet distribution with prior $\alpha$. The words in the document are then generated by drawing a topic $z_i = t$ from this distribution and then drawing a word $w_i$ from that topic according to a multinomial distribution with parameters $\Phi_t$, generated from a Dirichlet distribution with prior $\beta$ [3].
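To make the generative process concrete, the following minimal Python sketch draws a small synthetic corpus. It is an illustration under stated assumptions, not the implementation used in the experiments: the function names (`sample_dirichlet`, `sample_categorical`, `generate_corpus`), the symmetric priors, the fixed document length, and the toy parameter values are all hypothetical choices.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    # The tiny floor guards against a Gamma draw underflowing to zero.
    draws = [max(rng.gammavariate(a, 1.0), 1e-300) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]

def sample_categorical(probs, rng):
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(num_docs, doc_len, K, V, alpha, beta, seed=0):
    """Generate documents (lists of word ids) following the LDA generative story."""
    rng = random.Random(seed)
    # One word distribution Phi_t per topic, drawn from Dirichlet(beta).
    phi = [sample_dirichlet([beta] * V, rng) for _ in range(K)]
    docs, thetas = [], []
    for _ in range(num_docs):
        # One topic distribution Theta_d per document, drawn from Dirichlet(alpha).
        theta = sample_dirichlet([alpha] * K, rng)
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)             # topic assignment z_i
            words.append(sample_categorical(phi[z], rng))  # word w_i drawn from Phi_z
        docs.append(words)
        thetas.append(theta)
    return docs, thetas, phi
```

For example, `generate_corpus(3, 5, K=2, V=4, alpha=0.5, beta=0.5)` returns three five-word documents together with the topic proportions and topic-word distributions that produced them.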
This procedure defines a joint probability distribution over the random variables $(D, z, \Phi, \Theta)$ given by [2]
$$p(D, z, \Phi, \Theta \mid \alpha, \beta) = \prod_{t=1}^{K} p(\Phi_t \mid \beta) \prod_{d=1}^{D} p(\Theta_d \mid \alpha) \prod_{i=1}^{N} p(z_i \mid \Theta_{d(i)})\, p(w_i \mid \Phi_{z_i}),$$
where $d(i)$ denotes the document that contains the $i$-th word. Note that words are the only observed variables. The hyperparameters $\alpha$ and $\beta$ are supplied by the user. The latent topic assignments $z$, the document distributions over topics $\Theta$, and the topic distributions over words $\Phi$ are all unobserved. Estimation of $\Theta$ and $\Phi$ requires computing the posterior distribution over the latent topic assignments, $p(z \mid D, \alpha, \beta)$. Unfortunately, this posterior distribution is intractable due to the coupling between $\Phi$ and $\Theta$ [3]. However, Griffiths et al. [10,11] proposed to use Gibbs sampling to obtain approximate estimates for the latent variables as well as the posterior distributions. In this method, the parameter sets $\Theta$ and $\Phi$ can be integrated out because they can be interpreted as statistics of the associations between the observed $w_i$ and the corresponding $z_i$ [10,13].
With a set of samples from the posterior distribution, $\Phi$ and $\Theta$ can be computed by integrating across the full set of samples. For any single sample, we can estimate $\Theta_{d,t}$ by
$$\Theta_{d,t} = \frac{n_t^{(d)} + \alpha}{n_{\cdot}^{(d)} + K\alpha},$$
where $n_t^{(d)}$ is the total number of words from document $d$ assigned to topic $t$ and $n_{\cdot}^{(d)}$ is the total number of words in document $d$. Similarly, $\Phi_{t,w_i}$ is estimated by
$$\Phi_{t,w_i} = \frac{n_t^{(w_i)} + \beta}{n_t^{(\cdot)} + V\beta},$$
where $n_t^{(w_i)}$ is the total number of times word $w_i$ is assigned to topic $t$ and $n_t^{(\cdot)}$ is the total number of words assigned to topic $t$.
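The count-based estimates can be illustrated with a compact collapsed Gibbs sampler. The sketch below is a toy version written for readability, assuming symmetric priors and the standard smoothed count ratios associated with Griffiths and Steyvers [10]; it is not the MALLET implementation used in the experiments, and the name `gibbs_lda` and its parameters are hypothetical.

```python
import random

def gibbs_lda(docs, K, V, alpha, beta, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids in [0, V)."""
    rng = random.Random(seed)
    n_dt = [[0] * K for _ in docs]      # words of document d assigned to topic t
    n_tw = [[0] * V for _ in range(K)]  # times word w is assigned to topic t
    n_t = [0] * K                       # total words assigned to topic t
    # Start from a random topic assignment for every word.
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current assignment of this word from all counts.
                n_dt[d][t] -= 1
                n_tw[t][w] -= 1
                n_t[t] -= 1
                # Unnormalized full conditional of z_i given all other assignments.
                weights = [
                    (n_dt[d][k] + alpha) * (n_tw[k][w] + beta) / (n_t[k] + V * beta)
                    for k in range(K)
                ]
                t = rng.choices(range(K), weights=weights, k=1)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][w] += 1
                n_t[t] += 1
    # Point estimates from the final sample (smoothed count ratios).
    theta = [
        [(n_dt[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_tw[t][w] + beta) / (n_t[t] + V * beta) for w in range(V)]
        for t in range(K)
    ]
    return theta, phi
```

Each row of the returned `theta` and `phi` sums to one by construction, since the smoothing terms in the numerators add up to the smoothing term in the denominator.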

4.2. The proposed algorithm. Our content-based recommender system employs probabilistic topic models to uncover the thematic similarity between news articles and a user's preferences. Then, news articles that have a high degree of thematic similarity to the user's preferences are recommended.
We assume a collection of users is represented by $U = \{u_1, u_2, \cdots, u_{|U|}\}$. Let the corpus of news articles be $C = D \cup Q$, where $D = \{d_1, d_2, \cdots, d_M\}$ is the collection of read articles, and $Q = \{q_1, q_2, \cdots, q_N\}$ is the collection of non-read articles. We define a read article $d_i \in D$ as a tuple of textual content and a subset of readers, that is, $d_i = \langle t_i, U_i \rangle$, where $t_i$ is the textual content, represented by the sequence of terms of the article, and $U_i \subset U$ is the subset of users associated with the article. Similarly, a non-read article $q_j \in Q$ is defined by $q_j = \langle t_j, \emptyset \rangle$, where the set of readers is empty.
Our task is to appropriately recommend non-read articles to users or, alternatively, to assign users to non-read articles. In other words, for each non-read article $q_j = \langle t_j, \emptyset \rangle$, we aim to predict the most appropriate subset of users and substitute it for the empty set ($\emptyset$).
The proposed content-based news recommender system consists of the following three steps.
4.2.1. Building a topic model. In this step, we use LDA-based topic models to best reflect the thematic structure of news articles. We build a topic model from the collection of read articles ($D$). Our topic model assumes that each news article $d_i \in D$ has a multinomial distribution over $K$ topics with parameters $\Theta_{d_i}$. As a result of this step, we obtain $\Theta_D$, an $M \times K$ array of topic probabilities given read articles, where $M$ is the total number of read articles and $K$ is the total number of topics.

4.2.2. Inference and learning. We use the topic model built in Section 4.2.1 to infer the multinomial distribution of each non-read article ($q_j \in Q$) over $K$ topics with parameters $\Theta_{q_j}$. As a result of this step, we obtain $\Theta_Q$, an $N \times K$ array of topic probabilities given non-read articles, where $N$ is the total number of non-read articles and $K$ is the total number of topics.
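One common way to carry out this inference step is to hold the learned topic-word distributions fixed and to resample only the topic assignments of the new article. The sketch below assumes this fold-in procedure; this section does not specify the inference routine, so `infer_theta`, its parameters, and the toy values in the usage note are hypothetical.

```python
import random

def infer_theta(doc, phi, alpha, iterations=100, seed=0):
    """Estimate the topic proportions of an unseen article with Phi held fixed.

    doc is a list of word ids, and phi[k][w] is the probability of word w
    under topic k, as learned from the read articles.
    """
    rng = random.Random(seed)
    K = len(phi)
    # Random initial topic assignment for every word of the new article.
    z = [rng.randrange(K) for _ in doc]
    n_t = [0] * K
    for t in z:
        n_t[t] += 1
    for _ in range(iterations):
        for i, w in enumerate(doc):
            n_t[z[i]] -= 1
            # Only the document-topic counts change; phi is not updated.
            weights = [(n_t[k] + alpha) * phi[k][w] for k in range(K)]
            z[i] = rng.choices(range(K), weights=weights, k=1)[0]
            n_t[z[i]] += 1
    return [(n_t[k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
```

Calling `infer_theta` once per non-read article and stacking the results row by row would give an array with the shape described for the non-read articles. The sketch assumes every word of the article has non-zero probability under at least one topic, which holds when the topic-word distributions are smoothed.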

4.2.3. Making recommendations. For each user $u_l \in U$, we obtain their collection of read articles $D_{u_l} \subset D$ and the respective topic vectors $\Theta_{D_{u_l}}$. Given a collection of non-read articles $Q$ and their topic vectors $\Theta_Q$, our proposed method outputs a ranked list $Q^{y}_{u_l} = \{q_1, q_2, \cdots, q_y\}$, where $q_r \in Q$, of $y$ non-read articles interesting to user $u_l$.
The probability of article $q_r$ being interesting to user $u_l$ is computed for each $q_r \in Q$ as
$$p(q_r \mid u_l, Q, D_{u_l}) = \frac{\mathrm{InterestingnessScore}(q_r, u_l, D_{u_l})}{\sum_{q_j \in Q} \mathrm{InterestingnessScore}(q_j, u_l, D_{u_l})}, \tag{4}$$
where $\mathrm{InterestingnessScore}(q_r, u_l, D_{u_l})$ calculates how interesting article $q_r$ is to user $u_l$. This score can be any non-negative real number and is computed as
$$\mathrm{InterestingnessScore}(q_r, u_l, D_{u_l}) = \sum_{d_i \in D_{u_l}} \mathrm{DocSim}(q_r, d_i, D_{u_l}) \times \mathrm{timeSpent}[u_l, d_i], \tag{5}$$
where $\mathrm{DocSim}(q_r, d_i, D_{u_l})$ measures the similarity between two articles, i.e., $q_r$ and $d_i$, given the collection of articles read by user $u_l$ ($D_{u_l}$), and returns a similarity measure in the range [0, 1], and $\mathrm{timeSpent}[u_l, d_i]$ is the amount of time user $u_l$ spends on article $d_i$. We apply the LDA-based topic model to compute the article similarity. We utilize two arrays, $\Theta_{q_r}$ and $\Theta_{d_i}$, obtained from Sections 4.2.1 and 4.2.2, to determine the similarity between $q_r$ and $d_i$. Arrays $\Theta_{q_r}$ and $\Theta_{d_i}$ represent the latent topic distributions of articles $q_r$ and $d_i$. Thus, inspired by Chang et al. [8], we view each article as a topic-based vector and use a cosine-based similarity measure to compute the similarity between a read and a non-read article. Note that our experimental studies show similar results for other similarity measures, such as Manhattan distance. A comprehensive survey of similarity measures between vectors can be found in [7].
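The scoring step can be sketched in a few lines of Python. The sketch assumes that the interestingness score of a non-read article is the sum, over the user's read articles, of the cosine similarity between topic vectors weighted by the time spent on each read article, and that scores are normalized over all non-read articles. The function names and the toy vectors in the usage note are hypothetical.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0 else 0.0

def interestingness_score(theta_q, read_thetas, time_spent):
    """Time-weighted sum of similarities between one non-read article and the read ones."""
    return sum(
        cosine_similarity(theta_q, theta_d) * t
        for theta_d, t in zip(read_thetas, time_spent)
    )

def recommend(theta_Q, read_thetas, time_spent, y):
    """Return the top-y (index, probability) pairs over the non-read articles."""
    scores = [interestingness_score(tq, read_thetas, time_spent) for tq in theta_Q]
    total = sum(scores)
    # Normalize the scores so that they sum to one over all non-read articles.
    probs = [s / total if total > 0 else 0.0 for s in scores]
    ranked = sorted(range(len(theta_Q)), key=lambda r: probs[r], reverse=True)
    return [(r, probs[r]) for r in ranked[:y]]
```

With two non-read articles `[[0.9, 0.1], [0.1, 0.9]]`, two read articles `[[1.0, 0.0], [0.0, 1.0]]`, and reading times `[120.0, 10.0]`, the first non-read article ranks highest, because it is thematically close to the article on which the user spent the most time.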
Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The more similar, and hence the more co-oriented, the vectors are, the closer the cosine of the angle between them is to one. The cosine similarity measure is often used to compare documents for text mining, classification, and clustering purposes [7]. Equation 6 is used to calculate the similarity:
$$\mathrm{DocSim}(q_r, d_i, D_{u_l}) = \frac{\Theta_{q_r} \cdot \Theta_{d_i}}{|\Theta_{q_r}|\,|\Theta_{d_i}|}, \tag{6}$$
where "·" denotes the inner product of two vectors, and $|x|$ represents the norm of the vector $x$. Finally, we return the top $y$ articles ranked by the probability $p(q_r \mid u_l, Q, D_{u_l})$.
5. Experiments. We conducted experiments on The Globe and Mail corpus. We compare the performance of our proposed content-based recommender system against the following baseline recommendation systems.
5.1. A tfidf-based content-based recommendation system. In this recommendation system, solely the bag-of-words tfidf representation of news articles is used. Term frequency-inverse document frequency (tfidf) [15] is a statistical measure that increases proportionally to the frequency of a term in a document but decreases with the frequency of the term among the documents in the corpus. The tfidf score of a term $t$ in document $d$, represented by $\mathrm{tfidf}(t,d)$, is defined as
$$\mathrm{tfidf}(t,d) = tf_{t,d} \times \log\frac{M}{df_t},$$
where $tf_{t,d}$ measures the ratio of the number of times term $t$ appears in document $d$ to the total number of terms in document $d$, $M$ is the total number of documents in the corpus, and $df_t$ is the number of documents containing term $t$.
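As a small worked illustration of this weighting, the sketch below computes sparse tfidf vectors for tokenized documents. It assumes the usual logarithmic inverse document frequency with a natural logarithm and no additional smoothing; `tfidf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tfidf} dictionary."""
    M = len(docs)
    # df[t]: number of documents that contain term t at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(M / df[term])
            for term, count in counts.items()
        })
    return vectors
```

For the two documents `["rocket", "launch", "rocket"]` and `["launch", "election"]`, the term "launch" appears in every document and therefore receives weight zero, while "rocket" receives (2/3) × log 2 in the first document.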

5.2. Item popularity. In this method, items are ranked by the time spent on each article, which serves as a popularity measure. The results based on popularity are not personalized, but this baseline is used in many studies [12] to show the effectiveness of methods.
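A minimal sketch of this baseline follows. It assumes that popularity is the total time all users spent on an article; whether the total or an average is used is not stated here, and `popularity_ranking` and the log format are hypothetical.

```python
from collections import defaultdict

def popularity_ranking(time_log, y):
    """Rank articles by the total time spent on them and return the top y.

    time_log is an iterable of (user_id, article_id, seconds) records.
    """
    totals = defaultdict(float)
    for _user, article, seconds in time_log:
        totals[article] += seconds
    return sorted(totals, key=lambda a: totals[a], reverse=True)[:y]
```

Every user receives the same list, which is why this baseline is not personalized.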

5.3. Item-to-Item collaborative filtering (ItemKNN). The Item-to-Item collaborative filtering method has been used commercially by Amazon [18]. Each article is represented by a vector of the users who have spent time on it; the cosine measure is then used to assess the similarity among articles. We tested the method with different numbers of neighbors and found that 80 performs best. ItemKNN using the Binary feature ignores the amount of time spent on each article. Instead, this method represents each article by a vector of users who have or have not read the article.
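The following sketch shows one way such an item-based neighborhood method can be written. The representation of articles as sparse user vectors and the use of cosine similarity follow the description above; the scoring rule (a similarity-weighted sum over the user's read articles among the k nearest neighbors) is an assumption, and `item_knn_scores` is a hypothetical name.

```python
import math

def sparse_cosine(x, y):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(v * y[k] for k, v in x.items() if k in y)
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx > 0 and ny > 0 else 0.0

def item_knn_scores(item_vectors, user, k=80, binary=False):
    """Score every article the user has not read.

    item_vectors maps article_id -> {user_id: time spent}. With binary=True,
    the time values are replaced by 1.0 (read / not read).
    """
    if binary:
        item_vectors = {i: {u: 1.0 for u in vec} for i, vec in item_vectors.items()}
    read = {i for i, vec in item_vectors.items() if user in vec}
    scores = {}
    for j, vec_j in item_vectors.items():
        if j in read:
            continue
        # The k articles most similar to candidate j.
        neighbors = sorted(
            ((sparse_cosine(vec_j, item_vectors[i]), i) for i in item_vectors if i != j),
            reverse=True,
        )[:k]
        # Only neighbors the user has actually read contribute to the score.
        scores[j] = sum(s * item_vectors[i][user] for s, i in neighbors if i in read)
    return scores
```

An article that no user has read yet has an empty vector and a score of zero, which illustrates the cold start problem of collaborative filtering discussed in Section 2.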

5.4. User-to-user collaborative filtering (UserKNN). User-to-user collaborative filtering is a classical collaborative filtering method [24]. This method is similar to ItemKNN, except that each user is represented by a vector of the articles she has read and similarities are computed among users (rather than items). In this method, we used the same settings as for ItemKNN. UserKNN using the Binary feature, similar to ItemKNN using the Binary feature, ignores the amount of time spent on each article.
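The user-based variant can be sketched analogously. As with the item-based sketch, the aggregation rule (summing the neighbors' values weighted by user similarity) is an assumption rather than a detail stated here, and `user_knn_scores` is a hypothetical name.

```python
import math

def user_knn_scores(user_vectors, target, k=80, binary=False):
    """Score articles the target user has not read, using the k most similar users.

    user_vectors maps user_id -> {article_id: time spent}. With binary=True,
    the time values are replaced by 1.0 (read / not read).
    """
    if binary:
        user_vectors = {u: {a: 1.0 for a in vec} for u, vec in user_vectors.items()}

    def cosine(x, y):
        dot = sum(v * y[a] for a, v in x.items() if a in y)
        nx = math.sqrt(sum(v * v for v in x.values()))
        ny = math.sqrt(sum(v * v for v in y.values()))
        return dot / (nx * ny) if nx > 0 and ny > 0 else 0.0

    own = user_vectors[target]
    neighbors = sorted(
        ((cosine(own, vec), u) for u, vec in user_vectors.items() if u != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, u in neighbors:
        for article, value in user_vectors[u].items():
            if article not in own:
                # Each neighbor contributes its value weighted by user similarity.
                scores[article] = scores.get(article, 0.0) + sim * value
    return scores
```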

5.5. Non-negative matrix factorization (NMF). This method is based on non-negative matrix factorization [17], where the user-article matrix is factorized into two matrices with the property that all matrices have no negative values. Compared to traditional matrix factorization, the result of this method is interpretable and is better suited to the ranking tasks in recommendation. For this method, we set the number of factors to 30, as increasing it had no significant effect on the results.
5.6. Content-based recommender system using document embedding. Document embedding learns a vector-space representation of the terms of a document by exploiting a two-layer neural network [21,16]. The architecture of the model used for training the document embedding is the distributed memory phrase vector. When the model is trained on a dataset of documents, it tunes the word vectors and document vectors using stochastic gradient descent optimization. In our experiment, the model is trained on a corpus of 73,000 news articles. The learned document vectors are extracted from the model and used to determine the similarity of two articles by the cosine similarity of their vectors.

5.7. LDA-based recommendation system. The LDA-based recommendation system is explained in Section 4.2. The topic models were trained with 1000 iterations of Gibbs sampling [10,11], as implemented in MALLET [20]. The initial values of the hyperparameters applied in all our experiments are α = 50.0/K and β = 0.01. Note that these are the default parameters of most LDA-based topic models and are expected to result in a fine-grained decomposition of the corpus into topics [11]. The optimum number of topics should likewise yield a fine-grained decomposition of the corpus into topics [11], in which the topic distributions over words are of minimum similarity. Furthermore, the optimum number of topics leads to a low cross-entropy between the term distribution learned by the topic model and the distribution of terms in an unseen test article. Thus, the optimum number of topics results in a lower perplexity score, indicating that the model is better at predicting the term distribution of the test article [3].
In our experiments, we learn topics for different values of K and choose the value that minimizes the perplexity score. The experiments are conducted using topic models with different numbers of topics K, ranging from K = 20 to K = 300. Figure 1 illustrates the average perplexity as a function of the number of topics K. In this figure, values of K in the range [180, 190] achieve the best performance in terms of perplexity.
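The selection criterion can be illustrated as follows. The sketch assumes the common definition of perplexity as the exponential of the negative average log-likelihood per word, with the probability of a word in a test article given by mixing the topic-word distributions with the article's topic proportions. The function names are hypothetical.

```python
import math

def perplexity(docs, thetas, phi):
    """Perplexity of held-out documents under a topic model.

    docs are lists of word ids, thetas[d][t] is the topic proportion of
    topic t in document d, and phi[t][w] is the probability of word w
    under topic t.
    """
    log_likelihood, n_words = 0.0, 0
    for doc, theta in zip(docs, thetas):
        for w in doc:
            # p(w | d) = sum over topics of Theta[d][t] * Phi[t][w]
            p_w = sum(theta[t] * phi[t][w] for t in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_words += 1
    return math.exp(-log_likelihood / n_words)

def choose_num_topics(perplexity_by_K):
    """Return the number of topics K with the lowest perplexity."""
    return min(perplexity_by_K, key=perplexity_by_K.get)
```

As a sanity check, a model that assigns uniform probability to a vocabulary of four words has a perplexity of exactly four, which matches the intuition that perplexity measures the effective number of equally likely word choices.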
As mentioned earlier, a topic model generates $K$ topics, where each topic is a distribution over $V$ words, denoted by $\Phi_k = \{w_1, w_2, \cdots, w_V\}$. Similarity between topics is the similarity of the topic distributions over words across different topics. We calculate the normalized average sum of similarity scores between every pair of the $K$ topics ($K \in [180, 190]$), generated from The Globe and Mail corpus. As illustrated in Figure 2, $K = 187$ results in the most fine-grained decomposition of the corpus into topics, with the minimum similarity between topic-word distributions.
5.8. Evaluation of the recommender system. In this section, we evaluate the performance of our proposed content-based news recommender system using the following metrics: precision, recall, and F-measure.
Precision, recall, and F-measure are well-known evaluation metrics in the information retrieval literature [19]. For each user, we use the original set of read articles as the ground truth $T_g$. Assume that the set of recommended news articles is $T_r$, so that the correctly recommended articles are $T_g \cap T_r$. Precision, recall, and F-measure are defined as follows:
$$\mathrm{Precision} = \frac{|T_g \cap T_r|}{|T_r|}, \qquad \mathrm{Recall} = \frac{|T_g \cap T_r|}{|T_g|}, \qquad \text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
Figure 3. Precision of the proposed recommender system as a function of the number of recommended articles, using the following recommendation systems: bag-of-words with tfidf, Item popularity, Item-to-Item collaborative filtering (ItemKNN), User-to-user collaborative filtering (UserKNN), Non-negative matrix factorization (NMF), Content-based recommender system using document embedding, and LDA on The Globe and Mail corpus.
In our experiments, the number of recommended articles ranges from 1 to 30. Figures 3, 4, and 5 illustrate the precision, recall, and F-measure of the proposed recommender system as a function of the number of recommended articles.
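For a single user, these metrics can be computed at each cutoff as sketched below. The sketch assumes the standard set-based definitions, with the F-measure taken as the harmonic mean of precision and recall; the function names and the toy ranking in the usage note are hypothetical.

```python
def precision_recall_f(recommended, ground_truth):
    """Set-based precision, recall, and F-measure for one user."""
    t_r, t_g = set(recommended), set(ground_truth)
    hits = len(t_r & t_g)  # correctly recommended articles
    precision = hits / len(t_r) if t_r else 0.0
    recall = hits / len(t_g) if t_g else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def metrics_at_cutoffs(ranked, ground_truth, max_y=30):
    """Evaluate the top-y prefix of a ranked list for y = 1, ..., max_y."""
    return {y: precision_recall_f(ranked[:y], ground_truth) for y in range(1, max_y + 1)}
```

For the ranked list `["a", "b", "c", "d"]` and ground truth `{"a", "c", "e"}`, the top two recommendations contain one correct article, giving a precision of 0.5, a recall of 1/3, and an F-measure of 0.4.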
Empirical comparisons show that using topic models to represent articles improves precision, recall, and F-measure. Since the only difference between the compared content-based systems is the article similarity function $\mathrm{DocSim}(q_r, d_i, D_{u_l})$, which measures the similarity between a new non-read article $q_r$ and a read article $d_i$, analyzing the differences between the two article similarity measures explains the performance difference.
The bag-of-words with tfidf approach represents two articles by tfidf vectors. Then, the cosine similarity between these vectors is computed and used in the recommendation system. Generally speaking, the tfidf article similarity measures the quantity of term overlap, where each term has a different weight, in the two articles [25]. This approach ignores the thematic structure of articles when measuring similarity. The LDA-based approaches first generate a set of topics, each of which is represented by a distribution over terms, and a topic vector for each article. Terms in each topic are semantically coherent. Then, LDA-based recommender systems measure the cosine similarity between the topic vectors. Generally speaking, using LDA-based topic vectors quantifies the topic similarity between the two articles. These vectors yield higher precision, recall, and F-measure than tfidf or document embedding vectors. Key to this improvement is incorporating the thematic structure of news articles into the recommendation system. This leads to better estimates of the topic similarity between two articles.
Figure 4. Recall of the proposed recommender system as a function of the number of recommended articles, using the following recommendation systems: bag-of-words with tfidf, Item popularity, Item-to-Item collaborative filtering (ItemKNN), User-to-user collaborative filtering (UserKNN), Non-negative matrix factorization (NMF), Content-based recommender system using document embedding, and LDA on The Globe and Mail corpus.
Hence we recommend using topic models to represent articles for time aware content-based news recommender systems.
Figure 5. F-measure of the proposed recommender system as a function of the number of recommended articles, using the following recommendation systems: bag-of-words with tfidf, Item popularity, Item-to-Item collaborative filtering (ItemKNN), User-to-user collaborative filtering (UserKNN), Non-negative matrix factorization (NMF), Content-based recommender system using document embedding, and LDA on The Globe and Mail corpus.
6. Conclusions. This paper presents a time aware topic based recommender system for The Globe and Mail, a company that offers the most authoritative news in Canada, featuring national and international news. One of the important problems of The Globe and Mail newswire is the growing number of articles, which in turn demands a system to automatically filter and deliver the content according to readers' preferences. Furthermore, in the collaborative-filtering-based recommender system at The Globe and Mail, the introduction of new news articles can cause the cold start problem, as there is insufficient data on these new entries for collaborative filtering to work accurately. We propose to utilize the latent Dirichlet allocation (LDA) model to discover the hidden themes of the news articles. We incorporate these themes into a content-based recommender system. Our experimental studies show that the proposed recommendation system yields better results than the bag-of-words tfidf representation alone. Moreover, because our recommender system considers the content of news articles to make recommendations, introducing a new news article does not cause the cold start problem.
Applying topic models in a content-based recommender system yields more accurate results than other recommender systems. However, our content-based recommender system must effectively evolve with its content. In our current system, the topic model needs to be generated offline. For instance, once non-read news articles enter the collection of read articles, the topic model needs to be updated to reflect the themes of new articles. This offline generation of a topic model is a drawback, as it hinders the system's ability to evolve quickly. We could develop a real-time content-based recommender system, that leverages a stream of news articles and is capable of handling online LDA [14] a