Information Diffusion in Social Sensing

Statistical inference using social sensors is an area that has witnessed remarkable progress in the last decade. It is relevant in a variety of applications including localizing events for targeted advertising, marketing, localization of natural disasters and predicting sentiment of investors in financial markets. This paper presents a tutorial description of three important aspects of sensing-based information diffusion in social networks from a communications/signal processing perspective. First, diffusion models for information exchange in large scale social networks, together with social sensing via social media networks such as Twitter, are considered. Second, Bayesian social learning models in online reputation systems are presented. Finally, the principle of revealed preferences arising in microeconomic theory is used to parse datasets to determine if social sensors are utility maximizers and then determine their utility functions. All three topics are explained in the context of actual experimental datasets from health networks, social media and psychological experiments. Also, algorithms are given that exploit the above models to infer underlying events based on social sensing. The overview, insights, models and algorithms presented in this paper stem from recent developments in computer science, economics, psychology and electrical engineering.


I. INTRODUCTION AND MOTIVATION
Humans can be viewed as social sensors that interact over a social network to provide information about their environment. Examples of information produced by such social sensors include Twitter posts, Facebook status updates, and ratings on online reputation systems like Yelp and Tripadvisor. Social sensors go beyond physical sensors; for example, user opinions/ratings (such as the quality of a restaurant) are available on Tripadvisor but are difficult to measure via physical sensors. Similarly, future situations revealed by the Facebook status of a user are impossible to predict using physical sensors [1].

V. Krishnamurthy (e-mail: vikramk@ece.ubc.ca) and W. Hoiles (e-mail: whoiles@eceubc.ca) are with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, V6T 1Z4, Canada. This research was supported by the Canada Research Chairs program, Natural Sciences and Engineering Research Council of Canada and Social Sciences and Humanities Research Council of Canada.

Statistical inference using social sensors is an area that has witnessed remarkable progress in the last decade. It is relevant in a variety of applications including localizing special events for targeted advertising [2], [3], marketing [4], [5], localization of natural disasters [6], and predicting sentiment of investors in financial markets [7], [8]. For example, [9] reports that models built from the rate of tweets for particular products can outperform market-based predictors.

A. Context: Why social sensors?
Social sensors present unique challenges from a statistical estimation point of view. First, social sensors interact with and influence other social sensors. For example, ratings posted on online reputation systems strongly influence the behavior of individuals.¹ Such interacting sensing can result in non-standard information patterns due to correlations introduced by the structure of the underlying social network. Thus certain events "go viral" [5], [12]. Second, due to privacy concerns and time constraints, social sensors typically do not reveal raw observations of the underlying state of nature. Instead, they reveal their decisions (ratings, recommendations, votes), which can be viewed as a low resolution (quantized) function of their raw measurements and interactions with other social sensors. This can result in misinformation propagation, herding and information cascades. Third, the response of a social sensor may not be consistent with that of a utility maximizer.
Social sensors are enabled by technological networks. Indeed, social media sites that support interpersonal communication and collaboration using Internet-based social network platforms are growing rapidly. The following excerpt from [13] illustrates the increasing importance of social sensors in marketing:
• "53% of people on Twitter recommend companies and/or products in their tweets, with 48% of them delivering on their intention to buy the product. (ROI Research for Performance, June 2010)
• The average consumer mentions specific brands over 90 times per week in conversations with friends, family, and co-workers. (Keller Fay, WOMMA, 2010)
• Consumer reviews are significantly more trusted, nearly 12 times more, than descriptions that come from manufacturers, according to a survey of US mom Internet users by online video review site EXPO. (eMarketer, February 2010)
• In a study conducted by social networking site myYearbook, 81% of respondents said they had received advice from friends and followers relating to a product purchase through a social site; 74% of those who received such advice found it to be influential in their decision. (ClickZ, January 2010)"
McKinsey estimates that the economic impact of social media on business is potentially greater than $1 trillion since social media facilitates efficient communication and collaboration within and across organizations.

B. Main Results and Organization
There is strong motivation to construct models that facilitate understanding the dynamics of information flow in social networks. This paper presents a tutorial description of three important aspects of sensing-based information diffusion in social networks from a communications/signal processing perspective:
1) Information Diffusion in Large Scale Social Networks: The first topic considered in this paper (Sec. II) is diffusion of information in social networks comprised of a population of interacting social sensors. The states of sensors evolve over time as a probabilistic function of the states of their neighbors and an underlying target process. Several recent papers investigate such information diffusion in real-world social networks. Motivated by marketing applications, [14] studies diffusion (contagion) behavior in Facebook. Using data from 260,000 Facebook pages (which advertise products, services and celebrities), [14] analyzes information diffusion. In [15], the spread of hashtags on Twitter is studied. There is a wide range of social phenomena, such as diffusion of technological innovations, sentiment, cultural fads, and economic conventions [16], [17], where individual decisions are influenced by the decisions of others.
We consider the so-called Susceptible-Infected-Susceptible (SIS) model [18] for information diffusion in a social network. It is shown, for social networks comprised of a large number of agents, how the dynamics of the degree distribution can be approximated by the mean field dynamics. Mean field dynamics have been studied in [19] and applied to social networks in [17]; they lead to a tractable model for the dynamics of social sensors.
We demonstrate, using influenza datasets from the U.S. Centers for Disease Control and Prevention (CDC), how Twitter can be used as a real time social sensor for tracking the spread of influenza. That is, a health network (namely, the Influenza-like Illness Surveillance Network (ILInet)) is sensed by a real time microblogging social media network (namely, Twitter).
We also review two recent methods for sampling social networks, namely, social sampling and respondent-driven sampling. Respondent-driven sampling is now used by the U.S. Centers for Disease Control and Prevention (CDC) as part of the National HIV Behavioral Surveillance System in health networks.
2) Bayesian Social Learning in Online Reputation Systems: The second topic of this paper (Sec. III) considers online reputation systems where individuals make recommendations based on their private observations and recommendations of friends. Such interaction of individuals and their social influence is modelled as Bayesian social learning [20], [21], [16] on a directed acyclic graph. Data incest (misinformation propagation) arises as a result of correlations in recommendations due to the intersection of multiple paths in the information exchange graph. Necessary and sufficient conditions are given on the structure of the information exchange graph to mitigate data incest. Experimental results on human subjects are presented to illustrate the effect of social influence and data incest on decision making.
The setup differs from classical signal processing, where sensors use noisy observations to compute estimates: in social learning, agents use noisy observations together with decisions made by previous agents to estimate the underlying state of nature. Social learning has been used widely in economics, marketing, political science and sociology to model the behavior of financial markets, crowds, social groups and social networks; see [20], [21], [22], [16], [23], [24] and numerous references therein. Related models have been studied in the context of sequential decision making in information theory [25], [26] and statistical signal processing [27], [28] in the electrical engineering literature. Social learning can result in unusual behavior such as herding [21], where agents eventually choose the same action irrespective of their private observations. As a result, the actions contain no information about the private observations and so the Bayesian estimate of the underlying random variable freezes. Such behavior can be undesirable, particularly if individuals herd and make incorrect decisions.
3) Revealed Preferences and Detection of Utility Maximizers: The third topic considered in this paper (Sec. IV) is the principle of revealed preferences arising in microeconomics. It is used as a constructive test to determine: Are social sensors utility optimizers in their response to external influence? The key question considered is as follows: Given a time-series of data $D = \{(p_t, x_t),\ t \in \{1, 2, \ldots, T\}\}$, where $p_t \in \mathbb{R}^m$ denotes the external influence and $x_t$ denotes the response of an agent, is it possible to detect if the agent is a utility maximizer?
These issues are fundamentally different to the model-centric theme used in the telecommunications literature, where one postulates an objective function (typically convex) and then proposes optimization algorithms. In contrast, the revealed preference approach is data centric: given a dataset, we wish to determine if it is consistent with utility maximization.
We present a remarkable result called Afriat's theorem [29], [30], which provides a necessary and sufficient condition for a finite dataset D to have originated from a utility maximizer. Also, a multi-agent version of Afriat's theorem is presented to determine if the dataset generated by multiple agents is consistent with playing from the equilibrium of a potential game.
Unlike model centric applications of game theory in telecommunications, our revealed preferences approach is data centric: 1) Given a time series dataset of probe and response signals, how can one detect if the response signals are consistent with a Nash equilibrium generated by players in a concave potential game? 2) If consistent with a concave potential game, how can the utility function of the players be estimated?
We present three datasets involving social sensors to illustrate Afriat's theorem of revealed preferences. These datasets are: (i) an auction conducted by undergraduate students at Princeton University, (ii) aggregate power consumption in the electricity market of Ontario province and (iii) a Twitter dataset for specific hashtags.
Varian (chief economist at Google) has written several influential papers on Afriat's theorem in the economics literature. These include measuring the welfare effect of price discrimination [31], analysing the relationship between prices of broadband Internet access and time of use service [32], and ad auctions for advertisement position placement on page search results from Google [32], [33]. Despite widespread use in economics, revealed preference theory is relatively unknown in the electrical engineering literature.
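To make the revealed preference question in item 3) concrete, the following minimal Python sketch checks the Generalized Axiom of Revealed Preference (GARP), which by the classical form of Afriat's theorem is equivalent to a finite dataset being consistent with utility maximization. The function name `satisfies_garp` and the toy datasets are illustrative assumptions, not taken from the paper, and the sketch assumes the standard linear budget setting in which response x_s is affordable at probe p_t when p_t · x_s ≤ p_t · x_t.

```python
def dot(p, x):
    """Inner product of a probe (price) vector and a response vector."""
    return sum(pi * xi for pi, xi in zip(p, x))


def satisfies_garp(probes, responses):
    """Return True if the dataset {(p_t, x_t)} satisfies GARP.

    Hypothetical helper for illustration only.
    """
    T = len(probes)
    # R[t][s] is True when x_t is directly revealed preferred to x_s,
    # i.e. x_s was affordable when x_t was chosen.
    R = [[dot(probes[t], responses[t]) >= dot(probes[t], responses[s])
          for s in range(T)] for t in range(T)]
    # Transitive closure (Warshall's algorithm) gives the indirect relation.
    for k in range(T):
        for i in range(T):
            for j in range(T):
                R[i][j] = R[i][j] or (R[i][k] and R[k][j])
    # GARP is violated if x_t is revealed preferred to x_s, yet x_t was
    # strictly cheaper than x_s at probe p_s.
    for t in range(T):
        for s in range(T):
            if R[t][s] and dot(probes[s], responses[s]) > dot(probes[s], responses[t]):
                return False
    return True


# Toy data (assumed): the first dataset is consistent, the second is not.
consistent = satisfies_garp([(1, 2), (2, 1)], [(4, 1), (1, 4)])
inconsistent = satisfies_garp([(1, 2), (2, 1)], [(1, 4), (4, 1)])
```

The cubic-time transitive closure is adequate for the small time series typical of such tests; the sketch only answers the detection question and does not estimate a utility function.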

C. Perspective
The unifying theme that underpins the three topics in this paper stems from predicting global behavior given local behavior: individual social sensors interact with other sensors and we are interested in understanding the behavior of the entire network. Information diffusion, social learning and revealed preferences are important issues for social sensors. We treat these issues in a highly stylized manner so as to provide easy accessibility to a communications/signal processing audience. The underlying tools used in this paper are widely used in signal processing, control, information theory and network communications.
Social Sensors vs Cognitive Radio: There are interesting parallels between social sensors (considered in this paper) and spectrum sensing in cognitive radio. Cognitive radios [34] scan spectrum, determine spectral gaps, and then collaborate with other cognitive radios to determine the best spectrum gap to transmit in. A social sensor uses cognition to choose the most relevant content in social media; that is, it scans social media (termed diversity spectrum in [35]), then interacts with other users to determine the most relevant information, and then broadcasts it in the social media. Of course, with social sensors there is the additional complexity that the information communicated diffuses and influences other social sensors to re-post that content; this can result in cascading behavior [36].
Books and Tutorials: The literature in social learning, information diffusion and revealed preferences is extensive. In each of the following sections, we provide a brief review of relevant works. Seminal books in social networks, social learning and network science include [37], [38], [16], [39]. There is a growing literature dealing with the interplay of technological and social networks [40]. Social networks overlaid on technological networks account for a significant fraction of Internet use. As discussed in [40], three key aspects that cut across social and technological networks are the emergence of global coordination through local actions, resource sharing models and the wisdom of crowds. These themes are addressed in the current paper in the context of social learning, diffusion and revealed preferences. Other tutorials include [41], [42].

II. INFORMATION DIFFUSION IN LARGE SCALE SOCIAL NETWORKS
This section addresses the first topic of the paper, namely, information diffusion models and their mean field dynamics in social networks. The setting is as follows: The states of individual nodes in the social network evolve over time as a probabilistic function of the states of their neighbors and an underlying target process (state of nature). The underlying target process can be viewed as the market conditions or competing technologies that evolve with time and affect the information diffusion. The nodes in the social network are sampled randomly to determine their state. As the adoption of the new technology diffuses through the network, its effect is observed via sentiments (such as tweets) of these selected members of the population. These selected nodes act as social sensors. In signal processing terms, the underlying target process can be viewed as a signal, and the social network can be viewed as a sensor. The key difference compared to classical sensing is that the sensor now is a social network with diffusion dynamics and noisy measurements (due to sampling nodes).
As described in Sec. I, a wide range of social phenomena such as diffusion of technological innovations, cultural fads, ideas, behaviors, trends and economic conventions [43], [44], [45], [16] can be modelled by diffusion in social networks. Another important application is sentiment analysis (opinion mining), where the spread of opinions amongst people is monitored via social media. Motivated by the above setting, this section proceeds as follows:
1) We describe the Susceptible-Infected-Susceptible (SIS) model for diffusion of information in social networks, which has been extensively studied in [38], [46], [17], [18], [37].
2) Next, it is shown how the dynamics of the infected degree distribution of the social network can be approximated by the mean field dynamics. The mean field dynamics state that as the number of agents in the social network goes to infinity, the dynamics of the infected degree distribution converge to those of an ordinary differential (or difference) equation.
3) An example with real data is then presented, in which health networks (including ILInet at a global level) are considered and Twitter is used as a social sensor to monitor the spread of influenza.
4) Finally, this section also describes how social networks can be sampled. We review two recent methods for sampling social networks, namely, social sampling and respondent-driven sampling; the latter is used in health networks.
The aim is to estimate the underlying target state that is being sensed by the social network, and also the state probabilities of the nodes, by sampling measurements at nodes in the social network. In a Bayesian estimation context, this is equivalent to a filtering problem involving estimation of the state of a prohibitively large scale Markov chain in noise. The mean field dynamics yield a tractable approximation with provable bounds for the information diffusion. Such mean field dynamics have been studied in [19] and applied to social networks in [46], [17], [37]. For an excellent recent exposition of interacting particle systems comprising agents each with a finite state space, see [47], where the more apt term "Finite Markov Information Exchange (FMIE) process" is used.
Regarding real datasets, in addition to the case study presented below, for other examples of diffusion datasets and their analysis see [14], [15]. A repository of social network datasets can be obtained at [48].

A. Social Network Model
A social network is modelled as a graph with N vertices:
$$G = (V, E), \quad \text{where } V = \{1, 2, \ldots, N\} \text{ and } E \subseteq V \times V. \tag{1}$$
Here, V denotes the finite set of vertices, and E denotes the set of edges. In social networks, it is customary to use the terminology network, nodes and links for graph, vertices and edges, respectively.
We use the notation (m, n) to refer to a link between node m and n. The network may be undirected, in which case (m, n) ∈ E implies (n, m) ∈ E. In undirected graphs, to simplify notation, we use the notation m, n to denote the undirected link between node n and m. If the graph is directed, then (m, n) ∈ E does not imply that (n, m) ∈ E. We will assume that self loops (reflexive links) of the form i, i are excluded from E.
An important parameter of a social network G = (V, E) is the connectivity of its nodes. Let $N^{(m)}$ and $D^{(m)}$ denote the neighbourhood set and degree (or connectivity) of a node m ∈ V, respectively. That is, with | · | denoting cardinality,
$$N^{(m)} = \{n \in V : (m, n) \in E\}, \qquad D^{(m)} = \big|N^{(m)}\big|.$$
For convenience, we assume that the maximum degree of the network is uniformly bounded by some fixed integer $\bar{D}$.
Let N(d) denote the number of nodes with degree d, and let the degree distribution P(d) specify the fraction of nodes with degree d. That is, for $d = 0, 1, \ldots, \bar{D}$,
$$N(d) = \sum_{m \in V} I\{D^{(m)} = d\}, \qquad P(d) = \frac{N(d)}{N}.$$
Here, I{·} denotes the indicator function. Note that $\sum_{d} P(d) = 1$. The degree distribution can be viewed as the probability that a node selected randomly with uniform distribution on V has connectivity d.
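As an illustration of the definitions above, the following short Python sketch computes the node degrees, the counts N(d) and the degree distribution P(d) for a small undirected network. The function name `degree_distribution` and the toy edge list are assumptions made for this example only.

```python
from collections import Counter


def degree_distribution(num_nodes, edges):
    """Return (degree list, P) for an undirected graph on nodes 0..num_nodes-1.

    Hypothetical helper; P maps each observed degree d to the fraction P(d).
    """
    neighbours = [set() for _ in range(num_nodes)]
    for m, n in edges:
        if m == n:
            continue  # self loops are excluded from E
        neighbours[m].add(n)
        neighbours[n].add(m)
    degree = [len(nb) for nb in neighbours]   # D^(m) = |N^(m)|
    counts = Counter(degree)                  # N(d)
    P = {d: counts[d] / num_nodes for d in sorted(counts)}
    return degree, P


# Toy network (assumed): a triangle 0-1-2, a pendant node 3, an isolated node 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
degree, P = degree_distribution(5, edges)
```

Storing neighbours as sets makes repeated edges harmless, so the same helper can be applied to edge lists that record each undirected link in both directions.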
Random graphs generated to have a degree distribution P that is Poisson were formulated by Erdős and Rényi [49]. Several recent works show that large scale social networks are characterized by connectivity distributions that are different to Poisson distributions. For example, the Internet and the World Wide Web have a power law connectivity distribution $P(d) \propto d^{-\gamma}$, where γ ranges between 2 and 3. Such scale free networks are studied in [50]. In the rest of this paper, we assume that the degree distribution of the social network is arbitrary but known; allowing an arbitrary degree distribution facilitates modelling complex networks.
Let k = 0, 1, . . . denote discrete time. Assume the target process s is a finite state Markov chain with transition probability
$$A_{ss'} = P(s_{k+1} = s' \mid s_k = s). \tag{3}$$
In the example of technology diffusion, the target process can denote the availability of competition or market forces that determine whether a node adopts the technology. In the model below, the target state will affect the probability that an agent adopts the new technology.

B. SIS Diffusion Model for Information in Social Network
The model we present below for the diffusion of information in the social network is called the Susceptible-Infected-Susceptible (SIS) model [18], [37]. The diffusion of information is modelled by the time evolution of the state of individual nodes in the network. Let $x^{(m)}_k \in \{0, 1\}$ denote the state at time k of each node m in the social network. Here, $x^{(m)}_k = 0$ if the agent at time k is susceptible and $x^{(m)}_k = 1$ if the agent is infected. At time k, the state vector of the N nodes is
$$x_k = \big[x^{(1)}_k, \ldots, x^{(N)}_k\big]' \in \{0, 1\}^N. \tag{4}$$
Assume that the process x evolves as a discrete time Markov process with transition law depending on the target state s. If node m has degree $D^{(m)} = d$, then the probability of node m switching from state i to j is
$$P\big(x^{(m)}_{k+1} = j \mid x^{(m)}_k = i,\ A^{(m)}_k = a,\ s_k = s\big) = p_{ij}(d, a, s), \quad i, j \in \{0, 1\}. \tag{5}$$
Here, $A^{(m)}_k$ denotes the number of infected neighbors of node m at time k. That is,
$$A^{(m)}_k = \sum_{n \in N^{(m)}} I\{x^{(n)}_k = 1\}. \tag{6}$$
In words, the transition probability of an agent depends on its degree and the number of infected (active) neighbors.
With the above probabilistic model, we are interested in modelling the evolution of infected agents over time. Let $\rho_k(d)$ denote the fraction of infected nodes at each time k with degree d. We call ρ the infected node distribution. So with $d = 0, 1, \ldots, \bar{D}$,
$$\rho_k(d) = \frac{1}{N(d)} \sum_{m \in V} I\{D^{(m)} = d,\ x^{(m)}_k = 1\}. \tag{7}$$
The SIS model assumes that the infection spreads according to the following dynamics:
1) At each time instant k, a single agent, denoted by m, amongst the N agents is chosen uniformly. Therefore, the probability that the chosen agent m is infected and of degree d is $\rho_k(d) P(d)$. The probability that the chosen agent m is susceptible and of degree d is $(1 - \rho_k(d)) P(d)$.
2) Depending on whether its state $x^{(m)}_k$ is infected or susceptible, the state of agent m evolves according to the transition probabilities specified in (5).
With the Markov chain transition dynamics of individual agents specified above, it is clear that the infected distribution $\rho_k = \big(\rho_k(1), \ldots, \rho_k(\bar{D})\big)$ is a $\prod_{d=1}^{\bar{D}} N(d)$ state Markov chain. Indeed, given $\rho_k(d)$, due to the infection dynamics specified above,
$$\rho_{k+1}(d) \in \Big\{\rho_k(d) - \frac{1}{N(d)},\ \rho_k(d),\ \rho_k(d) + \frac{1}{N(d)}\Big\}. \tag{8}$$
Our aim below is to specify the transition probabilities of the Markov chain ρ. Let us start with the following statistic that forms a convenient parametrization of the transition probabilities. Given the infected node distribution $\rho_k$ at time k, define $\theta(\rho_k)$ as the probability that at time k a uniformly sampled link in the network points to an infected node. We call $\theta(\rho_k)$ the infected link probability. Clearly,
$$\theta(\rho_k) = \frac{\sum_{d=1}^{\bar{D}} d\, P(d)\, \rho_k(d)}{\sum_{d=1}^{\bar{D}} d\, P(d)}. \tag{9}$$
In terms of the infected link probabilities, the scaled transition probabilities² of the process ρ are:
$$\bar{p}_{01}(d, \theta_k, s) = (1 - \rho_k(d)) \sum_{a=0}^{d} p_{01}(d, a, s)\, P(a \text{ out of } d \text{ neighbours infected}) = (1 - \rho_k(d)) \sum_{a=0}^{d} p_{01}(d, a, s) \binom{d}{a} \theta_k^{a} (1 - \theta_k)^{d-a},$$
$$\bar{p}_{10}(d, \theta_k, s) = \rho_k(d) \sum_{a=0}^{d} p_{10}(d, a, s)\, P(a \text{ out of } d \text{ neighbours infected}) = \rho_k(d) \sum_{a=0}^{d} p_{10}(d, a, s) \binom{d}{a} \theta_k^{a} (1 - \theta_k)^{d-a}. \tag{10}$$
In the above, the notation $\theta_k$ is the short form for $\theta(\rho_k)$.

² The transition probabilities are scaled by the degree distribution P(d) for notational convenience. Indeed, since N(d) = N P(d), by using these scaled probabilities we can express the dynamics of the process ρ in terms of the same step size 1/N as described in Theorem 2.1. Throughout this paper, we assume that the degree distribution P(d), $d \in \{1, 2, \ldots, \bar{D}\}$, is uniformly bounded away from zero. That is, $\min_d P(d) > \epsilon$ for some positive constant ε.

The transition probabilities $\bar{p}_{01}$ and $\bar{p}_{10}$ defined above model the diffusion of information about the target state s over the social network. We have the following martingale representation theorem for the evolution of the Markov process ρ.
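Theorem 2.1: For $d = 1, 2, \ldots, \bar{D}$, the infected distributions evolve as
$$\rho_{k+1}(d) = \rho_k(d) + \frac{1}{N}\Big[\bar{p}_{01}(d, \theta_k, s_k) - \bar{p}_{10}(d, \theta_k, s_k) + w_{k+1}\Big], \tag{11}$$
where w is a martingale increment process, that is, $E\{w_{k+1} \mid \mathcal{F}_k\} = 0$. Recall that s is the finite state Markov chain that models the target process.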
The above theorem is a well-known martingale representation of a Markov chain [51]: it says that a discrete time Markov process can be obtained by discrete time filtering of a martingale increment process. The theorem implies that the infected distribution dynamics resemble what is commonly called a stochastic approximation (adaptive filtering) algorithm in statistical signal processing: the new estimate is the old estimate plus a noisy update (the "noise" being a martingale increment) that is weighted by a small step size 1/N when N is large. Subsequently, we will exploit the structure in Theorem 2.1 to devise a mean field dynamics model which has a state of dimension $\bar{D}$. This is to be compared with the intractable state dimension $\prod_{d=1}^{\bar{D}} N(d)$ of the Markov chain ρ.
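The SIS dynamics described in this subsection can be sketched in a few lines of Python. The sketch below is a minimal illustration under stated assumptions rather than the simulation used in the paper: the random graph generator, the particular transition probabilities `p01` and `p10`, the seeds, and all function names are hypothetical, and the target state s is held fixed. At each step one agent is chosen uniformly and its state is updated using its degree d and its number of infected neighbours a; the infected link probability is tracked as the fraction of link endpoints attached to infected nodes.

```python
import random


def random_graph(N, num_edges, seed=0):
    """Hypothetical helper: undirected graph with num_edges distinct random links."""
    rng = random.Random(seed)
    neighbours = [set() for _ in range(N)]
    added = 0
    while added < num_edges:
        m, n = rng.randrange(N), rng.randrange(N)
        if m != n and n not in neighbours[m]:
            neighbours[m].add(n)
            neighbours[n].add(m)
            added += 1
    return neighbours


def simulate_sis(neighbours, p01, p10, steps, init_frac=0.05, seed=0):
    """Single-agent-per-step SIS updates; returns final states and theta path."""
    rng = random.Random(seed)
    N = len(neighbours)
    x = [1 if rng.random() < init_frac else 0 for _ in range(N)]
    total_links = sum(len(nb) for nb in neighbours)
    theta_path = []
    for _ in range(steps):
        m = rng.randrange(N)                   # one agent chosen uniformly
        d = len(neighbours[m])                 # its degree
        a = sum(x[n] for n in neighbours[m])   # its infected neighbours
        if x[m] == 0 and rng.random() < p01(d, a):
            x[m] = 1                           # susceptible -> infected
        elif x[m] == 1 and rng.random() < p10(d, a):
            x[m] = 0                           # infected -> susceptible
        infected_links = sum(len(neighbours[n]) for n in range(N) if x[n])
        theta_path.append(infected_links / total_links if total_links else 0.0)
    return x, theta_path


def infected_distribution(neighbours, x):
    """Fraction of infected nodes for each observed degree d."""
    total, infected = {}, {}
    for m, nb in enumerate(neighbours):
        d = len(nb)
        total[d] = total.get(d, 0) + 1
        infected[d] = infected.get(d, 0) + x[m]
    return {d: infected[d] / total[d] for d in total}


# Assumed transition probabilities, for illustration only.
p01 = lambda d, a: min(a / 3.0, 1.0)
p10 = lambda d, a: 0.3

nb = random_graph(100, 300, seed=1)
x, theta_path = simulate_sis(nb, p01, p10, steps=500)
rho = infected_distribution(nb, x)
```

Recomputing the infected link count at every step keeps the code short; for large N it would be cheaper to update the count incrementally whenever a node changes state.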

C. Mean Field Dynamics of Information Diffusion
The mean field dynamics state that as the number of agents N grows to infinity, the dynamics of the infected distribution ρ in the social network, described by (11), evolve according to the following deterministic difference equation that is modulated by a Markov chain that depends on the target state evolution s:
$$\bar{\rho}_{k+1}(d) = \bar{\rho}_k(d) + \frac{1}{N}\Big[\bar{p}_{01}\big(d, \theta(\bar{\rho}_k), s_k\big) - \bar{p}_{10}\big(d, \theta(\bar{\rho}_k), s_k\big)\Big], \quad d = 1, 2, \ldots, \bar{D}, \tag{13}$$
where the scaled transition probabilities $\bar{p}_{01}$ and $\bar{p}_{10}$ are evaluated at $\bar{\rho}_k$. That the above mean field dynamics follow from (11) is intuitive. Such averaging results are well known in the adaptive filtering community, where they are deployed to analyze the convergence of adaptive filters. The difference here is that the limit mean field dynamics are not deterministic but Markov modulated. Moreover, the mean field dynamics here constitute a model for information diffusion, rather than the asymptotic behavior of an adaptive filtering algorithm. As mentioned earlier, from an engineering point of view, the mean field dynamics yield a tractable model for estimation.
We then have the following exponential bound result for the error of the mean field dynamics approximation.
Theorem 2.2: For a discrete time horizon of T points, the deviation between the mean field dynamics $\bar{\rho}_k$ in (13) and the actual infected distribution $\rho_k$ in (11) satisfies
$$P\Big\{\max_{0 \le k \le T} \big\|\bar{\rho}_k - \rho_k\big\|_\infty \ge \epsilon\Big\} \le C_1 \exp\big(-C_2\, \epsilon^2 N\big),$$
where $C_1$ and $C_2$ are positive constants that do not depend on N.
The proof of the above theorem follows from [19, Lemma 1]. Actually, in [19] the mean field dynamics are presented in continuous time as a system of ordinary differential equations. The exponential bound follows from an application of the Azuma-Hoeffding inequality. The above theorem provides an exponential bound (in terms of the number of agents N) for the probability of deviation of the sample path of the infected distribution from the mean field dynamics for any finite time interval T.
The stochastic approximation and adaptive filtering literature [52], [53] has several averaging analysis methods for recursions of the form (11). The well studied mean square error analysis [52], [53] computes bounds on $E\|\bar{\rho}_k - \rho_k\|^2$ instead of the maximum deviation in Theorem 2.2. A mean square error analysis of estimating a Markov modulated empirical distribution is given in [54]. Such mean square analyses assume a finite but small step size 1/N in (11).
Numerical Example: We simulate the diffusion of information through a network comprising N = 100 nodes (with maximum degree $\bar{D} = 17$). It is assumed that at time k = 0, 5% of nodes are infected. The mean field dynamics model is investigated in terms of the infected link probability (9). The infected link probability $\theta(\bar{\rho}_k)$ is computed using (13).
Assume each agent is a myopic optimizer and, hence, chooses to adopt the technology only if $c^{(m)} \le A^{(m)}_k$, where $c^{(m)}$ is drawn from the uniform distribution $U[0, C(s_k)]$. Therefore, the transition probabilities in (5) are
$$p_{01}(d, a, s_k) = P\big(c^{(m)} \le a\big) = \min\Big\{\frac{a}{C(s_k)},\ 1\Big\}.$$
The probability that a product fails is $p_F = 0.3$, i.e.,
$$p_{10}(d, a, s_k) = p_F = 0.3.$$
The infected link probabilities obtained from network simulation (9) and from the discrete-time mean field dynamics model (13) are illustrated in Figure 2. To illustrate that the infected link probability computed from (13) follows the true one (obtained by network simulation), we assume that the value of C jumps from 1 to 10 at time k = 200, and from 10 to 1 at time k = 500. As can be seen in Figure 2, the mean field dynamics provide an excellent approximation to the true infected distribution.
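The mean field side of this numerical example can be sketched as follows. This is a minimal illustration under explicit assumptions, not a reproduction of Figure 2: the degree distribution is taken to be uniform on {1, ..., 17} (the example does not specify it here), the adoption probability is taken as min(a / C, 1) for a uniformly distributed threshold, the failure probability is 0.3, and the names `cost_bound`, `p01`, `theta` and `mean_field_step` are hypothetical.

```python
from math import comb

N = 100                      # number of agents (step size is 1/N)
D_MAX = 17                   # maximum degree
# Assumed degree distribution: uniform over d = 1..D_MAX (illustrative only).
P = {d: 1.0 / D_MAX for d in range(1, D_MAX + 1)}
P_FAIL = 0.3                 # probability of switching from infected to susceptible


def cost_bound(k):
    """C jumps from 1 to 10 at k = 200 and back to 1 at k = 500."""
    return 10.0 if 200 <= k < 500 else 1.0


def p01(a, C):
    """Adoption probability with a uniform threshold on [0, C] (assumed form)."""
    return min(a / C, 1.0)


def theta(rho):
    """Infected link probability: degree-weighted average of rho."""
    num = sum(d * P[d] * rho[d] for d in P)
    den = sum(d * P[d] for d in P)
    return num / den


def mean_field_step(rho, C):
    """One step of the deterministic mean field recursion for every degree d."""
    th = theta(rho)
    new_rho = {}
    for d in P:
        # Binomial probability that a out of d neighbours are infected.
        binom = [comb(d, a) * th**a * (1 - th)**(d - a) for a in range(d + 1)]
        up = (1 - rho[d]) * sum(p01(a, C) * binom[a] for a in range(d + 1))
        down = rho[d] * P_FAIL
        new_rho[d] = rho[d] + (up - down) / N
    return new_rho


rho = {d: 0.05 for d in P}   # 5% of nodes infected at k = 0
theta_path = [theta(rho)]
for k in range(700):
    rho = mean_field_step(rho, cost_bound(k))
    theta_path.append(theta(rho))
```

Plotting `theta_path` against k shows how the infected link probability responds to the two jumps in C; comparing it with a network simulation would require the actual degree distribution of the simulated graph.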

D. Example: Social Sensing of Influenza using Twitter
In this section, we utilize datasets from three different social networks, namely, (i) the Harvard College social network, (ii) influenza datasets from the U.S. Centers for Disease Control and Prevention (CDC) and (iii) Twitter, to show how Twitter can be used as a real time social sensor for detecting outbreaks of influenza.

1) Twitter as a social sensor:
A key advantage of using social media for rapid sensing of disease outbreaks in health networks is that it is low cost and provides rapid results compared with traditional techniques. For example, the CDC must contact thousands of hospitals to query the data, which causes a reporting lag of approximately one to two weeks [55]. Using real time microblogging platforms such as Twitter for disease detection has several advantages: the tweets are publicly available and posted at high frequency, users often provide meta-data (e.g., city, gender, age), and Twitter contains a diverse set of users [55].
Several papers have considered using Twitter data for estimating influenza infection rates. In [56], [57], support vector regression supervised learning algorithms are used to relate the volume of Twitter posts that contain specific words (e.g., flu, swine, influenza) to the number of confirmed influenza cases in the U.S. as reported by the CDC. Multiple linear regression [58], [59] and unsupervised Bayesian algorithms [60] have been used to relate the number of tweets of specific words to the influenza rate reported by the CDC. The detection algorithms [56], [58], [60] do not consider the dynamics of the disease propagation and the dynamics of information diffusion in the Twitter network. To reduce the effect of information diffusion in the network, [61] proposes a support vector machine (SVM) classifier to detect: a) if the tweet indicates the user's awareness of influenza or indicates the user is infected, and b) if the influenza reference is in reference to another person. The classified tweets are then used to train a multiple linear regression model. To account for the diffusion dynamics of Twitter, [62], [63] utilize an Autoregressive with Exogenous input (ARX) model. The exogenous input is the number of unique Twitter users with influenza related tweets, and the output is the number of infected users as reported by the CDC.
If the social network is known, then the influenza spread can be formulated in terms of the diffusion model (11). Given the U.S. population of several hundred million, it is reasonable to adopt the mean field dynamics (13). With the influenza infection rate modelled using (13), the results can be used as an exogenous input to an ARX or Nonlinear ARX (NARX) model to predict the volume of Twitter messages related to influenza, as illustrated in Fig. 1. In this framework, the Twitter messages are used to validate the underlying propagation model of influenza, which is of use for predicting the infection rate and for outbreak detection.
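As a rough illustration of the ARX idea, the sketch below fits a first-order model y[k] = a·y[k-1] + b·u[k-1] + e[k] by ordinary least squares, where u plays the role of the exogenous input (for instance a modelled infection rate) and y the observed output (for instance a tweet volume). The model order, the function name `fit_arx11` and the synthetic data are assumptions chosen for brevity; the models used in the cited works and later in this paper may have different orders and structure.

```python
def fit_arx11(y, u):
    """Least-squares estimate of (a, b) in y[k] = a*y[k-1] + b*u[k-1] + e[k].

    Hypothetical helper: solves the 2x2 normal equations in closed form.
    """
    syy = syu = suu = sy = su = 0.0
    for k in range(1, len(y)):
        yp, up, yk = y[k - 1], u[k - 1], y[k]
        syy += yp * yp
        syu += yp * up
        suu += up * up
        sy += yp * yk
        su += up * yk
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; input is not informative enough")
    a = (sy * suu - su * syu) / det
    b = (su * syy - sy * syu) / det
    return a, b


# Synthetic, noise-free data (assumed) generated with a = 0.6 and b = 1.5.
u = [(7 * k) % 5 for k in range(40)]
y = [0.0]
for k in range(1, 40):
    y.append(0.6 * y[k - 1] + 1.5 * u[k - 1])

a_hat, b_hat = fit_arx11(y, u)
```

With noise-free data the estimates recover the generating coefficients up to rounding error; with real data the residuals would indicate whether a higher order or a nonlinear (NARX) structure is needed.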
2) Social Network Influenza Dataset: We consider the dataset [64] obtained from a social network of 744 undergraduate students from Harvard College. The health of the 744 students was monitored from September 1, 2009 to December 31, 2009 and was reported by the university Health Services. To construct the social network, students were presented with a background questionnaire. In the questionnaire students are asked: "Please provide the contact information for 2-3 Harvard College students who you know and who you think would like to participate in this study", and "...provide us with the names and contact information of 2-3 of your friends...". This information was used to construct the degree distribution and links of the social network. A movie showing the spread of influenza among the 744 college students over the 122 day sampling period is available as the YouTube video titled "Social Network Sensors for Early Detection of Contagious Outbreaks" at http://www.youtube.com/watch?v=0TD06g2m8qM. Fig. 3(a-c) displays three illustrative snapshots from this video; red nodes denote infected students while yellow nodes depict their neighbors in the social network.
3) Models for Influenza Diffusion: From the data in the YouTube video for the Harvard students, we observed the following regarding the transition probabilities $p_{ij}(d, A, s)$ defined in (5). As expected, students with a larger number of infected neighbors $A$ contract influenza sooner. The data show that the transition probabilities were approximately independent of the degree $d$ of the node. Since the data were collected during an actual influenza outbreak, we set the target state of the network (i.e., $s$) constant. Therefore the transition probabilities depend only on the number of infected neighbors and were estimated from the dataset. The estimates reveal that the probability of getting infected given $a = 2$ infected neighbors is substantially higher than with $a = 1$ infected neighbor, as expected. The estimated infected link probability $\theta_k$ in (9) versus time $k$ (in days) is displayed in Fig. 3(d). Recall from Sec. II-C that the infected link probability $\theta_k$ is related to the mean field dynamics equation (13). This allows the transition probabilities and $\theta_k$ to be used to predict the infection rate dynamics.
Other graph-theoretic measures also play a role in the analysis of the diffusion. Students with high k-coreness$^{3}$ are expected to contract influenza earlier. Additionally, students who have high betweenness centrality (i.e., the number of shortest paths from all students to all others that pass through that student) contract influenza earlier than students with low betweenness centrality. These observations show that the diffusion of influenza in the network depends strongly on the underlying health network structure. The dynamic model (7) accounts for the effects of the degree of nodes; however, accounting for the effects of betweenness centrality and k-coreness would require a more sophisticated formulation than that presented in Sec. II-A.
4) Time Series Model for Influenza Tweets: In Sec. II-D3 we illustrated how the mean field dynamics model (13) can be used to estimate the influenza infection rate, with the model parameters estimated from a sampled subset of the entire population. Validating the estimated parameters for the entire network requires that the infection rate be related to an observable response, in this case the number of Twitter mentions of a specific keyword. Two time series models are considered for relating the infection rate to the number of Twitter mentions. The models are validated using two real-world datasets: Twitter mentions and the number of influenza cases in the U.S.
The number of influenza cases in the U.S. is obtained from the CDC$^{4}$, which publishes weekly reports from the U.S. outpatient Influenza-like Illness Surveillance Network (ILInet). The data reported by the CDC comprise reports from over 3000 health providers nationwide and were obtained for the dates from September 1, 2012 to October 1, 2013. The associated Twitter data for the 122 day period was obtained using the software PeopleBrowsr$^{5}$. The pre-specified Twitter search terms used are: flu, swine and influenza. Since our focus is on monitoring influenza dynamics in the U.S., we excluded all tweets tagged as originating from outside the U.S. The total number of mentions of a specific keyword on each day is obtained using PeopleBrowsr.
We used two time series models for the volume of tweets and compared their performance. The first time series model considered is the ARX model defined by:
$$\tau_k + a_1 \tau_{k-1} + \cdots + a_{n_a} \tau_{k-n_a} = b_1 \rho_{k-\Delta} + \cdots + b_{n_b} \rho_{k-\Delta-n_b+1} + d + v_k. \tag{16}$$
In (16), $\tau_k$ is the number of influenza related tweets at time $k$, $\rho_k$ is the exogenous input given by the number of infected influenza patients, and $n_a$, $n_b$, $a_i$, $b_i$, $\Delta$ and $d$ are model parameters, with $v_k$ an i.i.d. noise process. The parameter $\Delta$ models the delay between a patient contracting influenza and that individual tweeting about their symptoms, and $d$ models the mean number of tweets related to influenza that are not related to an actual infection.
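As a rough illustration of how the exogenous part of such an ARX model can be fitted, the following sketch uses ordinary least squares for the special case $n_a = 0$, $n_b = 2$, i.e., the regression $\tau_k = b_1 \rho_{k-\Delta} + b_2 \rho_{k-\Delta-1} + d + v_k$. The data are synthetic and made up for illustration; the function name `fit_arx_exogenous`, the Gaussian-shaped infection curve and all numerical values are assumptions, not quantities from the datasets discussed here.

```python
import numpy as np

def fit_arx_exogenous(tau, rho, delay, n_b=2):
    """Least-squares fit of tau_k = sum_i b_i * rho_{k-delay-i+1} + d (case n_a = 0)."""
    start = delay + n_b - 1                      # first k with all lagged inputs available
    rows = []
    for k in range(start, len(tau)):
        lagged = [rho[k - delay - i] for i in range(n_b)]   # rho_{k-delay}, rho_{k-delay-1}, ...
        rows.append(lagged + [1.0])              # trailing 1 multiplies the offset d
    Phi = np.array(rows)
    theta, *_ = np.linalg.lstsq(Phi, tau[start:], rcond=None)
    return theta[:n_b], theta[n_b]               # (b_1, ..., b_{n_b}), d

# Synthetic (hypothetical) data: a smooth infection curve and delayed tweet counts.
rng = np.random.default_rng(0)
k = np.arange(200)
rho = np.exp(-0.5 * ((k - 80) / 20.0) ** 2)      # made-up infection rate
true_b, true_d, delay = np.array([30.0, 15.0]), 5.0, 18
tau = np.full(len(k), true_d)
for j in range(delay + 1, len(k)):
    tau[j] += true_b[0] * rho[j - delay] + true_b[1] * rho[j - delay - 1]
tau += 0.01 * rng.standard_normal(len(k))        # small i.i.d. noise v_k

b_hat, d_hat = fit_arx_exogenous(tau, rho, delay)
print(b_hat, d_hat)
```

In this sketch the delay is treated as known; in practice it could be selected by repeating the fit over a range of candidate delays and comparing the residual error.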
The second time series model we used is the nonlinear autoregressive exogenous (NARX) model given by:
$$\tau_k = F(\tau_{k-1}, \ldots, \tau_{k-n_a}, \rho_{k-\Delta}, \ldots, \rho_{k-\Delta-n_b+1}) + d + v_k. \tag{17}$$
In (17), $F$ denotes a nonlinear function which relates the exogenous input and previous tweets to the current number of tweets. Here we consider $F$ to be a support vector machine which can be trained using historical data. Note that if $F$ were independent of previous tweets and previous exogenous inputs, with no delay (i.e., $n_b = 0$ and $\Delta = 0$), then (17) would be identical to the SVM classifier used in [56], [57] to relate the number of tweets to the number of infected agents.
The number of reported influenza cases, the associated Twitter data, and the results of the model training and prediction are displayed in Fig. 4 for the ARX (16) and NARX (17) models. As seen from Fig. 4(a), the dominant word indicating a possible influenza outbreak is flu, as compared with swine and influenza. Notice that there is a lag between the maximum number of confirmed influenza cases and the number of tweets; however, there is an increase in the number of tweets prior to the peak of infected patients. These dynamics result from a combination of the infection propagation dynamics and the diffusion of information on Twitter. To account for these dynamics, the ARX and NARX models presented in Sec. II-D4 are utilized. The training and prediction accuracy of these models for $n_a = 0$, $n_b = 2$ (i.e., model inputs $\rho_{k-\Delta}$ and $\rho_{k-\Delta-1}$) is displayed in Fig. 4(b). As seen, the NARX model (17) provides a superior estimate compared with the ARX model (16). Interestingly, there is a $\Delta = 18$ day delay between the maximum number of infected patients and the maximum number of Twitter mentions containing the word flu. This is in contrast to the dynamics observed for the 2009 [56] and 2010-2011 [63] influenza outbreaks, which show that the increase in Twitter mentions occurs earlier than or at the same time as the increase in the number of infected patients. This also emphasizes the importance of using the mean field dynamics model for influenza propagation as compared with only using Twitter data for predicting the influenza infection rate. Here we have used the CDC data to estimate the number of infected agents; however, the mean field dynamics model (13) could be used to estimate the dynamics of disease propagation and relate this to the observable number of tweets in real time.
To summarize, the above datasets illustrate how Twitter can be used as a sensor for monitoring the spread of influenza in a health network. The propagation of influenza was modeled according to the SIS model and the dynamics of tweets according to an autoregressive model.

E. Sentiment-Based Sensing Mechanism
In the above dataset, samples of influenza affected individuals were obtained from a Harvard College social network. More generally, it is often necessary to sample individuals in a social network to estimate an underlying state of nature such as the sentiment. An important question regarding sensing in a social network is: How can one construct a small but representative sample of a social network with a large number of nodes? In [65] several scale-down and back-in-time sampling procedures are studied. Below we review three sampling schemes. The simplest possible sampling scheme is uniform sampling. We also briefly describe social sampling and respondent-driven sampling, which are recent methods that have become increasingly popular.
At each time k, the empirical sentiment distribution z k can be viewed as noisy observations of the infected distribution ρ k and target state process s k .
2) Social Sampling: Social sampling is an extensive area of research; see [66] for recent results. In social sampling, participants in a poll respond with a summary of their friends' responses. This leads to a reduction in the number of samples required. If the average degree of nodes in the network is $d$, then the savings in the number of samples is by a factor of $d$, since a randomly chosen node summarizes the results from $d$ of its friends. However, the variance and bias of the estimate depend strongly on the social network structure$^{7}$. In [66], a social sampling method is introduced and analyzed in which nodes of degree $d$ are sampled with probability proportional to $1/d$. This is intuitive since weighting neighbors' values by the reciprocal of the degree undoes the bias introduced by large degree nodes. The paper [66] then illustrates this social sampling method and its variants on the LIVEJOURNAL network (livejournal.com), comprising more than 5 million nodes and 160 million directed edges.
3) MCMC Based Respondent-Driven Sampling (RDS): Respondent-driven sampling (RDS) was introduced by Heckathorn [67], [68], [69] as an approach for sampling from hidden populations in social networks and has gained enormous popularity in recent years. There are more than 120 RDS studies worldwide involving sex workers and injection drug users [70]. As mentioned in [71], the U.S. Centers for Disease Control and Prevention (CDC) recently selected RDS for a 25-city study of injection drug users that is part of the National HIV Behavioral Surveillance System [72].
RDS is a variant of the well known method of snowball sampling where current sample members recruit future sample members. The RDS procedure is as follows: A small number of people in the target population serve as seeds. After participating in the study, the seeds recruit other people they know through the social network in the target population. The sampling continues according to this procedure, with current sample members recruiting the next wave of sample members until the desired sample size is reached. Typically, monetary compensation is provided for participating in the data collection and recruitment.
RDS can be viewed as a form of Markov Chain Monte Carlo (MCMC) sampling (see [71] for an excellent exposition). Let $\{m_l, l = 1 : \alpha(d)\}$ be the realization of an aperiodic irreducible Markov chain with state space $N(d)$ comprising nodes of degree $d$. This Markov chain models the individuals of degree $d$ that are snowball sampled; namely, the first individual $m_1$ is sampled and then recruits the second individual $m_2$ to be sampled, who then recruits $m_3$, and so on. Instead of the independent sample estimator (18), an asymptotically unbiased MCMC estimate is then generated as in (19), where $\pi(m)$, $m \in N(d)$, denotes the stationary distribution of the Markov chain. For example, a reversible Markov chain with prescribed stationary distribution is straightforwardly generated by the Metropolis-Hastings algorithm.
In RDS, the transition matrix and, hence, the stationary distribution $\pi$ in the estimator (19) are specified as follows: Assume that edges between any two nodes $m$ and $n$ have symmetric weights $W(m, n)$ (i.e., $W(m, n) = W(n, m)$; equivalently, the network is undirected). In RDS, node $m$ recruits node $n$ with transition probability $W(m, n) / \sum_{n} W(m, n)$. Then, it can be easily seen that the stationary distribution is
$$\pi(m) = \frac{\sum_{n} W(m, n)}{\sum_{m} \sum_{n} W(m, n)}.$$
Using this stationary distribution, along with the above transition probabilities for sampling agents in (19), yields the RDS algorithm.
It is well known that a Markov chain over a nonbipartite connected undirected network $G$ is aperiodic. Then, the initial seed for the RDS algorithm can be picked arbitrarily, and the above estimator is an asymptotically unbiased estimator.
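The stationary distribution of the RDS recruitment chain can be checked numerically. The sketch below builds the recruitment probabilities $W(m, n) / \sum_n W(m, n)$ for a small network and verifies that the normalized weighted degree, $\sum_n W(m, n)$ divided by the total weight, is invariant under the chain and is also the limit reached from an arbitrary seed. The 4-node weight matrix is hypothetical and chosen only so that the network is connected and nonbipartite.

```python
import numpy as np

# Symmetric edge weights W(m, n) of a small connected, nonbipartite undirected
# network (hypothetical values; nodes 0, 1, 2 form a triangle).
W = np.array([[0, 2, 1, 0],
              [2, 0, 3, 1],
              [1, 3, 0, 0],
              [0, 1, 0, 0]], dtype=float)

P = W / W.sum(axis=1, keepdims=True)     # recruitment probabilities W(m,n) / sum_n W(m,n)
pi = W.sum(axis=1) / W.sum()             # candidate stationary distribution

invariant = np.allclose(pi @ P, pi)      # pi P = pi
limit = np.linalg.matrix_power(P, 200)   # each row: distribution after 200 recruitments
seed_independent = np.allclose(limit, np.tile(pi, (4, 1)))

print(pi, invariant, seed_independent)
```

Every row of the 200-step matrix agrees with the stationary distribution, which illustrates why the choice of the initial seed does not matter asymptotically.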
Note the difference between RDS and social sampling: RDS uses the network to recruit the next respondent, whereas social sampling seeks to reduce the number of samples by using people's knowledge of their friends' (neighbors') opinions.
Finally, the reader may be familiar with the DARPA network challenge in 2009 where the locations of 10 red balloons in the continental US were to be determined using social networking. In this case, the winning MIT Red Balloon Challenge Team used a recruitment based sampling method. The strategy can also be viewed as a variant of the Query Incentive Network model of [73].

F. Summary and Extensions
This section has discussed the diffusion of information in social networks. Mean field dynamics were used to approximate the asymptotic infected degree distribution. An illustrative example of the spread of influenza was provided. Finally, methods for sampling the population in a social network were reviewed. Below we discuss some related concepts and extensions.
Bayesian Filtering Problem: Given the sentiment observations described above, how can the infected degree distribution $\rho_k$ and target state $s_k$ be estimated at each time instant? The partially observed state space model with dynamics (13) and discrete time observations from sampling the network can be used to obtain Bayesian filtering estimates of the underlying state of nature. Computing the conditional mean estimate of $s_k$, $\rho_k$ given the sentiment observation sequence is a Bayesian filtering problem. In fact, filtering of such jump Markov linear systems has been studied extensively in the signal processing literature [74], [75] and can be solved via sequential Markov chain Monte Carlo methods. For example, [6] reports on how a particle filter is used to localize earthquake events using Twitter as a social sensor.
Reactive Information Diffusion: A key difference between social sensors and conventional sensors in statistical signal processing is that social sensors are reactive: a social sensor uses additional information gained to modify its behavior. Consider the case where the sentiment-based observation process is made available in a public blog. Then, these observations will affect the transition dynamics of the agents and, therefore, the mean field dynamics.
How Does Connectivity Affect Mean Field Equilibrium?: The papers [46], [17] examine the structure of fixed points of the mean field differential equation (13) when the underlying target process $s$ is not present (equivalently, $s$ is a one state process). They consider the case where the agent transition probabilities are parametrized by $p_{01}(d, a) = \mu F(d, a)$ and $p_{10} = p_F$. Then, defining $\lambda = \mu / p_F$, they study how the following two thresholds behave with the degree distribution and diffusion mechanism:
1) Critical threshold $\lambda_c$: This is defined as the minimum value of $\lambda$ for which there exists a fixed point of (13) with a positive fraction of infected agents, i.e., $\rho_\infty(d) > 0$ for some $d$, and, for $\lambda \le \lambda_c$, such a fixed point does not exist.
2) Diffusion threshold $\lambda_d$: Suppose the initial condition $\rho_0$ for the infected distribution is infinitesimally small. Then, $\lambda_d$ is the minimum value of $\lambda$ for which $\rho_\infty(d) > 0$ for some $d$, and such that, for $\lambda \le \lambda_d$, $\rho_\infty(d) = 0$ for all $d$.
Determining how these thresholds vary with degree distribution and diffusion mechanism is very useful for understanding the long term behavior of agents in the social network.

III. BAYESIAN SOCIAL LEARNING MODELS FOR ONLINE REPUTATION SYSTEMS
In this section we address the second topic of the paper, namely, Bayesian social learning amongst social sensors. The motivation can be understood in terms of the following social sensing example. Consider the following interactions in a multi-agent social network where agents seek to estimate an underlying state of nature. Each agent visits a restaurant based on reviews on an online reputation website. The agent then obtains a private measurement of the state (e.g., the quality of food in a restaurant) in noise. After that, the agent reviews the restaurant on the same online reputation website. The information exchange in the social network is modelled by a directed graph. Data incest [76] arises due to loops in the information exchange graph. This is illustrated in the graph of Fig. 5. Agents 1 and 2 exchange beliefs (or actions) as depicted in Fig. 5. The fact that there are two distinct paths between Agent 1 at time 1 and Agent 1 at time 3 (these paths are denoted in red) implies that the information of Agent 1 at time 1 is double counted, leading to a data incest event. How can data incest be removed so that agents obtain a fair (unbiased) estimate of the underlying state? The methodology of this section can be interpreted in terms of the recent Time article [77] which provides interesting rules for online reputation systems. These include: (i) review the reviewers, and (ii) censor fake (malicious) reviewers. The data incest removal algorithm proposed in this paper can be viewed as "reviewing the reviews" of other agents to see if they are associated with data incest or not.
The rest of this section is organized as follows:
1) Sec. III-A to III-C describe the social learning model that is used to mimic the behavior of agents in online reputation systems. The information exchange between agents in the social network is formulated on a family of time dependent directed acyclic graphs.
2) Sec. III-D presents an incest removal algorithm so that the online reputation system achieves a fair rating. A necessary and sufficient condition is given on the graph structure of information exchange between agents so that a fair rating is achievable.
3) Sec. III-E discusses conditions under which treating individual social sensors as Bayesian optimizers is a useful idealization of their behavior. In particular, it is shown that the ordinal behavior of humans can be mimicked by Bayesian optimizers under reasonable conditions.
4) Sec. III-F presents a dataset obtained from a psychology experiment to illustrate social learning and data incest patterns.
Related work: Collaborative recommendation systems are reviewed and studied in [78], [79]. The books [16], [39] study information cascades in social learning. In [80], a model of Bayesian social learning is considered in which agents receive private information about the state of nature and observe actions of their neighbors in a tree-based network. Another type of mis-information caused by influential agents (agents who heavily affect actions of other agents in social networks) is investigated in [22]. Mis-information in the context of this paper is motivated by sensor networks where the term "data incest" is used [76]. Data incest also arises in Belief Propagation (BP) algorithms [81], [82] which are used in computer vision and error-correcting coding theory. BP algorithms require passing local messages over the graph (Bayesian network) at each iteration. For graphical models with loops, BP algorithms are only approximate due to the over-counting of local messages [83], which is similar to data incest in social learning. With the algorithms presented in this section, data incest can be mitigated from Bayesian social learning over non-tree graphs that satisfy a topological constraint. The closest work to the current paper is [76]. However, in [76], data incest is considered in a network where agents exchange their private belief states; that is, no social learning is considered. Simpler versions of this information exchange process and estimation were investigated in [84], [85], [86]. We also refer the reader to [40] for a discussion of recommender systems.

A. Classical Social Learning
We briefly review the classical social learning model for the interaction of individuals. Subsequently, we will deal with more general models over a social network.
Consider a multi-agent system that aims to estimate the state of an underlying finite state random variable $x \in X = \{1, 2, \ldots, X\}$ with known prior distribution $\pi_0$. Each agent acts once in a predetermined sequential order indexed by $k = 1, 2, \ldots$ Assume that at the beginning of iteration $k$, all agents have access to the public belief $\pi_{k-1}$ defined in Step (iv) below. The social learning protocol proceeds as follows [21], [16]:
(i) Private Observation: At time $k$, agent $k$ records a private observation $y_k \in Y$ from the observation distribution $B_{iy} = P(y | x = i)$, $i \in X$. Throughout this paper we assume that $Y = \{1, 2, \ldots, Y\}$ is finite.
(ii) Private Belief: Using the public belief $\pi_{k-1}$ available at time $k - 1$ (Step (iv) below), agent $k$ updates its private posterior belief $\eta_k(i) = P(x_k = i | a_1, \ldots, a_{k-1}, y_k)$ using Bayes' formula:
$$\eta_k = \frac{B_{y_k} \pi_{k-1}}{1_X' B_{y_k} \pi_{k-1}}, \quad B_{y_k} = \mathrm{diag}(P(y_k | x = i), i \in X). \tag{20}$$
Here $1_X$ denotes the $X$-dimensional vector of ones, and $\eta_k$ is an $X$-dimensional probability mass function (pmf).
(iii) Myopic Action: Agent $k$ takes action $a_k \in A = \{1, 2, \ldots, A\}$ to minimize its expected cost
$$a_k = \arg\min_{a \in A} \{c_a' \eta_k\}. \tag{21}$$
Here $c_a = (c(i, a), i \in X)$ denotes an $X$-dimensional cost vector, and $c(i, a)$ denotes the cost incurred when the underlying state is $i$ and the agent chooses action $a$. Agent $k$ then broadcasts its action $a_k$.
(iv) Social Learning Filter: Given the action $a_k$ of agent $k$ and the public belief $\pi_{k-1}$, each subsequent agent $\bar{k} > k$ performs social learning to update the public belief $\pi_k$ according to the "social learning filter":
$$\pi_k = T(\pi_{k-1}, a_k), \quad \text{where } T(\pi, a) = \frac{R^\pi_a P' \pi}{\sigma(\pi, a)}, \tag{22}$$
where $\sigma(\pi, a) = 1_X' R^\pi_a P' \pi$ is the normalization factor of the Bayesian update. (Here $P$ denotes the transition matrix of the state; for the static random variable $x$ considered in this subsection, $P$ is the identity matrix, so $P' \pi = \pi$.) In (22), the public belief $\pi_k(i) = P(x_k = i | a_1, \ldots, a_k)$ and $R^\pi_a = \mathrm{diag}(P(a | x = i, \pi), i \in X)$ has elements
$$P(a_k = a | x_k = i, \pi_{k-1} = \pi) = \sum_{y \in Y} P(a | y, \pi) P(y | x_k = i), \quad P(a_k = a | y, \pi) = \begin{cases} 1 & \text{if } c_a' B_y P' \pi \le c_{\tilde{a}}' B_y P' \pi \text{ for all } \tilde{a} \in A, \\ 0 & \text{otherwise.} \end{cases}$$
The following result, which is well known in the economics literature [21], [16], states that if agents follow the above social learning protocol, then after some finite time $\bar{k}$, an information cascade occurs.$^{8}$
Theorem 3.1 ([21]): The social learning protocol leads to an information cascade in finite time with probability 1. That is, after some finite time $\bar{k}$, social learning ceases and the public belief $\pi_{k+1} = \pi_k$, $k \ge \bar{k}$, and all agents choose the same action $a_{k+1} = a_k$, $k \ge \bar{k}$.
Instead of reproducing the proof, let us give some insight as to why Theorem 3.1 holds. It can be shown using martingale methods that at some finite time $k = k^*$, the agent's probability $P(a_k | y_k, \pi_{k-1})$ becomes independent of the private observation $y_k$. Then clearly, $P(a_k = a | x_k = i, \pi_{k-1}) = P(a_k = a | \pi)$. Substituting this into the social learning filter (22), we see that $\pi_k = \pi_{k-1}$. Thus after some finite time $k^*$, the social learning filter hits a fixed point and social learning stops. As a result, all subsequent agents $k > k^*$ completely disregard their private observations and take the same action $a_{k^*}$, thereby forming an information cascade (and therefore a herd).
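The mechanism behind the cascade can be seen in a small simulation of steps (i) to (iv). The following sketch assumes a two-state, two-observation, two-action problem with a static state (so no transition matrix appears); the matrices `B` and `c`, the prior and the observation sequence are hypothetical values chosen for illustration, and the function names are not from the literature.

```python
import numpy as np

B = np.array([[0.8, 0.2],      # B[i, y] = P(y | x = i), hypothetical values
              [0.2, 0.8]])
c = np.array([[0.0, 1.0],      # c[i, a] = cost of action a when the state is i
              [1.0, 0.0]])

def myopic_action(pi, y):
    """Steps (ii)-(iii): Bayes update with observation y, then cost-minimizing action."""
    eta = B[:, y] * pi
    eta /= eta.sum()
    return int(np.argmin(c.T @ eta))

def social_learning_filter(pi, a):
    """Step (iv): update the public belief given the broadcast action a."""
    # R[i] = P(a | x = i, pi) = sum over y of 1{action(pi, y) = a} * B[i, y]
    R = sum(B[:, y] for y in range(B.shape[1]) if myopic_action(pi, y) == a)
    post = R * pi
    return post / post.sum()

def run(prior, observations):
    pi, history = np.asarray(prior, dtype=float), []
    for y in observations:
        a = myopic_action(pi, y)             # agent acts on its private belief
        pi = social_learning_filter(pi, a)   # everyone else only sees the action
        history.append((a, pi.copy()))
    return history

history = run([0.6, 0.4], [0, 1, 1, 1])
for a, pi in history:
    print(a, np.round(pi, 4))
```

In this run the first agent's action moves the public belief far enough that every later agent chooses the same action regardless of its own observation. The action likelihood then no longer depends on the state, and the public belief stays frozen even though the later private observations point the other way.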

B. Data Incest in Online Reputation Systems
In comparison to the previous subsection, we now consider social learning on a family of time dependent directed acyclic graphs. In such cases, apart from herding, the phenomenon of data incest arises.
Consider an online reputation system comprised of agents $\{1, 2, \ldots, S\}$ that aim to estimate an underlying state of nature (a random variable). Let $x \in X = \{1, 2, \ldots, X\}$ represent the state of nature (such as the quality of a restaurant/hotel) with known prior distribution $\pi_0$. Let $k = 1, 2, 3, \ldots$ depict epochs at which events occur. These events involve taking observations, evaluating beliefs and choosing actions as described below. The index $k$ marks the historical order of events. For simplicity, we refer to $k$ as "time".
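It is convenient also to reduce the coordinates of time $k$ and agent $s$ to a single integer index $n$:
$$n = s + S(k - 1), \quad s \in \{1, \ldots, S\}, \quad k = 1, 2, 3, \ldots$$
We refer to $n$ as a "node" of a time dependent information flow graph $G_n$ which we now define. Let
$$G_n = (V_n, E_n), \quad n = 1, 2, \ldots$$
denote a sequence of time-dependent directed acyclic graphs (DAGs)$^{9}$ of information flow in the social network until and including time $k$, where $n = s + S(k - 1)$. Each vertex in $V_n$ represents an agent $s'$ in the social network at time $k'$, and each edge $(n', n'')$ in $E_n \subseteq V_n \times V_n$ shows that the information (action) of node $n'$ (agent $s'$ at time $k'$) reaches node $n''$ (agent $s''$ at time $k''$). It is clear that $G_n$ is a sub-graph of $G_{n+1}$. The adjacency matrix $A_n$ of $G_n$ is an $n \times n$ matrix with elements $A_n(i, j)$ given by
$$A_n(i, j) = \begin{cases} 1 & \text{if } (v_i, v_j) \in E_n, \\ 0 & \text{otherwise,} \end{cases} \qquad A_n(i, i) = 0.$$
The transitive closure matrix $T_n$ is the $n \times n$ matrix
$$T_n = \mathrm{sgn}((I_n - A_n)^{-1}), \tag{26}$$
where for a matrix $M$, the matrix $\mathrm{sgn}(M)$ has elements
$$\mathrm{sgn}(M)(i, j) = \begin{cases} 0 & \text{if } M(i, j) = 0, \\ 1 & \text{if } M(i, j) \ne 0. \end{cases}$$
Note that $A_n(i, j) = 1$ if there is a single hop path from node $i$ to node $j$. In comparison, $T_n(i, j) = 1$ if there exists a path (possibly multi-hop) from $i$ to $j$.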
The information reaching node $n$ depends on the information flow graph $G_n$. The following two sets will be used to specify the incest removal algorithms below:
$$H_n = \{m < n : A_n(m, n) = 1\}, \tag{27}$$
$$F_n = \{m < n : T_n(m, n) = 1\}. \tag{28}$$
Thus $H_n$ denotes the set of previous nodes $m$ that communicate with node $n$ in a single hop. In comparison, $F_n$ denotes the set of previous nodes $m$ whose information eventually arrives at node $n$. Thus $F_n$ contains all possible multi-hop connections by which information from a node $m$ eventually reaches node $n$.
Example: Consider $S = 2$ agents with the information flow graph for three time points $k = 1, 2, 3$ depicted in Fig. 6, characterized by the family of DAGs $\{G_1, \ldots, G_7\}$. The adjacency matrices $A_1, \ldots, A_7$ are constructed as follows: $A_n$ is the upper left $n \times n$ submatrix of $A_{n+1}$ and
$$A_7 = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
Let us explain these matrices. Since nodes 1 and 2 do not communicate, clearly $A_1$ and $A_2$ are zero matrices. Nodes 1 and 3 communicate, hence $A_3$ has a single one, etc. Note that if nodes 1, 3, 4 and 7 are assumed to be the same individual, then at node 7, the individual remembers what happened at node 5 and node 1, but not node 3. This models the case where the individual has selective memory and remembers certain highlights. From (27) and (28),
$$H_7 = \{1, 5, 6\}, \quad F_7 = \{1, 2, 3, 4, 5, 6\},$$
where $H_7$ denotes all one hop links to node 7 while $F_7$ denotes all multihop links to node 7.
$^{9}$ A DAG is a directed graph with no directed cycles.
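The sets of one-hop and multi-hop predecessors in this example can be computed directly from the adjacency matrix. In the sketch below, the transitive closure is obtained from integer powers of $I + A$, which for a DAG has the same nonzero pattern as the matrix inverse in (26), and the two sets are read off the column of the node of interest. The function names are illustrative, and the rule of reading $H_n$ and $F_n$ from the $n$th columns of $A_n$ and $T_n$ is stated here as an assumption about (27) and (28).

```python
import numpy as np

# Adjacency matrix A_7 of the example: entry (i, j) is 1 when node i+1 sends to node j+1.
A7 = np.array([[0, 0, 1, 1, 0, 0, 1],
               [0, 0, 0, 1, 0, 0, 0],
               [0, 0, 0, 0, 1, 0, 0],
               [0, 0, 0, 0, 0, 1, 0],
               [0, 0, 0, 0, 0, 0, 1],
               [0, 0, 0, 0, 0, 0, 1],
               [0, 0, 0, 0, 0, 0, 0]])

def transitive_closure(A):
    """T(i, j) = 1 iff a directed path (of length >= 0) leads from node i to node j."""
    n = A.shape[0]
    reach = np.linalg.matrix_power(np.eye(n, dtype=int) + A, n)
    return (reach > 0).astype(int)

def one_hop_and_multi_hop(A, node):
    """Return (H_node, F_node) using 1-based node labels."""
    T = transitive_closure(A)
    col = node - 1
    H = [m + 1 for m in range(col) if A[m, col] == 1]
    F = [m + 1 for m in range(col) if T[m, col] == 1]
    return H, F

def node_index(agent, time, num_agents):
    """Single integer index n = s + S (k - 1)."""
    return agent + num_agents * (time - 1)

H7, F7 = one_hop_and_multi_hop(A7, 7)
print(H7, F7, node_index(1, 4, 2))
```

Working with integer matrix powers rather than a floating-point inverse keeps the closure exact, which matters because only the zero pattern of the result is used.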

C. Data Incest Model and Social Influence Constraint
Each node $n$ receives recommendations from its immediate friends (one hop neighbors) according to the information flow graph defined above. That is, it receives actions $\{a_m, m \in H_n\}$ from nodes $m \in H_n$ and then seeks to compute the associated public beliefs $\pi_m$, $m \in H_n$. If node $n$ naively (incorrectly) assumes that the public beliefs $\pi_m$, $m \in H_n$ are independent, then it would fuse these as
$$\pi_{n-} = \frac{\prod_{m \in H_n} \pi_m}{1_X' \prod_{m \in H_n} \pi_m}, \tag{29}$$
where the product is taken elementwise. This naive data fusion would result in data incest.

1) Aim: The aim is to provide each node $n$ the true posterior distribution
$$\pi^0_{n-}(i) = P(x = i | \{a_m, m \in F_n\}) \tag{30}$$
subject to the following social influence constraint: There exists a fusion algorithm $\mathcal{A}$ such that
$$\pi^0_{n-} = \mathcal{A}(\pi_m, m \in H_n). \tag{31}$$
2) Discussion. Fair Rating and Social Influence: We briefly pause to discuss (30) and (31). (i) We call $\pi^0_{n-}$ in (30) the true or fair online rating available to node $n$ since $F_n$ defined in (28) denotes all information (multi-hop links) available to node $n$. By definition $\pi^0_{n-}$ is incest free and is the desired conditional probability that agent $n$ needs. Indeed, if node $n$ combines $\pi^0_{n-}$ together with its own private observation via social learning, then clearly
$$\eta_n(i) = P(x = i | \{a_m, m \in F_n\}, y_n), \quad \pi_n(i) = P(x = i | \{a_m, m \in F_n\}, a_n)$$
are, respectively, the correct (incest free) private belief for node $n$ and the correct after-action public belief. If agent $n$ does not use $\pi^0_{n-}$, then incest can propagate; for example, if agent $n$ naively uses (29).
Why should an individual $n$ agree to use $\pi^0_{n-}$ to combine with its private message? It is here that the social influence constraint (31) is important. $H_n$ can be viewed as the "social message", i.e., personal friends of node $n$ since they directly communicate to node $n$, while the associated beliefs can be viewed as the "informational message". As described in the remarkable recent paper [87], the social message from personal friends exerts a large social influence$^{10}$: it provides significant incentive (peer pressure) for individual $n$ to comply with the protocol of combining its estimate with $\pi^0_{n-}$ and thereby prevent incest. The paper [87] shows that receiving messages from known friends has significantly more influence on an individual than the information in the messages. This study includes a comparison of information messages and social messages on Facebook and their direct effect on voting behavior. To quote [87], "The effect of social transmission on real-world voting was greater than the direct effect of the messages themselves..." In Sec. III-F, we provide results of an experiment on human subjects that also illustrates social influence in social learning. [88] is an influential paper in the area of social influence.
$^{10}$ In a study conducted by social networking site myYearbook, 81% of respondents said they had received advice from friends and followers relating to a product purchase through a social site; 74 percent of those who received such advice found it to be influential in their decision. (Click Z, January 2010).

D. Incest Removal in Online Reputation System
It is convenient to work with the logarithm of the unnormalized belief$^{11}$; accordingly define
$$l_n(i) \propto \log \pi_n(i), \quad l_{n-}(i) \propto \log \pi_{n-}(i), \quad i \in X.$$
The following theorem shows that the logarithm of the fair rating $\pi^0_{n-}$ defined in (30) can be obtained as a weighted linear combination of the logarithms of previous public beliefs.
Theorem 3.2 (Fair Rating Algorithm): Suppose the network administrator runs the following algorithm for an online reputation system:
$$l_{n-}(i) = w_n' \, l_{1:n-1}(i), \quad \text{where } w_n = T_{n-1}^{-1} t_n \tag{32}$$
and $l_{1:n-1}(i) = [l_1(i), \ldots, l_{n-1}(i)]'$. Then $l_{n-}(i) \propto \log \pi^0_{n-}(i)$. That is, the fair rating $\log \pi^0_{n-}(i)$ defined in (30) is obtained. In (32), $w_n$ is an $(n-1)$-dimensional weight vector. Recall that $t_n$ denotes the first $n - 1$ elements of the $n$th column of the transitive closure matrix $T_n$. Theorem 3.2 says that the fair rating $\pi^0_{n-}$ can be expressed as a linear function of the action log-likelihoods in terms of the transitive closure matrix $T_n$ of graph $G_n$. This is intuitive since $\pi^0_{n-}$ can be viewed as the sum of information collected by the nodes such that there are paths between all these nodes and $n$.
Theorem 3.3 (Achievability of Fair Rating): Consider the fair rating algorithm specified by (32). With available information $(\pi_m, m \in H_n)$ to achieve the estimates $l_{n-}$ of algorithm (32), a necessary and sufficient condition on the information flow graph $G_n$ is
$$A_n(j, n) = 0 \implies w_n(j) = 0. \tag{33}$$
(Recall $w_n$ is specified in (32).) Note that the constraint (33) is purely in terms of the adjacency matrix $A_n$, since the transitive closure matrix (26) is a function of the adjacency matrix.
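To make the two theorems concrete, the sketch below computes the weight vector for node 7 of the example in Sec. III-B and checks the achievability condition. It assumes that the weights in (32) take the form $w_n = T_{n-1}^{-1} t_n$ and that the condition in (33) requires $w_n(j) = 0$ whenever node $j$ is not a one-hop neighbor of node $n$; the function names are illustrative.

```python
import numpy as np

A7 = np.array([[0, 0, 1, 1, 0, 0, 1],
               [0, 0, 0, 1, 0, 0, 0],
               [0, 0, 0, 0, 1, 0, 0],
               [0, 0, 0, 0, 0, 1, 0],
               [0, 0, 0, 0, 0, 0, 1],
               [0, 0, 0, 0, 0, 0, 1],
               [0, 0, 0, 0, 0, 0, 0]])

def transitive_closure(A):
    """T(i, j) = 1 iff a directed path (of length >= 0) leads from node i to node j."""
    n = A.shape[0]
    reach = np.linalg.matrix_power(np.eye(n, dtype=int) + A, n)
    return (reach > 0).astype(int)

def fair_rating_weights(A):
    """Weights w_n solving T_{n-1} w_n = t_n (assumed form of the fair rating weights)."""
    n = A.shape[0]
    T = transitive_closure(A)
    w = np.linalg.solve(T[:n - 1, :n - 1].astype(float), T[:n - 1, n - 1].astype(float))
    # T_{n-1} is unit upper triangular with integer entries, so w_n is integer valued.
    return np.rint(w).astype(int)

def fair_rating_achievable(A):
    """True when w_n(j) is nonzero only for one-hop neighbors j of the last node."""
    w = fair_rating_weights(A)
    return bool(np.all((A[:-1, -1] == 1) | (w == 0)))

w7 = fair_rating_weights(A7)
print(w7, fair_rating_achievable(A7))

# Same graph without the direct link from node 1 to node 7.
A_bad = A7.copy()
A_bad[0, 6] = 0
print(fair_rating_weights(A_bad), fair_rating_achievable(A_bad))
```

For node 7 the weights are $+1$ on nodes 5 and 6 and $-1$ on node 1: the information of node 1 reaches node 7 through both node 5 and node 6, so it is subtracted once to undo the double counting. When the direct link from node 1 to node 7 is removed, the weights are unchanged but node 7 no longer has access to the belief of node 1, and the check reports that the fair rating is not achievable.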

E. Ordinal Decisions and Bayesian Social Sensors
The social learning protocol assumes that each agent is a Bayesian utility optimizer. The following discussion puts together ideas from the economics literature to show that under reasonable conditions, such a Bayesian model is a useful idealization of agents' behaviors. This means that Bayesian social learning follows simple intuitive rules and is, therefore, a useful idealization. (In Sec. IV, we discuss the theory of revealed preferences which yields a non-parametric test on data to determine if an agent is a utility maximizer.)
Humans typically make monotone decisions: the more favorable the private observation, the higher the recommendation. Humans make ordinal decisions$^{12}$ since humans tend to think in symbolic ordinal terms. Under what conditions is the recommendation $a_n$ made by node $n$ monotone increasing in its observation $y_n$ and ordinal? Recall from the social learning protocol (21) that the actions of agents are
$$a_n = \arg\min_{a \in A} \{c_a' B_{y_n} \pi^0_{n-}\}.$$
So an equivalent question is: Under what conditions is the argmin increasing in the observation $y_n$? Note that an increasing argmin is an ordinal property; that is, $\arg\min_a c_a' B_{y_n} \pi^0_{n-}$ increasing in $y$ implies that $\arg\min_a \phi(c_a' B_{y_n} \pi^0_{n-})$ is also increasing in $y$ for any monotone function $\phi(\cdot)$.
The following result gives sufficient conditions for each agent to give a recommendation that is monotone and ordinal in its private observation:
Theorem 3.4: Suppose the observation probabilities and costs satisfy the following conditions:
(A1) $B_{iy}$ are TP2 (totally positive of order 2); that is, $B_{i+1,y} B_{i,y+1} \le B_{i,y} B_{i+1,y+1}$.
(A2) $c(x, a)$ is submodular; that is, $c(x, a+1) - c(x, a) \ge c(x+1, a+1) - c(x+1, a)$.
Then:
1) Under (A1) and (A2), the recommendation $a_n(\pi^0_{n-}, y_n)$ made by agent $n$ is increasing and hence ordinal in the observation $y_n$, for any $\pi^0_{n-}$.
2) Under (A2), $a_n(\pi^0_{n-}, y_n)$ is increasing in the belief $\pi^0_{n-}$ with respect to the monotone likelihood ratio (MLR) stochastic order$^{13}$ for any observation $y_n$.
The proof is in [89]. We can interpret the above theorem as follows. If agents make recommendations that are monotone and ordinal in the observations and monotone in the prior, then they mimic the Bayesian social learning model. Even if the agent does not exactly follow a Bayesian social learning model, its monotone ordinal behavior implies that such a Bayesian model is a useful idealization.
Condition (A1) is widely studied in monotone decision making; see the classical book by Karlin [90] and [91]; numerous examples of noise distributions are TP2. Indeed, in the highly cited paper [92] in the economics literature, observation $y + 1$ is said to be more "favorable news" than observation $y$ if Condition (A1) holds.
Condition (A2) is the well known submodularity condition [93], [94], [95]. (A2) makes sense in a reputation system for the costs to be well posed. Suppose the recommendations in the action set $A$ are arranged in increasing order, and the states in $X$ for the underlying state are also arranged in ascending order. Then (A2) says: if recommendation $a+1$ is more accurate than recommendation $a$ for state $x$, then recommendation $a+1$ is also more accurate than recommendation $a$ for state $x+1$ (which is a higher quality state than $x$).
In the experimental results reported in Sec. III-F, we found that (A1) and (A2) of Theorem 3.4 are justified.
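The monotone behavior described in this subsection can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function name `recommendation` and the 3-state matrices `B` and `c` are hypothetical values chosen so that `B` is TP2 and `c` is submodular (taken here to mean that $c(x, a+1) - c(x, a)$ is non-increasing in $x$). The agent picks the action minimizing $\sum_x c(x,a) B_{xy} \pi(x)$, which has the same minimizer as the expected cost under the normalized Bayesian posterior.

```python
def recommendation(prior, y, B, c):
    """Return argmin over a of sum_x c[x][a] * B[x][y] * prior[x].

    The unnormalized posterior B[x][y] * prior[x] is used; normalizing it
    by a positive constant does not change the minimizing action.
    """
    num_states = len(prior)
    num_actions = len(c[0])
    expected_cost = [
        sum(c[x][a] * B[x][y] * prior[x] for x in range(num_states))
        for a in range(num_actions)
    ]
    # Ties are broken in favor of the smallest action index.
    return min(range(num_actions), key=lambda a: expected_cost[a])


# Hypothetical example (values are NOT from the paper).
B = [[0.6, 0.3, 0.1],   # B[x][y]: probability of observation y in state x (TP2)
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]
c = [[0, 1, 2],         # c[x][a] = |x - a|: submodular cost
     [1, 0, 1],
     [2, 1, 0]]
prior = [1 / 3, 1 / 3, 1 / 3]

# Recommendation for each possible private observation y = 0, 1, 2.
actions = [recommendation(prior, y, B, c) for y in range(3)]
```

In this toy instance the list `actions` is non-decreasing in the observation index, which is the behavior that Theorem 3.4 predicts when both conditions hold.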

F. Psychology Experiment Dataset
To illustrate social learning, data incest and social influence, this section presents an actual psychology experiment that was conducted by our colleagues at the Department of Psychology of the University of British Columbia in September and October, 2013. The participants comprised 36 undergraduate students who participated in the experiment for course credit.
1) Experiment Setup: The experimental study involved 1658 individual trials. Each trial comprised two participants who were asked to perform a perceptual task interactively. The perceptual task was as follows: two arrays of circles, denoted left and right, were given to each pair of participants. Each participant was asked to judge which array (left or right) had the larger average diameter. The participant's answer (left or right) constituted their action. So the action space is $A = \{0 \text{ (left)}, 1 \text{ (right)}\}$.
The circles were prepared for each trial as follows: two 4 × 4 grids of circles were generated by uniformly sampling from the radii {20, 24, 29, 35, 42} (in pixels). The average diameter of each grid was computed, and if the means differed by more than 8% or less than 4%, new grids were made. Thus in each trial, the left array and right array of circles differed in average diameter by 4-8%.
For each trial, one of the two participants was chosen randomly to start the experiment by choosing an action according to his/her observation. Thereafter, each participant was given access to their partner's previous response (action) and the participant's own previous action prior to making his/her judgement. This mimics the social learning protocol. The participants continued choosing actions according to this procedure until the experiment terminated. The trial terminated when the response of each of the two participants did not change for three successive iterations (the two participants did not necessarily have to agree for the trial to terminate).
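The stimulus generation and stopping rule described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the percentage difference is assumed to be measured relative to the smaller mean, and the stopping rule is assumed to mean that each participant's last three actions are identical. The text does not specify either detail.

```python
import random

SIZES = [20, 24, 29, 35, 42]  # circle sizes in pixels, as listed in the text


def make_grid(rng):
    """Sample one 4 x 4 grid (flattened to 16 values) uniformly from SIZES."""
    return [rng.choice(SIZES) for _ in range(16)]


def make_trial(rng, low=0.04, high=0.08):
    """Resample both grids until their means differ by between 4% and 8%."""
    while True:
        left, right = make_grid(rng), make_grid(rng)
        mean_left = sum(left) / 16
        mean_right = sum(right) / 16
        # Assumption: the difference is taken relative to the smaller mean.
        diff = abs(mean_left - mean_right) / min(mean_left, mean_right)
        if low <= diff <= high:
            truth = 0 if mean_left > mean_right else 1  # 0 = left, 1 = right
            return left, right, truth


def has_terminated(actions_a, actions_b, window=3):
    """Assumed stopping rule: each participant's last `window` actions agree
    with each other (the two participants need not agree with one another)."""
    def stable(actions):
        return len(actions) >= window and len(set(actions[-window:])) == 1
    return stable(actions_a) and stable(actions_b)


# Example: generate one trial with a fixed seed for reproducibility.
left, right, truth = make_trial(random.Random(0))
```

Rejection sampling is used because the 4-8% window is a constraint on the pair of grids rather than on individual circles.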
In each trial, the actions of participants were recorded along with the time interval taken to choose their action. As an example, Fig. 8 illustrates the sample path of decisions made by the two participants in one of the 1658 trials. In this specific trial, the average diameter of the left array of circles was 32.1875 and the right array was 30.5625 (in pixels); so the ground truth was 0 (left).
2) Experimental Results: The results of our experimental study are as follows:
a) Social learning Model: As mentioned above, the experiment for each pair of participants was continued until both participants' responses stabilized. In what percentage of these experiments did an agreement occur between the two participants? The answer to this question reveals whether "herding" occurred in the experiments and whether the participants performed social learning (influenced by their partners). The experiments show that in 66% of trials (1102 among 1658), participants reached an agreement; that is, herding occurred. Further, in 32% of the trials, both participants converged to the correct decision after a few interactions.
To construct a social learning model for the experimental data, we consider the experiments where both participants reached an agreement. Define the social learning success rate as
$$\frac{\#\ \text{expts where both participants chose correct answer}}{\#\ \text{expts where both participants reached an agreement}}.$$
In the experimental study, the state space is $X = \{0, 1\}$, where $x = 0$ when the left array of circles has the larger diameter and $x = 1$ when the right array has the larger diameter. The initial belief for both participants is considered to be $\pi_0 = [0.5, 0.5]$. The observation space is assumed to be $Y = \{0, 1\}$.
To estimate the social learning model parameters (observation probabilities $B_{iy}$ and costs $c(i, a)$), we determined the parameters that best fit the learning success rate of the experimental data. The best fit parameters obtained were$^{14}$
$$B_{iy} = \begin{bmatrix} 0.61 & 0.39 \\ 0.41 & 0.59 \end{bmatrix}, \qquad c(i, a) = \begin{bmatrix} 0 & 2 \\ 2 & 0 \end{bmatrix}.$$
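The fitted parameters can be checked against conditions (A1) and (A2) of Theorem 3.4 directly. The sketch below is illustrative: the helper names `is_tp2` and `is_submodular` are hypothetical, and submodularity is read as "the cost difference $c(x, a+1) - c(x, a)$ is non-increasing in the state $x$", which is the reading under which the fitted cost matrix passes.

```python
def is_tp2(B):
    """(A1): B[i+1][y] * B[i][y+1] <= B[i][y] * B[i+1][y+1] for adjacent i, y."""
    return all(
        B[i + 1][y] * B[i][y + 1] <= B[i][y] * B[i + 1][y + 1]
        for i in range(len(B) - 1)
        for y in range(len(B[0]) - 1)
    )


def is_submodular(c):
    """(A2), read as: c(x, a+1) - c(x, a) is non-increasing in the state x."""
    return all(
        c[x + 1][a + 1] - c[x + 1][a] <= c[x][a + 1] - c[x][a]
        for x in range(len(c) - 1)
        for a in range(len(c[0]) - 1)
    )


# Best-fit parameters reported in the text (rows: state, columns: y or a).
B_fit = [[0.61, 0.39], [0.41, 0.59]]
c_fit = [[0, 2], [2, 0]]

tp2_ok = is_tp2(B_fit)
submodular_ok = is_submodular(c_fit)
```

For the 2 × 2 case each check reduces to a single inequality: 0.41 · 0.39 ≤ 0.61 · 0.59 for (A1), and −2 ≤ 2 for (A2).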
Note that $B_{iy}$ and $c(i, a)$ satisfy both conditions of Theorem 3.4, namely TP2 observation probabilities and submodular costs. This implies that the subjects of this experiment made monotone and ordinal decisions.
b) Data incest: Here, we study the effect of information patterns in the experimental study that can result in data incest. Since private observations are highly subjective and participants did not document these, we cannot claim with certainty if data incest changed the action of an individual. However, from the experimental data, we can localize specific information patterns that can result in incest. In particular, we focus on the two information flow graphs depicted in Fig. 9. In the two graphs of Fig. 9, the action of the first participant at time $k$ influenced the action of the second participant at time $k+1$, and thus could have been double counted by the first participant at time $k+2$. We found that in 79% of experiments, one of the information patterns shown in Fig. 9 occurred (1303 out of 1658 experiments). Further, in 21% of experiments, the information patterns shown in Fig. 9 occurred and at least one participant changed his/her decision, i.e., the judgement of the participant at time $k+1$ differed from his/her judgements at time $k+2$ and $k$. These results show that even for experiments involving two participants, data incest information patterns occur frequently (79%) and cause individuals to modify their actions (21%). This shows that social learning protocols require careful design to handle and mitigate data incest.

G. Summary and Extensions
In this section, we have outlined a controlled sensing problem over a social network in which the administrator controls (removes) data incest and thereby maintains an unbiased (fair) online reputation system. The state of nature could be geographical coordinates of an event (in a target localization problem) or quality of a social unit (in an online reputation system). As discussed above, data incest arises due to the recursive nature of Bayesian estimation and non-determinism in the timing of the sensing by individuals. Details of proofs, extensions and further numerical studies are presented in [76], [96].
We summarize some extensions of the social learning framework that are relevant to interactive sensing.
1) Wisdom of Crowds: Surowiecki's book [97] is an excellent popular piece that explains the wisdom-of-crowds hypothesis. The wisdom-of-crowds hypothesis predicts that the independent judgments of a crowd of individuals (as measured by any form of central tendency) will be relatively accurate, even when most of the individuals in the crowd are ignorant and error prone. The book also studies situations (such as rational bubbles) in which crowds are not wiser than individuals. Collect enough people on a street corner staring at the sky, and everyone who walks past will look up. Such herding behavior is typical in social learning.
2) In which order should agents act?: In the social learning protocol, we assumed that the agents act sequentially in a pre-defined order. However, in many social networking applications, it is important to optimize the order in which agents act. For example, consider an online review site where individual reviewers with different reputations make their reviews publicly available. If a reviewer with high reputation publishes her review first, this review will unduly affect the decision of a reviewer with lower reputation. In other words, if the most senior agent "speaks" first, it would unduly affect the decisions of more junior agents. This could lead to an increase in bias of the underlying state estimate.$^{15}$ On the other hand, if the most junior agent is polled first, then since its variance is large, several agents would need to be polled in order to reduce the variance. We refer the reader to [99] for an interesting description of who should speak first in a public debate.$^{16}$ It turns out that for two agents, the seniority rule is always optimal for any prior; that is, the senior agent speaks first followed by the junior agent (see [99] for the proof). However, for more than two agents, the optimal order depends in general on the prior and the observations.

IV. REVEALED PREFERENCES: ARE SOCIAL SENSORS UTILITY MAXIMIZERS?
We now move on to the third main topic of the paper, namely, the principle of revealed preferences. The main question addressed is: given a dataset of decisions made by a social sensor, is it possible to determine if the social sensor is a utility maximizer? More generally, is a dataset from a multiagent system consistent with play from a Nash equilibrium? If yes, can the behavior of the social sensors be learned using data from the social network?
These questions are fundamentally different from the model-based theme that is widely used in the communications literature, in which an objective function (typically convex) is proposed and then algorithms are constructed to compute the minimum. In contrast, the revealed preference approach is data-centric: we wish to determine whether the dataset is obtained from a utility maximizer. In simple terms, revealed preference theory seeks to determine if an agent is a utility maximizer subject to budget constraints, based on observing its choices over time. The principle of revealed preferences is widely studied in the micro-economics literature. As mentioned in Sec. I, Varian (chief economist at Google) has written several influential papers in this area. In this section we will use the principle of revealed preferences on datasets to determine how social sensors behave as a function of external influence. The setup is depicted in the schematic diagram of Fig. 10.

A. Afriat's Theorem for a single agent
The theory of revealed preferences was pioneered by Samuelson [100]. Afriat published a highly influential paper [29] in revealed preferences (see also [101]). Given a time-series of data $D = \{(p_t, x_t),\ t \in \{1, 2, \dots, T\}\}$, where $p_t \in \mathbb{R}^m$ denotes the external influence, $x_t$ denotes the response of the agent, and $t$ denotes the time index, is it possible to detect if the agent is a utility maximizer? An agent is a utility maximizer if for every external influence $p_t$, the chosen response $x_t$ satisfies
$$x_t \in \arg\max_{\{p_t' x \le I_t\}} u(x) \tag{34}$$
with $u(x)$ a non-satiated utility function. Nonsatiated means that an increase in any element of response $x$ results in the utility function increasing.$^{17}$ As shown by Diewert [102], without local nonsatiation the maximization problem (34) may have no solution.
In (34) the budget constraint $p_t' x \le I_t$ denotes the total amount of resources available to the social sensor for selecting the response $x$ to the external influence $p_t$. For example, if $p_t$ is the electricity price and $x_t$ the associated electricity consumption, then the budget of the social sensor is the available monetary funds for purchasing electricity. In the real-world social sensor datasets provided in this paper, further insights are provided for the budget constraint.
The celebrated "Afriat's theorem" provides a necessary and sufficient condition for a finite dataset $D$ to have originated from a utility maximizer. Afriat's theorem has subsequently been expanded and refined, most notably by Diewert [102], Varian [30] and Blundell [103].
Theorem 4.1 (Afriat's Theorem): Given a dataset $D = \{(p_t, x_t) : t \in \{1, 2, \dots, T\}\}$, the following statements are equivalent:
1) The agent is a utility maximizer and there exists a nonsatiated and concave utility function that satisfies (34).
2) For scalars $u_t$ and $\lambda_t > 0$ the following set of inequalities has a feasible solution:
$$u_\tau - u_t - \lambda_t p_t'(x_\tau - x_t) \le 0 \quad \text{for } t, \tau \in \{1, 2, \dots, T\}. \tag{35}$$
3) A nonsatiated and concave utility function that satisfies (34) is given by:
$$u(x) = \min_{t \in \{1, \dots, T\}} \{u_t + \lambda_t p_t'(x - x_t)\}. \tag{36}$$
4) The dataset $D$ satisfies the Generalized Axiom of Revealed Preference (GARP), namely for any $k \le T$, $p_t' x_t \ge p_t' x_{t+1}$ for all $t \le k-1$ implies $p_k' x_k \le p_k' x_1$.
As pointed out in Varian's influential paper [30], a remarkable feature of Afriat's theorem is that if the dataset can be rationalized by a non-trivial utility function, then it can be rationalized by a continuous, concave, monotonic utility function. "Put another way, violations of continuity, concavity, or monotonicity cannot be detected with only a finite number of demand observations".
Verifying GARP (statement 4 of Theorem 4.1) on a dataset $D$ comprising $T$ points can be done using Warshall's algorithm with $O(T^3)$ computations [30], [104]. Alternatively, determining if Afriat's inequalities (35) are feasible can be done via an LP feasibility test (using, for example, interior point methods [105]). Note that the utility (36) is not unique and is ordinal by construction. Ordinal means that any monotone increasing transformation of the utility function will also satisfy Afriat's theorem. Therefore the utility mimics the ordinal behavior of humans; see also Sec. III-E. Geometrically, the estimated utility (36) is the lower envelope of a finite number of hyperplanes that is consistent with the dataset $D$.
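The GARP check via Warshall's algorithm can be sketched as follows. This is a minimal illustration using only the standard library, not the procedure of [30] or [104] verbatim: the function name `satisfies_garp` and the two toy datasets are hypothetical. The direct revealed preference relation is built from the comparisons $p_t' x_t \ge p_t' x_s$, its transitive closure is computed in $O(T^3)$, and a violation is reported when $x_t$ is revealed preferred to $x_s$ while $x_t$ is strictly cheaper than $x_s$ at prices $p_s$.

```python
def dot(p, x):
    """Inner product p'x."""
    return sum(pi * xi for pi, xi in zip(p, x))


def satisfies_garp(prices, responses):
    """Check GARP for observations (p_t, x_t), t = 0..T-1."""
    T = len(prices)
    # Direct revealed preference: x_t R x_s if p_t'x_t >= p_t'x_s.
    R = [[dot(prices[t], responses[t]) >= dot(prices[t], responses[s])
          for s in range(T)] for t in range(T)]
    # Transitive closure by Warshall's algorithm, O(T^3).
    for k in range(T):
        for t in range(T):
            if R[t][k]:
                for s in range(T):
                    if R[k][s]:
                        R[t][s] = True
    # GARP: x_t R x_s must imply p_s'x_s <= p_s'x_t.
    for t in range(T):
        for s in range(T):
            if R[t][s] and dot(prices[s], responses[s]) > dot(prices[s], responses[t]):
                return False
    return True


# Toy data (hypothetical): the agent buys more of whichever good is cheaper.
consistent = satisfies_garp([(1, 2), (2, 1)], [(4, 1), (1, 4)])
# Toy data (hypothetical): the agent buys more of whichever good is dearer.
inconsistent = satisfies_garp([(1, 2), (2, 1)], [(1, 4), (4, 1)])
```

Integer prices and quantities are used in the toy data so that the comparisons are exact; with measured data a small tolerance on the inequalities may be appropriate.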
Note that GARP is equivalent to the notion of "cyclical consistency" [106]: both state that the responses are consistent with utility maximization if no negative cycles are present. As an example, consider a dataset $D$ with $T = 2$ observations resulting from a utility maximizing agent. Then GARP states that $p_1' x_1 \ge p_1' x_2 \implies p_2' x_2 \le p_2' x_1$. From (34), the underlying utility function must satisfy $u(x_1) \ge u(x_2) \implies u(x_2) \le u(x_1)$, where the equality results if [...]
Another remarkable feature of Afriat's Theorem is that no parametric assumptions on the utility function of the agent are necessary. To gain insight into the construction of the inequalities (35), let us assume the utility function $u(x)$ in (34) is increasing for positive $x$, concave, and differentiable. If $x_t$ solves the maximization problem (34), then from the Karush-Kuhn-Tucker (KKT) conditions there must exist Lagrange multipliers $\lambda_t$ such that
$$\nabla u(x_t) = \lambda_t p_t$$
is satisfied for all $t \in \{1, 2, \dots, T\}$. Note that since $u(x)$ is increasing, $\nabla u(x_t) = \lambda_t p_t > 0$, and since $p_t$ is strictly positive, $\lambda_t > 0$. Given that $u(x)$ is a concave differentiable function, it follows that
$$u(x_\tau) - u(x_t) - \nabla u(x_t)'(x_\tau - x_t) \le 0 \quad \text{for all } t, \tau \in \{1, 2, \dots, T\}.$$
Denoting $u_t = u(x_t)$ and $u_\tau = u(x_\tau)$, and using the KKT conditions and the concave differentiable property, the inequalities (35) result. The proof that feasibility of (35) implies GARP can be carried out using the duality theorem of linear programming, as illustrated in [104].
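The KKT argument above can be checked on a small synthetic example. The sketch below is illustrative and rests on stated assumptions: the "true" utility is a hypothetical Cobb-Douglas function $u(x) = \sqrt{x(1)\,x(2)}$, whose budget-constrained maximizer spends half the budget on each good, and the prices and budgets are made-up values. Setting $u_t = u(x_t)$ and taking $\lambda_t$ from $\nabla u(x_t) = \lambda_t p_t$ yields scalars that satisfy Afriat's inequalities, and the lower envelope of the resulting hyperplanes reproduces $u_t$ at each observed response.

```python
import math


def dot(p, x):
    """Inner product p'x."""
    return sum(pi * xi for pi, xi in zip(p, x))


def u_true(x):
    """Hypothetical utility: Cobb-Douglas, concave and increasing for x > 0."""
    return math.sqrt(x[0] * x[1])


prices = [(1.0, 2.0), (2.0, 1.0), (1.0, 1.0)]   # made-up probes p_t
budgets = [4.0, 4.0, 3.0]                        # made-up budgets I_t
# Closed-form maximizer of u_true subject to p'x <= I.
responses = [(I / (2 * p[0]), I / (2 * p[1])) for p, I in zip(prices, budgets)]

u = [u_true(x) for x in responses]
# KKT: grad u(x_t) = lambda_t * p_t; read lambda_t off the first coordinate.
lam = [0.5 * math.sqrt(x[1] / x[0]) / p[0] for p, x in zip(prices, responses)]


def afriat_holds(u, lam, prices, responses, tol=1e-9):
    """Check u_s - u_t - lam_t * p_t'(x_s - x_t) <= 0 for all pairs (t, s)."""
    T = len(u)
    return all(
        u[s] - u[t]
        - lam[t] * (dot(prices[t], responses[s]) - dot(prices[t], responses[t]))
        <= tol
        for t in range(T) for s in range(T)
    )


def u_hat(x):
    """Piecewise-linear utility: lower envelope of the T hyperplanes."""
    return min(
        u[t] + lam[t] * (dot(prices[t], x) - dot(prices[t], responses[t]))
        for t in range(len(u))
    )


feasible = afriat_holds(u, lam, prices, responses)
```

In practice the scalars are unknown and are obtained from an LP feasibility test; here they are computed from a known utility only to make the link between the KKT conditions and the inequalities concrete.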

B. Revealed Preferences for Multi-agent Social Sensors
We now consider a multi-agent version of Afriat's theorem for deciding if a dataset is generated by play from the equilibrium of a potential game.$^{18}$ An example is the control of power consumption in the electrical grid. Consider a corporate network of financial management operators that select the electricity prices in a set of zones in the power grid. By selecting the prices of electricity, the operators are expected to be able to control the power consumption in each zone. The operators wish to supply their consumers with sufficient power; however, given the finite amount of resources, the operators in the corporate network must interact. This behavior can be modelled as a game. Recent analyses of energy use scheduling and demand side management schemes in the energy market have been performed using potential games [108], [109], [110]. Another example of potential games is congestion games [111], [112], [113], [114], in which the utility of each player depends on the amount of resource it and other players use.
Consider the social network presented in Fig. 10. Given a time-series of data from $n$ agents
$$D = \{(p_t, x_t^1, \dots, x_t^n) : t \in \{1, 2, \dots, T\}\} \tag{37}$$
with $p_t \in \mathbb{R}^m$ the external influence, $x_t^i$ the response of agent $i$, and $t$ the time index, is it possible to detect if the dataset originated from agents that play a potential game?
The characterization of how agents behave as a function of external influence (for example, the price of using a resource) and of the responses of other agents in a social network is key for analysis. Consider the social network illustrated in Fig. 10. There are a total of $n$ interacting agents in the network and each can produce a response $x_t^i$ in response to the other agents and an external influence $p_t$. Without any a priori assumptions about the agents, how can the behaviour of the agents in the social network be learned? In the engineering literature the behaviour of agents is typically defined a priori using a utility function; however, our focus here is on learning the behaviour of agents. The utility function captures the satisfaction or payoff an agent receives from a set of possible responses, denoted by $X$. Formally, a utility function $u : X \to \mathbb{R}$ represents a preference relation between responses $x_1$ and $x_2$ if and only if for every $x_1, x_2 \in X$, $u(x_1) \le u(x_2)$ implies $x_2$ is preferred to $x_1$. Given a time-series of data $D = \{(p_t, x_t^1, \dots, x_t^n) : t \in \{1, 2, \dots, T\}\}$ with $p_t \in \mathbb{R}^m$ the external influence, $x_t^i$ the response of agent $i$, and $t$ the time index, is it possible to detect if the series originated from an agent that is a utility maximizer?
Fig. 10. Schematic of a social network containing $n$ agents where $p_t \in \mathbb{R}^m$ denotes the external influence, and $x_t^i \in \mathbb{R}^m$ the response of agent $i$ in response to the external influence and other agents at time $t$. Note that the dotted line denotes consumers $4, \dots, n-1$. The aim is to determine if the dataset $D$ defined in (37) is consistent with play from a Nash equilibrium.
In a network of social sensors (Fig. 10), the responses of agents may depend on both the external influence $p_t$ and the responses of the other agents in the network, denoted by $x_t^{-i}$. The utility function of the agent must now include the responses of other agents. Formally, if there are $n$ agents, each has a utility function $u^i(x^i, x_t^{-i})$ with $x^i$ denoting the response of agent $i$, $x_t^{-i}$ the responses of the other $n-1$ agents, and $u^i(\cdot)$ the utility of agent $i$. Given a dataset $D$, is it possible to detect if the data is consistent with agents playing a game and maximizing their individual utilities? Deb, following Varian's and Afriat's work, shows that refutable restrictions exist for the dataset $D$, given by (37), to satisfy Nash equilibrium (38) [107], [29], [115]. These refutable restrictions are, however, satisfied by most $D$ [107]. The detection of agents engaged in a concave potential game, and generating responses that satisfy Nash equilibrium, provides stronger restrictions on the dataset $D$ [107], [116]. We denote this behaviour as Nash rationality, defined as follows:
Definition 4.1 ([116], [117], [118]): A dataset $D$ (37) is consistent with Nash equilibrium play if there exist utility functions $u^i(x^i, x^{-i})$ such that
$$x_t^i \in \arg\max_{\{p_t' x^i \le I_t^i\}} u^i(x^i, x_t^{-i}), \quad i \in \{1, 2, \dots, n\}. \tag{38}$$
In (38), $u^i(x, x^{-i})$ is a nonsatiated utility function in $x$, $x^{-i} = \{x^j\}_{j \ne i}$ for $i, j \in \{1, 2, \dots, n\}$, and the elements of $p_t$ are strictly positive. Nonsatiated means that for any $\epsilon > 0$, there exists an $x^i$ with $\|x^i - x_t^i\|_2 < \epsilon$ such that $u^i(x^i, x^{-i}) > u^i(x_t^i, x_t^{-i})$. If for all $x^i, x^j \in X^i$, there exists a concave potential function $V$ that satisfies
$$u^i(x^i, x^{-i}) - u^i(x^j, x^{-i}) > 0 \iff V(x^i, x^{-i}) - V(x^j, x^{-i}) > 0 \tag{39}$$
for all the utility functions $u^i(\cdot)$ with $i \in \{1, 2, \dots, n\}$, then the dataset $D$ satisfies Nash rationality.
Just as with the utility maximization budget constraint in (34), the budget constraint $p_t' x^i \le I_t^i$ in (38) models the total amount of resources available to the social sensor for selecting the response $x_t^i$ to the external influence $p_t$. The detection test for Nash rationality (Definition 4.1) has been used in [119] to detect if oil producing countries are collusive, and in [107] for the analysis of household consumption behaviour.
In the following sections, decision tests for utility maximization and a non-parametric learning algorithm for predicting agent responses are presented. Three real world datasets are analyzed using the non-parametric decision tests and learning algorithm. The datasets comprise bidders' auctioning behaviour, electrical consumption in the power grid, and the tweeting dynamics of agents in the social network Twitter illustrated in Fig. 10.

C. Decision Test for Nash Rationality
This section presents a non-parametric test for Nash rationality given the dataset $D$ defined in (37). If the dataset $D$ passes the test, then it is consistent with play according to a Nash equilibrium of a concave potential game. In Sec. IV-D, a learning algorithm is provided that can be used to predict the response of agents in the social network provided in Fig. 10.
The following theorem provides necessary and sufficient conditions for a dataset $D$ to be consistent with Nash rationality (Definition 4.1). The proof is analogous to Afriat's Theorem when the concave potential function of the game is differentiable [107], [116], [120].
Theorem 4.2 (Multiagent Afriat's Theorem): Given a dataset $D$ (37), the following statements are equivalent:
1) $D$ is consistent with Nash rationality (Definition 4.1) for an $n$-player concave potential game.
2) Given scalars $v_t$ and $\lambda_t^i > 0$ the following set of inequalities has a feasible solution for $t, \tau \in \{1, \dots, T\}$:
$$v_\tau - v_t - \sum_{i=1}^{n} \lambda_t^i p_t'(x_\tau^i - x_t^i) \le 0. \tag{40}$$
3) A concave potential function that satisfies (38) is given by:
$$\hat{V}(x^1, x^2, \dots, x^n) = \min_{t \in \{1, \dots, T\}} \Big\{ v_t + \sum_{i=1}^{n} \lambda_t^i p_t'(x^i - x_t^i) \Big\}. \tag{41}$$
4) [...]
The constructed concave potential function (41) is ordinal; that is, unique up to positive monotone transformations. Therefore several possible options for $\hat{V}(\cdot)$ exist that would produce identical preference relations to the actual potential function $V(\cdot)$. In 4) of Theorem 4.2, the first condition only provides necessary and sufficient conditions for the dataset $D$ to be consistent with a Nash equilibrium of a game; therefore the second condition is required to ensure consistency with the other statements in the Multiagent Afriat's Theorem. The intuition that connects statements 1 and 3 in Theorem 4.2 is provided by the following result from [118]: for any smooth potential game that admits a concave potential function $V$, a sequence of responses $\{x^i\}_{i \in \{1, 2, \dots, n\}}$ is generated by a pure-strategy Nash equilibrium if and only if it is a maximizer of the potential function, for each probe vector $p_t \in \mathbb{R}^m_+$.
The non-parametric test for Nash rationality involves determining if (40) has a feasible solution. Computing the parameters $v_t$ and $\lambda_t^i > 0$ in (40) involves solving a linear program with $T^2$ linear constraints in $(n+1)T$ variables, which has polynomial time complexity [105]. In the special case of one agent, the constraint set in (40) is the dual of the shortest path problem in network flows. Using the graph theoretic algorithm presented in [121], the solution of the parameters $u_t$ and $\lambda_t$ in (35) can be computed with time complexity $O(T^3)$.
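To make the size of the feasibility problem concrete, the sketch below assembles the $T^2$ linear constraints in $(n+1)T$ variables for the multiagent inequalities, read here as $v_\tau - v_t - \sum_i \lambda_t^i p_t'(x_\tau^i - x_t^i) \le 0$. It is an illustration under assumptions: the function names, the variable layout, and the toy data are all hypothetical, and no LP solver is invoked. Because the inequalities are homogeneous in $(v, \lambda)$, the strict constraint $\lambda_t^i > 0$ can be replaced by $\lambda_t^i \ge 1$ when the rows are handed to a solver.

```python
def dot(p, x):
    """Inner product p'x."""
    return sum(pi * xi for pi, xi in zip(p, x))


def build_constraints(prices, responses):
    """Return rows a such that a . z <= 0 encodes one inequality per pair (t, s).

    prices[t] is p_t and responses[t][i] is x_t^i.
    Assumed variable layout:
      z = [v_0..v_{T-1}, lam^0_0..lam^0_{T-1}, ..., lam^{n-1}_0..lam^{n-1}_{T-1}].
    """
    T, n = len(prices), len(responses[0])
    rows = []
    for t in range(T):
        for s in range(T):
            row = [0.0] * ((n + 1) * T)
            row[s] += 1.0   # coefficient of v_s
            row[t] -= 1.0   # coefficient of v_t
            for i in range(n):
                gap = dot(prices[t], responses[s][i]) - dot(prices[t], responses[t][i])
                row[T + i * T + t] = -gap   # coefficient of lam_t^i
            rows.append(row)
    return rows


def is_feasible_point(rows, z, tol=1e-9):
    """Check that a candidate z satisfies every row constraint a . z <= 0."""
    return all(dot(row, z) <= tol for row in rows)


# Toy data (hypothetical): T = 2 probes, n = 2 agents, m = 2 goods.
prices = [(1, 2), (2, 1)]
responses = [[(2, 1), (1, 1)],
             [(1, 2), (1, 1)]]
rows = build_constraints(prices, responses)
```

For the toy data, the candidate `z = [1, 1, 1, 1, 1, 1]` (all $v_t = 1$, all $\lambda_t^i = 1$) satisfies every row, so this dataset passes the test.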

D. Learning Algorithm for Response Prediction
In the previous section, a non-parametric test to detect if a dataset $D$ is consistent with Nash rationality was provided. If $D$ satisfies Nash rationality, then the Multiagent Afriat's Theorem can be used to construct the concave potential function of the game for agents in the social network illustrated in Fig. 10. In this section we provide a non-parametric learning algorithm that can be used to predict the responses of agents using the constructed concave potential function (41).
To predict the response of agent $i$, denoted by $\hat{x}_\tau^i$, for probe $p_\tau$ and budget $I_\tau^i$, the optimization problem (42) is solved using the estimated potential function $\hat{V}$ (41), $p_\tau$, and $I_\tau^i$. Computing $\hat{x}_\tau^i$ requires solving an optimization problem with linear constraints and a concave piecewise linear objective. This can be solved using the interior point algorithm [105]. The algorithm used to predict the response xτ = (x

E. Dataset 1: Online Multiwinner Auction
This section illustrates how Afriat's Theorem from Sec. IV-A can be used to determine if bidders in an online multiwinner auction are utility optimizers. Online auctions are rapidly gaining popularity since bidders do not have to gather at the same geographical location. Several researchers have focused on the timing of bids and multiple bidding behavior in Amazon and eBay auctions [122], [123], [124], [125]. The analysis of the bidding behavior can be exploited by auctioneers to target suitable bidders and thereby increase profits.
The multiwinner auction dataset was obtained from an experimental study conducted amongst undergraduate students in Electrical Engineering at Princeton University on March 25th, 2011.$^{19}$ The multiwinner auction consists of bidders competing for questions that will aid them in an upcoming midterm exam. The social network is composed of $n = 12$ bidders who do not interact with other bidders; they only interact with the external influence (refer to Fig. 10). Each bidder is endowed with 500 tokens prior to starting the multiwinner auction. The number of questions being auctioned is not known to the bidders; this prevents the bidders from immediately submitting their entire budget in the final auction. Each auction consists of auctioning a single question at an initial price of 10 tokens and has a duration of 30 min. At the beginning of each auction the bidders are provided with the number of winners, denoted $k$, that the auction will have, and the budget of each bidder. The bids are private information, with each bidder only informed when their bid has been outbid.
If the bidding behaviour of agents satisfies Afriat's test (35) for utility maximization, the next goal is to classify the behaviour of bidders into two categories: strategic and frantic. If a bidder fails Afriat's test then they are classified as irrational. Strategic bidders will typically submit a large number of bids and a smaller bid amount when compared to frantic bidders. With this bidding behaviour, strategic bidders force the other bidders to spend too much, eliminating them from competing in future auctions. Frantic bidders, however, are only interested in winning the current auction.
To apply Afriat's test (35), the external influence $p_t$ and bidder responses $x_t$ must be defined. The external influence for each bidder is defined by $p_t^i = [p_t^i(1), p_t^i(2)]$ with $p_t^i(1)$ = initial bid amount, representing the bidder's interest level for winning, and $p_t^i(2)$ = # of winners, representing the perception of winning, where $i$ is the bidder and $t$ the auction. Two datasets are considered for analysis, denoted by $D_1$ and $D_2$. An identical external influence is used to construct both $D_1$ and $D_2$. The responses in $D_1$ are given by $x_t^i = [x_t^i(1), x_t^i(2)]$ where $x_t^i(1)$ = # of bids and $x_t^i(2)$ = mean bid amount; and for $D_2$ the inputs of $x_t^i$ are given by $x_t^i(1)$ = # of bids and $x_t^i(2)$ = difference in mean bid amount.
For utility maximizing bidders, an estimate of the utility function of each bidder is required to classify them as strategic or frantic. To estimate the utility function of the bidders, a subset of data from $D_1$, denoted as $\bar{D}_1$, is selected such that the preferences of all agents $i$ in $\bar{D}_1$ are identical. It was determined that $\bar{D}_1 = \{(p_t^i, x_t^i) : i \in \{1, 3, 5, 7, 12\}\}$. Since the preferences of these bidders are identical, we can consider all the data in $\bar{D}_1$ as originating from a single representative bidder. This allows an improved estimate of the utility function of these bidders as compared to learning the utility function for each bidder separately. An analogous explanation is used for the construction of $\bar{D}_2 = \{(p_t^i, x_t^i) : i \in \{1, 2, 3, 4, 7, 8, 9\}\}$ from the dataset $D_2$. The estimated utility function for $\bar{D}_1$ is given in Fig. 11(a) and for $\bar{D}_2$ in Fig. 11(b). As seen from Fig. 11(a) and Fig. 11(b), bidders have a preference to increase the number of bids compared with increasing the mean bid amount or the difference in mean bid amount. This follows logically: as $x_t^i(2)$ increases, the bidder will have to pay more tokens to win the question, limiting their ability to bid in future auctions. Interestingly, the bidders show strategic and frantic behaviour in both datasets $\bar{D}_1$ and $\bar{D}_2$, as seen in Fig. 11(a) and Fig. 11(b). This is consistent with the results in [124], which show that bidders change their bidding behavior between successive auctions.
The analysis shows that auctioneers should target bidders that show frantic bidding behavior, as they are likely to overspend on items, increasing the revenue of the auctioneer. Such behavior can be detected using the utility maximization test and the constructed utility function from Afriat's Theorem.

F. Dataset 2: Ontario Electrical Energy Market Dataset
In this section we consider the aggregate power consumption of different zones in the Ontario power grid. A sampling period of $T = 79$ days starting from January 2013 is used to generate the dataset $D$ for the analysis. All price and power consumption data is available from the Independent Electricity System Operator$^{20}$ (IESO) website. Each zone is considered as an agent in the corporate network illustrated in Fig. 10. The study of corporate social networks was pioneered by Granovetter [127], [128], who shows that the social structure of the network can have important economic outcomes.
Examples include agents' choice of alliance partners, assumption of rational behavior, self interest behavior, and the learning of other agents' behavior. Here we test for rational behavior (i.e. utility maximization and Nash rationality), and if true then learn the associated behavior of the zones. This analysis provides useful information for constructing demand side management (DSM) strategies for controlling power consumption in the electricity market. For example, if a utility function exists, it can be used in the DSM strategy presented in [129], [130].
The zones' power consumption is regulated by the associated price of electricity set by the senior management officer in each respective zone. Since there is a finite amount of power in the grid, each officer must communicate with other officers in the network to set the price of electricity. Here we utilize the aggregate power consumption from each of the $n = 10$ zones in the Ontario power grid and apply the non-parametric tests for utility maximization (35) and Nash rationality (40) to detect if the zones are demand responsive. If the utility maximization or Nash rationality tests are satisfied, then the power consumption behaviour is modelled by constructing the associated utility function (36) or concave potential function of the game (41).
To perform the analysis, the external influence $p_t$ and response of agents $x_t$ must be defined. In the Ontario power grid the wholesale price of electricity is dependent on several factors such as consumer behaviour, weather, and economic conditions. Therefore the external influence is defined as $p_t = [p_t(1), p_t(2)]$ with $p_t(1)$ the average electricity price between midnight and noon, and $p_t(2)$ the average between noon and midnight, with $t$ denoting the day. The response of each zone corresponds to the total aggregate power consumption in each respective time period associated with $p_t(1)$ and $p_t(2)$, and is given by $x_t^i = [x_t^i(1), x_t^i(2)]$ with $i \in \{1, 2, \dots, n\}$. The budget $I_t^i$ of each zone has units of dollars, as $p_t$ has units of \$/kWh and $x_t^i$ units of kWh.
We found that the aggregate consumption data of each zone does not satisfy Afriat's utility maximization test (35). This points to the possibility that the zones are engaged in a concave potential game. This would not be a surprising result, as network congestion games have been shown to reduce peak power demand in distributed demand management schemes [109]. To test if the dataset $D$ is consistent with Nash rationality, the detection test (40) is applied. The dataset for the power consumption in the Ontario power grid is consistent with Nash rationality. Using (43), a concave potential function for the game is constructed. Using the constructed potential function, when do agents prefer to consume power? The marginal rate of substitution$^{21}$ (MRS) can be used to determine the preferred time for power usage. Formally, the MRS of $x^i(1)$ for $x^i(2)$ is given by
$$\mathrm{MRS}_{12} = \frac{\partial \hat{V} / \partial x^i(1)}{\partial \hat{V} / \partial x^i(2)}.$$
From the constructed potential function we find that $\mathrm{MRS}_{12} > 1$, suggesting that the agents prefer to use power in the time period associated with $x^i(1)$; that is, the agents are willing to give up $\mathrm{MRS}_{12}$ kWh of power in the time period associated with $x^i(2)$ for 1 additional kWh of power in the time period associated with $x^i(1)$. The analysis in this section suggests that the power consumption behavior of agents is consistent with players engaged in a concave potential game. Using the Multiagent Afriat's Theorem, the agents' preference for using power was estimated. This information can be used to improve the DSM strategies presented in [129], [130] to control power consumption in the electricity market.
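As an illustration of how an MRS value of this kind can be read off a potential function, the following minimal sketch estimates the ratio of partial derivatives numerically by central differences. The function `example_potential`, its coefficients, and the evaluation point are hypothetical placeholders chosen for illustration; they are not the potential estimated from the Ontario data, and a piecewise-linear potential would only yield a local slope at points where it is differentiable.

```python
import math

def marginal_rate_of_substitution(potential, x, h=1e-4):
    """Estimate MRS_12 = (dV/dx(1)) / (dV/dx(2)) at x by central differences."""
    x1, x2 = x
    dv_dx1 = (potential((x1 + h, x2)) - potential((x1 - h, x2))) / (2 * h)
    dv_dx2 = (potential((x1, x2 + h)) - potential((x1, x2 - h))) / (2 * h)
    return dv_dx1 / dv_dx2

def example_potential(x):
    """Hypothetical concave potential (an assumption, not the estimated one)."""
    return 0.7 * math.log(x[0]) + 0.3 * math.log(x[1])

# Consumption (kWh) in the two half-day periods; hypothetical values.
mrs_12 = marginal_rate_of_substitution(example_potential, (100.0, 100.0))
prefers_first_period = mrs_12 > 1
```

For this placeholder potential the analytic value is $0.7/0.3$, so $\mathrm{MRS}_{12} > 1$ and the first period is preferred, mirroring the interpretation given above.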

G. Dataset 3: Twitter Data
Does the tweeting behaviour of Twitter agents satisfy a utility maximization process? The goal is to investigate how tweets and trend indices$^{22}$ impact the tweets of agents in the Twitter social network. The information provided by this analysis can be used in social media marketing strategies to improve a brand and raise brand awareness. As discussed in [131], Twitter may rely on a huge amount of agent-generated data which can be analyzed to provide novel personal advertising to agents.
To apply Afriat's utility maximization test (35), we choose the external influence and response as follows. The external influence is $p_t$ = [#Sony, 1/#Playstation] for each day $t$. The associated response taken by the agents in the network is given by $x_t$ = [#Microsoft, #Xbox]. Notice that the probe $p_t(2)$ can be interpreted as the frequency of tweets with the word Playstation (i.e. the trending index). The dataset $D$ of external influence and responses is constructed from $T = 80$ days of Twitter data starting from January 1st, 2013. The dataset $D$ satisfies the utility maximization test (35). This establishes that a utility function exists for agents that is dependent on the number of tweets containing the words Microsoft and Xbox. The data shows that tweets containing the words Microsoft and Xbox are dependent on the number of tweets containing Sony and the trending index of Playstation. This dependency is expected as Microsoft produces the game console Xbox, and Sony produces the game console Playstation, both of which have a large number of brand followers (e.g. Xbox has over 3 million, and Playstation over 4 million).
To gain further insight into the behaviour of the agents, (36) from Afriat's Theorem is used to construct a utility function for the agents. Fig. 12 shows the constructed utility function of the agents. As seen, agents have a higher utility for using the word Microsoft as compared to Xbox; that is, agents prefer to use the word Microsoft to that of Xbox. Interestingly, if we define the response to be $x_t$ = [#Microsoft, 1/#Xbox], then the dataset satisfies utility maximization. From the constructed utility function, not shown, the agents prefer to increase the tweets containing the word Microsoft compared to increasing the trend index of Xbox. If instead $x_t$ = [1/#Microsoft, 1/#Xbox], then the dataset satisfies utility maximization and agents prefer to increase the trend index of Microsoft compared to that of Xbox.
To summarize, the above analysis suggests the following interesting fact: Xbox has a lower utility than Microsoft in terms of Twitter sentiment. Therefore, online marketing strategies should target the brand name Microsoft instead of Xbox.
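The test applied in this section can be sketched in a few lines. The sketch below does not solve the linear inequalities of (35) directly; it checks the generalized axiom of revealed preference (GARP), which Afriat's theorem shows to be equivalent to feasibility of those inequalities. The daily tweet counts are hypothetical placeholders, not the data used in the paper, and the names `satisfies_garp`, `probes` and `responses` are introduced here for illustration only.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def satisfies_garp(probes, responses):
    """Return True if the dataset {(p_t, x_t)} satisfies GARP."""
    T = len(probes)
    # x_t is directly revealed preferred to x_s if p_t.x_t >= p_t.x_s.
    pref = [[dot(probes[t], responses[t]) >= dot(probes[t], responses[s])
             for s in range(T)] for t in range(T)]
    # Transitive closure of the direct relation (Warshall's algorithm).
    for k in range(T):
        for t in range(T):
            for s in range(T):
                pref[t][s] = pref[t][s] or (pref[t][k] and pref[k][s])
    # Violation: x_t revealed preferred to x_s, yet x_t was strictly
    # cheaper than x_s at the probe p_s under which x_s was chosen.
    for t in range(T):
        for s in range(T):
            if pref[t][s] and dot(probes[s], responses[s]) > dot(probes[s], responses[t]):
                return False
    return True

# Hypothetical daily tweet counts (placeholders, not the paper's dataset).
sony = [1200, 1500, 900]
playstation = [800, 1000, 600]
microsoft = [2000, 1800, 2200]
xbox = [1500, 1400, 1700]

# p_t = [#Sony, 1/#Playstation], x_t = [#Microsoft, #Xbox] as in the text.
probes = [(s, 1.0 / ps) for s, ps in zip(sony, playstation)]
responses = list(zip(microsoft, xbox))
twitter_consistent = satisfies_garp(probes, responses)
```

Changing the response definition, for example to [#Microsoft, 1/#Xbox], only changes how `responses` is built; the check itself is unchanged.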

H. Summary and Extensions
The principle of revealed preferences is an active research area with numerous recent papers. We have already mentioned the papers [102], [30], [103]. Below we summarize some related literature that extends the basic framework of Afriat's theorem.
Afriat's theorem holds for finite datasets and gives an explicit construction of a class of concave utility functions that rationalize the dataset. Mas-Colell [132] has given sufficient conditions under which, as the dataset size $T$ grows to infinity, the underlying utility function of the consumer can be fully identified.
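The explicit construction can be illustrated as follows. The sketch assumes the standard form of Afriat's result, which we take to correspond to (35) and (36): given scalars $u_t$ and $\lambda_t > 0$ satisfying $u_s - u_t - \lambda_t p_t'(x_s - x_t) \le 0$ for all $s, t$, a concave rationalizing utility is $u(x) = \min_t \{u_t + \lambda_t p_t'(x - x_t)\}$. The two-point dataset and the values of `u` and `lam` below are hypothetical and were chosen by hand; in practice they would be obtained from a linear feasibility solver.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def sub(a, b):
    """Componentwise difference a - b."""
    return [ai - bi for ai, bi in zip(a, b)]

def afriat_inequalities_hold(u, lam, probes, responses, tol=1e-9):
    """Check u_s - u_t - lam_t * p_t.(x_s - x_t) <= 0 for all s, t, with lam_t > 0."""
    T = len(u)
    for t in range(T):
        if lam[t] <= 0:
            return False
        for s in range(T):
            gap = u[s] - u[t] - lam[t] * dot(probes[t], sub(responses[s], responses[t]))
            if gap > tol:
                return False
    return True

def afriat_utility(x, u, lam, probes, responses):
    """Piecewise-linear concave utility: min over t of u_t + lam_t * p_t.(x - x_t)."""
    return min(u[t] + lam[t] * dot(probes[t], sub(x, responses[t]))
               for t in range(len(u)))

# Hypothetical two-observation dataset and a hand-picked feasible solution.
probes = [(1.0, 2.0), (2.0, 1.0)]
responses = [(2.0, 1.0), (1.0, 2.0)]
u = [1.0, 1.0]
lam = [1.0, 1.0]

feasible = afriat_inequalities_hold(u, lam, probes, responses)
value_at_observation = afriat_utility(responses[0], u, lam, probes, responses)
value_at_new_point = afriat_utility((2.0, 2.0), u, lam, probes, responses)
```

By construction the utility reproduces $u_t$ at each observed response $x_t$ and, being a minimum of affine functions, is concave.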
Though the classical Afriat's theorem holds for linear budget constraints $p_t' x \le I_t$ in (34), an identical formulation holds for certain non-linear budget constraints as illustrated in [133]. The budget constraints considered in [133] are of the form $\{x \in \mathbb{R}^m_+ \mid g(x) \le 0\}$ where $g : \mathbb{R}^m_+ \to \mathbb{R}$ is an increasing continuous function and $\mathbb{R}^m_+$ denotes the positive orthant. Also, [133] shows how the results in [132] on recoverability of the utility function can be extended to such non-linear budget constraints. However, learning the utility function from a finite dataset in the case of a non-linear budget constraint requires sophisticated machine learning algorithms [134]. The machine learning algorithms can only guarantee that the estimated utility function is approximately consistent with the dataset $D$; that is, the estimated utility is not guaranteed to contain all the preference relations consistent with the dataset $D$.
In [121], results in statistical learning theory are applied to the principle of revealed preferences to address the question: Is the class of demand functions derived from monotone concave utilities efficiently probably approximately correct (PAC) learnable? It is shown that Lipschitz utility functions are efficiently PAC learnable. In [120], the authors extend the results of [121] and show that for agents engaged in a concave potential game that satisfy Nash rationality, if the underlying potential function satisfies the Lipschitz condition then the potential function of the game is PAC learnable.
Finally, in many cases, the responses of agents are observed in noise. Then determining if an agent is a utility maximizer (or if a multiagent system's response is consistent with play from a Nash equilibrium) becomes a statistical decision test. In [135] it is shown how stochastic optimization algorithms can be devised to optimize the probe signals to minimize the type II errors of the decision test subject to a fixed type I error.
V. CLOSING REMARKS
This paper has discussed three important and interrelated themes regarding the dynamics of social sensors, namely, diffusion models for information in social networks, Bayesian social learning and revealed preferences. In each case, examples involving real datasets were given to illustrate the various concepts. The unifying theme behind these three topics stems from predicting global behavior given local behavior: individual social sensors make decisions and learn from other social sensors and we are interested in understanding the behavior of the entire network. In Sec. II we showed that the global degree of infected nodes can be determined by mean field dynamics. In Sec. III, it was shown that despite the apparent simplicity in information flows between social sensors, the global system can exhibit unusual behavior such as herding and data incest. Finally, in Sec. IV a non-parametric method was used to determine the utility functions of a multiagent system; this can be used to predict the response of the system.
This paper has dealt with social sensing issues of relevance to an engineering audience. There are several topics of relevance to social sensors that are omitted due to space constraints, including:
• Coordination of decisions via game-theoretic learning [136], [137], [138] or Bayesian game models such as global games [139]
• Consensus formation over social networks and cooperative models of network formation [140], [141]
• Small world models [142], [143]
• Peer to peer media sharing [144], [145]
• Privacy and security modelling [146], [147]

Fig. 1. Dynamics of the Health Network are modelled using the SIS model and Linear and Nonlinear Autoregressive with Exogenous input time series models; refer to Sec. II for details.

$$\theta(\rho_k) = \frac{\sum_{d=1}^{D} (\text{\# of links from infected nodes of degree } d)}{\sum_{d=1}^{D} (\text{\# of links of degree } d)} = \frac{\sum_{d=1}^{D} d\, P(d)\, \rho_k(d)}{\sum_{d=1}^{D} d\, P(d)}.$$
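As a small numerical illustration of this ratio, the sketch below evaluates the infected link probability from a degree distribution $P(d)$ and per-degree infected fractions $\rho_k(d)$. The three-degree distribution and infected fractions are hypothetical values chosen for illustration, and the function name is ours.

```python
def infected_link_probability(degree_dist, infected_frac):
    """theta = sum_d d*P(d)*rho(d) / sum_d d*P(d), with degrees d = 1, ..., D.

    degree_dist[d-1] is P(d); infected_frac[d-1] is rho_k(d).
    """
    numerator = sum(d * p * rho
                    for d, (p, rho) in enumerate(zip(degree_dist, infected_frac), start=1))
    denominator = sum(d * p for d, p in enumerate(degree_dist, start=1))
    return numerator / denominator

# Hypothetical network with maximum degree D = 3.
degree_dist = [0.5, 0.3, 0.2]      # P(1), P(2), P(3)
infected_frac = [0.1, 0.2, 0.4]    # rho_k(1), rho_k(2), rho_k(3)
theta = infected_link_probability(degree_dist, infected_frac)
```

Because each term is weighted by the degree $d$, highly connected infected nodes contribute more to $\theta$ than their share of the population alone would suggest.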

D
d=1 N (d) of the Markov chain ρ.
) where C 1 and C 2 are positive constants and T = O(N ).

Fig. 2. The infected link probability obtained from network simulation compared to the one obtained from the mean field dynamics model (13). The transition probabilities in (15) depend only on the number of infected neighbors $A^{(m)}_k$ (the parameters are defined in Scenario 1).
Fig. 3. Snapshots from a YouTube video of the propagation of influenza in the Harvard undergraduate social network, and the estimated infected link probability $\theta_k$ (9) for October 10 to December 23, 2009.

Fig. 4. The experimental data is obtained for September 2012 to October 2013 as described in Sec. II-D2. The ARX (16) and NARX (17) models are utilized to estimate the number of tweets with flu given the number of infected influenza patients.

Fig. 5. Example of the information flow (communication graph) in a social network with two agents and over three event epochs.The arrows represent exchange of information.

Fig. 6. Example of information flow network with $S = 2$ agents, namely $s \in \{1, 2\}$, and time points $k = 1, 2, 3, 4$. Circles represent the nodes indexed by $n = s + S(k - 1)$ in the social network and each edge depicts a communication link between two nodes.

Fig. 7. Two arrays of circles were given to each pair of participants on a screen. Their task was to interactively determine which side (either left or right) had the larger average diameter. The partner's previous decision was displayed on screen prior to the stimulus.

Fig. 8. Example of sample path of actions chosen by two participants in a single trial of the experiment.In this trial, both participants eventually chose the correct answer 0 (left).

Fig. 9. Two information patterns from our experimental studies which can result in data incest.
2) = mean change in bid amount. The response $x^i_t(2)$ in $D_1$ provides the expected bid amount, and $x^i_t(2)$ in $D_2$ a measure of the statistical dispersion of the bids. The budget $I^i_t$ of each bidder has units of tokens multiplied by # of bids, and is constrained as the number of tokens and auction duration are finite. The datasets $D_1$ and $D_2$ are constructed from $T = 6$ auctions. The nonparametric test (35) is applied to each dataset $D_1$ and $D_2$ to detect irrational bidders. For dataset $D_1$ bidder 4 is irrational, and for $D_2$ bidder 11 is irrational. Note that the classification of irrational behaviour is dependent on the choice of response signals used for analysis by the experimentalist.
(a) Estimated utility function $u(x_t)$ using dataset $D_1$ defined in Sec. IV-E.
(b) Estimated utility function $u(x_t)$ using dataset $D_2$ defined in Sec. IV-E.

Fig. 11. Estimated utility function of bidders constructed using (36) with the datasets $D_1$ and $D_2$ defined in Sec. IV-E. The colour (blue to red) indicates the utility level. The black dots mark the experimentally measured responses (observed demands) and their associated utility, and the shape (i.e. circle, diamond, etc.) denotes the respective bidder.

Fig. 12. Estimated utility function $u(x_t)$ using dataset $D$ defined in Sec. IV-G and constructed using the non-parametric learning algorithm (36) from Afriat's Theorem.
1, x 2 , . . ., x n ) = min Step 1: Select a probe vector p τ ∈ R m τ , • • • , xn τ ) is given below: τ ∀i ∈ {1, 2, . . ., n}
The bidders do not communicate with each other during the auction. At the end of each auction, the first $k$ highest bidders are selected, denoted by $\zeta_1, \zeta_2, \ldots, \zeta_k$. The bidders $\zeta_1, \zeta_2, \ldots, \zeta_k$ are awarded with the question, and pay the second largest bid amount (i.e. bidder $\zeta_k$ pays $\zeta_{k-1}$'s bid amount). In the multiwinner auction it is in the self interest of bidders to force other bidders to spend too much, eliminating them from competing in successive auctions.