Influence Prediction for Continuous-Time Information Propagation on Networks

We consider the problem of predicting the time evolution of influence, the expected number of activated nodes, given a set of initially active nodes on a propagation network. To address the significant computational challenges of this problem on large-scale heterogeneous networks, we establish a system of differential equations governing the dynamics of probability mass functions on the state graph where the nodes each lumps a number of activation states of the network, which can be considered as an analogue to the Fokker-Planck equation in continuous space. We provides several methods to estimate the system parameters which depend on the identities of the initially active nodes, network topology, and activation rates etc. The influence is then estimated by the solution of such a system of differential equations. This approach gives rise to a class of novel and scalable algorithms that work effectively for large-scale and dense networks. Numerical results are provided to show the very promising performance in terms of prediction accuracy and computational efficiency of this approach.


Introduction
Viral signal propagation on large heterogeneous networks is an emerging research subject of both theoretical and practical importance. Influence prediction is one of the most fundamental problems about propagation on networks, and it has been arising from many real-world applications of significant societal impact, such as news spread on social media, viral marketing, computer malware detection, and epidemics on heterogeneous networks. For instance, when considering a social network formed by people such as that of Facebook or Twitter, the viral signal can be a tweet or a trendy topic being retweeted by users (nodes) on the network formed by their followee-follower relationships. We call a user activated if he/she participates to tweet, and the followers of this user get activated if they retweet his/her tweet later, thus the activation process gradually progresses (propagates) and the tweet spreads out. A viral signal can also be a new electronic gadget that finds wide-spread adoption in the user population through a word-of-mouth viral marketing process [15,16,23], and a user is called activated when he/she adopts this new gadget. Influence prediction is to quantitatively estimate how influence, defined by the expected number of activated nodes, evolves over time during the propagation when a specific set (called source set) of nodes are initially activated.
Influence prediction is also the most critical step in solving problems arising from many important downstream applications such as influence maximization [6,12,13,29] and outbreak detection [7,16]. For instance, in influence maximization, the goal is to select the source node set of a given size from the propagation network such that its influence is maximized at a prescribed time. Obviously, influence prediction serves as the most fundamental subroutine in the computation, and the quality of influence maximization heavily depends on the accuracy of influence prediction.
• Quantitative estimate of influence for time t before equilibrium. Note the equilibrium state of SI model on network is trivial: all nodes that can be reached from the source set will be infected as time tending to infinity. However, practical interests often lie in influence before equilibrium. For example, merchants would like to know how many people will be influenced by commercial advertisement within one month rather than three years later. In this case, we need to estimate the time evolution of influence in early to middle stage where the propagation is still in nonequilibrium state.
Our discussion also includes the case of self-activation where the unactivated nodes can activate themselves automatically. If infected nodes can recover, become susceptible and prune to future infection, then the model is called susceptible-infected-susceptible (SIS), for which we only provide a brief discussion near the end of this paper. Given network G = (V, E), the stochastic propagation process is determined by the distribution of the activation times between nodes. In this paper, we mainly consider the model with activation times exponentially distributed which is widely used in classical epidemic study and social network. In this model, the time for a just activated node i to activate each unactivated j in N out i , denoted by t i,j , follows exp(α ij ) distribution (here t follows exp(α) distribution if the probability density function of t is p t (τ ) = αe −ατ for τ ≥ 0), and is independent of any other t i ,j where i = i and/or j = j . Here α ij > 0 indicates the instantaneous activation rate of j by i. If (i, j) / ∈ E, we set α ij = 0 by convention. Note that exponential random variable following exp(α) has mean 1/α, therefore, the larger α ij is, the faster i can activate j on expectation. Hence α ij can be interpreted as the impact level (weight) of i on j. Similarly, the time for an inactive node i to get self activated follows exp(β i ) for some β i > 0. When recovery scenario is considered, an activated node i can recover in time following exp(γ i ) for some γ i > 0 since it is activated.
The continuous-time propagation model with heterogeneous activation rates appears suitable for a great number of real-world applications and has been advocated by many recent works [10,12,22,27,28]. In addition, this model yields a time-homogeneous Markov propagation process so that numerical simulations can be implemented in a straightforward manner and some theoretical analysis of the algorithm can be carried out. Therefore, we focus on the development of the algorithm on this propagation model, and evaluate the performance numerically to obtain references worthy of trust through a large amount of Monte Carlo simulations. However, the general framework using Fokker-Planck equations -the main strategy in the current work -as well as the error estimations developed in Section 2.3 apply to any propagation models (e.g., activation time not exponentially distributed, such as Hawkes processes) on networks.
To estimate time evolution of influence of a source set S, we define a single stochastic process N (t; S) as the number of activated nodes at time t when the source set is S. Then we directly compute the probability distribution of N (t; S): ρ k (t; S) := Pr(N (t; S) = k), for k = 0, 1, . . . , K. (1) The influence, defined by the expected number of activated nodes at time t, can therefore be calculated easily by where K := |V | is the size of the network. The main focus of this paper is to establish a general framework for computing (predicting) influence µ(t; S) based on (1) and (2) for any given source set S. More precisely, we build the system of equations for the time evolution of {ρ k (t; S) : 0 ≤ k ≤ K}, analyze its properties, estimate the parameters in the equations, and solve for all ρ k (t; S) to predict the influence µ(t; S) in (2) for all t. Since the source S is arbitrarily set in advance, we drop the symbol S in the derivation hereafter for notation simplicity.
The idea of deriving evolution equations of {ρ k (t) : 0 ≤ k ≤ K} is closely related to the theory of Fokker-Planck equation. In continuous space R n , consider a classical stochastic process X(t) that stands for the location of a particle at time t. Let ρ(x, t) denote the probability density that X(t) is located at x ∈ R n at time t, then ρ(x, t) evolves over time with a constraint R n ρ(x, t)dx = 1 at every t. The Fokker-Planck equation, also known as the forward Kolmogorov equation, is a deterministic partial differential equation governing the time evolution of ρ(x, t). For example, if X(t) moves according to a stochastic differential is an n-dimensional Brownian motion and Ψ(·) is a scalar-valued potential function, then the Fokker-Planck equation Here ∆ρ(x, t) corresponds to the W (t) term in the SDE and is called the diffusion term, and ∇ · (∇Ψ(x)ρ(x, t)) is the drift term. Note that the statistics of X(t) can be completely determined by the solution ρ(x, t) of the Fokker-Planck equation.
Likewise, the probabilities {ρ k (t) : 0 ≤ k ≤ K} in our approach also evolve over time with K k=0 ρ k (t) = 1 at all t. The time evolution of ρ k (t) is also governed by certain Fokker-Planck equation which is now a system of deterministic ordinary differential equations since the state space is discrete N (t) = 0, 1, . . . , K rather than continuous R n . In recent years, there have been growing research interests in general graph-based Fokker-Planck equations to study problems related to optimal transport on finite graphs [4,11,5]. In the present paper, however, our goal is to find the Fokker-Planck equation that governs ρ k (t) in (1), and solve for these ρ k (t) to obtain influence µ(t) using (2). We also analyze how the coefficient errors in Fokker-Planck equation affect accuracy of predicting µ(t) using this approach.

Related Work
Previous study of influence estimation on networks is mainly restricted to statistically homogeneous and wellmixed populations, particularly in the context of statistical properties of dynamical processes on complex networks in physics. A comprehensive survey is provided in [22]. The typical approach is based on mean-field approximation (MFA) to establish a system of differential equations for the compartment model which groups nodes with statistically identical properties into one. For example, degree-based MFA groups nodes of the same degree which are considered to have identical behavior statistically, and hence can significantly reduce the size of the system [2]. Pair approximation includes the joint distribution in the system of equations, which essentially applies moment closure after the joint distribution of paired nodes, and is shown to have improved accuracy over standard MFA [1,8,21]. Other generalization and improvements of MFA and pair approximation using compartment models and motif expansions can be found in [8,17,18,19,26], and references therein.
As noted in Section 1.1, our focus in this paper is instead on influence prediction (estimation) on deterministically heterogeneous networks particularly in non-equilibrium stage, which is significantly different from existing works including those mentioned above. For influence prediction in this setting, prototype MFA for Markov propagation model developed in [14,30] is generalized to arbitrary network topology [28], and then further extended to inhomogeneous activation and recovery rate between nodes [27]. The model adopts the MFA and first-order moment closure, i.e., substituting the joint distribution of two activated neighbor nodes by product of marginal distributions for individual nodes, to retain a feasible size of the derived system of differential equations. A second-(or higher-) order moment closure can be considered but the limitations and instability are discussed in [3]. Assuming absence of recovery, the exact solution is available due to the Markov property of propagation, however, its computational complexity increases drastically in terms of size and density of general networks [12]. As an alternative to solving for influence based on evolution equations, methods based on sampling propagations (also called cascades) and statistical learning technique are also developed, but often posing various requirements on input data and output results. For instance, a scalable computational method based on learning the coverage function of each node based on sampling and kernel estimation is developed, which can can only predict the influence at a prescribed time [10]. The work is further extended to estimate the time-varying intensity of propagation using similar coverage function idea [9]. Learning-based methods are usually companioned with a great amount of accuracy analysis based on classical theory of sampling complexity. However, the major problem with learning-based approaches is in the use of large amount of samplings to ensemble the unknown function or probability of interests but lack of a comprehensive understanding of the underlying dynamics and unique properties associated with the stochastic propagation on networks. Moreover, learning-based methods can have special assumptions on data which may not be realistic in real-world applications. To achieve moderate accuracy level in large-scale and complex network, learning-based methods require extensive amount of sampling/simulations which causes significant computational burden and hinders their applicability in real-world problems.

Proposed Method
In this section, we first derive the Fokker-Planck equation for the probabilities ρ k (t) of N (t) for the propagation model with exponentially distributed activation times. We provide two effective methods to estimate the coefficients in the Fokker-Planck equation for large heterogeneous networks. Then we establish the relation between the estimation error of the coefficients in the Fokker-Planck equation and the accuracy in the predicted influence for general propagation models using our approach.

The Fokker-Planck equation of ρ k (t)
Let G = (V, E) and {α ij : (i, j) ∈ E} (and {β i : i ∈ V } for self-activation and {γ i : i ∈ V } for recovery) be given and the source set S be chosen arbitrarily. The number of activated nodes, N (t), has K + 1 states corresponding to N (t) = 0, 1, . . . , K. Let M k denote the state that N (t) = k nodes in G are activated. Then, for the general SIS model with self-activation and recovery, the transitions between states of N (t) can be illustrated as follows, Here, q k (t) is the transition rate from M k to M k+1 and r k (t) is the rate from M k to M k−1 at time t, and they depend on the structure of G = (V, E), the activation parameters α ij (and β i and γ i for self-activation and recovery respectively), and the source set S.
Recall that ρ k (t) is the probability of N (t) being in state M k according to definition (1). Therefore, the time evolution of ρ k (t) is governed by the discrete Fokker-Planck equation with these q k (t) and r k (t) to be determined: To rewrite (4) into a concise matrix formulation, we define two (K + 1) × (K + 1) matrices Q(t) and R(t) as follows: and all other entries are zeros. Here [P ] j,l stands for the (j, l)-th entry of matrix P . Note that only the diagonal and superdiagonal (subdiagonal) entries of Q(t) (R(t)) are nonzeros, and [Q(t)] K+1,K+1 = [R(t)] 1,1 = 0 for all t. With matrices Q(t) and R(t) given above, we define a row (K + 1)-vector ρ(t) := (ρ 0 (t), ρ 1 (t), . . . , ρ K (t)) and rewrite (4) as The system (7) is consistent with the nature of process N (t) in (3) with a tridiagonal transition matrix Q(t) + R(t). The initial value ρ(0) can be easily determined given S: let |S| denote the cardinality of S, then ρ(0) is a binary (K + 1)-vector such that ρ |S| (0) = 1 and ρ k (0) = 0 for all k = |S|. Therefore, we can solve (4) for ρ(t) to obtain influence µ(t) based on (2) once the transition rates q k (t) and r k (t) are determined. The following subsection is devoted to the estimation of these rates.

Estimation of transition rates q k (t) and r k (t)
Recall that q k (t) stands for the transition rate of N (t) from M k to M k+1 as shown in (3). Namely, q k (t) is the instantaneous rate for the (k + 1)-th node to be activated given that there are currently k activated node (with numerous possible choices of such k nodes in V and q k (t) aggregates all the information) at time t. Similarly, r k (t) is the instantaneous rate for any of these k activated nodes to get recovered. Therefore, we focus on the estimation of q k (t) and a similar derivation can be easily carried out for r k (t).
The estimation of rate q k (t) consists of two factors: (i) the identities of the k currently activated nodes, and (ii) the instantaneous activation rate imposed by these k nodes to all the unactivated nodes at the time t. For factor (ii), the propagation model with exponentially distributed activation times yield constant instantaneous rates if the identities of the k nodes are given. For factor (i), q k (t) need to aggregate all K k possible combinations of k activated nodes. The following theorem provides the compositions of q k (t) and r k (t). Here we call U activated if all nodes in U are activated and the others in U c = U \ V are unactivated. The proof frequently calls two simple facts about selection probability given multiple instantaneous rates, which we provide as Propositions 1 and 2 in the Appendix for completeness. Theorem 1. For every k = 0, 1, . . . , K, let S k := {U ⊂ V : |U | = k} be the collection of all subsets of size k in V . Let Pr(t; U ) be the probability that U is activated among those in S k , and define Then the transition rates q k (t) and r k (t) in (4) are given by Proof. Suppose nodes in U ⊂ S k are currently activated, then these nodes tend to activate their neighbors still in U c independently and simultaneously. More precisely, node i ∈ U imposes a node-to-node activation rate to each of its unactivated neighbor j ∈ N out i ∩U c independently and simultaneously, and hence total rate is j∈N out i ∩U c α ij . Therefore, the combined instantaneous rate of all nodes in U for node-to-node activation is given by α(U ) in (8) according to Proposition 1. Meanwhile, each node i in U c tends to be self activated with rate β i and hence their total instantaneous self-activation rate is β(U ) given in (8). As Pr(t; U ) is the probability that nodes in U are activated, we obtain the total instantaneous rate q k (t) as for N (t) to transit from state M k to M k+1 as (9) according to Proposition 2. The derivation for r k (t) in (9) follows similarly.
Note that there are |S k | = K k possible combinations U and U ∈S k Pr(t; U ) = 1 for all t. Moreover, q k (t) is a convex combination of the instantaneous activation rates α(U ) + β(U c ) with weights given by Pr(t; U ) according to Theorem 1. Hence q k (t) is closer to the α(U ) + β(U c ) with larger Pr(t; U ). The composition of r k (t) in Theorem 1 has similar interpretation. Although it is not practical to obtain the probability Pr(t; U ) for all U , Theorem 1 suggests that we can approximate q k (t) and r k (t) using the activation and recovery rates of those U with large weight Pr(t; U ).
We now present two estimation methods and practical implementations using this idea for the case without self-activation and recovery (which essentially yields the standard susceptible-infection (SI) propagation model) on heterogeneous networks. In this case, we have q k (t) = U ∈S k α(U ) Pr(t; U ) and r k (t) = 0. Therefore, the key is to approximate q k (t) using α(U ) of few U with the largest probabilities Pr(t; U ).
Estimate q k based on the shortest distance. For every k = 1, . . . , K, we can easily determine a combination U * k with large Pr(t; U ) over all U in S k as follows: recall that the expected time for node i to activate j ∈ N out i is 1/α ij , it is therefore natural to define the distance from i to j as D(i, j) := 1/α ij , which can also be generated to the distance from set S to a node j as D(S, j) := min i∈S D(i, j). Due to independency of all node-to-node activations and property of exponential distributions, the set U * k consisting of the k nodes with shortest distance to source S has larger probability to be activated first among those in S k . In practical implementation, we apply Dijkstra's method [25] on the weighted graph G with edge weights given by 1/α ij and origin S, and then sort the nodes as i 1 , i 2 , . . . , i K with ascending distance from source S, i.e., D(S, i 1 ) ≤ D(S, i 2 ) ≤ · · · ≤ D(S, i K ) (if i ∈ S then D(S, i) = 0) and set U * k = {i 1 , . . . , i k } for k = 1, . . . , K. Then we approximate q k (t) byq k (t) = α(U * k ), which remains as constant for all t once the source set S is given. This method is referred to as FPE-dist in the numerical experiments.
Estimate q k based on the largest overall probabilities. To refine the approximation using single U * k in FPE-dist, we can estimate q k (t) using multiple combinations U with the largest probabilities. For a fixed S, we employ the the following recursive method to determine the sets {U 1 k , . . . , U m k k } ⊂ S k to be used in calculation of q k in (9) for k = 1, 2, . . . , K − 1. Here m k is a user-customized number of k-combinations selected from S k (larger m k yields more accurate approximation to q k at the expense of higher computation complexity.) Suppose we have already obtained U 1 k , . . . , U m k k for k such that Pr(U 1 k ) ≥ · · · ≥ Pr(U m k k ), our next step is to obtain U 1 k+1 , . . . , U m k+1 k+1 . To this end, we proceed with previous U l k in the order of l = 1, . . . , m k and compute α(j|U l k ) for every neighbor node j of U l k (by neighbor j of a subset U we meant that j ∈ N out i for some i ∈ U , and α(j|U ) := i∈U ∩N in j α ij is the total activation rate imposed to j by nodes in U .) By Proposition 1, neighbor j of U l k will be activated before other neighbors j with probability Pr(j|U l k ) = α(j|U l k )/ j α(j |U l k ) where the summation in the denominator is over all neighbors j of U l k . Therefore, Pr(U ) = Pr(j|U l k ) Pr(U l k ) for U := U l k ∪ {j} and T (U l k ) ≤ t j . Note that each neighbor j of U l k yields such a U of size k + 1. All these U 's are then candidates for U 1 k+1 , . . . , U m k+1 k+1 later. We proceed with each U l k in the aforementioned way for l = 1, . . . , m k and obtain a number of sets U 's with probabilities Pr(U ). Note that if two or more of these U 's are identical, then we keep only one of them and merge their probabilities Pr(U ). Then we sort these U 's with Pr(U ) in descending order and only keep the first m k+1 as U 1 k+1 , . . . , U m k+1 k+1 . By this method, we are likely (but not guaranteed) to maintain a list {U 1 k , . . . , U m k k } with largest probabilities among all those in S k for each k. Then we approximate q k (t) bŷ q k (t) := m k l=1 α(U l k ) Pr(U l k ) which is again constant for all t. This method essentially constructs a branching tree with K + 1 layers, where layer k consists of m k nodes U 1 k , . . . , U m k k each having a relative probability in its layer, and others with small probabilities in S k are removed so that computation complexity is maintained within a feasible scale. We refer this method to as FPE-tree in the numerical experiments.
Once we obtained the estimateq k (t), the last step is to solve the Fokker-Planck equation ρ (t) = ρ(t)Q(t) numerically. There are two straightforward methods to compute ρ(t): the Runge-Kutta method which can handle time varying Q(t) and very large K (with computation complexity O(K)) but needs to proceed the computation starting from t = 0; and direct computation of ρ(t) = ρ(0)e t 0 Q(s)ds with bidiagonal matrix Q. In particular, if Q is constant, then the computation ρ(t) = ρ(0)e tQ is very fast using matrix exponential [20,24,31] and can be directly done for any specific t > 0 rather than from t = 0.
The steps for influence prediction using Fokker-Planck equation (4) is summarized in Algorithm 1. For completeness we include the self-activation and recovery rates. (9) and form matrices Q(t), R(t) as in (5)-(6).

Error estimate for influence prediction
In this section, we conduct error analysis of the proposed influence prediction method. For simplicity, we consider the case without recovery scenario, and assume that the propagation starts with self-activation, i.e., ρ(0) = (1, 0, . . . , 0) ∈ R K+1 , since derivations generalize to other initials trivially. It is worth noting that the results obtained in this section apply to any propagation model.

Experimental results
We first apply the proposed method to networks (with various sizes and parameters) generated by four models commonly used in social/biological/contact networking applications: Erdős-Rényi's random, smallworld, scale-free, and Kronecker network 2 . Activation rates {α ij } are drawn from interval (0, 1) uniformly to simulate the inhomogeneous propagation rates across edges. Unless otherwise noted, we only consider node-to-node activations in propagations without self-activation and recovery. In all cases except those in Fig. 1, exact solutions for influence are computationally infeasible due to the large size and heterogeneous transmission rates between nodes, we therefore use enough Monte Carlo Markov chain (MCMC) simulated cascades (5000 cascades for each network) to compute the ground truth density ρ(t) and influence µ(t).
In Fig. 1, we show the performance of our method based on Fokker-Planck equation in Section 2.2 using FPE-dist and FPE-tree. The NIMFA (N-interwined mean field approximation) is a state-of-the-art method that uses mean-field theory to obtain a system of differential equations to calculate the probability p i (t) (node i gets activated at time t) [27,28], and estimates the influence by i p i (t). Note that we take a completely  different approach to calculate the probability ρ k (t) for each possible influence size k and estimate the influence by k kρ k (t). For the influence prediction test, we find that our approach appears to be more accurate as shown in Fig. 1, especially FPE-tree which matches ground truth (MCMC) very closely (but at the expense of higher computational cost to estimate transition rates q k (t)). The FPE-dist also provides reasonably accurate solution but requires much lower computational cost, hence we only use this version in other tests with large networks. Note that NIMFA requires solving a nonlinear system of K differential equations numerically and hence has the same order of computation complexity as our approach. In Fig. 2, we show the influence prediction result on networks of much larger size K = 1024. Despite of very different network structures, FPE-dist provides faithful influence prediction and matches ground truth (MCMC) closely.
Influence prediction problem is considered very challenging computationally, especially for dense networks. In Fig. 3 we test FPE-dist on very dense Erdős-Rényi's random networks of size K = 1024 where average degrees are (1/K) i |N out i | = 32, 64, and 128 respectively. On all of these networks, FPE-dist returns highly accurate prediction of influence which justifies its robustness.
The influence prediction problem considered in this paper, as noted in Section 1.1, is significantly different from those for dynamical processes on on networks in statistical physics. Our network is deterministically heterogenous, meaning that G = (V, E) and α ij on all edges are given, and they play critical roles in propagations. Therefore, the identities of nodes in source set S matter greatly (in contrary the nodes in a network are not distinguishable in most statistical physics problems) which leads to many important followup questions such as influence maximization (e.g., finding the source set S that solves max |S|≤k0 µ(t; S) for some prescribed size k 0 ∈ N and time t) [6,12,13,29] and outbreak detection [7,16]. To see the critical role of source set S, we apply FPE-dist to three different choices of source set S 1 , S 2 , S 3 all with |S i | = 10 and show the prediction results in the middle panel of Fig. 3. Here S 1 is the choice obtained by the influence maximization function from ConTinEst code [10], S 2 consists of the ten nodes with largest degrees in G, and S 3 contains ten nodes randomly chosen from the network. The plots clearly show different influences of these sources sets S i 's due to the deterministically heterogeneous structure of the network. Nevertheless, FPE-dist has very robust performance and matches the ground truths (MCMC) closely in every case.
We also compare FPE-dist to the state-of-the-arts learning-based ConTinEst algorithm [10]. The network data and its implementation are obtained from the ConTinEst package published by its authors 3 . ConTinEst is a state-of-the-arts learning-based algorithm that uses parametrized kernel functions to approximate the coverage of each node based on Monte Carlo samplings. The result is shown in the right panel of Fig. 3. From this test, we see that FPE-dist is very accurate as it matches the ground truth (MCMC) much better. Moreover, ConTinEst takes excessively long time to estimate influence for denser networks as those in the left panel of Fig. 3, while FPE-dist still works robustly without suffering the issue at all. Note that comprehensive comparison of ConTinEst with several other existing methods is reported in [10], from which significant improvement in accuracy of the proposed method FPE-dist can be projected.
We established the relation between estimation error in {q k (t)} and the prediction error in µ(t) in Section 2.3. To check this numerically, we apply FPE-dist to a dense Erdős-Rényi's network of size K = 300 and average degree (1/K) i |N out i | = 150 (α ij again drawn from (0, 1) uniformly) with source set S = {1, . . . , 10}, and check the estimated q k (t), ρ k (t) and µ(t) with those obtained by MCMC simulated cascades. Recall that FPE-dist uses very crude estimate of q k (t) by setting a constantq k = α(U * k ) where U * k contains the k nodes of shortest distance from S in Section 2.2. We first plot theq k for k = 10, 70, 130, 190 and compare with q k (t) given by ground truth (MCMC simulations) in the top row of Fig. 4. To this end, we observe from (4) that q k (t) = −( k j=0 ρ j (t)) /ρ k (t), so we obtain ρ k (t) from MCMC simulations and apply finite difference to get ρ k (t) and hence q k (t). Note that q k (t) = 0 for most t because k j=0 ρ (t) or ρ k (t) vanish there and obtaining these q k (t) is unstable numerically, so the comparison is only meaningful for t where ρ k (t) is away from zero. From top row of Fig. 4, we can see the estimatedq k appear to accurately captured the mean of q k (t) but can be quite deviated (i.e., with large |q k (t) − q k (t)|/q k (t)). However, the densitiesρ k (t) computed using theseq k are still close to the ground truth ρ k (t), as shown by the small relative error |ρ k (t) − ρ k (t)|/ρ k (t) in the bottom leftmost panel of Fig. 4. This also yields a small relative error in influence prediction |μ(t) − µ(t)|/µ(t) (second on bottom row), and close match of prediction result µ(t) and ground truth µ(t) (MCMC) (third on bottom row) in Fig. 4. The small errors inρ k (t) andμ(t) in our numerical tests suggest that the theoretical bound on the estimation error in q k (t) in (12) may be further relaxed without degrading solution quality.
To show the great potential of the proposed method for influence prediction on large sized networks, we plot the CPU time (in seconds) for solving the Fokker-Planck equation (4) numerically using MATLAB with single core computation on a regular desktop computer (Intel Core 3.4GHz CPU) in the bottom rightmost panel of Fig. 4. In contrast, most state-of-the-arts learning-based approaches suffer drastic increase of computational cost for larger or denser networks due to the significantly amplified number of simulations required to achieve acceptable level of accuracy [10]. On the other hand, the proposed method possesses low computation complexity and is scalable for large and dense networks. Bottom row from left to right: |ρ k (t) − ρ k (t)|/ρ k (t); |μ(t) − µ(t)|/µ(t) and plot of µ(t) andμ(t) for this K = 300 network; and CPU time (in seconds) of FPE-dist using Runge-Kutta 4th order ODE solver on networks with K range from 10 4 to 10 8 .

Concluding remarks
We consider the important influence (expected number of activated nodes) prediction problem on general heterogeneous networks. The problem is significantly different from those in classical mathematical epidemics theory where individual contact network is not considered nor those in statistical physics where networks are statistically homogeneous and nodes are not exactly distinguishable. In our problem, the influence depends on the following factors which all play critical roles in computations: the structure of network (directed graph) G = (V, E), the activation rates {α ij } between every pair of nodes i and j (and self-activation rates {β i } and recovery rates {γ i } if applicable), and the source set S. In this paper, we proposed a novel approach by calculating the probability ρ k (t) (k nodes are activated at time t) for all influence sizes k to obtain influence µ(t) = k kρ k (t). To this end, we establish the Fokker-Planck equation as a system of deterministic differential equations that governs the dynamical evolution of {ρ k (t)}. We provide a few instances for estimating the coefficients in the Fokker-Planck equations, and establish the relation between coefficient estimation error and the final influence prediction error, which apply to all types of propagation models on general networks. We conducted a number of numerical experiments which justify the very promising performance of the proposed approach in terms of accuracy, efficiency and robustness. Our novel approach also gives rise to a number of new research problems. For example: How to approximate the transition rates q k and r k accurately for general propagation models (e.g., activation time is not exponentially distributed and hence the propagation is not Markov)? How to apply the Fokker-Planck equation approach to influence prediction when only propagation cascade data is available (i.e., only the activation times and identities are observed during a number of propagations but not the actual network G = (V, E) and/or activation parameters in practice)? These problems are important from both of theoretical and practical points of view, and we plan to investigate them in our future research. rate of point process associated to time T Y is α T Y (t) = ( n i=1 p i α i e −αit )/( n i=1 p i e −αit ). In particular, α T Y (0) = n i=1 p i α i . Proof. We use the rule of total probability to obtain Hence the cumulative distribution function of T Y is F T Y (t) = 1 − Pr(T Y ≥ t) and probability density function is f T Y (t) = F T Y (t) = n i=1 p i α i e −αit . The instantaneous hazard rate is then given by α T Y (t) = f T Y (t)/ Pr(T Y ≥ t).