Destination |
|
||||||
Dynamic interaction networks frequently arise in biology, communications technology and the social sciences, representing, for example, neuronal connectivity in the brain, internet connections between computers and human interactions within social networks. The evolution and strengthening of the links in such networks can be observed through sequences of connection events occurring between network nodes over time. In some of these applications, the identity and size of the network may be unknown a priori and may change over time. In this article, a model for the evolution of dynamic networks based on the Pitman-Yor process is proposed. This model explicitly admits power-laws in the number of connections on each edge, often present in real world networks, and, for careful choices of the parameters, power-laws for the degree distribution of the nodes. A novel empirical method for the estimation of the hyperparameters of the Pitman-Yor process is proposed, and some necessary corrections for uniform discrete base distributions are carefully addressed. The methodology is tested on synthetic data and in an anomaly detection study on the enterprise computer network of the Los Alamos National Laboratory, and successfully detects connections from a red-team penetration test.
Citation: |
Figure 1. Cartoon example of the Chinese Restaurant metaphor for the Pitman-Yor process, where $ C_j $ represents the $ j $th customer and $ x_j^\star $ is the unique dish served at table $ T_j $. The vector $ \boldsymbol x_n = (x_1, \dots, x_n) $ denotes the observed sequence of dishes eaten by each customer. Customer $ C_i $ being seated at table $ T_j $ is denoted $ C_i\to T_j $. Then, for example, the tenth customer will sit at $ T_1 $ with probability $ (3-d)/(\alpha+9) $
Figure 5. Plot of the corrected $ K_n $ (a), $ H_{1n} $ (b) and their ratio $ H_{1n}/K_n $ (c) as a function of $ n $ for the connections to the destination computer ${\mathtt C1438} $ , and averaged sample paths from $ 100 $ samples from a Pitman-Yor process, obtained using different estimates of the parameters. The grey lines correspond to 10 realised trajectories of simulated Pitman-Yor processes, obtained using the corrected estimate (5)
Table 1. Estimated Pitman-Yor parameters for 6 destination nodes
Destination |
|
||||||
Table 2. Anomaly rankings for the four red-team source computers
Events |
Events |
||||||
Standard |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
Events |
Event level combiner: Tippett. | ||||||
Mid |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
Event level combiner: Fisher. | Event level combiner: Fisher. | ||||||
Standard |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
[1] | D. Aldous, Exchangeability and related topics, in École d'Été de Probabilités de Saint-Flour XIII-1983, 1117 (1985), 1–198. doi: 10.1007/BFb0099421. |
[2] | A. L. Barabási, The origin of bursts and heavy tails in human dynamics, Nature, 435 (2005), 207-211. |
[3] | Y. Bengio, R. Ducharme, P. Vincent and C. Janvin, A neural probabilistic language model, Journal of Machine Learning Research, 3 (2003), 1137-1155. |
[4] | W. Buntine and M. Hutter, A Bayesian view of the Poisson-Dirichlet process, preprint, arXiv: 1007.0296. |
[5] | C. Chen, L. Du and W. Buntine, Sampling table configurations for the hierarchical Poisson-Dirichlet process, in Machine Learning and Knowledge Discovery in Databases (eds. D. Gunopulos, T. Hofmann, D. Malerba and M. Vazirgiannis), Springer Berlin Heidelberg, 2011, 296–311. doi: 10.1007/978-3-642-23780-5_29. |
[6] | T. S. Ferguson, A Bayesian analysis of some nonparametric problems, The Annals of Statistics, 1 (1973), 209-230. doi: 10.1214/aos/1176342360. |
[7] | R. A. Fisher, Statistical Methods for Research Workers, Fourteenth edition. Hafner Publishing Co., New York, 1973. |
[8] | A. Goldenberg, A. X. Zheng, S. E. Fienberg and E. M. Airoldi, A survey of statistical network models, Foundations and Trends in Machine Learning, 2 (2009), 129-233. |
[9] | S. Goldwater, T. L. Griffiths and M. Johnson, Interpolating between types and tokens by estimating power-law generators, in Proceedings of the 18th International Conference on Neural Information Processing Systems, MIT Press, 2005, 459–466. |
[10] | N. A. Heard and P. Rubin-Delanchy, Network-wide anomaly detection via the Dirichlet process, in Proceedings of the IEEE workshop on Big Data Analytics for Cyber-security Computing, 2016. |
[11] | N. A. Heard and P. Rubin-Delanchy, Choosing between methods of combining $p$-values, Biometrika, 105 (2018), 239-246. doi: 10.1093/biomet/asx076. |
[12] | H. Ishwaran and L. F. James, Gibbs sampling methods for stick-breaking priors, Journal of the American Statistical Association, 96 (2001), 161-173. doi: 10.1198/016214501750332758. |
[13] | D. Jurafsky, J. H. Martin, P. Norvig and S. Russell, Speech and Language Processing, Pearson Education, 2014. |
[14] | A. D. Kent, Cybersecurity data sources for dynamic network research, in Dynamic Networks and Cyber-Security, World Scientific, 2016. |
[15] | H. Lancaster, Statistical control of counting experiments, Biometrika, 39 (1952), 419-422. |
[16] | Y. Lv and C. X. Zhai, Positional language models for information retrieval, in Proceedings of the 32Nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2009,299–306. doi: 10.1145/1571941.1571994. |
[17] | C. Matias and V. Miele, Statistical clustering of temporal networks through a dynamic stochastic block model, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79 (2017), 1119-1141. doi: 10.1111/rssb.12200. |
[18] | T. Mikolov, M. Karafiát, L. Burget, J. Černocký and S. Khudanpur, Recurrent neural network based language model, in Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), International Speech Communication Association, 2010, 1045–1048. |
[19] | M. E. J. Newman, Power laws, Pareto distributions and Zipf's law, Contemporary Physics, 46 (2005), 323-351. doi: 10.1080/00107510500052444. |
[20] | K. Pearson, On a method of determining whether a sample of size $n$ supposed to have been drawn from a parent population having a known probability integral has probably been drawn at random, Biometrika, 25 (1933), 379-410. |
[21] | P. O. Perry and P. J. Wolfe, Point process modelling for directed interaction networks, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75 (2013), 821-849. doi: 10.1111/rssb.12013. |
[22] | J. Pitman, Combinatorial Stochastic Processes, Lecture Notes in Mathematics, 1875. Springer-Verlag, Berlin, 2006. |
[23] | J. Pitman and M. Yor, The two-parameter Poisson-Dirichlet distribution derived from a stable sub-ordinator, Annals of Probability, 25 (1997), 855-900. doi: 10.1214/aop/1024404422. |
[24] | M. Price-Williams and N. A. Heard, Nonparametric self-exciting models for computer network traffic, Statistics and Computing, 2019, 1–12. doi: 10.1007/s11222-019-09875-z. |
[25] | R. Rosenfeld, A maximum entropy approach to adaptive statistical language modelling, Computer Speech & Language, 10 (1996), 187-228. doi: 10.1006/csla.1996.0011. |
[26] | P. Rubin-Delanchy, N. A. Heard and D. J. Lawson, Meta analysis of mid-$p$-values: some new results based on the convex order, Journal of the American Statistical Association, 2018. doi: 10.1080/01621459.2018.1469994. |
[27] | B. W. Silverman, Density Estimation, London: Chapman and Hall, 1986. |
[28] | S. A. Stouffer, E. A. Suchman, L. C. DeVinney, S. A. Star and R. M. Williams, The American Soldier. Adjustment During Army Life, Princeton, New Jersey: Princeton University Press, 1949. |
[29] | W. Y. Teh, A hierarchical Bayesian language model based on Pitman-Yor processes, in Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association of Computational Linguistics, 2006,985–992. doi: 10.3115/1220175.1220299. |
[30] | L. H. C. Tippett, The Methods of Statistics, 4th ed. John Wiley & Sons, Inc., New York, N. Y.; Williams & Norgate, Ltd., London, 1952. |
[31] | H. M. Wallach, S. T. Jensen, L. Dicker and K. A. Heller, An alternative prior process for nonparametric Bayesian clustering, in Proceedings of the Thireenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010), 2010,892–899. |
Cartoon example of the Chinese Restaurant metaphor for the Pitman-Yor process, where
Kernel density estimates of the parameter estimates from 10000 simulations,
Kernel density estimates of the parameter estimates from 10000 simulations,
Uniform Q-Q plot for the Pitman-Yor process fitted to six destination nodes
Plot of the corrected