
-
Previous Article
Power weighted shortest paths for clustering Euclidean data
- FoDS Home
- This Issue
-
Next Article
Learning by active nonlinear diffusion
Modelling dynamic network evolution as a Pitman-Yor process
Department of Mathematics, Imperial College London, 180 Queen's Gate, London – SW7 2AZ, United Kingdom |
Dynamic interaction networks frequently arise in biology, communications technology and the social sciences, representing, for example, neuronal connectivity in the brain, internet connections between computers and human interactions within social networks. The evolution and strengthening of the links in such networks can be observed through sequences of connection events occurring between network nodes over time. In some of these applications, the identity and size of the network may be unknown a priori and may change over time. In this article, a model for the evolution of dynamic networks based on the Pitman-Yor process is proposed. This model explicitly admits power-laws in the number of connections on each edge, often present in real world networks, and, for careful choices of the parameters, power-laws for the degree distribution of the nodes. A novel empirical method for the estimation of the hyperparameters of the Pitman-Yor process is proposed, and some necessary corrections for uniform discrete base distributions are carefully addressed. The methodology is tested on synthetic data and in an anomaly detection study on the enterprise computer network of the Los Alamos National Laboratory, and successfully detects connections from a red-team penetration test.
References:
[1] |
D. Aldous, Exchangeability and related topics, in École d'Été de Probabilités de Saint-Flour XIII-1983, 1117 (1985), 1–198.
doi: 10.1007/BFb0099421. |
[2] |
A. L. Barabási,
The origin of bursts and heavy tails in human dynamics, Nature, 435 (2005), 207-211.
|
[3] |
Y. Bengio, R. Ducharme, P. Vincent and C. Janvin,
A neural probabilistic language model, Journal of Machine Learning Research, 3 (2003), 1137-1155.
|
[4] |
W. Buntine and M. Hutter, A Bayesian view of the Poisson-Dirichlet process, preprint, arXiv: 1007.0296. |
[5] |
C. Chen, L. Du and W. Buntine, Sampling table configurations for the hierarchical Poisson-Dirichlet process, in Machine Learning and Knowledge Discovery in Databases (eds. D. Gunopulos, T. Hofmann, D. Malerba and M. Vazirgiannis), Springer Berlin Heidelberg, 2011, 296–311.
doi: 10.1007/978-3-642-23780-5_29. |
[6] |
T. S. Ferguson,
A Bayesian analysis of some nonparametric problems, The Annals of Statistics, 1 (1973), 209-230.
doi: 10.1214/aos/1176342360. |
[7] |
R. A. Fisher, Statistical Methods for Research Workers, Fourteenth edition. Hafner Publishing Co., New York, 1973. |
[8] |
A. Goldenberg, A. X. Zheng, S. E. Fienberg and E. M. Airoldi,
A survey of statistical network models, Foundations and Trends in Machine Learning, 2 (2009), 129-233.
|
[9] |
S. Goldwater, T. L. Griffiths and M. Johnson, Interpolating between types and tokens by
estimating power-law generators, in Proceedings of the 18th International Conference on
Neural Information Processing Systems, MIT Press, 2005, 459–466. |
[10] |
N. A. Heard and P. Rubin-Delanchy, Network-wide anomaly detection via the Dirichlet process, in Proceedings of the IEEE workshop on Big Data Analytics for Cyber-security Computing, 2016. |
[11] |
N. A. Heard and P. Rubin-Delanchy,
Choosing between methods of combining $p$-values, Biometrika, 105 (2018), 239-246.
doi: 10.1093/biomet/asx076. |
[12] |
H. Ishwaran and L. F. James,
Gibbs sampling methods for stick-breaking priors, Journal of the American Statistical Association, 96 (2001), 161-173.
doi: 10.1198/016214501750332758. |
[13] |
D. Jurafsky, J. H. Martin, P. Norvig and S. Russell, Speech and Language Processing, Pearson Education, 2014. |
[14] |
A. D. Kent, Cybersecurity data sources for dynamic network research, in Dynamic Networks and Cyber-Security, World Scientific, 2016. |
[15] |
H. Lancaster,
Statistical control of counting experiments, Biometrika, 39 (1952), 419-422.
|
[16] |
Y. Lv and C. X. Zhai, Positional language models for information retrieval, in Proceedings of the 32Nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2009,299–306.
doi: 10.1145/1571941.1571994. |
[17] |
C. Matias and V. Miele,
Statistical clustering of temporal networks through a dynamic stochastic block model, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79 (2017), 1119-1141.
doi: 10.1111/rssb.12200. |
[18] |
T. Mikolov, M. Karafiát, L. Burget, J. Černocký and S. Khudanpur, Recurrent neural network based language model, in Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), International Speech Communication Association, 2010, 1045–1048. |
[19] |
M. E. J. Newman,
Power laws, Pareto distributions and Zipf's law, Contemporary Physics, 46 (2005), 323-351.
doi: 10.1080/00107510500052444. |
[20] |
K. Pearson,
On a method of determining whether a sample of size $n$ supposed to have been drawn from a parent population having a known probability integral has probably been drawn at random, Biometrika, 25 (1933), 379-410.
|
[21] |
P. O. Perry and P. J. Wolfe,
Point process modelling for directed interaction networks, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75 (2013), 821-849.
doi: 10.1111/rssb.12013. |
[22] |
J. Pitman, Combinatorial Stochastic Processes, Lecture Notes in Mathematics, 1875. Springer-Verlag, Berlin, 2006. |
[23] |
J. Pitman and M. Yor,
The two-parameter Poisson-Dirichlet distribution derived from a stable sub-ordinator, Annals of Probability, 25 (1997), 855-900.
doi: 10.1214/aop/1024404422. |
[24] |
M. Price-Williams and N. A. Heard, Nonparametric self-exciting models for computer network traffic, Statistics and Computing, 2019, 1–12.
doi: 10.1007/s11222-019-09875-z. |
[25] |
R. Rosenfeld,
A maximum entropy approach to adaptive statistical language modelling, Computer Speech & Language, 10 (1996), 187-228.
doi: 10.1006/csla.1996.0011. |
[26] |
P. Rubin-Delanchy, N. A. Heard and D. J. Lawson, Meta analysis of mid-$p$-values: some new results based on the convex order, Journal of the American Statistical Association, 2018.
doi: 10.1080/01621459.2018.1469994. |
[27] |
B. W. Silverman, Density Estimation, London: Chapman and Hall, 1986. |
[28] |
S. A. Stouffer, E. A. Suchman, L. C. DeVinney, S. A. Star and R. M. Williams, The American
Soldier. Adjustment During Army Life, Princeton, New Jersey: Princeton University Press,
1949. |
[29] |
W. Y. Teh, A hierarchical Bayesian language model based on Pitman-Yor processes, in Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association of Computational Linguistics, 2006,985–992.
doi: 10.3115/1220175.1220299. |
[30] |
L. H. C. Tippett, The Methods of Statistics, 4th ed. John Wiley & Sons, Inc., New York, N. Y.; Williams & Norgate, Ltd., London, 1952. |
[31] |
H. M. Wallach, S. T. Jensen, L. Dicker and K. A. Heller, An alternative prior process for nonparametric Bayesian clustering, in Proceedings of the Thireenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010), 2010,892–899. |
show all references
References:
[1] |
D. Aldous, Exchangeability and related topics, in École d'Été de Probabilités de Saint-Flour XIII-1983, 1117 (1985), 1–198.
doi: 10.1007/BFb0099421. |
[2] |
A. L. Barabási,
The origin of bursts and heavy tails in human dynamics, Nature, 435 (2005), 207-211.
|
[3] |
Y. Bengio, R. Ducharme, P. Vincent and C. Janvin,
A neural probabilistic language model, Journal of Machine Learning Research, 3 (2003), 1137-1155.
|
[4] |
W. Buntine and M. Hutter, A Bayesian view of the Poisson-Dirichlet process, preprint, arXiv: 1007.0296. |
[5] |
C. Chen, L. Du and W. Buntine, Sampling table configurations for the hierarchical Poisson-Dirichlet process, in Machine Learning and Knowledge Discovery in Databases (eds. D. Gunopulos, T. Hofmann, D. Malerba and M. Vazirgiannis), Springer Berlin Heidelberg, 2011, 296–311.
doi: 10.1007/978-3-642-23780-5_29. |
[6] |
T. S. Ferguson,
A Bayesian analysis of some nonparametric problems, The Annals of Statistics, 1 (1973), 209-230.
doi: 10.1214/aos/1176342360. |
[7] |
R. A. Fisher, Statistical Methods for Research Workers, Fourteenth edition. Hafner Publishing Co., New York, 1973. |
[8] |
A. Goldenberg, A. X. Zheng, S. E. Fienberg and E. M. Airoldi,
A survey of statistical network models, Foundations and Trends in Machine Learning, 2 (2009), 129-233.
|
[9] |
S. Goldwater, T. L. Griffiths and M. Johnson, Interpolating between types and tokens by
estimating power-law generators, in Proceedings of the 18th International Conference on
Neural Information Processing Systems, MIT Press, 2005, 459–466. |
[10] |
N. A. Heard and P. Rubin-Delanchy, Network-wide anomaly detection via the Dirichlet process, in Proceedings of the IEEE workshop on Big Data Analytics for Cyber-security Computing, 2016. |
[11] |
N. A. Heard and P. Rubin-Delanchy,
Choosing between methods of combining $p$-values, Biometrika, 105 (2018), 239-246.
doi: 10.1093/biomet/asx076. |
[12] |
H. Ishwaran and L. F. James,
Gibbs sampling methods for stick-breaking priors, Journal of the American Statistical Association, 96 (2001), 161-173.
doi: 10.1198/016214501750332758. |
[13] |
D. Jurafsky, J. H. Martin, P. Norvig and S. Russell, Speech and Language Processing, Pearson Education, 2014. |
[14] |
A. D. Kent, Cybersecurity data sources for dynamic network research, in Dynamic Networks and Cyber-Security, World Scientific, 2016. |
[15] |
H. Lancaster,
Statistical control of counting experiments, Biometrika, 39 (1952), 419-422.
|
[16] |
Y. Lv and C. X. Zhai, Positional language models for information retrieval, in Proceedings of the 32Nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2009,299–306.
doi: 10.1145/1571941.1571994. |
[17] |
C. Matias and V. Miele,
Statistical clustering of temporal networks through a dynamic stochastic block model, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79 (2017), 1119-1141.
doi: 10.1111/rssb.12200. |
[18] |
T. Mikolov, M. Karafiát, L. Burget, J. Černocký and S. Khudanpur, Recurrent neural network based language model, in Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), International Speech Communication Association, 2010, 1045–1048. |
[19] |
M. E. J. Newman,
Power laws, Pareto distributions and Zipf's law, Contemporary Physics, 46 (2005), 323-351.
doi: 10.1080/00107510500052444. |
[20] |
K. Pearson,
On a method of determining whether a sample of size $n$ supposed to have been drawn from a parent population having a known probability integral has probably been drawn at random, Biometrika, 25 (1933), 379-410.
|
[21] |
P. O. Perry and P. J. Wolfe,
Point process modelling for directed interaction networks, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75 (2013), 821-849.
doi: 10.1111/rssb.12013. |
[22] |
J. Pitman, Combinatorial Stochastic Processes, Lecture Notes in Mathematics, 1875. Springer-Verlag, Berlin, 2006. |
[23] |
J. Pitman and M. Yor,
The two-parameter Poisson-Dirichlet distribution derived from a stable sub-ordinator, Annals of Probability, 25 (1997), 855-900.
doi: 10.1214/aop/1024404422. |
[24] |
M. Price-Williams and N. A. Heard, Nonparametric self-exciting models for computer network traffic, Statistics and Computing, 2019, 1–12.
doi: 10.1007/s11222-019-09875-z. |
[25] |
R. Rosenfeld,
A maximum entropy approach to adaptive statistical language modelling, Computer Speech & Language, 10 (1996), 187-228.
doi: 10.1006/csla.1996.0011. |
[26] |
P. Rubin-Delanchy, N. A. Heard and D. J. Lawson, Meta analysis of mid-$p$-values: some new results based on the convex order, Journal of the American Statistical Association, 2018.
doi: 10.1080/01621459.2018.1469994. |
[27] |
B. W. Silverman, Density Estimation, London: Chapman and Hall, 1986. |
[28] |
S. A. Stouffer, E. A. Suchman, L. C. DeVinney, S. A. Star and R. M. Williams, The American
Soldier. Adjustment During Army Life, Princeton, New Jersey: Princeton University Press,
1949. |
[29] |
W. Y. Teh, A hierarchical Bayesian language model based on Pitman-Yor processes, in Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association of Computational Linguistics, 2006,985–992.
doi: 10.3115/1220175.1220299. |
[30] |
L. H. C. Tippett, The Methods of Statistics, 4th ed. John Wiley & Sons, Inc., New York, N. Y.; Williams & Norgate, Ltd., London, 1952. |
[31] |
H. M. Wallach, S. T. Jensen, L. Dicker and K. A. Heller, An alternative prior process for nonparametric Bayesian clustering, in Proceedings of the Thireenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010), 2010,892–899. |





Destination |
|
||||||
Destination |
|
||||||
Events |
Events |
||||||
Standard |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
Events |
Event level combiner: Tippett. | ||||||
Mid |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
Event level combiner: Fisher. | Event level combiner: Fisher. | ||||||
Standard |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
Events |
Events |
||||||
Standard |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
Events |
Event level combiner: Tippett. | ||||||
Mid |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
Event level combiner: Fisher. | Event level combiner: Fisher. | ||||||
Standard |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
[1] |
Nana Xu, Jun Sun, Jingjing Liu, Xianchao Xiu. A novel scheme for multivariate statistical fault detection with application to the Tennessee Eastman process. Mathematical Foundations of Computing, 2021, 4 (3) : 167-184. doi: 10.3934/mfc.2021010 |
[2] |
Edward S. Canepa, Alexandre M. Bayen, Christian G. Claudel. Spoofing cyber attack detection in probe-based traffic monitoring systems using mixed integer linear programming. Networks and Heterogeneous Media, 2013, 8 (3) : 783-802. doi: 10.3934/nhm.2013.8.783 |
[3] |
Vladimir Turetsky, Valery Y. Glizer. Optimal decision in a Statistical Process Control with cubic loss. Journal of Industrial and Management Optimization, 2022, 18 (4) : 2903-2926. doi: 10.3934/jimo.2021096 |
[4] |
Fabien Caubet, Carlos Conca, Matías Godoy. On the detection of several obstacles in 2D Stokes flow: Topological sensitivity and combination with shape derivatives. Inverse Problems and Imaging, 2016, 10 (2) : 327-367. doi: 10.3934/ipi.2016003 |
[5] |
Dominique Zosso, Jing An, James Stevick, Nicholas Takaki, Morgan Weiss, Liane S. Slaughter, Huan H. Cao, Paul S. Weiss, Andrea L. Bertozzi. Image segmentation with dynamic artifacts detection and bias correction. Inverse Problems and Imaging, 2017, 11 (3) : 577-600. doi: 10.3934/ipi.2017027 |
[6] |
Hirotada Honda. On a model of target detection in molecular communication networks. Networks and Heterogeneous Media, 2019, 14 (4) : 633-657. doi: 10.3934/nhm.2019025 |
[7] |
Meenakshi Kansal, Ratna Dutta, Sourav Mukhopadhyay. Group signature from lattices preserving forward security in dynamic setting. Advances in Mathematics of Communications, 2020, 14 (4) : 535-553. doi: 10.3934/amc.2020027 |
[8] |
Alessia Marigo, Benedetto Piccoli. A model for biological dynamic networks. Networks and Heterogeneous Media, 2011, 6 (4) : 647-663. doi: 10.3934/nhm.2011.6.647 |
[9] |
Alexis Arnaudon, So Takao. Networks of coadjoint orbits: From geometric to statistical mechanics. Journal of Geometric Mechanics, 2019, 11 (4) : 447-485. doi: 10.3934/jgm.2019023 |
[10] |
Valery Y. Glizer, Vladimir Turetsky, Emil Bashkansky. Statistical process control optimization with variable sampling interval and nonlinear expected loss. Journal of Industrial and Management Optimization, 2015, 11 (1) : 105-133. doi: 10.3934/jimo.2015.11.105 |
[11] |
Jérôme Renault. General limit value in dynamic programming. Journal of Dynamics and Games, 2014, 1 (3) : 471-484. doi: 10.3934/jdg.2014.1.471 |
[12] |
Xianchao Xiu, Ying Yang, Wanquan Liu, Lingchen Kong, Meijuan Shang. An improved total variation regularized RPCA for moving object detection with dynamic background. Journal of Industrial and Management Optimization, 2020, 16 (4) : 1685-1698. doi: 10.3934/jimo.2019024 |
[13] |
Manel Hmimida, Rushed Kanawati. Community detection in multiplex networks: A seed-centric approach. Networks and Heterogeneous Media, 2015, 10 (1) : 71-85. doi: 10.3934/nhm.2015.10.71 |
[14] |
Hongfei Yang, Xiaofeng Ding, Raymond Chan, Hui Hu, Yaxin Peng, Tieyong Zeng. A new initialization method based on normed statistical spaces in deep networks. Inverse Problems and Imaging, 2021, 15 (1) : 147-158. doi: 10.3934/ipi.2020045 |
[15] |
Michele La Rocca, Cira Perna. Designing neural networks for modeling biological data: A statistical perspective. Mathematical Biosciences & Engineering, 2014, 11 (2) : 331-342. doi: 10.3934/mbe.2014.11.331 |
[16] |
Stamatios Katsikas, Vassilli Kolokoltsov. Evolutionary, mean-field and pressure-resistance game modelling of networks security. Journal of Dynamics and Games, 2019, 6 (4) : 315-335. doi: 10.3934/jdg.2019021 |
[17] |
Genlong Guo, Shoude Li. A dynamic analysis of a monopolist's quality improvement, process innovation and goodwill. Journal of Industrial and Management Optimization, 2022 doi: 10.3934/jimo.2022014 |
[18] |
Joseph D. Skufca, Erik M. Bollt. Communication and Synchronization in Disconnected Networks with Dynamic Topology: Moving Neighborhood Networks. Mathematical Biosciences & Engineering, 2004, 1 (2) : 347-359. doi: 10.3934/mbe.2004.1.347 |
[19] |
M. B. Short, G. O. Mohler, P. J. Brantingham, G. E. Tita. Gang rivalry dynamics via coupled point process networks. Discrete and Continuous Dynamical Systems - B, 2014, 19 (5) : 1459-1477. doi: 10.3934/dcdsb.2014.19.1459 |
[20] |
Qian Zhang, Huaicheng Yan, Jun Cheng, Xisheng Zhan, Kaibo Shi. Fault detection filtering for continuous-time singular systems under a dynamic event-triggered mechanism. Discrete and Continuous Dynamical Systems - S, 2022 doi: 10.3934/dcdss.2022023 |
Impact Factor:
Tools
Metrics
Other articles
by authors
[Back to Top]