
-
Previous Article
Power weighted shortest paths for clustering Euclidean data
- FoDS Home
- This Issue
-
Next Article
Learning by active nonlinear diffusion
Modelling dynamic network evolution as a Pitman-Yor process
Department of Mathematics, Imperial College London, 180 Queen's Gate, London – SW7 2AZ, United Kingdom |
Dynamic interaction networks frequently arise in biology, communications technology and the social sciences, representing, for example, neuronal connectivity in the brain, internet connections between computers and human interactions within social networks. The evolution and strengthening of the links in such networks can be observed through sequences of connection events occurring between network nodes over time. In some of these applications, the identity and size of the network may be unknown a priori and may change over time. In this article, a model for the evolution of dynamic networks based on the Pitman-Yor process is proposed. This model explicitly admits power-laws in the number of connections on each edge, often present in real world networks, and, for careful choices of the parameters, power-laws for the degree distribution of the nodes. A novel empirical method for the estimation of the hyperparameters of the Pitman-Yor process is proposed, and some necessary corrections for uniform discrete base distributions are carefully addressed. The methodology is tested on synthetic data and in an anomaly detection study on the enterprise computer network of the Los Alamos National Laboratory, and successfully detects connections from a red-team penetration test.
References:
[1] |
D. Aldous, Exchangeability and related topics, in École d'Été de Probabilités de Saint-Flour XIII-1983, 1117 (1985), 1–198.
doi: 10.1007/BFb0099421. |
[2] |
A. L. Barabási, The origin of bursts and heavy tails in human dynamics, Nature, 435 (2005), 207-211. Google Scholar |
[3] |
Y. Bengio, R. Ducharme, P. Vincent and C. Janvin, A neural probabilistic language model, Journal of Machine Learning Research, 3 (2003), 1137-1155. Google Scholar |
[4] |
W. Buntine and M. Hutter, A Bayesian view of the Poisson-Dirichlet process, preprint, arXiv: 1007.0296. Google Scholar |
[5] |
C. Chen, L. Du and W. Buntine, Sampling table configurations for the hierarchical Poisson-Dirichlet process, in Machine Learning and Knowledge Discovery in Databases (eds. D. Gunopulos, T. Hofmann, D. Malerba and M. Vazirgiannis), Springer Berlin Heidelberg, 2011, 296–311.
doi: 10.1007/978-3-642-23780-5_29. |
[6] |
T. S. Ferguson,
A Bayesian analysis of some nonparametric problems, The Annals of Statistics, 1 (1973), 209-230.
doi: 10.1214/aos/1176342360. |
[7] |
R. A. Fisher, Statistical Methods for Research Workers, Fourteenth edition. Hafner Publishing Co., New York, 1973. |
[8] |
A. Goldenberg, A. X. Zheng, S. E. Fienberg and E. M. Airoldi, A survey of statistical network models, Foundations and Trends in Machine Learning, 2 (2009), 129-233. Google Scholar |
[9] |
S. Goldwater, T. L. Griffiths and M. Johnson, Interpolating between types and tokens by estimating power-law generators, in Proceedings of the 18th International Conference on Neural Information Processing Systems, MIT Press, 2005, 459–466. Google Scholar |
[10] |
N. A. Heard and P. Rubin-Delanchy, Network-wide anomaly detection via the Dirichlet process, in Proceedings of the IEEE workshop on Big Data Analytics for Cyber-security Computing, 2016. Google Scholar |
[11] |
N. A. Heard and P. Rubin-Delanchy,
Choosing between methods of combining $p$-values, Biometrika, 105 (2018), 239-246.
doi: 10.1093/biomet/asx076. |
[12] |
H. Ishwaran and L. F. James,
Gibbs sampling methods for stick-breaking priors, Journal of the American Statistical Association, 96 (2001), 161-173.
doi: 10.1198/016214501750332758. |
[13] |
D. Jurafsky, J. H. Martin, P. Norvig and S. Russell, Speech and Language Processing, Pearson Education, 2014. Google Scholar |
[14] |
A. D. Kent, Cybersecurity data sources for dynamic network research, in Dynamic Networks and Cyber-Security, World Scientific, 2016. Google Scholar |
[15] |
H. Lancaster, Statistical control of counting experiments, Biometrika, 39 (1952), 419-422. Google Scholar |
[16] |
Y. Lv and C. X. Zhai, Positional language models for information retrieval, in Proceedings of the 32Nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2009,299–306.
doi: 10.1145/1571941.1571994. |
[17] |
C. Matias and V. Miele,
Statistical clustering of temporal networks through a dynamic stochastic block model, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79 (2017), 1119-1141.
doi: 10.1111/rssb.12200. |
[18] |
T. Mikolov, M. Karafiát, L. Burget, J. Černocký and S. Khudanpur, Recurrent neural network based language model, in Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), International Speech Communication Association, 2010, 1045–1048. Google Scholar |
[19] |
M. E. J. Newman,
Power laws, Pareto distributions and Zipf's law, Contemporary Physics, 46 (2005), 323-351.
doi: 10.1080/00107510500052444. |
[20] |
K. Pearson, On a method of determining whether a sample of size $n$ supposed to have been drawn from a parent population having a known probability integral has probably been drawn at random, Biometrika, 25 (1933), 379-410. Google Scholar |
[21] |
P. O. Perry and P. J. Wolfe,
Point process modelling for directed interaction networks, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75 (2013), 821-849.
doi: 10.1111/rssb.12013. |
[22] |
J. Pitman, Combinatorial Stochastic Processes, Lecture Notes in Mathematics, 1875. Springer-Verlag, Berlin, 2006. |
[23] |
J. Pitman and M. Yor,
The two-parameter Poisson-Dirichlet distribution derived from a stable sub-ordinator, Annals of Probability, 25 (1997), 855-900.
doi: 10.1214/aop/1024404422. |
[24] |
M. Price-Williams and N. A. Heard, Nonparametric self-exciting models for computer network traffic, Statistics and Computing, 2019, 1–12.
doi: 10.1007/s11222-019-09875-z. |
[25] |
R. Rosenfeld,
A maximum entropy approach to adaptive statistical language modelling, Computer Speech & Language, 10 (1996), 187-228.
doi: 10.1006/csla.1996.0011. |
[26] |
P. Rubin-Delanchy, N. A. Heard and D. J. Lawson, Meta analysis of mid-$p$-values: some new results based on the convex order, Journal of the American Statistical Association, 2018.
doi: 10.1080/01621459.2018.1469994. |
[27] |
B. W. Silverman, Density Estimation, London: Chapman and Hall, 1986. |
[28] |
S. A. Stouffer, E. A. Suchman, L. C. DeVinney, S. A. Star and R. M. Williams, The American Soldier. Adjustment During Army Life, Princeton, New Jersey: Princeton University Press, 1949. Google Scholar |
[29] |
W. Y. Teh, A hierarchical Bayesian language model based on Pitman-Yor processes, in Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association of Computational Linguistics, 2006,985–992.
doi: 10.3115/1220175.1220299. |
[30] |
L. H. C. Tippett, The Methods of Statistics, 4th ed. John Wiley & Sons, Inc., New York, N. Y.; Williams & Norgate, Ltd., London, 1952. |
[31] |
H. M. Wallach, S. T. Jensen, L. Dicker and K. A. Heller, An alternative prior process for nonparametric Bayesian clustering, in Proceedings of the Thireenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010), 2010,892–899. Google Scholar |
show all references
References:
[1] |
D. Aldous, Exchangeability and related topics, in École d'Été de Probabilités de Saint-Flour XIII-1983, 1117 (1985), 1–198.
doi: 10.1007/BFb0099421. |
[2] |
A. L. Barabási, The origin of bursts and heavy tails in human dynamics, Nature, 435 (2005), 207-211. Google Scholar |
[3] |
Y. Bengio, R. Ducharme, P. Vincent and C. Janvin, A neural probabilistic language model, Journal of Machine Learning Research, 3 (2003), 1137-1155. Google Scholar |
[4] |
W. Buntine and M. Hutter, A Bayesian view of the Poisson-Dirichlet process, preprint, arXiv: 1007.0296. Google Scholar |
[5] |
C. Chen, L. Du and W. Buntine, Sampling table configurations for the hierarchical Poisson-Dirichlet process, in Machine Learning and Knowledge Discovery in Databases (eds. D. Gunopulos, T. Hofmann, D. Malerba and M. Vazirgiannis), Springer Berlin Heidelberg, 2011, 296–311.
doi: 10.1007/978-3-642-23780-5_29. |
[6] |
T. S. Ferguson,
A Bayesian analysis of some nonparametric problems, The Annals of Statistics, 1 (1973), 209-230.
doi: 10.1214/aos/1176342360. |
[7] |
R. A. Fisher, Statistical Methods for Research Workers, Fourteenth edition. Hafner Publishing Co., New York, 1973. |
[8] |
A. Goldenberg, A. X. Zheng, S. E. Fienberg and E. M. Airoldi, A survey of statistical network models, Foundations and Trends in Machine Learning, 2 (2009), 129-233. Google Scholar |
[9] |
S. Goldwater, T. L. Griffiths and M. Johnson, Interpolating between types and tokens by estimating power-law generators, in Proceedings of the 18th International Conference on Neural Information Processing Systems, MIT Press, 2005, 459–466. Google Scholar |
[10] |
N. A. Heard and P. Rubin-Delanchy, Network-wide anomaly detection via the Dirichlet process, in Proceedings of the IEEE workshop on Big Data Analytics for Cyber-security Computing, 2016. Google Scholar |
[11] |
N. A. Heard and P. Rubin-Delanchy,
Choosing between methods of combining $p$-values, Biometrika, 105 (2018), 239-246.
doi: 10.1093/biomet/asx076. |
[12] |
H. Ishwaran and L. F. James,
Gibbs sampling methods for stick-breaking priors, Journal of the American Statistical Association, 96 (2001), 161-173.
doi: 10.1198/016214501750332758. |
[13] |
D. Jurafsky, J. H. Martin, P. Norvig and S. Russell, Speech and Language Processing, Pearson Education, 2014. Google Scholar |
[14] |
A. D. Kent, Cybersecurity data sources for dynamic network research, in Dynamic Networks and Cyber-Security, World Scientific, 2016. Google Scholar |
[15] |
H. Lancaster, Statistical control of counting experiments, Biometrika, 39 (1952), 419-422. Google Scholar |
[16] |
Y. Lv and C. X. Zhai, Positional language models for information retrieval, in Proceedings of the 32Nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2009,299–306.
doi: 10.1145/1571941.1571994. |
[17] |
C. Matias and V. Miele,
Statistical clustering of temporal networks through a dynamic stochastic block model, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79 (2017), 1119-1141.
doi: 10.1111/rssb.12200. |
[18] |
T. Mikolov, M. Karafiát, L. Burget, J. Černocký and S. Khudanpur, Recurrent neural network based language model, in Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), International Speech Communication Association, 2010, 1045–1048. Google Scholar |
[19] |
M. E. J. Newman,
Power laws, Pareto distributions and Zipf's law, Contemporary Physics, 46 (2005), 323-351.
doi: 10.1080/00107510500052444. |
[20] |
K. Pearson, On a method of determining whether a sample of size $n$ supposed to have been drawn from a parent population having a known probability integral has probably been drawn at random, Biometrika, 25 (1933), 379-410. Google Scholar |
[21] |
P. O. Perry and P. J. Wolfe,
Point process modelling for directed interaction networks, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75 (2013), 821-849.
doi: 10.1111/rssb.12013. |
[22] |
J. Pitman, Combinatorial Stochastic Processes, Lecture Notes in Mathematics, 1875. Springer-Verlag, Berlin, 2006. |
[23] |
J. Pitman and M. Yor,
The two-parameter Poisson-Dirichlet distribution derived from a stable sub-ordinator, Annals of Probability, 25 (1997), 855-900.
doi: 10.1214/aop/1024404422. |
[24] |
M. Price-Williams and N. A. Heard, Nonparametric self-exciting models for computer network traffic, Statistics and Computing, 2019, 1–12.
doi: 10.1007/s11222-019-09875-z. |
[25] |
R. Rosenfeld,
A maximum entropy approach to adaptive statistical language modelling, Computer Speech & Language, 10 (1996), 187-228.
doi: 10.1006/csla.1996.0011. |
[26] |
P. Rubin-Delanchy, N. A. Heard and D. J. Lawson, Meta analysis of mid-$p$-values: some new results based on the convex order, Journal of the American Statistical Association, 2018.
doi: 10.1080/01621459.2018.1469994. |
[27] |
B. W. Silverman, Density Estimation, London: Chapman and Hall, 1986. |
[28] |
S. A. Stouffer, E. A. Suchman, L. C. DeVinney, S. A. Star and R. M. Williams, The American Soldier. Adjustment During Army Life, Princeton, New Jersey: Princeton University Press, 1949. Google Scholar |
[29] |
W. Y. Teh, A hierarchical Bayesian language model based on Pitman-Yor processes, in Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association of Computational Linguistics, 2006,985–992.
doi: 10.3115/1220175.1220299. |
[30] |
L. H. C. Tippett, The Methods of Statistics, 4th ed. John Wiley & Sons, Inc., New York, N. Y.; Williams & Norgate, Ltd., London, 1952. |
[31] |
H. M. Wallach, S. T. Jensen, L. Dicker and K. A. Heller, An alternative prior process for nonparametric Bayesian clustering, in Proceedings of the Thireenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010), 2010,892–899. Google Scholar |





Destination |
|
||||||
Destination |
|
||||||
Events |
Events |
||||||
Standard |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
Events |
Event level combiner: Tippett. | ||||||
Mid |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
Event level combiner: Fisher. | Event level combiner: Fisher. | ||||||
Standard |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
Events |
Events |
||||||
Standard |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
Events |
Event level combiner: Tippett. | ||||||
Mid |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
Event level combiner: Fisher. | Event level combiner: Fisher. | ||||||
Standard |
Mid- |
||||||
Edge level combiner: | Node level combiner: | Node level combiner: | |||||
Tippett | Fisher | Pearson | Stouffer | Fisher | Pearson | Stouffer | |
Source | |||||||
computer | |||||||
[1] |
Xianchao Xiu, Ying Yang, Wanquan Liu, Lingchen Kong, Meijuan Shang. An improved total variation regularized RPCA for moving object detection with dynamic background. Journal of Industrial & Management Optimization, 2020, 16 (4) : 1685-1698. doi: 10.3934/jimo.2019024 |
[2] |
Sarra Delladji, Mohammed Belloufi, Badreddine Sellami. Behavior of the combination of PRP and HZ methods for unconstrained optimization. Numerical Algebra, Control & Optimization, 2021, 11 (3) : 377-389. doi: 10.3934/naco.2020032 |
[3] |
Jan Rychtář, Dewey T. Taylor. Moran process and Wright-Fisher process favor low variability. Discrete & Continuous Dynamical Systems - B, 2021, 26 (7) : 3491-3504. doi: 10.3934/dcdsb.2020242 |
[4] |
Braxton Osting, Jérôme Darbon, Stanley Osher. Statistical ranking using the $l^{1}$-norm on graphs. Inverse Problems & Imaging, 2013, 7 (3) : 907-926. doi: 10.3934/ipi.2013.7.907 |
[5] |
Palash Sarkar, Subhadip Singha. Verifying solutions to LWE with implications for concrete security. Advances in Mathematics of Communications, 2021, 15 (2) : 257-266. doi: 10.3934/amc.2020057 |
[6] |
Roberto Civino, Riccardo Longo. Formal security proof for a scheme on a topological network. Advances in Mathematics of Communications, 2021 doi: 10.3934/amc.2021009 |
[7] |
Juan Manuel Pastor, Javier García-Algarra, José M. Iriondo, José J. Ramasco, Javier Galeano. Dragging in mutualistic networks. Networks & Heterogeneous Media, 2015, 10 (1) : 37-52. doi: 10.3934/nhm.2015.10.37 |
[8] |
Gheorghe Craciun, Jiaxin Jin, Polly Y. Yu. Single-target networks. Discrete & Continuous Dynamical Systems - B, 2021 doi: 10.3934/dcdsb.2021065 |
[9] |
Chris Guiver, Nathan Poppelreiter, Richard Rebarber, Brigitte Tenhumberg, Stuart Townley. Dynamic observers for unknown populations. Discrete & Continuous Dynamical Systems - B, 2021, 26 (6) : 3279-3302. doi: 10.3934/dcdsb.2020232 |
[10] |
Alexander Tolstonogov. BV solutions of a convex sweeping process with a composed perturbation. Evolution Equations & Control Theory, 2021 doi: 10.3934/eect.2021012 |
[11] |
Alexandr Mikhaylov, Victor Mikhaylov. Dynamic inverse problem for Jacobi matrices. Inverse Problems & Imaging, 2019, 13 (3) : 431-447. doi: 10.3934/ipi.2019021 |
[12] |
Mikhail Gilman, Semyon Tsynkov. Statistical characterization of scattering delay in synthetic aperture radar imaging. Inverse Problems & Imaging, 2020, 14 (3) : 511-533. doi: 10.3934/ipi.2020024 |
[13] |
Habib Ammari, Josselin Garnier, Vincent Jugnon. Detection, reconstruction, and characterization algorithms from noisy data in multistatic wave imaging. Discrete & Continuous Dynamical Systems - S, 2015, 8 (3) : 389-417. doi: 10.3934/dcdss.2015.8.389 |
[14] |
Suzete Maria Afonso, Vanessa Ramos, Jaqueline Siqueira. Equilibrium states for non-uniformly hyperbolic systems: Statistical properties and analyticity. Discrete & Continuous Dynamical Systems, 2021 doi: 10.3934/dcds.2021045 |
[15] |
Yongkun Wang, Fengshou He, Xiaobo Deng. Multi-aircraft cooperative path planning for maneuvering target detection. Journal of Industrial & Management Optimization, 2021 doi: 10.3934/jimo.2021050 |
[16] |
Palash Sarkar, Subhadip Singha. Classical reduction of gap SVP to LWE: A concrete security analysis. Advances in Mathematics of Communications, 2021 doi: 10.3934/amc.2021004 |
[17] |
Samira Shahsavari, Saeed Ketabchi. The proximal methods for solving absolute value equation. Numerical Algebra, Control & Optimization, 2021, 11 (3) : 449-460. doi: 10.3934/naco.2020037 |
[18] |
Alessandro Gondolo, Fernando Guevara Vasquez. Characterization and synthesis of Rayleigh damped elastodynamic networks. Networks & Heterogeneous Media, 2014, 9 (2) : 299-314. doi: 10.3934/nhm.2014.9.299 |
[19] |
Juan Manuel Pastor, Javier García-Algarra, Javier Galeano, José María Iriondo, José J. Ramasco. A simple and bounded model of population dynamics for mutualistic networks. Networks & Heterogeneous Media, 2015, 10 (1) : 53-70. doi: 10.3934/nhm.2015.10.53 |
[20] |
Simone Cacace, Maurizio Falcone. A dynamic domain decomposition for the eikonal-diffusion equation. Discrete & Continuous Dynamical Systems - S, 2016, 9 (1) : 109-123. doi: 10.3934/dcdss.2016.9.109 |
Impact Factor:
Tools
Metrics
Other articles
by authors
[Back to Top]