# American Institute of Mathematical Sciences

May  2019, 2(2): 127-147. doi: 10.3934/mfc.2019010

## Unsupervised robust nonparametric learning of hidden community properties

 Machine Learning and Optimization Laboratory, EPFL, Lausanne, CH-1015, Switzerland

* Corresponding author: Mikhail Langovoy

Received  April 2019 Published  June 2019

We consider learning of fundamental properties of communities in large noisy networks, in the prototypical situation where the nodes or users are split into two classes according to a binary property, e.g., according to their opinions or preferences on a topic. For learning these properties, we propose a nonparametric, unsupervised, and scalable graph scan procedure that is, in addition, robust against a class of powerful adversaries. In our setup, one of the communities can fall under the influence of a knowledgeable adversarial leader, who knows the full network structure, has unlimited computational resources and can completely foresee our planned actions on the network. We prove strong consistency of our results in this setup with minimal assumptions. In particular, the learning procedure estimates the baseline activity of normal users asymptotically correctly with probability 1; the only assumption being the existence of a single implicit community of asymptotically negligible logarithmic size. We provide experiments on real and synthetic data to illustrate the performance of our method, including examples with adversaries.

Citation: Mikhail Langovoy, Akhilesh Gotmare, Martin Jaggi. Unsupervised robust nonparametric learning of hidden community properties. Mathematical Foundations of Computing, 2019, 2 (2) : 127-147. doi: 10.3934/mfc.2019010
##### References:
 [1] F. M. Aarestrup and M. G. Koopmans, Sharing data for global infectious disease surveillance and outbreak detection, Trends in Microbiology, 24 (2016), 241-245.  doi: 10.1016/j.tim.2016.01.009. [2] L. A. Adamic and N. Glance, The political blogosphere and the 2004 us election: Divided they blog, in Proceedings of the 3rd International Workshop on Link Discovery, ACM, 2005, 36–43. [3] S. An, P. Peursum, W. Liu and S. Venkatesh, Efficient algorithms for subwindow search in object detection and localization, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, 264–271. [4] E. Arias-Castro and G. Grimmett, Cluster detection in networks using percolation, Bernoulli, 19 (2013), 676-719.  doi: 10.3150/11-BEJ412. [5] S. N. Bernstein, On certain modifications of chebyshev's inequality, Doklady Akademii Nauk SSSR, 17 (1937), 275-277. [6] F. Chen and D. B. Neill, Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs, in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, 1166–1175. doi: 10.1145/2623330.2623619. [7] V. Cheplygina, M. de Bruijne and J. P. Pluim, Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis, Medical Image Analysis, 54 (2019), 280-296.  doi: 10.1016/j.media.2019.03.009. [8] C. Deng, Y. Han and B. Zhao, High-performance visual tracking with extreme learning machine framework, IEEE Transactions on Cybernetics(Early Access), 2019, 1–12. doi: 10.1109/TCYB.2018.2886580. [9] D. Geman and B. Jedynak, An active testing model for tracking roads in satellite images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18 (1996), 1-14.  doi: 10.1109/34.476006. [10] G. Grimmett, Percolation, vol. 321 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 2nd edition, Springer-Verlag, Berlin, 1999. doi: 10.1007/978-3-662-03981-6. [11] M. Han and Y. Li, Influence analysis: A survey of the state-of-the-art, Mathematical Foundations of Computing, 1 (2018), 201-253.  doi: 10.3934/mfc.2018010. [12] M. Kulldorff, A spatial scan statistic, Communications in Statistics-Theory and Methods, 26 (1997), 1481-1496.  doi: 10.1080/03610929708831995. [13] M. Kulldorff, Spatial scan statistics: Models, calculations, and applications, in Scan Statistics and Applications, Springer, 1999, 303–322. doi: 10.1007/978-1-4612-1578-3_14. [14] M. Kulldorff, L. Huang, L. Pickle and L. Duczmal, An elliptic spatial scan statistic, Statistics in medicine, 25 (2006), 3929-3943.  doi: 10.1002/sim.2490. [15] M. Langovoy, M. Habeck and B. Schoelkopf, Adaptive nonparametric detection in cryo-electron microscopy, in Proceedings of the 58-th World Statistical Congress, Session: High Dimensional Data, 2011, 4456–4461. [16] M. Langovoy, M. Habeck and B. Schoelkopf, Spatial statistics, image analysis and percolation theory, in The Joint Statistical Meetings Proceedings, Time Series and Network Section, American Statistical Association, Alexandria, VA, 2011, 5571–5581. [17] M. Langovoy and O. Wittich, Detection of Objects in Noisy Images and Site Percolation on Square Lattices, EURANDOM Report No. 2009-035, EURANDOM, Eindhoven, 2009. [18] M. Langovoy and O. Wittich, Randomized algorithms for statistical image analysis and site percolation on square lattices, Statistica Neerlandica, 67 (2013), 337-353.  doi: 10.1111/stan.12010. [19] M. Langovoy and O. Wittich, Robust nonparametric detection of objects in noisy images, Journal of Nonparametric Statistics, 25 (2013), 409-426.  doi: 10.1080/10485252.2012.759570. [20] Y. Liu, B. Zhou, F. Chen and D. W. Cheung, Graph topic scan statistic for spatial event detection, in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 2016, 489–498. doi: 10.1145/2983323.2983744. [21] T. McInerney and D. Terzopoulos, Deformable models in medical image analysis: A survey, Medical Image Analysis, 1 (1996), 91-108.  doi: 10.1016/S1361-8415(96)80007-7. [22] D. B. Neill, A. W. Moore and G. F. Cooper, A bayesian spatial scan statistic, Advances in Neural Information Processing Systems (NIPS), 18 (2006), 1003. [23] G. P. Patil and C. Taillie, Upper level set scan statistic for detecting arbitrarily shaped hotspots, Environmental and Ecological statistics, 11 (2004), 183-197.  doi: 10.1023/B:EEST.0000027208.48919.7e. [24] L. D. Rotz and J. M. Hughes, Advances in detecting and responding to threats from bioterrorism and emerging infectious disease, Nature Medicine, 10 (2004), S130–S136. doi: 10.1038/nm1152. [25] Y. Ruan, D. Fuhry and S. Parthasarathy, Efficient community detection in large networks using content and links, in Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, 1089–1098. doi: 10.1145/2488388.2488483. [26] J. Sharpnack, A. Singh and A. Rinaldo, Changepoint detection over graphs with the spectral scan statistic, AISTATS, 13 (2013), 545-553. [27] J. L. Sharpnack, A. Krishnamurthy and A. Singh, Near-optimal anomaly detection in graphs using Lovasz extended scan statistic, in Advances in Neural Information Processing Systems (NIPS), 2013, 1959–1967. [28] T. Tango and K. Takahashi, A flexibly shaped spatial scan statistic for detecting clusters, International Journal of Health Geographics, 4 (2005), 11. [29] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, 1937. [30] Y. Wang, Y. Feng, J. Luo and X. Zhang, Voting with feet: Who are leaving hillary clinton and donald trump, in 2016 IEEE International Symposium on Multimedia (ISM), IEEE, 2016, 71–76. doi: 10.1109/ISM.2016.0022. [31] J. Yang, J. McAuley and J. Leskovec, Community detection in networks with node attributes, in Data Mining (ICDM), 2013 IEEE 13th International Conference on, IEEE, 2013, 1151–1156. doi: 10.1109/ICDM.2013.167.

show all references

##### References:
 [1] F. M. Aarestrup and M. G. Koopmans, Sharing data for global infectious disease surveillance and outbreak detection, Trends in Microbiology, 24 (2016), 241-245.  doi: 10.1016/j.tim.2016.01.009. [2] L. A. Adamic and N. Glance, The political blogosphere and the 2004 us election: Divided they blog, in Proceedings of the 3rd International Workshop on Link Discovery, ACM, 2005, 36–43. [3] S. An, P. Peursum, W. Liu and S. Venkatesh, Efficient algorithms for subwindow search in object detection and localization, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, 264–271. [4] E. Arias-Castro and G. Grimmett, Cluster detection in networks using percolation, Bernoulli, 19 (2013), 676-719.  doi: 10.3150/11-BEJ412. [5] S. N. Bernstein, On certain modifications of chebyshev's inequality, Doklady Akademii Nauk SSSR, 17 (1937), 275-277. [6] F. Chen and D. B. Neill, Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs, in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, 1166–1175. doi: 10.1145/2623330.2623619. [7] V. Cheplygina, M. de Bruijne and J. P. Pluim, Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis, Medical Image Analysis, 54 (2019), 280-296.  doi: 10.1016/j.media.2019.03.009. [8] C. Deng, Y. Han and B. Zhao, High-performance visual tracking with extreme learning machine framework, IEEE Transactions on Cybernetics(Early Access), 2019, 1–12. doi: 10.1109/TCYB.2018.2886580. [9] D. Geman and B. Jedynak, An active testing model for tracking roads in satellite images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18 (1996), 1-14.  doi: 10.1109/34.476006. [10] G. Grimmett, Percolation, vol. 321 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 2nd edition, Springer-Verlag, Berlin, 1999. doi: 10.1007/978-3-662-03981-6. [11] M. Han and Y. Li, Influence analysis: A survey of the state-of-the-art, Mathematical Foundations of Computing, 1 (2018), 201-253.  doi: 10.3934/mfc.2018010. [12] M. Kulldorff, A spatial scan statistic, Communications in Statistics-Theory and Methods, 26 (1997), 1481-1496.  doi: 10.1080/03610929708831995. [13] M. Kulldorff, Spatial scan statistics: Models, calculations, and applications, in Scan Statistics and Applications, Springer, 1999, 303–322. doi: 10.1007/978-1-4612-1578-3_14. [14] M. Kulldorff, L. Huang, L. Pickle and L. Duczmal, An elliptic spatial scan statistic, Statistics in medicine, 25 (2006), 3929-3943.  doi: 10.1002/sim.2490. [15] M. Langovoy, M. Habeck and B. Schoelkopf, Adaptive nonparametric detection in cryo-electron microscopy, in Proceedings of the 58-th World Statistical Congress, Session: High Dimensional Data, 2011, 4456–4461. [16] M. Langovoy, M. Habeck and B. Schoelkopf, Spatial statistics, image analysis and percolation theory, in The Joint Statistical Meetings Proceedings, Time Series and Network Section, American Statistical Association, Alexandria, VA, 2011, 5571–5581. [17] M. Langovoy and O. Wittich, Detection of Objects in Noisy Images and Site Percolation on Square Lattices, EURANDOM Report No. 2009-035, EURANDOM, Eindhoven, 2009. [18] M. Langovoy and O. Wittich, Randomized algorithms for statistical image analysis and site percolation on square lattices, Statistica Neerlandica, 67 (2013), 337-353.  doi: 10.1111/stan.12010. [19] M. Langovoy and O. Wittich, Robust nonparametric detection of objects in noisy images, Journal of Nonparametric Statistics, 25 (2013), 409-426.  doi: 10.1080/10485252.2012.759570. [20] Y. Liu, B. Zhou, F. Chen and D. W. Cheung, Graph topic scan statistic for spatial event detection, in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 2016, 489–498. doi: 10.1145/2983323.2983744. [21] T. McInerney and D. Terzopoulos, Deformable models in medical image analysis: A survey, Medical Image Analysis, 1 (1996), 91-108.  doi: 10.1016/S1361-8415(96)80007-7. [22] D. B. Neill, A. W. Moore and G. F. Cooper, A bayesian spatial scan statistic, Advances in Neural Information Processing Systems (NIPS), 18 (2006), 1003. [23] G. P. Patil and C. Taillie, Upper level set scan statistic for detecting arbitrarily shaped hotspots, Environmental and Ecological statistics, 11 (2004), 183-197.  doi: 10.1023/B:EEST.0000027208.48919.7e. [24] L. D. Rotz and J. M. Hughes, Advances in detecting and responding to threats from bioterrorism and emerging infectious disease, Nature Medicine, 10 (2004), S130–S136. doi: 10.1038/nm1152. [25] Y. Ruan, D. Fuhry and S. Parthasarathy, Efficient community detection in large networks using content and links, in Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, 1089–1098. doi: 10.1145/2488388.2488483. [26] J. Sharpnack, A. Singh and A. Rinaldo, Changepoint detection over graphs with the spectral scan statistic, AISTATS, 13 (2013), 545-553. [27] J. L. Sharpnack, A. Krishnamurthy and A. Singh, Near-optimal anomaly detection in graphs using Lovasz extended scan statistic, in Advances in Neural Information Processing Systems (NIPS), 2013, 1959–1967. [28] T. Tango and K. Takahashi, A flexibly shaped spatial scan statistic for detecting clusters, International Journal of Health Geographics, 4 (2005), 11. [29] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, 1937. [30] Y. Wang, Y. Feng, J. Luo and X. Zhang, Voting with feet: Who are leaving hillary clinton and donald trump, in 2016 IEEE International Symposium on Multimedia (ISM), IEEE, 2016, 71–76. doi: 10.1109/ISM.2016.0022. [31] J. Yang, J. McAuley and J. Leskovec, Community detection in networks with node attributes, in Data Mining (ICDM), 2013 IEEE 13th International Conference on, IEEE, 2013, 1151–1156. doi: 10.1109/ICDM.2013.167.
Graph scan estimators for the artificial large graph
Political blogosphere graph for the 2004 Elections
Graph scan estimators for the 2004 Elections graph
 $k$ 100 200 500 700 800 $\#(N_w=0)$ 98 98 78 56 55 $\#(N_w=1)$ 2 2 16 22 17 $\#(N_w=2)$ 0 0 3 16 12 $\#(N_w = 3)$ 0 0 3 5 9 $\#(N_w \geq 4)$ 0 0 0 1 7 $\mu_{\hat{a}}$ 1.705 1.815 1.913 1.949 1.961 $\max{\hat{a}}$ 1.806 1.903 1.98 2.03 2.031 $\min{\hat{a}}$ 1.603 1.686 1.824 1.848 1.879 $\sigma_{\hat{a}}$ 0.048 0.04 0.033 0.034 0.029
 $k$ 100 200 500 700 800 $\#(N_w=0)$ 98 98 78 56 55 $\#(N_w=1)$ 2 2 16 22 17 $\#(N_w=2)$ 0 0 3 16 12 $\#(N_w = 3)$ 0 0 3 5 9 $\#(N_w \geq 4)$ 0 0 0 1 7 $\mu_{\hat{a}}$ 1.705 1.815 1.913 1.949 1.961 $\max{\hat{a}}$ 1.806 1.903 1.98 2.03 2.031 $\min{\hat{a}}$ 1.603 1.686 1.824 1.848 1.879 $\sigma_{\hat{a}}$ 0.048 0.04 0.033 0.034 0.029
 [1] Nicolás M. Crisosto, Christopher M. Kribs-Zaleta, Carlos Castillo-Chávez, Stephen Wirkus. Community resilience in collaborative learning. Discrete and Continuous Dynamical Systems - B, 2010, 14 (1) : 17-40. doi: 10.3934/dcdsb.2010.14.17 [2] Ziju Shen, Yufei Wang, Dufan Wu, Xu Yang, Bin Dong. Learning to scan: A deep reinforcement learning approach for personalized scanning in CT imaging. Inverse Problems and Imaging, 2022, 16 (1) : 179-195. doi: 10.3934/ipi.2021045 [3] Roberto C. Alamino, Nestor Caticha. Bayesian online algorithms for learning in discrete hidden Markov models. Discrete and Continuous Dynamical Systems - B, 2008, 9 (1) : 1-10. doi: 10.3934/dcdsb.2008.9.1 [4] Alan Beggs. Learning in monotone bayesian games. Journal of Dynamics and Games, 2015, 2 (2) : 117-140. doi: 10.3934/jdg.2015.2.117 [5] Yangyang Xu, Wotao Yin, Stanley Osher. Learning circulant sensing kernels. Inverse Problems and Imaging, 2014, 8 (3) : 901-923. doi: 10.3934/ipi.2014.8.901 [6] Christian Soize, Roger Ghanem. Probabilistic learning on manifolds. Foundations of Data Science, 2020, 2 (3) : 279-307. doi: 10.3934/fods.2020013 [7] Mauro Maggioni, James M. Murphy. Learning by active nonlinear diffusion. Foundations of Data Science, 2019, 1 (3) : 271-291. doi: 10.3934/fods.2019012 [8] Gernot Holler, Karl Kunisch. Learning nonlocal regularization operators. Mathematical Control and Related Fields, 2022, 12 (1) : 81-114. doi: 10.3934/mcrf.2021003 [9] Sriram Nagaraj. Optimization and learning with nonlocal calculus. Foundations of Data Science, 2022  doi: 10.3934/fods.2022009 [10] Yossi Bokor, Katharine Turner, Christopher Williams. Reconstructing linearly embedded graphs: A first step to stratified space learning. Foundations of Data Science, 2021  doi: 10.3934/fods.2021026 [11] Minlong Lin, Ke Tang. Selective further learning of hybrid ensemble for class imbalanced increment learning. Big Data & Information Analytics, 2017, 2 (1) : 1-21. doi: 10.3934/bdia.2017005 [12] Ning Zhang, Qiang Wu. Online learning for supervised dimension reduction. Mathematical Foundations of Computing, 2019, 2 (2) : 95-106. doi: 10.3934/mfc.2019008 [13] Cai-Tong Yue, Jing Liang, Bo-Fei Lang, Bo-Yang Qu. Two-hidden-layer extreme learning machine based wrist vein recognition system. Big Data & Information Analytics, 2017, 2 (1) : 59-68. doi: 10.3934/bdia.2017008 [14] Ying Lin, Rongrong Lin, Qi Ye. Sparse regularized learning in the reproducing kernel banach spaces with the $\ell^1$ norm. Mathematical Foundations of Computing, 2020, 3 (3) : 205-218. doi: 10.3934/mfc.2020020 [15] Changming Song, Yun Wang. Nonlocal latent low rank sparse representation for single image super resolution via self-similarity learning. Inverse Problems and Imaging, 2021, 15 (6) : 1347-1362. doi: 10.3934/ipi.2021017 [16] Hao Li, Honglin Chen, Matt Haberland, Andrea L. Bertozzi, P. Jeffrey Brantingham. PDEs on graphs for semi-supervised learning applied to first-person activity recognition in body-worn video. Discrete and Continuous Dynamical Systems, 2021, 41 (9) : 4351-4373. doi: 10.3934/dcds.2021039 [17] Shuhua Wang, Zhenlong Chen, Baohuai Sheng. Convergence of online pairwise regression learning with quadratic loss. Communications on Pure and Applied Analysis, 2020, 19 (8) : 4023-4054. doi: 10.3934/cpaa.2020178 [18] Prashant Shekhar, Abani Patra. Hierarchical approximations for data reduction and learning at multiple scales. Foundations of Data Science, 2020, 2 (2) : 123-154. doi: 10.3934/fods.2020008 [19] Émilie Chouzenoux, Henri Gérard, Jean-Christophe Pesquet. General risk measures for robust machine learning. Foundations of Data Science, 2019, 1 (3) : 249-269. doi: 10.3934/fods.2019011 [20] Yang Wang, Zhengfang Zhou. Source extraction in audio via background learning. Inverse Problems and Imaging, 2013, 7 (1) : 283-290. doi: 10.3934/ipi.2013.7.283

Impact Factor: