May  2019, 2(2): 127-147. doi: 10.3934/mfc.2019010

Unsupervised robust nonparametric learning of hidden community properties

Machine Learning and Optimization Laboratory, EPFL, Lausanne, CH-1015, Switzerland

* Corresponding author: Mikhail Langovoy

Received  April 2019 Published  June 2019

We consider learning of fundamental properties of communities in large noisy networks, in the prototypical situation where the nodes or users are split into two classes according to a binary property, e.g., according to their opinions or preferences on a topic. For learning these properties, we propose a nonparametric, unsupervised, and scalable graph scan procedure that is, in addition, robust against a class of powerful adversaries. In our setup, one of the communities can fall under the influence of a knowledgeable adversarial leader, who knows the full network structure, has unlimited computational resources and can completely foresee our planned actions on the network. We prove strong consistency of our results in this setup with minimal assumptions. In particular, the learning procedure estimates the baseline activity of normal users asymptotically correctly with probability 1; the only assumption being the existence of a single implicit community of asymptotically negligible logarithmic size. We provide experiments on real and synthetic data to illustrate the performance of our method, including examples with adversaries.

Citation: Mikhail Langovoy, Akhilesh Gotmare, Martin Jaggi. Unsupervised robust nonparametric learning of hidden community properties. Mathematical Foundations of Computing, 2019, 2 (2) : 127-147. doi: 10.3934/mfc.2019010
References:
[1]

F. M. Aarestrup and M. G. Koopmans, Sharing data for global infectious disease surveillance and outbreak detection, Trends in Microbiology, 24 (2016), 241-245.  doi: 10.1016/j.tim.2016.01.009.  Google Scholar

[2]

L. A. Adamic and N. Glance, The political blogosphere and the 2004 us election: Divided they blog, in Proceedings of the 3rd International Workshop on Link Discovery, ACM, 2005, 36–43. Google Scholar

[3]

S. An, P. Peursum, W. Liu and S. Venkatesh, Efficient algorithms for subwindow search in object detection and localization, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, 264–271. Google Scholar

[4]

E. Arias-Castro and G. Grimmett, Cluster detection in networks using percolation, Bernoulli, 19 (2013), 676-719.  doi: 10.3150/11-BEJ412.  Google Scholar

[5]

S. N. Bernstein, On certain modifications of chebyshev's inequality, Doklady Akademii Nauk SSSR, 17 (1937), 275-277.   Google Scholar

[6]

F. Chen and D. B. Neill, Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs, in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, 1166–1175. doi: 10.1145/2623330.2623619.  Google Scholar

[7]

V. CheplyginaM. de Bruijne and J. P. Pluim, Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis, Medical Image Analysis, 54 (2019), 280-296.  doi: 10.1016/j.media.2019.03.009.  Google Scholar

[8]

C. Deng, Y. Han and B. Zhao, High-performance visual tracking with extreme learning machine framework, IEEE Transactions on Cybernetics(Early Access), 2019, 1–12. doi: 10.1109/TCYB.2018.2886580.  Google Scholar

[9]

D. Geman and B. Jedynak, An active testing model for tracking roads in satellite images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18 (1996), 1-14.  doi: 10.1109/34.476006.  Google Scholar

[10]

G. Grimmett, Percolation, vol. 321 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 2nd edition, Springer-Verlag, Berlin, 1999. doi: 10.1007/978-3-662-03981-6.  Google Scholar

[11]

M. Han and Y. Li, Influence analysis: A survey of the state-of-the-art, Mathematical Foundations of Computing, 1 (2018), 201-253.  doi: 10.3934/mfc.2018010.  Google Scholar

[12]

M. Kulldorff, A spatial scan statistic, Communications in Statistics-Theory and Methods, 26 (1997), 1481-1496.  doi: 10.1080/03610929708831995.  Google Scholar

[13]

M. Kulldorff, Spatial scan statistics: Models, calculations, and applications, in Scan Statistics and Applications, Springer, 1999, 303–322. doi: 10.1007/978-1-4612-1578-3_14.  Google Scholar

[14]

M. KulldorffL. HuangL. Pickle and L. Duczmal, An elliptic spatial scan statistic, Statistics in medicine, 25 (2006), 3929-3943.  doi: 10.1002/sim.2490.  Google Scholar

[15]

M. Langovoy, M. Habeck and B. Schoelkopf, Adaptive nonparametric detection in cryo-electron microscopy, in Proceedings of the 58-th World Statistical Congress, Session: High Dimensional Data, 2011, 4456–4461. Google Scholar

[16]

M. Langovoy, M. Habeck and B. Schoelkopf, Spatial statistics, image analysis and percolation theory, in The Joint Statistical Meetings Proceedings, Time Series and Network Section, American Statistical Association, Alexandria, VA, 2011, 5571–5581. Google Scholar

[17]

M. Langovoy and O. Wittich, Detection of Objects in Noisy Images and Site Percolation on Square Lattices, EURANDOM Report No. 2009-035, EURANDOM, Eindhoven, 2009. Google Scholar

[18]

M. Langovoy and O. Wittich, Randomized algorithms for statistical image analysis and site percolation on square lattices, Statistica Neerlandica, 67 (2013), 337-353.  doi: 10.1111/stan.12010.  Google Scholar

[19]

M. Langovoy and O. Wittich, Robust nonparametric detection of objects in noisy images, Journal of Nonparametric Statistics, 25 (2013), 409-426.  doi: 10.1080/10485252.2012.759570.  Google Scholar

[20]

Y. Liu, B. Zhou, F. Chen and D. W. Cheung, Graph topic scan statistic for spatial event detection, in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 2016, 489–498. doi: 10.1145/2983323.2983744.  Google Scholar

[21]

T. McInerney and D. Terzopoulos, Deformable models in medical image analysis: A survey, Medical Image Analysis, 1 (1996), 91-108.  doi: 10.1016/S1361-8415(96)80007-7.  Google Scholar

[22]

D. B. Neill, A. W. Moore and G. F. Cooper, A bayesian spatial scan statistic, Advances in Neural Information Processing Systems (NIPS), 18 (2006), 1003. Google Scholar

[23]

G. P. Patil and C. Taillie, Upper level set scan statistic for detecting arbitrarily shaped hotspots, Environmental and Ecological statistics, 11 (2004), 183-197.  doi: 10.1023/B:EEST.0000027208.48919.7e.  Google Scholar

[24]

L. D. Rotz and J. M. Hughes, Advances in detecting and responding to threats from bioterrorism and emerging infectious disease, Nature Medicine, 10 (2004), S130–S136. doi: 10.1038/nm1152.  Google Scholar

[25]

Y. Ruan, D. Fuhry and S. Parthasarathy, Efficient community detection in large networks using content and links, in Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, 1089–1098. doi: 10.1145/2488388.2488483.  Google Scholar

[26]

J. SharpnackA. Singh and A. Rinaldo, Changepoint detection over graphs with the spectral scan statistic, AISTATS, 13 (2013), 545-553.   Google Scholar

[27]

J. L. Sharpnack, A. Krishnamurthy and A. Singh, Near-optimal anomaly detection in graphs using Lovasz extended scan statistic, in Advances in Neural Information Processing Systems (NIPS), 2013, 1959–1967. Google Scholar

[28]

T. Tango and K. Takahashi, A flexibly shaped spatial scan statistic for detecting clusters, International Journal of Health Geographics, 4 (2005), 11. Google Scholar

[29]

J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, 1937. Google Scholar

[30]

Y. Wang, Y. Feng, J. Luo and X. Zhang, Voting with feet: Who are leaving hillary clinton and donald trump, in 2016 IEEE International Symposium on Multimedia (ISM), IEEE, 2016, 71–76. doi: 10.1109/ISM.2016.0022.  Google Scholar

[31]

J. Yang, J. McAuley and J. Leskovec, Community detection in networks with node attributes, in Data Mining (ICDM), 2013 IEEE 13th International Conference on, IEEE, 2013, 1151–1156. doi: 10.1109/ICDM.2013.167.  Google Scholar

show all references

References:
[1]

F. M. Aarestrup and M. G. Koopmans, Sharing data for global infectious disease surveillance and outbreak detection, Trends in Microbiology, 24 (2016), 241-245.  doi: 10.1016/j.tim.2016.01.009.  Google Scholar

[2]

L. A. Adamic and N. Glance, The political blogosphere and the 2004 us election: Divided they blog, in Proceedings of the 3rd International Workshop on Link Discovery, ACM, 2005, 36–43. Google Scholar

[3]

S. An, P. Peursum, W. Liu and S. Venkatesh, Efficient algorithms for subwindow search in object detection and localization, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, 264–271. Google Scholar

[4]

E. Arias-Castro and G. Grimmett, Cluster detection in networks using percolation, Bernoulli, 19 (2013), 676-719.  doi: 10.3150/11-BEJ412.  Google Scholar

[5]

S. N. Bernstein, On certain modifications of chebyshev's inequality, Doklady Akademii Nauk SSSR, 17 (1937), 275-277.   Google Scholar

[6]

F. Chen and D. B. Neill, Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs, in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, 1166–1175. doi: 10.1145/2623330.2623619.  Google Scholar

[7]

V. CheplyginaM. de Bruijne and J. P. Pluim, Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis, Medical Image Analysis, 54 (2019), 280-296.  doi: 10.1016/j.media.2019.03.009.  Google Scholar

[8]

C. Deng, Y. Han and B. Zhao, High-performance visual tracking with extreme learning machine framework, IEEE Transactions on Cybernetics(Early Access), 2019, 1–12. doi: 10.1109/TCYB.2018.2886580.  Google Scholar

[9]

D. Geman and B. Jedynak, An active testing model for tracking roads in satellite images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18 (1996), 1-14.  doi: 10.1109/34.476006.  Google Scholar

[10]

G. Grimmett, Percolation, vol. 321 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 2nd edition, Springer-Verlag, Berlin, 1999. doi: 10.1007/978-3-662-03981-6.  Google Scholar

[11]

M. Han and Y. Li, Influence analysis: A survey of the state-of-the-art, Mathematical Foundations of Computing, 1 (2018), 201-253.  doi: 10.3934/mfc.2018010.  Google Scholar

[12]

M. Kulldorff, A spatial scan statistic, Communications in Statistics-Theory and Methods, 26 (1997), 1481-1496.  doi: 10.1080/03610929708831995.  Google Scholar

[13]

M. Kulldorff, Spatial scan statistics: Models, calculations, and applications, in Scan Statistics and Applications, Springer, 1999, 303–322. doi: 10.1007/978-1-4612-1578-3_14.  Google Scholar

[14]

M. KulldorffL. HuangL. Pickle and L. Duczmal, An elliptic spatial scan statistic, Statistics in medicine, 25 (2006), 3929-3943.  doi: 10.1002/sim.2490.  Google Scholar

[15]

M. Langovoy, M. Habeck and B. Schoelkopf, Adaptive nonparametric detection in cryo-electron microscopy, in Proceedings of the 58-th World Statistical Congress, Session: High Dimensional Data, 2011, 4456–4461. Google Scholar

[16]

M. Langovoy, M. Habeck and B. Schoelkopf, Spatial statistics, image analysis and percolation theory, in The Joint Statistical Meetings Proceedings, Time Series and Network Section, American Statistical Association, Alexandria, VA, 2011, 5571–5581. Google Scholar

[17]

M. Langovoy and O. Wittich, Detection of Objects in Noisy Images and Site Percolation on Square Lattices, EURANDOM Report No. 2009-035, EURANDOM, Eindhoven, 2009. Google Scholar

[18]

M. Langovoy and O. Wittich, Randomized algorithms for statistical image analysis and site percolation on square lattices, Statistica Neerlandica, 67 (2013), 337-353.  doi: 10.1111/stan.12010.  Google Scholar

[19]

M. Langovoy and O. Wittich, Robust nonparametric detection of objects in noisy images, Journal of Nonparametric Statistics, 25 (2013), 409-426.  doi: 10.1080/10485252.2012.759570.  Google Scholar

[20]

Y. Liu, B. Zhou, F. Chen and D. W. Cheung, Graph topic scan statistic for spatial event detection, in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 2016, 489–498. doi: 10.1145/2983323.2983744.  Google Scholar

[21]

T. McInerney and D. Terzopoulos, Deformable models in medical image analysis: A survey, Medical Image Analysis, 1 (1996), 91-108.  doi: 10.1016/S1361-8415(96)80007-7.  Google Scholar

[22]

D. B. Neill, A. W. Moore and G. F. Cooper, A bayesian spatial scan statistic, Advances in Neural Information Processing Systems (NIPS), 18 (2006), 1003. Google Scholar

[23]

G. P. Patil and C. Taillie, Upper level set scan statistic for detecting arbitrarily shaped hotspots, Environmental and Ecological statistics, 11 (2004), 183-197.  doi: 10.1023/B:EEST.0000027208.48919.7e.  Google Scholar

[24]

L. D. Rotz and J. M. Hughes, Advances in detecting and responding to threats from bioterrorism and emerging infectious disease, Nature Medicine, 10 (2004), S130–S136. doi: 10.1038/nm1152.  Google Scholar

[25]

Y. Ruan, D. Fuhry and S. Parthasarathy, Efficient community detection in large networks using content and links, in Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, 1089–1098. doi: 10.1145/2488388.2488483.  Google Scholar

[26]

J. SharpnackA. Singh and A. Rinaldo, Changepoint detection over graphs with the spectral scan statistic, AISTATS, 13 (2013), 545-553.   Google Scholar

[27]

J. L. Sharpnack, A. Krishnamurthy and A. Singh, Near-optimal anomaly detection in graphs using Lovasz extended scan statistic, in Advances in Neural Information Processing Systems (NIPS), 2013, 1959–1967. Google Scholar

[28]

T. Tango and K. Takahashi, A flexibly shaped spatial scan statistic for detecting clusters, International Journal of Health Geographics, 4 (2005), 11. Google Scholar

[29]

J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, 1937. Google Scholar

[30]

Y. Wang, Y. Feng, J. Luo and X. Zhang, Voting with feet: Who are leaving hillary clinton and donald trump, in 2016 IEEE International Symposium on Multimedia (ISM), IEEE, 2016, 71–76. doi: 10.1109/ISM.2016.0022.  Google Scholar

[31]

J. Yang, J. McAuley and J. Leskovec, Community detection in networks with node attributes, in Data Mining (ICDM), 2013 IEEE 13th International Conference on, IEEE, 2013, 1151–1156. doi: 10.1109/ICDM.2013.167.  Google Scholar

Figure 1.  Graph scan estimators for the artificial large graph
Figure 2.  Political blogosphere graph for the 2004 Elections
Figure 3.  Graph scan estimators for the 2004 Elections graph
Table 1.  Multi-step adversary
$ k $ 100 200 500 700 800
$ \#(N_w=0) $ 98 98 78 56 55
$ \#(N_w=1) $ 2 2 16 22 17
$ \#(N_w=2) $ 0 0 3 16 12
$ \#(N_w = 3) $ 0 0 3 5 9
$ \#(N_w \geq 4) $ 0 0 0 1 7
$ \mu_{\hat{a}} $ 1.705 1.815 1.913 1.949 1.961
$ \max{\hat{a}} $ 1.806 1.903 1.980 2.030 2.031
$ \min{\hat{a}} $ 1.603 1.686 1.824 1.848 1.879
$ \sigma_{\hat{a}} $ 0.048 0.040 0.033 0.034 0.029
$ k $ 100 200 500 700 800
$ \#(N_w=0) $ 98 98 78 56 55
$ \#(N_w=1) $ 2 2 16 22 17
$ \#(N_w=2) $ 0 0 3 16 12
$ \#(N_w = 3) $ 0 0 3 5 9
$ \#(N_w \geq 4) $ 0 0 0 1 7
$ \mu_{\hat{a}} $ 1.705 1.815 1.913 1.949 1.961
$ \max{\hat{a}} $ 1.806 1.903 1.980 2.030 2.031
$ \min{\hat{a}} $ 1.603 1.686 1.824 1.848 1.879
$ \sigma_{\hat{a}} $ 0.048 0.040 0.033 0.034 0.029
[1]

Nicolás M. Crisosto, Christopher M. Kribs-Zaleta, Carlos Castillo-Chávez, Stephen Wirkus. Community resilience in collaborative learning. Discrete & Continuous Dynamical Systems - B, 2010, 14 (1) : 17-40. doi: 10.3934/dcdsb.2010.14.17

[2]

Roberto C. Alamino, Nestor Caticha. Bayesian online algorithms for learning in discrete hidden Markov models. Discrete & Continuous Dynamical Systems - B, 2008, 9 (1) : 1-10. doi: 10.3934/dcdsb.2008.9.1

[3]

Alan Beggs. Learning in monotone bayesian games. Journal of Dynamics & Games, 2015, 2 (2) : 117-140. doi: 10.3934/jdg.2015.2.117

[4]

Yangyang Xu, Wotao Yin, Stanley Osher. Learning circulant sensing kernels. Inverse Problems & Imaging, 2014, 8 (3) : 901-923. doi: 10.3934/ipi.2014.8.901

[5]

Mauro Maggioni, James M. Murphy. Learning by active nonlinear diffusion. Foundations of Data Science, 2019, 1 (3) : 271-291. doi: 10.3934/fods.2019012

[6]

Minlong Lin, Ke Tang. Selective further learning of hybrid ensemble for class imbalanced increment learning. Big Data & Information Analytics, 2017, 2 (1) : 1-21. doi: 10.3934/bdia.2017005

[7]

Ning Zhang, Qiang Wu. Online learning for supervised dimension reduction. Mathematical Foundations of Computing, 2019, 2 (2) : 95-106. doi: 10.3934/mfc.2019008

[8]

Cai-Tong Yue, Jing Liang, Bo-Fei Lang, Bo-Yang Qu. Two-hidden-layer extreme learning machine based wrist vein recognition system. Big Data & Information Analytics, 2017, 2 (1) : 59-68. doi: 10.3934/bdia.2017008

[9]

Yang Wang, Zhengfang Zhou. Source extraction in audio via background learning. Inverse Problems & Imaging, 2013, 7 (1) : 283-290. doi: 10.3934/ipi.2013.7.283

[10]

Wei Xue, Wensheng Zhang, Gaohang Yu. Least absolute deviations learning of multiple tasks. Journal of Industrial & Management Optimization, 2018, 14 (2) : 719-729. doi: 10.3934/jimo.2017071

[11]

G. Calafiore, M.C. Campi. A learning theory approach to the construction of predictor models. Conference Publications, 2003, 2003 (Special) : 156-166. doi: 10.3934/proc.2003.2003.156

[12]

Miguel A. Dumett, Roberto Cominetti. On the stability of an adaptive learning dynamics in traffic games. Journal of Dynamics & Games, 2018, 5 (4) : 265-282. doi: 10.3934/jdg.2018017

[13]

Émilie Chouzenoux, Henri Gérard, Jean-Christophe Pesquet. General risk measures for robust machine learning. Foundations of Data Science, 2019, 1 (3) : 249-269. doi: 10.3934/fods.2019011

[14]

Tieliang Gong, Qian Zhao, Deyu Meng, Zongben Xu. Why curriculum learning & self-paced learning work in big/noisy data: A theoretical perspective. Big Data & Information Analytics, 2016, 1 (1) : 111-127. doi: 10.3934/bdia.2016.1.111

[15]

Jiang Xie, Junfu Xu, Celine Nie, Qing Nie. Machine learning of swimming data via wisdom of crowd and regression analysis. Mathematical Biosciences & Engineering, 2017, 14 (2) : 511-527. doi: 10.3934/mbe.2017031

[16]

Ta-Wei Hung, Ping-Ting Chen. On the optimal replenishment in a finite planning horizon with learning effect of setup costs. Journal of Industrial & Management Optimization, 2010, 6 (2) : 425-433. doi: 10.3934/jimo.2010.6.425

[17]

Mingbao Cheng, Shuxian Xiao, Guosheng Liu. Single-machine rescheduling problems with learning effect under disruptions. Journal of Industrial & Management Optimization, 2018, 14 (3) : 967-980. doi: 10.3934/jimo.2017085

[18]

A. Mittal, N. Hemachandra. Learning algorithms for finite horizon constrained Markov decision processes. Journal of Industrial & Management Optimization, 2007, 3 (3) : 429-444. doi: 10.3934/jimo.2007.3.429

[19]

Michael K. Ng, Chi-Pan Tam, Fan Wang. Multi-view foreground segmentation via fourth order tensor learning. Inverse Problems & Imaging, 2013, 7 (3) : 885-906. doi: 10.3934/ipi.2013.7.885

[20]

Marcello Delitala, Tommaso Lorenzi. Recognition and learning in a mathematical model for immune response against cancer. Discrete & Continuous Dynamical Systems - B, 2013, 18 (4) : 891-914. doi: 10.3934/dcdsb.2013.18.891

 Impact Factor: 

Metrics

  • PDF downloads (23)
  • HTML views (315)
  • Cited by (0)

[Back to Top]