# American Institute of Mathematical Sciences

March 2020, 2(1): 19-33. doi: 10.3934/fods.2020002

## Semi-supervised classification on graphs using explicit diffusion dynamics

Robert L. Peach, Alexis Arnaudon and Mauricio Barahona

1. Department of Mathematics and Imperial College Business School, Imperial College London, London SW7 2AZ, UK
2. Department of Mathematics, Imperial College London, London SW7 2AZ, UK

* Corresponding author: Mauricio Barahona

Current address: Blue Brain Project, École polytechnique fédérale de Lausanne (EPFL), Campus Biotech, 1202 Geneva, Switzerland.

Published February 2020

Fund Project: All authors acknowledge funding through EPSRC award EP/N014529/1 supporting the EPSRC Centre for Mathematics of Precision Healthcare at Imperial.

Classification tasks based on feature vectors can be significantly improved by including within deep learning a graph that summarises pairwise relationships between the samples. Intuitively, the graph acts as a conduit to channel and bias the inference of class labels. Here, we study classification methods that consider the graph as the originator of an explicit graph diffusion. We show that appending graph diffusion to feature-based learning as an a posteriori refinement achieves state-of-the-art classification accuracy. This method, which we call Graph Diffusion Reclassification (GDR), uses overshooting events of a diffusive graph dynamics to reclassify individual nodes. The method uses intrinsic measures of node influence, which are distinct for each node, and allows the evaluation of the relationship and importance of features and graph for classification. We also present diff-GCN, a simple extension of Graph Convolutional Neural Network (GCN) architectures that leverages explicit diffusion dynamics, and allows the natural use of directed graphs. To showcase our methods, we use benchmark datasets of documents with associated citation data.
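The mechanism behind GDR can be illustrated with a toy example. The sketch below is not the paper's method: GDR reclassifies nodes via node-specific overshooting events of the diffusion, whereas here we simply diffuse a prior classifier's class scores for a single fixed time and take the argmax. The graph, the prior scores, and the diffusion time are invented for illustration.

```python
import numpy as np
from scipy.linalg import expm

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

# Random-walk graph Laplacian L = I - D^{-1} A.
L = np.eye(6) - A / A.sum(axis=1, keepdims=True)

# Class scores from a hypothetical feature-only classifier; node 2 is
# given the wrong class despite sitting in the class-0 triangle.
prior = np.array([[0.90, 0.10],
                  [0.80, 0.20],
                  [0.45, 0.55],   # misclassified by the prior
                  [0.20, 0.80],
                  [0.10, 0.90],
                  [0.20, 0.80]])

# Diffuse the scores under dx/dt = -L x, i.e. x(t) = expm(-t L) x(0),
# then re-assign each node to its largest diffused score.
diffused = expm(-1.0 * L) @ prior
labels = diffused.argmax(axis=1)  # node 2 is pulled back to class 0
```

Because the rows of `expm(-t * L)` sum to one, the diffusion conserves total class score per node while mixing information from the graph neighbourhood, which is what allows the structure to correct a feature-based misclassification.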

Citation: Robert L. Peach, Alexis Arnaudon, Mauricio Barahona. Semi-supervised classification on graphs using explicit diffusion dynamics. Foundations of Data Science, 2020, 2 (1) : 19-33. doi: 10.3934/fods.2020002
**Statistics of datasets as reported in [35] and [30].**

| Dataset | Nodes | Edges | Classes | Features |
|---|---|---|---|---|
| Citeseer | 3,327 | 4,732 | 6 | 3,703 |
| Cora | 2,708 | 5,429 | 7 | 1,433 |
| Pubmed | 19,717 | 44,338 | 3 | 500 |
| Wikipedia | 20,525 | 215,056 | 12 | 100 |
**Percentage classification accuracy before and after relabelling by GDR for various classifiers.** We present the improvement of GDR on the uniform prediction (which ignores features). We also consider four supervised classifiers (which learn from features without the graph): projection, RF, SVM and MLP. For RF, we use a maximum depth of $20$; for SVM, we set $C = 50$; for MLP, we implement the same architecture as GCN ($d_1 = 16$-unit hidden layer, $0.5$ dropout, $200$ epochs, $0.01$ learning rate, $L^2$ loss function). Finally, we compare with two semi-supervised graph classifiers: GCN [20] and Planetoid [35]. The numbers in brackets record the change in accuracy achieved by applying GDR on the corresponding prior classifier. Boldface indicates the method with the highest accuracy for each dataset.

| Method | Citeseer | Cora | Pubmed | Wikipedia |
|---|---|---|---|---|
| Uniform | 7.7 | 13.0 | 18.0 | 28.7 |
| GDR (Uniform) | 50.6 (+42.9) | 71.8 (+58.8) | 73.2 (+55.2) | 31.4 (+2.7) |
| Projection | 61.8 | 59.0 | 72.0 | 32.5 |
| RF | 60.3 | 58.9 | 68.8 | **50.8** |
| SVM | 61.1 | 58.0 | 49.9 | 31.0 |
| MLP | 57.0 | 56.0 | 70.7 | 43.0 |
| GDR (Projection) | 70.4 (+8.7) | 79.7 (+20.7) | 75.8 (+3.8) | 36.9 (+4.4) |
| GDR (RF) | 70.5 (+10.2) | 78.7 (+19.8) | 72.2 (+3.2) | **50.8** (+0.0) |
| GDR (SVM) | 70.3 (+9.2) | 81.2 (+23.2) | 52.4 (+2.5) | 41.9 (+10.8) |
| GDR (MLP) | 69.7 (+12.7) | 78.5 (+22.5) | 75.5 (+4.8) | 40.5 (-2.5) |
| Planetoid | 64.7 | 75.7 | 72.2 | - |
| GCN | 70.3 | 81.1 | 79.0 | 39.2 |
| GDR (GCN) | **70.8** (+0.5) | **82.2** (+1.1) | **79.4** (+0.4) | 39.5 (+0.3) |
**Percentage classification accuracy of GCN and its extension diff-GCN, which has an explicit diffusion operator (16).**

| Model | Citeseer | Cora | Pubmed | Wikipedia |
|---|---|---|---|---|
| GCN | 70.3 | 81.1 | 79.0 | 34.1 |
| diff-GCN | 71.9 | 82.3 | 79.3 | 45.9 |
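diff-GCN replaces the fixed one-hop propagation of a standard GCN layer with an explicit diffusion. The paper's operator (16) is not reproduced here; the sketch below assumes a heat-kernel analogue built on a random-walk Laplacian from out-degrees. Note that nothing in the construction requires the adjacency matrix to be symmetric, which is how an explicit diffusion accommodates directed graphs naturally.

```python
import numpy as np
from scipy.linalg import expm

def diffusion_propagate(A, X, t=1.0):
    """Propagate features X through exp(-t L), with L the random-walk
    Laplacian of a (possibly directed) adjacency matrix A.

    Unlike the symmetric normalisation of a standard GCN layer, this
    works unchanged when A is non-symmetric (directed edges).
    """
    deg = A.sum(axis=1)
    deg[deg == 0] = 1.0                       # guard isolated nodes
    L = np.eye(A.shape[0]) - A / deg[:, None]
    return expm(-t * L) @ X

# Directed 3-cycle: 0 -> 1 -> 2 -> 0.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
X = np.eye(3)                                 # one-hot features
H = diffusion_propagate(A, X, t=0.5)
# Mass from node 0 reaches node 1 (its out-neighbour) before node 2,
# so the diffusion respects edge direction.
```

In an actual diff-GCN-style layer, `diffusion_propagate(A, X)` would be followed by a learned linear map and a nonlinearity; here only the propagation step is sketched.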
**Accuracy of GDR using the undirected, directed, and reverse directed graphs of the Cora dataset.**

| Method | Undirected $A$ | Directed (fw) $A_\text{dir}$ | Directed (bw) $A_\text{dir}^T$ |
|---|---|---|---|
| GDR (Projection) | 79.7 | 62.1 | 64.6 |
| GDR (RF) | 78.7 | 58.0 | 57.6 |
| GDR (SVM) | 81.2 | 63.6 | 62.1 |
| GDR (MLP) | 78.5 | 57.3 | 56.4 |
**Accuracy of GCN and diff-GCN using the undirected, directed, reverse directed, and bidirectional (augmented) graphs of the Cora dataset.** The highest accuracy is achieved by diff-GCN with the augmented graph (boldface).

| Method | Undirected $A$ | Directed (fw) $A_\text{dir}$ | Directed (bw) $A_\text{dir}^T$ | Augmented (fw, bw) $\begin{bmatrix} A_\text{dir} \, A_\text{dir}^T \end{bmatrix}$ |
|---|---|---|---|---|
| GCN | 81.1 | 67.4 | 79.8 | 79.9 |
| diff-GCN | 82.3 | 80.3 | 81.7 | **83.0** |
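One plausible reading of the augmented graph $\begin{bmatrix} A_\text{dir} \, A_\text{dir}^T \end{bmatrix}$ is that features are propagated separately along forward and backward edges and the two results are concatenated, so that a downstream layer can weight in- and out-neighbourhoods independently. The helper below is a hypothetical sketch of that idea, not the paper's implementation.

```python
import numpy as np

def augmented_propagate(A_dir, X):
    """Concatenate feature aggregation over out-neighbours (forward
    edges) and in-neighbours (backward edges) of a directed graph."""
    def norm(A):
        deg = A.sum(axis=1)
        deg[deg == 0] = 1.0            # guard nodes with no edges
        return A / deg[:, None]
    fw = norm(A_dir) @ X               # aggregate over out-neighbours
    bw = norm(A_dir.T) @ X             # aggregate over in-neighbours
    return np.concatenate([fw, bw], axis=1)

# Single directed edge 0 -> 1, one-hot features.
A_dir = np.array([[0., 1.],
                  [0., 0.]])
X = np.eye(2)
H = augmented_propagate(A_dir, X)      # shape (2, 4): fw block, bw block
```

The doubled feature dimension is the price of keeping both directions distinct; a symmetrised graph $A_\text{dir} + A_\text{dir}^T$ would instead merge them before propagation.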
