# American Institute of Mathematical Sciences

March 2020, 2(1): 19-33. doi: 10.3934/fods.2020002

## Semi-supervised classification on graphs using explicit diffusion dynamics

Affiliations:

1. Department of Mathematics and Imperial College Business School, Imperial College London, London SW7 2AZ, UK
2. Department of Mathematics, Imperial College London, London SW7 2AZ, UK

* Corresponding author: Mauricio Barahona

Current address: Blue Brain Project, École polytechnique fédérale de Lausanne (EPFL), Campus Biotech, 1202 Geneva, Switzerland.

Published February 2020

Fund Project: All authors acknowledge funding through EPSRC award EP/N014529/1, supporting the EPSRC Centre for Mathematics of Precision Healthcare at Imperial College London.

Classification tasks based on feature vectors can be significantly improved by including within deep learning a graph that summarises pairwise relationships between the samples. Intuitively, the graph acts as a conduit to channel and bias the inference of class labels. Here, we study classification methods that consider the graph as the originator of an explicit graph diffusion. We show that appending graph diffusion to feature-based learning as an a posteriori refinement achieves state-of-the-art classification accuracy. This method, which we call Graph Diffusion Reclassification (GDR), uses overshooting events of a diffusive graph dynamics to reclassify individual nodes. The method uses intrinsic measures of node influence, which are distinct for each node, and allows the evaluation of the relationship and importance of features and graph for classification. We also present diff-GCN, a simple extension of Graph Convolutional Neural Network (GCN) architectures that leverages explicit diffusion dynamics, and allows the natural use of directed graphs. To showcase our methods, we use benchmark datasets of documents with associated citation data.
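The abstract describes GDR as appending an explicit graph diffusion to the output of a prior feature-based classifier and then reclassifying individual nodes from the resulting dynamics. As a minimal sketch of the underlying idea, the snippet below diffuses prior class scores over a toy graph with the heat kernel $e^{-tL}$ of the graph Laplacian; the graph, the diffusion time `t`, and the prior scores are illustrative assumptions, and the simple argmax relabelling stands in for the paper's actual overshooting-based reclassification rule.

```python
import numpy as np
from scipy.linalg import expm

# Toy undirected graph on 4 nodes (illustrative, not one of the paper's datasets)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A      # combinatorial graph Laplacian

# Prior class scores from a hypothetical feature-based classifier:
# nodes 0-2 have confident one-hot predictions, node 3 is uncertain.
X0 = np.array([[1.0, 0.0],
               [1.0, 0.0],
               [0.0, 1.0],
               [0.5, 0.5]])

# Diffuse the scores: x(t) = exp(-t L) x(0) solves dx/dt = -L x
t = 0.5                              # illustrative diffusion time
Xt = expm(-t * L) @ X0               # diffused class scores

# Stand-in relabelling: assign each node its largest diffused score
labels = Xt.argmax(axis=1)
```

Because $L\mathbf{1} = 0$, the kernel $e^{-tL}$ is row-stochastic, so each node's scores remain a probability vector while being smoothed toward its graph neighbours.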

Citation: Robert L. Peach, Alexis Arnaudon, Mauricio Barahona. Semi-supervised classification on graphs using explicit diffusion dynamics. Foundations of Data Science, 2020, 2 (1) : 19-33. doi: 10.3934/fods.2020002
Statistics of datasets as reported in [35] and [30]
| Dataset | Nodes | Edges | Classes | Features |
|---|---|---|---|---|
| Citeseer | 3,327 | 4,732 | 6 | 3,703 |
| Cora | 2,708 | 5,429 | 7 | 1,433 |
| Pubmed | 19,717 | 44,338 | 3 | 500 |
| Wikipedia | 20,525 | 215,056 | 12 | 100 |
Percentage classification accuracy before and after application of relabelling by GDR for various classifiers. We present the improvement of GDR on the uniform prediction (which ignores features). We also consider four supervised classifiers (which learn from features without the graph): projection, RF, SVM and MLP. For RF, we use a maximum depth of $20$; for SVM, we set $C = 50$; for MLP, we implement the same architecture as GCN ($d_1 = 16$-unit hidden layer, $0.5$ dropout, $200$ epochs, $0.01$ learning rate, $L^2$ loss function). Finally, we compare with two semi-supervised graph classifiers: GCN [20] and Planetoid [35]. The numbers in brackets record the change in accuracy accomplished by applying GDR on the corresponding prior classifier. Boldface indicates the method with highest accuracy for each dataset
| Method | Citeseer | Cora | Pubmed | Wikipedia |
|---|---|---|---|---|
| Uniform | 7.7 | 13.0 | 18.0 | 28.7 |
| GDR (Uniform) | 50.6 (+42.9) | 71.8 (+58.8) | 73.2 (+55.2) | 31.4 (+2.7) |
| Projection | 61.8 | 59.0 | 72.0 | 32.5 |
| RF | 60.3 | 58.9 | 68.8 | **50.8** |
| SVM | 61.1 | 58.0 | 49.9 | 31.0 |
| MLP | 57.0 | 56.0 | 70.7 | 43.0 |
| GDR (Projection) | 70.4 (+8.7) | 79.7 (+20.7) | 75.8 (+3.8) | 36.9 (+4.4) |
| GDR (RF) | 70.5 (+10.2) | 78.7 (+19.8) | 72.2 (+3.2) | **50.8** (+0.0) |
| GDR (SVM) | 70.3 (+9.2) | 81.2 (+23.2) | 52.4 (+2.5) | 41.9 (+10.8) |
| GDR (MLP) | 69.7 (+12.7) | 78.5 (+22.5) | 75.5 (+4.8) | 40.5 (-2.5) |
| Planetoid | 64.7 | 75.7 | 72.2 | - |
| GCN | 70.3 | 81.1 | 79.0 | 39.2 |
| GDR (GCN) | **70.8** (+0.5) | **82.2** (+1.1) | **79.4** (+0.4) | 39.5 (+0.3) |
Percentage classification accuracy of GCN and its extension diff-GCN, which has an explicit diffusion operator (16)
| Model | Citeseer | Cora | Pubmed | Wikipedia |
|---|---|---|---|---|
| GCN | 70.3 | 81.1 | 79.0 | 34.1 |
| diff-GCN | 71.9 | 82.3 | 79.3 | 45.9 |
Accuracy of GDR using the undirected, directed, and reverse directed graphs of the Cora dataset
| Method | Undirected ($A$) | Directed, fw ($A_\text{dir}$) | Directed, bw ($A_\text{dir}^T$) |
|---|---|---|---|
| GDR (Projection) | 79.7 | 62.1 | 64.6 |
| GDR (RF) | 78.7 | 58.0 | 57.6 |
| GDR (SVM) | 81.2 | 63.6 | 62.1 |
| GDR (MLP) | 78.5 | 57.3 | 56.4 |
Accuracy of GCN and diff-GCN using the undirected, directed, reverse directed, and bidirectional (augmented) graphs of the Cora dataset. The highest accuracy is achieved by diff-GCN with the augmented graph (boldface)
| Method | Undirected ($A$) | Directed, fw ($A_\text{dir}$) | Directed, bw ($A_\text{dir}^T$) | Augmented, fw+bw ($[A_\text{dir} \, A_\text{dir}^T]$) |
|---|---|---|---|---|
| GCN | 81.1 | 67.4 | 79.8 | 79.9 |
| diff-GCN | 82.3 | 80.3 | 81.7 | **83.0** |
