# American Institute of Mathematical Sciences

January  2017, 2(1): 69-75. doi: 10.3934/bdia.2017009

## Multiple-instance learning for text categorization based on semantic representation

 National Key Laboratory for Novel Software Technology, Nanjing University, China

Published  September 2017

Text categorization is a fundamental building block for related research in NLP. Researchers have proposed many effective text categorization methods that achieve good performance. However, these methods generally rely on raw or low-level features, e.g., tf or tf-idf, and neglect the semantic structure between words. Complex semantic information can influence the precision of text categorization. In this paper, we propose a new method that handles the semantic correlations between words and text features at the level of both the representation and the learning scheme. We represent each document as multiple instances based on word2vec. Experiments validate the effectiveness of the proposed method compared with state-of-the-art text categorization methods.
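The abstract does not say how a document is split into instances. As a minimal sketch, assuming each sentence becomes one instance and each instance is the mean of its word vectors, a document could be turned into a bag as follows. The function names `mean_vector` and `document_to_bag` and the 2-dimensional `toy_embeddings` are hypothetical stand-ins for trained word2vec vectors, not part of the paper.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(tokens: List[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the embeddings of the known tokens; zero vector if none is known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def document_to_bag(document: str, embeddings: Dict[str, Vector], dim: int) -> List[Vector]:
    """Assumption: one instance per sentence, sentences split naively on '.'."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return [mean_vector(s.lower().split(), embeddings, dim) for s in sentences]


# Hypothetical toy embeddings standing in for word2vec output.
toy_embeddings = {
    "engine": [1.0, 0.0],
    "car": [0.8, 0.2],
    "stock": [0.0, 1.0],
    "market": [0.2, 0.8],
}

# Two sentences give a bag of two instance vectors; "rises" is unknown and skipped.
bag = document_to_bag("Car engine. Stock market rises.", toy_embeddings, dim=2)
```

Under this reading, a document-level label applies to the whole bag, while the individual sentence vectors remain unlabeled, which is the setting multiple-instance learners such as mi-SVM are designed for.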

Citation: Jian-Bing Zhang, Yi-Xin Sun, De-Chuan Zhan. Multiple-instance learning for text categorization based on semantic representation. Big Data & Information Analytics, 2017, 2 (1) : 69-75. doi: 10.3934/bdia.2017009
The structure of Bag-of-Words and Skip-Gram
Pseudo-code for mi-SVM
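
The pseudo-code figure itself is not reproduced on this page. As a rough illustration only, the sketch below follows the mi-SVM heuristic as it is commonly described: instances first inherit their bag's label, an SVM is trained, instances in positive bags are relabeled by the sign of the decision value (keeping at least one positive per positive bag), and the loop repeats until the labels stop changing. The base learner `train_linear_svm` is a deliberately small subgradient-descent stand-in chosen to keep the example dependency-free. It, the toy data, and all function names are assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, epochs=500, lr=0.05, lam=0.01) -> Tuple[List[float], float]:
    """Stand-in base learner: linear SVM fitted by full-batch subgradient
    descent on the L2-regularized hinge loss (not a production solver)."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # instance violates the margin
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def mi_svm(bags, bag_labels, max_iter=20):
    """Alternate between fitting an SVM and re-imputing instance labels."""
    X = [inst for bag in bags for inst in bag]
    owner = [k for k, bag in enumerate(bags) for _ in bag]
    y = [bag_labels[k] for k in owner]  # start: instances inherit bag labels
    w, b = train_linear_svm(X, y)
    for _ in range(max_iter):
        scores = [dot(w, x) + b for x in X]
        new_y = list(y)
        for k, label in enumerate(bag_labels):
            if label != 1:
                continue  # instances in negative bags always stay negative
            idx = [i for i, o in enumerate(owner) if o == k]
            for i in idx:
                new_y[i] = 1 if scores[i] >= 0 else -1
            if all(new_y[i] == -1 for i in idx):
                # every positive bag must keep at least one positive instance
                new_y[max(idx, key=lambda i: scores[i])] = 1
        if new_y == y:
            break  # imputed labels are stable
        y = new_y
        w, b = train_linear_svm(X, y)
    return w, b, y


def predict_bag(bag, w, b) -> int:
    """A bag is positive if its highest-scoring instance is positive."""
    return 1 if max(dot(w, x) + b for x in bag) >= 0 else -1


# Toy bags: each positive bag holds one "on-topic" and one "off-topic" instance.
bags = [
    [[2.0, 2.0], [-2.0, -1.5]],
    [[1.5, 2.5], [-1.5, -2.0]],
    [[-2.0, -2.0], [-1.0, -2.5]],
    [[-2.5, -1.0], [-1.5, -1.5]],
]
bag_labels = [1, 1, -1, -1]
w, b, instance_labels = mi_svm(bags, bag_labels)
```

In a text setting, each inner vector would be one instance of a document (for example a word2vec-based sentence vector), and `predict_bag` would give the document-level decision for one category against the rest.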

Results of experiments on sougouC

| Model | car | finance | IT | health | sport |
|---|---|---|---|---|---|
| SVM + TF-IDF | 0.8473 | 0.8420 | 0.8363 | 0.8326 | 0.8737 |
| SVM + Word2vec | 0.9303 | 0.8571 | 0.8755 | 0.9163 | 0.9828 |
| mi-SVM + Word2vec | 0.9599 | 0.8904 | 0.8943 | 0.9325 | 0.9842 |

Results of experiments on 20newsgroup

| Model | SVM+tf-idf | SVM+Word2vec | mi-SVM+Word2vec |
|---|---|---|---|
| Average | 0.8508 | 0.8421 | 0.8619 |