
Multiple-instance learning for text categorization based on semantic representation


Abstract
  • Text categorization is a fundamental building block for other research in NLP. Researchers have proposed many effective text categorization methods that perform well. However, these methods are generally based on raw or low-level features, e.g., tf or tf-idf, and neglect the semantic structure between words. Complex semantic information can affect the precision of text categorization. In this paper, we propose a new method that handles the semantic correlations between words and text features through both the representation and the learning scheme. We represent each document as multiple instances based on word2vec. Experiments validate the effectiveness of the proposed method compared with state-of-the-art text categorization methods.

    Mathematics Subject Classification: 97R40.
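
The abstract states that each document is represented as multiple instances based on word2vec, but does not say how the instances are formed. The following is a minimal sketch of one plausible construction, not the authors' procedure: it assumes one instance per sentence, computed as the mean of that sentence's word vectors. The function names and the toy two-dimensional embeddings are hypothetical stand-ins for trained word2vec vectors.

```python
import re
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def document_to_bag(text: str, embeddings: Dict[str, Vector]) -> List[Vector]:
    """Turn a document into a bag (list) of instance vectors.

    Assumption: one instance per sentence, defined as the mean of the
    vectors of its in-vocabulary words. Sentences with no known word
    are skipped.
    """
    bag = []
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        known = [embeddings[w] for w in words if w in embeddings]
        if known:
            bag.append(mean_vector(known))
    return bag


# Toy 2-dimensional embeddings standing in for trained word2vec vectors.
toy_embeddings = {
    "engine": [1.0, 0.0],
    "car": [0.8, 0.2],
    "stock": [0.0, 1.0],
    "market": [0.2, 0.8],
}

bag = document_to_bag("The car engine failed. The stock market fell!", toy_embeddings)
for instance in bag:
    print(instance)
```

With this representation, a document about several topics yields instances that lie in different regions of the embedding space, which is the situation a multiple-instance learner is designed to handle.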


  • Figure 1.  The structure of Bag-of-Words and Skip-Gram

    Figure 2.  Pseudo-code for mi-SVM
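
Since the figure itself is not reproduced on this page, here is a minimal sketch of the mi-SVM heuristic of Andrews, Tsochantaridis and Hofmann: instance labels in positive bags start as positive, an SVM is trained, the positive-bag instances are relabeled by the sign of the decision function (forcing at least one positive per positive bag), and the loop repeats until the labels stop changing. The linear SVM trained by plain subgradient descent, its hyperparameters, and the toy 1-D data are assumptions made for illustration; the paper's own solver and settings are not given here.

```python
from typing import List, Tuple

Vector = List[float]
Bag = List[Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 500) -> Tuple[Vector, float]:
    """Full-batch subgradient descent on (lam/2)*||w||^2 + mean hinge loss."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # instance violates the margin
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def mi_svm(pos_bags: List[Bag], neg_bags: List[Bag], max_rounds: int = 20):
    """mi-SVM heuristic: alternate SVM training and instance relabeling."""
    # Instances in negative bags are always negative and never relabeled.
    neg_X = [x for bag in neg_bags for x in bag]
    # Initialise every instance in a positive bag with the bag label (+1).
    labels = [[1] * len(bag) for bag in pos_bags]
    for _ in range(max_rounds):
        X = [x for bag in pos_bags for x in bag] + neg_X
        y = [l for bag_labels in labels for l in bag_labels] + [-1] * len(neg_X)
        w, b = train_linear_svm(X, y)
        new_labels = []
        for bag in pos_bags:
            scores = [dot(w, x) + b for x in bag]
            bag_labels = [1 if s > 0 else -1 for s in scores]
            if 1 not in bag_labels:
                # A positive bag must keep at least one positive instance.
                bag_labels[scores.index(max(scores))] = 1
            new_labels.append(bag_labels)
        if new_labels == labels:  # imputed labels are stable: stop
            break
        labels = new_labels
    return w, b, labels


def predict_bag(bag: Bag, w: Vector, b: float) -> int:
    """A bag is positive if its highest-scoring instance is positive."""
    return 1 if max(dot(w, x) + b for x in bag) > 0 else -1


# Toy 1-D data: each positive bag mixes one on-topic and one off-topic instance.
pos_bags = [[[2.0], [-2.0]], [[3.0], [-1.0]]]
neg_bags = [[[-2.0], [-1.5]], [[-1.0], [-3.0]]]
w, b, labels = mi_svm(pos_bags, neg_bags)
print(labels)
```

On this toy data the off-topic instances inside the positive bags end up relabeled as negative, so the final classifier is trained only on the instances that actually carry the positive class.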

    Table 1.  Results of experiments on sougouC

    Model               car      finance  IT       health   sport
    SVM + TF-IDF        0.8473   0.8420   0.8363   0.8326   0.8737
    SVM + Word2vec      0.9303   0.8571   0.8755   0.9163   0.9828
    mi-SVM + Word2vec   0.9599   0.8904   0.8943   0.9325   0.9842

    Table 2.  Results of experiments on 20newsgroup

    Model     SVM + TF-IDF   SVM + Word2vec   mi-SVM + Word2vec
    Average   0.8508         0.8421           0.8619