doi: 10.3934/mfc.2020022

## Word Sense disambiguation based on stretchable matching of the semantic template

1 School of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Road, Ganjingzi District, Dalian City, Liaoning Province, China

2 Faculty of Library, Information and Media Science, University of Tsukuba, Tsukuba, Japan

* Corresponding author: Degen Huang

Received January 2020; Revised August 2020; Published September 2020

Fund Project: The second author is supported by National Natural Science Foundation of China grant No.6167212

Traditional hard matching of a fixed-length template cannot cope with the nearly unlimited variation of natural language. This limitation stems from three problems of the traditional matching mode: 1) a short template cannot effectively capture the context of natural language; 2) with a long template, severe data sparsity leads to a low success rate of template matching (i.e., low recall); and 3) lacking any flexible matching ability, traditional hard matching is prone to failure. To address these problems, this paper proposes a novel method, stretchable matching of the semantic template (SMOST). We apply this method to word sense disambiguation in natural language processing. Under the same condition of using only the SemCor corpus, the results of our system are very close to the best results of existing systems, which shows the effectiveness of the newly proposed method.
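The contrast between hard matching and stretchable matching can be illustrated with a minimal sketch. This is not the paper's SMOST algorithm: the function names, the `max_gap` parameter, and the use of plain words instead of word senses are all assumptions made for illustration. The sketch only shows how tolerating a bounded number of intervening words lets a template match where contiguous matching fails.

```python
from typing import List, Optional, Sequence


def hard_match(template: Sequence[str], sentence: Sequence[str]) -> bool:
    """Return True only if the template occurs as a contiguous run of tokens."""
    m = len(template)
    return any(
        list(sentence[i:i + m]) == list(template)
        for i in range(len(sentence) - m + 1)
    )


def stretchable_match(
    template: Sequence[str],
    sentence: Sequence[str],
    max_gap: int = 2,  # hypothetical parameter: unmatched words allowed between hits
) -> Optional[List[int]]:
    """Return the sentence positions matched by the template, in order.

    Consecutive template items may be separated by at most `max_gap`
    unmatched words. Returns None when no such alignment exists.
    """

    def search(t_idx: int, start: int, limit: int) -> Optional[List[int]]:
        if t_idx == len(template):
            return []
        # Try every admissible position and backtrack on failure.
        for pos in range(start, min(limit, len(sentence))):
            if sentence[pos] == template[t_idx]:
                rest = search(t_idx + 1, pos + 1, pos + max_gap + 2)
                if rest is not None:
                    return [pos] + rest
        return None

    # The first template item may start anywhere in the sentence.
    return search(0, 0, len(sentence))


sentence = ["the", "bank", "will", "gladly", "accept", "money"]
template = ["bank", "accept", "money"]

print(hard_match(template, sentence))            # False
print(stretchable_match(template, sentence, 2))  # [1, 4, 5]
print(stretchable_match(template, sentence, 1))  # None
```

The backtracking search is used rather than a greedy scan because the first occurrence of a template item is not always the one that leads to a complete alignment.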

Citation: Wei Wang, Degen Huang, Haitao Yu. Word Sense disambiguation based on stretchable matching of the semantic template. Mathematical Foundations of Computing, doi: 10.3934/mfc.2020022

Figure captions:

- One-to-one excellent matching
- One-to-one poor matching
- Good matching with stretched template (two random words in test sentence)
- Good matching with stretched template (two random words in template)
- Good matching with stretched template (two random words and one obstructing word in test sentence)
- Words of a test sentence and their sense items
- A template indexed by the word in a test sentence
- Matching all word senses in a test sentence for all word senses in the template
- Ordering of the node numbers of all matched word senses in a test sentence
- Obtaining the word sense score by the matched node chain
- Obtaining the final word sense score by the max score
- Obtaining the template through word sense instead of word

Algorithms:

- Algorithm 1: Matching a Sense of Word
- Algorithm 2: Obtaining the Score of a Matched Node Chain
- Algorithm 3: Obtaining the Final Word Sense

Comparison of F1 scores of our system with different algorithms on five test sets

| Resource | Different algorithms | Sen2 | Sen3 | Sem07 | Sem13 | Sem15 |
|---|---|---|---|---|---|---|
| SemCor 3.0 | SMOST Max.score P1 | 65.8 | 63.9 | 57.6 | 62.0 | 65.6 |
| | SMOST Max.score P2 | 66.3 | 64.6 | 57.8 | 61.7 | 65.5 |
| | SMOST Max.vote P1 | 68.0 | 67.9 | 59.8 | 64.2 | 70.0 |
| | SMOST Max.vote P2 | 68.8 | 68.3 | 60.2 | 64.2 | 67.5 |
| | SMOST Max.vote*score P1 | 67.7 | 67.1 | 58.9 | 64.7 | 69.2 |
| | SMOST Max.vote*score P2 | 68.9 | 68.0 | 61.1 | 64.4 | 66.6 |

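The row labels Max.score, Max.vote and Max.vote*score are not defined in this excerpt. The sketch below shows one plausible reading, purely as an assumption: each candidate sense collects one score per successful template match, and the final sense is chosen by the highest single score, by the number of matches, or by their product. The sense labels, the score values, and all function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical input: candidate sense -> scores of the templates that matched it.
SenseScores = Dict[str, List[float]]


def pick_max_score(scores: SenseScores) -> str:
    """Assumed 'Max.score': the sense with the highest single match score."""
    return max(scores, key=lambda s: max(scores[s]))


def pick_max_vote(scores: SenseScores) -> str:
    """Assumed 'Max.vote': the sense supported by the most matches."""
    return max(scores, key=lambda s: len(scores[s]))


def pick_max_vote_times_score(scores: SenseScores) -> str:
    """Assumed 'Max.vote*score': match count times best score."""
    return max(scores, key=lambda s: len(scores[s]) * max(scores[s]))


# Illustrative values only; they are not taken from the paper.
candidates: SenseScores = {
    "sense_a": [0.9],
    "sense_b": [0.5, 0.4, 0.45],
    "sense_c": [0.8, 0.7],
}

print(pick_max_score(candidates))             # sense_a
print(pick_max_vote(candidates))              # sense_b
print(pick_max_vote_times_score(candidates))  # sense_c
```

Under these assumed definitions the three strategies can disagree on the same input, which is consistent with the rows above reporting different F1 scores.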
Comparison of F1 scores of several supervised learning systems on five test sets

| Resource | System | Sen2 | Sen3 | Sem07 | Sem13 | Sem15 |
|---|---|---|---|---|---|---|
| SemCor 3.0 | MFS | 65.6 | 66.0 | 54.5 | 63.8 | 67.1 |
| | IMS baseline (Zhong2010) | 70.9 | 69.3 | 61.3 | 65.3 | 69.5 |
| | BLSTM (Raganato2017) | 71.4 | 68.8 | 61.8 | 65.6 | 69.2 |
| | Seq2Seq (Raganato2017) | 68.5 | 67.9 | 60.9 | 64.3 | 67.3 |
| | SMOST (this paper) | 68.9 | 68.3 | 61.1 | 64.7 | 70.0 |

Comparison of F1 scores of systems using the template matching method on the Sen3 test set

| Resource | System | Recall | Precision | F1 |
|---|---|---|---|---|
| multi-res. | SSI (Navigli2004) | 68.40 | 68.50 | 68.45 |
| | SSI-10words context (Hwang2008) | 90.96 | 57.30 | 70.31 |
| SemCor2.1 + WordNet2.1 | A-RS-10words context (Hwang2008) | 56.80 | 75.53 | 64.84 |
| | SMOST (this paper) | 100.0 | 59.84 | 74.87 |

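The F1 column can be checked against the recall and precision columns using the standard harmonic-mean definition, F1 = 2PR / (P + R). The short sketch below recomputes it for each row. The function and variable names are illustrative, and the numbers are copied from the table above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# system -> (recall, precision, F1 as reported in the table)
rows = {
    "SSI (Navigli2004)": (68.40, 68.50, 68.45),
    "SSI-10words context (Hwang2008)": (90.96, 57.30, 70.31),
    "A-RS-10words context (Hwang2008)": (56.80, 75.53, 64.84),
    "SMOST (this paper)": (100.0, 59.84, 74.87),
}

for system, (recall, precision, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{system}: computed {computed:.2f}, reported {reported:.2f}")
```

All four recomputed values agree with the reported F1 to two decimal places. The SMOST row combines full recall with the lowest precision but one in the table, and its F1 is the highest of the four.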
