# American Institute of Mathematical Sciences

August & September  2019, 12(4&5): 823-836. doi: 10.3934/dcdss.2019055

## Uyghur morphological analysis using joint conditional random fields: Based on small scaled corpus

1. Xinjiang Technical Institute of Physical and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. Institute of Mathematics and Information of Hotan Teachers College, Hotan 848000, China

* Corresponding author: Ghalip Abdukerim

Received June 2017; Revised October 2017; Published November 2018

As a fundamental task in natural language processing, Uyghur morphological analysis determines the part of speech (POS) and the morphemes (stem and affixes) of each word in a given sentence, and automatically annotates the grammatical function of each morpheme based on its context. It supplies information needed by other natural language processing tasks, including syntactic analysis, machine translation, automatic summarization, and semantic analysis. To improve the efficiency of morphological analysis, this paper proposes a hybrid approach that builds a statistical model for Uyghur morphological tagging from a small-scale corpus. Experimental results show that the approach achieves an overall accuracy of 92.58% with a limited training corpus.

Citation: Ghalip Abdukerim, Eziz Tursun, Yating Yang, Xiao Li. Uyghur morphological analysis using joint conditional random fields: Based on small scaled corpus. Discrete & Continuous Dynamical Systems - S, 2019, 12 (4&5) : 823-836. doi: 10.3934/dcdss.2019055
Figure: The morphological analysis result and hierarchical relationship of a Uyghur sentence

Figure: The architecture of a semi-supervised morphological analysis based on the hybrid approach

Figure: Morphological tag decoding process of words in the sentence

Figure: The relationship between parameter $\beta$ and accuracy
Feature Template of POS Tagging Model

| Features | Description |
| --- | --- |
| $w_{i-2}pos_{i}$, $w_{i-1}pos_{i}$, $w_{i}pos_{i}$, $w_{i+1}pos_{i}$, $w_{i+2}pos_{i}$ | Unary context features of the word |
| $w_{i-2}w_{i-1}pos_{i}$, $w_{i-1}w_{i}pos_{i}$, $w_{i}w_{i+1}pos_{i}$, $w_{i+1}w_{i+2}pos_{i}$, $w_{i-1}w_{i+1}pos_{i}$ | Binary context features of the word |
| $h_1(w_i)pos_{i}$, $h_2(w_i)pos_{i}$, $h_3(w_i)pos_{i}$, $h_4(w_i)pos_{i}$, $h_5(w_i)pos_{i}$ | $n$ characters selected from the beginning of the word |
| $t_1(w_i)pos_{i}$, $t_2(w_i)pos_{i}$, $t_3(w_i)pos_{i}$, $t_4(w_i)pos_{i}$, $t_5(w_i)pos_{i}$ | $n$ characters selected from the end of the word |
| $pos_{i-1}pos_{i}$ | POS tag transition feature |
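The POS feature template above can be read as a recipe for the observation features at position $i$. The following is a minimal sketch of that reading, not the authors' implementation: the function name `pos_features`, the `<PAD>` boundary symbol, the feature-key strings, and the example tokens are all assumptions made for illustration. It also assumes that $h_n$ and $t_n$ are simply the first and last $n$ characters of the word, and it leaves the pairing with $pos_i$ and the transition feature $pos_{i-1}pos_{i}$ to the CRF trainer.

```python
PAD = "<PAD>"  # hypothetical sentence-boundary symbol, not from the paper


def pos_features(words, i, max_affix=5):
    """Observation features for the word at position i, following the template table."""

    def w(k):
        # Word at offset k from position i, or PAD outside the sentence.
        j = i + k
        return words[j] if 0 <= j < len(words) else PAD

    feats = {}

    # Unary context features: w[i-2] .. w[i+2].
    for k in (-2, -1, 0, 1, 2):
        feats[f"w[{k}]"] = w(k)

    # Binary context features: the five word pairs listed in the table.
    for a, b in ((-2, -1), (-1, 0), (0, 1), (1, 2), (-1, 1)):
        feats[f"w[{a}]|w[{b}]"] = f"{w(a)}|{w(b)}"

    # h_n / t_n: first and last n characters of the current word, n = 1..5.
    # Assumption: for words shorter than n the whole word is used.
    for n in range(1, max_affix + 1):
        feats[f"h{n}"] = words[i][:n]
        feats[f"t{n}"] = words[i][-n:]

    return feats


# Illustrative tokens in Latin transcription; not taken from the paper's corpus.
sentence = ["men", "kitabni", "oqudum"]
features = pos_features(sentence, 1)
```

Each call yields 20 observation features (5 unary, 5 binary, 5 prefix, 5 suffix). A dictionary of string-valued features per token is the input shape that common CRF toolkits accept, which is why the sketch returns one.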
Feature Template of the Morphological Tagging Model

| Features | Description |
| --- | --- |
| $m_{i-2}t_{i}$, $m_{i-1}t_{i}$, $m_{i}t_{i}$, $m_{i+1}t_{i}$, $m_{i+2}t_{i}$ | Unary context features of the morpheme |
| $m_{i-2}m_{i-1}t_{i}$, $m_{i-1}m_{i}t_{i}$, $m_{i}m_{i+1}t_{i}$, $m_{i+1}m_{i+2}t_{i}$, $m_{i-1}m_{i+1}t_{i}$ | Binary context features of the morpheme |
| $t_{i-1}t_{i}$ | Morphological tag transition feature |
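The morphological tagging template uses the same window shape over morphemes $m$ instead of words, without the character-affix features. As a hedged sketch, the template can be written as two offset tables that drive feature extraction. The names `UNARY_OFFSETS`, `BINARY_OFFSETS`, `morpheme_features`, the `<PAD>` symbol, and the example segmentation are hypothetical; whether the morpheme window crosses word boundaries is not stated in this excerpt, so the sketch simply treats its input as one flat morpheme sequence.

```python
PAD = "<PAD>"  # hypothetical boundary symbol, not from the paper

# Offsets read directly from the template table.
UNARY_OFFSETS = (-2, -1, 0, 1, 2)
BINARY_OFFSETS = ((-2, -1), (-1, 0), (0, 1), (1, 2), (-1, 1))


def morpheme_features(morphemes, i):
    """Observation features for the morpheme at position i."""

    def m(k):
        # Morpheme at offset k from position i, or PAD outside the sequence.
        j = i + k
        return morphemes[j] if 0 <= j < len(morphemes) else PAD

    feats = {f"m[{k}]": m(k) for k in UNARY_OFFSETS}
    feats.update({f"m[{a}]|m[{b}]": f"{m(a)}|{m(b)}" for a, b in BINARY_OFFSETS})
    return feats


# Illustrative segmentation only; not taken from the paper's corpus.
features = morpheme_features(["kitab", "ni", "oqu", "dum"], 0)
```

Keeping the offsets in data rather than in code makes it easy to try a narrower or wider window on a small corpus. The tag transition feature $t_{i-1}t_{i}$ is again left to the CRF itself.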
List of Morphological Tag Candidates of Words in the Sentence
Manually Tagged Corpus Format and Content Example
Details of Experimental Data

| | Number of sentences | Number of words (including punctuation marks) | Number of Uyghur words |
| --- | --- | --- | --- |
| Training set | 1000 | 12433 | 10391 |
| Development set | 200 | 2564 | 2151 |
| Test set | 200 | 2492 | 2075 |
Experimental Results

| Method | Stemming accuracy (%) | Morpheme segmentation accuracy (%) | POS accuracy (%) | Overall accuracy (%) |
| --- | --- | --- | --- | --- |
| Tag sequence Markov model | 90.18 | 83.25 | 86.17 | 75.13 |
| Joint CRF model | 91.98 | 85.79 | 92.7 | 77.95 |
| Tag sequence Markov model, $\alpha$=0.95 | 92.65 | 88.47 | 88.12 | 79.65 |
| Joint CRF model, $\alpha$=0.9 | 92.85 | 89.76 | 92.6 | 80.73 |
Analysis for the Influence of Filtering Rules on Morphological Tagging

| Method | Stemming accuracy (%) | Morpheme segmentation accuracy (%) | POS accuracy (%) | Overall accuracy (%) |
| --- | --- | --- | --- | --- |
| Joint CRF model, $\alpha$=0.9, $\beta$=0.1, filtering rules not used | 92.85 | 89.76 | 92.6 | 80.73 |
| Joint CRF model, $\alpha$=0.9, $\beta$=0.1, filtering rules used | 97.4 | 94.58 | 96.35 | 92.58 |
| Tag sequence transition model, $\alpha$=0.95, filtering rules used | 94.35 | 93.22 | 94.78 | 91.81 |
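To make the effect of the filtering rules easier to read, the two joint CRF rows above can be compared directly. The short sketch below computes, for each metric, the absolute gain in percentage points and the relative error reduction. The accuracy figures are copied from the table; the dictionary keys, the function name `gains`, and the choice of relative error reduction as a summary are assumptions made for illustration, not part of the paper.

```python
# Accuracies (%) of the joint CRF model (alpha=0.9, beta=0.1), copied from the table.
without_rules = {"stemming": 92.85, "segmentation": 89.76, "pos": 92.6, "overall": 80.73}
with_rules = {"stemming": 97.4, "segmentation": 94.58, "pos": 96.35, "overall": 92.58}


def gains(before, after):
    """Return {metric: (absolute gain in points, relative error reduction in %)}."""
    result = {}
    for metric, acc_before in before.items():
        abs_gain = after[metric] - acc_before
        # Error rate before is (100 - accuracy); the reduction is the share of it removed.
        rel_error_reduction = abs_gain / (100.0 - acc_before) * 100.0
        result[metric] = (round(abs_gain, 2), round(rel_error_reduction, 1))
    return result


for metric, (points, reduction) in gains(without_rules, with_rules).items():
    print(f"{metric}: +{points} points, {reduction}% of errors removed")
```

For the overall metric this gives a gain of 11.85 points (80.73 to 92.58), which is the largest improvement among the four columns.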
