Formalization of Morphological Rules for Kazakh Nouns in the New Latin Alphabet

Lena Zhetkenbay, Altynbek Sharipbay, Bibigul Razakhova, Gulmira Bekmanova, Alibek Barlybayev, Aizhan Nazyrova, Banu Yergesh

Abstract


This study presents a hybrid computational model for formalizing and predicting morphological inflections of Kazakh nouns written in the new Latin alphabet. The motivation stems from limitations in previous systems based on Cyrillic orthography, which often misrepresented key phonological features such as vowel harmony and consonant assimilation. The main objective is to develop a linguistically informed and computationally efficient system to support Natural Language Processing (NLP) for Kazakh during its transition to the Latin script. The methodology combines rule-based grammar formalization with a machine learning approach, specifically a Bayesian Regularization Backpropagation Neural Network (BR-BPNN). A manually curated dataset of 1,000 Latin-script Kazakh nouns was annotated for various morphological forms. Each word was encoded at the character level using a custom dictionary (kazlat_dict), with the word's final four letters forming its feature vector. Formal logic and regular expressions were used to model morphological rules such as pluralization and case endings, incorporating vowel harmony, consonant softness, and sonority. These rules provided the training labels for the BR-BPNN model. The trained model achieved 91.5% accuracy, 89.4% precision, and a correlation coefficient (R) above 0.98, supporting the effectiveness of the hybrid system. A user interface prototype was developed to demonstrate practical utility: users enter a root noun and receive suffix predictions with confidence scores and linguistic explanations. The novelty of this work lies in integrating linguistic theory with machine learning for a low-resource Turkic language. It offers a foundation for intelligent Kazakh language tools such as spell checkers, grammar correctors, and educational platforms. Future work will extend the system to other parts of speech and explore contextual modeling to improve the handling of ambiguous or irregular forms.
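The abstract states that the formal rules supply the training labels and that each noun is represented by its final four letters. The sketch below illustrates both steps for the plural suffix only. It rests on assumptions not taken from the paper: the 31-letter 2021 revision of the Kazakh Latin alphabet, the textbook -lar/-dar/-tar plural rule with a simplified vowel-harmony check (the glides u and y are skipped, and words with no decisive vowel default to back harmony), and a hypothetical `LETTER_INDEX` table standing in for kazlat_dict, whose actual contents are not given here. It does not reproduce the BR-BPNN model.

```python
import unicodedata

# Hypothetical stand-in for kazlat_dict: the 31 letters of the 2021 Latin alphabet
# mapped to codes 1..31; code 0 is reserved for left-padding short words.
ALPHABET = ["a", "ä", "b", "d", "e", "f", "g", "ğ", "h", "ı", "i", "j", "k", "l", "m", "n",
            "ñ", "o", "ö", "p", "q", "r", "s", "ş", "t", "u", "ū", "ü", "v", "y", "z"]
LETTER_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}

BACK_VOWELS = set("aoūı")    # а, о, ұ, ы
FRONT_VOWELS = set("äeöüi")  # ә, е, ө, ү, і
VOWELS = BACK_VOWELS | FRONT_VOWELS

# The final letter selects the suffix onset (textbook rule, simplified):
L_ONSET = VOWELS | set("ryu")  # after vowels and r, y (й), u (у): -lar/-ler
D_ONSET = set("lmnñzj")        # after l, m, n, ñ, z, j (ж): -dar/-der
# Any other final letter (voiceless consonants, b, v, g, d, ...) takes -tar/-ter.


def normalize(word: str) -> str:
    """NFC-normalize, then lowercase with İ -> i and I -> ı as in Kazakh Latin."""
    word = unicodedata.normalize("NFC", word.strip())
    return word.replace("İ", "i").replace("I", "ı").lower()


def is_front(word: str) -> bool:
    """Harmony from the last front/back vowel; u and y are skipped as glides."""
    for ch in reversed(word):
        if ch in FRONT_VOWELS:
            return True
        if ch in BACK_VOWELS:
            return False
    return False  # no decisive vowel: default to back harmony (simplification)


def plural_suffix(word: str) -> str:
    """Rule-based plural label, e.g. 'lar', 'der', 'tar'."""
    word = normalize(word)
    if not word:
        raise ValueError("empty word")
    last = word[-1]
    if last in L_ONSET:
        onset = "l"
    elif last in D_ONSET:
        onset = "d"
    else:
        onset = "t"
    return onset + ("er" if is_front(word) else "ar")


def final_four_features(word: str) -> list[int]:
    """Codes for the last four letters, left-padded with 0 for shorter words.

    Raises KeyError for letters outside the assumed alphabet.
    """
    tail = [LETTER_INDEX[ch] for ch in normalize(word)[-4:]]
    return [0] * (4 - len(tail)) + tail


if __name__ == "__main__":
    for noun in ["bala", "kitap", "söz", "üy", "tau", "kün", "qız"]:
        print(noun, noun + plural_suffix(noun), final_four_features(noun))
```

Running the script prints each noun, its rule-derived plural, and its four-position code vector; for example, kitap yields kitaptar and söz yields sözder. In a setup like the one the abstract describes, pairs of (feature vector, rule label) would form the training data for the classifier.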


Keywords


Alphabet; Sound System; Nouns; Conjunctions; Kazakh Language; Suffixes; Metalanguage; Morphological Rules; Natural Language Processing; Formal Model

Journal of Applied Data Sciences

ISSN: 2723-6471 (Online)
Organized by: Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia.
Website: http://bright-journal.org/JADS
Email: taqwa@amikompurwokerto.ac.id (principal contact)
       support@bright-journal.org (technical issues)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.