DEVELOPMENT OF STEMMING ALGORITHM FOR GURAGEGNA TEXT

Ebrahim, Mehbub

St. Mary's University Institutional Repository

Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/7697

Title:	DEVELOPMENT OF STEMMING ALGORITHM FOR GURAGEGNA TEXT
Authors:	Ebrahim, Mehbub
Keywords:	stemming algorithm; Guragegna stemmer; context-sensitive stemmer; iterative stemmer; Guragegna language
Issue Date:	Jun-2023
Publisher:	ST. MARY’S UNIVERSITY
Abstract:	Stemming is the process of stripping a word of its inflectional and derivational variants. It is crucial for many natural language processing applications. When assessing the relevance of a page to a user query that specifies only one word form, the varied word structures used in searching and indexing should be anticipated. Conflation methods can help improve the efficiency of an information retrieval (IR) system by condensing variant terms into a single form. Stemmers are employed in information retrieval to standardize as many similar terms and word patterns as possible so that they can be used in the retrieval procedure. This research required a solid understanding of Guragegna grammar as well as an examination of the language's inflectional and derivational affixes. The Gurage language generates several word forms from stems through affixation and reduplication (final, total, and frequentative). Prefixes, suffixes, and infixes are the frequently used affixes. Gurage often concatenates affixes, which can lead to very long words with a lot of semantic content. This study introduces the first stemming algorithm that conflates Guragegna word variants. The Gurage stemmer was implemented in Python. The researcher created small rule sets for related affixes in an attempt to keep the algorithm's structure straightforward. To develop the stemmer, a list of stop words and the experimental text document were acquired from various sources, along with a research article that covers the morphology of the Gurage language. The stemmer uses iterative, context-sensitive, and recoding methods to remove prefixes, suffixes, and reduplicated letters (final, total, and frequentative reduplication). In this experiment, removal was applied in the order prefix, suffix, and then letter reduplication. [...] in the evaluation process is contained in the data set.
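The removal order described above (prefix, then suffix, then reduplication) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the affix lists, the recoding rule, and the minimum stem length are hypothetical Latin-script placeholders, not real Guragegna morphology, and frequentative reduplication is not modeled. The sketch only shows what "iterative" (strip until no rule applies), "context-sensitive" (strip only if enough of the word remains), and "recoding" (rewrite the stem ending after stripping) could look like in a rule-based stemmer.

```python
# Hypothetical placeholder rules: NOT real Guragegna affixes.
PREFIXES = ("ya", "ta")
SUFFIXES = ("ena", "na")
RECODE_ENDINGS = {"iy": "i"}   # assumed recoding rule, for illustration only
MIN_STEM_LEN = 3               # assumed context condition


def _strip(word, affixes, at_start):
    """Iteratively remove affixes while the remainder stays long enough."""
    changed = True
    while changed:
        changed = False
        for affix in affixes:
            matches = word.startswith(affix) if at_start else word.endswith(affix)
            # Context-sensitive check: never strip below the minimum stem length.
            if matches and len(word) - len(affix) >= MIN_STEM_LEN:
                word = word[len(affix):] if at_start else word[:-len(affix)]
                changed = True
                break
    return word


def _recode(word):
    """Rewrite a stem ending left behind by affix removal."""
    for old, new in RECODE_ENDINGS.items():
        if word.endswith(old):
            return word[:-len(old)] + new
    return word


def _collapse_reduplication(word):
    """Undo total ("kabkab" -> "kab") or final ("salababa" -> "salaba") reduplication."""
    half = len(word) // 2
    if len(word) % 2 == 0 and half >= 2 and word[:half] == word[half:]:
        return word[:half]
    if len(word) >= 4 and word[-2:] == word[-4:-2]:
        return word[:-2]
    return word


def stem(word):
    """Apply the steps in the order prefix, suffix, recoding, reduplication."""
    word = word.lower()
    word = _strip(word, PREFIXES, at_start=True)
    word = _strip(word, SUFFIXES, at_start=False)
    word = _recode(word)
    return _collapse_reduplication(word)
```

With these placeholder rules, `stem("yatasabarena")` strips two prefixes and one suffix, while `stem("tana")` is left unchanged because removing either affix would leave fewer than three letters.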
The experimental text has 1,933 words, of which 1,266 resulted from the stemming procedure. Of these 1,266 words, 1,097 were stemmed correctly, an accuracy of 86.65%. The remaining 13.34% were stemmed incorrectly: over-stemming accounts for 7.97% (101) of the terms, while under-stemming accounts for 5.37%.
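The reported percentages can be checked with a short Python calculation. The totals (1,266 words, 1,097 correct, 101 over-stemmed) come from the abstract; the under-stemming count is not stated there and is derived here under the assumption that every incorrect stem is either over- or under-stemmed. The figures match the abstract when percentages are truncated to two decimals rather than rounded, which is also an assumption about how they were computed.

```python
TOTAL = 1266          # words evaluated (from the abstract)
CORRECT = 1097        # correctly stemmed words (from the abstract)
OVER = 101            # over-stemmed words (from the abstract)

WRONG = TOTAL - CORRECT   # incorrectly stemmed words
UNDER = WRONG - OVER      # derived, assuming errors are only over- or under-stemming


def pct(count, total=TOTAL):
    """Percentage truncated (not rounded) to two decimal places."""
    return (count * 10000 // total) / 100


accuracy = pct(CORRECT)
error_rate = pct(WRONG)
over_rate = pct(OVER)
under_rate = pct(UNDER)

print(accuracy, error_rate, over_rate, under_rate)
```

Rounding instead of truncating would give 13.35% and 7.98% for the error and over-stemming rates, so the small differences from 100% in the abstract's figures appear to be a truncation effect.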
URI:	http://hdl.handle.net/123456789/7697
Appears in Collections:	Master of computer science

Files in This Item:

File	Description	Size	Format
DEVELOPMENT OF STEMMING ALGORITHM FOR GURAGE_230710_104510.pdf		838.83 kB	Adobe PDF