St. Mary's University Institutional Repository

Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/7888
Full metadata record
DC Field | Value | Language
dc.contributor.author | Wolderufael, Yared | -
dc.date.accessioned | 2024-05-15T07:20:09Z | -
dc.date.available | 2024-05-15T07:20:09Z | -
dc.date.issued | 2024-02 | -
dc.identifier.uri | http://hdl.handle.net/123456789/7888 | -
dc.description.abstract | Textual communication is globally prevalent, with individuals relying on email and social networking platforms for information exchange. Typing complete text can be time-consuming, and word prediction systems offer a time-saving solution by anticipating the next word during data entry. Despite the development of language models for various languages, research on prediction models for Amharic is limited. Existing studies primarily use statistical language models for Amharic prediction, which struggle with data sparsity and fail to capture long-term dependencies. To address these limitations, this study proposes a deep learning approach for Amharic next-word prediction. A dataset is collected and preprocessed, yielding a vocabulary of 18,085 unique words. Bi-directional Long Short-Term Memory (Bi-LSTM) models are employed, along with popular pre-trained word embedding models (Word2vec, FastText, GloVe, and the Keras embedding layer) for feature extraction. Experiments cover various hyperparameter values and optimization methods (Adam and Nadam), which significantly influence model training and performance. Model accuracy is compared to identify the most effective solution for Amharic word sequence prediction, with accuracy used to assess the overall correctness of the prediction system. Among the tested models, FastText embeddings combined with the Bi-LSTM architecture and the Adam optimizer achieve the highest training accuracy (97.5%) and validation accuracy (95.6%), surpassing the other embedding methods. This research contributes to Amharic language model development, demonstrating the capacity to capture long-term dependencies and accurately predict the next word in Amharic text. The findings highlight the potential of Bi-LSTM-based approaches in enhancing text prediction systems. | en_US
dc.language.iso | en | en_US
dc.publisher | St. Mary's University | en_US
dc.subject | Word prediction, Amharic language, Bi-LSTM, Word embedding, FastText, Long-term dependencies | en_US
dc.title | WORD SEQUENCE PREDICTION FOR AMHARIC LANGUAGE USING DEEP LEARNING | en_US
dc.type | Thesis | en_US
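
The abstract above outlines the modelling setup: a Bi-LSTM classifier over a fixed vocabulary, fed by pre-trained word embeddings and trained with the Adam optimizer. Below is a minimal, hypothetical Keras sketch of that setup, included only for illustration; the vocabulary size matches the reported 18,085 words, but the sequence length, embedding dimension, LSTM units, and training data are assumptions and do not come from the thesis.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

VOCAB_SIZE = 18085   # unique Amharic words reported in the abstract
SEQ_LEN = 5          # assumed context window (number of preceding words)
EMBED_DIM = 300      # assumed dimension; pre-trained FastText vectors are often 300-d
LSTM_UNITS = 128     # assumed hidden size

model = Sequential([
    # In the study this layer would be initialized from a pre-trained
    # embedding matrix (e.g. FastText); here it is randomly initialized.
    Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
    Bidirectional(LSTM(LSTM_UNITS)),
    # Softmax over the vocabulary scores every candidate next word.
    Dense(VOCAB_SIZE, activation="softmax"),
])

# Adam optimizer and accuracy metric, matching the best-performing configuration.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy data with the expected shapes: X holds word-index context windows,
# y holds the index of the word that follows each window.
X = np.random.randint(0, VOCAB_SIZE, size=(32, SEQ_LEN))
y = np.random.randint(0, VOCAB_SIZE, size=(32,))
model.fit(X, y, epochs=1, verbose=0)

# The most probable next word for the first context window.
next_word_id = int(model.predict(X[:1], verbose=0).argmax())
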
Appears in Collections: Master of computer science

Files in This Item:
File | Description | Size | Format
18. Yared Wolderufael.pdf | | 3.23 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.