ENGLISH-WOLAYTTA MACHINE TRANSLATION USING STATISTICAL APPROACH

MARA, MELAKU

St. Mary's University Institutional Repository

Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/4453

Title:	ENGLISH-WOLAYTTA MACHINE TRANSLATION USING STATISTICAL APPROACH
Authors:	MARA, MELAKU
Keywords:	Machine translation; English documents in Wolaytta language
Issue Date:	Jul-2018
Publisher:	St. Mary's University
Abstract:	Machine translation is a technology for the automatic translation of text or speech from one natural language to another. There is a need to translate sentences between English and Wolaytta in order to make English documents available in the Wolaytta language and to reduce the language barrier. This study therefore develops an English-Wolaytta machine translation system using a statistical approach. To achieve this objective, a bilingual corpus of 30,000 parallel sentences was collected from the spiritual domain, along with a monolingual corpus of size 39,893 from different sources. The data were prepared in a format suitable for the development process (normalization, tokenization, lowercasing and cleaning) and divided into training, tuning and testing sets. The parallel sentences were aligned manually, and freely available tools were used for the remaining steps: the SRILM toolkit for language modeling, MGIZA++ for word-level alignment of the corpus using IBM Models 1-5, and Moses for decoding, all running on the Ubuntu operating system, which is well suited to the Moses environment. In addition, the unsupervised morpheme segmentation tool Morfessor was used to segment the Wolaytta text. Two sets of experiments were conducted, one on the unsegmented corpus and one on the segmented corpus, each using subsets of 5,000, 10,000, 15,000, 20,000, 25,000 and 30,000 parallel sentences. On these subsets, the unsegmented corpus achieved BLEU scores of 4.91%, 6.30%, 7.21%, 7.60%, 7.96% and 8.46% respectively, while the segmented corpus achieved BLEU scores of 9.83%, 11.38%, 12.70%, 12.77%, 12.93% and 13.21% respectively. Performance thus improved both as the corpus size increased and when the parallel sentences were segmented. Based on these experiments, the researcher concludes that a larger corpus and morphological segmentation lead to better performance. Future research should therefore focus on further improving the system by increasing the size of the corpus and applying morphological segmentation.
URI:	http://hdl.handle.net/123456789/4453
Appears in Collections:	Business Administration
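
The BLEU figures reported in the abstract can be compared side by side with a small calculation. The following is only an illustrative sketch: the score lists are copied from the abstract, while the names `SIZES`, `UNSEGMENTED`, `SEGMENTED` and `segmentation_gains` are hypothetical and do not come from the thesis.

```python
# BLEU scores (%) as reported in the abstract, in order of corpus subset size.
SIZES = [5000, 10000, 15000, 20000, 25000, 30000]       # parallel sentences
UNSEGMENTED = [4.91, 6.30, 7.21, 7.60, 7.96, 8.46]
SEGMENTED = [9.83, 11.38, 12.70, 12.77, 12.93, 13.21]


def segmentation_gains(sizes, unsegmented, segmented):
    """Map each corpus size to the absolute BLEU gain (segmented minus
    unsegmented), rounded to two decimals. Hypothetical helper."""
    return {n: round(s - u, 2) for n, u, s in zip(sizes, unsegmented, segmented)}


gains = segmentation_gains(SIZES, UNSEGMENTED, SEGMENTED)
for n in SIZES:
    print(f"{n:>6} sentences: +{gains[n]:.2f} BLEU points from segmentation")
```

On the reported numbers, the gain from segmentation stays between roughly 4.7 and 5.5 BLEU points at every corpus size, which is larger than the gain the unsegmented system obtains from growing the corpus from 5,000 to 30,000 sentences (8.46 - 4.91 = 3.55 points).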

Files in This Item:

File	Description	Size	Format
last cover.pdf		99.63 kB	Adobe PDF
melaku mara (thesis).pdf		572.58 kB	Adobe PDF
