St. Mary's University Institutional Repository

Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/5271
Full metadata record
DC Field | Value | Language
dc.contributor.author | Shikur, Musa | -
dc.date.accessioned | 2020-04-07T11:21:15Z | -
dc.date.available | 2020-04-07T11:21:15Z | -
dc.date.issued | 2019-01 | -
dc.identifier.uri | . | -
dc.identifier.uri | http://hdl.handle.net/123456789/5271 | -
dc.description.abstract | The emergence of Web technology has generated a massive amount of raw data by enabling Internet users to post their opinions, reviews, and comments on the web. Processing this raw data to extract useful information can be a very challenging task. Sentiment analysis involves extracting, understanding, classifying, and presenting the emotions and opinions expressed by users. We treated opinion mining as a text classification task and used unigrams as the feature set. Our experiments fall into three groups. In the first group (lexical classifier), we developed an algorithm that classifies reviews based on counts of opinion words. The performance of this algorithm was evaluated by comparing its output with the actual labels of the reviews. In the second group, three popular feature selection methods, Chi-Square, Mutual Information Gain, and the Galavotti-Sebastiani-Simi (GSS) coefficient, were compared on how well they select a subset of the feature set. For these comparisons, three supervised classifiers were used: Naïve Bayes, Logistic Regression, and SVM. Each classifier was run with each of the three feature selection methods using 750, 1000, 1250, and 1500 features. This allowed us to determine which combination of feature selection method, classifier, and number of features works best in our domain. In the third group, we combined the lexical classifier with machine learning sequentially. In this research work, hybrid sentiment classification was used to classify Amharic book reviews as positive or negative. The experiments were conducted on 600 Amharic book reviews collected from sources such as Facebook and personal blogs, as well as manually from individual book readers.
For machine learning, the experiments indicate that the Naïve Bayes algorithm, using the Mutual Information Gain feature selection method with 1500 features, performs best, with an accuracy of 93.33%. The experiments also indicate that the hybrid approach (87% accuracy) outperforms the lexical approach (74% accuracy) but not the machine learning approach (93.33% accuracy). | en_US
dc.language.iso | en | en_US
dc.publisher | St. Mary's University | en_US
dc.subject | Opinion, Sentiment Analysis | en_US
dc.subject | Lexicon-Based Classifier, Machine Learning, Hybrid Classifier | en_US
dc.title | A Hybrid Sentiment Classification for Amharic Book Reviews | en_US
dc.type | Thesis | en_US
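The abstract's first and third experiment groups (a count-based lexical classifier, and its sequential combination with a machine learning classifier) can be illustrated with a minimal sketch. This is not the thesis implementation: the word lists are hypothetical English stand-ins for an Amharic opinion lexicon, the function names are invented for illustration, and the rule "use the lexical label unless the counts are tied, otherwise defer to the machine learning classifier" is only one plausible reading of "sequentially".

```python
from typing import Callable, Iterable, Optional, Sequence

# Hypothetical stand-in lexicons; a real system would hold Amharic opinion words.
POSITIVE_WORDS = {"good", "excellent", "enjoyable"}
NEGATIVE_WORDS = {"bad", "boring", "weak"}


def lexical_classify(tokens: Iterable[str]) -> Optional[str]:
    """Label a review by comparing counts of positive and negative opinion words.

    Returns "positive", "negative", or None when the counts are tied
    (which includes reviews with no opinion words at all).
    """
    pos = neg = 0
    for tok in tokens:
        if tok in POSITIVE_WORDS:
            pos += 1
        elif tok in NEGATIVE_WORDS:
            neg += 1
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None


def hybrid_classify(tokens: Iterable[str],
                    ml_classifier: Callable[[Sequence[str]], str]) -> str:
    """Assumed sequential combination: lexicon first, ML classifier on ties."""
    token_list = list(tokens)
    label = lexical_classify(token_list)
    return label if label is not None else ml_classifier(token_list)


def accuracy(predicted: Sequence[Optional[str]], actual: Sequence[str]) -> float:
    """Fraction of reviews whose predicted label matches the actual label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

Here `ml_classifier` is any callable that maps a token list to a label, for example a trained Naïve Bayes model wrapped in a function. Comparing `accuracy` across the lexical, machine learning, and hybrid predictions corresponds to the kind of comparison the abstract reports.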
Appears in Collections: Master of computer science
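Of the three feature selection methods named in the abstract, Chi-Square is the simplest to illustrate. The sketch below scores unigrams by the standard two-by-two chi-square statistic over document presence and keeps the top k. It is an assumption-laden example rather than the thesis code: the function names are hypothetical, each review is represented as a set of unigrams, and ties are broken alphabetically purely to make the output deterministic.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def chi_square_scores(docs: Sequence[Set[str]],
                      labels: Sequence[str],
                      target: str) -> Dict[str, float]:
    """Chi-square score of each unigram with respect to the `target` class."""
    n = len(docs)
    n_target = sum(1 for y in labels if y == target)
    in_target: Counter = Counter()  # documents of the target class containing a term
    in_other: Counter = Counter()   # documents of other classes containing a term
    for terms, y in zip(docs, labels):
        (in_target if y == target else in_other).update(terms)

    scores: Dict[str, float] = {}
    for term in set(in_target) | set(in_other):
        a = in_target[term]        # target class, term present
        b = in_other[term]         # other classes, term present
        c = n_target - a           # target class, term absent
        d = (n - n_target) - b     # other classes, term absent
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

In the setting the abstract describes, `k` would take the values 750, 1000, 1250, and 1500, and the selected unigrams would then be passed to each classifier in turn.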

Files in This Item:
File | Description | Size | Format
A Hybrid Sentiment Classification -Muisa-Shikur.pdf | | 2.08 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.