Hate Speech Detection from Facebook Social Media Posts and Comments in Tigrigna language

Bahre, Weldemariam

St. Mary's University Institutional Repository

Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/6929

Title:	Hate Speech Detection from Facebook Social Media Posts and Comments in Tigrigna language
Authors:	Bahre, Weldemariam
Keywords:	Tigrigna Hate Speech Detection, Facebook Posts and Comments, Machine Learning Classifier
Issue Date:	Jan-2022
Publisher:	ST. MARY’S UNIVERSITY
Abstract:	In recent years, hate speech on social media has become common in the Ethiopian online community, particularly as the number of users has grown substantially. The number of Facebook users writing in Tigrigna, one of the country's languages, has also increased in recent years, and hate speech in Tigrigna has increased along with it, possibly because of political instability. Hate speech on social media can spread quickly among online users and could escalate into violence and hate crime. To address this problem, this research proposes a hate speech detection model built with machine learning and text-mining feature extraction techniques. Hate speech data written in Tigrigna was collected from public Facebook pages and manually labeled into hate and hate-free classes to build a binary-class dataset. The research took an experimental approach to determine the best combination of machine learning algorithm and feature extraction technique for modeling. Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF) classification algorithms were used to construct hate speech detection models on the whole dataset, with features extracted from word unigrams, bigrams, trigrams, combined n-grams, and TF-IDF. The experimental results show that the Naïve Bayes classifier with TF-IDF feature extraction achieved slightly better performance than the SVM and RF models, with 79% accuracy. This is a promising result for hate speech detection in Tigrigna. Because no existing dataset was available for experimentation, the model was built from limited data. We therefore recommend preparing a standard corpus for hate speech detection in local languages, including Tigrigna.
URI:	http://hdl.handle.net/123456789/6929
Appears in Collections:	Master of computer science
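
The abstract reports that Naïve Bayes with TF-IDF features over word n-grams performed best. As a rough illustration only, the following is a minimal pure-Python sketch of that kind of pipeline, not the thesis implementation. The function and class names, the smoothing choices, and the toy training strings (English placeholder tokens, not data from the thesis) are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_min=1, n_max=3):
    """Return word n-grams (unigrams to trigrams by default) as space-joined strings."""
    tokens = text.split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram seen in training."""
    n_docs = len(docs)
    doc_freq = Counter()
    for grams in docs:
        doc_freq.update(set(grams))
    return {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in doc_freq.items()}


def tfidf(grams, idf):
    """TF-IDF weights for one document; n-grams unseen in training are dropped."""
    term_freq = Counter(g for g in grams if g in idf)
    return {g: count * idf[g] for g, count in term_freq.items()}


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights (hypothetical helper class)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength

    def fit(self, texts, labels):
        docs = [word_ngrams(t) for t in texts]
        self.idf = fit_idf(docs)
        label_counts = Counter(labels)
        self.classes = sorted(label_counts)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        # Accumulate TF-IDF mass per n-gram for each class.
        self.weight_sum = {c: defaultdict(float) for c in self.classes}
        for grams, label in zip(docs, labels):
            for g, w in tfidf(grams, self.idf).items():
                self.weight_sum[label][g] += w
        self.total = {c: sum(self.weight_sum[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        feats = tfidf(word_ngrams(text), self.idf)
        vocab_size = len(self.idf)
        scores = {}
        for c in self.classes:
            denom = self.total[c] + self.alpha * vocab_size
            score = self.log_prior[c]
            for g, w in feats.items():
                likelihood = (self.weight_sum[c].get(g, 0.0) + self.alpha) / denom
                score += w * math.log(likelihood)
            scores[c] = score
        return max(scores, key=scores.get)


# Toy placeholder data (assumption): stand-ins for labeled posts, not thesis data.
train_texts = ["bad bad group", "group is bad", "nice day today", "today is nice"]
train_labels = ["hate", "hate", "hate-free", "hate-free"]

model = TfidfNaiveBayes().fit(train_texts, train_labels)
print(model.predict("bad group"))
print(model.predict("nice today"))
```

A real experiment along the lines the abstract describes would also need Tigrigna-aware tokenization, a held-out test split for measuring accuracy, and the SVM and Random Forest baselines, none of which this sketch covers.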

Files in This Item:

File	Description	Size	Format
Hate speech detection for Tigrigna (Signed ).pdf		2.05 MB	Adobe PDF
