Abstract: | Credit risk is an important factor in bank financial performance, and the ability to
predict it enables institutions such as banks to manage potential risks and maintain their
profitability. Accurate credit risk prediction allows banks to make informed decisions by
identifying in advance the customers who are likely to default. In this study, multiple machine
learning approaches are applied to Awash Bank customer data to build a model
that predicts credit risk. Missing values in numerical features are filled using the mean,
while categorical features are filled with the mode. Categorical features are encoded using
Label Encoding, except the 'Branch' variable, which, due to its high cardinality of 124
unique values, is encoded using the Hasher function, a method suited to high-cardinality
features. The dataset is first split into training and test sets using an 80:20 ratio, and
10% of the training set is then reserved for validation. Key features are identified by applying
correlation analysis and the ExtraTreesClassifier, and class imbalance is handled using the
SMOTE oversampling approach to avoid bias toward the majority class. Five machine learning
models—XGBoost, CatBoost, Random Forest, Support Vector Machine (SVM), and Deep Neural
Networks (DNN)—are trained on the dataset and tested for accuracy, precision, recall, and F1
score. Hyperparameter tuning is performed using RandomizedSearchCV() to optimize the
performance of each selected model. The results show that the XGBoost algorithm
outperformed the others, with an accuracy of 92.2%, followed by CatBoost and Random
Forest. This study contributes to the limited research on credit risk prediction in the Ethiopian
banking sector by utilizing real data from Awash Bank and demonstrating the potential for
machine learning, particularly ensemble methods such as XGBoost, to improve credit risk
management in the banking industry. However, a major limitation of this study is the reliance on
a limited dataset focused exclusively on loans, which may not fully represent the diverse
customer base of Awash Bank, particularly those seeking other types of credit products. Future
research could address this limitation by incorporating additional data sources or conducting
longitudinal studies to enhance predictive accuracy and generalizability. |
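
Reading the split described in the abstract as an 80:20 train-test split, with 10% of the training portion then held out for validation, the effective proportions come to about 72% training, 8% validation, and 20% test. The following is a minimal sketch of that arithmetic using only the Python standard library. The function name `split_indices`, its parameters, the seed, and the row count of 1000 are hypothetical and chosen for illustration; the abstract does not state which tooling or dataset size the study used.

```python
import random


def split_indices(n_rows, test_frac=0.20, val_frac_of_train=0.10, seed=0):
    """Shuffle row indices, then split them into train/validation/test lists.

    Assumption: the test set is carved out first (20% of all rows), and the
    validation set is 10% of the remaining training rows.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    n_test = round(n_rows * test_frac)
    test = indices[:n_test]
    remaining = indices[n_test:]

    n_val = round(len(remaining) * val_frac_of_train)
    val = remaining[:n_val]
    train = remaining[n_val:]
    return train, val, test


# Illustrative row count only; not the size of the study's dataset.
train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 720 80 200
```

Under this reading the three subsets are disjoint and together cover every row. If oversampling such as SMOTE is used, it would normally be applied to the training subset only, so that the validation and test sets keep the original class distribution.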