Friday, 7 February 2020

Predicting and explaining corruption across countries: A machine learning approach

an article by Marcio Salles Melo Lima (Metalsider, Brasil, and Oklahoma State University, USA) and Dursun Delen (Oklahoma State University, USA) published in Government Information Quarterly Volume 37 Issue 1 (January 2020)

Highlights
  • Corruption is ubiquitous and perceived as a significant challenge for modern societies.
  • This study approaches corruption from a predictive analytics perspective.
  • The random forest is found to be the most accurate classification technique.
  • Government integrity and property rights were among the most predictive variables.

Abstract

In the era of Big Data, Analytics, and Data Science, corruption is still ubiquitous and is perceived as one of the major challenges of modern societies.

A large body of academic studies has attempted to identify and explain the potential causes and consequences of corruption, at varying levels of granularity, mostly through theoretical lenses by using correlations and regression-based statistical analyses.

The present study approaches the phenomenon from the predictive analytics perspective by employing contemporary machine learning techniques to discover the most important corruption perception predictors based on enriched/enhanced nonlinear models with a high level of predictive accuracy.

Specifically, within the multi-class classification modelling setting that is employed herein, the Random Forest (an ensemble-type machine learning algorithm) is found to be the most accurate prediction/classification model, followed by Support Vector Machines and Artificial Neural Networks.

From a practical standpoint, the enhanced predictive power of machine learning algorithms coupled with a multi-source database revealed the most relevant corruption-related information, contributing to the related body of knowledge and generating actionable insights for administrators, scholars, citizens, and politicians.
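
For readers who want a feel for this kind of model comparison, here is a minimal sketch using scikit-learn. It is not the authors' pipeline: the data are synthetic (generated with `make_classification`), and the three corruption classes, the eight indicators, the hyperparameters, and the 5-fold cross-validation are all assumptions made purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for a country-level dataset.
# 300 rows, 8 numeric indicators, 3 classes (e.g. low / medium / high).
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=5,
    n_redundant=1,
    n_classes=3,
    random_state=0,
)

# The three model families named in the abstract; settings are illustrative.
# SVM and the neural network get standardized inputs, which they usually need.
models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "neural_net": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print the models from most to least accurate on this synthetic data.
for name, accuracy in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {accuracy:.3f}")
```

The ordering printed here says nothing about the article's findings; it only shows how several classifiers can be scored on the same multi-class task under the same cross-validation scheme.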

The variable importance results indicated that government integrity, property rights, judicial effectiveness, and the education index are the most influential factors in determining a country's corruption level.
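
A variable importance ranking of this sort can be read off a fitted random forest. The sketch below uses scikit-learn's impurity-based `feature_importances_` attribute, which may or may not match the importance measure used in the article. The data are synthetic and the column names (`indicator_0` and so on) are hypothetical placeholders, so the resulting order carries no real-world meaning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# ASSUMPTION: hypothetical placeholder names for six country-level indicators.
feature_names = [f"indicator_{i}" for i in range(6)]

# ASSUMPTION: synthetic data with three classes standing in for corruption levels.
X, y = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    random_state=0,
)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

With real data, the placeholder names would be replaced by the actual indicator columns, and the top of the ranking would show which variables the forest relied on most.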

Full text (PDF 15pp)

Labels:
corruption_perception, machine_learning, predictive_modelling, random_forest, society_policies_and_regulations, government_integrity, social_development

