Tuesday, 14 January 2020

Rapid relevance classification of social media posts in disasters and emergencies: A system and evaluation featuring active, incremental and online learning

Marc-André Kaufhold (PEASEC, Technische Universität Darmstadt, Germany and University of Siegen, Germany), Markus Bayer and Christian Reuter (PEASEC, Technische Universität Darmstadt, Germany). Published in Information Processing and Management, Volume 57, Issue 1 (January 2020).

Highlights

  • Abstract and precise relevance criteria for emergency services and classifiers.
  • Batch learning for relevance classification using precise relevance criteria.
  • Active learning for rapid classification during time-critical disasters.
  • Incremental learning for real-time classifier quality prediction during labeling.
  • Feedback learning allowing users to correct misclassifications reactively.

Abstract

The research field of crisis informatics examines, among other things, the potentials and barriers of social media use during disasters and emergencies. Social media allow emergency services to receive valuable information (e.g., eyewitness reports, pictures, or videos). However, the vast amount of data generated during large-scale incidents can lead to information overload.

Research indicates that supervised machine learning techniques are suitable for identifying relevant messages and filtering out irrelevant ones, thus mitigating information overload. Still, they require a considerable amount of labelled data, clear criteria for relevance classification, a usable interface to facilitate the labelling process and a mechanism to rapidly deploy retrained classifiers.

To overcome these issues, we present

  1. a system for social media monitoring, analysis and relevance classification,
  2. abstract and precise criteria for relevance classification in social media during disasters and emergencies,
  3. the evaluation of a well-performing Random Forest algorithm for relevance classification incorporating metadata from social media into a batch learning approach (e.g., 91.28%/89.19% accuracy, 98.3%/89.6% precision and 80.4%/87.5% recall with a fast training time with feature subset selection on the European floods/BASF SE incident datasets), as well as
  4. an approach and preliminary evaluation for relevance classification including active, incremental and online learning to reduce the amount of required labelled data and to correct misclassifications of the algorithm by feedback classification.

Using the latter approach, we achieved a well-performing classifier on the European floods dataset while requiring only a quarter of the labelled data needed by the traditional batch learning approach. Although the effect was smaller on the BASF SE incident dataset, a substantial improvement was still observed.
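
To illustrate how active and online learning can reduce the labelling effort, here is a minimal sketch of uncertainty sampling. It is not the authors' implementation: the paper evaluates a Random Forest, whereas this sketch swaps in a tiny logistic model updated one post at a time so that it stays self-contained. All names (`predict_proba`, `update`, `most_uncertain`, `active_learning`, `oracle`) and the two toy features (a bias term and a centred keyword score) are assumptions made for illustration only.

```python
import math


def predict_proba(weights, features):
    """Logistic model: estimated probability that a post is relevant."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def update(weights, features, label, lr=0.5):
    """One online (incremental) gradient step on a single labelled post."""
    error = label - predict_proba(weights, features)
    return [w + lr * error * x for w, x in zip(weights, features)]


def most_uncertain(weights, pool):
    """Index of the unlabelled post whose probability is closest to 0.5."""
    return min(range(len(pool)),
               key=lambda i: abs(predict_proba(weights, pool[i]) - 0.5))


def active_learning(pool, oracle, budget):
    """Ask the oracle (a human labeller) only about the most uncertain posts."""
    weights = [0.0] * len(pool[0])
    remaining = list(pool)  # copy so the caller's pool is left untouched
    for _ in range(budget):
        features = remaining.pop(most_uncertain(weights, remaining))
        weights = update(weights, features, oracle(features))
    return weights


# Toy data (hypothetical): each post is [bias, centred keyword score].
scores = (-0.45, -0.35, -0.25, -0.15, -0.05, 0.05, 0.15, 0.25, 0.35, 0.45)
pool = [[1.0, s] for s in scores]

asked = []  # records which posts the simulated labeller was shown


def oracle(features):
    """Simulated labeller: a post is relevant if its keyword score is positive."""
    asked.append(features)
    return int(features[1] > 0)


weights = active_learning(pool, oracle, budget=4)
print("labels requested:", len(asked), "of", len(pool))
print("learned weights:", weights)
```

In this sketch only `budget` posts are ever shown to the labeller, and the model is usable after every single label, which mirrors the idea of retraining rapidly while labelling is still in progress. A feedback step in the same spirit would call `update` again with the corrected label of a misclassified post.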

Full text (PDF 32pp)

Labels:
crisis_management, information_overload, relevance_classification, social_media, supervised_machine_learning

