Highlights
- Abstract and precise relevance criteria for emergency services and classifiers.
- Batch learning for relevance classification using precise relevance criteria.
- Active learning for rapid classification during time-critical disasters.
- Incremental learning for real-time classifier quality prediction during labeling.
- Feedback learning allowing users to correct misclassifications reactively.
Abstract
The research field of crisis informatics examines, among other things, the potentials and barriers of social media use during disasters and emergencies. Social media allow emergency services to receive valuable information (e.g., eyewitness reports, pictures, or videos). However, the vast amount of data generated during large-scale incidents can lead to information overload.
Research indicates that supervised machine learning techniques are suitable for identifying relevant messages and filtering out irrelevant ones, thus mitigating information overload. Still, they require a considerable amount of labelled data, clear criteria for relevance classification, a usable interface to facilitate the labelling process, and a mechanism to rapidly deploy retrained classifiers.
To overcome these issues, we present
- a system for social media monitoring, analysis and relevance classification,
- abstract and precise criteria for relevance classification in social media during disasters and emergencies,
- the evaluation of a well-performing Random Forest algorithm for relevance classification that incorporates metadata from social media into a batch learning approach (e.g., 91.28%/89.19% accuracy, 98.3%/89.6% precision and 80.4%/87.5% recall, with a fast training time using feature subset selection, on the European floods/BASF SE incident datasets), as well as
- an approach and preliminary evaluation for relevance classification including active, incremental and online learning to reduce the amount of required labelled data and to correct misclassifications of the algorithm through feedback classification.
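To illustrate how active learning can reduce the amount of labelled data, here is a minimal sketch of uncertainty sampling, one common query strategy. The abstract does not say which strategy the paper uses, so this is an assumption; the function name, the example messages and the probabilities are all hypothetical.

```python
from typing import List, Tuple


def select_most_uncertain(scored: List[Tuple[str, float]], k: int) -> List[str]:
    """Return the k messages whose predicted probability of being
    relevant is closest to 0.5, i.e. where the classifier is least sure."""
    ranked = sorted(scored, key=lambda item: abs(item[1] - 0.5))
    return [text for text, _ in ranked[:k]]


# Hypothetical classifier outputs: (message, predicted probability of "relevant").
scored_messages = [
    ("Water level rising fast near the bridge", 0.93),
    ("Is the road to the plant closed?", 0.52),
    ("Great concert last night!", 0.04),
    ("Smoke visible, not sure where from", 0.41),
]

# Ask the human annotator to label only the two least certain messages.
to_label = select_most_uncertain(scored_messages, k=2)
print(to_label)
```

In a labelling loop of this kind, the newly labelled messages would be added to the training set and the classifier retrained or incrementally updated before the next selection round.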
Full text (PDF 32pp)
Labels:
crisis_management, information_overload, relevance_classification, social_media, supervised_machine_learning