Active learning on Indonesian Twitter sentiment analysis using uncertainty sampling

Muhaza Liebenlito, Nur Inayah, Esti Choerunnisa, Taufik Edy Sutanto, Suma Inna

Abstract


Sentiment analysis of social media is a rapidly developing research area. It is typically framed as a supervised learning task, which requires annotated data, and the annotation process is notoriously time-consuming. Active learning addresses this problem: annotators label only a small subset of the dataset, and a sampling strategy chooses which of the remaining unlabeled examples to annotate next. This study compares two such strategies, random sampling and margin sampling (an uncertainty sampling strategy), applied to logistic regression and random forest models. We evaluate both the model performance and the data savings these strategies achieve for traditional machine learning sentiment analysis on Indonesian Twitter data. The dataset has two labels: positive and negative sentiment. Our results show that active learning can substantially reduce the amount of training data needed, saving up to 65% of the training data otherwise required to reach peak model accuracy. The best model is a random forest with margin sampling, which yields an accuracy of 81.12% and an F1 score of 88.60%. These results support active learning, particularly random forest combined with margin sampling, as a way to make sentiment analysis on social media more efficient in both model performance and annotation effort.
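To make the two query strategies concrete, the following is a minimal sketch in plain Python and is not the authors' implementation. It assumes that a classifier has already produced class probabilities for each unlabeled tweet; the function names (`margin_scores`, `margin_sampling`, `random_sampling`) and the example probabilities are hypothetical and chosen only for illustration. Margin sampling queries the examples whose two most probable classes are closest together, while random sampling ignores the model entirely.

```python
import random


def margin_scores(probabilities):
    """Gap between the two most probable classes for each example.

    Assumes every row holds at least two class probabilities.
    A small gap means the model is uncertain about that example.
    """
    scores = []
    for row in probabilities:
        top_two = sorted(row, reverse=True)[:2]
        scores.append(top_two[0] - top_two[1])
    return scores


def margin_sampling(probabilities, batch_size):
    """Indices of the batch_size examples with the smallest margin."""
    scores = margin_scores(probabilities)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:batch_size]


def random_sampling(pool_size, batch_size, seed=0):
    """Baseline: pick batch_size pool indices uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), batch_size)


# Hypothetical [negative, positive] probabilities for four unlabeled tweets.
pool_probabilities = [
    [0.95, 0.05],  # confident negative
    [0.52, 0.48],  # very uncertain
    [0.30, 0.70],  # fairly confident positive
    [0.45, 0.55],  # uncertain
]

queried = margin_sampling(pool_probabilities, batch_size=2)
print(queried)  # [1, 3]: the two tweets closest to the decision boundary
```

In a full active learning loop, the queried examples would be sent to an annotator, added to the labeled set, and the model retrained before the next round of querying.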


Keywords


active learning; uncertainty sampling; logistic regression; random forest; sentiment analysis


Journal of Applied Data Sciences

ISSN : 2723-6471 (Online)
Organized by : Department of Information System, Universitas Amikom Purwokerto, Indonesia; Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia.
Website : http://bright-journal.org/JADS
Email : taqwa@amikompurwokerto.ac.id (principal contact)
    husniteja@uinjkt.ac.id (managing editor)
    support@bright-journal.org (technical issues)

 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0