Enhancing Aspect-Based Sentiment Analysis in Tourism Reviews Through Hybrid Data Augmentation
Abstract
The increasing reliance on online reviews in tourism has made User-Generated Content (UGC) an invaluable resource for understanding visitor perceptions. However, extracting meaningful insights from these reviews remains challenging due to their unstructured nature, aspect imbalance, and the prevalence of code-mixing between languages such as Indonesian and English—particularly in multicultural destinations like Bali. Aspect-Based Sentiment Analysis (ABSA) offers a promising solution by associating sentiment polarity with specific aspects of tourist experiences. Yet, its performance is often constrained by limited and imbalanced datasets, especially for underrepresented aspects such as sanitation and amenities. This study proposes a hybrid data augmentation framework that integrates three complementary strategies: generative augmentation using ChatGPT, semantic filtering via Sentence-BERT (SBERT), and domain refinement through Masked Language Modeling (MLM). The framework is designed to improve ABSA performance on multilingual tourism reviews by generating synthetic aspect-relevant data while preserving semantic integrity and contextual nuance. Using 398 reviews of Kuta Beach in Bali, we evaluate the effectiveness of the proposed approach across five tourism aspects: scenery, dusk, surf, amenities, and sanitation. Results show that the hybrid strategy reduces hallucination rates from 12% (using ChatGPT alone) to 3.8%, increases F1-scores for underrepresented aspects by up to 5.1%, and improves cross-lingual alignment (Cohen’s κ = 0.78). These improvements demonstrate the synergy between generative and semantic augmentation in addressing real-world ABSA challenges. The proposed method not only advances the state of multilingual ABSA but also offers practical implications for tourism analytics, allowing destination managers to better understand and respond to aspect-specific visitor feedback. 
The framework is extensible to other low-resource domains, where linguistic diversity and data scarcity present similar limitations.
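The semantic filtering stage described above can be illustrated with a minimal sketch. It assumes that a synthetic review is kept only when its embedding is sufficiently close to the embedding of the seed review it was generated from. The abstract does not state the similarity measure or the cut-off, so the cosine similarity and the threshold of 0.5 used here are assumptions. The function `toy_embed` is a hypothetical bag-of-words stand-in for an SBERT encoder, used only so the example runs without external dependencies, and the example reviews are invented for illustration.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an SBERT encoder: a bag-of-words count vector."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_candidates(
    seed: str,
    candidates: List[str],
    embed: Callable[[str], Vector] = toy_embed,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Keep only the generated reviews that stay semantically close to the seed."""
    seed_vec = embed(seed)
    return [c for c in candidates if cosine(seed_vec, embed(c)) >= threshold]


# Illustrative data: one faithful paraphrase and one off-aspect generation.
seed_review = "the beach toilets were dirty"
generated = [
    "the beach toilets were very dirty",
    "the sunset at kuta was beautiful",
]
kept = filter_candidates(seed_review, generated)
print(kept)
```

In a real pipeline, `embed` would be replaced by a sentence encoder that returns dense vectors, and the threshold would be tuned on held-out data rather than fixed in advance.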
Journal of Applied Data Sciences
ISSN: 2723-6471 (Online)
Organized by: Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia
Website: http://bright-journal.org/JADS
Email: taqwa@amikompurwokerto.ac.id (principal contact), support@bright-journal.org (technical issues)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0