Multi-Label Classification of Indonesian Voice Phishing Conversations: A Comparative Study of XLM-RoBERTa and ELECTRA

Ahmad Hidayat, Sarifuddin Madenda, Hustinawaty Hustinawaty

Abstract


Mobile phones have become a primary means of communication, but cybercriminals have exploited their spread, particularly through voice phishing schemes. Voice phishing is a form of social engineering fraud carried out over telephone conversations to illegally obtain personal or financial information. It is also growing more complex: a single conversation may involve several fraudulent schemes at once, so multi-label classification is needed to identify every fraud motive present. Previous studies have predominantly used single-label approaches and foreign-language data, which makes them less relevant to the Indonesian language context, and they do not produce speaker segmentation outputs for conversational analysis. This study addresses that gap by developing a multi-label voice phishing classification system specifically for Indonesian telephone conversations. Audio data were collected from open sources and simulated recordings, yielding 300 samples labeled with six categories: five phishing modes and one non-phishing category. The proposed system includes a preprocessing pipeline of noise reduction, speaker segmentation, automatic transcription, and text cleaning, designed to preserve the context of two-way conversations. Two transformer-based models, XLM-RoBERTa and ELECTRA, are employed to identify the fraud schemes that may co-occur within a single conversation. The dataset was split into training, validation, and testing sets using two split ratios for performance evaluation. Several hyperparameter combinations were tested to find the best-performing model configuration. Evaluation was conducted using a supervised learning approach and various performance metrics.
The experimental results show that XLM-RoBERTa achieved the highest average accuracy, 97.04 ± 1.15%, and the highest average F1-score, 92.66 ± 2.59%. These results highlight the novelty of applying multi-label classification to voice phishing detection in the Indonesian language context and contribute to more effective fraud identification in real-world telephony systems.
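The multi-label setup described above can be illustrated with a minimal sketch. It is not the authors' implementation. The label names are placeholders, since the abstract does not name the five phishing modes. The 0.5 decision threshold and the choice of label-wise accuracy and micro-averaged F1 are assumptions too, because the abstract does not say how its accuracy and F1-score are averaged.

```python
import math

# Placeholder label names (assumption): the abstract specifies five phishing
# modes plus one non-phishing category but does not name the modes.
LABELS = ["mode_1", "mode_2", "mode_3", "mode_4", "mode_5", "non_phishing"]


def to_multi_hot(active_labels):
    """Encode the set of labels present in one conversation as a 0/1 vector."""
    return [1 if name in active_labels else 0 for name in LABELS]


def predict(logits, threshold=0.5):
    """Apply an independent sigmoid to each label's logit and threshold it.

    Unlike softmax, this lets several labels be active for one conversation.
    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    return [1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0 for z in logits]


def label_accuracy(y_true, y_pred):
    """Fraction of individual label decisions that are correct."""
    pairs = [(t, p) for t_row, p_row in zip(y_true, y_pred)
             for t, p in zip(t_row, p_row)]
    return sum(1 for t, p in pairs if t == p) / len(pairs)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives over all labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Two toy conversations: one combining two phishing modes, one benign.
y_true = [to_multi_hot({"mode_1", "mode_3"}), to_multi_hot({"non_phishing"})]

# Made-up logits standing in for a classifier head's raw outputs.
logits = [
    [2.0, -1.0, 0.5, -3.0, -2.0, -4.0],
    [-2.0, 1.0, -1.0, -3.0, -2.0, 3.0],
]
y_pred = [predict(row) for row in logits]

print("label-wise accuracy:", label_accuracy(y_true, y_pred))
print("micro F1:", micro_f1(y_true, y_pred))
```

Each label gets its own sigmoid, so a conversation that mixes two schemes can receive both labels, which a single-label softmax classifier cannot express.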



Keywords


Voice Phishing; Indonesian Language; Speaker Segmentation; Multi-Label Classification; Transformer Model


Journal of Applied Data Sciences

ISSN : 2723-6471 (Online)
Organized by : Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia.
Website : http://bright-journal.org/JADS
Email : taqwa@amikompurwokerto.ac.id (principal contact)
    support@bright-journal.org (technical issues)

 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0