Efficient Web Mining on MyAnimeList: A Concurrency-Driven Approach Using the Go Programming Language

Muhammad Daffa Arviano Putra, Deshinta Arrova Dewi, Wahyuningdiah Trisari Harsanti Putri, Harry Tursulistyono Yani Achsan

Abstract


Anime is a globally popular form of entertainment, and the industry has grown rapidly in recent years. MyAnimeList, the largest community-driven platform for anime enthusiasts, holds a wealth of anime data, yet existing publicly available datasets are often outdated and incomplete. This hampers data science research, because the growing volume of anime information calls for more efficient data extraction methods. This research addresses the problem by developing a concurrent web mining program in the Go programming language. Using Go's concurrency capabilities, the program extracted anime data from MyAnimeList by iterating through anime pages from ID 1 to 52,991. To mitigate rate limits and server timeouts, we implemented a two-phase execution strategy. The program gathered 23,105 anime records within 8.5 hours. The extracted data has been transformed into a comprehensive dataset and made publicly available in CSV format. This research demonstrates the effectiveness of concurrent web mining for large-scale data extraction and offers a resource for future data-driven research on the anime industry.
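
The abstract does not spell out how the two-phase strategy works, so the following Go sketch is an illustration under stated assumptions and not the authors' implementation. It assumes that phase one visits every ID with a pool of goroutines and records the IDs that failed for transient reasons (such as a timeout or rate limit), and that phase two revisits only those IDs at lower concurrency. The names fetchFunc, crawl, twoPhase, errNotFound, and newFakeFetch are hypothetical, and the fetcher is simulated so that the example runs without network access.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// fetchFunc retrieves the record for one anime ID (hypothetical signature).
type fetchFunc func(id int) (string, error)

// errNotFound marks IDs that have no page; these are skipped, not retried.
var errNotFound = errors.New("not found")

// crawl fetches every ID using a fixed pool of worker goroutines.
// It returns the records found and the sorted IDs that failed transiently.
func crawl(ids []int, workers int, fetch fetchFunc) (map[int]string, []int) {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	records := make(map[int]string)
	var failed []int
	var mu sync.Mutex // guards records and failed
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				rec, err := fetch(id)
				mu.Lock()
				switch {
				case err == nil:
					records[id] = rec
				case errors.Is(err, errNotFound):
					// No page exists for this ID; nothing to retry.
				default:
					failed = append(failed, id)
				}
				mu.Unlock()
			}
		}()
	}
	for _, id := range ids {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
	sort.Ints(failed)
	return records, failed
}

// twoPhase runs a full pass, then retries transient failures one at a time
// (assumed interpretation of the two-phase strategy).
func twoPhase(ids []int, workers int, fetch fetchFunc) (map[int]string, []int) {
	records, failed := crawl(ids, workers, fetch)
	retried, stillFailed := crawl(failed, 1, fetch)
	for id, rec := range retried {
		records[id] = rec
	}
	return records, stillFailed
}

// newFakeFetch simulates a site: IDs divisible by 3 do not exist, and the
// remaining IDs divisible by 5 time out on their first attempt only.
func newFakeFetch() fetchFunc {
	var mu sync.Mutex
	attempts := make(map[int]int)
	return func(id int) (string, error) {
		mu.Lock()
		attempts[id]++
		n := attempts[id]
		mu.Unlock()
		switch {
		case id%3 == 0:
			return "", errNotFound
		case id%5 == 0 && n == 1:
			return "", errors.New("simulated timeout")
		}
		return fmt.Sprintf("anime-%d", id), nil
	}
}

func main() {
	ids := make([]int, 20)
	for i := range ids {
		ids[i] = i + 1
	}
	records, failed := twoPhase(ids, 4, newFakeFetch())
	fmt.Printf("records=%d unresolved=%d\n", len(records), len(failed))
}
```

With the simulated source above, 14 of the 20 IDs yield records and none remain unresolved after the second phase. Separating missing IDs from transient failures mirrors the gap the abstract reports between the 52,991 IDs visited and the 23,105 records gathered, although the actual cause of each skipped ID is not stated there. A real crawler would replace the fake fetcher with an HTTP client and add request pacing.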


Keywords


Concurrent Web Mining; Anime Data Extraction; Go Programming Language; MyAnimeList Dataset; Data Science in Anime Industry; Process Innovation

Journal of Applied Data Sciences

ISSN : 2723-6471 (Online)
Organized by : Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia.
Website : http://bright-journal.org/JADS
Email : taqwa@amikompurwokerto.ac.id (principal contact)
    support@bright-journal.org (technical issues)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0