Efficient Web Mining on MyAnimeList: A Concurrency-Driven Approach Using the Go Programming Language
Abstract
Anime is a globally popular form of entertainment, and the industry has grown rapidly in recent years. MyAnimeList, the largest community-driven platform for anime enthusiasts, holds a wealth of anime data, yet existing publicly available datasets are often outdated and incomplete. This is a challenge for data science research, because the growing volume of anime information calls for more efficient data extraction methods. This research addresses the challenge by developing a concurrent web mining program in the Go programming language. Leveraging Go's concurrency capabilities, our program extracted anime data from MyAnimeList by iterating through anime pages from ID 1 to 52,991. To cope with issues such as rate limits and server timeouts, we implemented a two-phase execution strategy. In total, the program gathered 23,105 anime records within 8.5 hours. The extracted data has been compiled into a comprehensive dataset and made publicly available in CSV format. This research demonstrates the effectiveness of concurrent web mining for large-scale data extraction and offers a valuable resource for future data-driven research on the anime industry.
Journal of Applied Data Sciences
ISSN: 2723-6471 (Online)
Organized by: Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia.
Website: http://bright-journal.org/JADS
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.