Text Mining an Automatic Short Answer Grading (ASAG), Comparison of Three Methods of Cosine Similarity, Jaccard Similarity and Dice's Coefficient

Tri Wahyuningsih, Henderi Henderi, Winarno Winarno


This study assesses the correlation of Automatic Short Answer Grading (ASAG) scores by comparing three similarity methods, Cosine Similarity, Jaccard Similarity and the Dice Coefficient, each applied with a single reference answer. The computation was done in the Python programming language and the data were processed in spreadsheets. The Dice Coefficient method had the highest average correlation, reported as 0.76, followed by Cosine Similarity, also reported as 0.76, while the Jaccard method had the lowest average correlation at 0.69. The contribution of this study is the use of three methods on one data set, whereas previous research used only one or two methods per data set. The study therefore provides a more complete and accurate comparison of the methods on the same data.
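The three measures compared in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the use of term-frequency vectors for cosine and token sets for Jaccard and Dice, the choice of Pearson correlation, and the sample answers and human scores are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (assumed stand-in for real preprocessing)."""
    return text.lower().split()


def cosine_similarity(a, b):
    """Cosine of the angle between the term-frequency vectors of a and b."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a, b):
    """Size of the token intersection divided by the size of the token union."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice_coefficient(a, b):
    """Twice the token intersection divided by the sum of the two set sizes."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def pearson(xs, ys):
    """Pearson correlation (assumed here; the abstract only says 'correlation')."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


# Hypothetical data: one reference answer, three student answers, human scores.
reference = "photosynthesis converts light energy into chemical energy"
answers = [
    "photosynthesis converts light energy into chemical energy",
    "plants use light energy to make chemical energy",
    "it is how animals breathe",
]
human_scores = [5, 3, 0]

methods = [
    ("cosine", cosine_similarity),
    ("jaccard", jaccard_similarity),
    ("dice", dice_coefficient),
]
for name, method in methods:
    machine_scores = [method(reference, answer) for answer in answers]
    print(name, round(pearson(machine_scores, human_scores), 2))
```

Because Dice equals 2J / (1 + J) for a Jaccard value J, the two measures always rank answers in the same order; they can still differ in linear correlation because the mapping between them is not linear.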



Keywords: Text Mining; Automatic Short Answer Grading (ASAG); Cosine Similarity; Jaccard Similarity; Dice's Coefficient








Journal of Applied Data Sciences

ISSN : 2723-6471 (Online)
Organized by : Department of Information System, Universitas Amikom Purwokerto, Indonesia; Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia.
Website : http://bright-journal.org/JADS
Email : taqwa@amikompurwokerto.ac.id (principal contact)
    husniteja@uinjkt.ac.id (managing editor)
    support@bright-journal.org (technical issues)

 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0