Big Data Classification of Personality Types Based on Respondents’ Big Five Personality Traits

Jennifer Chi


This study introduces a mixed model that combines k-means clustering analysis for data examination, discriminant analysis for classification, and multilayer perceptron neural network analysis for prediction. After inadequate samples and outliers were deleted, the study retained 1,009,998 observations, collected through an interactive online personality test (i.e., a Big Five personality traits test) in 2018. The k-means clustering analysis identified four personality clusters using the total scores of the Big Five personality traits (Extraversion, Neuroticism, Agreeableness, Conscientiousness, and Openness to Experience). The clustering results were tested for accuracy with discriminant analysis, which indicated that the cluster means were significantly different and that 95.8% of the original grouped cases were correctly classified. A multilayer perceptron neural network with a 5-5-4 construction was used as a predictive model for deciding the personality classification of participants; it correctly classified 99.5% of the training grouped cases and 99.5% of the testing grouped cases. The results of this study may provide insight into the personality of participants for further psychological, social, cultural, and economic considerations.
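The clustering step named in the abstract can be illustrated with a minimal sketch of k-means (Lloyd's algorithm) applied to five trait totals per respondent. This is not the author's procedure or data: the scores are synthetic, the assumed 10 to 50 score range and the four group profiles are hypothetical, and the function names (`kmeans`, `nearest`, `make_synthetic_scores`) are introduced here only for illustration.

```python
import random

TRAITS = ["Extraversion", "Neuroticism", "Agreeableness",
          "Conscientiousness", "Openness"]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate centroid updates and reassignment."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen observations.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [nearest(p, centroids) for p in points]
    for _ in range(max_iter):
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
        # Assignment step: relabel every point by its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # converged: no label changed
            break
        labels = new_labels
    return labels, centroids


def make_synthetic_scores(n_per_group=50, seed=1):
    """Hypothetical trait totals, clipped to an assumed 10..50 range."""
    rng = random.Random(seed)
    profiles = [  # made-up group means, not the paper's cluster centres
        [40, 20, 40, 40, 40],
        [20, 40, 25, 25, 30],
        [30, 30, 40, 20, 20],
        [20, 20, 20, 40, 40],
    ]
    data = []
    for mean in profiles:
        for _ in range(n_per_group):
            data.append([min(50.0, max(10.0, rng.gauss(m, 3.0)))
                         for m in mean])
    return data


scores = make_synthetic_scores()
labels, centroids = kmeans(scores, k=4)
sizes = [labels.count(j) for j in range(4)]
print("cluster sizes:", sizes)
for j, centre in enumerate(centroids):
    print(j, dict(zip(TRAITS, (round(c, 1) for c in centre))))
```

The resulting cluster labels could then serve as the grouping variable for a discriminant analysis or as the target of a small neural network classifier, which is the role the abstract assigns to the later two stages.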


Keywords: Big Five Personality Traits; Personality Types; Classification; K-means Clustering Analysis; Discriminant Analysis; Multilayer Perceptron Neural Network



Journal of Applied Data Sciences

2723-6471 (Online)
Organized by : MetaBright
Published by : Bright Publisher

 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0