Assessing Large Language Models for Zero-Shot Dynamic Question Generation and Automated Leadership Competency Assessment
Abstract
Automated interview systems powered by artificial intelligence often rely on fine-tuned models and annotated datasets, which limits their adaptability to new leadership competency frameworks. Large language models have shown potential for generating questions and assessing answers, yet their zero-shot performance (operating without task-specific retraining) remains underexplored in leadership assessment. This study examines the zero-shot capability of two models, Qwen 32B and GPT-4o-mini, within a multi-turn self-interview framework. Both models dynamically generated questions, interpreted responses, and assigned scores across ten leadership competencies. Professionals in Digital Marketing and Account Manager roles participated, each completing two AI-led interview sessions. Model outputs were evaluated by certified experts using a structured rubric across three dimensions: quality of behavioral insights, relevance of follow-up questions, and fit of assigned scores. Results indicate that Qwen 32B generated richer insights than GPT-4o-mini (mean = 2.86 vs. 2.62; p < 0.01) and provided more differentiated assessments across competencies. GPT-4o-mini produced more consistent follow-up questions but lacked interpretive depth, often yielding generic outputs. Both models struggled to score candidate responses accurately, reflected in low answer-score ratings (Qwen 32B mean = 2.35; GPT-4o-mini mean = 2.21). These findings suggest a trade-off between insight richness and scoring stability, with both models demonstrating limited ability to capture nuanced leadership behaviors. This study offers one of the first empirical benchmarks of zero-shot model performance in leadership interviews and underscores both the promise and the current limitations of deploying such systems for scalable assessment. Future research should explore competency-specific prompt strategies, fairness evaluation across demographic groups, and domain-adapted fine-tuning to improve accuracy, reliability, and ethical alignment in high-stakes recruitment contexts.
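The multi-turn loop the abstract describes (pose a behavioral question, interpret the answer, assign a competency score, repeat) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompts, the JSON reply format, the abbreviated competency list, and the use of the OpenAI chat-completions API are all illustrative choices; Qwen 32B would be served through a compatible hosted endpoint.

```python
# Minimal zero-shot interview-loop sketch. Prompts, response schema, and
# model routing are assumptions for illustration, not the study's protocol.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # or a hosted Qwen 32B endpoint with a compatible API

SYSTEM_PROMPT = (
    "You are an interviewer assessing leadership competencies. "
    "Ask one behavioral question for the given competency. After the "
    "candidate answers, reply as JSON with keys: insight (behavioral "
    "evidence found), follow_up (one probing question), score (1-5)."
)

def chat(history: list[dict]) -> str:
    """Run one model turn over the running conversation and record it."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=history,
        temperature=0.2,  # low variance, since scoring stability is the weak point
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": SYSTEM_PROMPT}]
for competency in ["strategic thinking", "communication"]:  # the study covered ten
    history.append(
        {"role": "user", "content": f"Competency: {competency}. Ask your question."}
    )
    question = chat(history)            # model generates the interview question
    answer = input(f"{question}\n> ")   # candidate's free-text response
    history.append({"role": "user", "content": answer})
    print(chat(history))                # insight, follow-up, and score for this turn
```

Keeping the full history in the message list is what makes the interview multi-turn: follow-up questions and scores can then condition on everything the candidate has said so far.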
Journal of Applied Data Sciences
ISSN: 2723-6471 (Online)
Collaborated with: Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia
Publisher: Bright Publisher
Website: http://bright-journal.org/JADS
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.