Forecasting Bank Efficiency Using Data Envelopment Analysis with Directional Distance Functions and Machine Learning: Time-Series Validation and Shapley Value Interpretation
Abstract
This study develops a structured framework to forecast the operational efficiency of commercial banks in Vietnam. The analysis is based on a balanced panel of 27 banks over the period 2016–2024. Bank efficiency is first measured using a directional distance function within a data envelopment analysis framework (DEA-DDF). This approach incorporates both desirable outputs and undesirable outputs related to credit risk. The estimated efficiency scores are then used as prediction targets in several machine learning models. Model performance is evaluated under both conventional test settings and time-series cross-validation, and predictions are interpreted using Shapley value–based analysis (SHAP). Under a conventional test set, the gradient boosting model (XGBoost) shows the best performance, with a root mean squared error of 0.060 and a coefficient of determination (R²) of 0.353. However, when time-series cross-validation is applied to reflect realistic forecasting conditions, predictive accuracy declines sharply. The average coefficient of determination falls to approximately 0.005. This suggests that static validation can overstate performance and that forecasting efficiency in a changing financial environment remains difficult. The interpretation results provide additional insights. Net interest margin has a positive effect on predicted efficiency, although the effect weakens at very high levels. The cost-to-income ratio shows a threshold around 0.55, beyond which efficiency declines more strongly. Bank size has a largely neutral impact. The interaction between capital adequacy and profitability shows a conditionally negative pattern. Prediction errors are larger in the most recent year and among banks with very high efficiency scores. In summary, the results highlight both the potential and the limitations of machine learning in forecasting efficiency and emphasize the importance of time-aware validation.
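The contrast between a conventional test set and time-series cross-validation can be illustrated with a minimal sketch. This is not the authors' procedure: the expanding-window scheme, the `min_train_years` parameter, the function names, and the toy panel are all assumptions made for illustration. Each fold trains only on years strictly before the test year, so no future information reaches the model, and RMSE and R² are computed per fold.

```python
from math import sqrt


def expanding_year_splits(years, min_train_years=3):
    """Yield (train_idx, test_idx) pairs for an expanding-window scheme.

    Each test fold is a single calendar year; the training set holds every
    observation from strictly earlier years. `min_train_years` is a
    hypothetical parameter, not taken from the study.
    """
    unique_years = sorted(set(years))
    for k in range(min_train_years, len(unique_years)):
        test_year = unique_years[k]
        train_idx = [i for i, y in enumerate(years) if y < test_year]
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        yield train_idx, test_idx


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination; NaN if the targets have no variance."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return float("nan") if ss_tot == 0 else 1.0 - ss_res / ss_tot


# Toy panel (made-up layout): 3 hypothetical banks observed 2016-2024.
years = [year for year in range(2016, 2025) for _ in range(3)]
folds = list(expanding_year_splits(years, min_train_years=3))
```

With nine years and three initial training years, this produces six folds with test years 2019 through 2024. An R² near zero under such a scheme means the model does little better than predicting the mean of the held-out year's scores, which is how the reported drop from 0.353 to roughly 0.005 can be read.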
Journal of Applied Data Sciences
| ISSN | : | 2723-6471 (Online) |
| Collaborated with | : | Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia. |
| Publisher | : | Bright Publisher |
| Website | : | http://bright-journal.org/JADS |
| Email | : | taqwa@amikompurwokerto.ac.id (principal contact) |
| | | support@bright-journal.org (technical issues) |
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0



