In this post, I will go over how to use different Linear Regression techniques to build models for predicting a “payment score” for delinquent accounts. A “payment score” represents the amount a delinquent account is expected to pay back once debt collection process is initiated.
We used two different approaches to build linear regression models that were used to predict payment score using data from previously collected information about the account. The information used for creating models included account risk exposure, placement data, payment history, past delinquencies, demographics, geographic location, previous account interactions with customer service (including phone calls, emails, chat) etc. Without going into too many details, just wanted to mention that selection of important explanatory features as well as removing highly correlated features was accomplished using Caret and Boruta packages in R.
We measured prediction accuracy of each model using Root Mean Squared Error(RMSE), which gave us a single number evaluation score for the model. RMSE represented sample standard deviation of differences between predicted values and observed values.
– Linear Regression using Frequentist Approach: We used forward selection approach (using various combinations of features) to build multiple models looking at adjusted R-square values to identify the best model. Predicted payment score values for test dataset using the best model were compared against actual payment received from the account to calculate RMSE.
– Bayesian Inference: For Bayesian Inference, we selected the model with lowest Bayesian Information Criteria (BIC) value amongst multiple models that were created using forward selection. Once the best model was chosen, we used it on test data to predict payment scores with different estimator types (BMA – Bayesian Model Averaging, HPM – Highest Probability Model, MPM – Median Probability Model and BPM – Best Predictive Model). Predicted payment scores for all estimator types were compared against actual payments to calculate RMSE. This process was repeated to calculate mean RMSE for each estimator type over 100 iterations and estimator type with lowest mean RMSE was selected (BMA in our case).
We compared RMSE for the most appropriate Frequentist Approach model verses the best Bayesian Inference model to make the final selection. During our analysis, we consistently observed that Bayesian models were much better at prediction as compared to Frequentist models (Linear Regression).