Last month, we outlined an approach to continuous model monitoring and discussed how practitioners can leverage the results of that monitoring for advanced analytics and enhanced end-user reporting. In this post, we apply this idea to enhanced model calibration.

Continuous model monitoring is a key part of a modern model governance regime. But testing performance as part of the continuous monitoring process has value that extends beyond immediate governance needs. Using machine learning and other advanced analytics, testing results can also be further explored to gain a deeper understanding of model error lurking within sub-spaces of the population.

Below we describe how we leverage automated model back-testing results (using our machine learning platform, Edge Studio) to streamline the calibration process for our own residential mortgage prepayment model.

The Problem:

MBS prepayment models, RiskSpan’s included, often provide a number of tuning knobs to tweak model results. These knobs impact the various components of the S-curve function, including refi sensitivity, turnover lever, elbow shift, and burnout factor.

The knob tuning and calibration process is typically messy and iterative. It usually involves somewhat-subjectively selecting certain sub-populations to calibrate, running back-testing to see where and how the model is off, and then tweaking knobs and rerunning the back-test to see the impacts. The modeler may need to iterate through a series of different knob selections and groupings to figure out which combination best fits the data. This is manually intensive work and can take a lot of time.

As part of our continuous model monitoring process, we had already automated the process of generating back-test results and merging them with actual performance history. But we wanted to explore ways of taking this one step further to help automate the tuning process — rerunning the automated back-testing using all the various permutations of potential knobs, but without all the manual labor.

The solution applies machine learning techniques to run a series of back-tests on MBS pools and automatically solve for the set of tuners that best aligns model outputs with actual results.

We break the problem into two parts:

  1. Find Cohorts: Cluster pools into groups that exhibit similar key pool characteristics and model error (so they would need the same tuners).

TRAINING DATA: Back-testing results for our universe of pools with no model tuning knobs applied

  1. Solve for Tuners: Minimize back-testing error by optimizing knob settings.

TRAINING DATA: Back-testing results for our universe of pools under a variety of permutations of potential tuning knobs (Refi x Turnover)

  1. Tuning knobs validation: Take optimized tuning knobs for each cluster and rerun pools to confirm that the selected permutation in fact returns the lowest model errors.

Part 1: Find Cohorts

We define model error as the ratio of the average modeled SMM to the average actual SMM. We compute this using back-testing results and then use a hierarchical clustering algorithm to cluster the data based on model error across various key pool characteristics.

Hierarchical clustering is a general family of clustering algorithms that build nested clusters by either merging or splitting observations successively. The hierarchy of clusters is represented as a tree (or dendrogram). The root of the tree is the root cluster that contains all samples, while the leaves represent clusters with only one sample. [1]

Agglomerative clustering is an implementation of hierarchical clustering that takes the bottom-up approach (merging approach). Each observation starts in its own cluster, and clusters are then successively merged together. There are multiple linkage criteria that could be chosen from. We have used Ward linkage criteria.

Ward linkage strategy minimizes the sum of squared differences within all clusters. It is a variance-minimizing approach.[2]

Part 2: Solving for Tuners

Here our training data is expanded to be a set of back-test results to include multiple results for each pool under different permutations of tuning knobs.  

Process to Optimize the Tuners for Each Cluster

Training Data: Rerun the back-test with permutations of REFI and TURNOVER tunings, covering all reasonably possible combinations of tuners.

  1. These permutations of tuning results are fed to a multi-output regressor, which trains the machine learning model to understand the interaction between each tuning parameter and the model as a fitting step.
    • Model Error and Pool Features are used as Independent Variables
    • Gradient Tree Boosting/Gradient Boosted Decision Trees (GBDT)* methods are used to find the optimized tuning parameters for each cluster of pools derived from the clustering step
    • Two dependent variables — Refi Tuner and Turnover Tuner – are used
    • Separate models are estimated for each cluster
  2. We solve for the optimal tuning parameters by running the resulting model with a model error ratio of 1 (no error) and the weighted average cluster features.

* Gradient Tree Boosting/Gradient Boosted Decision Trees (GBDT) is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. When a decision tree is a weak learner, the resulting algorithm is called gradient boosted trees, which usually outperforms random forest. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of arbitrary differentiable loss function. [3]

*We used scikit-learn’s GBDT implementation to optimize and solve for best Refi and Turnover tuner. [4]


The resultant suggested knobs show promise in improving model fit over our back-test period. Below are the results for two of the clusters using the knobs that suggested by the process. To further expand the results, we plan to cross-validate on out-of-time sample data as it comes in.


These advanced analytics show promise in their ability to help streamline the model calibration and tuning process by removing many of the time-consuming and subjective components from the process altogether. Once a process like this is established for one model, applying it to new populations and time periods becomes more straightforward. This analysis can be further extended in a number of ways. One in particular we’re excited about is the use of ensemble models—or a ‘model of models’ approach. We will continue to tinker with this approach as we calibrate our own models and keep you apprised on what we learn.