Prepayment modeling is an established art with a well-known functional form. Although the exact model varies from institution to institution, its basic structure has not changed significantly over time.

Recently, new modeling techniques, in the form of machine learning, have taken the world by storm, excelling in areas such as image recognition, natural language processing, and fraud detection. These methods depend on iterative trial and error rather than expert human design. Will this approach revolutionize an age-old problem?

Stage | Traditional | Machine Learning
Development | Fit available data to standard prepay components | Select a machine learning technique and fit it to the data
Calibration | Manual tuning of parameters | Grid search for hyperparameters
Implementation | Typically C++ or SAS | Commonly Python or R
Maintenance | Periodic manual tuning based on monitoring results | Automatic tuning based on new data

This post is the first in a three-part series that will provide a high-level overview of the motivations of traditional vs. machine learning approaches. Subsequent blog posts will discuss these approaches in greater detail.


The Challenge

Behavioral modeling is inherently difficult. There is no guarantee that any individual will act in a given way at a given time (unlike, for instance, the perfect-rationality assumption in option pricing), and the reasons for the observed behavior are known only to the individual. In the context of this post, prepayments refer to curtailments as well as full payoffs in advance of the contractual maturity date. For prepayment modeling, we are privy only to whether a prepayment has occurred, not why. We are limited to using available borrower, collateral, and market information to guess at the conditions that trigger prepayment. The limitations in the data and the ambiguity around how these factors interact make prepayment modeling a challenging exercise.



Traditional Approach

The traditional approach simplifies the problem into a few major prepayment types, most commonly refinancing (refi) and housing turnover. Broadly, turnover refers to a homeowner prepaying a mortgage to change residence, and refi refers to a homeowner prepaying a mortgage and taking out another mortgage.[1] Each prepayment type is modeled separately, and the results are summed. This approach focuses on modeling specific, observed phenomena such as the ramp period, seasonality, and burnout.
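
As a rough illustration of the additive structure described above, the following Python sketch sums a turnover component and a refi component into a single monthly prepayment rate (often called SMM, single monthly mortality). Every function name, functional form, and parameter value here is an assumption made for illustration. None of it is a calibrated model or any institution's actual specification.

```python
import math


def turnover_smm(age_months, month_of_year, base=0.005, ramp_months=30):
    """Hypothetical turnover component: seasoning ramp times a seasonal factor."""
    # Ramp period: prepayments rise linearly with loan age, then level off.
    ramp = min(age_months / ramp_months, 1.0)
    # Illustrative seasonality: peaks in July (month 7), troughs in January.
    seasonal = 1.0 + 0.2 * math.cos(2 * math.pi * (month_of_year - 7) / 12)
    return base * ramp * seasonal


def refi_smm(incentive, burnout, max_smm=0.04, steepness=400.0, midpoint=0.01):
    """Hypothetical refi component: S-curve in rate incentive, damped by burnout.

    incentive: borrower note rate minus market rate, as a decimal (0.01 = 100 bp).
    burnout:   fraction in [0, 1]; higher means the pool responds less to incentive.
    """
    # Clamp the exponent so extreme inputs cannot overflow math.exp.
    z = max(-50.0, min(50.0, steepness * (incentive - midpoint)))
    s_curve = 1.0 / (1.0 + math.exp(-z))
    return max_smm * s_curve * (1.0 - burnout)


def total_smm(age_months, month_of_year, incentive, burnout):
    """Sum the separately modeled components, capped at 100% per month."""
    total = turnover_smm(age_months, month_of_year) + refi_smm(incentive, burnout)
    return min(total, 1.0)
```

A call such as `total_smm(age_months=12, month_of_year=7, incentive=0.015, burnout=0.2)` returns the combined monthly rate. In a sketch like this, the defaults (`base`, `ramp_months`, `max_smm`, and so on) play the role of the tuning knobs that would be calibrated and adjusted by hand.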

Seasoned model users, like veteran traders and portfolio managers, value the ability to adjust model behavior via tuning knobs, as well as the overall intuitiveness of the traditional approach. It is difficult, however, to investigate and model every potentially complex interaction among the factors.

The models can also be expensive to maintain. Variations in the underlying loan attributes necessitate having a large set of parameters. Each parameter of each model must be calibrated, monitored, and re-calibrated periodically.


Machine Learning Approach

The machine learning approach does not explicitly define specific prepayment types. Rather, we rely on algorithms to make sense of the complex relationships among the various factors through training on the data.  These “black box” techniques offer the potential for better fits to the data in exchange for less transparency.

There are significant operational advantages to using a machine learning model. Most machine learning techniques thrive on “big data” and do not require the modeler to pre-select a subset of variables. The model can be tuned automatically according to a regular schedule or a triggering event (like a threshold breach in the monitoring). However, the lack of customization opportunities and transparency makes many institutions uncomfortable relying on these models when dealing with billions of dollars in mortgage-backed securities.
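
The automatic tuning described above could be sketched as follows: a monitoring check compares realized and predicted prepayment rates, and a hyperparameter grid search is re-run only when the error breaches a threshold. The function names, the mean-absolute-error metric, and the threshold value are all hypothetical choices for illustration. The `fit_and_score` callback stands in for whatever training and validation routine an institution actually uses.

```python
from itertools import product


def mean_abs_error(actual, predicted):
    """Average absolute gap between realized and predicted prepayment rates."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def grid_search(fit_and_score, grid):
    """Try every hyperparameter combination; lower score is better.

    fit_and_score: callable taking hyperparameters as keyword arguments and
                   returning a validation error (a hypothetical interface).
    grid:          dict mapping hyperparameter name to a list of candidate values.
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = fit_and_score(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def retune_if_breached(actual, predicted, fit_and_score, grid, threshold=0.002):
    """Re-run the grid search only when monitoring error exceeds the threshold."""
    if mean_abs_error(actual, predicted) <= threshold:
        return None  # model is still tracking; no retuning needed
    return grid_search(fit_and_score, grid)
```

The same `retune_if_breached` call could equally be driven by a calendar schedule instead of an error threshold; the trigger is a design choice rather than a property of the model.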


Hybrid Approach?

Improving model performance and understanding what the model is actually doing requires a delicate balancing act. In the end, the best method may be a mix of traditional and machine learning approaches. A traditional model that acts as a base with a machine learning layer on top may afford the strengths of each approach while mitigating their respective weaknesses. We will be investigating this possibility in our research efforts.
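
One possible reading of "a machine learning layer on top" is a residual correction: the traditional model produces a base prediction, and a second model learns whatever the base systematically misses. The sketch below is only an assumption about how such a layer might look. In particular, the per-bucket mean residual is a deliberately simple stand-in for a real machine learning model, and the function and bucket names are hypothetical.

```python
from collections import defaultdict


def fit_residual_layer(buckets, actual, base_pred):
    """Learn the mean residual (actual minus base prediction) per bucket.

    buckets:   a label per observation (e.g. a hypothetical credit-score band).
    actual:    realized prepayment rates.
    base_pred: predictions from the traditional base model.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for bucket, a, b in zip(buckets, actual, base_pred):
        sums[bucket] += a - b
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sums}


def hybrid_predict(bucket, base_pred, layer):
    """Base prediction plus the learned correction, floored at zero.

    Buckets never seen in training get no correction, so the hybrid falls
    back to the traditional model there.
    """
    return max(0.0, base_pred + layer.get(bucket, 0.0))
```

Because the correction is additive, the base model's tuning knobs remain usable, and the size of the learned adjustment in each bucket can be inspected directly.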


Ultimately, no modeling technique can fully overcome data limitations. Even the most sophisticated machine learning methods will not find an answer that isn’t in the data we feed them in the first place. These methods, however, make it possible to squeeze as much out of the data as possible.

A forthcoming series of blog posts will compare the traditional and machine learning approaches to prepayment modeling.


[1] Most refis are rate-reduction refis, but some owners refinance to take cash out of their homes even when prevailing rates are higher than their existing rate, for reasons such as debt consolidation or home improvement.