Why Mortgage Climate Risk is Not Just for Coastal Investors

When it comes to climate concerns for the housing market, sea level rise and its impacts on coastal communities often get top billing. But this article in yesterday’s New York Times highlights one example of far-reaching impacts in places you might not suspect.

Chicago, built on a swamp and virtually surrounded by Lake Michigan, can tie its whole existence as a city to its control and management of water. But as the Times article explains, management of that water is becoming increasingly difficult as various dynamics related to climate change create increasingly large and unpredictable fluctuations in the level of the lake (higher highs and lower lows). These dynamics are threatening the city with more frequent and severe flooding.

The Times article connects water management issues to housing issues in two ways: the increasing frequency of basement flooding caused by sewer overflow and the battering buildings are taking from increased storm surge off the lake. Residents face increasing costs to mitigate their exposure and fear the potentially negative impact on home prices. As one resident puts it, “If you report [basement flooding] to the city, and word gets out, people fear it’s going to devalue their home.”

These concerns — increasing peril exposure and decreasing valuations — echo fears expressed in a growing number of seaside communities and offer further evidence that mortgage investors cannot bank on escaping climate risk merely by avoiding the coasts. Portfolios everywhere are going to need to begin incorporating climate risk into their analytics.



Hurricane Season a Double-Whammy for Mortgage Prepayments

As hurricane (and wildfire) season ramps up, don’t sleep on the increase in prepayment speeds after a natural disaster event. The increase in delinquencies might get top billing, but prepays also increase after events—especially for homes that were fully insured against the risk they experienced. For a mortgage servicer with concentrated geographic exposure to the event area, this can be a double-whammy for the balance sheet: delinquencies increase servicing advances, while prepays roll loans off the book. Hurricane Katrina loan performance is a classic example of this dynamic.




EDGE: Extended Delinquencies in FNMA and FHLMC Loans

In June, the market got its first look at Fannie Mae and Freddie Mac “expanded delinquency” states. The Enterprises are now reporting delinquency states out to 24 months to better account for loans that are seriously delinquent and not repurchased under the extended timeframe for repurchase of delinquent loans announced in 2020. In this short post, we analyze those pipelines and what they could mean for buyouts in certain spec pool stories. 

First, we look at the extended pipeline for some recent non-spec cohorts. The table below summarizes some major 30yr cohorts and their months delinquent. We aggregate the delinquencies that are more than 6 months delinquent[1] for ease of exposition. 

Recent-vintage GSE loans with higher coupons show a higher level of “chronically delinquent” loans, similar to the trends we see in GNMA loans. 

Digging deeper, we filtered for loans with FICO scores below 680. Chronically delinquent loan buckets in this cohort are marginally more prevalent relative to non-spec borrowers. Not unexpectedly, this suggests a credit component to these delinquencies.

Finally, we filtered for loans with high LTVs at origination. The chronically delinquent buckets are lower than the low FICO sector but still present an overhang of potential GSE repurchases in spec pools.

It remains to be seen whether some of these borrowers will be able to resume their original payments —  in which case they can remain in the pool with a forbearance payment due at payoff — or if the loans will be repurchased by the GSEs at 24 months delinquent for modification or other workout. If the higher delinquencies lead to the second outcome, the market could see an uptick in involuntary speeds on some spec pool categories in the next 6-12 months.


Contact us if you are interested in seeing variations on this theme. Using Edge, we can examine any loan characteristic and generate an S-curve, aging curve, or time series.


[1] The individual delinquency states are available for each bucket; contact us for details.


Non-Agency Delinquencies Fall Again – Still Room for Improvement

Serious delinquencies among non-Agency residential mortgages continue marching downward during the first half of 2021 but remain elevated relative to their pre-pandemic levels.

Our analysis of more than two million loans held in private-label mortgage-backed securities found that the percentage of loans at least 60 days past due fell again in May across vintages and FICO bands. While performance differences across FICO bands were largely as expected, comparing pre-crisis vintages with mortgages originated after 2009 revealed some interesting distinctions.

The chart below plots serious delinquency rates (60+ DPD) by FICO band for post-2009 vintages. Not surprisingly, these rates begin trending upward in May and June of 2020 (two months after the economic effects of the pandemic began to be felt), with the most significant spikes coming in July and August – approaching 20 percent at the low end of the credit box while remaining below 5 percent among prime borrowers.

Since last August’s peak, serious delinquency rates have fallen most precipitously (nearly 8 percentage points) in the 620 – 680 FICO bucket, compared with a 5-percentage point decline in the 680 – 740 bucket and a 4 percentage point drop in the sub-620 bucket. Delinquency rates have come down the least among prime (FICO > 740) mortgages (just over 2 percentage points) but, having never cracked 5 percent, these loans also had the shortest distance to go.

Serious delinquency rates remain above January 2020 levels across all four credit buckets – approximately 7 percentage points higher in the two sub-680 FICO buckets, compared with the 680 – 740 bucket (5 percentage points higher than in January 2020) and over-740 bucket (2 percentage points higher).

So-called “legacy” vintages (consisting of mortgages originated before the 2008-2009 crisis) reflect a somewhat different performance profile, though they follow a similar pattern.

The following chart plots serious delinquency rates by FICO band for these older vintages. Probably because these rates were starting from a relatively elevated point in January 2020, their pandemic-related spikes were somewhat less pronounced, particularly in the low-FICO buckets. These vintages also appear to have felt the spike about a month earlier than did the newer issue loans.

Serious delinquency rates among these “legacy” loans are considerably closer to their pre-pandemic levels than are their new-issue counterparts. This is especially true in the sub-prime buckets. Serious delinquencies in the sub-620 FICO bucket actually were 3 percentage points lower last month than they were in January 2020 (and nearly 5 percentage points lower than their peak in July 2020). These differences are less pronounced in the higher-FICO buckets but are still there.

Comparing the two graphs reveals that the pandemic had the effect of causing new-issue low-FICO loans to perform similarly to legacy low-FICO loans, while a significant gap remains between the new-issue prime buckets and their high-FICO pre-2009 counterparts. This is not surprising given the tightening that underwriting standards (beyond credit score) underwent after 2009.

Interested in cutting non-Agency performance across any of several dozen loan-level characteristics? Contact us for a quick, no-pressure demo.


In ESG Policy, ‘E’ Should Not Come at the Expense of ‘S’

ESG—it is the hottest topic in our space. No conference or webinar is complete without a panel touting the latest ESG bond or the latest advance in reporting and certification. What many of these pieces neglect to address is the complicated relationship between the “E” and the “S” of ESG: in particular, that climate-risk-exposed properties are often located in underserved communities and provide much-needed affordable housing to the country.

Last week, the White House issued an Executive Order on Climate-Related Financial Risk. The focus of the order is to direct government agencies toward both disclosure and mitigation of climate-related financial risk, reinforcing the already relentless focus on ESG initiatives within our industry. The order specifically calls on the USDA, HUD, and the VA to “consider approaches to better integrate climate-related financial risk into underwriting standards, loan terms and conditions, and asset management and servicing procedures, as related to their Federal lending policies and programs.” Changes here will likely presage changes by the GSEs.

In mortgage finance, some of the key considerations related to disclosure and mitigation are as follows:

Disclosure of Climate-Related Financial Risk:

  • Homes exposed to the increasing occurrence of natural hazards due to climate change.
  • Homes exposed to the risk of decreasing home prices due to climate change, because of either increasing property insurance costs (or un-insurability) or localized transition risks of industry-exposed areas (e.g., Houston to the oil and gas industry).

Mitigation of Climate-Related Financial Risk:

  • Reducing the housing industry’s contribution to greenhouse gas emissions in alignment with the president’s goal of a net-zero emissions economy by 2050. For example, loan programs that support retrofitting existing housing stock to reduce energy consumption.
  • Considering a building location’s exposure to climate-related physical risk. Directing investment away from areas exposed to the increasing frequency and severity of natural disasters.

But products and programs that aim to support the goal of increased disclosure and mitigation of climate-related financial risk can create situations in which underserved communities disproportionately bear the costs of our nation’s pivot toward climate resiliency. The table below connects FEMA’s National Risk Index data to HUD’s list of census tracts that qualify for low-income housing tax credits, which HUD defines as tracts that have “50 percent of households with incomes below 60 percent of the Area Median Gross Income (AMGI) or have a poverty rate of 25 percent or more.” Census tracts with the highest risk of annual loss from natural disaster events are disproportionately made up of HUD’s Qualified Tracts.

As an industry, it’s important to remember that actions taken to mitigate exposure to increasing climate-related events will always have a cost to someone. These costs could take the form of increased insurance premiums, decreasing home prices, or even the loss of affordable housing options altogether. None of this is to say that action should not be taken, only that social ESG goals deserve consideration when ambitious environmental ESG goals come at their expense.

The White House identified this issue right at the top of the order by indicating that any action on the order would need to account for ‘disparate impacts on disadvantaged communities and communities of color.’

“It is therefore the policy of my Administration to advance consistent, clear, intelligible, comparable, and accurate disclosure of climate-related financial risk (consistent with Executive Order 13707 of September 15, 2015 (Using Behavioral Science Insights to Better Serve the American People), including both physical and transition risks; act to mitigate that risk and its drivers, while accounting for and addressing disparate impacts on disadvantaged communities and communities of color (consistent with Executive Order 13985 of January 20, 2021 (Advancing Racial Equity and Support for Underserved Communities Through the Federal Government)) and spurring the creation of well-paying jobs; and achieve our target of a net-zero emissions economy by no later than 2050.”

The social impacts of any environmental initiative need to be considered. Steps should be taken to avoid having the cost of changes to underwriting processes and credit policies be disproportionately borne by underserved and vulnerable communities. To this end, a balanced ESG policy will ultimately require input from stakeholders across the mortgage industry.


Mortgage DQs by MSA: Non-Agency Performance Chart of the Month

This month we take a closer look at geographical differences in loan performance in the non-agency space. The chart below looks at the 60+ DPD rate for the five best- and worst-performing MSAs (and the overall average). A few things to note:

  • The pandemic seems to have simply amplified performance differences that were already apparent pre-COVID. The worst-performing MSAs were mostly showing above-average delinquency rates before last year’s disruption.
  • Florida was especially hard-hit. Three of the five worst-performing MSAs are in Florida. Not surprisingly, these MSAs rely heavily on the tourism industry.
  • New York jumped from about average to one of the worst-performing MSAs in the wake of the pandemic. This is not surprising considering how hard the city was hit.
  • Tech hubs show strong performance. All our best performers are strong in the tech industry—Austin’s the new Bay Area, right?

Anomaly Detection and Quality Control

In our most recent workshop on Anomaly Detection and Quality Control (Part I), we discussed how clean market data is an integral part of producing accurate market risk results. As incorrect and inconsistent market data is so prevalent in the industry, it is not surprising that the U.S. spends over $3 trillion on processes to identify and correct market data.

Taking a step back, it is worth noting what drives accurate market risk analytics. Having accurate portfolio holdings, with correct terms and conditions for over-the-counter trades, is clearly central to calculating consistent risk measures that are scaled to the market value of the portfolio. The use of well-tested, integrated, industry-standard pricing models is another key factor. But clean and consistent market data is arguably the largest contributor to reliable results, and problems with that data are the most common cause of poor ones. The key driver behind detecting and correcting (or transforming) market data is risk and portfolio managers’ expectation that risk results will be accurate at the start of the business day, with no time-consuming re-runs during the day to correct issues found.

Broadly defined, market data is any data used as an input to the revaluation models. This includes equity prices, interest rates, credit spreads, FX rates, volatility surfaces, etc.

Market data needs to be:

  • Complete – no true gaps when looking back historically.
  • Accurate
  • Consistent – data must be viewed across other data points to determine its accuracy (e.g., interest rates across tenor buckets, volatilities across volatility surface)

Anomaly types can be broken down into four major categories:

  • Spikes
  • Stale data
  • Missing data
  • Inconsistencies

Here are three examples of “bad” market data:

Credit Spreads

The following chart depicts day-over-day changes in credit spreads for the 10-year consumer cyclical time series, as returned from an external vendor. The changes show a significant spike on 12/3 that caused big swings, up and down, across multiple rating buckets. Without an adjustment to this data, key risk measures would show significant jumps, up or down, depending on the dollar value of positions on the two consecutive days.
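To make this concrete, here is a minimal sketch (not our production logic) of how a spike like this might be flagged automatically: a robust z-score on day-over-day changes. The series name, window, and threshold are illustrative assumptions.

```python
import pandas as pd

def flag_spread_spikes(spreads: pd.Series, window: int = 60, z_threshold: float = 5.0) -> pd.Series:
    """Flag day-over-day spread changes that are extreme relative to recent history.

    spreads: daily spread levels indexed by date (e.g., a 10yr consumer cyclical curve point).
    Returns a boolean Series marking suspected spikes.
    """
    changes = spreads.diff()
    # Robust location/scale of recent changes (median and median absolute deviation)
    rolling_med = changes.rolling(window).median()
    rolling_mad = (changes - rolling_med).abs().rolling(window).median()
    z = (changes - rolling_med) / (1.4826 * rolling_mad)
    return z.abs() > z_threshold

# Hypothetical usage against a vendor time series:
# spikes = flag_spread_spikes(vendor_curves["consumer_cyclical_10y"])
```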


Swaption Volatilities

Market data also includes volatilities, which drive deltas and potential hedging decisions. The following chart shows implied swaption volatilities for different swaption maturities and underlying swap tenors. Note the spikes in the 7×10 and 10×10 swaptions. The chart also highlights inconsistencies across tenors and maturities.


Equity Implied Volatilities

The 146 and 148 strikes in the table below reflect inconsistent vol data, as often occurs around expiration.


The detection of market data inconsistencies needs to be an automated process, with multiple approaches targeted to specific types of market data. The detection models need to evolve over time as information is gathered, with the goal of reducing false negatives to a manageable level. Once the models detect anomalies, the next step is to automate the transformation of the market data (e.g., backfill, interpolate, or use the prior day’s value). Each transformation must also be recorded transparently, so that it is known which values were changed or populated when unavailable. These records should be shared with clients, whose feedback may lead to alternative transformations or detection routines.
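As a rough illustration of the transformation-plus-transparency step described above, the sketch below repairs flagged points using one of the methods mentioned (prior-day value, interpolation, or backfill) and returns an audit log of every change. The function and its field names are illustrative assumptions, not a description of our production pipeline.

```python
import pandas as pd

def repair_series(values: pd.Series, anomalies: pd.Series, method: str = "prior_day"):
    """Replace flagged observations and record every change for transparency.

    values:    daily market data series with a DatetimeIndex
    anomalies: boolean Series (same index) marking points to repair
    method:    "prior_day", "interpolate", or "backfill"
    Returns (cleaned series, audit log DataFrame).
    """
    cleaned = values.mask(anomalies)                     # blank out flagged observations
    if method == "prior_day":
        cleaned = cleaned.ffill()
    elif method == "interpolate":
        cleaned = cleaned.interpolate(method="time")     # assumes a DatetimeIndex
    elif method == "backfill":
        cleaned = cleaned.bfill()
    else:
        raise ValueError(f"unknown method: {method}")

    # Audit trail: original vs. repaired value for every transformed point
    log = pd.DataFrame({
        "original": values[anomalies],
        "repaired": cleaned[anomalies],
        "method": method,
    })
    return cleaned, log
```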

Detector types typically fall into the following categories:

  • Extreme Studentized Deviate (ESD): finds outliers in a single data series (helpful for extreme cases.)
  • Level Shift: detects change in level by comparing means of two sliding time windows (useful for local outliers.)
  • Local Outliers: detects spikes in near values.
  • Seasonal Detector: detects seasonal patterns and anomalies (used for contract expirations and other events.)
  • Volatility Shift: detects shift of volatility by tracking changes in standard deviation.
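For illustration only, here are simplified pandas versions of two of these detector types (level shift and volatility shift). Window sizes and thresholds are arbitrary assumptions; production detectors are tuned per data type and evolve over time, as noted above.

```python
import pandas as pd

def level_shift_detector(series: pd.Series, window: int = 10, threshold: float = 3.0) -> pd.Series:
    """Compare the means of two adjacent sliding windows and flag large divergences."""
    trailing = series.rolling(window).mean()
    leading = series[::-1].rolling(window).mean()[::-1]          # forward-looking window
    typical_move = series.diff().abs().rolling(10 * window).median()
    return (leading - trailing).abs() > threshold * typical_move

def volatility_shift_detector(series: pd.Series, window: int = 10, threshold: float = 3.0) -> pd.Series:
    """Flag dates where rolling standard deviation jumps relative to its own history."""
    vol = series.diff().rolling(window).std()
    return vol > threshold * vol.rolling(10 * window).median()
```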

On Wednesday, May 19th, we will present a follow-up workshop focusing on:

  • Coding examples
    • Application of outlier detection and pipelines
    • PCA
  • Specific loan use cases
    • Loan performance
    • Entity correction
  • Novelty Detection
    • Anomalies are not always “bad”
    • Market monitoring models

You can register for this complimentary workshop here.


Leveraging ML to Enhance the Model Calibration Process

Last month, we outlined an approach to continuous model monitoring and discussed how practitioners can leverage the results of that monitoring for advanced analytics and enhanced end-user reporting. In this post, we apply this idea to enhanced model calibration.

Continuous model monitoring is a key part of a modern model governance regime. But testing performance as part of the continuous monitoring process has value that extends beyond immediate governance needs. Using machine learning and other advanced analytics, testing results can also be further explored to gain a deeper understanding of model error lurking within sub-spaces of the population.

Below we describe how we leverage automated model back-testing results (using our machine learning platform, Edge Studio) to streamline the calibration process for our own residential mortgage prepayment model.

The Problem:

MBS prepayment models, RiskSpan’s included, often provide a number of tuning knobs to tweak model results. These knobs impact the various components of the S-curve function, including refi sensitivity, turnover lever, elbow shift, and burnout factor.
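Our model’s exact functional form is not reproduced here, but the stylized sketch below shows how knobs like these typically interact with an S-curve: a refi multiplier scales the logistic refinancing response, an elbow shift moves its inflection point, and a turnover multiplier scales the baseline. All parameter values are illustrative assumptions, and burnout is omitted for simplicity.

```python
import numpy as np

def stylized_smm(incentive_bps, refi_knob=1.0, elbow_shift_bps=0.0, turnover_knob=1.0,
                 base_turnover_smm=0.005, max_refi_smm=0.04, steepness=0.02):
    """Toy prepayment S-curve: a turnover floor plus a logistic refi response.

    incentive_bps: borrower rate incentive in basis points.
    The knobs scale refi sensitivity, shift the elbow, and scale turnover.
    """
    refi = max_refi_smm / (1.0 + np.exp(-steepness * (incentive_bps - 25.0 - elbow_shift_bps)))
    return turnover_knob * base_turnover_smm + refi_knob * refi

# Example: a 10% upward refi tune with a 10 bp elbow shift, evaluated at a 50 bp incentive
print(stylized_smm(50, refi_knob=1.10, elbow_shift_bps=10))
```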

The knob tuning and calibration process is typically messy and iterative. It usually involves somewhat-subjectively selecting certain sub-populations to calibrate, running back-testing to see where and how the model is off, and then tweaking knobs and rerunning the back-test to see the impacts. The modeler may need to iterate through a series of different knob selections and groupings to figure out which combination best fits the data. This is manually intensive work and can take a lot of time.

As part of our continuous model monitoring process, we had already automated the process of generating back-test results and merging them with actual performance history. But we wanted to explore ways of taking this one step further to help automate the tuning process — rerunning the automated back-testing using all the various permutations of potential knobs, but without all the manual labor.

The solution applies machine learning techniques to run a series of back-tests on MBS pools and automatically solve for the set of tuners that best aligns model outputs with actual results.

We break the problem into two parts:

  1. Find Cohorts: Cluster pools into groups that exhibit similar key pool characteristics and model error (so they would need the same tuners).

     TRAINING DATA: Back-testing results for our universe of pools with no model tuning knobs applied

  2. Solve for Tuners: Minimize back-testing error by optimizing knob settings.

     TRAINING DATA: Back-testing results for our universe of pools under a variety of permutations of potential tuning knobs (Refi x Turnover)

  3. Tuning knobs validation: Take optimized tuning knobs for each cluster and rerun pools to confirm that the selected permutation in fact returns the lowest model errors.

Part 1: Find Cohorts

We define model error as the ratio of the average modeled SMM to the average actual SMM. We compute this using back-testing results and then use a hierarchical clustering algorithm to cluster the data based on model error across various key pool characteristics.

Hierarchical clustering is a general family of clustering algorithms that build nested clusters by either merging or splitting observations successively. The hierarchy of clusters is represented as a tree (or dendrogram). The root of the tree is the root cluster that contains all samples, while the leaves represent clusters with only one sample. [1]

Agglomerative clustering is an implementation of hierarchical clustering that takes the bottom-up (merging) approach: each observation starts in its own cluster, and clusters are then successively merged together. There are multiple linkage criteria to choose from; we used the Ward linkage criterion.

Ward linkage strategy minimizes the sum of squared differences within all clusters. It is a variance-minimizing approach.[2]
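A minimal sketch of this clustering step, using scikit-learn’s AgglomerativeClustering with Ward linkage on synthetic per-pool back-test results. The pool characteristics and cluster count are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Hypothetical per-pool back-test summary (one row per pool); in practice this comes
# from the automated back-testing results described above.
rng = np.random.default_rng(0)
backtest = pd.DataFrame({
    "avg_model_smm": rng.uniform(0.005, 0.04, 500),
    "avg_actual_smm": rng.uniform(0.005, 0.04, 500),
    "wac": rng.uniform(2.5, 5.0, 500),
    "wala": rng.integers(3, 120, 500),
    "fico": rng.integers(620, 800, 500),
})

# Model error = ratio of average modeled SMM to average actual SMM
backtest["model_error"] = backtest["avg_model_smm"] / backtest["avg_actual_smm"]

# Standardize so no single characteristic dominates the distance metric
X = StandardScaler().fit_transform(backtest[["model_error", "wac", "wala", "fico"]])

# Ward linkage minimizes within-cluster variance, as described above
backtest["cluster"] = AgglomerativeClustering(n_clusters=8, linkage="ward").fit_predict(X)
```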

Part 2: Solving for Tuners

Here the training data is expanded to include multiple back-test results for each pool under different permutations of tuning knobs.

Process to Optimize the Tuners for Each Cluster

Training Data: Rerun the back-test with permutations of REFI and TURNOVER tunings, covering all reasonably possible combinations of tuners.

  1. These permutations of tuning results are fed to a multi-output regressor, which trains the machine learning model to understand the interaction between each tuning parameter and the model as a fitting step.
    • Model Error and Pool Features are used as Independent Variables
    • Gradient Tree Boosting/Gradient Boosted Decision Trees (GBDT)* methods are used to find the optimized tuning parameters for each cluster of pools derived from the clustering step
    • Two dependent variables — Refi Tuner and Turnover Tuner – are used
    • Separate models are estimated for each cluster
  2. We solve for the optimal tuning parameters by running the resulting model with a model error ratio of 1 (no error) and the weighted average cluster features.
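The sketch below illustrates the fit-and-solve step for a single cluster using scikit-learn’s MultiOutputRegressor wrapped around gradient boosted trees. Column names, hyperparameters, and the use of a simple (rather than weighted) average for the query point are assumptions for illustration.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

def solve_tuners_for_cluster(cluster_df: pd.DataFrame):
    """Fit (model error, pool features) -> (refi tuner, turnover tuner) for one cluster,
    then query the fit at a model error ratio of 1.0 and the cluster's average features.

    cluster_df is assumed to hold one row per (pool, knob permutation) with columns:
    model_error, wac, wala, fico, refi_tuner, turnover_tuner.
    """
    X = cluster_df[["model_error", "wac", "wala", "fico"]]
    y = cluster_df[["refi_tuner", "turnover_tuner"]]

    model = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=200, max_depth=3))
    model.fit(X, y)

    # Query point: no model error (ratio of 1.0) at the cluster's average characteristics
    # (a weighted average could be substituted here)
    query = pd.DataFrame([{
        "model_error": 1.0,
        "wac": cluster_df["wac"].mean(),
        "wala": cluster_df["wala"].mean(),
        "fico": cluster_df["fico"].mean(),
    }])
    refi_tuner, turnover_tuner = model.predict(query)[0]
    return refi_tuner, turnover_tuner
```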

* Gradient Tree Boosting/Gradient Boosted Decision Trees (GBDT) is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. When a decision tree is a weak learner, the resulting algorithm is called gradient boosted trees, which usually outperforms random forest. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of arbitrary differentiable loss function. [3]

* We used scikit-learn’s GBDT implementation to optimize and solve for the best Refi and Turnover tuners. [4]

Results

The resulting suggested knobs show promise in improving model fit over our back-test period. Below are the results for two of the clusters using the knobs suggested by the process. To further expand the results, we plan to cross-validate on out-of-time sample data as it comes in.

Conclusion

These advanced analytics show promise in their ability to help streamline the model calibration and tuning process by removing many of the time-consuming and subjective components from the process altogether. Once a process like this is established for one model, applying it to new populations and time periods becomes more straightforward. This analysis can be further extended in a number of ways. One in particular we’re excited about is the use of ensemble models—or a ‘model of models’ approach. We will continue to tinker with this approach as we calibrate our own models and keep you apprised on what we learn.


RiskSpan VQI: Current Underwriting Standards Q1 2021

[Chart: VQI risk layers calculation, March 2021]

RiskSpan’s Vintage Quality Index estimates the relative “tightness” of credit standards by computing and aggregating the percentage of Agency originations each month with one or more “risk factors” (low-FICO, high DTI, high LTV, cash-out refi, investment properties, etc.). Months with relatively few originations characterized by these risk factors are associated with lower VQI ratings. As the historical chart above shows, the index maxed out (i.e., had an unusually high number of loans with risk factors) leading up to the 2008 crisis.
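For readers curious about the mechanics, the sketch below shows how a risk-layer index of this general type can be computed from loan-level origination data. The field names, thresholds, and simple aggregation are illustrative assumptions and do not reproduce the VQI’s actual weighting.

```python
import pandas as pd

# Illustrative risk-factor definitions; thresholds and field names are assumptions,
# not the precise rules behind the VQI.
RISK_FLAGS = {
    "fico_below_660": lambda df: df["fico"] < 660,
    "ltv_above_80":   lambda df: df["ltv"] > 80,
    "dti_above_45":   lambda df: df["dti"] > 45,
    "cash_out_refi":  lambda df: df["loan_purpose"] == "cash_out",
    "investor":       lambda df: df["occupancy"] == "investor",
}

def risk_layer_shares(originations: pd.DataFrame) -> pd.DataFrame:
    """Percentage of each month's originations carrying each risk factor."""
    return pd.DataFrame({
        name: rule(originations).groupby(originations["orig_month"]).mean() * 100
        for name, rule in RISK_FLAGS.items()
    })

# An overall index could then aggregate the monthly layer shares, e.g.:
# vqi_like = risk_layer_shares(loans).sum(axis=1)
```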

Vintage Quality Index Stability Masks Purchase Credit Contraction

The first quarter of 2021 provides a stark example of why it is important to consider the individual components of RiskSpan’s Vintage Quality Index and not just the overall value. 

The Index overall dropped by just 0.37 points to 76.68 in the first quarter of 2021. On the surface, this seems to suggest a minimal change to credit availability and credit quality over the period. But the Index’s net stability masks a significant change in one key metric offset by more modest counterbalancing changes in the remaining eight. The percentage of high-LTV mortgages fell to 16.7% (down from 21% at the end of 2020) during the first quarter.  

While this continues a trend of falling high-LTV loan rates (down 8.7% since Q1 of 2020 and almost 12% from Q1 2019), it coincides with a steady increase in house prices. From December 2020 to February 2021, the monthly FHFA House Price Index® (US, Purchase Only, Seasonally Adjusted) rose 1.9%. More striking is the year-over-year change from February 2020 to February 2021, during which the index rose by 11.1%. Taken together, the 10% increase in home prices combined with a 10% reduction in the share of high-LTV loans paints a sobering picture for marginal borrowers seeking to purchase a home.

Some of the reduction in high-LTV share is obviously attributable to the growing percentage of refinance activity (including cash-out refinancing, which counterbalances the effect the falling high-LTV rate has on the index). But these refis do not impact the purchase-only HPI. As a result, even though the overall Index did not change materially, higher required down payments (owing to higher home prices) combined with fewer high-LTV loans reflects a credit box that effectively shrank in Q1.

 

[Charts: VQI risk layer historical trends and individual risk layer shares (FICO, LTV, adjustable-rate, cash-out refi, and other factors), March 2021]

Analytic and Data Assumptions

Population assumptions:

  • Monthly data for Fannie Mae and Freddie Mac.

  • Loans originated more than three months prior to issuance are excluded because the index is meant to reflect current market conditions.

  • Loans likely to have been originated through the HARP program, as identified by LTV, MI coverage percentage, and loan purpose are also excluded. These loans do not represent credit availability in the market as they likely would not have been originated today but for the existence of HARP.                                                                                               

Data assumptions:

  • Freddie Mac data goes back to 12/2005. Fannie Mae only back to 12/2014.

  • Certain fields for Freddie Mac data were missing prior to 6/2008.   

The GSE historical loan performance data released in support of GSE Risk Transfer activities was used to help back-fill data where it was missing.

An outline of our approach to data imputation can be found in our VQI Blog Post from October 28, 2015.                                                

 


Three Principles for Effectively Monitoring Machine Learning Models

The recent proliferation of machine learning models in banking and structured finance is becoming impossible to ignore. Rarely does a week pass without a client approaching us to discuss the development or validation (or both) of a model that leverages at least one machine learning technique. RiskSpan’s own model development team has also been swept up in the trend – deep learning techniques have featured prominently in developing the past several versions of our in-house residential mortgage prepayment model.

Machine learning’s rise in popularity is attributable to multiple underlying trends: 

  1. Quantity and complexity of data. Nowadays, firms store every conceivable type of data relating to their activities and clients – and frequently supplement this with data from any number of third-party providers. The increasing dimensionality of data available to modelers makes traditional statistical variable selection more difficult. The tradeoff between a model’s complexity and the rules adopted in variable selection can be hard to balance. An advantage of ML approaches is that they can handle multi-dimensional data more efficiently. ML frameworks are good at identifying trends and patterns – without the need for human intervention. 
  2. Better learning algorithms. Because ML algorithms learn to make more accurate projections as new data is introduced to the framework (assuming there is no bias in the new data), model features based on newly introduced data are more likely to resemble features created using the model training data.  
  3. Cheap computation costs. New techniques, such as XGBoost, are designed to be memory efficient and introduce innovative system designs that help reduce computation costs. 
  4. Proliferation breeds proliferation. As the number of machine learning packages in various programming tools increases, it facilitates implementation and promotes further ML model development. 

Addressing Monitoring Challenges 

Notwithstanding these advances, machine learning models are by no means easy to build and maintain. Feature engineering and parameter tuning procedures are time consuming. And once an ML model has been put into production, monitoring activities must be implemented to detect anomalies and make sure the model works as expected (just like with any other model). According to the OCC 2011-12 supervisory guidance on model risk management, ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid. While monitoring ML models resembles monitoring conventional statistical models in many respects, the following activities take on particular importance with ML model monitoring: 

  1. Review the underlying business problem. Defining the business problem is the first step in developing any ML model. This should be carefully articulated in the list of business requirements that the ML model is supposed to satisfy. Any shift in the underlying business problem will likely create drift in the training data and, as a result, new data coming to the model may no longer be relevant to the original business problem. The ML model degrades, and a new round of feature engineering and parameter tuning needs to be considered to remediate the impact. This review should be conducted whenever the underlying problem or requirements change. 
  2.  Review of data stability (model input). In the real world, even if the underlying business problem is unchanged, there might be shifts in the predicting data caused by changing borrower behaviors, changes in product offerings, or any other unexpected market drift. Any of these things could result in the ML model receiving data that it has not been trained on. Model developers should measure the data population stability between the training dataset and the predicting dataset. If there is evidence of the data having shifted, model recalibration should be considered. This assessment should be done when the model user identifies significant shift in the model’s performance or when a new testing dataset is introduced to the ML model. Where data segmentation has been used in the model development process, this assessment should be performed at the individual segment level, as well. 
  3. Review of performance metrics (model output). Performance metrics quantify how well an ML model is trained to explain the data. Performance metrics should fit the model’s type. For instance, the developer of a binary classification model could use Kolmogorov-Smirnov (KS) table, receiver operating characteristic (ROC) curve, and area under the curve (AUC) to measure the model’s overall rank order ability and its performance at different cutoffs. Any shift (upward or downward) in performance metrics between a new dataset and the training dataset should raise a flag in monitoring activity. All material shifts need to be reviewed by the model developer to determine their cause. Such assessments should be conducted on an annual basis or whenever new data is available. 
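A minimal sketch of the data-stability and performance checks described in items 2 and 3 above: a population stability index (PSI) for comparing training and scoring distributions, plus an AUC comparison for a binary classifier. The binning, floor value, and PSI thresholds shown are common conventions rather than prescriptions.

```python
import numpy as np
import pandas as pd

def population_stability_index(expected: pd.Series, actual: pd.Series, bins: int = 10) -> float:
    """PSI between a feature's training ('expected') and scoring ('actual') distributions."""
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf                        # catch values outside the training range
    e = pd.cut(expected, cuts, duplicates="drop").value_counts(normalize=True, sort=False)
    a = pd.cut(actual, cuts, duplicates="drop").value_counts(normalize=True, sort=False)
    e, a = e.clip(lower=1e-6), a.clip(lower=1e-6)              # avoid log(0)
    return float(((a - e) * np.log(a / e)).sum())

# Rule-of-thumb reading (a common convention, not a regulatory threshold):
# PSI < 0.10 stable, 0.10-0.25 monitor, > 0.25 investigate and consider recalibration.

# For a binary classification model, performance drift can be checked by comparing
# AUC on the training data with AUC on newly scored data, e.g. with
# sklearn.metrics.roc_auc_score(y_new, model.predict_proba(X_new)[:, 1]).
```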

Like all models, ML models are only as good as the data they are fed. But ML models are particularly susceptible to data shifts because their processing components are less transparent. Taking these steps to ensure they are learning from valid and consistent data is essential to managing a functional inventory of ML models. 

