
Non-Agency Delinquencies Fall Again – Still Room for Improvement

Serious delinquencies among non-Agency residential mortgages continued marching downward during the first half of 2021 but remain elevated relative to their pre-pandemic levels.

Our analysis of more than two million loans held in private-label mortgage-backed securities found that the percentage of loans at least 60 days past due fell again in May across vintages and FICO bands. While performance differences across FICO bands were largely as expected, comparing pre-crisis vintages with mortgages originated after 2009 revealed some interesting distinctions.

The chart below plots serious delinquency rates (60+ DPD) by FICO band for post-2009 vintages. Not surprisingly, these rates began trending upward in May and June of 2020 (two months after the economic effects of the pandemic began to be felt), with the most significant spikes coming in July and August: approaching 20 percent at the low end of the credit box but staying below 5 percent among prime borrowers.

Since last August’s peak, serious delinquency rates have fallen most precipitously (nearly 8 percentage points) in the 620 – 680 FICO bucket, compared with a 5-percentage-point decline in the 680 – 740 bucket and a 4-percentage-point drop in the sub-620 bucket. Delinquency rates have come down the least among prime (FICO > 740) mortgages (just over 2 percentage points) but, having never cracked 5 percent, these loans also had the shortest distance to go.

Serious delinquency rates remain above January 2020 levels across all four credit buckets: approximately 7 percentage points higher in the two sub-680 FICO buckets, 5 percentage points higher in the 680 – 740 bucket, and 2 percentage points higher in the over-740 bucket.
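As a rough illustration of how a 60+ DPD rate by FICO band can be tabulated, the sketch below buckets a handful of made-up loan records into the four bands discussed above. The record layout, the function names, and the choice to assign boundary scores of 620 and 680 to the higher band are assumptions for illustration only, not the data or methodology behind the charts.

```python
from collections import defaultdict

# Hypothetical loan records: (FICO score, days past due). Illustrative only.
LOANS = [
    (600, 90), (610, 0), (650, 60), (660, 0),
    (700, 0), (720, 30), (760, 0), (780, 0),
]


def fico_bucket(fico):
    """Assign a loan to one of the four FICO bands used in the text."""
    if fico < 620:
        return "<620"
    if fico < 680:
        return "620-680"
    if fico <= 740:
        return "680-740"
    return ">740"  # prime, i.e. FICO > 740


def serious_dq_rates(loans, threshold_dpd=60):
    """Share of loans at least `threshold_dpd` days past due, per FICO band."""
    total = defaultdict(int)
    delinquent = defaultdict(int)
    for fico, dpd in loans:
        bucket = fico_bucket(fico)
        total[bucket] += 1
        if dpd >= threshold_dpd:
            delinquent[bucket] += 1
    return {bucket: delinquent[bucket] / total[bucket] for bucket in total}


rates = serious_dq_rates(LOANS)
for bucket, rate in rates.items():
    print(f"{bucket}: {rate:.1%}")
```

The rates here are counted by number of loans, matching the "percentage of loans" wording above; a balance-weighted version would sum unpaid balances instead of counting records.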

So-called “legacy” vintages (consisting of mortgages originated before the 2008-2009 crisis) reflect a somewhat different performance profile, though they follow a similar pattern.

The following chart plots serious delinquency rates by FICO band for these older vintages. Probably because these rates were starting from a relatively elevated point in January 2020, their pandemic-related spikes were somewhat less pronounced, particularly in the low-FICO buckets. These vintages also appear to have felt the spike about a month earlier than did the newer-issue loans.

Serious delinquency rates among these “legacy” loans are considerably closer to their pre-pandemic levels than are their new-issue counterparts. This is especially true in the sub-prime buckets. Serious delinquencies in the sub-620 FICO bucket actually were 3 percentage points lower last month than they were in January 2020 (and nearly 5 percentage points lower than their peak in July 2020). These differences are less pronounced in the higher-FICO buckets but are still there.

Comparing the two graphs reveals that the pandemic had the effect of causing new-issue low-FICO loans to perform similarly to legacy low-FICO loans, while a significant gap remains between the new-issue prime buckets and their high-FICO pre-2009 counterparts. This is not surprising given the tightening that underwriting standards (beyond credit score) underwent after 2009.

Interested in cutting non-Agency performance across any of several dozen loan-level characteristics? Contact us for a quick, no-pressure demo.


In ESG Policy, ‘E’ Should Not Come at the Expense of ‘S’

ESG is the hottest topic in our space. No conference or webinar is complete without a panel touting the latest ESG bond or the latest advance in reporting and certification. What a lot of these pieces neglect to address is the complicated relationship between the “E” and the “S” of ESG. In particular, climate-risk-exposed properties are often located in underserved communities, where they provide much-needed affordable housing.

Last week, the White House issued an Executive Order on Climate-Related Financial Risk. The focus of the order was to direct government agencies toward both disclosure and mitigation of climate-related financial risk. The order reinforces the already relentless focus on ESG initiatives within our industry. The order specifically calls on the USDA, HUD, and the VA to “consider approaches to better integrate climate-related financial risk into underwriting standards, loan terms and conditions, and asset management and servicing procedures, as related to their Federal lending policies and programs.” Changes here will likely presage changes by the GSEs.

In mortgage finance, some of the key considerations related to disclosure and mitigation are as follows:

Disclosure of Climate-Related Financial Risk:

  • Homes exposed to the increasing occurrence of natural hazards due to climate change.
  • Homes exposed to the risk of decreasing home prices due to climate change, because of either increasing property insurance costs (or un-insurability) or localized transition risks of industry-exposed areas (e.g., Houston to the oil and gas industry).

Mitigation of Climate-Related Financial Risk:

  • Reducing the housing industry’s contribution to greenhouse gas emissions in alignment with the president’s goal of a net-zero emissions economy by 2050. For example, loan programs that support retrofitting existing housing stock to reduce energy consumption.
  • Considering a building location’s exposure to climate-related physical risk, and directing investment away from areas exposed to the increasing frequency and severity of natural disasters.

But products and programs that aim to support the goal of increased disclosure and mitigation of climate-related financial risk can create situations in which underserved communities disproportionately bear the costs of our nation’s pivot toward climate resiliency. The table below connects FEMA’s National Risk Index data to HUD’s list of census tracts that qualify for low-income housing tax credits, which HUD defines as tracts that have ‘50 percent of households with incomes below 60 percent of the Area Median Gross Income (AMGI) or have a poverty rate of 25 percent or more.’ Census tracts with the highest risk of annual loss from natural disaster events are disproportionately made up of HUD’s Qualified Tracts.

As an industry, it’s important to remember that actions taken to mitigate exposure to increasing climate-related events will always have a cost to someone. These costs could be in the form of increased insurance premiums, decreasing home prices, or even loss of affordable housing options altogether. All this is not to say that action should not be taken, only that balancing social ESG goals should also be considered when ambitious environmental ESG goals come at their expense.

The White House identified this issue right at the top of the order by indicating that any action on the order would need to account for ‘disparate impacts on disadvantaged communities and communities of color.’

“It is therefore the policy of my Administration to advance consistent, clear, intelligible, comparable, and accurate disclosure of climate-related financial risk (consistent with Executive Order 13707 of September 15, 2015 (Using Behavioral Science Insights to Better Serve the American People)), including both physical and transition risks; act to mitigate that risk and its drivers, while accounting for and addressing disparate impacts on disadvantaged communities and communities of color (consistent with Executive Order 13985 of January 20, 2021 (Advancing Racial Equity and Support for Underserved Communities Through the Federal Government)) and spurring the creation of well-paying jobs; and achieve our target of a net-zero emissions economy by no later than 2050.”

The social impacts of any environmental initiative need to be considered. Steps should be taken to avoid having the cost of changes to underwriting processes and credit policies be disproportionately borne by underserved and vulnerable communities. To this end, a balanced ESG policy will ultimately require input from stakeholders across the mortgage industry.


Mortgage DQs by MSA: Non-Agency Performance Chart of the Month

This month we take a closer look at geographical differences in loan performance in the non-agency space. The chart below looks at the 60+ DPD Rate for the 5 Best and Worst performing MSAs (and the overall average). A couple of things to note:

  • The pandemic seems to have simply amplified performance differences that were already apparent pre-covid. The worst performing MSAs were showing mostly above-average delinquency rates before last year’s disruption.
  • Florida was especially hard-hit. Three of the five worst-performing MSAs are in Florida. Not surprisingly, these MSAs rely heavily on the tourism industry.
  • New York jumped from being about average to being one of the worst-performing MSAs in the wake of the pandemic. This is not surprising considering how seriously the city bore the pandemic’s brunt.
  • Tech hubs show strong performance. All our best performers are strong in the Tech industry—Austin’s the new Bay Area, right?

Anomaly Detection and Quality Control

In our most recent workshop on Anomaly Detection and Quality Control (Part I), we discussed how clean market data is an integral part of producing accurate market risk results. As incorrect and inconsistent market data is so prevalent in the industry, it is not surprising that the U.S. spends over $3 trillion on processes to identify and correct market data.

Taking a step back, it is worth noting what drives accurate market risk analytics. Clearly, having accurate portfolio holdings with correct terms and conditions for over-the-counter trades is central to calculating consistent risk measures that are scaled to the market value of the portfolio. The use of well-tested and integrated industry-standard pricing models is another key factor in producing reliable analytics. Compared with these two factors, however, a lack of clean and consistent market data is the largest contributor to poor market risk analytics. The key driving factor behind detecting and correcting/transforming market data is risk and portfolio managers’ expectation that risk results are accurate at the start of the business day, with no need to perform any time-consuming re-runs during the day to correct issues found.

Broadly defined, market data is any data that is used as input to the re-valuation models. This includes equity prices, interest rates, credit spreads, FX rates, volatility surfaces, etc.

Market data needs to be:

  • Complete – no true gaps when looking back historically.
  • Accurate
  • Consistent – data must be viewed across other data points to determine its accuracy (e.g., interest rates across tenor buckets, volatilities across volatility surface)

Anomaly types can be broken down into four major categories:

  • Spikes
  • Stale data
  • Missing data
  • Inconsistencies

Here are three examples of “bad” market data:

Credit Spreads

The following chart depicts day-over-day changes in credit spreads for the 10-year consumer cyclical time series, returned from an external vendor. The changes indicate a significant spike on 12/3 that caused big swings, up and down, across multiple rating buckets. Without an adjustment to this data, key risk measures would show significant jumps, up and down, depending on the dollar value of positions on two consecutive days.

Swaption Volatilities

Market data also includes volatilities, which drive delta and possible hedging. The following chart shows implied swaption volatilities for different maturities of swaptions and their underlying swaps. Note the spikes in 7×10 and 10×10 swaptions. The chart also highlights inconsistencies between different tenors and maturities.

Equity Implied Volatilities

The 146 and 148 strikes in the table below reflect inconsistent vol data, as often occurs around expiration.

The detection of market data inconsistencies needs to be an automated process with multiple approaches targeted at specific types of market data. The detection models need to evolve over time as added information is gathered, with the goal of reducing false negatives to a manageable level. Once the models detect the anomalies, the next step is to automate the transformation of the market data (e.g., backfill, interpolate, use prior day value). Each transformation must be recorded transparently so that it is known which values were changed, or populated where none were available. This record should be shared with clients, whose feedback could lead to alternative transformations or detection routines.
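One way to pair an automated transformation with a record of what was changed is sketched below, using the "prior day value" fill mentioned above. The function name, the audit-log fields, and the sample spread history are hypothetical and chosen only for illustration.

```python
def fill_with_prior_value(series):
    """Replace missing observations (None) with the prior day's value.

    Returns the cleaned series plus an audit log, so that every populated
    value remains visible to downstream users.
    """
    cleaned = []
    audit_log = []
    last_good = None
    for index, value in enumerate(series):
        if value is None and last_good is not None:
            cleaned.append(last_good)
            audit_log.append({
                "index": index,
                "original": None,
                "replacement": last_good,
                "method": "prior_day_value",
            })
        else:
            # Keep observed values (and leading gaps with no prior value) as-is.
            cleaned.append(value)
            if value is not None:
                last_good = value
    return cleaned, audit_log


# Made-up credit spread history (basis points) with three missing days.
spreads = [101.0, 102.5, None, 103.0, None, None, 104.0]
cleaned, audit_log = fill_with_prior_value(spreads)
print(cleaned)
for entry in audit_log:
    print(entry)
```

Keeping the log separate from the cleaned series means the original vendor data can always be reconstructed, and an alternative transformation (interpolation, for instance) can be swapped in without losing track of which points were touched.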

Detector types typically fall into the following categories:

  • Extreme Studentized Deviate (ESD): finds outliers in a single data series (helpful for extreme cases).
  • Level Shift: detects a change in level by comparing the means of two sliding time windows (useful for local outliers).
  • Local Outliers: detects spikes relative to nearby values.
  • Seasonal Detector: detects seasonal patterns and anomalies (used for contract expirations and other events).
  • Volatility Shift: detects a shift in volatility by tracking changes in standard deviation.
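To make three of these detector types concrete, here is a minimal sketch using only Python's standard library. The window lengths, thresholds, function names, and sample series are all assumptions made for illustration; this is not the workshop's implementation, and it omits the ESD and seasonal detectors entirely.

```python
import statistics


def level_shift_points(series, window=5, threshold=3.0):
    """Flag indices where the mean of the next `window` points differs from the
    mean of the previous `window` points by more than `threshold` standard
    deviations of the previous window."""
    flagged = []
    for i in range(window, len(series) - window + 1):
        left, right = series[i - window:i], series[i:i + window]
        scale = statistics.pstdev(left) or 1e-9  # guard against a flat window
        if abs(statistics.mean(right) - statistics.mean(left)) / scale > threshold:
            flagged.append(i)
    return flagged


def volatility_shift_points(series, window=5, ratio=3.0):
    """Flag indices where the standard deviation of the next window is more
    than `ratio` times (or less than 1/`ratio` of) the previous window's."""
    flagged = []
    for i in range(window, len(series) - window + 1):
        left_sd = statistics.pstdev(series[i - window:i]) or 1e-9
        right_sd = statistics.pstdev(series[i:i + window]) or 1e-9
        if right_sd / left_sd > ratio or left_sd / right_sd > ratio:
            flagged.append(i)
    return flagged


def spike_points(series, window=5, threshold=5.0):
    """Flag points far from the median of the preceding `window` points,
    measured in units of that window's median absolute deviation."""
    flagged = []
    for i in range(window, len(series)):
        prior = series[i - window:i]
        med = statistics.median(prior)
        mad = statistics.median([abs(x - med) for x in prior]) or 1e-9
        if abs(series[i] - med) / mad > threshold:
            flagged.append(i)
    return flagged


# Made-up daily series for illustration.
calm = [1.00, 1.02, 0.98, 1.01, 0.99]
level_series = calm * 2 + [x + 1.0 for x in calm] * 2       # level jumps at index 10
vol_series = calm * 2 + [1.0, 1.5, 0.5, 1.3, 0.7] * 2       # volatility jumps at index 10
spike_series = calm * 4
spike_series[7] = 1.5                                       # single bad print

print(level_shift_points(level_series))
print(volatility_shift_points(vol_series))
print(spike_points(spike_series))
```

The window-comparison detectors tend to flag a small cluster of indices around a true shift rather than a single point, so a production version would usually consolidate adjacent flags before triggering a transformation.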

 

On Wednesday, May 19th, we will present a follow-up workshop focusing on:

  • Coding examples
    • Application of outlier detection and pipelines
    • PCA
  • Specific loan use cases
    • Loan performance
    • Entity correction
  • Novelty Detection
    • Anomalies are not always “bad”
    • Market monitoring models

You can register for this complimentary workshop here.


Leveraging ML to Enhance the Model Calibration Process

Last month, we outlined an approach to continuous model monitoring and discussed how practitioners can leverage the results of that monitoring for advanced analytics and enhanced end-user reporting. In this post, we apply this idea to enhanced model calibration.

Continuous model monitoring is a key part of a modern model governance regime. But testing performance as part of the continuous monitoring process has value that extends beyond immediate governance needs. Using machine learning and other advanced analytics, testing results can also be further explored to gain a deeper understanding of model error lurking within sub-spaces of the population.

Below we describe how we leverage automated model back-testing results (using our machine learning platform, Edge Studio) to streamline the calibration process for our own residential mortgage prepayment model.

The Problem:

MBS prepayment models, RiskSpan’s included, often provide a number of tuning knobs to tweak model results. These knobs impact the various components of the S-curve function, including refi sensitivity, turnover lever, elbow shift, and burnout factor.

The knob tuning and calibration process is typically messy and iterative. It usually involves somewhat-subjectively selecting certain sub-populations to calibrate, running back-testing to see where and how the model is off, and then tweaking knobs and rerunning the back-test to see the impacts. The modeler may need to iterate through a series of different knob selections and groupings to figure out which combination best fits the data. This is manually intensive work and can take a lot of time.

As part of our continuous model monitoring process, we had already automated the process of generating back-test results and merging them with actual performance history. But we wanted to explore ways of taking this one step further to help automate the tuning process — rerunning the automated back-testing using all the various permutations of potential knobs, but without all the manual labor.

The solution applies machine learning techniques to run a series of back-tests on MBS pools and automatically solve for the set of tuners that best aligns model outputs with actual results.

We break the problem into two parts, followed by a validation step:

  1. Find Cohorts: Cluster pools into groups that exhibit similar key pool characteristics and model error (so they would need the same tuners).

TRAINING DATA: Back-testing results for our universe of pools with no model tuning knobs applied

  2. Solve for Tuners: Minimize back-testing error by optimizing knob settings.

TRAINING DATA: Back-testing results for our universe of pools under a variety of permutations of potential tuning knobs (Refi x Turnover)

  3. Tuning knobs validation: Take optimized tuning knobs for each cluster and rerun pools to confirm that the selected permutation in fact returns the lowest model errors.

Part 1: Find Cohorts

We define model error as the ratio of the average modeled SMM to the average actual SMM. We compute this using back-testing results and then use a hierarchical clustering algorithm to cluster the data based on model error across various key pool characteristics.

Hierarchical clustering is a general family of clustering algorithms that build nested clusters by either merging or splitting observations successively. The hierarchy of clusters is represented as a tree (or dendrogram). The root of the tree is the root cluster that contains all samples, while the leaves represent clusters with only one sample. [1]

Agglomerative clustering is an implementation of hierarchical clustering that takes the bottom-up (merging) approach. Each observation starts in its own cluster, and clusters are then successively merged. Several linkage criteria are available; we used Ward linkage.

The Ward linkage strategy minimizes the sum of squared differences within all clusters. It is a variance-minimizing approach.[2]
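The clustering step might look roughly like the sketch below, which computes the model error ratio defined above and groups pools with scikit-learn's `AgglomerativeClustering` using Ward linkage. The six pools, their SMM values, the two pool characteristics (WAC and WALA), the feature scaling, and the choice of two clusters are all assumptions for illustration, not the production setup.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Hypothetical back-test averages for six pools (all values are made up).
modeled_smm = np.array([0.020, 0.021, 0.019, 0.030, 0.031, 0.029])
actual_smm = np.array([0.020, 0.020, 0.020, 0.020, 0.021, 0.019])
wac = np.array([3.0, 3.1, 2.9, 4.5, 4.6, 4.4])   # weighted average coupon (%)
wala = np.array([12, 14, 10, 60, 62, 58])        # weighted average loan age (months)

# Model error as defined in the text: average modeled SMM / average actual SMM.
model_error = modeled_smm / actual_smm

# Cluster on model error together with the pool characteristics. Scaling keeps
# any one column from dominating the squared-distance criterion (an assumption).
features = np.column_stack([model_error, wac, wala])
scaled = StandardScaler().fit_transform(features)

clusterer = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = clusterer.fit_predict(scaled)

for pool, (err, label) in enumerate(zip(model_error, labels)):
    print(f"pool {pool}: model error {err:.2f}, cluster {label}")
```

In this toy data the first three pools are modeled almost exactly and the last three are over-predicted by roughly 50 percent, so the two groups separate cleanly; real pools would call for more features and a data-driven choice of cluster count.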

Part 2: Solving for Tuners

Here our training data is expanded to include multiple back-test results for each pool, one under each permutation of tuning knobs.

Process to Optimize the Tuners for Each Cluster

Training Data: Rerun the back-test with permutations of REFI and TURNOVER tunings, covering all reasonably possible combinations of tuners.

  1. These permutations of tuning results are fed to a multi-output regressor, which trains the machine learning model to understand the interaction between each tuning parameter and the model as a fitting step.
    • Model Error and Pool Features are used as Independent Variables
    • Gradient Tree Boosting/Gradient Boosted Decision Trees (GBDT)* methods are used to find the optimized tuning parameters for each cluster of pools derived from the clustering step
    • Two dependent variables (Refi Tuner and Turnover Tuner) are used
    • Separate models are estimated for each cluster
  2. We solve for the optimal tuning parameters by running the resulting model with a model error ratio of 1 (no error) and the weighted average cluster features.

* Gradient Tree Boosting/Gradient Boosted Decision Trees (GBDT) is a machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. When a decision tree is the weak learner, the resulting algorithm is called gradient boosted trees, which usually outperforms random forest. It builds the model in a stage-wise fashion, as other boosting methods do, and generalizes them by allowing optimization of an arbitrary differentiable loss function. [3]

*We used scikit-learn’s GBDT implementation to optimize and solve for the best Refi and Turnover tuners. [4]
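A toy version of the tuner-solving step is sketched below. It follows the structure described above (model error and pool features as inputs, the two tuners as outputs, then a prediction at a model error ratio of 1), but the pools, the tuner grid, and especially the formula standing in for a rerun of the prepayment model are invented for illustration. Wrapping `GradientBoostingRegressor` in `MultiOutputRegressor` is one way to get two outputs from scikit-learn; whether the original work did it this way is an assumption.

```python
import itertools

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)

# Toy stand-in for one cluster of pools: each pool has a single feature (WAC)
# and an untuned model error ratio. All numbers are made up.
pool_wac = rng.uniform(3.0, 4.0, size=20)
untuned_error = 1.2 + 0.1 * (pool_wac - 3.5)

# Hypothetical grid of Refi x Turnover tuner permutations.
tuner_grid = list(itertools.product(np.linspace(0.5, 1.5, 5), repeat=2))

rows, targets = [], []
for wac, base_error in zip(pool_wac, untuned_error):
    for refi, turnover in tuner_grid:
        # Placeholder for rerunning the back-test with these tuners applied.
        error = base_error * (0.7 * refi + 0.3 * turnover)
        rows.append([error, wac])          # independent variables
        targets.append([refi, turnover])   # dependent variables

X = np.array(rows)
y = np.array(targets)

# One gradient-boosted regressor is fitted per tuner.
model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
model.fit(X, y)

# Solve for tuners: request a model error ratio of 1 (no error) at the
# cluster's average feature values (a simple mean here, not a weighted one).
target_point = np.array([[1.0, pool_wac.mean()]])
refi_tuner, turnover_tuner = model.predict(target_point)[0]
print(f"suggested refi tuner: {refi_tuner:.2f}, turnover tuner: {turnover_tuner:.2f}")
```

Because many tuner combinations can produce the same error ratio, the regression returns something closer to an average of the combinations that land near an error of 1 than a unique solution, which is one reason a separate validation rerun with the selected knobs is worthwhile.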

Results

The resultant suggested knobs show promise in improving model fit over our back-test period. Below are the results for two of the clusters using the knobs suggested by the process. To further expand the results, we plan to cross-validate on out-of-time sample data as it comes in.

Conclusion

These advanced analytics show promise in their ability to help streamline the model calibration and tuning process by removing many of the time-consuming and subjective components from the process altogether. Once a process like this is established for one model, applying it to new populations and time periods becomes more straightforward. This analysis can be further extended in a number of ways. One in particular we’re excited about is the use of ensemble models, or a ‘model of models’ approach. We will continue to tinker with this approach as we calibrate our own models and keep you apprised of what we learn.


RiskSpan VQI: Current Underwriting Standards Q1 2021

RiskSpan’s Vintage Quality Index estimates the relative “tightness” of credit standards by computing and aggregating the percentage of Agency originations each month with one or more “risk factors” (low-FICO, high DTI, high LTV, cash-out refi, investment properties, etc.). Months with relatively few originations characterized by these risk factors are associated with lower VQI ratings. As the historical chart above shows, the index maxed out (i.e., had an unusually high number of loans with risk factors) leading up to the 2008 crisis.
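The first half of that calculation, the share of a month's originations carrying each risk factor, can be sketched as follows. The loan records, field names, and cutoffs (FICO below 660, DTI above 45, LTV above 80) are hypothetical placeholders rather than the VQI's actual definitions, and the sketch stops short of the aggregation into an index value, which is not specified here.

```python
# Hypothetical originations for a single month. Field names and cutoffs are
# illustrative assumptions, not RiskSpan's VQI definitions.
LOANS = [
    {"fico": 640, "dti": 48, "ltv": 95, "cash_out": False, "investor": False},
    {"fico": 760, "dti": 30, "ltv": 75, "cash_out": False, "investor": False},
    {"fico": 720, "dti": 38, "ltv": 70, "cash_out": True, "investor": False},
    {"fico": 780, "dti": 25, "ltv": 60, "cash_out": False, "investor": False},
]

RISK_FACTORS = {
    "low_fico": lambda loan: loan["fico"] < 660,
    "high_dti": lambda loan: loan["dti"] > 45,
    "high_ltv": lambda loan: loan["ltv"] > 80,
    "cash_out": lambda loan: loan["cash_out"],
    "investor": lambda loan: loan["investor"],
}


def risk_factor_shares(loans):
    """Share of loans exhibiting each risk factor, plus the share with any."""
    count = len(loans)
    shares = {
        name: sum(bool(test(loan)) for loan in loans) / count
        for name, test in RISK_FACTORS.items()
    }
    shares["any_risk_factor"] = sum(
        any(test(loan) for test in RISK_FACTORS.values()) for loan in loans
    ) / count
    return shares


shares = risk_factor_shares(LOANS)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```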

Vintage Quality Index Stability Masks Purchase Credit Contraction

The first quarter of 2021 provides a stark example of why it is important to consider the individual components of RiskSpan’s Vintage Quality Index and not just the overall value. 

The Index overall dropped by just 0.37 points to 76.68 in the first quarter of 2021. On the surface, this seems to suggest a minimal change to credit availability and credit quality over the period. But the Index’s net stability masks a significant change in one key metric offset by more modest counterbalancing changes in the remaining eight. The percentage of high-LTV mortgages fell to 16.7% (down from 21% at the end of 2020) during the first quarter.  

While this continues a trend of falling rates of high-LTV loans (down 8.7% since Q1 of 2020 and almost 12% from Q1 2019), it coincides with a steady increase in house prices. From December 2020 to February 2021, the Monthly FHFA House Price Index® (US, Purchase Only, Seasonally Adjusted) rose 1.9%. More striking is the year-over-year change from February 2020 to February 2021, during which the same index rose by 11.1%. Taken together, the roughly 10% increase in home prices combined with a roughly 10% reduction in the share of high-LTV loans paints a sobering picture for marginal borrowers seeking to purchase a home.

Some of the reduction in high-LTV share is obviously attributable to the growing percentage of refinance activity (including cash-out refinancing, which counterbalances the effect the falling high-LTV rate has on the index). But these refis do not impact the purchase-only HPI. As a result, even though the overall Index did not change materially, higher required down payments (owing to higher home prices) combined with fewer high-LTV loans reflect a credit box that effectively shrank in Q1.

Population assumptions:

  • Monthly data for Fannie Mae and Freddie Mac.

  • Loans originated more than three months prior to issuance are excluded because the index is meant to reflect current market conditions.

  • Loans likely to have been originated through the HARP program, as identified by LTV, MI coverage percentage, and loan purpose, are also excluded. These loans do not represent credit availability in the market as they likely would not have been originated today but for the existence of HARP.

Data assumptions:

  • Freddie Mac data goes back to 12/2005. Fannie Mae data goes back only to 12/2014.

  • Certain fields for Freddie Mac data were missing prior to 6/2008.   

GSE historical loan performance data released in support of GSE Risk Transfer activities was used to help back-fill data where it was missing.

An outline of our approach to data imputation can be found in our VQI Blog Post from October 28, 2015.                                                


Three Principles for Effectively Monitoring Machine Learning Models

The recent proliferation of machine learning models in banking and structured finance is becoming impossible to ignore. Rarely does a week pass without a client approaching us to discuss the development or validation (or both) of a model that leverages at least one machine learning technique. RiskSpan’s own model development team has also been swept up in the trend – deep learning techniques have featured prominently in developing the past several versions of our in-house residential mortgage prepayment model.

Machine learning’s rise in popularity is attributable to multiple underlying trends: 

  1. Quantity and complexity of data. Nowadays, firms store every conceivable type of data relating to their activities and clients – and frequently supplement this with data from any number of third-party providers. The increasing dimensionality of data available to modelers makes traditional statistical variable selection more difficult. The tradeoff between a model’s complexity and the rules adapted in variable selection can be hard to balance. An advantage of ML approaches is that they can handle multi-dimensional data more efficiently. ML frameworks are good at identifying trends and patterns – without the need for human intervention. 
  2. Better learning algorithms. Because ML algorithms learn to make more accurate projections as new data is introduced to the framework (assuming there is no data bias in the new data), model features based on newly introduced data are more likely to resemble features created using model training data.
  3. Cheap computation costs. New techniques, such as XGBoost, are designed to be memory efficient. XGBoost introduces an innovative system design that helps reduce computation cost.
  4. Proliferation breeds proliferation. As the number of machine learning packages in various programming tools increases, it facilitates implementation and promotes further ML model development. 

Addressing Monitoring Challenges 

Notwithstanding these advances, machine learning models are by no means easy to build and maintain. Feature engineering and parameter tuning procedures are time-consuming. And once an ML model has been put into production, monitoring activities must be implemented to detect anomalies and make sure the model works as expected (just like with any other model). According to the OCC 2011-12 supervisory guidance on model risk management, ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid. While monitoring ML models resembles monitoring conventional statistical models in many respects, the following activities take on particular importance with ML model monitoring:

  1. Review the underlying business problem. Defining the business problem is the first step in developing any ML model. This should be carefully articulated in the list of business requirements that the ML model is supposed to follow. Any shift in the underlying business problem will likely create drift in the training data and, as a result, new data coming to the model may no longer be relevant to the original business problem. The ML model becomes degraded and the new process of feature engineering and parameter tuning needs to be considered to remediate the impact. This review should be conducted whenever the underlying problem or requirements change. 
  2. Review of data stability (model input). In the real world, even if the underlying business problem is unchanged, there might be shifts in the predicting data caused by changing borrower behaviors, changes in product offerings, or any other unexpected market drift. Any of these things could result in the ML model receiving data that it has not been trained on. Model developers should measure the data population stability between the training dataset and the predicting dataset. If there is evidence of the data having shifted, model recalibration should be considered. This assessment should be done when the model user identifies a significant shift in the model’s performance or when a new testing dataset is introduced to the ML model. Where data segmentation has been used in the model development process, this assessment should be performed at the individual segment level, as well.
  3. Review of performance metrics (model output). Performance metrics quantify how well an ML model is trained to explain the data. Performance metrics should fit the model’s type. For instance, the developer of a binary classification model could use Kolmogorov-Smirnov (KS) table, receiver operating characteristic (ROC) curve, and area under the curve (AUC) to measure the model’s overall rank order ability and its performance at different cutoffs. Any shift (upward or downward) in performance metrics between a new dataset and the training dataset should raise a flag in monitoring activity. All material shifts need to be reviewed by the model developer to determine their cause. Such assessments should be conducted on an annual basis or whenever new data is available. 
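The input and output checks described in items 2 and 3 can be sketched with a few lines of standard-library Python. The population stability index (PSI) is used here as one common way to quantify data population stability; choosing it, along with the bin edges, the small floor for empty bins, and the sample scores, is an assumption for illustration. The KS statistic and AUC are computed directly from their definitions rather than with a library.

```python
import math


def population_stability_index(expected, actual, edges):
    """PSI between a training sample (`expected`) and a new sample (`actual`),
    using the sorted bin edges in `edges`."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def ks_and_auc(scores, labels):
    """KS statistic and AUC for a binary classifier's scores (labels are 0/1)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # KS: largest gap between the two classes' cumulative score distributions.
    ks = max(
        abs(sum(p <= t for p in pos) / len(pos) - sum(n <= t for n in neg) / len(neg))
        for t in set(scores)
    )
    # AUC: chance that a random positive outscores a random negative (ties count half).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return ks, wins / (len(pos) * len(neg))


# Made-up model scores and outcomes for illustration.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 1, 0, 1, 1, 1]
new_scores = [0.6, 0.7, 0.8, 0.9, 0.8, 0.9, 0.7, 0.95]
edges = [0.25, 0.5, 0.75]

psi_same = population_stability_index(train_scores, train_scores, edges)
psi_shift = population_stability_index(train_scores, new_scores, edges)
ks, auc = ks_and_auc(train_scores, train_labels)
print(f"PSI (same data): {psi_same:.3f}, PSI (shifted data): {psi_shift:.3f}")
print(f"KS: {ks:.3f}, AUC: {auc:.4f}")
```

Running the same functions on each new scoring dataset, and at the segment level where segmentation was used, gives a consistent set of numbers to compare against the training baseline.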

Like all models, ML models are only as good as the data they are fed. But ML models are particularly susceptible to data shifts because their processing components are less transparent. Taking these steps to ensure they are learning based on valid and consistent data is essential to managing a functional inventory of ML models.


Too Many Documentation Types? A Data-Driven Approach to Consolidating Them

The sheer volume of different names assigned to various documentation types in the non-agency space has really gotten out of hand, especially in the last few years. As of February 2021, an active loan in the CoreLogic RMBS universe could have any of over 250 unique documentation type names, with little or no standardization from issuer to issuer. Even within a single issuer, things get complicated when every possible permutation of the same basic documentation level gets assigned its own type. One issuer in the database has 63 unique documentation names!

In order for investors to be able to understand and quantify their exposure, we need a way of consolidating and mapping all these different documentation types to a simpler, standard nomenclature. Various industry reports attempt to group all the different documentation levels into meaningful categories. But these classifications often fail to capture important distinctions in delinquency performance among different documentation levels.

There is a better way. Taking some of the consolidated group names from the various industry papers and rating agency papers as a starting point, we took another pass focusing on two main elements:

  • The delinquency performance of the group. We focused on the 60-DPD rate while also considering other drivers of loan performance (e.g., DTI, FICO, and LTV) and their correlation to the various doc type groups.
  • The size of the sub-segment. We ensured our resulting groupings were large enough to be meaningful.

What follows is how we thought about it and ultimately landed where we did. These mappings are not set in stone and will likely need to undergo revisions as 1) new documentation types are generated, and 2) additional performance data and feedback from clients on what they consider most important become available. Releasing these mappings into RiskSpan’s Edge Platform will then make it easier for users to track performance.

Data Used

We take a snapshot of all loans outstanding in non-agency RMBS issued after 2013, as of the February 2021 activity period. The data comes from CoreLogic and we exclude loans in seasoned or reperforming deals. We also exclude loans whose documentation type is not reported, some 14 percent of the population.

Approach

We are seeking to create sub-groups that generally conform to the high-level groups on which the industry seems to be converging while also identifying subdivisions with meaningfully different delinquency performance. We will rely on these designations as we re-estimate our credit model.

Steps in the process:

  1. Start with high-level groupings based on how the documentation type is currently named.
    • Full Documentation: Any name referencing ‘Agency,’ ‘Agency AUS,’ or similar.
    • Bank Statements: Any name including the term “Bank Statement[s].”
    • Investor/DSCR: Any name indicating that the underwriting relied on net cash flows to the secured property.
    • Alternative Documentation: A wide-ranging group consolidating many different types, including: asset qualifier, SISA/SIVA/NINA, CPA letters, etc.
    • Other: Any name that does not easily classify into one of the groups above, such as Foreign National Income, and any indecipherable names.

  2. We subdivided the Alternative Documentation group by some of the meaningfully sized natural groupings of the names:
    • Asset Depletion or Asset Qualifier
    • CPA and P&L statements
    • Salaried/Wage Earner: Includes anything with W2 tax return
    • Tax Returns or 1099s: Includes anything with ‘1099’ or ‘Tax Return,’ but not ‘W2.’
    • Alt Doc: Anything that remained, including items like ‘VIVA,’ ‘SISA,’ ‘NINA,’ ‘Streamlined,’ ‘WVOE,’ and ‘Alt Doc.’
  3. From there we sought to identify any sub-groups that perform differently (as measured by 60-DPD%).
    • Bank Statement: We evaluated a subdivision by the number of statements provided (less than 12 months, 12 months, and greater than 12 months). However, these distinctions did not significantly impact delinquency performance. (Also, very few loans fell into the under 12 months group.) Distinguishing ‘Business Bank Statement’ loans from the general ‘Bank Statements’ category, however, did yield meaningful performance differences.

    • Alternative Documentation: This group required the most iteration. We initially focused our attention on documentation types that included terms like ‘streamlined’ or ‘fast.’ This, however, did not reveal any meaningful performance differences relative to other low doc loans. We also looked at this group by issuer, hypothesizing that some programs might perform better than others. The jury is still out on this analysis and we continue to track it. The following subdivisions yielded meaningful differences:
      • Limited Documentation: This group includes any names including the terms ‘reduced,’ ‘limited,’ ‘streamlined,’ and ‘alt doc.’ This group performed substantially better than the next group.
      • No Doc/Stated: Not surprisingly, these were the worst performers in the ‘Alt Doc’ universe. The types included here are a throwback to the run-up to the housing crisis. ‘NINA,’ ‘SISA,’ ‘No Doc,’ and ‘Stated’ all make a reappearance in this group.
      • Loans with some variation of ‘WVOE’ (written verification of employment) showed very strong performance, so much so that we created an entirely separate group for them.
    • Full Documentation: Within the variations of ‘Full Documentation’ was a whole sub-group with qualifying terms attached. Examples include ‘Full Doc 12 Months’ or ‘Full w/ Asset Assist.’ These full-doc-with-qualification loans were associated with higher delinquency rates. The sub-groupings reflect this reality:
      • Full Documentation: Most of the straightforward types indicating full documentation, including anything with ‘Agency/AUS.’
      • Full with Qualifications (‘Full w/ Qual’): Everything including the term ‘Full’ followed by some sort of qualifier.
    • Investor/DSCR: The sub-groups here either were not big enough or did not demonstrate sufficient performance difference.
    • Other: Even though it’s a small group, we broke out all the ‘Foreign National’ documentation types into a separate group to conform with other industry reporting.
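
The name-based grouping described in the steps above can be approximated with an ordered list of keyword rules. This is only a sketch: the regular expressions, their order of precedence, and the `classify` helper are illustrative assumptions built from the names quoted in the text, not the actual mapping. Names not quoted in the final sub-groups simply fall through to ‘Other.’

```python
import re

# Ordered (group, regex) rules; the first match wins. Patterns and ordering
# are assumptions for illustration, not the production mapping.
RULES = [(group, re.compile(pattern)) for group, pattern in [
    ("Foreign National", r"\bforeign national\b"),
    ("Full Documentation", r"\bagency\b|\baus\b|^full( doc(umentation)?)?$"),
    ("Full w/ Qual", r"^full\b"),  # 'Full' followed by some qualifier
    ("Business Bank Statement", r"\bbusiness bank statements?\b"),
    ("Bank Statements", r"\bbank statements?\b"),
    ("Investor/DSCR", r"\bdscr\b|debt service coverage"),
    ("WVOE", r"\bwvoe\b"),
    ("No Doc/Stated", r"\bnina\b|\bsisa\b|\bno doc\b|\bstated\b"),
    ("Asset Depletion or Asset Qualifier", r"\basset (depletion|qualifier)\b"),
    ("CPA and P&L", r"\bcpa\b|p&l"),
    ("Salaried/Wage Earner", r"\bw-?2\b"),  # checked before tax returns
    ("Tax Returns or 1099s", r"\b1099s?\b|\btax returns?\b"),
    ("Limited Documentation",
     r"\breduced\b|\blimited\b|\bstreamlined\b|\balt doc\b"),
]]

def classify(doc_type):
    """Map a raw documentation-type name to a consolidated group."""
    name = " ".join(doc_type.lower().split())  # normalize case and spacing
    for group, pattern in RULES:
        if pattern.search(name):
            return group
    return "Other"

for raw in ["Agency/AUS", "Full Doc 12 Months", "12 Mo. Bank Statement",
            "W2 Tax Return", "Streamlined", "Unknown Program 7"]:
    print(f"{raw!r} -> {classify(raw)}")
```

Because the first matching rule wins, rule order encodes the judgment calls (for example, ‘W2’ taking precedence over ‘Tax Return’), which keeps those choices visible and easy to revise as new documentation types appear.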

Among the challenges of this sort of analysis is that the combinations to explore are virtually limitless. Perhaps not surprisingly, most of the potential groupings we considered did not make it into our final mapping. Some of the cuts we are still looking at include loan purpose with respect to some of the alternative documentation types.

We continue to evaluate these and other options. We can all agree that 250 documentation types is way too many. But in order to be meaningful, the process of consolidation cannot be haphazard. Fortunately, the tools for turning sub-grouping into a truly data-driven process are available. We just need to use them.   


Value Beyond Validation: The Future of Automated Continuous Model Monitoring Has Arrived

Imagine the peace of mind that would accompany being able to hand an existing model over to the validators with complete confidence in how the outcomes analysis will turn out. Now imagine being able to do this using a fully automated process.

The industry is closer to this than you might think.

The evolution of ongoing model monitoring away from something that happens only periodically (or, worse, only at validation time) and toward a more continuous process has been underway for some time. Now, thanks to automation and advanced process design, this evolutionary process has reached an inflection point. We stand today at the threshold of a future where:

  • Manual, painful processes to generate testing results for validation are a thing of the past;
  • Models are continuously monitored for fit, and end users are empowered with the tools to fully grasp model strengths and weaknesses;
  • Modeling and MRM experts leverage machine learning to dive more deeply into the model’s underlying data; and
  • Emerging trends and issues are identified early enough to be addressed before they have time to significantly hamper model performance.

Sound too good to be true? Beginning with its own internally developed prepayment and credit models, RiskSpan data scientists are laying out a framework for automated, ongoing performance monitoring that has the potential to transform behavioral modeling (and model validation) across the industry.

The framework involves model owners working collaboratively with model validators to create recurring processes for running previously agreed-upon tests continuously and receiving the results automatically. Testing outcomes continuously increases confidence in their reliability. Testing them automatically frees up high-cost modeling and validation resources to spend more time evaluating results and running additional, deeper analyses.

The Process:

Irrespective of the regulator, back-testing, benchmarking, and sensitivity analysis are the three pillars of model outcomes analysis. Automating the data and analytical processes that underlie these three elements is required to get to a fully comprehensive automated ongoing monitoring scheme.

In order to be useful, the process must stage testing results in a central database that can:

  • Automatically generate charts, tables, and statistical tests to populate validation reports;
  • Support dashboard reporting that allows model owners, users, and validators to explore test results; and
  • Feed advanced analytics and machine learning platforms capable of 1) helping with automated model calibration, and 2) identifying model weaknesses and blind spots (as we did with a GSE here).

Perhaps not surprisingly, achieving the back-end economies of a fully automated continuous monitoring and reporting regime requires an upfront investment of resources. This investment takes the form of time from model developers and owners as well as (potentially) some capital investment in technology necessary to host and manage the storage of results and output reports.

A good rule of thumb for estimating these upfront costs is between 2 and 3 times the cost of a single annual model test performed on an ad-hoc, manual basis. Consequently, the automation process can generally be expected to pay for itself (in time savings alone) over 2 to 3 cycles of performance testing. But the benefits of automated, continuous model monitoring go far beyond time savings. They invariably result in better models.
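
The payback arithmetic behind this rule of thumb can be made explicit. The sketch below assumes, as the rule of thumb implicitly does, that an automated test cycle costs little or nothing compared with a manual one; the function name and the dollar figure are hypothetical.

```python
import math

def cycles_to_break_even(upfront_cost, manual_cost, automated_cost=0.0):
    """Number of testing cycles needed for per-cycle savings to cover upfront cost."""
    savings = manual_cost - automated_cost
    if savings <= 0:
        raise ValueError("automated cycles must cost less than manual ones")
    return math.ceil(upfront_cost / savings)

manual = 100_000  # hypothetical cost of one manual, ad-hoc annual test
for multiple in (2, 3):
    cycles = cycles_to_break_even(multiple * manual, manual)
    print(f"upfront = {multiple}x manual cost -> break-even after {cycles} cycles")
```

If each automated cycle still carries a meaningful cost, the break-even point moves out accordingly.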

Output Applications

Continuous model monitoring produces benefits that extend well beyond satisfying model governance requirements. Indeed, automated monitoring has significantly informed the development process for RiskSpan’s own, internally developed credit and prepayment models – specifically in helping to identify sub-populations where model fit is a problem.

Continuous monitoring also makes it possible to quickly assess the value of newly available data elements. For example, when the GSEs start releasing data on mortgages with property inspection waivers (PIWs) (as opposed to traditional appraisals) we can immediately combine that data element with the results of our automated back-testing to determine whether the PIW information can help predict model error from those results. PIW currently appears to have value in predicting our production model error, and so the PIW feature is now slated to be added to a future version of our model. Having an automated framework in place accelerates this process while also enabling us to proceed with confidence that we are only adding variables that improve model performance.
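
One simple way to screen a new data element against stored back-test results is to compare average model error with and without the flag. The sketch below is an assumption about how such a check might look, not RiskSpan’s actual process; the record layout and all numbers are invented.

```python
from statistics import mean

def mean_error_by_flag(records):
    """Average (actual - predicted) error for each value of a candidate flag.

    `records` is an iterable of (flag, actual, predicted) tuples.
    """
    errors = {}
    for flag, actual, predicted in records:
        errors.setdefault(flag, []).append(actual - predicted)
    return {flag: mean(values) for flag, values in errors.items()}

# Invented back-test rows: (has_piw, actual CPR, predicted CPR)
backtest = [
    (True, 30, 24), (True, 28, 24),
    (False, 20, 21), (False, 22, 21),
]

by_flag = mean_error_by_flag(backtest)
gap = by_flag[True] - by_flag[False]
print(f"mean error with PIW: {by_flag[True]}, without: {by_flag[False]}, gap: {gap}")
```

A persistent gap between the two groups would suggest the flag carries information the model is missing. A real analysis would also control for other loan attributes and test for statistical significance.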

The continuous monitoring results can also be used to develop helpful dashboard reports. These provide model owners and users with deeper insights into a model’s strengths and weaknesses and can be an important tool in model tuning. They can also be shared with model validators, thus facilitating that process as well.

The dashboard below is designed to give our model developers and users a better sense of where model error is greatest. Sub-populations with the highest model error are deep red. This makes it easy for model developers to visualize that the model does not perform well when FICO and LTV data are missing, which happens often in the non-agency space. The model developers now know that they need to adjust their modeling approach when these key data elements are not available.
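
The kind of grid that sits behind such a heat map can be sketched as follows. The bucket edges, the explicit ‘Missing’ bucket, and the use of mean absolute error are all illustrative assumptions; the sample rows are invented.

```python
from collections import defaultdict

def fico_bucket(fico):
    """Hypothetical FICO bands, with missing values kept as their own bucket."""
    if fico is None:
        return "Missing"
    if fico < 620:
        return "<620"
    if fico < 680:
        return "620-679"
    if fico < 740:
        return "680-739"
    return "740+"

def ltv_bucket(ltv):
    """Hypothetical LTV bands, with missing values kept as their own bucket."""
    if ltv is None:
        return "Missing"
    return "<=80" if ltv <= 80 else ">80"

def error_grid(loans):
    """Mean absolute model error per (FICO bucket, LTV bucket) cell.

    `loans` is an iterable of (fico, ltv, actual, predicted) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for fico, ltv, actual, predicted in loans:
        cell = totals[(fico_bucket(fico), ltv_bucket(ltv))]
        cell[0] += abs(actual - predicted)
        cell[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}

# Invented rows: (FICO, LTV, actual CPR, predicted CPR)
rows = [
    (700, 75, 10, 11), (710, 70, 12, 11),
    (None, None, 25, 10), (None, None, 5, 12),
]
grid = error_grid(rows)
for cell, err in sorted(grid.items()):
    print(cell, err)
```

Treating missing values as a separate bucket, rather than dropping or imputing them, is what lets the poor fit on loans without FICO and LTV show up at all.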

The dashboard also makes it easy to spot performance disparities by shelf, for example, and can be used as the basis for applying prepayment multipliers to certain shelves in order to align results with actual experience.
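
A shelf-level multiplier of this kind can be sketched as the ratio of observed to predicted speeds. This is a simplified assumption (an unweighted ratio of summed CPRs), and the shelf names and figures are hypothetical.

```python
def shelf_multipliers(observations):
    """Ratio of total actual to total predicted CPR for each shelf.

    `observations` is an iterable of (shelf, actual_cpr, predicted_cpr).
    """
    sums = {}
    for shelf, actual, predicted in observations:
        a, p = sums.get(shelf, (0.0, 0.0))
        sums[shelf] = (a + actual, p + predicted)
    # Skip shelves with no predicted prepayments to avoid dividing by zero.
    return {shelf: a / p for shelf, (a, p) in sums.items() if p > 0}

# Invented observations for two hypothetical shelves.
obs = [
    ("SHELF_A", 12, 10), ("SHELF_A", 18, 15),
    ("SHELF_B", 8, 10), ("SHELF_B", 8, 10),
]
multipliers = shelf_multipliers(obs)
for shelf, m in sorted(multipliers.items()):
    print(f"{shelf}: multiply model CPR by {m:.2f}")
```

A multiplier above 1 means the shelf has been prepaying faster than the model predicted, and below 1 means slower.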

Continuous model monitoring is fast becoming a regulatory expectation and an increasingly vital component of model governance. But the benefits of continuous performance monitoring go far beyond satisfying auditors and regulators. Machine learning and other advanced analytics are also proving to be invaluable tools for better understanding model error within sub-spaces of the population.

Watch this space for a forthcoming post and webinar explaining how RiskSpan leverages its automated model back-testing results and machine learning platform, Edge Studio, to streamline the calibration process for its internally developed residential mortgage prepayment model.


EDGE: New Forbearance Data in Agency MBS

Over the course of 2020 and into early 2021, the mortgage market has seen significant changes driven by the COVID pandemic. Novel programs, ranging from foreclosure moratoriums to payment deferrals and forbearance of those payments, have changed the near-term landscape of the market.

In the past three months, Fannie Mae and Freddie Mac have released several new loan-level credit statistics to address these novel developments. Some of these new fields are directly related to forbearance granted during the pandemic, while others address credit performance more broadly.

We summarize these new fields in the table below. These fields are all available in the Edge Platform for users to query on.

The data on delinquencies and forbearance plans covers March 2021 only, which we summarize below, first by cohort and then by major servicer. Edge users can generate other cuts using these new filters or by running the “Expanded Output” for the March 2021 factor date.

In the first table, we show loan-level delinquency for each “Assistance Plan.” Approximately 3.5% of the outstanding GSE universe is in some kind of Assistance Plan.

In the following table, we summarize delinquency by coupon and vintage for 30yr TBA-eligible pools. Similar to delinquencies in GNMA, recent-vintage 3.5% and 4.5% carry the largest delinquency load.

Many of the loans that are 90-day and 120+-day delinquent also carry a payment forbearance. Edge users can simultaneously filter for 90+-day delinquency and forbearance status to quantify the amount of seriously delinquent loans that also carry a forbearance versus loans with no workout plan.[2]

Finally, we summarize delinquencies by servicer. Notably, Lakeview and Wells lead major servicers with 3.5% and 3.3% of their loans 120+-day delinquent, respectively. Similar to the cohort analysis above, many of these seriously delinquent loans are also in forbearance. A summary is available on request.

In addition to delinquency, the Enterprises provide other novel performance data, including a loan’s total payment deferral amount. The GSEs started providing this data in December, and we now have sufficient data to start observing prepayment behavior for different levels of deferral amounts. Not surprisingly, loans with a payment deferral prepay more slowly than loans with no deferral, after controlling for age, loan balance, LTV, and FICO. When fully in the money, loans with a deferral paid 10-13 CPR slower than comparable loans.

Next, we separate loans by the amount of payment deferral they have. After grouping loans by their percentage deferral amount, we observe that prepayment behavior responds non-linearly to deferral amount, holding other borrower attributes constant.

Loans with deferral amounts less than 2% of their UPB showed almost no prepayment protection when deep in-the-money.[3] Loans between 2% and 4% deferral offered 10-15 CPR protection, and loans with 4-6% of UPB in deferral offered a 40 CPR slowdown.
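
The deferral buckets above, and the months-of-payments equivalence in footnote [3], can be reproduced with the standard level-payment mortgage formula. In this sketch the bucket labels, the treatment of boundary values, and the simplification that UPB approximates the original balance are assumptions.

```python
def monthly_payment_factor(annual_rate, term_months=360):
    """Level monthly P&I payment per $1 of loan balance."""
    r = annual_rate / 12
    return r / (1 - (1 + r) ** -term_months)

def deferral_in_months(deferral_pct_of_upb, annual_rate, term_months=360):
    """Approximate number of monthly P&I payments a deferral represents."""
    return deferral_pct_of_upb / monthly_payment_factor(annual_rate, term_months)

def deferral_bucket(deferral_amount, upb):
    """Assign a loan to a deferral bucket by deferral as a share of UPB."""
    pct = deferral_amount / upb
    if pct == 0:
        return "No deferral"
    if pct < 0.02:
        return "<2%"
    if pct < 0.04:
        return "2-4%"
    if pct < 0.06:
        return "4-6%"
    return ">=6%"

months = deferral_in_months(0.02, 0.03)
print(f"2% deferral on a 3% 30yr loan is about {months:.1f} months of P&I")
print(deferral_bucket(3_000, 100_000))
```

For a 3% 30-year loan the monthly payment is roughly 0.42% of the balance, so a 2% deferral works out to a little under five payments, consistent with the footnote.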

Note that as deferral amount increases, the data points with lower refi incentive disappear. Since deferral data has existed for only the past few months, when 30yr primary rates were in a tight range near 2.75%, that implies that higher-deferral loans also have higher note rates. In this analysis, we filtered for loans that were no older than 48 months, meaning that loans with the biggest slowdown were typically 2017-2018 vintage 3.5s through 4.5s.

Many of the loans with P&I deferral are also in a forbearance plan. Once in forbearance, these large deferrals may act to limit refinancings, as interest does not accrue on the forborne amount. Refinancing would require this amount to be repaid and rolled into the new loan amount, thus increasing the amount on which the borrower is incurring interest charges. A significantly lower interest rate may make refinancing advantageous to the borrower anyway, but the extra interest on the previously forborne amount will be a drag on the refi savings.

Deferral and forbearance rates vary widely from servicer to servicer. For example, about a third of seriously delinquent loans serviced by New Residential and Matrix had no forbearance plan, whereas more than 95% of such loans serviced by Quicken Loans were in a forbearance plan. This matters because loans without a forbearance plan may ultimately be more subject to repurchase and modification, leading to a rise in involuntary prepayments on this subset of loans.
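
A servicer-level comparison like this amounts to filtering for seriously delinquent loans and tabulating forbearance status. The sketch below is not an Edge query: the record layout, the 90+-day cutoff for ‘seriously delinquent,’ the servicer names, and the data are all hypothetical.

```python
def share_without_forbearance(loans, min_days_delinquent=90):
    """Share of seriously delinquent loans with no forbearance plan, by servicer.

    `loans` is an iterable of (servicer, days_delinquent, in_forbearance).
    """
    counts = {}
    for servicer, days_delinquent, in_forbearance in loans:
        if days_delinquent < min_days_delinquent:
            continue  # keep only seriously delinquent loans
        total, no_plan = counts.get(servicer, (0, 0))
        counts[servicer] = (total + 1, no_plan + (0 if in_forbearance else 1))
    return {s: no_plan / total for s, (total, no_plan) in counts.items()}

# Invented loans for two hypothetical servicers.
loans = [
    ("Servicer A", 120, True), ("Servicer A", 90, True),
    ("Servicer A", 150, False), ("Servicer A", 30, False),
    ("Servicer B", 120, True), ("Servicer B", 90, True),
]
shares = share_without_forbearance(loans)
for servicer, share in sorted(shares.items()):
    print(f"{servicer}: {share:.0%} of 90+ day loans have no forbearance plan")
```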

As the economy recovers and borrowers increasingly resolve deferred payments, tracking behavior due to forbearance and other workout programs will help investors better estimate prepayment risk, both from slower prepays and from possible future upticks in buyouts of delinquent loans.


Contact us if you are interested in seeing variations on this theme. Using Edge, we can examine any loan characteristic and generate an S-curve, aging curve, or time series.




[1] A link to the Deferral Amount announcement can be found here, and a link to the Forbearance and Delinquency announcement can be found here. Freddie Mac offers a helpful FAQ here on the programs.

[2] Contact RiskSpan for details on how to run this query.

[3] For context, a payment deferral of 2% represents roughly 5 months of missed P&I payments on a 3% 30yr mortgage.

