Mortgage DQs by MSA: Non-Agency Performance Chart of the Month

This month we take a closer look at geographic differences in loan performance in the non-agency space. The chart below shows the 60+ DPD rate for the five best- and five worst-performing MSAs (along with the overall average). A few things to note:

  • The pandemic seems to have simply amplified performance differences that were already apparent pre-COVID. The worst-performing MSAs were showing mostly above-average delinquency rates before last year’s disruption.
  • Florida was especially hard-hit. Three of the five worst-performing MSAs are in Florida. Not surprisingly, these MSAs rely heavily on the tourism industry.
  • New York jumped from about average to one of the worst-performing MSAs in the wake of the pandemic. This is not surprising given how severely the pandemic hit the city.
  • Tech hubs show strong performance. All our best performers are strong in the Tech industry—Austin’s the new Bay Area, right?
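For readers who want to reproduce the metric behind this chart, the sketch below shows one way to compute a 60+ DPD rate by MSA from loan-level data. The column names and figures are hypothetical, not the actual non-agency dataset referenced above.

```python
import pandas as pd

# Hypothetical loan-level snapshot; column names and values are illustrative only.
loans = pd.DataFrame({
    "msa":           ["Austin", "Austin", "Miami", "Miami", "New York"],
    "days_past_due": [0, 0, 75, 30, 90],
    "current_upb":   [250_000, 310_000, 180_000, 220_000, 400_000],
})

# 60+ DPD rate: share of unpaid balance at least 60 days delinquent, by MSA.
loans["dq60"] = loans["days_past_due"] >= 60
dq_rate = (
    loans.groupby("msa")
         .apply(lambda g: g.loc[g["dq60"], "current_upb"].sum() / g["current_upb"].sum())
         .sort_values(ascending=False)
)
print(dq_rate)  # worst-performing MSAs at the top
```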

Anomaly Detection and Quality Control

In our most recent workshop on Anomaly Detection and Quality Control (Part I), we discussed how clean market data is an integral part of producing accurate market risk results. Because incorrect and inconsistent market data are so prevalent in the industry, it is not surprising that the U.S. spends over $3 trillion on processes to identify and correct market data.

Taking a step back, it is worth noting what drives accurate market risk analytics. Clearly, having accurate portfolio holdings, with correct terms and conditions for over-the-counter trades, is central to calculating consistent risk measures that are scaled to the market value of the portfolio. The use of well-tested and integrated industry-standard pricing models is another key factor in producing reliable analytics. But compared with these two categories, unclean and inconsistent market data is the largest contributor to poor market risk analytics. The key driver behind detecting and correcting (or transforming) market data is risk and portfolio managers’ expectation that risk results will be accurate at the start of the business day, with no need to perform time-consuming re-runs during the day to correct issues found.

Broadly defined, market data is any data used as an input to the re-valuation models. This includes equity prices, interest rates, credit spreads, FX rates, volatility surfaces, etc.

Market data needs to be:

  • Complete – no true gaps when looking back historically.
  • Accurate
  • Consistent – data must be viewed across other data points to determine its accuracy (e.g., interest rates across tenor buckets, volatilities across volatility surface)

Anomaly types can be broken down into four major categories:

  • Spikes
  • Stale data
  • Missing data
  • Inconsistencies

Here are three examples of “bad” market data:

Credit Spreads

The following chart depicts day-over-day changes in credit spreads for the 10-year consumer cyclical time series, returned from an external vendor. The changes indicate a significant spike on 12/3 that caused big swings, up and down, across multiple rating buckets. Without an adjustment to this data, key risk measures would show significant jumps, up and down, depending on the dollar value of positions on two consecutive days.


Swaption Volatilities

Market data also includes volatilities, which drive delta and possible hedging. The following chart shows implied swaption volatilities for different maturities of swaptions and their underlying swaps. Note the spikes in 7×10 and 10×10 swaptions. The chart also highlights inconsistencies between different tenors and maturities.


Equity Implied Volatilities

The 146 and 148 strikes in the table below reflect inconsistent vol data, as often occurs around expiration.


The detection of market data inconsistencies needs to be an automated process with multiple approaches targeted at specific types of market data. The detection models need to evolve over time as additional information is gathered, with the goal of reducing false negatives to a manageable level. Once the models detect anomalies, the next step is to automate the transformation of the market data (e.g., backfill, interpolate, or use the prior day’s value). Along with the transformation, a transparent record must be kept of which values were changed or populated when unavailable. This record should be shared with clients, whose feedback could lead to alternative transformations or detection routines.

Detector types typically fall into the following categories:

  • Extreme Studentized Deviate (ESD): finds outliers in a single data series (helpful for extreme cases).
  • Level Shift: detects changes in level by comparing the means of two sliding time windows (useful for local outliers).
  • Local Outliers: detects spikes relative to nearby values.
  • Seasonal Detector: detects seasonal patterns and anomalies (used for contract expirations and other events).
  • Volatility Shift: detects shift of volatility by tracking changes in standard deviation.
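As a rough illustration of the first two detector types, here is a minimal pandas sketch. The windows and thresholds are placeholder assumptions, and the production detectors are more sophisticated than this.

```python
import numpy as np
import pandas as pd

def spike_outliers(series: pd.Series, window: int = 20, z: float = 4.0) -> pd.Series:
    """Flag spikes: day-over-day changes far outside the recent distribution
    of changes (a simplified, ESD-like rule)."""
    diffs = series.diff()
    hist = diffs.shift(1)                      # exclude the current point from the baseline
    mu = hist.rolling(window).mean()
    sigma = hist.rolling(window).std()
    return (diffs - mu).abs() > z * sigma

def level_shifts(series: pd.Series, window: int = 10, threshold: float = 3.0) -> pd.Series:
    """Flag level shifts: compare the means of a trailing and a forward-looking
    sliding window, scaled by the trailing window's standard deviation."""
    before = series.rolling(window).mean()
    after = series[::-1].rolling(window).mean()[::-1]
    scale = series.rolling(window).std().replace(0, np.nan)
    return ((after - before).abs() / scale) > threshold

# Synthetic credit-spread series with an injected one-day jump, similar to the
# 12/3 spike described above.
rng = np.random.default_rng(0)
idx = pd.date_range("2020-11-01", periods=60, freq="B")
spreads = pd.Series(150 + rng.normal(0, 1, len(idx)).cumsum(), index=idx)
spreads.iloc[25] += 40
print(spreads[spike_outliers(spreads)])        # flags the jump (and the reversal the next day)
```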

On Wednesday, May 19th, we will present a follow-up workshop focusing on:

  • Coding examples
    • Application of outlier detection and pipelines
    • PCA
  • Specific loan use cases
    • Loan performance
    • Entity correction
  • Novelty Detection
    • Anomalies are not always “bad”
    • Market monitoring models

You can register for this complimentary workshop here.


Leveraging ML to Enhance the Model Calibration Process

Last month, we outlined an approach to continuous model monitoring and discussed how practitioners can leverage the results of that monitoring for advanced analytics and enhanced end-user reporting. In this post, we apply this idea to enhanced model calibration.

Continuous model monitoring is a key part of a modern model governance regime. But testing performance as part of the continuous monitoring process has value that extends beyond immediate governance needs. Using machine learning and other advanced analytics, testing results can also be further explored to gain a deeper understanding of model error lurking within sub-spaces of the population.

Below we describe how we leverage automated model back-testing results (using our machine learning platform, Edge Studio) to streamline the calibration process for our own residential mortgage prepayment model.

The Problem:

MBS prepayment models, RiskSpan’s included, often provide a number of tuning knobs to tweak model results. These knobs impact the various components of the S-curve function, including refi sensitivity, turnover lever, elbow shift, and burnout factor.

The knob tuning and calibration process is typically messy and iterative. It usually involves somewhat-subjectively selecting certain sub-populations to calibrate, running back-testing to see where and how the model is off, and then tweaking knobs and rerunning the back-test to see the impacts. The modeler may need to iterate through a series of different knob selections and groupings to figure out which combination best fits the data. This is manually intensive work and can take a lot of time.

As part of our continuous model monitoring process, we had already automated the process of generating back-test results and merging them with actual performance history. But we wanted to explore ways of taking this one step further to help automate the tuning process — rerunning the automated back-testing using all the various permutations of potential knobs, but without all the manual labor.

The solution applies machine learning techniques to run a series of back-tests on MBS pools and automatically solve for the set of tuners that best aligns model outputs with actual results.

We break the problem into two parts (plus a validation step):

  1. Find Cohorts: Cluster pools into groups that exhibit similar key pool characteristics and model error (so they would need the same tuners).

TRAINING DATA: Back-testing results for our universe of pools with no model tuning knobs applied

  2. Solve for Tuners: Minimize back-testing error by optimizing knob settings.

TRAINING DATA: Back-testing results for our universe of pools under a variety of permutations of potential tuning knobs (Refi x Turnover)

  3. Tuning knobs validation: Take the optimized tuning knobs for each cluster and rerun the pools to confirm that the selected permutation does in fact return the lowest model errors.

Part 1: Find Cohorts

We define model error as the ratio of the average modeled SMM to the average actual SMM. We compute this using back-testing results and then use a hierarchical clustering algorithm to cluster the data based on model error across various key pool characteristics.

Hierarchical clustering is a general family of clustering algorithms that build nested clusters by either merging or splitting observations successively. The hierarchy of clusters is represented as a tree (or dendrogram). The root of the tree is the root cluster that contains all samples, while the leaves represent clusters with only one sample. [1]

Agglomerative clustering is an implementation of hierarchical clustering that takes the bottom-up (merging) approach. Each observation starts in its own cluster, and clusters are then successively merged together. There are multiple linkage criteria to choose from; we used the Ward linkage criterion.

Ward linkage strategy minimizes the sum of squared differences within all clusters. It is a variance-minimizing approach.[2]
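A minimal sketch of this clustering step, using scikit-learn's agglomerative clustering with Ward linkage. The pool features, values, and number of clusters are illustrative assumptions, not our production inputs.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

# Hypothetical back-test summary by pool; model_error is avg modeled SMM / avg actual SMM.
pools = pd.DataFrame({
    "wac":           [3.50, 4.00, 3.00, 4.50, 3.25],
    "wala":          [24, 36, 12, 48, 18],
    "avg_loan_size": [280, 210, 350, 180, 320],   # $000s
    "model_error":   [1.10, 0.85, 1.30, 0.80, 1.25],
})

# Standardize so no single characteristic dominates the distance metric,
# then merge pools bottom-up with Ward (variance-minimizing) linkage.
X = StandardScaler().fit_transform(pools)
pools["cluster"] = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)
print(pools.groupby("cluster")["model_error"].mean())
```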

Part 2: Solving for Tuners

Here our training data is expanded to include multiple back-test results for each pool under different permutations of tuning knobs.

Process to Optimize the Tuners for Each Cluster

Training Data: Rerun the back-test with permutations of REFI and TURNOVER tunings, covering all reasonably possible combinations of tuners.

  1. These permutations of tuning results are fed to a multi-output regressor, which trains the machine learning model to understand the interaction between each tuning parameter and the model as a fitting step.
    • Model Error and Pool Features are used as Independent Variables
    • Gradient Tree Boosting/Gradient Boosted Decision Trees (GBDT)* methods are used to find the optimized tuning parameters for each cluster of pools derived from the clustering step
    • Two dependent variables — Refi Tuner and Turnover Tuner — are used
    • Separate models are estimated for each cluster
  2. We solve for the optimal tuning parameters by running the resulting model with a model error ratio of 1 (no error) and the weighted average cluster features.

* Gradient Tree Boosting/Gradient Boosted Decision Trees (GBDT) is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. When a decision tree is the weak learner, the resulting algorithm is called gradient boosted trees, which usually outperforms random forest. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function. [3]

* We used scikit-learn’s GBDT implementation to optimize and solve for the best Refi and Turnover tuners. [4]
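The sketch below shows how such a setup might look with scikit-learn's multi-output wrapper around GBDT. The data is synthetic and the feature set is a placeholder; the point is the pattern of fitting the tuners as dependent variables and then predicting them at an error ratio of 1.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Synthetic training rows: one row per (pool, tuner permutation) back-test.
rng = np.random.default_rng(0)
n = 500
refi_tuner = rng.uniform(0.7, 1.3, n)
turnover_tuner = rng.uniform(0.7, 1.3, n)
wac = rng.uniform(3.0, 4.5, n)
wala = rng.uniform(6, 60, n)
# Toy relationship between the tuners, pool features, and the resulting error ratio.
model_error = 1.0 + 0.5 * (refi_tuner - 1.0) + 0.3 * (turnover_tuner - 1.0) + rng.normal(0, 0.02, n)

X = np.column_stack([model_error, wac, wala])        # independent variables
y = np.column_stack([refi_tuner, turnover_tuner])    # two dependent variables

gbdt = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=200, max_depth=3))
gbdt.fit(X, y)

# "Invert" the fit: ask which tuners correspond to no error (ratio = 1.0)
# at the cluster's weighted-average characteristics.
cluster_profile = np.array([[1.0, 3.8, 30.0]])
print(gbdt.predict(cluster_profile))                 # -> [refi tuner, turnover tuner]
```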

Results

The resulting suggested knobs show promise in improving model fit over our back-test period. Below are the results for two of the clusters using the knobs suggested by the process. To further validate the results, we plan to cross-validate on out-of-time sample data as it comes in.

Conclusion

These advanced analytics show promise in their ability to help streamline the model calibration and tuning process by removing many of the time-consuming and subjective components from the process altogether. Once a process like this is established for one model, applying it to new populations and time periods becomes more straightforward. This analysis can be further extended in a number of ways. One in particular we’re excited about is the use of ensemble models—or a ‘model of models’ approach. We will continue to tinker with this approach as we calibrate our own models and keep you apprised on what we learn.


RiskSpan VQI: Current Underwriting Standards Q1 2021


RiskSpan’s Vintage Quality Index estimates the relative “tightness” of credit standards by computing and aggregating the percentage of Agency originations each month with one or more “risk factors” (low-FICO, high DTI, high LTV, cash-out refi, investment properties, etc.). Months with relatively few originations characterized by these risk factors are associated with lower VQI ratings. As the historical chart above shows, the index maxed out (i.e., had an unusually high number of loans with risk factors) leading up to the 2008 crisis.
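The sketch below illustrates the general idea of the calculation: compute the share of originations carrying each risk layer by month, then aggregate the shares into a simple index proxy. The flags, weights, and scaling used in the published VQI differ; everything here is illustrative.

```python
import pandas as pd

# Hypothetical monthly origination records with boolean risk-factor flags.
orig = pd.DataFrame({
    "month":        ["2021-01", "2021-01", "2021-01", "2021-02", "2021-02"],
    "fico_lt_660":  [True, False, False, False, True],
    "ltv_gt_80":    [False, True, False, True, False],
    "dti_gt_45":    [False, False, True, False, False],
    "cashout_refi": [False, False, False, True, False],
})
risk_flags = ["fico_lt_660", "ltv_gt_80", "dti_gt_45", "cashout_refi"]

# Percentage of originations carrying each risk layer, by month, plus a simple
# unweighted aggregate (the published index applies its own definitions and scaling).
layer_shares = orig.groupby("month")[risk_flags].mean() * 100
index_proxy = layer_shares.sum(axis=1)
print(layer_shares)
print(index_proxy)
```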

Vintage Quality Index Stability Masks Purchase Credit Contraction

The first quarter of 2021 provides a stark example of why it is important to consider the individual components of RiskSpan’s Vintage Quality Index and not just the overall value. 

The Index overall dropped by just 0.37 points to 76.68 in the first quarter of 2021. On the surface, this seems to suggest a minimal change to credit availability and credit quality over the period. But the Index’s net stability masks a significant change in one key metric offset by more modest counterbalancing changes in the remaining eight. The percentage of high-LTV mortgages fell to 16.7% (down from 21% at the end of 2020) during the first quarter.  

While this continues a trend of falling high-LTV shares (down 8.7% since Q1 2020 and almost 12% from Q1 2019), it coincides with a steady increase in house prices. From December 2020 to February 2021, the Monthly FHFA House Price Index® (US, Purchase Only, Seasonally Adjusted) rose 1.9%. More striking is the year-over-year change from February 2020 to February 2021, during which the index rose by 11.1%. Taken together, the 10% increase in home prices combined with a 10% reduction in the share of high-LTV loans paints a sobering picture for marginal borrowers seeking to purchase a home.

Some of the reduction in high-LTV share is obviously attributable to the growing percentage of refinance activity (including cash-out refinancing, which counterbalances the effect the falling high-LTV rate has on the index). But these refis do not impact the purchase-only HPI. As a result, even though the overall Index did not change materially, higher required down payments (owing to higher home prices) combined with fewer high-LTV loans reflect a credit box that effectively shrank in Q1.

 

Analytic and Data Assumptions

Population assumptions:

  • Monthly data for Fannie Mae and Freddie Mac.

  • Loans originated more than three months prior to issuance are excluded because the index is meant to reflect current market conditions.

  • Loans likely to have been originated through the HARP program, as identified by LTV, MI coverage percentage, and loan purpose, are also excluded. These loans do not represent credit availability in the market, as they likely would not have been originated today but for the existence of HARP.

Data assumptions:

  • Freddie Mac data goes back to 12/2005; Fannie Mae data goes back only to 12/2014.

  • Certain fields for Freddie Mac data were missing prior to 6/2008.   

GSE historical loan performance data released in support of GSE Risk Transfer activities was used to help back-fill data where it was missing.

An outline of our approach to data imputation can be found in our VQI Blog Post from October 28, 2015.                                                

 


Three Principles for Effectively Monitoring Machine Learning Models

The recent proliferation in machine learning models in banking and structured finance is becoming impossible to ignore. Rarely does a week pass without a client approaching us to discuss the development or validation (or both) of a model that leverages at least one machine learning technique. RiskSpan’s own model development team has also been swept up in the trend – deep learning techniques have featured prominently in developing the past several versions of our in-house residential mortgage prepayment model.  

Machine learning’s rise in popularity is attributable to multiple underlying trends: 

  1. Quantity and complexity of data. Nowadays, firms store every conceivable type of data relating to their activities and clients – and frequently supplement this with data from any number of third-party providers. The increasing dimensionality of data available to modelers makes traditional statistical variable selection more difficult. The tradeoff between a model’s complexity and the rules adopted in variable selection can be hard to balance. An advantage of ML approaches is that they can handle multi-dimensional data more efficiently. ML frameworks are good at identifying trends and patterns – without the need for human intervention. 
  2. Better learning algorithms. Because ML algorithms learn to make more accurate projections as new data is introduced to the framework (assuming there is no bias in the new data), model features based on newly introduced data are more likely to resemble features created using model training data. 
  3. Cheap computation costs. New techniques, such as XGBoost, are designed to be memory efficient and introduce innovative system designs that help reduce computation cost. 
  4. Proliferation breeds proliferation. As the number of machine learning packages in various programming tools increases, it facilitates implementation and promotes further ML model development. 

Addressing Monitoring Challenges 

Notwithstanding these advances, machine learning models are by no means easy to build and maintain. Feature engineering and parameter tuning procedures are time consuming. And once an ML model has been put into production, monitoring activities must be implemented to detect anomalies and make sure the model works as expected (just like with any other model). According to OCC 2011-12 supervisory guidance on model risk management, ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid. While monitoring ML models resembles monitoring conventional statistical models in many respects, the following activities take on particular importance with ML model monitoring: 

  1. Review the underlying business problem. Defining the business problem is the first step in developing any ML model. This should be carefully articulated in the list of business requirements that the ML model is supposed to follow. Any shift in the underlying business problem will likely create drift in the training data and, as a result, new data coming to the model may no longer be relevant to the original business problem. The ML model becomes degraded, and a new round of feature engineering and parameter tuning needs to be considered to remediate the impact. This review should be conducted whenever the underlying problem or requirements change. 
  2.  Review of data stability (model input). In the real world, even if the underlying business problem is unchanged, there might be shifts in the predicting data caused by changing borrower behaviors, changes in product offerings, or any other unexpected market drift. Any of these things could result in the ML model receiving data that it has not been trained on. Model developers should measure the data population stability between the training dataset and the predicting dataset. If there is evidence of the data having shifted, model recalibration should be considered. This assessment should be done when the model user identifies significant shift in the model’s performance or when a new testing dataset is introduced to the ML model. Where data segmentation has been used in the model development process, this assessment should be performed at the individual segment level, as well. 
  3. Review of performance metrics (model output). Performance metrics quantify how well an ML model is trained to explain the data. Performance metrics should fit the model’s type. For instance, the developer of a binary classification model could use Kolmogorov-Smirnov (KS) table, receiver operating characteristic (ROC) curve, and area under the curve (AUC) to measure the model’s overall rank order ability and its performance at different cutoffs. Any shift (upward or downward) in performance metrics between a new dataset and the training dataset should raise a flag in monitoring activity. All material shifts need to be reviewed by the model developer to determine their cause. Such assessments should be conducted on an annual basis or whenever new data is available. 
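As an illustration of the input-side and output-side checks described in items 2 and 3, here is a minimal sketch computing a Population Stability Index (PSI) for one feature and an AUC on a new labeled sample. The data, binning choices, and thresholds are placeholder assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a training sample ("expected") and a
    production sample ("actual") for a single feature."""
    cuts = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
    e = np.histogram(np.clip(expected, cuts[0], cuts[-1]), bins=cuts)[0] / len(expected)
    a = np.histogram(np.clip(actual, cuts[0], cuts[-1]), bins=cuts)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(1)
train_feature = rng.normal(0.0, 1.0, 10_000)
prod_feature = rng.normal(0.3, 1.1, 10_000)       # drifted input
print("PSI:", psi(train_feature, prod_feature))   # values above ~0.25 are often read as a material shift

# Output-side check for a binary classifier: AUC on newly labeled data.
y_true = rng.integers(0, 2, 1_000)
y_score = rng.uniform(0, 1, 1_000)
print("AUC:", roc_auc_score(y_true, y_score))
```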

Like all models, ML models are only as good as the data they are fed. But ML models are particularly susceptible to data shifts because their processing components are less transparent. Taking these steps to ensure they are learning from valid and consistent data is essential to managing a functional inventory of ML models. 


Too Many Documentation Types? A Data-Driven Approach to Consolidating Them

The sheer volume of different names assigned to various documentation types in the non-agency space has really gotten out of hand, especially in the last few years. As of February 2021, an active loan in the CoreLogic RMBS universe could have any of over 250 unique documentation type names, with little or no standardization from issuer to issuer. Even within a single issuer, things get complicated when every possible permutation of the same basic documentation level gets assigned its own type. One issuer in the database has 63 unique documentation names!

In order for investors to be able to understand and quantify their exposure, we need a way of consolidating and mapping all these different documentation types to a simpler, standard nomenclature. Various industry reports attempt to group all the different documentation levels into meaningful categories. But these classifications often fail to capture important distinctions in delinquency performance among different documentation levels.

There is a better way. Taking some of the consolidated group names from the various industry papers and rating agency papers as a starting point, we took another pass focusing on two main elements:

  • The delinquency performance of the group. We focused on the 60-DPD rate while also considering other drivers of loan performance (e.g., DTI, FICO, and LTV) and their correlation to the various doc type groups.
  • The size of the sub-segment. We ensured our resulting groupings were large enough to be meaningful.

What follows is how we thought about it and ultimately landed where we did. These mappings are not set in stone and will likely need to undergo revisions as 1) new documentation types are generated, and 2) additional performance data and feedback from clients on what they consider most important become available. Releasing these mappings into RiskSpan’s Edge Platform will then make it easier for users to track performance.

Data Used

We take a snapshot of all loans outstanding in non-agency RMBS issued after 2013, as of the February 2021 activity period. The data comes from CoreLogic and we exclude loans in seasoned or reperforming deals. We also exclude loans whose documentation type is not reported, some 14 percent of the population.

Approach

We are seeking to create sub-groups that generally conform to the high-level groups on which the industry seems to be converging while also identifying subdivisions with meaningfully different delinquency performance. We will rely on these designations as we re-estimate our credit model.

Steps in the process:

  1. Start with high-level groupings based on how the documentation type is currently named.
    • Full Documentation: Any name referencing ‘Agency,’ ‘Agency AUS,’ or similar.
    • Bank Statements: Any name including the term “Bank Statement[s].”
    • Investor/DSCR: Any name indicating that the underwriting relied on net cash flows to the secured property.
    • Alternative Documentation: A wide-ranging group consolidating many different types, including: asset qualifier, SISA/SIVA/NINA, CPA letters, etc.
    • Other: Any name that does not easily classify into one of the groups above, such as Foreign National Income, and any indecipherable names.


  2. We subdivided the Alternative Documentation group by some of the meaningfully sized natural groupings of the names:
    • Asset Depletion or Asset Qualifier
    • CPA and P&L statements
    • Salaried/Wage Earner: Includes anything with W2 tax return
    • Tax Returns or 1099s: Includes anything with ‘1099’ or ‘Tax Return,’ but not ‘W2.’
    • Alt Doc: Anything that remained, including items like ‘SIVA,’ ‘SISA,’ ‘NINA,’ ‘Streamlined,’ ‘WVOE,’ and ‘Alt Doc.’
  3. From there we sought to identify any sub-groups that perform differently (as measured by 60-DPD%).
    • Bank Statement: We evaluated a subdivision by the number of statements provided (less than 12 months, 12 months, and greater than 12 months). However, these distinctions did not significantly impact delinquency performance. (Also, very few loans fell into the under 12 months group.) Distinguishing ‘Business Bank Statement’ loans from the general ‘Bank Statements’ category, however, did yield meaningful performance differences.


    • Alternative Documentation: This group required the most iteration. We initially focused our attention on documentation types that included terms like ‘streamlined’ or ‘fast.’ This, however, did not reveal any meaningful performance differences relative to other low doc loans. We also looked at this group by issuer, hypothesizing that some programs might perform better than others. The jury is still out on this analysis and we continue to track it. The following subdivisions yielded meaningful differences:
      • Limited Documentation: This group includes any names including the terms ‘reduced,’ ‘limited,’ ‘streamlined,’ and ‘alt doc.’ This group performed substantially better than the next group.
      • No Doc/Stated: Not surprisingly, these were the worst performers in the ‘Alt Doc’ universe. The types included here are a throwback to the run-up to the housing crisis. ‘NINA,’ ‘SISA,’ ‘No Doc,’ and ‘Stated’ all make a reappearance in this group.
      • Loans with some variation of ‘WVOE’ (written verification of employment) showed very strong performance, so much so that we created an entirely separate group for them.
    • Full Documentation: Within the variations of ‘Full Documentation’ was a whole sub-group with qualifying terms attached. Examples include ‘Full Doc 12 Months’ or ‘Full w/ Asset Assist.’ These full-doc-with-qualification loans were associated with higher delinquency rates. The sub-groupings reflect this reality:
      • Full Documentation: Most of the straightforward types indicating full documentation, including anything with ‘Agency/AUS.’
      • Full with Qualifications (‘Full w/ Qual’): Everything including the term ‘Full’ followed by some sort of qualifier.
    • Investor/DSCR: The sub-groups here either were not big enough or did not demonstrate sufficient performance difference.
    • Other: Even though it’s a small group, we broke out all the ‘Foreign National’ documentation types into a separate group to conform with other industry reporting.
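A keyword-rule sketch of the kind of mapping described above is shown below. The rules, raw strings, and column names are simplified assumptions for illustration, not the actual mapping or the CoreLogic schema.

```python
import pandas as pd

def map_doc_type(raw: str) -> str:
    """Very simplified first-pass consolidation of raw documentation-type names."""
    name = raw.lower()
    if "bank statement" in name:
        return "Business Bank Statements" if "business" in name else "Bank Statements"
    if "dscr" in name or "investor" in name or "debt service" in name:
        return "Investor/DSCR"
    if "wvoe" in name:
        return "WVOE"
    if "full" in name or "agency" in name or "aus" in name:
        qualifiers = ("w/", "12 month", "24 month", "asset")
        return "Full w/ Qual" if any(q in name for q in qualifiers) else "Full Documentation"
    if any(k in name for k in ("nina", "sisa", "no doc", "stated")):
        return "No Doc/Stated"
    if any(k in name for k in ("reduced", "limited", "streamlined", "alt doc")):
        return "Limited Documentation"
    if "foreign national" in name:
        return "Foreign National"
    return "Other"

# Hypothetical loan records with raw documentation strings and a 60-DPD flag.
loans = pd.DataFrame({
    "doc_type_raw": ["Full Doc 12 Months", "24 Mo Business Bank Statement",
                     "NINA", "DSCR", "Streamlined"],
    "dq60":         [True, False, True, False, False],
})
loans["doc_group"] = loans["doc_type_raw"].map(map_doc_type)
print(loans.groupby("doc_group")["dq60"].mean())   # 60-DPD rate per consolidated group
```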


Among the challenges of this sort of analysis is that the combinations to explore are virtually limitless. Perhaps not surprisingly, most of the potential groupings we considered did not make it into our final mapping. Some of the cuts we are still looking at include loan purpose with respect to some of the alternative documentation types.

We continue to evaluate these and other options. We can all agree that 250 documentation types is way too many. But in order to be meaningful, the process of consolidation cannot be haphazard. Fortunately, the tools for turning sub-grouping into a truly data-driven process are available. We just need to use them.


Value Beyond Validation: The Future of Automated Continuous Model Monitoring Has Arrived

Imagine the peace of mind that would accompany being able to hand an existing model over to the validators with complete confidence in how the outcomes analysis will turn out. Now imagine being able to do this using a fully automated process.

The industry is closer to this than you might think.

The evolution of ongoing model monitoring away from something that happens only periodically (or, worse, only at validation time) and toward a more continuous process has been underway for some time. Now, thanks to automation and advanced process design, this evolutionary process has reached an inflection point. We stand today at the threshold of a future where:

  • Manual, painful processes to generate testing results for validation are a thing of the past;
  • Models are continuously monitored for fit, and end users are empowered with the tools to fully grasp model strengths and weaknesses;
  • Modeling and MRM experts leverage machine learning to dive more deeply into the model’s underlying data, and;
  • Emerging trends and issues are identified early enough to be addressed before they have time to significantly hamper model performance.

Sound too good to be true? Beginning with its own internally developed prepayment and credit models, RiskSpan data scientists are laying out a framework for automated, ongoing performance monitoring that has the potential to transform behavioral modeling (and model validation) across the industry.

The framework involves model owners working collaboratively with model validators to create recurring processes for running previously agreed-upon tests continuously and receiving the results automatically. Testing outcomes continuously increases confidence in their reliability. Testing them automatically frees up high-cost modeling and validation resources to spend more time evaluating results and running additional, deeper analyses.

The Process:

Irrespective of the regulator, back-testing, benchmarking, and sensitivity analysis are the three pillars of model outcomes analysis. Automating the data and analytical processes that underlie these three elements is required to get to a fully comprehensive automated ongoing monitoring scheme.

In order to be useful, the process must stage testing results in a central database that can:

  • Automatically generate charts, tables, and statistical tests to populate validation reports;
  • Support dashboard reporting that allows model owners, users and validators to explore test results, and;
  • Feed advanced analytics and machine learning platforms capable of 1) helping with automated model calibration, and 2) identifying model weaknesses and blind spots (as we did with a GSE here).
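As a minimal sketch of what staging results centrally might look like, the example below writes back-test results to a SQLite table and pulls a cohort-level error summary that a dashboard or report could be built on. Table and column names are assumptions for illustration.

```python
import sqlite3
import pandas as pd

# Hypothetical back-test output for one monitoring run.
backtest = pd.DataFrame({
    "run_date":    ["2021-03-31"] * 4,
    "model":       ["prepay_v5"] * 4,
    "cohort":      ["2018 4.0s", "2018 4.0s", "2019 3.5s", "2019 3.5s"],
    "actual_smm":  [0.021, 0.019, 0.032, 0.030],
    "modeled_smm": [0.024, 0.020, 0.029, 0.031],
})

with sqlite3.connect("model_monitoring.db") as conn:
    # Append this run to the central results table.
    backtest.to_sql("backtest_results", conn, if_exists="append", index=False)

    # One of many summaries reporting could be built on: model error by cohort.
    summary = pd.read_sql(
        """
        SELECT run_date, model, cohort,
               AVG(modeled_smm) / AVG(actual_smm) AS error_ratio
        FROM backtest_results
        GROUP BY run_date, model, cohort
        """,
        conn,
    )
print(summary)
```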

Perhaps not surprisingly, achieving the back-end economies of a fully automated continuous monitoring and reporting regime requires an upfront investment of resources. This investment takes the form of time from model developers and owners as well as (potentially) some capital investment in technology necessary to host and manage the storage of results and output reports.

A good rule of thumb for estimating these upfront costs is between 2 and 3 times the cost of a single annual model test performed on an ad-hoc, manual basis. Consequently, the automation process can generally be expected to pay for itself (in time savings alone) over 2 to 3 cycles of performance testing. But the benefits of automated, continuous model monitoring go far beyond time savings. They invariably result in better models.

Output Applications

Continuous model monitoring produces benefits that extend well beyond satisfying model governance requirements. Indeed, automated monitoring has significantly informed the development process for RiskSpan’s own, internally developed credit and prepayment models – specifically in helping to identify sub-populations where model fit is a problem.

Continuous monitoring also makes it possible to quickly assess the value of newly available data elements. For example, when the GSEs began releasing data on mortgages with property inspection waivers (PIWs) (as opposed to traditional appraisals), we could immediately combine that data element with the results of our automated back-testing to determine whether the PIW information helps predict model error. PIW currently appears to have value in predicting our production model error, and so the PIW feature is now slated to be added to a future version of our model. Having an automated framework in place accelerates this process while also enabling us to proceed with confidence that we are only adding variables that improve model performance.

The continuous monitoring results can also be used to develop helpful dashboard reports. These provide model owners and users with deeper insights into a model’s strengths and weaknesses and can be an important tool in model tuning. They can also be shared with model validators, thus facilitating that process as well.

The dashboard below is designed to give our model developers and users a better sense of where model error is greatest. Sub-populations with the highest model error are deep red. This makes it easy for model developers to visualize that the model does not perform well when FICO and LTV data are missing, which happens often in the non-agency space. The model developers now know that they need to adjust their modeling approach when these key data elements are not available.

The dashboard also makes it easy to spot performance disparities by shelf, for example, and can be used as the basis for applying prepayment multipliers to certain shelves in order to align results with actual experience.

Continuous model monitoring is fast becoming a regulatory expectation and an increasingly vital component of model governance. But the benefits of continuous performance monitoring go far beyond satisfying auditors and regulators. Machine learning and other advanced analytics are also proving to be invaluable tools for better understanding model error within sub-spaces of the population.

Watch this space for a forthcoming post and webinar explaining how RiskSpan leverages its automated model back-testing results and machine learning platform, Edge Studio, to streamline the calibration process for its internally developed residential mortgage prepayment model.


EDGE: New Forbearance Data in Agency MBS

Over the course of 2020 and into early 2021, the mortgage market has seen significant changes driven by the COVID pandemic. Novel programs, ranging from foreclosure moratoriums to payment deferrals and forbearance of those payments, have changed the near-term landscape of the market.

In the past three months, Fannie Mae and Freddie Mac have released several new loan-level credit statistics to address these novel developments. Some of these new fields are directly related to forbearance granted during the pandemic, while others address credit performance more broadly.

We summarize these new fields in the table below. These fields are all available in the Edge Platform for users to query on.

The data on delinquencies and forbearance plans covers March 2021 only, which we summarize below, first by cohort and then by major servicer. Edge users can generate other cuts using these new filters or by running the “Expanded Output” for the March 2021 factor date.

In the first table, we show loan-level delinquency for each “Assistance Plan.” Approximately 3.5% of the outstanding GSE universe is in some kind of Assistance Plan.

In the following table, we summarize delinquency by coupon and vintage for 30yr TBA-eligible pools. Similar to delinquencies in GNMA, recent-vintage 3.5% and 4.5% carry the largest delinquency load.

Many of the loans that are 90-day and 120+-day delinquent also carry a payment forbearance. Edge users can simultaneously filter for 90+-day delinquency and forbearance status to quantify the share of seriously delinquent loans that also carry a forbearance versus loans with no workout plan.[2] Finally, we summarize delinquencies by servicer. Notably, Lakeview and Wells lead major servicers, with 3.5% and 3.3% of their loans 120+-day delinquent, respectively. Similar to the cohort analysis above, many of these seriously delinquent loans are also in forbearance. A summary is available on request.

In addition to delinquency, the Enterprises provide other novel performance data, including a loan’s total payment deferral amount. The GSEs started providing this data in December, and we now have sufficient data to start observing prepayment behavior for different levels of deferral amounts. Not surprisingly, loans with a payment deferral prepay more slowly than loans with no deferral, after controlling for age, loan balance, LTV, and FICO. When fully in the money, loans with a deferral paid 10-13 CPR slower than comparable loans.

Next, we separate loans by the amount of payment deferral they have. After grouping loans by their percentage deferral amount, we observe that deferral amount produces a non-linear response to prepayment behavior, holding other borrower attributes constant.

Loans with deferral amounts less than 2% of their UPB showed almost no prepayment protection when deep in-the-money.[3] Loans between 2% and 4% deferral offered 10-15 CPR protection, and loans with 4-6% of UPB in deferral offered a 40 CPR slowdown.

Note that as deferral amount increases, the data points with lower refi incentive disappear. Since deferral data has existed for only the past few months, when 30yr primary rates were in a tight range near 2.75%, that implies that higher-deferral loans also have higher note rates. In this analysis, we filtered for loans that were no older than 48 months, meaning that loans with the biggest slowdown were typically 2017-2018 vintage 3.5s through 4.5s.

Many of the loans with P&I deferral are also in a forbearance plan. Once in forbearance, these large deferrals may act to limit refinancings, as interest does not accrue on the forborne amount. Refinancing would require this amount to be repaid and rolled into the new loan amount, thus increasing the amount on which the borrower is incurring interest charges. A significantly lower interest rate may make refinancing advantageous to the borrower anyway, but the extra interest on the previously forborne amount will be a drag on the refi savings.

Deferral and forbearance rates vary widely from servicer to servicer. For example, about a third of seriously delinquent loans serviced by New Residential and Matrix had no forbearance plan, whereas more than 95% of such loans serviced by Quicken Loans were in a forbearance plan. This matters because loans without a forbearance plan may ultimately be more subject to repurchase and modification, leading to a rise in involuntary prepayments on this subset of loans.

As the economy recovers and borrowers increasingly resolve deferred payments, tracking behavior due to forbearance and other workout programs will help investors better estimate prepayment risk, both due to slower prepays as well as possible future upticks in buyouts of delinquent loans.


Contact us if you are interested in seeing variations on this theme. Using Edge, we can examine any loan characteristic and generate an S-curve, aging curve, or time series.




[1] A link to the Deferral Amount announcement can be found here, and a link to the Forbearance and Delinquency announcement can be found here. Freddie Mac offers a helpful FAQ here on the programs.

[2] Contact RiskSpan for details on how to run this query.

[3] For context, a payment deferral of 2% represents roughly 5 months of missed P&I payments on a 3% 30yr mortgage.
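A quick check of the arithmetic in footnote [3], using the standard level-payment amortization formula (the 3% rate and 30-year term come from the footnote; everything else follows from the formula):

```python
rate = 0.03 / 12                                  # monthly note rate on a 3% mortgage
n = 360                                           # 30-year term in months
payment_factor = rate / (1 - (1 + rate) ** -n)    # monthly P&I per $1 of original balance
print(f"monthly P&I: {payment_factor:.4%} of balance")           # roughly 0.42%
print(f"5 missed payments: {5 * payment_factor:.2%} of balance") # roughly 2.1%
```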


A Mentor’s Advice: Work Hard on Things You Can Control; Learn to Live with Things You Cannot

March is Women’s History Month and RiskSpan is marking the occasion by sharing a short series of posts featuring advice from women leaders in our industry.


Today’s contributor is Dr. Laurie Goodman, vice president at the Urban Institute and codirector of Urban’s Housing Finance Policy Center. Laurie helped break barriers as one of the first women to work on Wall Street and built her own brand as a go-to researcher for the housing and mortgage industry.

Laurie serves on the board of directors of MFA Financial, Arch Capital Group, Home Point Capital and DBRS. In 2009, she was inducted into the Fixed Income Analysts Hall of Fame following a series of successful research leadership and portfolio management positions at several Wall Street firms.


Laurie offers this guidance to young women (though it is applicable to everyone):

#1 – Figure out the balance that works for you between your personal life and your work life, realizing that you can’t be all things to all people all the time. There are times when you will spend more time on your work life and times when you will spend more time on your home life and other non-work related activities. You can’t be a super-performer at both all the time. Don’t beat yourself up for that part of your life where you feel you are underperforming.

#2 – Develop a thick skin and don’t take things personally. This will make you a much better colleague. Many times, colleagues and others in your organization make comments that can be interpreted either as personal affronts or general statements on the project. Always look for the non-personal interpretation (even if you suspect it is personal). For example, “Gee, these results aren’t very useful” can be interpreted personally as “It’s your fault — if you had done it differently it would have been better” or non-personally, as in “The material just didn’t give us any new insights.” Assume it was meant non-personally.  

#3 – Develop confidence and advocate for yourself. Speak up in meetings, particularly if you have points to add, or can steer the conversation back on track. If you are not feeling confident, fake it until you realize that you have as much (or more) to contribute than anyone else. And use that confidence to advocate for yourself — your success is more important to you than it is to anyone else. Have the confidence to own your mistakes; we all make mistakes. If you own them, you will do everything you can to correct them.

We also asked Laurie what, if anything, she might have done differently. Her response:

Early in my career, when things went off track for any reason, I got very frustrated. I was unable or unwilling to distinguish between those aspects of my work that were under my control, and those aspects of my work environment that I could not control. As a result, in the early years of my career, I changed jobs frequently. As the years have gone on, I have learned to do the best work I can on issues that are under my control and accept and live with what is not. It has made my work life much more enjoyable and productive.


Our thanks to Laurie for her valuable perspective!

Keep an eye on https://riskspan.com/insights/ throughout March for insights from other women we admire in mortgage and structured finance.


RiskSpan is proud to sponsor POWER OF VOICE BENEFIT. Girls Leadership teaches girls to exercise the power of their voice. #powerofvoice2021


A Mentor’s Advice: Go Where Your Heart Leads and Learn to Say Yes (and No)

March is Women’s History Month and RiskSpan is marking the occasion by sharing a short series of posts featuring advice from women leaders in our industry.

Today’s contributor is Dr. Amy Crews Cutts, President and Chief Economist of AC Cutts and Associates, an economics and strategy consulting firm based in Reston, VA. She started her professional career as an academic and used that experience to build her network, soon landing at Freddie Mac. There she honed her professional skills and reputation as an economist, writer, and speaker. Amy was engaged by Equifax in 2011 to create the office of the chief economist. She has been recruited to serve on corporate and nonprofit advisory boards and elected to serve on boards of directors of leading economics associations. Amy has become an internationally recognized expert on consumer credit and economic policy and is a sought-after speaker and advisor. She is a participant in the Wall Street Journal Survey of Leading Economists, and her work has been cited in federal regulations and in cases before the U.S. Supreme Court.


Amy offers this guidance to women embarking on their careers (though it is applicable to everyone): 

#1 – Look deep for your talents and passion. We tend to think of jobs as titles rather than the small things that make up the role. The best day of my career came when I embraced the joy in the small parts of the job, and from that I was able to move mountains within the company and create the role that suited me.  

#2 – Build your networks always. When invited to join others for lunch, go! There are always deadlines, but this is just as important. You never know when you will learn a critically valuable piece of information from a casual conversation. When given the opportunity to present (internally or externally), take it, and know that they respect you enough to have asked and they care about what you have to say. Even a small opportunity may be the start of something big, so jump in with both feet.   

#3 – Speak up and, when you have a strong opinion, add your voice to the discussion (but never, ever, make it personal). In a corporate setting, you get one chance to speak critically of a plan, a policy, or problem. After that you need to move on because you were heard, even if they chose to go against your advice. Good counsel is valuable in every organization. Be someone others want to get advice from. 

We also asked Amy what, if anything, she might have done differently. Her response: 

My biggest regrets have come from not being openminded enough. I didn’t know at the time that the days of secretaries were numbered, but I rejected the suggestion of taking a typing class in high school because I had bigger plans (who knew we would spend our days typing?). I never learned how to code well because, at the time I was in college, computer science courses were mainly for mainframe applications (unfortunately, I did take APL programming, an already dying language in the 1980s). I have rejected job opportunities because I did not fit 100 percent of the job description but would later see someone much less capable in that role, with the prestige or promotion I should have tried for.


Our thanks to Amy for her valuable perspective!

Keep an eye on https://riskspan.com/insights/ this month for insights from other women we admire in mortgage and structured finance.



