Articles Tagged with: Innovation and Alternative Data

May 19 Workshop: Quality Control Using Anomaly Detection (Part 2)

Recorded: May 19 | 1:00 p.m. ET

Last month, RiskSpan’s Suhrud Dagli and Martin Kindler outlined the principles underlying anomaly detection and its QC applications related to market data and market risk. You can view a recording of that workshop here.

On Wednesday, May 19th, Suhrud presented Part 2 of this workshop, which dove into mortgage loan QC and introduced coding examples and approaches for avoiding false negatives using open-source Python algorithms in the Anomaly Detection Toolkit (ADTK).

RiskSpan presents various types of detectors, including extreme studentized deviate (ESD), level shift, local outliers, seasonal detectors, and volatility shift, in the context of identifying spike anomalies and other inconsistencies in mortgage data. Specifically:

  • Coding examples for effective principal component analysis (PCA) loan data QC
  • Use cases around loan performance and entity correction, and
  • Novelty detection

Suhrud Dagli

Co-founder and CIO, RiskSpan

Martin Kindler

Managing Director, RiskSpan



Anomaly Detection and Quality Control

In our most recent workshop on Anomaly Detection and Quality Control (Part I), we discussed how clean market data is an integral part of producing accurate market risk results. As incorrect and inconsistent market data is so prevalent in the industry, it is not surprising that the U.S. spends over $3 trillion on processes to identify and correct market data.

Taking a step back, it is worth noting what drives accurate market risk analytics. Clearly, having accurate portfolio holdings with correct terms and conditions for over-the-counter trades is central to calculating consistent risk measures that are scaled to the market value of the portfolio. The use of well-tested and integrated industry-standard pricing models is another key factor in producing reliable analytics. But compared to these two categories, the cleanliness and consistency of market data is the largest factor determining the quality of market risk analytics. The key factor driving the need to detect and correct (or transform) market data is risk and portfolio managers' expectation that risk results be accurate at the start of the business day, with no need for time-consuming intraday re-runs to correct issues.

Broadly defined, market data is any data used as an input to re-valuation models. This includes equity prices, interest rates, credit spreads, FX rates, volatility surfaces, etc.

Market data needs to be:

  • Complete – no true gaps when looking back historically.
  • Accurate
  • Consistent – data must be viewed across other data points to determine its accuracy (e.g., interest rates across tenor buckets, volatilities across volatility surface)

Anomaly types can be broken down into four major categories:

  • Spikes
  • Stale data
  • Missing data
  • Inconsistencies

Here are three examples of “bad” market data:

Credit Spreads

The following chart depicts day-over-day changes in credit spreads for the 10-year consumer cyclical time series, returned from an external vendor. The changes indicate a significant spike on 12/3 that caused big swings, up and down, across multiple rating buckets. Without an adjustment to this data, key risk measures would show significant jumps, up and down, depending on the dollar value of positions on two consecutive days.


Swaption Volatilities

Market data also includes volatilities, which drive delta and possible hedging. The following chart shows implied swaption volatilities for different maturities of swaptions and their underlying swaps. Note the spikes in 7×10 and 10×10 swaptions. The chart also highlights inconsistencies between different tenors and maturities.


Equity Implied Volatilities

The 146 and 148 strikes in the table below reflect inconsistent vol data, as often occurs around expiration.


The detection of market data inconsistencies needs to be an automated process with multiple approaches targeted at specific types of market data. The detection models need to evolve over time as more information is gathered, with the goal of reducing false negatives to a manageable level. Once the models detect anomalies, the next step is to automate the transformation of the market data (e.g., backfill, interpolate, or use the prior day's value). Together with the transformation, transparency must be maintained by recording which values were changed or populated when unavailable. This record should be shared with clients, which could lead to alternative transformations or detection routines.

Detector types typically fall into the following categories (a minimal code sketch follows the list):

  • Extreme Studentized Deviate (ESD): finds outliers in a single data series (helpful for extreme cases).
  • Level Shift: detects a change in level by comparing the means of two sliding time windows (useful for local outliers).
  • Local Outliers: detects spikes relative to nearby values.
  • Seasonal Detector: detects seasonal patterns and anomalies (used for contract expirations and other events).
  • Volatility Shift: detects shifts in volatility by tracking changes in standard deviation.
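As a preview of the coding examples, below is a minimal sketch of how these detector types can be wired up with the open-source ADTK package mentioned above. The input file, series, and parameter values are illustrative assumptions, not settings from the workshop.

```python
import pandas as pd
from adtk.data import validate_series
from adtk.detector import (
    GeneralizedESDTestAD,   # extreme studentized deviate
    LevelShiftAD,           # change in level between sliding windows
    PersistAD,              # spikes relative to nearby (recent) values
    SeasonalAD,             # departures from a seasonal pattern
    VolatilityShiftAD,      # shifts in rolling standard deviation
)

# Hypothetical daily time series (e.g., a credit spread) indexed by date.
spreads = pd.read_csv("credit_spreads.csv", index_col=0, parse_dates=True).squeeze("columns")
s = validate_series(spreads)  # enforce a regular, sorted DatetimeIndex

detectors = {
    "esd": GeneralizedESDTestAD(alpha=0.05),
    "level_shift": LevelShiftAD(c=6.0, side="both", window=5),
    "local_outlier": PersistAD(c=3.0, side="both", window=10),
    "seasonal": SeasonalAD(c=3.0, side="both"),
    "vol_shift": VolatilityShiftAD(c=6.0, side="both", window=30),
}

# Each detector returns a boolean series flagging anomalous observations.
flags = {}
for name, det in detectors.items():
    try:
        flags[name] = det.fit_detect(s)
    except (RuntimeError, ValueError):  # e.g., SeasonalAD needs a detectable cycle
        flags[name] = pd.Series(False, index=s.index)

summary = pd.DataFrame(flags).fillna(False).sum()
print(summary)  # count of flagged points per detector
```

In practice, flags from several detectors are typically reviewed together before any value is corrected or transformed.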

On Wednesday, May 19th, we will present a follow-up workshop focusing on:

  • Coding examples
    • Application of outlier detection and pipelines
    • PCA (see the reconstruction-error sketch after this list)
  • Specific loan use cases
    • Loan performance
    • Entity correction
  • Novelty Detection
    • Anomalies are not always “bad”
    • Market monitoring models
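For a flavor of the PCA example, below is a minimal reconstruction-error sketch for loan data QC. The file name, feature columns, variance threshold, and percentile cutoff are hypothetical illustrations, not the workshop's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

loans = pd.read_csv("loan_tape.csv")                                    # hypothetical input file
features = ["note_rate", "orig_balance", "ltv", "dti", "credit_score"]  # hypothetical columns

X = StandardScaler().fit_transform(loans[features])
pca = PCA(n_components=0.95)          # keep components explaining 95% of variance
scores = pca.fit_transform(X)
reconstruction = pca.inverse_transform(scores)

# Loans that PCA cannot reconstruct well are candidates for manual review.
errors = np.sqrt(((X - reconstruction) ** 2).sum(axis=1))
loans["pca_error"] = errors
suspects = loans[errors > np.quantile(errors, 0.995)]
print(suspects[features + ["pca_error"]])
```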

You can register for this complimentary workshop here.


Leveraging ML to Enhance the Model Calibration Process

Last month, we outlined an approach to continuous model monitoring and discussed how practitioners can leverage the results of that monitoring for advanced analytics and enhanced end-user reporting. In this post, we apply this idea to enhanced model calibration.

Continuous model monitoring is a key part of a modern model governance regime. But testing performance as part of the continuous monitoring process has value that extends beyond immediate governance needs. Using machine learning and other advanced analytics, testing results can also be further explored to gain a deeper understanding of model error lurking within sub-spaces of the population.

Below we describe how we leverage automated model back-testing results (using our machine learning platform, Edge Studio) to streamline the calibration process for our own residential mortgage prepayment model.

The Problem:

MBS prepayment models, RiskSpan’s included, often provide a number of tuning knobs to tweak model results. These knobs impact the various components of the S-curve function, including refi sensitivity, turnover lever, elbow shift, and burnout factor.

The knob tuning and calibration process is typically messy and iterative. It usually involves somewhat-subjectively selecting certain sub-populations to calibrate, running back-testing to see where and how the model is off, and then tweaking knobs and rerunning the back-test to see the impacts. The modeler may need to iterate through a series of different knob selections and groupings to figure out which combination best fits the data. This is manually intensive work and can take a lot of time.

As part of our continuous model monitoring process, we had already automated the process of generating back-test results and merging them with actual performance history. But we wanted to explore ways of taking this one step further to help automate the tuning process — rerunning the automated back-testing using all the various permutations of potential knobs, but without all the manual labor.

The solution applies machine learning techniques to run a series of back-tests on MBS pools and automatically solve for the set of tuners that best aligns model outputs with actual results.

We break the problem into two parts, followed by a validation step:

  1. Find Cohorts: Cluster pools into groups that exhibit similar key pool characteristics and model error (so they would need the same tuners).

TRAINING DATA: Back-testing results for our universe of pools with no model tuning knobs applied

  2. Solve for Tuners: Minimize back-testing error by optimizing knob settings.

TRAINING DATA: Back-testing results for our universe of pools under a variety of permutations of potential tuning knobs (Refi x Turnover)

  3. Tuning Knob Validation: Take the optimized tuning knobs for each cluster and rerun the pools to confirm that the selected permutation does in fact return the lowest model errors.

Part 1: Find Cohorts

We define model error as the ratio of the average modeled SMM to the average actual SMM. We compute this using back-testing results and then use a hierarchical clustering algorithm to cluster the data based on model error across various key pool characteristics.

Hierarchical clustering is a general family of clustering algorithms that build nested clusters by either merging or splitting observations successively. The hierarchy of clusters is represented as a tree (or dendrogram). The root of the tree is the root cluster that contains all samples, while the leaves represent clusters with only one sample. [1]

Agglomerative clustering is an implementation of hierarchical clustering that takes a bottom-up (merging) approach: each observation starts in its own cluster, and clusters are successively merged together. Multiple linkage criteria are available; we used the Ward criterion.

Ward linkage strategy minimizes the sum of squared differences within all clusters. It is a variance-minimizing approach.[2]
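A minimal sketch of this cohort-finding step is below, assuming a back-test extract with hypothetical column names and an illustrative cluster count rather than our production settings.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

backtest = pd.read_csv("pool_backtest.csv")    # hypothetical back-test extract, one row per pool
backtest["model_error"] = backtest["avg_model_smm"] / backtest["avg_actual_smm"]

# Hypothetical pool characteristics used alongside model error.
features = ["model_error", "wac", "wala", "avg_loan_size", "refi_incentive"]
X = StandardScaler().fit_transform(backtest[features])

# Agglomerative (bottom-up) clustering with Ward linkage, as described above.
clusterer = AgglomerativeClustering(n_clusters=8, linkage="ward")
backtest["cluster"] = clusterer.fit_predict(X)

print(backtest.groupby("cluster")["model_error"].describe())
```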

Part 2: Solving for Tuners

Here the training data is expanded to include multiple back-test results for each pool under different permutations of tuning knobs.

Process to Optimize the Tuners for Each Cluster

Training Data: Rerun the back-test with permutations of REFI and TURNOVER tunings, covering all reasonably possible combinations of tuners.

  1. These permutations of tuning results are fed to a multi-output regressor, which trains the machine learning model to understand the interaction between the tuning parameters and model error as a fitting step.
    • Model error and pool features are used as independent variables
    • Gradient Tree Boosting/Gradient Boosted Decision Trees (GBDT)* methods are used to find the optimized tuning parameters for each cluster of pools derived from the clustering step
    • Two dependent variables (Refi Tuner and Turnover Tuner) are used
    • Separate models are estimated for each cluster
  2. We solve for the optimal tuning parameters by running the resulting model with a model error ratio of 1 (no error) and the weighted-average cluster features (a minimal sketch of both steps follows this list).
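The sketch below illustrates both steps for a single cluster. The input file, column names, and hyperparameters are assumptions for illustration; it is not RiskSpan's production code.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Hypothetical back-test results for one cluster under many (refi, turnover) permutations.
runs = pd.read_csv("cluster_backtests.csv")
runs["model_error"] = runs["avg_model_smm"] / runs["avg_actual_smm"]

features = ["model_error", "wac", "wala", "refi_incentive"]   # independent variables
targets = ["refi_tuner", "turnover_tuner"]                    # dependent variables

# GBDT does not natively produce two outputs, so wrap it in a multi-output regressor.
model = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=300, max_depth=3))
model.fit(runs[features], runs[targets])

# Solve for the tuners: ask the fitted model what knobs correspond to an error
# ratio of 1 at the cluster's average (weighted in practice) pool characteristics.
query = runs[features].mean().to_frame().T
query["model_error"] = 1.0
refi_knob, turnover_knob = model.predict(query)[0]
print(f"suggested refi tuner: {refi_knob:.3f}, turnover tuner: {turnover_knob:.3f}")
```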

* Gradient Tree Boosting/Gradient Boosted Decision Trees (GBDT) is a machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. When a decision tree is the weak learner, the resulting algorithm is called gradient boosted trees, which usually outperforms random forest. It builds the model in a stage-wise fashion like other boosting methods and generalizes them by allowing optimization of an arbitrary differentiable loss function. [3]

* We used scikit-learn's GBDT implementation to optimize and solve for the best Refi and Turnover tuners. [4]

Results

The resulting suggested knobs show promise in improving model fit over our back-test period. Below are the results for two of the clusters using the knobs suggested by the process. To further expand on these results, we plan to cross-validate on out-of-time sample data as it comes in.

Conclusion

These advanced analytics show promise in their ability to help streamline the model calibration and tuning process by removing many of the time-consuming and subjective components from the process altogether. Once a process like this is established for one model, applying it to new populations and time periods becomes more straightforward. This analysis can be further extended in a number of ways. One in particular we’re excited about is the use of ensemble models—or a ‘model of models’ approach. We will continue to tinker with this approach as we calibrate our own models and keep you apprised on what we learn.


Three Principles for Effectively Monitoring Machine Learning Models

The recent proliferation in machine learning models in banking and structured finance is becoming impossible to ignore. Rarely does a week pass without a client approaching us to discuss the development or validation (or both) of a model that leverages at least one machine learning technique. RiskSpan’s own model development team has also been swept up in the trend – deep learning techniques have featured prominently in developing the past several versions of our in-house residential mortgage prepayment model.  

Machine learning’s rise in popularity is attributable to multiple underlying trends: 

  1. Quantity and complexity of data. Nowadays, firms store every conceivable type of data relating to their activities and clients – and frequently supplement this with data from any number of third-party providers. The increasing dimensionality of data available to modelers makes traditional statistical variable selection more difficult. The tradeoff between a model's complexity and the rules adopted for variable selection can be hard to balance. An advantage of ML approaches is that they can handle multi-dimensional data more efficiently. ML frameworks are good at identifying trends and patterns – without the need for human intervention.
  2. Better learning algorithms. Because ML algorithms learn to make more accurate projections as new data is introduced to the framework (assuming there is no bias in the new data), model features based on newly introduced data are more likely to resemble features created using the original training data.
  3. Cheap computation costs. New techniques, such as XGBoost, are designed to be memory efficient and introduce innovative system designs that help reduce computation costs.
  4. Proliferation breeds proliferation. As the number of machine learning packages in various programming tools increases, it facilitates implementation and promotes further ML model development. 

Addressing Monitoring Challenges 

Notwithstanding these advances, machine learning models are by no means easy to build and maintain. Feature engineering and parameter tuning procedures are time consuming. And once an ML model has been put into production, monitoring activities must be implemented to detect anomalies and make sure the model works as expected (just like with any other model). According to the OCC 2011-12 supervisory guidance on model risk management, ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid. While monitoring ML models resembles monitoring conventional statistical models in many respects, the following activities take on particular importance with ML model monitoring:

  1. Review the underlying business problem. Defining the business problem is the first step in developing any ML model and should be carefully articulated in the list of business requirements the model is supposed to satisfy. Any shift in the underlying business problem will likely create drift in the training data and, as a result, new data coming into the model may no longer be relevant to the original business problem. The ML model degrades, and a new round of feature engineering and parameter tuning needs to be considered to remediate the impact. This review should be conducted whenever the underlying problem or requirements change.
  2. Review of data stability (model input). In the real world, even if the underlying business problem is unchanged, there might be shifts in the predicting data caused by changing borrower behaviors, changes in product offerings, or other unexpected market drift. Any of these could result in the ML model receiving data it has not been trained on. Model developers should measure the data population stability between the training dataset and the predicting dataset. If there is evidence that the data have shifted, model recalibration should be considered. This assessment should be done when the model user identifies a significant shift in the model's performance or when a new testing dataset is introduced to the ML model. Where data segmentation has been used in the model development process, this assessment should be performed at the individual segment level as well.
  3. Review of performance metrics (model output). Performance metrics quantify how well an ML model is trained to explain the data and should fit the model's type. For instance, the developer of a binary classification model could use a Kolmogorov-Smirnov (KS) table, receiver operating characteristic (ROC) curve, and area under the curve (AUC) to measure the model's overall rank-ordering ability and its performance at different cutoffs. Any shift (upward or downward) in performance metrics between a new dataset and the training dataset should raise a flag in monitoring activity, and all material shifts need to be reviewed by the model developer to determine their cause. Such assessments should be conducted annually or whenever new data becomes available. A minimal sketch of the stability and performance checks in items 2 and 3 follows this list.
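The sketch below illustrates a population stability index (PSI) check on model inputs and an AUC comparison on model outputs. The file names, feature list, score and target columns, and the 0.25 PSI threshold are illustrative assumptions (the threshold is a common rule of thumb, not a regulatory requirement).

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def psi(expected: pd.Series, actual: pd.Series, bins: int = 10) -> float:
    """Population stability index between a training and a scoring sample."""
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf                      # cover values outside the training range
    e = np.histogram(expected, cuts)[0] / len(expected)
    a = np.histogram(actual, cuts)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)    # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

train = pd.read_csv("train_sample.csv")      # hypothetical files with features,
recent = pd.read_csv("recent_sample.csv")    # a binary target, and a model score

# Data stability (item 2): flag features whose distribution has drifted.
drift = {c: psi(train[c], recent[c]) for c in ["ltv", "dti", "credit_score"]}
flagged = {c: round(v, 3) for c, v in drift.items() if v > 0.25}

# Performance (item 3): compare rank-ordering power on new data vs. training.
auc_train = roc_auc_score(train["default_flag"], train["pd_score"])
auc_recent = roc_auc_score(recent["default_flag"], recent["pd_score"])
print("drifted features:", flagged)
print(f"AUC train: {auc_train:.3f}  AUC recent: {auc_recent:.3f}")
```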

Like all models, ML models are only as good as the data they are fed. But ML models are particularly susceptible to data shifts because their processing components are less transparent. Taking these steps to ensure they are learning from valid and consistent data is essential to managing a functional inventory of ML models.


April 28 Workshop: Anomaly Detection

Recorded: April 28 | 1:00 p.m. ET

Outliers and anomalies refer to various types of occurrences in a time series. Spikes in value, shifts in level or volatility, and changes in seasonal pattern are common examples. Anomaly detection depends on the specific context.

In this month’s installment in our Data and Machine Learning Workshop Series, RiskSpan Co-Founder & CIO Suhrud Dagli is joined by Martin Kindler, a market risk practitioner who has spent decades dealing with outliers.

Suhrud and Martin explore unsupervised approaches for detecting anomalies.

Suhrud Dagli

Co-founder and CIO, RiskSpan

Martin Kindler

Managing Director, RiskSpan



April 21 Webinar: Automated Prepayment Model Calibration Using Machine Learning

Recorded: April 21 | 1:00 p.m. ET

Manually tuning MBS prepayment models is messy. In what amounts to an elaborate trial-and-error exercise, modelers must frequently resort to subjectively selecting sub-populations to calibrate, running back-testing to see where and how the model is off, and then tweaking knobs and re-running the back-test to see the impacts. Rinse and repeat.

RiskSpan’s Janet Jozwik and Steven Sun present an approach for running a set of back-tests on MBS pools that automatically solves for the right set of tuners to align model results with actuals. Learn how, by automatically covering every feasible combination of model knobs, you can visualize for every pool the impact each knob combination has on:

  • Modeled prepay vs. actuals
  • Model error
  • Refi incentive and other pool features

Janet Jozwik

Managing Director, RiskSpan

Steven Sun

Director, RiskSpan



March 31 Workshop: Advanced Forecasting Using Hierarchical Models

Recorded: March 31 | 1:00 p.m. ET

Traditional statistical models apply a single set of coefficients, either by pooling a large dataset or by fitting specific cohorts.

Hierarchical models learn from feature behavior across dimensions or timeframes.

Suhrud Dagli and Jing Liu host an informative workshop applying hierarchical models to a variety of mortgage and structured finance use cases, including:

  • Changes in beta and covariance of portfolios across time
  • Loan performance across geographies and history – e.g., combining credit performance data from 2008 with unemployment-driven credit issues in 2020.
  • Issuer-level prepayment performance (a minimal partial-pooling sketch follows this list)
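As a taste of the issuer-level use case, the sketch below fits a simple partial-pooling (mixed-effects) model with statsmodels. The data file, columns, and formula are hypothetical and far simpler than a production prepayment model.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical monthly issuer-level observations: smm, refi_incentive, issuer.
perf = pd.read_csv("issuer_performance.csv")

# Issuers share a common slope on refi incentive but get their own random
# intercept, so thinly traded issuers borrow strength from the full dataset.
model = smf.mixedlm("smm ~ refi_incentive", data=perf, groups=perf["issuer"])
result = model.fit()
print(result.summary())
```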

Suhrud Dagli

Co-founder and Chief Innovation Officer, RiskSpan

Jing Liu

Model Developer, RiskSpan



Edge Enhancements: Spotlight AGENCY EDGE

2021 is off to a great start, but the Edge Team is not resting on its laurels.

On the heels of a year that saw more than a 30 percent increase in Edge subscribers, including a doubling of Agency Module users, we continue to add more of the Ginnie and GSE data you need.

Edge’s enhanced datasets make customizing S-curves even easier.

For example:

Loans with a principal deferral pay more slowly than loans without them when faced with the same refinancing incentive.

But how much more slowly?

Edge lets you quantify the difference, so you can adjust your models accordingly.

7 of the 10 largest U.S. broker/dealers use Edge to analyze Agency prepays.
Find out why.



The NRI: An Emerging Tool for Quantifying Climate Risk in Mortgage Credit

Climate change is affecting investment across virtually every sector in a growing number of mostly secondary ways. Its impact on mortgage credit investors, however, is beginning to be felt more directly.

Mortgage credit investors are investors in housing. Because housing is subject to climate risk and borrowers whose houses are destroyed by natural disasters are unlikely to continue paying their mortgages, credit investors have a vested interest in quantifying the risk of these disasters.

To this end, RiskSpan is engaged in leveraging the National Risk Index (NRI) to assess the natural disaster and climate risk exposure of mortgage portfolios.

This post introduces the NRI data in the context of mortgage portfolio analysis (loans or mortgage-backed securities), including what the data contain and key considerations when putting together an analysis. A future post will outline an approach for integrating this data into a framework for scenario analysis that combines this data with traditional mortgage credit models.

The National Risk Index

The National Risk Index (NRI) was released in October 2020 through a collaboration led by FEMA. It provides a wealth of new geographically specific data on natural hazard risks across the country. The index and its underlying data were designed to help local governments and emergency planners to better understand these risks and to plan and prepare for the future.

The NRI provides information on both the frequency and severity of natural risk events. The level of detailed underlying data it provides is astounding. The NRI focuses on 18 natural risks (discussed below) and provides detailed underlying components for each. The severity of an event is broken out by damage to buildings, agriculture, and loss of life. This breakdown lets us focus on the severity of events relative to buildings. While the definition of building here includes all types of real estate—houses, commercial, rental, etc.—having the breakdown provides an extra level of granularity to help inform our analysis of mortgages.

The key fields that provide important information for a mortgage portfolio analysis are bulleted below. The NRI provides these data points for each of the 18 natural hazards and for each geography included in its analysis (a sketch of how these fields combine follows the list).

  • Annualized Event Frequency
  • Exposure to Buildings: Total dollar amount of exposed buildings
  • Historical Loss Ratio for Buildings (derived using Bayesian methods so that every geography is covered for its relevant risks)
  • Expected Annual Loss for Buildings
  • Population estimates (helpful for geography weighting)
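The sketch below shows one way these fields can be combined for a single hazard, assuming expected annual building loss decomposes as event frequency times building exposure times historical building loss ratio. The column names are hypothetical stand-ins for the published NRI field names.

```python
import pandas as pd

# Hypothetical county- or tract-level NRI extract for one hazard (e.g., hurricane).
nri = pd.read_csv("nri_hurricane.csv")

# Reconstruct expected annual building loss from the bulleted components.
nri["eal_buildings_check"] = (
    nri["annualized_frequency"]
    * nri["building_exposure_usd"]
    * nri["historic_loss_ratio_buildings"]
)

# Express the figure as an annual loss rate on exposed building value, which is
# easier to compare across geographies of different sizes.
nri["building_loss_rate"] = nri["eal_buildings_check"] / nri["building_exposure_usd"]
print(nri[["eal_buildings_check", "expected_annual_loss_buildings", "building_loss_rate"]].head())
```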

Grouping Natural Disaster Risks for Mortgage Analysis

The NRI data covers 18 natural hazards, which pose varying degrees of risk to housing. We have found the framework below to be helpful when considering which risks to include in an analysis. We group the 18 risks along two axes:

1) The extent to which an event is impacted by climate change, and

2) An event’s potential to completely destroy a home.

Earthquakes, for example, have significant destructive potential, but climate change is not a major contributor to earthquakes. Conversely, heat waves and droughts wrought by climate change generally do not pose significant risk to housing structures.

When assessing climate risk, RiskSpan typically focuses on the five natural hazard risks in the top right quadrant below.

Immediate Event Risk versus Cumulative Event Risk

Two related but distinct risks inform climate risk analysis.

  1. Immediate Event Risk: The risk of mortgage delinquency and default resulting directly from a natural disaster event (a home severely damaged or destroyed by a hurricane, for example).
  2. Cumulative Event Risk: Less direct than immediate event risk, this is the risk of widespread home price declines across entire communities because of increasing natural hazard risk brought on by climate change. These secondary effects include:
    • Heightened homebuyer awareness or perception of increasing natural hazard risk,
    • Property insurance premium increases or areas becoming ‘self-insured’,
    • Government policy impacts (e.g., potential flood zone remapping), and
    • Potential policy changes related to insurance from key players in the mortgage market (i.e., Fannie Mae, Freddie Mac, FHFA, etc.).

NRI data provides an indication of the probability of immediate event occurrence and its historic severity in terms of property losses. We can also empirically observe historical mortgage performance in the wake of previous natural disaster events. Data covering several hurricane and wildfire events are available.

Cumulative event risk is less observable. A few academic papers attempt to tease out these impacts, but the risk of broader home price declines typically needs to be incorporated into a risk assessment framework through transparent scenario overlays. Examples of such scenarios include home price declines of as much as 20% in newly flood-exposed areas of South Florida. There is also research suggesting that natural disasters often have long-term impacts on consumer credit.

Geography Normalization

Linking to the NRI is simple when detailed loan pool geographic data are available. Analysts can merge by census tract or county code. Census tract is the more geographically granular measure and provides a more detailed analysis.

For many capital markets participants, however, that level of geographic specific detail is not available. At best, an investor may have a 5-digit or 3-digit zip code. Zip codes do not directly match to a given county or census tract and can potentially span across those distinctions.

There is no perfect way to perform the data link when zip code is the only available geographic marker. We take an approach that leverages the other data on housing stock by census tract to weight mortgage portfolio data when multiple census tracts map to a zip code.
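A minimal sketch of that weighting approach is below. It assumes a zip-to-tract crosswalk with housing-stock based weights (for example, derived from NRI building exposure or a HUD USPS crosswalk); the file and column names are hypothetical.

```python
import pandas as pd

loans = pd.read_csv("loan_tape.csv")               # has loan_id and a 5-digit zip column
crosswalk = pd.read_csv("zip_tract_weights.csv")   # zip, tract, weight (weights sum to 1 per zip)
nri = pd.read_csv("nri_tract.csv")                 # tract-level NRI risk measures

# Spread each loan across the tracts its zip touches, weight the tract-level
# risk measure, and collapse back to one blended value per loan.
merged = loans.merge(crosswalk, on="zip").merge(nri, on="tract")
merged["weighted_risk"] = merged["weight"] * merged["expected_annual_loss_rate"]
loan_risk = merged.groupby("loan_id", as_index=False)["weighted_risk"].sum()

loans = loans.merge(loan_risk, on="loan_id", how="left")
print(loans[["loan_id", "zip", "weighted_risk"]].head())
```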

Other Data Limitations

The loss information available represents a simple historical average loss rate given an event. But hurricanes (and hurricane seasons) are not all created equal. The same is true of other natural disasters. Relying on averages may work over long time horizons but could significantly underpredict or overpredict loss in a particular year. Further, the frequency of events is rising, so what used to be considered a 100-year event may be closer to a 10- or 20-year event. Lacking data about what losses might look like under extreme scenarios makes modeling such events problematic.

The data also make it difficult to take correlation into account. Hurricanes and coastal flooding are independent events in the dataset but are obviously highly correlated with one another. The impact of a large storm on one geographic area is likely to be correlated with that of nearby areas (such as when a hurricane makes its way up the Eastern Seaboard).

The workarounds for these limitations have limitations of their own. But one solution involves designing transparent assumptions and scenarios related to the probability, severity, and correlation of stress events. We can model outlier events by assuming that losses for a particular peril follow a normal distribution with set standard deviations. Other assumptions can be made about correlations between perils and geographies. Using these assumptions, stress scenarios can be derived by picking a particular percentile along the loss distribution.
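As a simple illustration of picking a percentile along an assumed loss distribution, the sketch below stresses an annual loss rate under a normal assumption. The average loss rate and standard deviation are made-up inputs, not NRI values.

```python
from scipy.stats import norm

avg_loss_rate = 0.004   # assumed long-run average annual building loss rate for one peril
loss_rate_std = 0.006   # assumed dispersion around that average

# Derive stress loss rates at selected percentiles of the assumed distribution,
# flooring the result at zero since loss rates cannot be negative.
for pct in (0.90, 0.99):
    stress = norm.ppf(pct, loc=avg_loss_rate, scale=loss_rate_std)
    print(f"{pct:.0%} stress loss rate: {max(stress, 0.0):.4%}")
```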

A Promising New Credit Analysis Tool for Mortgages

Notwithstanding its limitations, the new NRI data is a rich source of information that can be leveraged to help augment credit risk analysis of mortgage and mortgage-backed security portfolios. The data holds great promise as a starting point (and perhaps more) for risk teams starting to put together climate risk and other ESG analysis frameworks.


January 13 Workshop: Pattern Recognition in Time Series Data

Recorded: January 13, 2021 | 1:00 p.m. ET

Traders and investors rely on time series patterns generated by asset performance to inform and guide their trading and asset allocation decisions. Economists take advantage of analogous patterns in macroeconomic and market data to forecast recessions and other market events.

But you need to be able to spot these patterns in order to use them.

Catch the latest in RiskSpan’s series of machine learning and data workshops as Chirag Soni and Jing Liu, two of RiskSpan’s experts working at the intersection of data science and capital markets, demonstrate how advanced machine learning techniques such as Dynamic Time Warping and KShape can be applied to automate time series analysis and effectively detect patterns hiding in your data.

Chirag and Jing will discuss specific applications, explain popular algorithms, and walk through code examples.
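For readers who want to experiment ahead of the session, a minimal sketch using the open-source tslearn package is below. The simulated return series, cluster count, and parameters are illustrative assumptions, not the presenters' code.

```python
import numpy as np
from tslearn.clustering import KShape
from tslearn.metrics import dtw
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 250))   # 20 hypothetical equal-length daily return series

# Dynamic Time Warping: a distance between two series that tolerates phase shifts.
print("DTW distance between series 0 and 1:", dtw(X[0], X[1]))

# KShape: shape-based clustering of the normalized series.
X_scaled = TimeSeriesScalerMeanVariance().fit_transform(X)
labels = KShape(n_clusters=3, random_state=0).fit_predict(X_scaled)
print("cluster sizes:", np.bincount(labels))
```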

Join us on Wednesday, January 13th! 


