Articles Tagged with: Innovation and Alternative Data

Managing Market Risk for Crypto Currencies

 

Contents

 

Overview

Asset Volatility vs Asset Sensitivity to Benchmark (Beta)

Portfolio Asset Covariance

Value at Risk (VaR)

Bitcoin Futures: Basis and Proxies

Intraday Value at Risk (VaR)

Risk-Based Limits

VaR Validation (Bayesian Approach)

Scenario Analysis

Conclusion


Overview

Crypto currencies have now become part of institutional investment strategies. According to CoinShares, assets held under management by crypto managers reached $57B at the end of Q1 2021.  

Like any other financial asset, crypto investments are subject to market risk monitoring, and several approaches are evolving. Crypto currencies exhibit no obvious correlation to other asset classes, risk factors or economic variables. However, crypto currencies have exhibited high price volatility and have enough historical data to implement a robust market risk process.

In this paper we discuss approaches to implementing market risk analytics for a portfolio of crypto assets. We will look at betas to benchmarks, correlations, Value at Risk (VaR) and historical event scenarios. 

Value at Risk allows risk managers to implement risk-based limits structures, instead of relying on traditional notional measures. The methodology we propose enables consolidation of risk for crypto assets with the rest of the portfolio. We will also discuss the use of granular time horizons for intraday limit monitoring. 

Asset Volatility vs Asset Sensitivity to Benchmark (Beta)

For exchange-traded instruments, beta measures the sensitivity of asset price returns relative to a benchmark. For US-listed large cap stocks, beta is generally computed relative to the S&P 500 index. For crypto currencies, several eligible benchmark indices have emerged that represent the performance of the overall crypto currency market.

We analyzed several currencies against S&P’s Bitcoin Index (SPBTC). SPBTC is designed to track the performance of the original crypto asset, Bitcoin. As market capitalization for other currencies grows, it would be more appropriate to switch to a dynamic multi-currency index such as Nasdaq’s NCI. At the time of this paper, Bitcoin constituted 62.4% of NCI.

Traditionally, beta is calculated over a variable time frame using least squares fit on a linear regression of benchmark return and asset return. One of the issues with calculating betas is the variability of the beta itself. In order to overcome that, especially given the volatility of crypto currencies, we recommend using a rolling beta.

Due to the varying levels of volatility and liquidity of various crypto currencies, a regression model may not always be a good fit. In addition to tracking fit through R-squared, it is important to track confidence intervals for the computed betas.
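
A rolling beta with an approximate confidence band can be sketched in a few lines of Python. This is a minimal illustration, not RiskSpan's implementation: the 60-observation window, the normal-approximation multiplier z = 1.96 (roughly a 95% interval; a t-quantile would be more accurate for short windows) and the synthetic return series are all assumptions made for the example.

```python
import math
import random


def ols_beta(bench, asset):
    """Least-squares slope of asset returns on benchmark returns.

    Returns (beta, standard error of beta, R-squared).
    """
    n = len(bench)
    mean_b = sum(bench) / n
    mean_a = sum(asset) / n
    sxx = sum((b - mean_b) ** 2 for b in bench)
    sxy = sum((b - mean_b) * (a - mean_a) for b, a in zip(bench, asset))
    syy = sum((a - mean_a) ** 2 for a in asset)
    beta = sxy / sxx
    alpha = mean_a - beta * mean_b
    sse = sum((a - alpha - beta * b) ** 2 for b, a in zip(bench, asset))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    r_squared = 1.0 - sse / syy
    return beta, std_err, r_squared


def rolling_beta(bench, asset, window=60, z=1.96):
    """Beta, lower bound, upper bound and R-squared for each trailing window."""
    rows = []
    for end in range(window, len(bench) + 1):
        beta, std_err, r_squared = ols_beta(bench[end - window:end],
                                            asset[end - window:end])
        rows.append((beta, beta - z * std_err, beta + z * std_err, r_squared))
    return rows


# Synthetic daily returns (hypothetical data): the asset has a true beta of 1.3.
rng = random.Random(0)
bench = [rng.gauss(0.0, 0.04) for _ in range(250)]
asset = [1.3 * b + rng.gauss(0.0, 0.02) for b in bench]
betas = rolling_beta(bench, asset)
print(betas[-1])
```

A widening gap between the lower and upper bounds, or a falling R-squared, is the signal that the regression is fitting poorly over that window.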

Figure 1 History of Beta to S&P Bitcoin Index with Confidence Intervals

The chart above shows rolling betas and confidence intervals for four crypto currencies between January 2019 and July 2021. Beta and confidence interval both vary over time and periods of high volatility (stress) cause a larger dislocation in the value of beta.

Rolling betas can be used to generate a hierarchical distribution of expected asset values.

Portfolio Asset Covariance

Beta is a useful measure to track an asset’s volatility relative to a single benchmark. In order to numerically analyze the risk exposure (variance) of a portfolio with multiple crypto assets, we need to compute a covariance matrix. Portfolio risk is a function not only of each asset’s volatility but also of the cross-correlation among them.
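
To make the role of the covariance matrix concrete, the sketch below computes portfolio volatility as the square root of w'Σw, where w is the vector of portfolio weights and Σ is the covariance matrix of asset returns. The tickers, weights and return series are made-up illustrative inputs, and the sample (n - 1) covariance estimator is an assumption.

```python
import math


def covariance_matrix(returns):
    """Sample covariance matrix; `returns` maps asset name -> list of returns."""
    names = list(returns)
    n = len(returns[names[0]])
    means = {name: sum(series) / n for name, series in returns.items()}
    cov = {}
    for a in names:
        for b in names:
            cov[a, b] = sum(
                (x - means[a]) * (y - means[b])
                for x, y in zip(returns[a], returns[b])
            ) / (n - 1)
    return cov


def portfolio_volatility(weights, cov):
    """Square root of w' * Sigma * w."""
    variance = sum(
        weights[a] * weights[b] * cov[a, b] for a in weights for b in weights
    )
    return math.sqrt(variance)


# Hypothetical data: the second series is exactly twice the first,
# so the two assets are perfectly correlated.
returns = {
    "BTC": [0.01, -0.02, 0.03, -0.01],
    "ETH": [0.02, -0.04, 0.06, -0.02],
}
weights = {"BTC": 0.5, "ETH": 0.5}
cov = covariance_matrix(returns)
port_vol = portfolio_volatility(weights, cov)
print(port_vol)
```

With perfectly correlated assets there is no diversification benefit, so the portfolio volatility equals the weighted sum of the individual volatilities. Any correlation below one reduces it.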

Figure 2 Correlations for 11 currencies (calculated using observations from 2021)

The table above shows a correlation matrix across 11 crypto assets, including Bitcoin.

Like betas, correlations among assets change over time. But correlation matrices are more unwieldy to track over time than betas are. For this reason, hierarchical models provide a good, practical framework for time-varying covariance matrices.

Value at Risk (VaR)

The VaR for a position or portfolio can be defined as some threshold Τ (in dollars) where the existing position, when faced with market conditions resembling some given historical period, will have P/L greater than Τ with probability k. Typically, k is chosen to be 99% or 95%.

To compute this threshold Τ, we need to:

  1. Set a significance percentile k, a market observation period, and holding period n.
  2. Generate a set of future market conditions (scenarios) from today to period n.
  3. Compute a P/L on the position for each scenario

After computing each position’s P/L, we sum the P/L for each scenario and then rank the scenarios’ P/Ls to find the kth percentile (worst) loss. This loss defines our VaR Τ at the kth percentile for holding period n.
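
The three steps and the final ranking can be sketched as follows. This is an illustrative outline under simplifying assumptions, not the production methodology: positions are revalued linearly (market value times return), the scenario returns are randomly generated stand-ins for observed historical changes, and the function names are hypothetical.

```python
import math
import random


def scenario_pnl(positions, factor_returns):
    """Total P/L per scenario.

    positions: asset name -> market value in dollars.
    factor_returns: asset name -> list of n-period returns, aligned by date.
    """
    n_scenarios = len(next(iter(factor_returns.values())))
    return [
        sum(value * factor_returns[name][i] for name, value in positions.items())
        for i in range(n_scenarios)
    ]


def historical_var(pnl, confidence=0.99):
    """VaR as a positive dollar loss at the given confidence level."""
    ranked = sorted(pnl)  # worst scenario first
    # Number of scenarios allowed to be worse than the VaR scenario.
    n_tail = math.floor((1.0 - confidence) * len(ranked) + 1e-9)
    return -ranked[n_tail]


# Hypothetical portfolio and synthetic one-day scenario returns.
rng = random.Random(42)
positions = {"BTC": 600_000.0, "ETH": 400_000.0}
factor_returns = {
    "BTC": [rng.gauss(0.0, 0.04) for _ in range(750)],
    "ETH": [rng.gauss(0.0, 0.05) for _ in range(750)],
}
pnl = scenario_pnl(positions, factor_returns)
print(historical_var(pnl, 0.95), historical_var(pnl, 0.99))
```

Because P/L is summed within each scenario before ranking, any co-movement present in the scenario set carries through to the portfolio figure without a separate correlation input.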

Determining what significance percentile k and holding period n to use is straightforward and often dictated by regulatory rules. For example, 99th percentile 10-day VaR is used for risk-based capital under the Market Risk Rule. Generating the scenarios and computing P/L under these scenarios is open to interpretation. We cover each of these, along with the advantages and drawbacks of each, in the next two sections.

To compute VaR, we first need to generate projective scenarios of market conditions. Broadly speaking, there are two ways to derive this set of scenarios:

  1. Project future market conditions using historical (actual) changes in market conditions
  2. Project future market conditions using a Monte Carlo simulation framework

In this paper, we consider a historical simulation approach.

RiskSpan projects future market conditions using actual (observed) n-period changes in market conditions over the lookback period. For example, if we are computing 1-day VaR for regulatory capital usage under the Market Risk Rule, RiskSpan takes actual daily changes in risk factors. This approach allows our VaR scenarios to account for natural changes in correlation under extreme market moves. RiskSpan finds this to be a more natural way of capturing changing correlations without the arbitrary overlay of how to change correlations in extreme market moves. This, in turn, will more accurately capture VaR. Please note that newer crypto currencies may not have enough data to generate a meaningful set of historical scenarios. In these cases, a benchmark adjusted by a short-term beta may be used as an alternative.

One key consideration for the historical simulation approach is the selection of the observation window or lookback period. Most regulatory guidelines require at least a one-year window. However, practitioners also recommend a shorter lookback period for highly volatile assets. In the chart below we illustrate how VaR for our portfolio of crypto currencies changes for a range of lookback periods and confidence intervals. Please note that VaR is expressed as a percentage of portfolio market value.

An exponentially weighted moving average methodology can be used to overcome the challenges associated with using a shorter lookback period. This approach emphasizes recent observations by using exponentially weighted moving averages of squared deviations. In contrast to equally weighted approaches, these approaches attach different weights to the past observations contained in the observation period. Because the weights decline exponentially, the most recent observations receive much more weight than earlier observations.
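
A minimal sketch of the exponential weighting follows. It uses the recursion variance_t = λ · variance_(t-1) + (1 - λ) · r_t², which assumes a zero mean return. The decay factor λ = 0.94 is the value commonly associated with daily RiskMetrics-style estimates and is an assumption here, as are the two synthetic return series.

```python
import math


def ewma_volatility(returns, lam=0.94):
    """Exponentially weighted volatility; `returns` is ordered oldest to newest."""
    variance = returns[0] ** 2  # seed the recursion with the first observation
    for r in returns[1:]:
        variance = lam * variance + (1.0 - lam) * r * r
    return math.sqrt(variance)


def equal_weight_volatility(returns):
    """Equally weighted volatility with the same zero-mean convention."""
    return math.sqrt(sum(r * r for r in returns) / len(returns))


# Same observations in a different order (hypothetical data).
calm_then_stress = [0.01] * 50 + [0.05] * 10
stress_then_calm = [0.05] * 10 + [0.01] * 50

print(ewma_volatility(calm_then_stress), ewma_volatility(stress_then_calm))
print(equal_weight_volatility(calm_then_stress),
      equal_weight_volatility(stress_then_calm))
```

The equally weighted estimate is identical for both series, while the exponentially weighted estimate is much higher when the stressed observations are the most recent ones.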

Figure 3 Daily VaR as % of Market Value calculated using various historical observation periods

VaR as a single number does not represent the distribution of P/L outcomes. In addition to computing VaR under various confidence intervals, we also compute expected shortfall, worst loss, and standard deviation of simulated P/L vectors. Worst loss and standard deviation are self-explanatory while the calculation of expected shortfall is described below.

Expected shortfall is the average of all the P/L figures to the left of the VaR figure. If we have 1,000 simulated P/L vectors ranked from best to worst, and the VaR is the 950th observation, the expected shortfall is the average of the P/Ls ranked 951 to 1,000.
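
The ranking convention in this example can be written out directly. The sketch below reproduces the 1,000-scenario case using a made-up, evenly spaced P/L vector; the function name and the returned dictionary keys are hypothetical.

```python
def tail_metrics(pnl, confidence=0.95):
    """VaR, expected shortfall and worst loss, all as positive loss amounts."""
    ranked = sorted(pnl, reverse=True)  # best to worst
    n = len(ranked)
    k = round(confidence * n)  # 1-based rank of the VaR observation
    if not 0 < k < n:
        raise ValueError("not enough scenarios for this confidence level")
    tail = ranked[k:]  # observations ranked k + 1 through n
    return {
        "var": -ranked[k - 1],
        "expected_shortfall": -sum(tail) / len(tail),
        "worst_loss": -ranked[-1],
    }


# Hypothetical P/L vector: 1,000 scenarios from -500 to +499.
pnl = [float(x) for x in range(-500, 500)]
metrics = tail_metrics(pnl, confidence=0.95)
print(metrics)
```

Here the VaR is the 950th-ranked P/L (a loss of 450) and the expected shortfall is the average of the 50 worse outcomes (a loss of 475.5), which is why expected shortfall is always at least as large as VaR.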

The table below presents VaR-related metrics as a percentage of portfolio market value under various lookback periods.

Figure 4 VaR for a portfolio of crypto assets computed for various lookback periods and confidence intervals

Bitcoin Futures: Basis and Proxies

One of the most popular trades for commodity futures is the basis trade. This is when traders build a strategy around the difference between the spot price and futures contract price of a commodity. This exists in corn, soybean, oil and of course Bitcoin. 

For the purpose of calculating VaR, specific contracts may not provide enough history and risk systems use continuous contracts. Continuous contracts introduce additional basis as seen in the chart below. Risk managers need to work with the front office to align risk factor selection with trading strategies, without compromising independence of the risk process.

Figure 5 BTC/Futures basis difference between generic and active contracts

Intraday Value at Risk (VaR)

The highly volatile nature of crypto currencies requires another consideration for VaR calculations. A typical risk process is run at the end of the day and VaR is calculated for a one-day or longer forecasting period. But crypto currencies, especially Bitcoin, can also show significant intraday price movements.

We obtained intraday prices for Bitcoin (BTC) from Gemini, which is ranked third by volume. This data was normalized to create time series to generate historical simulations. The chart below shows VaR as a percentage of market value for Bitcoin (BTC) for one-minute, one-hour and one-day forecasting periods. Our analysis shows that a Bitcoin position can lose as much as 3.5% of its value in one hour (99th percentile VaR).
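
One way to turn a normalized one-minute price series into VaR figures for several forecasting periods is sketched below. It is only an illustration: the prices are a synthetic random walk rather than exchange data, the windows are non-overlapping, and 90 days of history leaves very few daily observations for a 99th percentile estimate.

```python
import math
import random


def horizon_returns(prices, step):
    """Simple returns over non-overlapping windows of `step` observations."""
    sampled = prices[::step]
    return [later / earlier - 1.0 for earlier, later in zip(sampled, sampled[1:])]


def var_pct(returns, confidence=0.99):
    """Historical-simulation VaR as a positive fraction of market value."""
    ranked = sorted(returns)  # worst return first
    n_tail = math.floor((1.0 - confidence) * len(ranked) + 1e-9)
    return -ranked[n_tail]


# Synthetic one-minute prices for 90 days (hypothetical data).
rng = random.Random(1)
prices = [30_000.0]
for _ in range(60 * 24 * 90):
    prices.append(prices[-1] * math.exp(rng.gauss(0.0, 0.001)))

var_by_horizon = {}
for label, step in [("1 minute", 1), ("1 hour", 60), ("1 day", 60 * 24)]:
    var_by_horizon[label] = var_pct(horizon_returns(prices, step))
print(var_by_horizon)
```

Overlapping windows would give more observations at the longer horizons, at the cost of serially correlated scenarios.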

 

Risk-Based Limits 

Since its inception as a concept, Value at Risk has been used by companies to manage limits for a trading unit. VaR serves as a single risk-based limit metric with several advantages and a few challenges:

Pros of using VaR for risk-based limit:

  • VaR can be applied across all levels of portfolio aggregation.
  • Aggregations can be applied across varying exposures and strategies.
  • Today’s cloud scale makes it easy to calculate VaR using granular risk factor data.

VaR can be subject to model risk and manipulation. Transparency and use of market risk factors can avoid this pitfall.

Ability to calculate intra-day VaR is key for a risk-based limit implementation for crypto assets. Risk managers should consider at least an hourly VaR limit in addition to the traditional daily limits.

VaR Validation (Bayesian Approach)

Standard approaches for back-testing VaR are applicable to portfolios of crypto assets as well.

Given the volatile nature of this asset class, we also explored an approach to validating the confidence interval and percentiles implied from historical simulations. Although this is a topic that deserves its own document, we present a high-level explanation and results of our analysis.

Building on an approach first proposed in the Pyfolio library, we generated a posterior distribution using the PyMC3 package from our historically observed VaR simulations.

Sampling routines from PyMC3 were used to generate 10,000 simulations of the 3-year lookback case. A distribution of percentiles (VaR) was then computed across these simulations.

The distribution shows that the mean 95th percentile VaR would be 7.3% vs 8.9% calculated using the historical simulation approach. However, the tail of the distribution indicates a VaR closer to the historical simulation approach. One could conclude that the test indicates that the original calculation still represents the extreme case, which is the motivation behind VaR.

Figure 6 Distribution of percentiles generated from posterior simulations

Scenario Analysis

In addition to standard shock scenarios, we recommend using the volatility of Bitcoin to construct a simulation of outcomes. The chart below shows the change in Bitcoin (BTC) volatility for select events in the last two years. Outside of standard macro events, crypto assets respond to cyber security events and media effects, including social media.

Figure 7 Weekly observed volatility for Bitcoin  

Conclusion

Given the volatility of crypto assets, we recommend, to the extent possible, a probability distribution approach. At the very least, risk managers should monitor changes in relationship (beta) of assets.

For most financial institutions, crypto assets are part of portfolios that include other traditional asset classes. A standard approach must be used across all asset classes, which may make it challenging to apply shorter lookback windows for computing VaR. Use of the exponentially weighted moving average approach (described above) may be considered.

Intraday VaR for this asset class can be significant and risk managers should set appropriate limits to manage downside risk.

Idiosyncratic risks associated with this asset class have created a need for monitoring scenarios not necessarily applicable to other asset classes. For this reason, more scenarios pertaining to cyber risk are beginning to be applied across other asset classes.  


Related Article

Calculating VaR: A Review of Methods


RiskSpan Named to Inaugural STORM50 Ranking by Chartis Research – Winner of “A.I. Innovation in Capital Markets”

Chartis Research has named RiskSpan to its Inaugural “STORM50” Ranking of leading risk and analytics providers. The STORM report “focuses on the computational infrastructure and algorithmic efficiency of the vast array of technology tools used across the financial services industry” and identifies industry-leading vendors that excel in the delivery of Statistical Techniques, Optimization frameworks, and Risk Models of all types. 

RiskSpan’s flagship Edge Platform was a natural fit for the designation because of its positioning squarely at the nexus of statistical behavioral modeling (specifically around mortgage credit and prepayment risk) and functionality enabling users to optimize trading and asset management strategies.  Being named the winner of the “A.I. Innovation in Capital Markets” solutions category reflects the work of RiskSpan’s vibrant innovation lab, which includes researching and developing machine learning solutions to structured finance challenges. These solutions include mining a growing trove of alternative/unstructured data sources, anomaly detection in loan-level and other datasets, and natural language processing for constructing deal cash flow models from legal documents.

Learn more about the Edge Platform or contact us to discuss ways we might help you modernize and improve your mortgage and structured finance data and analytics challenges. 


Automating Compliance Risk Analytics

 Recorded: August 4th | 1:00 p.m. EDT

Completing the risk sections of Form PF, AIFMD, Open Protocol and other regulatory filings requires submitters to first compute an extensive battery of risk analytics, often across a wide spectrum of trading strategies and instrument types. This “pre-work” is both painstaking and prone to human error. Automating these upstream analytics greatly simplifies life downstream for those tasked with completing these filings.

RiskSpan’s Marty Kindler walks through a process for streamlining delta equivalent exposure, 10-year bond equivalent exposure, DV01/CS01, option greeks, stress scenario impacts and VaR in support not only of downstream regulatory filings but of an enhanced, overall risk management regime.


Featured Speaker

Martin Kindler

Managing Director, RiskSpan


Is Your Enterprise Risk Management Keeping Up with Recent Regulatory Changes?

Recorded: June 30th | 1:00 p.m. EDT

Nick Young, Head of RiskSpan’s Model Risk Management Practice, and his team of model validation analysts walk through the most important regulatory updates of the past 18 months from the Federal Reserve, OCC, and FDIC pertaining to enterprise risk management in general (and model risk management in particular).

Nick’s team presents tips for ensuring that your policies and practices are keeping up with recent changes to AML and other regulatory requirements.


Featured Speakers

Nick Young

Head of Model Risk Management, RiskSpan


Data & Machine Learning Workshop Series

RiskSpan’s Edge Platform is supported by a dynamic team of professionals who live and breathe mortgage and structured finance data. They know firsthand the challenges this type of data presents and are always experimenting with new approaches for extracting maximum value from it.

Join us for a complimentary series of virtual workshops where RiskSpan professionals share what we’re learning about applying machine learning and other innovative techniques to data that asset managers, broker-dealers and mortgage bankers care about.

Catch up with these previously recorded workshops


Measuring and Visualizing Feature Impact & Machine Learning Model Materiality

RiskSpan CIO Suhrud Dagli demonstrates in greater detail how machine learning can be used in input data validations, to measure feature impact, and to visualize how multiple features interact with each other.

Structured Data Extraction from Images Using Google Document AI

RiskSpan Director Steven Sun shares a procedural approach to tackling the difficulties of efficiently extracting structured data from images, scanned documents, and handwritten documents using Google’s latest Document AI Solution.

Pattern Recognition in Time Series Data

Traders and investors rely on time series patterns generated by asset performance to inform and guide their trading and asset allocation decisions. Economists take advantage of analogous patterns in macroeconomic and market data to forecast recessions and other market events. But you need to be able to spot these patterns in order to use them.

Advanced Forecasting Using Hierarchical Models

Traditional statistical models apply a single set of coefficients by pooling a large dataset or for specific cohorts. Hierarchical models learn from feature behavior across dimensions or timeframes. This informative workshop applies hierarchical models to a variety of mortgage and structured finance use cases.

Quality Control with Anomaly Detection (Part I)

Outliers and anomalies refer to various types of occurrences in a time series. A spike in value, a shift in level or volatility, or a change in seasonal pattern are common examples. RiskSpan Co-Founder & CIO Suhrud Dagli is joined by Martin Kindler, a market risk practitioner who has spent decades dealing with outliers.

Quality Control with Anomaly Detection (Part 2)

Suhrud Dagli presents Part 2 of this workshop, which dives into mortgage loan QC and introduces coding examples and approaches for avoiding false negatives using open-source Python algorithms in the Anomaly Detection Toolkit (ADTK).


Mortgage DQs by MSA: Non-Agency Performance Chart of the Month

This month we take a closer look at geographical differences in loan performance in the non-agency space. The chart below looks at the 60+ DPD Rate for the 5 Best and Worst performing MSAs (and the overall average). A couple of things to note:

  • The pandemic seems to have simply amplified performance differences that were already apparent pre-covid. The worst performing MSAs were showing mostly above-average delinquency rates before last year’s disruption.
  • Florida was especially hard-hit. Three of the five worst-performing MSAs are in Florida. Not surprisingly, these MSAs rely heavily on the tourism industry.
  • New York jumped from being about average to being one of the worst-performing MSAs in the wake of the pandemic. This is not surprising considering how seriously the city bore the pandemic’s brunt.
  • Tech hubs show strong performance. All our best performers are strong in the Tech industry—Austin’s the new Bay Area, right?

May 19 Workshop: Quality Control Using Anomaly Detection (Part 2)

Recorded: May 19 | 1:00 p.m. ET

Last month, RiskSpan’s Suhrud Dagli and Martin Kindler outlined the principles underlying anomaly detection and its QC applications related to market data and market risk. You can view a recording of that workshop here.

On Wednesday, May 19th, Suhrud presented Part 2 of this workshop, which dove into mortgage loan QC and introduced coding examples and approaches for avoiding false negatives using open-source Python algorithms in the Anomaly Detection Toolkit (ADTK).

RiskSpan presents various types of detectors, including extreme studentized deviate (ESD), level shift, local outliers, seasonal detectors, and volatility shift in the context of identifying spike anomalies and other inconsistencies in mortgage data. Specifically:

  • Coding examples for effective principal component analysis (PCA) loan data QC
  • Use cases around loan performance and entity correction, and
  • Novelty detection

Suhrud Dagli

Co-founder and CIO, RiskSpan

Martin Kindler

Managing Director, RiskSpan



Anomaly Detection and Quality Control

In our most recent workshop on Anomaly Detection and Quality Control (Part I), we discussed how clean market data is an integral part of producing accurate market risk results. As incorrect and inconsistent market data is so prevalent in the industry, it is not surprising that the U.S. spends over $3 trillion on processes to identify and correct market data.

In taking a step back, it is worth noting what drives accurate market risk analytics. Clearly, having accurate portfolio holdings with correct terms and conditions for over-the-counter trades is central to calculating consistent risk measures that are scaled to the market value of the portfolio. The use of well-tested and integrated industry-standard pricing models is another key factor in producing reliable analytics. Compared with these two factors, a lack of clean and consistent market data is the largest contributor to poor market risk analytics. The key driver behind detecting and correcting or transforming market data is the expectation of risk and portfolio managers that risk results are accurate at the start of the business day, with no need for time-consuming re-runs during the day to correct issues found.

Broadly, market data is any data that is used as input to the re-valuation models. This includes equity prices, interest rates, credit spreads, FX rates, volatility surfaces, etc.

Market data needs to be:

  • Complete – no true gaps when looking back historically.
  • Accurate
  • Consistent – data must be viewed across other data points to determine its accuracy (e.g., interest rates across tenor buckets, volatilities across volatility surface)

Anomaly types can be broken down into four major categories:

  • Spikes
  • Stale data
  • Missing data
  • Inconsistencies

Here are three examples of “bad” market data:

Credit Spreads

The following chart depicts day-over-day changes in credit spreads for the 10-year consumer cyclical time series, returned from an external vendor. The changes indicate a significant spike on 12/3 that caused big swings, up and down, across multiple rating buckets. Without an adjustment to this data, key risk measures would show significant jumps, up and down, depending on the dollar value of positions on two consecutive days.

Swaption Volatilities

Market data also includes volatilities, which drive delta and possible hedging. The following chart shows implied swaption volatilities for different maturities of swaptions and their underlying swaps. Note the spikes in 7×10 and 10×10 swaptions. The chart also highlights inconsistencies between different tenors and maturities.

Equity Implied Volatilities

The 146 and 148 strikes in the table below reflect inconsistent vol data, as often occurs around expiration.

The detection of market data inconsistencies needs to be an automated process with multiple approaches targeted for specific types of market data. The detection models need to evolve over time as added information is gathered with the goal of reducing false negatives to a manageable level. Once the models detect the anomalies, the next step is to automate the transformation of the market data (e.g., backfill, interpolate, use prior day value). Together with the transformation, a record must be kept of which values were changed, or populated where unavailable. This record should be shared with clients, whose feedback could lead to alternative transformations or model detection routines.

Detector types typically fall into the following categories:

  • Extreme Studentized Deviate (ESD): finds outliers in a single data series (helpful for extreme cases).
  • Level Shift: detects change in level by comparing means of two sliding time windows (useful for local outliers).
  • Local Outliers: detects spikes in near values.
  • Seasonal Detector: detects seasonal patterns and anomalies (used for contract expirations and other events).
  • Volatility Shift: detects shift of volatility by tracking changes in standard deviation.
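
The level shift and volatility shift detectors share one idea: compare an aggregate over two adjacent sliding windows and flag the points where it changes sharply. The sketch below illustrates that idea with the Python standard library. It is not ADTK's implementation, and the window sizes, fixed thresholds and test series are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def window_shift_points(series, window, aggregate, min_change):
    """Indices where `aggregate` differs by more than `min_change`
    between the `window` points before and the `window` points from index i."""
    hits = []
    for i in range(window, len(series) - window + 1):
        left = aggregate(series[i - window:i])
        right = aggregate(series[i:i + window])
        if abs(right - left) > min_change:
            hits.append(i)
    return hits


# Level shift: the mean jumps from 1 to 5 at index 10 (hypothetical data).
levels = [1.0] * 10 + [5.0] * 10
level_hits = window_shift_points(levels, window=3, aggregate=mean, min_change=3.0)

# Volatility shift: the swings grow tenfold at index 10 (hypothetical data).
swings = [0.1, -0.1] * 5 + [1.0, -1.0] * 5
vol_hits = window_shift_points(swings, window=4, aggregate=pstdev, min_change=0.8)

print(level_hits, vol_hits)
```

A fixed threshold is the simplest possible rule. A production detector would more likely derive the threshold from the history of the series itself, for example from the interquartile range of past window differences.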

 

On Wednesday, May 19th, we will present a follow-up workshop focusing on:

  • Coding examples
    • Application of outlier detection and pipelines
    • PCA
  • Specific loan use cases
    • Loan performance
    • Entity correction
  • Novelty Detection
    • Anomalies are not always “bad”
    • Market monitoring models

You can register for this complimentary workshop here.


Leveraging ML to Enhance the Model Calibration Process

Last month, we outlined an approach to continuous model monitoring and discussed how practitioners can leverage the results of that monitoring for advanced analytics and enhanced end-user reporting. In this post, we apply this idea to enhanced model calibration.

Continuous model monitoring is a key part of a modern model governance regime. But testing performance as part of the continuous monitoring process has value that extends beyond immediate governance needs. Using machine learning and other advanced analytics, testing results can also be further explored to gain a deeper understanding of model error lurking within sub-spaces of the population.

Below we describe how we leverage automated model back-testing results (using our machine learning platform, Edge Studio) to streamline the calibration process for our own residential mortgage prepayment model.

The Problem:

MBS prepayment models, RiskSpan’s included, often provide a number of tuning knobs to tweak model results. These knobs impact the various components of the S-curve function, including refi sensitivity, turnover lever, elbow shift, and burnout factor.

The knob tuning and calibration process is typically messy and iterative. It usually involves somewhat-subjectively selecting certain sub-populations to calibrate, running back-testing to see where and how the model is off, and then tweaking knobs and rerunning the back-test to see the impacts. The modeler may need to iterate through a series of different knob selections and groupings to figure out which combination best fits the data. This is manually intensive work and can take a lot of time.

As part of our continuous model monitoring process, we had already automated the process of generating back-test results and merging them with actual performance history. But we wanted to explore ways of taking this one step further to help automate the tuning process — rerunning the automated back-testing using all the various permutations of potential knobs, but without all the manual labor.

The solution applies machine learning techniques to run a series of back-tests on MBS pools and automatically solve for the set of tuners that best aligns model outputs with actual results.

We break the problem into two parts, followed by a validation step:

  1. Find Cohorts: Cluster pools into groups that exhibit similar key pool characteristics and model error (so they would need the same tuners).

TRAINING DATA: Back-testing results for our universe of pools with no model tuning knobs applied

  2. Solve for Tuners: Minimize back-testing error by optimizing knob settings.

TRAINING DATA: Back-testing results for our universe of pools under a variety of permutations of potential tuning knobs (Refi x Turnover)

  3. Tuning knobs validation: Take optimized tuning knobs for each cluster and rerun pools to confirm that the selected permutation in fact returns the lowest model errors.

Part 1: Find Cohorts

We define model error as the ratio of the average modeled SMM to the average actual SMM. We compute this using back-testing results and then use a hierarchical clustering algorithm to cluster the data based on model error across various key pool characteristics.
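
As a small worked example of this definition, the sketch below computes the error ratio for a single pool from made-up monthly SMM values. A ratio above 1 means the model is projecting faster prepayments than were observed. The function name is hypothetical.

```python
def model_error_ratio(modeled_smm, actual_smm):
    """Average modeled SMM divided by average actual SMM for one pool."""
    avg_modeled = sum(modeled_smm) / len(modeled_smm)
    avg_actual = sum(actual_smm) / len(actual_smm)
    if avg_actual == 0:
        raise ValueError("actual SMM averages to zero; ratio is undefined")
    return avg_modeled / avg_actual


# Hypothetical back-test results for one pool over three months.
modeled = [0.010, 0.012, 0.011]
actual = [0.008, 0.010, 0.012]
ratio = model_error_ratio(modeled, actual)
print(ratio)
```

In this example the model overstates prepayments by about 10% on average (a ratio of roughly 1.1).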

Hierarchical clustering is a general family of clustering algorithms that build nested clusters by either merging or splitting observations successively. The hierarchy of clusters is represented as a tree (or dendrogram). The root of the tree is the root cluster that contains all samples, while the leaves represent clusters with only one sample. [1]

Agglomerative clustering is an implementation of hierarchical clustering that takes the bottom-up (merging) approach. Each observation starts in its own cluster, and clusters are then successively merged together. Several linkage criteria are available; we used the Ward linkage criterion.

Ward linkage strategy minimizes the sum of squared differences within all clusters. It is a variance-minimizing approach.[2]
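
The bottom-up merging with Ward linkage can be illustrated with a deliberately naive implementation. At each step it merges the pair of clusters whose union adds the least within-cluster sum of squares, which for clusters A and B equals |A||B| / (|A| + |B|) times the squared distance between their centroids. The pool features below are hypothetical, and for real workloads a library routine such as scikit-learn's AgglomerativeClustering with linkage="ward" would be the more practical choice.

```python
def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    dims = len(cluster[0])
    return [sum(point[d] for point in cluster) / len(cluster) for d in range(dims)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    dist_sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist_sq


def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering; fine for small illustrative inputs only."""
    clusters = [[p] for p in points]  # each observation starts alone
    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical pools: (model error ratio, a scaled pool characteristic).
pools = [(1.0, 0.1), (1.1, 0.1), (0.9, 0.2), (2.0, 0.8), (2.1, 0.9)]
clusters = ward_clusters(pools, n_clusters=2)
print(clusters)
```

Features should be put on comparable scales before clustering, since Ward linkage works on squared Euclidean distances and a feature with a large range would otherwise dominate the merges.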

Part 2: Solving for Tuners

Here our training data is expanded to include multiple back-test results for each pool under different permutations of tuning knobs.

Process to Optimize the Tuners for Each Cluster

Training Data: Rerun the back-test with permutations of REFI and TURNOVER tunings, covering all reasonably possible combinations of tuners.

  1. These permutations of tuning results are fed to a multi-output regressor, which trains the machine learning model to understand the interaction between each tuning parameter and the model as a fitting step.
    • Model Error and Pool Features are used as Independent Variables
    • Gradient Tree Boosting/Gradient Boosted Decision Trees (GBDT)* methods are used to find the optimized tuning parameters for each cluster of pools derived from the clustering step
    • Two dependent variables — Refi Tuner and Turnover Tuner – are used
    • Separate models are estimated for each cluster
  2. We solve for the optimal tuning parameters by running the resulting model with a model error ratio of 1 (no error) and the weighted average cluster features.

* Gradient Tree Boosting/Gradient Boosted Decision Trees (GBDT) is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. When a decision tree is a weak learner, the resulting algorithm is called gradient boosted trees, which usually outperforms random forest. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of arbitrary differentiable loss function. [3]

*We used scikit-learn’s GBDT implementation to optimize and solve for best Refi and Turnover tuner. [4]

Results

The resultant suggested knobs show promise in improving model fit over our back-test period. Below are the results for two of the clusters using the knobs suggested by the process. To further expand the results, we plan to cross-validate on out-of-time sample data as it comes in.

Conclusion

These advanced analytics show promise in their ability to help streamline the model calibration and tuning process by removing many of its time-consuming and subjective components altogether. Once a process like this is established for one model, applying it to new populations and time periods becomes more straightforward. This analysis can be further extended in a number of ways. One in particular we’re excited about is the use of ensemble models, or a ‘model of models’ approach. We will continue to tinker with this approach as we calibrate our own models and keep you apprised of what we learn.


Three Principles for Effectively Monitoring Machine Learning Models

The recent proliferation in machine learning models in banking and structured finance is becoming impossible to ignore. Rarely does a week pass without a client approaching us to discuss the development or validation (or both) of a model that leverages at least one machine learning technique. RiskSpan’s own model development team has also been swept up in the trend – deep learning techniques have featured prominently in developing the past several versions of our in-house residential mortgage prepayment model.  

Machine learning’s rise in popularity is attributable to multiple underlying trends: 

  1. Quantity and complexity of data. Nowadays, firms store every conceivable type of data relating to their activities and clients – and frequently supplement this with data from any number of third-party providers. The increasing dimensionality of data available to modelers makes traditional statistical variable selection more difficult. The tradeoff between a model’s complexity and the rules adopted in variable selection can be hard to balance. An advantage of ML approaches is that they can handle multi-dimensional data more efficiently. ML frameworks are good at identifying trends and patterns – without the need for human intervention.
  2. Better learning algorithms. Because ML algorithms learn to make more accurate projections as new data is introduced to the framework (assuming there is no data bias in the new data), model features based on newly introduced data are more likely to resemble features created using model training data.
  3. Cheap computation costs. New techniques, such as XGBoost, are designed to be memory efficient and introduce innovative system designs that help reduce computation cost.
  4. Proliferation breeds proliferation. As the number of machine learning packages in various programming tools increases, it facilitates implementation and promotes further ML model development. 

Addressing Monitoring Challenges 

Notwithstanding these advances, machine learning models are by no means easy to build and maintain. Feature engineering and parameter tuning procedures are time-consuming. And once an ML model has been put into production, monitoring activities must be implemented to detect anomalies and make sure the model works as expected (just like with any other model). According to the OCC 2011-12 supervisory guidance on model risk management, ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid. While monitoring ML models resembles monitoring conventional statistical models in many respects, the following activities take on particular importance with ML model monitoring:

  1. Review of the underlying business problem. Defining the business problem is the first step in developing any ML model. This should be carefully articulated in the list of business requirements that the ML model is supposed to follow. Any shift in the underlying business problem will likely create drift in the training data and, as a result, new data coming to the model may no longer be relevant to the original business problem. The ML model then degrades, and a new round of feature engineering and parameter tuning needs to be considered to remediate the impact. This review should be conducted whenever the underlying problem or requirements change.
  2. Review of data stability (model input). In the real world, even if the underlying business problem is unchanged, there might be shifts in the predicting data caused by changing borrower behaviors, changes in product offerings, or any other unexpected market drift. Any of these things could result in the ML model receiving data that it has not been trained on. Model developers should measure the data population stability between the training dataset and the predicting dataset. If there is evidence of the data having shifted, model recalibration should be considered. This assessment should be done when the model user identifies a significant shift in the model’s performance or when a new testing dataset is introduced to the ML model. Where data segmentation has been used in the model development process, this assessment should be performed at the individual segment level as well.
  3. Review of performance metrics (model output). Performance metrics quantify how well an ML model is trained to explain the data. Performance metrics should fit the model’s type. For instance, the developer of a binary classification model could use a Kolmogorov-Smirnov (KS) table, a receiver operating characteristic (ROC) curve, and the area under the curve (AUC) to measure the model’s overall rank-ordering ability and its performance at different cutoffs. Any shift (upward or downward) in performance metrics between a new dataset and the training dataset should raise a flag in monitoring activity. All material shifts need to be reviewed by the model developer to determine their cause. Such assessments should be conducted on an annual basis or whenever new data is available.
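The input and output reviews described above lend themselves to simple automated checks. The sketch below is illustrative only: it uses the population stability index (PSI) as one common way to measure data population stability (the article does not prescribe a specific measure), computes KS as the maximum gap between the true and false positive rates, and runs on simulated data. The helper names and the simulated shift are assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a training (expected) sample and a new (actual) sample."""
    # Interior bin edges are quantiles of the training sample.
    inner_edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1]
    exp_idx = np.searchsorted(inner_edges, expected, side="right")
    act_idx = np.searchsorted(inner_edges, actual, side="right")
    exp_pct = np.bincount(exp_idx, minlength=n_bins) / len(expected)
    act_pct = np.bincount(act_idx, minlength=n_bins) / len(actual)
    # Floor the shares so empty bins do not produce log(0).
    exp_pct = np.clip(exp_pct, eps, None)
    act_pct = np.clip(act_pct, eps, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))


def ks_and_auc(y_true, scores):
    """KS statistic (max gap between TPR and FPR) and AUC for a binary model."""
    fpr, tpr, _ = roc_curve(y_true, scores, drop_intermediate=False)
    return float(np.max(tpr - fpr)), float(roc_auc_score(y_true, scores))


def simulate(rng, n, shift, slope):
    """Made-up feature, outcome, and model score, for illustration only."""
    x = rng.normal(loc=shift, size=n)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-slope * x)))
    score = 1.0 / (1.0 + np.exp(-2.0 * x))  # the "model" assumes slope 2
    return x, y, score


rng = np.random.default_rng(42)
# Training population, then a new population whose inputs and behavior drifted.
x_train, y_train, s_train = simulate(rng, 5000, shift=0.0, slope=2.0)
x_new, y_new, s_new = simulate(rng, 5000, shift=0.7, slope=0.8)

psi = population_stability_index(x_train, x_new)
ks_train, auc_train = ks_and_auc(y_train, s_train)
ks_new, auc_new = ks_and_auc(y_new, s_new)

print(f"PSI (model input): {psi:.3f}")
print(f"KS  train/new:     {ks_train:.3f} / {ks_new:.3f}")
print(f"AUC train/new:     {auc_train:.3f} / {auc_new:.3f}")
```

A PSI above roughly 0.25 is a commonly cited rule of thumb for a material shift, but any threshold should be set by the model owner. Where the model was developed on segments, the same functions can be applied segment by segment.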

Like all models, ML models are only as good as the data they are fed. But ML models are particularly susceptible to data shifts because their processing components are less transparent. Taking these steps to ensure they are learning from valid and consistent data is essential to managing a functional inventory of ML models.

