Articles Tagged with: Model Validation

Automating Compliance Risk Analytics

Recorded: August 4th | 1:00 p.m. EDT

Completing the risk sections of Form PF, AIFMD, Open Protocol and other regulatory filings requires submitters to first compute an extensive battery of risk analytics, often across a wide spectrum of trading strategies and instrument types. This “pre-work” is both painstaking and prone to human error. Automating these upstream analytics greatly simplifies life downstream for those tasked with completing these filings.

RiskSpan’s Marty Kindler walks through a process for streamlining delta equivalent exposure, 10 year bond equivalent exposure, DV01/CS01, option greeks, stress scenario impacts and VaR in support not only of downstream regulatory filings but of an enhanced, overall risk management regime.

Featured Speaker

Martin Kindler

Managing Director, RiskSpan

Is Your Enterprise Risk Management Keeping Up with Recent Regulatory Changes?

Recorded: June 30th | 1:00 p.m. EDT

Nick Young, Head of RiskSpan’s Model Risk Management Practice, and his team of model validation analysts walk through the most important regulatory updates of the past 18 months from the Federal Reserve, OCC, and FDIC pertaining to enterprise risk management in general (and model risk management in particular).

Nick’s team presents tips for ensuring that your policies and practices are keeping up with recent changes to AML and other regulatory requirements.

Featured Speakers

Nick Young

Head of Model Risk Management, RiskSpan

Three Principles for Effectively Monitoring Machine Learning Models

The recent proliferation of machine learning models in banking and structured finance is becoming impossible to ignore. Rarely does a week pass without a client approaching us to discuss the development or validation (or both) of a model that leverages at least one machine learning technique. RiskSpan’s own model development team has also been swept up in the trend – deep learning techniques have featured prominently in developing the past several versions of our in-house residential mortgage prepayment model.

Machine learning’s rise in popularity is attributable to multiple underlying trends: 

  1. Quantity and complexity of data. Nowadays, firms store every conceivable type of data relating to their activities and clients – and frequently supplement this with data from any number of third-party providers. The increasing dimensionality of data available to modelers makes traditional statistical variable selection more difficult. The tradeoff between a model’s complexity and the rules adopted in variable selection can be hard to balance. An advantage of ML approaches is that they can handle multi-dimensional data more efficiently. ML frameworks are good at identifying trends and patterns – without the need for human intervention.
  2. Better learning algorithms. Because ML algorithms learn to make more accurate projections as new data is introduced to the framework (assuming there is no data bias in the new data), model features based on newly introduced data are more likely to resemble features created using model training data.
  3. Cheap computation costs. New techniques, such as XGBoost, are designed to be memory efficient. XGBoost introduces an innovative system design that helps reduce computation cost.
  4. Proliferation breeds proliferation. The growing number of machine learning packages available across programming tools facilitates implementation and promotes further ML model development.

Addressing Monitoring Challenges 

Notwithstanding these advances, machine learning models are by no means easy to build and maintain. Feature engineering and parameter tuning procedures are time consuming. And once an ML model has been put into production, monitoring activities must be implemented to detect anomalies and make sure the model works as expected (just as with any other model). According to the OCC’s 2011-12 supervisory guidance on model risk management, ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid. While monitoring ML models resembles monitoring conventional statistical models in many respects, the following activities take on particular importance with ML model monitoring:

  1. Review the underlying business problem. Defining the business problem is the first step in developing any ML model. This should be carefully articulated in the list of business requirements that the ML model is supposed to follow. Any shift in the underlying business problem will likely create drift in the training data and, as a result, new data coming to the model may no longer be relevant to the original business problem. The ML model then degrades, and a new round of feature engineering and parameter tuning needs to be considered to remediate the impact. This review should be conducted whenever the underlying problem or requirements change.
  2. Review of data stability (model input). In the real world, even if the underlying business problem is unchanged, there might be shifts in the predicting data caused by changing borrower behaviors, changes in product offerings, or any other unexpected market drift. Any of these things could result in the ML model receiving data that it has not been trained on. Model developers should measure the data population stability between the training dataset and the predicting dataset. If there is evidence of the data having shifted, model recalibration should be considered. This assessment should be done when the model user identifies a significant shift in the model’s performance or when a new testing dataset is introduced to the ML model. Where data segmentation has been used in the model development process, this assessment should be performed at the individual segment level as well.
  3. Review of performance metrics (model output). Performance metrics quantify how well an ML model is trained to explain the data. Performance metrics should fit the model’s type. For instance, the developer of a binary classification model could use a Kolmogorov-Smirnov (KS) table, a receiver operating characteristic (ROC) curve, and the area under the curve (AUC) to measure the model’s overall rank order ability and its performance at different cutoffs. Any shift (upward or downward) in performance metrics between a new dataset and the training dataset should raise a flag in monitoring activity. All material shifts need to be reviewed by the model developer to determine their cause. Such assessments should be conducted on an annual basis or whenever new data is available.
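
One common way to measure the data population stability described in item 2 is the population stability index (PSI). The sketch below is illustrative only and is not taken from the article: the function name, the use of training-sample quantiles as bin edges, and the small floor applied to empty bins are all assumptions.

```python
import math

def population_stability_index(expected, actual, n_bins=10):
    """PSI between a training ('expected') and a predicting ('actual') sample."""
    # Assumption: bin edges are quantiles of the training sample.
    ordered = sorted(expected)
    edges = [ordered[int(len(ordered) * i / n_bins)] for i in range(1, n_bins)]

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = sum(1 for e in edges if v >= e)  # index of the bin v falls in
            counts[idx] += 1
        # Assumption: floor each share so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    exp_share = bin_shares(expected)
    act_share = bin_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_share, act_share))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the threshold should be set by the model owner. Where the model uses segmentation, the same function can be run per segment.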

Like all models, ML models are only as good as the data they are fed. But ML models are particularly susceptible to data shifts because their processing components are less transparent. Taking these steps to ensure they are learning based on valid and consistent data is essential to managing a functional inventory of ML models.
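
For the rank-order metrics named in item 3 above, a minimal sketch of the KS statistic and AUC for a binary classifier might look like the following. The function name and input layout (a list of scores and a parallel list of 0/1 outcomes) are assumptions made for illustration; production monitoring would more likely rely on an established statistics library.

```python
def ks_and_auc(scores, labels):
    """Return (KS statistic, AUC) for binary labels (1 = event, 0 = non-event)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]

    # KS: largest gap between the empirical score CDFs of events and non-events.
    ks = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= t) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= t) / len(neg)
        ks = max(ks, abs(cdf_pos - cdf_neg))

    # AUC: probability that a random event outscores a random non-event
    # (ties count as one half).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return ks, auc
```

Computing both metrics on the training dataset and on each new dataset, then comparing the two, gives the shift that the monitoring process is meant to flag.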

Value Beyond Validation: The Future of Automated Continuous Model Monitoring Has Arrived

Imagine the peace of mind that would accompany being able to hand an existing model over to the validators with complete confidence in how the outcomes analysis will turn out. Now imagine being able to do this using a fully automated process.

The industry is closer to this than you might think.

The evolution of ongoing model monitoring away from something that happens only periodically (or, worse, only at validation time) and toward a more continuous process has been underway for some time. Now, thanks to automation and advanced process design, this evolutionary process has reached an inflection point. We stand today at the threshold of a future where:

  • Manual, painful processes to generate testing results for validation are a thing of the past;
  • Models are continuously monitored for fit, and end users are empowered with the tools to fully grasp model strengths and weaknesses;
  • Modeling and MRM experts leverage machine learning to dive more deeply into the model’s underlying data; and
  • Emerging trends and issues are identified early enough to be addressed before they have time to significantly hamper model performance.

Sound too good to be true? Beginning with its own internally developed prepayment and credit models, RiskSpan data scientists are laying out a framework for automated, ongoing performance monitoring that has the potential to transform behavioral modeling (and model validation) across the industry.

The framework involves model owners working collaboratively with model validators to create recurring processes for running previously agreed-upon tests continuously and receiving the results automatically. Testing outcomes continuously increases confidence in their reliability. Testing them automatically frees up high-cost modeling and validation resources to spend more time evaluating results and running additional, deeper analyses.

The Process:

Irrespective of the regulator, back-testing, benchmarking, and sensitivity analysis are the three pillars of model outcomes analysis. Automating the data and analytical processes that underlie these three elements is required to get to a fully comprehensive automated ongoing monitoring scheme.

In order to be useful, the process must stage testing results in a central database that can:

  • Automatically generate charts, tables, and statistical tests to populate validation reports;
  • Support dashboard reporting that allows model owners, users and validators to explore test results; and
  • Feed advanced analytics and machine learning platforms capable of 1) helping with automated model calibration, and 2) identifying model weaknesses and blind spots (as we did with a GSE here).

Perhaps not surprisingly, achieving the back-end economies of a fully automated continuous monitoring and reporting regime requires an upfront investment of resources. This investment takes the form of time from model developers and owners as well as (potentially) some capital investment in technology necessary to host and manage the storage of results and output reports.

A good rule of thumb for estimating these upfront costs is between 2 and 3 times the cost of a single annual model test performed on an ad-hoc, manual basis. Consequently, the automation process can generally be expected to pay for itself (in time savings alone) over 2 to 3 cycles of performance testing. But the benefits of automated, continuous model monitoring go far beyond time savings. They invariably result in better models.

Output Applications

Continuous model monitoring produces benefits that extend well beyond satisfying model governance requirements. Indeed, automated monitoring has significantly informed the development process for RiskSpan’s own, internally developed credit and prepayment models – specifically in helping to identify sub-populations where model fit is a problem.

Continuous monitoring also makes it possible to quickly assess the value of newly available data elements. For example, when the GSEs start releasing data on mortgages with property inspection waivers (PIWs) (as opposed to traditional appraisals) we can immediately combine that data element with the results of our automated back-testing to determine whether the PIW information can help predict model error from those results. PIW currently appears to have value in predicting our production model error, and so the PIW feature is now slated to be added to a future version of our model. Having an automated framework in place accelerates this process while also enabling us to proceed with confidence that we are only adding variables that improve model performance.

The continuous monitoring results can also be used to develop helpful dashboard reports. These provide model owners and users with deeper insights into a model’s strengths and weaknesses and can be an important tool in model tuning. They can also be shared with model validators, thus facilitating that process as well.

The dashboard below is designed to give our model developers and users a better sense of where model error is greatest. Sub-populations with the highest model error are deep red. This makes it easy for model developers to visualize that the model does not perform well when FICO and LTV data are missing, which happens often in the non-agency space. The model developers now know that they need to adjust their modeling approach when these key data elements are not available.

The dashboard also makes it easy to spot performance disparities by shelf, for example, and can be used as the basis for applying prepayment multipliers to certain shelves in order to align results with actual experience.
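
The aggregation behind an error dashboard of this kind can be sketched as a simple group-by over back-testing records. Everything here is hypothetical: the record fields, the FICO and LTV bucket cutoffs, and the choice of mean absolute error are assumptions for illustration, not a description of RiskSpan's implementation.

```python
from collections import defaultdict

def error_by_segment(records):
    """Mean absolute error (actual vs. projected) for each (FICO, LTV) bucket."""
    def fico_bucket(fico):
        if fico is None:
            return "missing"
        return "<680" if fico < 680 else "680-739" if fico < 740 else "740+"

    def ltv_bucket(ltv):
        if ltv is None:
            return "missing"
        return "<=80" if ltv <= 80 else ">80"

    errors = defaultdict(list)
    for r in records:
        key = (fico_bucket(r["fico"]), ltv_bucket(r["ltv"]))
        errors[key].append(abs(r["actual"] - r["projected"]))
    # One cell per sub-population; larger values would be shaded deeper red.
    return {k: sum(v) / len(v) for k, v in errors.items()}
```

Keeping "missing" as its own bucket, rather than dropping those records, is what makes a weakness such as poor fit on loans without FICO or LTV visible.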

Continuous model monitoring is fast becoming a regulatory expectation and an increasingly vital component of model governance. But the benefits of continuous performance monitoring go far beyond satisfying auditors and regulators. Machine learning and other advanced analytics are also proving to be invaluable tools for better understanding model error within sub-spaces of the population.

Watch this space for a forthcoming post and webinar explaining how RiskSpan leverages its automated model back-testing results and machine learning platform, Edge Studio, to streamline the calibration process for its internally developed residential mortgage prepayment model.

Validating Structured Finance Models

Introduction: Structured Finance Models

Models used to govern the valuation and risk management of structured finance instruments take a variety of forms. Unlike conventional equity investments, structured finance instruments are often customized to meet the unique needs of specific investors. They are tailored to mitigate various types of risks, including interest rate risk, credit risk, market risk and counterparty risks. Therefore, structured finance instruments may be derived from a compilation of loans, stocks, indices, or derivatives. Mortgage-backed securities (MBS) are the most ubiquitous example of this, but structured finance instruments also include:

  • Derivatives
  • Collateralized Mortgage Obligations (CMO)
  • Collateralized Bond Obligations (CBO)
  • Collateralized Debt Obligations (CDO)
  • Credit Default Swaps (CDS)
  • Hybrid Securities

Pricing and measuring the risk of these instruments is typically carried out using an integrated web of models. One set of models might be used to derive a price based on discounted cash flows. Once cash flows and corresponding discounting factors have been established, other models might be used to compute risk metrics (duration and convexity) and financial metrics (NII, etc.).

These models can be grouped into three major categories:

  • Curve Builder and Rate Models: Market rates are fundamental to valuing most structured finance instruments. Curve builders calibrate market curves (Treasury yield curve, LIBOR/swap rate curve, or SOFR curve) using the market prices of the underlying bonds, futures, or swaps. Interest rate models take the market curve as an input and generate simulated rate paths representing the future evolution of the selected market curve.

  • Projection Models: Using the market curve (or the single simulated rate path), a current coupon projection model projects forward 30-year and 15-year fixed mortgage rates. Macroeconomic models project future home values using a housing-price index (HPI). Prepayment models estimate how quickly loans are likely to pay down based on mortgage rate projections and other macroeconomic projections. And roll-rate models forecast the probability of a loan’s transitioning from one current/default state to another.

  • Cash Flow Models and Risk Metrics: Cash flow models combine the deal information of the underlying structured instrument with related rate projections to derive an interest-rate-path-dependent cash flow.

The following illustrates how the standard discounted cash flow approach works for a mortgage-related structured finance instrument:

Most well-known analytic solutions apply this discounted cash flow approach, or some adaptation of it, in analyzing structured finance instruments.
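
In general terms, and under assumptions that are ours rather than the article's (monthly cash flows, N simulated rate paths, monthly compounding, and a constant spread added to each path's discount rate), the discounted cash flow price can be written as:

```latex
P \;=\; \frac{1}{N} \sum_{p=1}^{N} \sum_{t=1}^{T}
\frac{CF_t^{(p)}}{\prod_{s=1}^{t} \left( 1 + \dfrac{r_s^{(p)} + \mathrm{OAS}}{12} \right)}
```

Here CF_t^{(p)} is the cash flow in month t on path p (produced by the projection and cash flow models), r_s^{(p)} is the annualized simulated rate for month s on that path (produced by the rate model), and OAS is the constant spread. Setting N = 1 and using the market curve in place of a simulated path gives the single-path version.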

Derivatives introduce an additional layer of complexity that often calls for approaches and models beyond the standard discounted cash flow approach. Swaptions and interest rate caps and floors, for example, require a deterministic approach, such as the Black model. For bond option pricing, lattice models or tree structures are commonly used. The specifics of these models are beyond the scope of this presentation, but many of the general model validation principles applied to discounted cash flow models are equally applicable to derivative pricing models.

Validating Curve Builder and Rate Models

Curve Builders

Let’s begin with the example of a curve builder designed for calibrating the on-the-run U.S. Treasury yield curve. To do this, the model takes a list of eligible on-the-run Treasury bonds as the key model inputs, which serve as the fitting knots[1]. A proper interpolator that connects all the fitting knots is then used to smooth the curve and generate monthly or quarterly rates for all maturities up to 30 years. If abnormal increments or decrements are observed in the calibrated yield curve, adjustments are made to alleviate deviations between the fitting knots until the fitted yield curve is stable and smooth. A model validation report should include a thorough conceptual review of how the model carries out this task.

Based on the market-traded securities selected, the curve builder can generate an on-the-run or off-the-run Treasury yield curve, a LIBOR swap curve, a SOFR curve, or whatever else is needed. The curve builder serves as the basis for measuring nominal and option-adjusted spreads for many types of securities and for applying spreads whenever a spread is used to determine the model price.

A curve builder’s inputs are therefore a set of market-traded securities. To validate the inputs, we take the market prices of the fitting knots for three month-end trading dates and compare them against the market price inputs used in the curve builder. We then calibrate the par rates and spot rates based on the retrieved market prices and compare them with the fitted curve generated by the curve builder.
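
An independent recalculation of spot rates from par rates can be sketched with a textbook bootstrap. This is a simplified illustration, not the curve builder's method: it assumes annual coupons and annual compounding (Treasury notes and bonds actually pay semiannually) and par rates available at every whole-year maturity with no interpolation.

```python
def bootstrap_spot_rates(par_rates):
    """Spot rates from annual-pay par rates for maturities 1, 2, ..., n years."""
    discount_factors = []
    for c in par_rates:
        # A par bond prices at 1: c * (sum of all discount factors) + DF_n = 1.
        df = (1.0 - c * sum(discount_factors)) / (1.0 + c)
        discount_factors.append(df)
    # Convert each discount factor to an annually compounded spot rate.
    return [df ** (-1.0 / (i + 1)) - 1.0 for i, df in enumerate(discount_factors)]
```

The resulting spot rates can then be compared point by point with the fitted curve, with any differences traced back to interpolation or compounding conventions.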

To validate the curve builder’s model structure and development, we check the internal transition between the model-provided par rate, spot rate and forward rate on three month-end trading dates. Different compounding frequencies can significantly impact these transitions. We also review the model’s assumptions, limitations and governance activities established by the model owner.
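
The spot-to-forward transition, and its dependence on compounding frequency, can be illustrated with a short function. The name and signature below are hypothetical; the relationship itself is the standard one between two spot rates quoted with the same compounding frequency.

```python
def forward_rate(spot_1, t1, spot_2, t2, freq=1):
    """Forward rate between t1 and t2 (in years) implied by two spot rates,
    all quoted with `freq` compounding periods per year."""
    growth_1 = (1 + spot_1 / freq) ** (freq * t1)
    growth_2 = (1 + spot_2 / freq) ** (freq * t2)
    return freq * ((growth_2 / growth_1) ** (1 / (freq * (t2 - t1))) - 1)

# The same quoted spot rates imply slightly different forwards depending on
# whether they are read as annually or semiannually compounded.
annual = forward_rate(0.02, 1, 0.03, 2, freq=1)
semiannual = forward_rate(0.02, 1, 0.03, 2, freq=2)
```

A validator can recompute forwards this way from the model-provided spot rates and confirm they match the model-provided forward rates under the convention the model documentation states.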

Validating model outputs usually begins by benchmarking the outputs against a similar curve provided by Bloomberg or another reputable challenger system. Next, we perform a sensitivity analysis to check the locality and stability of the forward curve by shocking the input fitting knots and analyzing the impact on the model-provided forward curve. For large shocks (e.g., 300 bp or more), we test boundary conditions, paying particular attention to the forward curve. Normally, we expect forward rates to remain non-negative, as negative forwards would breach no-arbitrage conditions.
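
The locality and boundary checks can be sketched as follows. As a simplification, this example shocks an annually compounded spot rate directly instead of re-running a curve builder on shocked bond prices, so it illustrates only the shape of the test; both function names are hypothetical.

```python
def one_year_forwards(spots):
    """One-year forward rates implied by annually compounded spots at 1..n years."""
    forwards = [spots[0]]
    for n in range(1, len(spots)):
        grow_prev = (1 + spots[n - 1]) ** n
        grow_next = (1 + spots[n]) ** (n + 1)
        forwards.append(grow_next / grow_prev - 1)
    return forwards

def shock_knot(spots, knot_index, shock_bp):
    """Shock one knot; return (indices of forwards that moved,
    indices of forwards that turned negative)."""
    shocked = list(spots)
    shocked[knot_index] += shock_bp / 10000.0
    base, bumped = one_year_forwards(spots), one_year_forwards(shocked)
    moved = [i for i, (b, s) in enumerate(zip(base, bumped)) if abs(b - s) > 1e-12]
    negative = [i for i, f in enumerate(bumped) if f < 0]
    return moved, negative
```

In this toy setup a shock to one knot moves only the two adjacent forwards, which is the locality property being tested. A 300 bp shock to a single point on a gently sloped curve can push the following forward below zero, which is the kind of boundary behavior the validation should surface and discuss.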

For the scenario analysis, we test the performance of the curve builder during periods of stress and other significant events, including bond market movement dates, Federal Open Market Committee (FOMC) dates and Treasury auction dates. The selected dates cover significant events for Treasury/bond markets and provide meaningful analysis for the validation.

Interest Rate Models

An interest rate model is a mathematical model that is mainly used to describe the future evolution of interest rates. Its principal output is a simulated term structure, which is the fundamental component of a Monte Carlo simulation. Interest rate models typically fall into one of two broad categories:

  • Short-rate models: A short-rate model describes the future evolution of the short rate (the instantaneous spot rate).
  • LIBOR Market Model (LMM): An LMM describes the future evolution of forward rates. Unlike the instantaneous spot rate, forward rates can be observed directly from the market, as can their implied volatility.

This blog post provides additional commentary around interest rate model validations.

Conceptual soundness and model theory reviews are conducted based on the specific interest rate model’s dynamics. The model inputs, regardless of the model structure selected, include the selected underlying curve and its corresponding volatility surface as of the testing date. We normally benchmark model inputs against market data from a challenger system and discuss any observed differences.

We then examine the model’s output, which is the set of stochastic paths comprising a variety of required spot rates or forward LIBOR and swap rates, as well as the discount factors consistent with the simulated rates. To check the no-arbitrage condition in the simulated paths, we compare the mean and median path with the underlying curve and comment on the differences. We measure the randomness from the simulated paths and compare it against the interest rate model’s volatility parameter inputs.
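
A closely related check, offered here as one possible way to make the comparison concrete and not as the procedure described above, is to confirm that the average pathwise discount factor reproduces the discount factors of the input curve. The sketch assumes short-rate paths stored as lists of annualized rates on an equally spaced time grid; the function names are hypothetical.

```python
import math

def mc_discount_factors(short_rate_paths, dt):
    """Average pathwise discount factor at each step across simulated paths."""
    n_steps = len(short_rate_paths[0])
    sums = [0.0] * n_steps
    for path in short_rate_paths:
        integral = 0.0
        for i, r in enumerate(path):
            integral += r * dt           # left-point rule for the rate integral
            sums[i] += math.exp(-integral)
    return [s / len(short_rate_paths) for s in sums]

def max_repricing_error(short_rate_paths, curve_discount_factors, dt):
    """Largest gap between Monte Carlo and input-curve discount factors."""
    mc = mc_discount_factors(short_rate_paths, dt)
    return max(abs(m - c) for m, c in zip(mc, curve_discount_factors))
```

With a finite number of paths the error will not be exactly zero, so the result should be judged against the Monte Carlo sampling error, not against a fixed tolerance.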

Based on the simulated paths, an LMM also provides calibrated ATM swaption volatility. We compare the LMM’s implied ATM swaption volatility with its inputs and the market rates from the challenger system as a review of the model calibration. For the LMM, we also compare the model against history on the correlation of forward swap rates and the serial correlation of a forward LIBOR rate. An LMM allows a good choice of structures that generate realistic swap rates whose correlations are consistent with historical values.

Validating Projection Models

Projection models come in various shapes and sizes.

“Current Coupon” Models

Current coupon models generate mortgage rate projections based on a market curve or a single simulated interest rate path. These projections are a key driver of prepayment projection models and mortgage valuation models. There are a number of model structures that can explain the current coupon projection, ranging from the simple constant-spread method to the recursive forward-simulation method. Since it has been traditionally assumed that the ten-year part of the interest rate curve drives mortgage rates, a common assumption involves holding the spread between current coupon and the ten-year swap or Treasury rates constant. However, this simple and intuitive approach has a basic problem: primary market mortgage rates nowadays depend on secondary-market MBS current-coupon yields. Hence, current coupon depends not just on the ten-year part of the curve, but also on other factors that affect MBS current-coupon yields. Such factors include:

  • The shape of the yield curve
  • Tenors on the yield curve
  • Volatilities

A conceptual review of current coupon models includes a discussion around the selected method and comparisons with alternative approaches. To validate model inputs, we focus on the data transition procedures between the curve builder and current coupon model or between the interest rate model and the current coupon model. To validate model outputs, we perform a benchmarking analysis against projections from a challenger approach. We also perform back-testing to measure the differences between model projections and actual data over a testing period, normally 12 months. We use mean absolute error (MAE) to measure the back-testing results. If the MAE is less than 0.5%, we conclude that the model projection falls inside the acceptable range. For the sensitivity analysis, we examine the movements of the current coupon projection under various shock scenarios (including key-rate shocks and parallel shifting) on the rate inputs.
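
The back-testing rule can be expressed in a few lines. The sketch assumes that projected and actual mortgage rates are quoted in percent, so that the 0.5% threshold is read as 0.5 percentage points; that reading, and the function names, are assumptions.

```python
def mean_absolute_error(projected, actual):
    """Average absolute difference between projections and realized values."""
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(projected)

def backtest_passes(projected, actual, threshold=0.5):
    """True when the MAE (in percentage points) is below the threshold."""
    return mean_absolute_error(projected, actual) < threshold

# Hypothetical three-month example with rates in percent.
projected_rates = [3.0, 3.1, 3.2]
actual_rates = [3.1, 3.0, 3.4]
result = backtest_passes(projected_rates, actual_rates)
```

In practice the two lists would each hold one observation per month over the testing period, normally twelve.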

Prepayment Models

Prepayment models are behavioral models that help investors understand and forecast a loan portfolio’s likely prepayment behavior and identify the corresponding major drivers.

The prepayment model’s modeling structure is usually econometric in nature. It assumes that the same set of drivers that affected prepayment and default behavior in the past will drive them in the future under all scenarios, even though the period in the past that is most applicable may vary by scenario in the future.

Major drivers are identified and modeled separately as a function of collateral characteristics and macroeconomic variables. Each type of prepayment effect is then scaled based on the past prepayment and default experience of similar collateral. The assumption is that if the resulting model can explain and reasonably fit historical prepayments, then it may be a good model for projecting the future, subject to a careful review of those projections.

Prepayment effects normally include housing turnover, refinancing and burnout[2]. Each prepayment effect is modeled separately and then combined together. A good conceptual review of prepayment modeling methodology will discuss the mathematical fundamentals of the model, including an assessment of the development procedure for each prepayment effect and comparisons with alternative statistical approaches.

Taking for example a model that projects prepayment rates on tradable Agency mortgage collateral (or whole-loan collateral comparable to Agencies) from settlement date to maturity, development data includes the loan-level or pool-level transition data originally from Fannie Mae, Freddie Mac, Ginnie Mae and third-party servicers. Data obtained from third parties is marked as raw data. We review the data processing procedures used to get from the raw data to the development data. These procedures include reviewing data characteristics, data cleaning, data preparation and data transformation processes.

After the development data preparation, variable selection and loan segmentation become key to explaining each prepayment effect. Model developers seek to select a set of collateral attributes with clear and consistent evidence of impact on the given prepayment effect. We validate the loan segmentation process by checking whether the historical prepayment rates from different loan segments demonstrate level differences based on the set of collateral attributes selected.

A prepayment model’s implementation process is normally a black box. This increases the importance of the model output review, which includes performance testing, stress testing, sensitivity analysis, benchmarking and back-testing. An appropriate set of validation tests will capture:

  • Sensitivity to collateral and borrower characteristics (loan-to-value, loan size, etc.)
  • Sensitivity to significant assumptions
  • Benchmarking of prepayment projections
  • Performance during various historical events
  • Back-testing
  • Scenario stability
  • Model projections compared with projections from dealers
  • Performance by different types of mortgages, including CMOs and TBAs

A prepayment model sensitivity analysis might take a TBA security and gradually change the value of input variables, one at a time, to isolate the impact of each variable. This procedure provides an empirical understanding of how the model performs with respect to parameter changes. If the prepayment model has customized tuning functionality, we can apply the sensitivity analysis independently to each prepayment effect by setting the other tuning parameters at zero.
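
The one-at-a-time procedure can be sketched against any model that maps inputs to a CPR. The `toy_cpr` function below is entirely hypothetical (a seasoning ramp plus an S-curve refinance term with made-up coefficients) and stands in for the black-box model under review; only the `one_at_a_time` loop illustrates the technique.

```python
import math

def toy_cpr(inputs):
    """Hypothetical prepayment function, for illustration only."""
    incentive = inputs["note_rate"] - inputs["mortgage_rate"]     # percentage points
    refi = 40.0 / (1.0 + math.exp(-2.0 * (incentive - 1.0)))      # S-curve, capped at 40 CPR
    turnover = 6.0 * min(inputs["loan_age_months"] / 30.0, 1.0)   # seasoning ramp
    return turnover + refi

def one_at_a_time(model, base_inputs, shocks):
    """Change one input at a time; record the output change versus the base case."""
    base = model(base_inputs)
    results = {}
    for name, delta in shocks.items():
        shocked = dict(base_inputs)      # copy so other inputs stay at base values
        shocked[name] += delta
        results[name] = model(shocked) - base
    return results

base_case = {"note_rate": 4.0, "mortgage_rate": 4.0, "loan_age_months": 60}
impacts = one_at_a_time(toy_cpr, base_case,
                        {"mortgage_rate": -1.0, "loan_age_months": 12})
```

Each entry in `impacts` isolates one variable, so the validator can check both the direction and the rough magnitude of the response against expectations.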

For the benchmarking analysis, we compare the model’s cohort-level, short-term conditional prepayment rate (CPR) projection against other dealer publications, including Barclays and J.P. Morgan (as applicable and available). We also compare the monthly CPR projections against those of a challenger model, such as the Bloomberg Agency Model (BAM), for the full stack of Agency TBAs and discuss the differences. Discrepancies identified during the course of a benchmarking analysis may trigger further investigation into the model’s development. However, such discrepancies do not necessarily mean that the underlying model is in error, since the challenger model itself is simply an alternative projection. Differences might be caused by any number of factors, including different development data or modeling methodologies.

Prepayment model back-testing involves selecting a set of market-traded MBS and a set of hypothetical loan cohorts and comparing the actual monthly CPR against the projected CPR over a prescribed time window (normally one year). Thresholds should be established prior to testing and differences that exceed these thresholds should be investigated and discussed in the model validation report.
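
A minimal sketch of the comparison follows. It assumes actual prepayments are observed as single monthly mortality (SMM) and converted to CPR with the standard annualization, and that the threshold is an absolute CPR difference agreed in advance; the function names and the threshold value used in the example are hypothetical.

```python
def smm_to_cpr(smm):
    """Annualize a single monthly mortality rate into a CPR."""
    return 1.0 - (1.0 - smm) ** 12

def cpr_backtest(actual_smm, projected_cpr, threshold):
    """Return (month, actual CPR, projected CPR) for each month where the
    absolute difference exceeds the pre-set threshold."""
    breaches = []
    for month, (smm, proj) in enumerate(zip(actual_smm, projected_cpr), start=1):
        actual = smm_to_cpr(smm)
        if abs(actual - proj) > threshold:
            breaches.append((month, actual, proj))
    return breaches

# Two hypothetical months: the second projection misses by more than 2 CPR.
flagged = cpr_backtest([0.01, 0.01], [0.11, 0.20], threshold=0.02)
```

Months returned by `cpr_backtest` are the ones that would need to be investigated and discussed in the validation report.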

Validating Cash Flow Models and Risk Metrics

A cash flow model combines the simulated paths from interest rate, prepayment, default, and delinquency models to compute projected cash flows associated with monthly principal and interest payments.

Cash flow model inputs include the underlying instrument’s characteristics (e.g., outstanding balance, coupon rate, maturity date, day count convention, etc.) and the projected vectors associated with the CPR, default rate, delinquency, and severity (if applicable). A conceptual review of a cash flow model involves a verification of the data loading procedure to ensure that the instrument’s characteristics are captured correctly within the model. It should also review the underlying mathematical formulas to verify the projected vectors are correctly applied.

Model outputs can be validated via sensitivity analysis. This often involves shocking each input variable, one at a time, and examining the resulting impact on the monthly remaining balance. Benchmarking can be accomplished by developing a challenger model and comparing the resulting cash flows.
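
A very small challenger cash flow model might look like the sketch below. It is deliberately simplified and is an assumption-laden illustration: a single level-pay, fixed-rate pool with a positive coupon, a CPR vector as the only behavioral input, and no defaults, delinquencies, severities, or servicing fees.

```python
def pool_cash_flows(balance, annual_coupon, term_months, cpr_vector):
    """Monthly (interest, scheduled principal, prepaid principal, ending balance)
    for a level-pay pool. Assumes annual_coupon > 0."""
    r = annual_coupon / 12.0
    flows = []
    for month in range(term_months):
        if balance <= 1e-8:
            break
        remaining = term_months - month
        payment = balance * r / (1.0 - (1.0 + r) ** -remaining)  # level payment
        interest = balance * r
        scheduled = payment - interest
        smm = 1.0 - (1.0 - cpr_vector[month]) ** (1.0 / 12.0)    # CPR -> monthly rate
        prepaid = (balance - scheduled) * smm
        balance -= scheduled + prepaid
        flows.append((interest, scheduled, prepaid, balance))
    return flows

# Sensitivity: shock the CPR input and compare the remaining balance after a year.
base_flows = pool_cash_flows(100000.0, 0.06, 360, [0.0] * 360)
shocked_flows = pool_cash_flows(100000.0, 0.06, 360, [0.10] * 360)
```

Comparing `base_flows` and `shocked_flows` month by month shows how the remaining balance responds to the shocked input, and the same function can serve as a benchmark for the subject model's output on simple collateral.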

Combining the outputs of all the sub-models, a price of the underlying structured finance instrument can be generated (and tested) along with its related risk metrics (duration, convexity, option adjusted spread, etc.).

Using MBS as an example, an option adjusted spread (OAS) analysis is commonly used. Theoretically, OAS is calibrated by matching the model price with the market price. The OAS can be viewed as a constant spread that is applied to the discounting curve when computing the model price. Because it captures the difference between model price and market price, OAS is particularly useful in MBS valuation, especially for measuring prepayment risk and market risk. A comprehensive analysis reviews the following:

  • Impact of interest rate shocks on a TBA stack in terms of price, OAS, effective duration, and effective convexity.
  • Impact of projected prepayment rate shock on a TBA stack in terms of price, OAS, effective duration, and effective convexity.
  • Impact of projected prepayment rate shock on the option cost (measured in basis points as zero-volatility spread minus OAS).
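
The OAS calibration and the effective risk measures in the list above can be sketched as follows. This is a generic illustration under stated assumptions (monthly cash flows already projected per path, annualized path rates with monthly compounding, a simple bisection solver with an arbitrary bracket) and does not represent any particular vendor's implementation; all names are hypothetical.

```python
def model_price(cash_flows_by_path, rates_by_path, spread):
    """Average present value across paths, discounting each month at the
    path's annualized rate plus a constant spread."""
    total = 0.0
    for flows, rates in zip(cash_flows_by_path, rates_by_path):
        df, pv = 1.0, 0.0
        for cf, r in zip(flows, rates):
            df /= 1.0 + (r + spread) / 12.0
            pv += cf * df
        total += pv
    return total / len(cash_flows_by_path)

def solve_oas(cash_flows_by_path, rates_by_path, market_price, lo=-0.05, hi=0.20):
    """Bisection for the spread that equates model price and market price."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if model_price(cash_flows_by_path, rates_by_path, mid) > market_price:
            lo = mid     # model price too high, so the spread must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0

def effective_duration_convexity(p_down, p_base, p_up, shock):
    """Effective duration and convexity from prices under parallel rate shocks
    of -shock and +shock, each repriced at a constant OAS."""
    duration = (p_down - p_up) / (2.0 * p_base * shock)
    convexity = (p_down + p_up - 2.0 * p_base) / (p_base * shock ** 2)
    return duration, convexity
```

Running `solve_oas` on a single path built from the market curve gives the zero-volatility spread under the same conventions, so the option cost in the last bullet is the difference between that result and the multi-path OAS.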

Beyond OAS, the validation should include independent benchmarking of the model price. Given a sample portfolio that contains the deal information for a list of structured finance instruments, validators derive a model price using the same market rate as the subject model as a basis for comparison. Analyzing the shock profiles enables validators to conclude whether the given discounted cash flow method is generating satisfactory model performance.


Structured finance model validations are complex because they invariably involve testing a complicated array of models, sub-models, and related models. The list of potential sub-models (across all three categories discussed above) significantly exceeds the examples cited.

Validators must design validation tasks specific to each model type in order to adequately assess the risks posed by potential shortcomings associated with model inputs, structure, theory, development, outputs and governance practices.

When it comes to models governing structured finance instruments, validators must identify any model risk not only at the independent sub-model level but at the broader system level, for which the final outputs include model price and risk metrics. This requires a disciplined and integrated approach.



[1] Knots represent a set of predefined points on the curve.

[2] Burnout effect describes highly seasoned mortgage pools in which loans likely to repay have already done so, resulting in relatively slow prepayment speeds despite falling interest rates.


Why Model Validators Need to Care About the LIBOR Transition

The transition to the Secured Overnight Financing Rate (SOFR) as a LIBOR replacement after 2021 creates layers of risk for banks. Many of these risks are readily apparent, others less so. But the factors banks must consider while choosing replacement rates and correctly implementing contractual fallback language make a seamless transition a daunting proposition. Though sometimes overlooked, model risk managers have an important role in ensuring this happens correctly and in a way that does not jeopardize the reliability of model outputs.

LIBOR, SOFR and the need for transition

A quick refresher: The London Interbank Offered Rate (LIBOR) currently serves as the benchmark at which major global banks lend to one another on a short-term basis in the international interbank market. LIBOR is calculated by the Intercontinental Exchange (ICE) and is published daily. LIBOR is based on a combination of five currencies and seven maturities. The most common of these is the three-month U.S. Dollar rate.

Accusations of manipulation by major banks going back as early as 2008, however, raised concerns about the sustainability of LIBOR. A committee convened by the Federal Reserve Board and the Federal Reserve Bank of New York in 2017—the Alternative Reference Rates Committee (ARRC)—identified a broad Treasury repurchase agreement (repo) financing rate as its preferred alternative reference rate to replace LIBOR after 2021. This repo rate (now known as SOFR) was chosen for its ability to provide liquidity to underlying markets and because the volumes underlying SOFR are far larger than any other U.S. money market. This combination of size and liquidity contributes to SOFR’s transparency and protects market participants from attempts at manipulation.

What Does This Mean for MRM?

Because the transition has potential bearing on so many layers of risk (market risk, operational risk, strategic risk, reputation risk, and compliance risk, not to mention the myriad risks associated with mispricing assets), any model in a bank’s existing inventory that is tasked with gauging or remediating these risks is liable to be impacted. Understanding how, and the extent to which, models account for the LIBOR transition’s potential effect on pricing and other core processes is (or should be) of principal concern to model validators.

Ongoing Monitoring and Benchmarking

Regulatory guidance and model validation best practices require testing model inputs and benchmarking how the model performs with the selected inputs relative to alternatives. For this reason, the validation of any model whose outputs are sensitive to variable interest rates should include an assessment of how a replacement index (such as SOFR) and adjustment methodology were selected.

Model validators should be able to ascertain whether the model developer has documented enough evidence relating to:

  • Available reference rates and the appropriateness of each to the bank’s specific products
  • System capabilities for using these replacement rates with the bank’s products
  • Control risks associated with unavailable alternative rates

Fallback Language considerations:

Fallback language (contractual provisions that govern the process for selecting a replacement rate in the event of LIBOR termination) should also factor into a validator’s assessment of model inputs. While many existing fallback provisions can be frustratingly vague when it comes to dealing with a permanent cessation of LIBOR, validators of models that rely on reference rates as inputs have an obligation to determine whether the model complies with fallback language containing clear and executable terms. These include:

  • Specific triggers to enact the replacement rate
  • Clarity regarding the replacement rate and spread adjustments
  • Permissible options under fallback language – and whether other options might be more appropriate than the one ultimately selected based on the potential for valuation changes, liquidity impact, hedging implications, system changes needed, and customer impact

In November 2019, the ARRC published the finalized fallback language for residential adjustable rate mortgages, bilateral business loans, floating rate notes, securitizations, and syndicated loans. It has also actively engaged with the International Swaps and Derivatives Association (ISDA) to finalize the fallback parameters for derivatives.

The ARRC also recommended spread-adjusted benchmark replacement rates that would take the place of the current benchmark when a triggering event occurs. The recommendation included the following benchmark replacement waterfalls. Validators of models relying on these replacements may choose, as part of their best practices review, to determine the extent to which existing fallback provisions align with the recommendations.

Replacement | Description
Term SOFR + spread adjustment | Forward-looking term SOFR for the applicable corresponding tenor. Note: Loan recommendations allow use of the next longest tenor term SOFR rate if the corresponding tenor is unavailable
Compounded SOFR + spread adjustment | Compounded average of daily SOFRs over the relevant period depending on the tenor of USD LIBOR being replaced
Relevant selected rate + spread adjustment | Rate selected by the Relevant Governmental Body, lender, or borrower & administrative agent
Relevant ISDA replacement rate + spread adjustment | The applicable replacement rate (without spread adjustment) that is embedded in ISDA’s standard definitions
Issuer, designated transaction representative or noteholder replacement + spread adjustment | An identified party will select a replacement rate, in some cases considering any industry-accepted rate in the related market. Note: in certain circumstances this step could be omitted
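To make the "compounded SOFR + spread adjustment" step of the waterfall concrete, the sketch below compounds daily fixings over a period and annualizes the result. It assumes an Actual/360 day count, and the fixings, the spread adjustment, and the function names are hypothetical placeholders rather than published values.

```python
def compounded_sofr(daily_rates, day_counts, basis=360):
    """Annualized compounded average of daily rates over the period.

    daily_rates: rates as decimals, one per business day
    day_counts:  calendar days each rate applies (3 for a Friday fixing, for instance)
    """
    growth = 1.0
    for rate, days in zip(daily_rates, day_counts):
        growth *= 1.0 + rate * days / basis
    return (growth - 1.0) * basis / sum(day_counts)


def fallback_rate(daily_rates, day_counts, spread_adjustment):
    """Compounded rate plus a fixed spread adjustment."""
    return compounded_sofr(daily_rates, day_counts) + spread_adjustment


# Hypothetical fixings for one week (Monday through Friday).
rates = [0.0009, 0.0010, 0.0010, 0.0011, 0.0010]
days = [1, 1, 1, 1, 3]
spread_adjustment = 0.0025  # hypothetical placeholder, not a published value

print(f"Compounded rate: {compounded_sofr(rates, days):.6%}")
print(f"All-in fallback rate: {fallback_rate(rates, days, spread_adjustment):.6%}")
```

A validator could use a calculation along these lines to check independently that the rate a model ingests matches what the fallback language prescribes.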

Model risk managers can sometimes be lulled into believing that the validation of interest rate inputs consists solely of verifying their source and confirming that they have been faithfully brought into the model. Ultimately, however, model validators are responsible for verifying not only the provenance of model inputs but also their appropriateness. Consequently, ensuring a smooth transition to the most appropriate available reference rate replacement is of paramount importance to risk management efforts related to the models these rates feed.


The information within this section has been taken directly from the [AR1]

Managing Machine Learning Model Risk

Though the terms are often used interchangeably in casual conversation, machine learning is a subset of artificial intelligence. Simply put, ML is the process of getting a computer to learn the properties of one dataset and generalize this “knowledge” to other datasets.

ML Financial Models

ML models have crept into virtually every corner of banking and finance — from fraud and money-laundering prevention to credit and prepayment forecasting, trading, servicing, and even marketing. These models take various forms (see Table 1, below). Modelers base their selection of a particular ML technique on a model’s objective and data availability.   

Table 1. ML Models and Application in Finance

Model | Application
Linear Regression | Credit Risk; Forecasting
Logistic Regression | Credit Risk
Monte Carlo Simulation | Capital Market; (ALM)
Artificial Neural Networks | Score Card and AML
Decision Trees Regression Models (Random Forest, Bagging) | Score Card
Multinomial Logistic Regression | Prepayment Projection
Deep Learning | Prepayment Projection
Time Series Model | Capital Forecasting; Macroeconomics Forecasting Model
Linear Regression with ARIMA Errors | Capital Forecasting
Factor Models | Short Rate Evolution
Fuzzy Matching | AML; OFAC
Linear Discriminant Analysis (LDA) | AML; OFAC
K Means Clustering | AML; OFAC


ML models require large datasets relative to conventional models as well as more sophisticated computer programming and econometric/statistical skills. ML model developers are required to have deep knowledge about the ML model they want to use, its assumptions and limitations, and alternative approaches.


ML Model Risk

ML models present many of the same risks that accompany conventional models. As with any model, errors in design or application can lead to performance issues resulting in financial losses, poor decisions, and damage to reputation.

ML is all about algorithms. Failing to understand the mathematical aspects of these algorithms can lead to adopting inefficient optimization algorithms without knowing the nature or the interpretation of the optimization being solved. Making decisions under these circumstances increases model risk and can lead to unreliable outputs.

As sometimes befalls conventional regression models, ML models may perform well on the training data but not on the test data. Their complexity and high dimensionality make them especially susceptible to overfitting. The poor performance of some ML models when applied beyond the training dataset can translate into a huge source of risk.
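The gap between training and test performance can be seen in a toy setting. In the sketch below, a 1-nearest-neighbor rule stands in for an overly flexible model that memorizes its training data, and a fixed threshold stands in for a simple one. The data-generating process, the 20 percent label noise, and all names are hypothetical.

```python
import random


def make_data(n, rng, noise=0.2):
    """Hypothetical data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data


def one_nn_predict(train, x):
    """Memorizes the training set: returns the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def threshold_predict(x):
    """Simple rule that ignores the noise in the training labels."""
    return int(x > 0.5)


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


rng = random.Random(0)
train, test = make_data(200, rng), make_data(200, rng)

train_acc = accuracy(lambda x: one_nn_predict(train, x), train)
test_acc = accuracy(lambda x: one_nn_predict(train, x), test)
simple_train_acc = accuracy(threshold_predict, train)
simple_test_acc = accuracy(threshold_predict, test)

print(f"1-NN      train {train_acc:.2f}  test {test_acc:.2f}")
print(f"threshold train {simple_train_acc:.2f}  test {simple_test_acc:.2f}")
```

The memorizing rule scores perfectly on the data it has seen because each point is its own nearest neighbor, which is exactly why training accuracy alone says little about how a model will behave in production.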

Finally, ML models can give rise to unintended consequences when used inappropriately or incorrectly. Model risk is magnified when the goal of an ML model’s algorithm is not aligned with the business problem or does not account for all relevant aspects of the business problem. Model risk also arises when an ML model is used outside the environment for which it was designed. These risks include overstated/understated model outputs and lack of fairness. Table 2, below, presents a more comprehensive list of these risks.

Table 2. Potential risk from ML models

Bias toward protected groups
Use of poor-quality data
Job displacement
Models may produce socially unacceptable results
Automation may create model governance issues


Managing ML Model Risk

It may seem self-evident, but the first step in managing ML model risk consists of reliably identifying every model in the inventory that relies on machine learning. This exercise is not always as straightforward as it might seem. Successfully identifying all ML models requires MRM departments to incorporate the right information requests into their model determination or model assessment forms. These should include questions designed to identify specific considerations of ML model techniques, algorithms, platforms and capabilities. MRM departments need to adopt a consistent but flexible definition of what constitutes an ML model across the institution. Model developers, owners and users should be trained in identifying ML models and those features that need to be reported in the model identification assessment form.

MRM’s next step involves risk assessing ML models in the inventory. As with traditional models, ML models should be risk assessed based on their complexity, materiality and frequency of use. Because of their complexity, however, ML models require an additional level of screening in order to account for data structure, level of algorithm sophistication, number of hyper-parameters, and how the models are calibrated. The questionnaire MRM uses to assess the risk of its conventional models often needs to be enhanced in order to adequately capture the additional risk dimensions introduced by ML models.

Managing ML model risk also involves ensuring not only that a clear model development and implementation process is in place but also that it is consistent with the business objective and the intended use of the models. Thorough documentation is important for any model, but the need to describe model theory, methodology, design and logic takes on added importance when it comes to ML models. This includes specifying the methodology (regression or classification), the type of model (linear regression, logistic regression, natural language processing, etc.), the resampling method (cross-validation, bootstrap) and the subset selection method, such as backward, forward or stepwise selection. Obviously, simply stating that the model “relies on a variety of machine learning techniques” is not going to pass muster.

As with traditional models, developers must document the data source, quality and any transformations that are performed. This includes listing the data sources, normalization and sampling techniques, training and test data size, the data dimension reduction technique (principal component, partial least squares, etc.) as well as controls around them. The risk around the utilization of certain data should also be assessed.

A model implementation plan and controls around the model should also be developed.

Finally, all model performance testing should be clearly stated, and the results documented. This helps assess whether the model is performing as intended and in line with its design and business objective. Limitations and calibrations around the models should also be documented.

Like traditional models, ML models require independent validation to ensure they are sound and performing as intended and to identify potential limitations. All components of ML models should be subject to validation, including conceptual soundness, outcomes analysis and ongoing monitoring.

Validators can assess the conceptual soundness of an ML model by evaluating its design and construction, focusing on the theory, methodology, assumptions and limitations, data quality and integrity, hyper-parameter calibration and overlays, bias and interpretability.

Validators can assess outcomes analysis by checking whether the model outputs are appropriate and in line with a priori expectations. Results of the performance metrics should also be assessed for accuracy and degree of precision. Performance metrics for ML models vary by model type. Similar to traditional predictive models, common performance metrics for ML models include the mean-squared-error (MSE), Gini coefficient, entropy, the confusion matrix, and the receiver operating characteristic (ROC) curve.
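Several of these metrics are simple enough to recompute independently during a validation. The sketch below does so for a small set of hypothetical labels and scores. It takes the Gini coefficient to mean 2 × AUC − 1 (the convention common in credit scoring), which is an assumption about the intended meaning, and it summarizes the ROC curve by its area. The 0.5 cutoff and function names are likewise illustrative.

```python
def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def confusion_matrix(actual, scores, cutoff=0.5):
    """Counts of true/false positives and negatives at a score cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, score in zip(actual, scores):
        if score >= cutoff:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts


def roc_auc(actual, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(actual, scores) if y == 1]
    neg = [s for y, s in zip(actual, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(actual, scores):
    """Gini taken here as 2 * AUC - 1 (an assumed convention)."""
    return 2.0 * roc_auc(actual, scores) - 1.0


# Hypothetical outcomes (1 = event) and model scores.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1]

print("MSE:", round(mean_squared_error(actual, scores), 4))
print("Confusion matrix:", confusion_matrix(actual, scores))
print("AUC:", roc_auc(actual, scores), "Gini:", gini(actual, scores))
```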

Outcomes analysis should also include out-of-sample testing, which can be conducted using cross-validation techniques. Finally, ongoing monitoring should be reviewed as a core element of the validation process. Validators should evaluate whether model use is appropriate given changes in products, exposures and market conditions. Validators should also ensure performance metrics are being monitored regularly based on the inherent risk of the model and frequency of use. Validators should ensure that a continuous performance monitoring plan exists and captures the most important metrics. Also, a change control document and access control document should be available.  
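For the out-of-sample testing mentioned above, a k-fold cross-validation can be sketched in a few lines. The example below uses a one-variable least squares fit as a stand-in for the model under review; the data, the choice of five folds, and the function names are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices and deal them into k near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_mse(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold, and repeat for every fold."""
    fold_mse = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        intercept, slope = fit_line(train_x, train_y)
        errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in fold]
        fold_mse.append(sum(errors) / len(errors))
    return fold_mse


# Hypothetical data: a noisy linear relationship.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 0.1) for x in xs]

fold_mse = cross_validated_mse(xs, ys)
print("Out-of-sample MSE by fold:", [round(m, 4) for m in fold_mse])
print("Average:", round(sum(fold_mse) / len(fold_mse), 4))
```

Wide variation in error across folds is one signal a validator might follow up on, since it suggests performance depends heavily on which observations the model happened to see.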

The principles outlined above will sound familiar to any experienced model validator, even one with no ML training or experience. ML models do not upend the framework of MRM best practices but rather add a layer of complexity to their implementation. This complexity requires MRM departments in many cases to adjust their existing procedures to properly identify ML models and suitably capture the risk emerging from them. As is almost always the case, aggressive staff training is needed to ensure that these well-considered process enhancements are faithfully executed and have their desired effect.

September 30 Webinar: Machine Learning in Model Validation

Recorded: September 30th | 1:00 p.m. EDT

Join our panel of experts as they share their latest work using machine learning to identify and validate model inputs.

  • Suhrud Dagli, Co-Founder & Fintech Lead, RiskSpan
  • Jacob Kosoff, Head of Model Risk Management & Validation, Regions Bank
  • Nick Young, Head of Model Validation, RiskSpan
  • Sanjukta Dhar, Consulting Partner, Risk and Regulatory Compliance Strategic Initiative, TCS Canada

Featured Speakers


Suhrud Dagli

Co-Founder & Fintech Lead, RiskSpan

Jacob Kosoff

Head of Model Risk Management & Validation, Regions Bank


Nick Young

Head of Model Validation, RiskSpan

Sanjukta Dhar

Consulting Partner, Risk and Regulatory Compliance Strategic Initiative, Tata Consulting

The Why and How of a Successful SAS-to-Python Model Migration

A growing number of financial institutions are migrating their modeling codebases from SAS to Python. There are many reasons for this, some of which may be unique to the organization in question, but many apply universally. Because of our familiarity not only with both coding languages but with the financial models they power, my colleagues and I have had occasion to help several clients with this transition.

Here are some things we’ve learned from this experience and what we believe is driving this change.

Python Popularity

The popularity of Python has skyrocketed in recent years. Its intuitive syntax and a wide array of packages available to aid in development make it one of the most user-friendly programming languages in use today. This accessibility allows users who may not have a coding background to use Python as a gateway into the world of software development and expand their toolbox of professional qualifications.

Companies appreciate this as well. As an open-source language with tons of resources and low overhead costs, Python is also attractive from an expense perspective. A cost-conscious option that resonates with developers and analysts is a win-win when deciding on a codebase.

Note: R is another popular and powerful open-source language for data analytics. Unlike R, however, which is specifically used for statistical analysis, Python can be used for a wider range of uses, including UI design, web development, business applications, and others. This flexibility makes Python attractive to companies seeking synchronicity — the ability for developers to transition seamlessly among teams. R remains popular in academic circles where a powerful, easy-to-understand tool is needed to perform statistical analysis, but additional flexibility is not necessarily required. Hence, we are limiting our discussion here to Python.

Python is not without its drawbacks. As an open-source language, less oversight governs newly added features and packages. Consequently, while updates may be quicker, they are also more prone to error than SAS’s, which are always thoroughly tested prior to release.


Visualization Capabilities

While both codebases support data visualization, Python’s packages are generally viewed more favorably than SAS’s, which tend to be on the more basic side. More advanced visuals are available from SAS, but they require the SAS Visual Analytics platform, which comes at an added cost.

Python’s popular visualization packages — matplotlib, plotly, and seaborn, among others — can be leveraged to create powerful and detailed visualizations by simply importing the libraries into the existing codebase.


SAS is a command-driven software package used for statistical analysis and data visualization. Though not natively available on every operating system, it remains one of the most widely used statistical software packages in both industry and academia.

It’s not hard to see why. For financial institutions with large amounts of data, SAS has been an extremely valuable tool. It is a well-documented language with many online resources, and it is relatively intuitive to pick up and understand, especially when users have prior experience with SQL. SAS is also one of the few tools with a customer support line.

SAS, however, is a paid service, and at a standalone level, the costs can be quite prohibitive, particularly for smaller companies and start-ups. Complete access to the full breadth of SAS and its supporting tools tends to be available only to larger and more established organizations. These costs are likely fueling its recent drop-off in popularity. New users simply cannot access it as easily as they can Python. While an academic/university version of the software is available free of charge for individual use, its feature set is limited. Therefore, for new users and start-up companies, SAS may not be the best choice, despite being a powerful tool. Additionally, with the expansion and maturity of the variety of packages that Python offers, many of the analytical abilities of Python now rival those of SAS, making it an attractive, cost-effective option even for very large firms.

Future of tech

Many of the expected advances in data analytics and tech in general are clearly pointing toward deep learning, machine learning, and artificial intelligence in general. These are especially attractive to companies dealing with large amounts of data.

While the technology to analyze data with complete independence is still emerging, Python is better situated to support companies that have begun laying the groundwork for these developments. Python’s rapidly expanding libraries for artificial intelligence and machine learning will likely make future transitions to deep learning algorithms more seamless.

While SAS has made some strides toward adding machine learning and deep learning functionalities to its repertoire, Python remains ahead and consistently ranks as the best language for deep learning and machine learning projects. This creates a symbiotic relationship between the language and its users. Developers use Python to develop ML projects since it is currently best suited for the job, which in turn expands Python’s ML capabilities — a cycle which practically cements Python’s position as the best language for future development in the AI sphere.

Overcoming the Challenges of a SAS-to-Python Migration

SAS-to-Python migrations bring a unique set of challenges that need to be considered. These include the following.

Memory overhead

Server space is getting cheaper but it’s not free. Although Python’s data analytics capabilities rival SAS’s, Python requires more memory overhead. Companies working with extremely large datasets will likely need to factor in the cost of extra server space. These costs are not likely to alter the decision to migrate, but they also should not be overlooked.
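One source of this overhead can be seen directly in the interpreter. The sketch below, which assumes the standard CPython implementation, compares the memory used by a plain list of floats with the same values packed into a typed array from the standard library. The dataset size is arbitrary and the figures will vary by platform.

```python
import sys
from array import array

n = 100_000
values = [float(i) for i in range(n)]

# A list stores pointers to separate float objects, so count both.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array("d") packs the same doubles contiguously at 8 bytes each.
packed = array("d", values)
array_bytes = sys.getsizeof(packed)

print(f"list of floats: {list_bytes / 1e6:.2f} MB")
print(f"packed array:   {array_bytes / 1e6:.2f} MB")
print(f"ratio:          {list_bytes / array_bytes:.1f}x")
```

Choosing compact, typed storage for large numeric columns is one of the simpler ways to contain memory costs after a migration.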

The SAS server

All SAS commands are run on SAS’s own server. This tightly controlled ecosystem makes SAS much faster than Python, which does not have the same infrastructure out of the box. Therefore, optimizing Python code can be a significant challenge during SAS-to-Python migrations, particularly when tackling it for the first time.

SAS packages vs Python packages

Calculations performed using SAS packages vs. Python packages can result in differences, which, while generally minuscule, cannot always be ignored. Depending on the type of data, this can pose an issue. And getting an exact match between values calculated in SAS and values calculated in Python may be difficult.

For example, the true value of “0” as a float datatype in SAS is approximated to 3.552714E-150, while Python holds floats as binary fractions, so that the decimal 0.1, for instance, is stored as 3602879701896397/2^55. These values do not create noticeable differences in most calculations. But some financial models demand more precision than others. And over the course of multiple calculations which build upon each other, they can create differences in fractional values. These differences must be reconciled and accounted for.
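The Python side of this behavior is easy to inspect. The sketch below shows how 0.1 is actually stored, how small representation errors accumulate over repeated additions, and how a tolerance-based comparison can be used when reconciling values. The `values_match` helper and its tolerances are hypothetical and would need to be set according to the precision a given model demands.

```python
import math
from decimal import Decimal

# Python stores 0.1 as the nearest binary fraction, not as exactly one tenth.
numerator, denominator = (0.1).as_integer_ratio()
print(numerator, denominator == 2 ** 55)

# Small representation errors accumulate over repeated calculations.
running_total = 0.0
for _ in range(10):
    running_total += 0.1
print(running_total == 1.0)  # exact comparison fails


def values_match(sas_value, python_value, rel_tol=1e-9, abs_tol=1e-12):
    """Hypothetical reconciliation check for a value exported from SAS."""
    return math.isclose(sas_value, python_value, rel_tol=rel_tol, abs_tol=abs_tol)


print(values_match(1.0, running_total))  # tolerance-based comparison passes

# Where exact decimal arithmetic matters, Decimal avoids binary rounding.
exact_total = sum([Decimal("0.1")] * 10)
print(exact_total == 1)
```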

Comparing large datasets

One of the most common functions when working with large datasets involves evaluating how they change over time. SAS has a built-in procedure (PROC COMPARE) that compares datasets swiftly and easily. Python has packages for this as well; however, these packages are not as robust as their SAS counterparts.
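When the available packages fall short, a keyed comparison is not difficult to write by hand. The sketch below is a minimal illustration and not an equivalent of the SAS procedure. It matches rows on an ID column, reports rows found in only one dataset, and lists fields whose values differ beyond a numeric tolerance. The loan records, column names, and tolerance are hypothetical.

```python
import math


def compare_datasets(base, compare, key, rel_tol=1e-9):
    """Report rows present in only one dataset and fields whose values differ.

    base, compare: lists of dicts (one dict per row); key: name of the ID column.
    Numeric fields are compared within rel_tol; everything else must match exactly.
    """
    base_rows = {row[key]: row for row in base}
    compare_rows = {row[key]: row for row in compare}
    report = {
        "only_in_base": sorted(base_rows.keys() - compare_rows.keys()),
        "only_in_compare": sorted(compare_rows.keys() - base_rows.keys()),
        "differences": [],
    }
    for row_id in sorted(base_rows.keys() & compare_rows.keys()):
        left, right = base_rows[row_id], compare_rows[row_id]
        for field in sorted(left.keys() | right.keys()):
            a, b = left.get(field), right.get(field)
            both_numeric = all(
                isinstance(v, (int, float)) and not isinstance(v, bool) for v in (a, b)
            )
            same = math.isclose(a, b, rel_tol=rel_tol) if both_numeric else a == b
            if not same:
                report["differences"].append((row_id, field, a, b))
    return report


# Hypothetical loan-level snapshots from two reporting periods.
prior = [
    {"loan_id": "A1", "balance": 250000.0, "status": "current"},
    {"loan_id": "A2", "balance": 180000.0, "status": "current"},
    {"loan_id": "A3", "balance": 95000.0, "status": "delinquent"},
]
current = [
    {"loan_id": "A1", "balance": 250000.0000001, "status": "current"},
    {"loan_id": "A2", "balance": 179250.0, "status": "current"},
    {"loan_id": "A4", "balance": 310000.0, "status": "current"},
]

report = compare_datasets(prior, current, key="loan_id")
for section, items in report.items():
    print(section, items)
```

The numeric tolerance matters here for the reason discussed in the previous section: values that are equal for business purposes may not be bit-for-bit identical after a migration.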


In most cases, the benefits of migrating from SAS to Python outweigh the challenges associated with going through the process. The envisioned savings can sometimes be attractive enough to cause firms to trivialize the transition costs. This should be avoided. A successful migration requires taking full account of the obstacles and making plans to mitigate them. Involving the right people from the outset — analysts well versed in both languages who have encountered and worked through the pitfalls — is key.

Changes to Loss Models…and How to Validate Them

So you’re updating all your modeling assumptions. Don’t forget about governance.

Modelers have now been grappling with how COVID-19 should affect assumptions and forecasts for nearly two months. This exercise is raising at least as many questions as it is answering.

No credit model (perhaps no model at all) is immune. Among the latest examples are mortgage servicers having to confront how to bring their forbearance and loss models into alignment with new realities.

These new realities are requiring servicers to model unprecedented macroeconomic conditions in a new and changing regulatory environment. The generous mortgage forbearance provisions ushered in by March’s CARES Act are not tantamount to loan forgiveness. But servicers probably shouldn’t count on reimbursement of their forbearance advances until loan liquidation (irrespective of what form the payoff takes).

The ramifications of these costs and how servicers should model them is a central topic to be addressed in a Mortgage Bankers Association webinar on Wednesday, May 13, “Modeling Forbearance Losses in the COVID-19 world” (free for MBA members). RiskSpan CEO Bernadette Kogler will lead a panel consisting of Faith Schwartz, Suhrud Dagli, and Morgan Snyder in a discussion of forbearance’s regulatory implications, the limitations of existing models, and best practices for modeling forbearance-related advances, losses, and operational costs.

Models, of course, are only as good as their underlying data and assumptions. When it comes to forbearance modeling, those assumptions obviously have a lot to do with unemployment, but also with the forbearance take-up rate layered on top of more conventional assumptions around rates of delinquency, cures, modifications, and bankruptcies.

The unique nature of this crisis requires modelers to expand their horizons in search of applicable data. For example, GSE data showing how delinquencies trend in rising unemployment scenarios might need to be supplemented by data from Greek or other European crises to better simulate extraordinarily high unemployment rates. Expense and liquidation timing assumptions will likely require looking at GSE and private-label data from the 2008 crisis. Having reliable assumptions around these is critically important because liquidity issues associated with servicing advances are often more an issue of timing than of anything else.

Model adjustments of the magnitude necessary to align them with current conditions almost certainly qualify as “material changes” and present a unique set of challenges to model validators. In addition to confronting an expanded workload brought on by having to re-validate models that might have been validated as recently as a few months ago, validators must also effectively challenge the new assumptions themselves. This will likely prove challenging absent historical context.

RiskSpan’s David Andrukonis will address many of these challenges—particularly as they relate to CECL modeling—as he participates in a free webinar, “Model Risk Management and the Impacts of COVID-19,” sponsored by the Risk Management Association. Perhaps fittingly, this webinar will run concurrent with the MBA webinar discussed above.

As is always the case, the smoothness of these model-change validations will depend on the lengths to which modelers are willing to go to thoroughly document their justifications for the new assumptions. This becomes particularly important when introducing assumptions that significantly differ from those that have been used previously. While it will not be difficult to defend the need for changes, justifying the individual changes themselves will prove more challenging. To this end, meticulously documenting every step of feature selection during the modeling process is critical not only in getting to a reliable model but also in ensuring an efficient validation process.

Documenting what they’re doing and why they’re doing it is no modeler’s favorite part of the job—particularly when operating in crisis mode and just trying to stand up a workable solution as quickly as possible. But applying assumptions that have never been used before always attracts increased scrutiny. Modelers will need to get into the habit of memorializing not only the decisions made regarding data and assumptions, but also the other options considered, and why the other considered options were ultimately passed over.

Documenting this decision-making process is far easier at the time it happens, while the details are fresh in a modeler’s mind, than several months down the road when people inevitably start probing.

Invest in the “ounce of prevention” now. You’ll thank yourself when model validation comes knocking.
