Imputation and Analysis with Machine Learning
Despite industry-wide efforts to incorporate robust quality control programs, challenges with mortgage data persist. Fortunately, combining machine learning with cloud computing shows promise in addressing mortgage data gaps and producing more accurate results than traditional approaches.
This white paper introduces two methods to impute missing values and understand the relationships between various features of a residential loan database.
eBook: Machine Learning in Model Risk Management
In this eBook, we first address some of the ways in which machine learning techniques can be leveraged by model validators to assess models developed using conventional means. We then tackle several considerations that model validators should take into account when independently assessing machine learning models that appear in their inventories.
In this eBook, you’ll learn:
- Real-world examples illustrating how machine learning models can be used to solve financial problems
- Procedures for validating machine learning models
- Machine learning methods that can be applied during model validation to understand and mitigate model risk
eBook: A Validator’s Guide to Model Risk Management
Learn from RiskSpan model validation experts what constitutes a model, considerations for validating vendor models, how to prepare, how to determine scope, comparisons of performance metrics, and considerations for evaluating model inputs.
Calculating VaR: A Review of Methods
Editor’s Note: This article discusses Value at Risk (VaR) calculation methodology for financial institutions managing mortgage, fixed income, and structured finance portfolios. Any references to asset classes in this article are illustrative examples of VaR methodology only. RiskSpan does not offer cryptocurrency or digital asset analytics products or services. Our analytics capabilities are focused exclusively on mortgage-backed securities, structured finance, and fixed income instruments.
CONTRIBUTOR
Don Brown
Co-Head of Quantitative Analytics
Chapter 1
Introduction

The VaR for a position or book of business can be defined as some threshold T (in dollars) where the existing position, when faced with market conditions similar to some given historical period, will have P/L greater than T with probability k. Typically, k is chosen to be 99% or 95%. To compute this threshold T, we need to:
- Set a significance percentile k, a market observation period, and a holding period n.¹
- Generate a set of future market conditions (“scenarios”) from today to period n.
- Compute a P/L on the position for each scenario
After computing each position’s P/L, we sum the P/L for each scenario and then rank the scenarios’ P/L to find the kth percentile (worst) loss.² This loss defines our VaR T at the kth percentile for holding-period length n. Determining which significance percentile k and holding period n to use is straightforward and is often dictated by regulatory rules; for example, 99th percentile 10-day VaR is used for risk-based capital under the Market Risk Rule. Generating the scenarios and computing P/L under these scenarios is open to interpretation. We cover each of these in the next two sections, with their advantages and drawbacks.
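The summing and ranking step described above can be sketched in a few lines of Python. This is a minimal illustration rather than RiskSpan's implementation: the function name `var_from_scenario_pnl`, the input layout (one list of scenario P/Ls per position), and the tail-count percentile convention are all assumptions made for the example.

```python
import math

def var_from_scenario_pnl(position_pnls, k=0.99):
    """Return VaR as a positive loss at the k-th percentile.

    position_pnls: one list per position, each holding that position's
    P/L under every scenario (hypothetical layout for this sketch).
    """
    n_scenarios = len(position_pnls[0])
    # Sum P/L across positions within each scenario.
    book_pnl = [sum(pos[s] for pos in position_pnls) for s in range(n_scenarios)]
    # Rank scenarios from worst (most negative) to best.
    ranked = sorted(book_pnl)
    # Assumed convention: the VaR scenario is the last one inside the worst
    # (1 - k) share of outcomes. round() guards against floating-point noise.
    tail = max(1, math.ceil(round((1.0 - k) * n_scenarios, 9)))
    return -ranked[tail - 1]

# Two hypothetical positions valued under 100 scenarios.
pos_a = [s - 50 for s in range(100)]
pos_b = [1] * 100
print(var_from_scenario_pnl([pos_a, pos_b], k=0.99))
```

Other percentile conventions (for example, interpolating between adjacent ranked losses) are equally defensible; the point is only that the VaR threshold is read off the ranked, book-level scenario P/L.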
Chapter 2
Generating Scenarios
To compute VaR, we first need to generate projective scenarios of market conditions. Broadly speaking, there are two ways to derive this set of scenarios:³
- Project future market conditions using a Monte Carlo simulation framework
- Project future market conditions using historical (actual) changes in market conditions
MONTE CARLO SIMULATION
Many commercial providers simulate future market conditions using Monte Carlo simulation. To do this, they must first estimate the distributions of risk factors, including correlations between risk factors. Using correlations that are derived from historical data makes the general assumption that correlations are constant within the period. As shown in the academic literature, correlations tend to change, especially in extreme market moves, which are exactly the kind of moves that tend to define the VaR threshold.⁴ By constraining correlations, VaR may be either overstated or understated depending on the structure of the position. To account for this, some providers allow users to “stress” correlations by increasing or decreasing them. Such a stress scenario is either arbitrary or informed by correlations from yet another time period (for example, a time of market stress), mixing and matching market data in an ad hoc way.
Further, many market risk factors are highly correlated, which is especially true on the interest rate curve. To account for this, some providers use a single factor for rate-level and then a second or third factor for slope and curvature of the curve. While this may be broadly representative, this approach may not capture subtle changes on other parts of the curve. This limited approach is acceptable for non-callable fixed income securities, but proves problematic when applying curve changes to complex securities such as MBS, where the security value is a function of forward mortgage rates, which in turn is a multivariate function of points on the curve and often implied volatility.
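To make the constant-correlation assumption concrete, the sketch below draws correlated shocks for two risk factors. It is a toy setup under stated assumptions: normally distributed shocks, a single fixed correlation `rho`, and hypothetical volatilities. It does not represent any particular provider's engine.

```python
import math
import random

def simulate_two_factor_shocks(n_scenarios, vol_x, vol_y, rho, seed=0):
    """Draw correlated normal shocks for two risk factors (toy setup)."""
    rng = random.Random(seed)
    shocks = []
    for _ in range(n_scenarios):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # 2x2 Cholesky factor of the correlation matrix [[1, rho], [rho, 1]].
        dx = vol_x * z1
        dy = vol_y * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        shocks.append((dx, dy))
    return shocks

def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily volatilities and correlation.
shocks = simulate_two_factor_shocks(20000, vol_x=0.0010, vol_y=0.0008, rho=0.8)
print(round(sample_correlation(shocks), 2))
```

Every scenario, including the most extreme ones, is generated with the same `rho`, which is the assumption the text above calls into question.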
HISTORICAL SIMULATION
RiskSpan projects future market conditions by using actual (observed) n-day changes in market conditions over the look-back period. For example, if we are computing 10-day VaR for regulatory capital usage under the Market Risk Rule, RiskSpan takes actual 10-day changes in market variables. This approach allows our VaR scenarios to account for natural changes in correlation under extreme market moves, such as occur during a flight-to-quality, where risky assets tend to underperform risk-free assets and risky assets tend to move in a highly correlated manner. RiskSpan believes this is a more natural way to capture changing correlations, without the arbitrary overlay of how to change correlations in extreme market moves. This, in turn, will more correctly capture VaR.⁵
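A minimal sketch of this scenario construction follows. The details are assumptions for illustration, not taken from the text: windows overlap, changes are additive (level differences rather than relative moves), and the factor names `r2y` and `r10y` are hypothetical.

```python
def historical_scenarios(history, today, n):
    """Apply every observed n-day change in the look-back window to today's levels.

    history: list of daily observations, oldest first, each a dict of
             risk factor -> level.
    today:   dict of risk factor -> current level.
    """
    scenarios = []
    for t in range(len(history) - n):
        # All factors move together, exactly as they did over days t..t+n,
        # so the historical co-movement is preserved within each scenario.
        change = {f: history[t + n][f] - history[t][f] for f in today}
        scenarios.append({f: today[f] + change[f] for f in today})
    return scenarios

# Hypothetical rate levels in basis points.
history = [
    {"r2y": 100, "r10y": 200},
    {"r2y": 105, "r10y": 203},
    {"r2y": 95, "r10y": 190},
    {"r2y": 90, "r10y": 180},
    {"r2y": 99, "r10y": 185},
]
today = {"r2y": 110, "r10y": 210}
for scenario in historical_scenarios(history, today, n=2):
    print(scenario)
```

Because each scenario reuses a whole vector of observed changes, no correlation matrix has to be estimated or stressed.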
Chapter 3
Calculating Simulated P/L
With the VaR scenarios defined, we move on to computing P/L under these scenarios. Generally, there are two methods employed:
- A Taylor approximation of P/L for each instrument, sometimes called “delta-gamma”
- A full revaluation of each instrument using its market-accepted technique for valuation
Market practitioners sometimes blend these two techniques, employing full revaluation where the valuation technique is simple (e.g. yield + spread) and using delta-gamma where revaluation is more complicated (e.g. OAS simulation on MBS).
DELTA-GAMMA P/L APPROXIMATION
Many market practitioners use a Taylor approximation or “delta-gamma” approach to valuing an instrument under each VaR scenario. For instruments whose price function is approximately linear across each of the m risk factors, users tend to use the first order Taylor approximation, where the instrument price change under the kth VaR scenario is given by

$$\Delta P \approx \sum_{i=1}^{m} \frac{\partial P}{\partial x_i}\,\Delta x_i$$

where ΔP is the simulated price change, Δxi is the change in the ith risk factor, and ∂P/∂xi is the price delta with respect to the ith risk factor evaluated at the base case. In many cases, these partial derivatives are approximated by bumping the risk factors up/down.⁶ If the instrument is slightly non-linear, but not non-linear enough to use a higher order approximation, then approximating a first derivative can be a source of error in generating simulated prices. For instruments that are approximately linear, using first order approximation is typically as good as full revaluation. From a computation standpoint, it is marginally faster but not significantly so. Instruments whose price function is approximately linear also tend to have analytic solutions to their initial price functions, for example yield-to-price, and these analytic solutions tend to be as fast as a first-order Taylor approximation.

If the instrument is non-linear, practitioners must use a higher order approximation, which introduces second-order partial derivatives. For an instrument with m risk factors, we can approximate the price change in the kth scenario by using the multivariate second order Taylor approximation

$$\Delta P \approx \sum_{i=1}^{m} \frac{\partial P}{\partial x_i}\,\Delta x_i + \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} \frac{\partial^2 P}{\partial x_i\,\partial x_j}\,\Delta x_i\,\Delta x_j$$
To simplify the application of the second-order Taylor approximation, practitioners tend to ignore many of the cross-partial terms. For example, in valuing MBS under delta-gamma, practitioners tend to simplify the approximation by using the first derivatives and a single “convexity” term, which is the second derivative of price with respect to overall rates. Using this short-cut raises a number of issues:
- It assumes that the cross-partials have little impact. For many structured products, this is not true.⁷
- Since structured products calculate deltas using finite shifts, how exactly does one calculate second-order mixed partials?⁸
- For structured products, using a single, second-order “convexity” term assumes that the second order term with respect to rates is uniform across the curve and does not vary by where you are on the curve. For complex mortgage products such as mortgage servicing rights, IOs and Inverse IOs, convexity can vary greatly depending on where you look at the curve.
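One common way to estimate a mixed partial from finite shifts is a four-point central difference, sketched below. The two-factor price surface is a made-up toy with an explicit rate-vol interaction, not an MBS model, and the bump sizes and scenario moves are hypothetical.

```python
def price(rate, vol):
    """Toy two-factor price surface with an explicit rate-vol interaction."""
    return 100.0 - 500.0 * rate - 20.0 * vol - 3000.0 * rate * vol

def cross_partial(f, x, y, hx, hy):
    """Central-difference estimate of d2f/dxdy from four bumped valuations."""
    return (f(x + hx, y + hy) - f(x + hx, y - hy)
            - f(x - hx, y + hy) + f(x - hx, y - hy)) / (4.0 * hx * hy)

rate0, vol0 = 0.03, 0.20
d_rate, d_vol = -0.01, 0.05  # hypothetical scenario: rates fall, vol rises

mixed = cross_partial(price, rate0, vol0, hx=0.0025, hy=0.01)
# In the second-order expansion the (rate, vol) and (vol, rate) terms each
# carry a factor of 1/2, so together they contribute mixed * d_rate * d_vol.
cross_term_pnl = mixed * d_rate * d_vol
full_pnl = price(rate0 + d_rate, vol0 + d_vol) - price(rate0, vol0)
print(mixed, cross_term_pnl, full_pnl)
```

Each mixed partial costs four extra valuations per pair of risk factors, and in this toy surface the cross term is the only second-order effect, so an approximation that keeps a single “convexity” term would miss it entirely.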
Using a second-order approximation assumes that the second order derivatives are constant as rates change. For MBS, this is not true in general. For example, in the graphs below we show a constant-OAS price curve for TBA FNMA 30yr 3.5%, as well as a graph of its “DV01”, or first derivative with respect to rates. As you can see, the DV01 graph is non-linear, implying that the convexity term (second derivative of the price function) is non-constant, rendering a second-order Taylor approximation a weak assumption. This is especially true for large moves in rates, the kind of moves that dominate the computation of the VaR.⁹
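The way approximation error grows with the size of the move can be seen even on a very simple instrument. The sketch below compares delta-only, delta-gamma, and full-revaluation price changes for a toy 10-year zero-coupon bond; the price function, the 3% base yield, and the ±25bp bump are assumptions chosen for illustration, and an MBS would show larger errors because its convexity changes sign and magnitude along the curve.

```python
import math

def price(y):
    """Toy price function: a 10-year zero-coupon bond, continuous compounding."""
    return 100.0 * math.exp(-10.0 * y)

def taylor_vs_full(y0, dy, bump=0.0025):
    """Return (delta-only, delta-gamma, full-revaluation) price changes."""
    p0, up, dn = price(y0), price(y0 + bump), price(y0 - bump)
    delta = (up - dn) / (2.0 * bump)          # secant-line first derivative
    gamma = (up - 2.0 * p0 + dn) / bump ** 2  # secant-line second derivative
    first_order = delta * dy
    second_order = first_order + 0.5 * gamma * dy ** 2
    full = price(y0 + dy) - p0
    return first_order, second_order, full

for dy in (0.001, 0.02):
    d, dg, full = taylor_vs_full(0.03, dy)
    print(f"dy={dy:.3f}  delta err={abs(d - full):.4f}  "
          f"delta-gamma err={abs(dg - full):.4f}")
```

In this toy case the delta-gamma error is small but still grows sharply between the 10bp and 200bp moves, and it is the large moves that sit in the tail of the scenario ranking.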


In addition to the assumptions above, we occasionally observe that commercial VaR providers compute 1-day VaR and, in the interest of computational savings, scale this 1-day VaR by √10 to generate 10-day VaR. This approximation only works if:
- Changes in risk factors are all independent and identically distributed (no autocorrelation or heteroscedasticity)
- The asset price function is linear in all risk factors
In general, neither of these conditions holds, and using a scaling factor of √10 will likely yield an incorrect value for 10-day VaR.¹⁰
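The effect of violating the first condition can be quantified in a stylized case. The sketch below assumes daily changes follow a stationary AR(1) process with coefficient `phi` (an assumption made purely for illustration) and computes the ratio of the true n-day volatility to the √n-scaled 1-day volatility. For a linear position with normally distributed changes, where VaR is proportional to volatility, the same ratio applies to VaR.

```python
import math

def sqrt_rule_ratio(phi, n=10):
    """True n-day volatility divided by sqrt(n) times 1-day volatility,
    for daily changes following a stationary AR(1) with coefficient phi."""
    # Var(sum of n changes) = var * (n + 2 * sum_{k=1}^{n-1} (n - k) * phi**k)
    autocov_sum = sum((n - k) * phi ** k for k in range(1, n))
    return math.sqrt(1.0 + 2.0 * autocov_sum / n)

for phi in (0.0, 0.2, -0.2):
    print(phi, round(sqrt_rule_ratio(phi), 3))
```

With no autocorrelation the ratio is exactly 1 and the √10 rule holds. Positive autocorrelation pushes the ratio above 1, so the scaled figure understates 10-day risk, while negative autocorrelation does the opposite.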
RATIONALIZING WEAKNESS IN THE APPROXIMATION
With the weaknesses in the Taylor approximation cited above, why do some providers still use delta-gamma VaR? Most practitioners will cite that the Taylor approximation is much faster than full revaluation for complex, non-linear instruments. While this seems true at first glance, you still need to:
- Compute or approximate all the first partial derivatives
- Compute or approximate some of the second partial derivatives and decide which are relevant or irrelevant. This choice may vary from security type to security type.
Neither of these tasks is computationally simple for the complex, path-dependent securities found in many portfolios. Further, the choice of which second-order terms to ignore has to be supported by documentation to satisfy regulators under the Market Risk Rule.
Even after approximating partials and making multiple, qualitative assessments of which second-order terms to include/exclude, we are still left with error from the Taylor approximation. This error grows with the size of the market move, and large moves also tend to be the scenarios that dominate the VaR calculation. With today’s flexible cloud computation and ultra-fast, cheap processing, the Taylor approximation and its computation of partials end up being only marginally faster than a full revaluation for complex instruments.¹¹
With the weaknesses in Taylor approximation, especially with non-linear instruments, and the speed and cheapness of full revaluation, we believe that fully revaluing each instrument in each scenario is both more accurate and more straightforward than having to defend a raft of assumptions around the Taylor approximation.
Chapter 4
Conclusion
With these points in mind, what is the best method for computing VaR? Considering the complexity of many instruments, and considering the comparatively cheap and fast computation available through today’s cloud computing, we believe that calculating VaR using a historical-scenario, full revaluation approach provides the most accurate and robust VaR framework.
From a scenario generation standpoint, using historical scenarios allows risk factors to evolve in a natural way. This in turn captures actual changes in risk factor correlations, changes which can be especially acute in large market moves. In contrast, a Monte Carlo simulation of scenarios typically allows users to “stress” correlations, but these stress scenarios are arbitrary, which may ultimately lead to misstated risk.
From a valuation standpoint, we feel that full revaluation of assets provides the most accurate representation of risk, especially for complex instruments such as complex ABS and MBS securities. The assumptions and errors introduced in the Taylor approximation may overwhelm any minor savings in run-time, given today’s powerful and cheap cloud analytics. Further, the Taylor approximation forces users to make and defend qualitative judgements of which partial derivatives to include and which to ignore. This greatly increases the management burden around VaR as well as regulatory scrutiny around justifying these assumptions.
In short, we believe that a historical scenario, full-revaluation VaR provides the most accurate representation of VaR, and that today’s cheap and powerful computing make this approach feasible for most books and trading positions. For VaR, it’s no longer necessary to settle for second-best.
References
ENDNOTES
1 The holding period n is typically one day, ten days, or 21 days (a business-month) although in theory it can be any length period.
2 We can also partition the book into different sub-books or “equivalence classes” and compute VaR on each class in the partition. The entire book is the trivial partition.
3 There is a third approach to VaR: parametric VaR, where the distributions of asset prices are described by well-known distributions such as the Gaussian. Given the often-observed heavy-tail distributions, combined with difficulties in valuing complex assets with non-linear payoffs, we will ignore parametric VaR in this review.
4 The academic literature contains many papers on increased correlation during extreme market moves; see, for example, [5].
5 For example, a bank may have positions in two FX pairs that are poorly correlated in normal times and highly negatively correlated in times of stress. If a 99th-percentile worst move coincides with a stress period, then the aggregate P/L from the two positions may offset each other. If we used the overall correlation to drive a Monte Carlo simulated VaR, the calculated VaR could be much higher.
6 This is especially common in MBS, where the first and second derivatives are computed using a secant-line approximation after shifting risk factors, such as shifting rates ± 25bp.
7 For example, as rates fall and a mortgage becomes more refinanceable, the mortgage’s exposure to implied volatility also increases, implying that the cross-partial for price with respect to rates and vol is non-zero.
8 Further, since we are using finite shifts, the typical assumption that ƒxy = ƒyx, which is based on the smoothness of ƒ(x,y), does not necessarily hold. Therefore, we need to compute two sets of cross partials, further increasing the initial setup time.
9 Why is the second derivative non-constant? As rates move significantly, prepayments stop rising or falling. At these “endpoints,” cash flows on the mortgage change little, making the instrument positively convex like a fixed-amortization schedule bond. In between, changes in prepayments cause the mortgage to extend or shorten as rates rise or fall, respectively, which in turn makes the MBS negatively convex.
10 Much has been written on the weakness of this scaling; see, for example, [7].
11 For example, using a flexible computation grid, RiskSpan can perform a full OAS revaluation on 20,000 MBS passthroughs using a 250-day lookback period in under one hour. Lattice-solved options are an order of magnitude faster, and analytic instruments such as forwards, European options, futures and FX are even faster.
Using RS Edge Data to Quantify the Impact of the QM Patch Expiration
A 2014 Consumer Financial Protection Bureau (CFPB) rule established that mortgages purchased by the GSEs (Fannie Mae or Freddie Mac) can be considered “qualified” even if their debt-to-income ratio (DTI) exceeds 43 percent. This provision is known as the “qualified mortgage (QM) patch” or sometimes the “GSE patch.” It has become one of the most important holdouts of the Dodd-Frank Act and an important facilitator of U.S. lending activity under looser credit standards. The CFPB implemented the patch to encourage lenders to make loans that do not meet QM requirements, but are still “responsibly underwritten.” Because all GSE loans must pass the strict standards for conforming mortgages, they are presumed to be reasonably underwritten–notwithstanding sometimes having DTI ratios higher than 43 percent.
The QM patch is set to expire on January 10, 2021. This phaseout has spawned concern over the impact both on mortgage originators and potentially on borrowers when the patch is no longer available and GSEs are less apt to purchase loans with higher DTI ratios.[1]
We performed an analysis of GSE loan data housed in RiskSpan’s RS Edge platform to quantify this potential impact.
The Good News:
The slowdown in purchases of high-DTI loans is already occurring, which could partially mitigate the impact of the expiration of the patch.
We used RS Edge to analyze the percentage of QM loans to which the patch applies today. From 2016 through the beginning of 2019, Fannie and Freddie sharply increased their purchases of loans with DTI ratios greater than 43 percent, with these loans accounting for over 34 percent of Fannie’s purchases as recently as February 2019 and over 30 percent of Freddie’s purchases in November 2018 (see Figure 1).
Figure 1: % of GSE Acquisitions with DTI > 43 (2016 – 2019)

Our data shows, however, that Fannie and Freddie have already begun to wind down purchases of these loans. By the end of 2019, only about 23 percent of GSE loans purchased had DTI greater than 43 percent. This is illustrated more clearly in Figure 2, below.
Figure 2: % of GSE Acquisitions with DTI > 43 (2019 only)

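The share plotted in these figures is a simple ratio per acquisition month. A minimal sketch of the calculation follows, using made-up loan records; the field names `acq_month` and `dti` are hypothetical and do not reflect the RS Edge schema, and the sketch assumes a count-based share rather than one weighted by UPB.

```python
from collections import defaultdict

def high_dti_share_by_month(loans, threshold=43.0):
    """Share of acquired loans with DTI above the threshold, by month."""
    counts = defaultdict(lambda: [0, 0])  # month -> [high-DTI count, total count]
    for loan in loans:
        bucket = counts[loan["acq_month"]]
        bucket[1] += 1
        if loan["dti"] > threshold:
            bucket[0] += 1
    return {month: high / total for month, (high, total) in sorted(counts.items())}

# Illustrative records only, not actual GSE data.
loans = [
    {"acq_month": "2019-02", "dti": 45.0},
    {"acq_month": "2019-02", "dti": 38.0},
    {"acq_month": "2019-02", "dti": 47.5},
    {"acq_month": "2019-12", "dti": 44.0},
    {"acq_month": "2019-12", "dti": 30.0},
    {"acq_month": "2019-12", "dti": 41.0},
    {"acq_month": "2019-12", "dti": 36.0},
]
print(high_dti_share_by_month(loans))
```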
As discussed in the December 2019 Wall Street Journal article “Fannie Mae and Freddie Mac Curb Some Loans as Regulator Reins in Risk,” the wind-down could be related to the GSEs’ general efforts to hold stronger portfolios as they aim to climb out of conservatorship. However, our data suggests an equally plausible explanation for the slowdown. Borrowers generally exhibit a greater willingness to stretch their incomes to buy a house than to refinance, so purchase loans are more likely than refinancings to feature higher DTI ratios. Figure 3 illustrates this phenomenon.
Figure 3: Most High-DTI Loans Back Home Purchases

The Bad News:
The bad news, of course, is that one-fifth of Freddie and Fannie loans purchased with DTI>43% is still significant. Over 900,000 mortgages purchased by the GSEs in 2019 were of the High-DTI variety, accounting for over $240 billion in UPB.
In theory, these 900,000 borrowers will no longer have a way of being slotted into QM loans after the patch expires next year. While this could be good news for the non-QM market, which would potentially be poised to capture this new business, it may not be the best news for these borrowers, who likely do not fancy paying the higher interest rates generally associated with non-QM lending.
Originators, not relishing the prospect of losing QM protection for these loans, have also expressed concern about the phaseout of the patch. A group of lenders that includes Wells Fargo and Quicken Loans has petitioned the CFPB to completely eliminate the DTI requirements under ability-to-repay rules.
Figure 4: % of DTI>43 Loans Sold to GSEs by Originator

We will be closely monitoring the situation and continuing to offer tools that will help to quantify the potential impact of the expiration.
[1] Consumer Financial Protection Bureau, July 25, 2019.
Bill Moretti, Industry Leader in Structured Finance and Fintech Joins RiskSpan to Lead Innovation Lab
ARLINGTON, VA, January 10, 2020 –
Bill Moretti, an industry leader at the intersection of structured finance, financial technology, and portfolio management, has joined RiskSpan as a Senior Managing Director and head of its SmartLink innovation lab.
Over the course of his two-decade tenure as a senior investment executive with MetLife, Bill became recognized as an innovative and energetic leader, strategic thinker, change agent, and savvy risk manager. As MetLife’s head of Global Structured Finance, Bill created proprietary analytical systems, which he paired with traditional fundamental credit analysis to maximize portfolio income and returns through market rallies while preserving investment capital during crises.
“Bill is exactly who we were looking for, and we are delighted to have his unique blend of expertise,” said RiskSpan CEO Bernadette Kogler. “Bill’s track record as a successful implementer of disruptive solutions in capital markets—an industry with a history of stubbornness when it comes to technology innovation—makes him a perfect complement to RiskSpan’s talent portfolio.”
Bill co-chairs the Structured Finance Association’s Technology Innovation Committee and is a past chairman and current member of the American Council of Life Insurers’ Advisory Committee.
On January 30th, Bill will join industry veterans Bernadette Kogler and Suhrud Dagli for a free webinar discussing the need for better data and analytics in a changing fixed-income market. Register now for 2020: Entering The Decade in Data & Smart Analytics.
About RiskSpan
RiskSpan simplifies the management of complex data and models in the capital markets, commercial banking, and insurance industries. We transform seemingly unmanageable loan data and securities data into productive business analytics.
Media Contact
Timothy Willis
Email: info@riskspan.com
Phone: (703) 956-5200
Institutionally Focused Broker-Dealer: Prepayment Analysis
An institutional broker-dealer needed a solution to analyze agency MBS prepayment data.
The Solution
The Edge Platform has been adopted and is actively used by the Agency trading desk to analyze Agency MBS prepayment data and to discover relationships between borrower characteristics and prepayment behavior.
Commercial Bank: CECL Model Validation
A commercial bank required an independent validation of its CECL models. The models are embedded into three platforms (Trepp, ImpairmentStudio and EVOLV) and included the following:
- Trepp Default Model (Trepp DM) is used by the Bank to estimate the PD, LGD and EL of the CRE portfolio
- Moody’s ImpairmentStudio – Lifetime Loss Rate (LLR) Model is used to calculate the Lifetime Loss Rate for the C&I portfolio
- EVOLV – Lifetime Loss Rate (LLR) model is used to calculate the Lifetime Loss Rate for Capital Call and Venture Capital loans within the Commercial and Industrial (C&I) segment, Non-rated Commercial loans, Consumer as well as Municipal loans
- EVOLV – Base Loss Rate (BLR) model is used to calculate the quantitative allowance for 1-4 Family commercial loans and Personal loans for commercial use within the C&I segment, Residential loans, HELOC and Indirect vehicle.
The Solution
Because the CECL models are embedded into three platforms, RiskSpan conducted an independent, comprehensive validation of all three platforms.
Our validation included components typical of a full-scope model validation, focusing on a conceptual soundness review, process verification and outcomes analysis.
Deliverables
RiskSpan was given access to the models’ platforms, and workpapers, along with the models’ development documentation, and weekly Q&A sessions with the model owners.
Our review evaluated:
i. the business requirements and purpose of the model, and the metrics used by the developer to select the best model and by which its success in meeting these requirements will be judged.
ii. the identification and justification for
(a) any theoretical basis for the model structure;
(b) the use of specific developmental data;
(c) the use of any statistical or econometric technique to estimate the model; and
(d) the criteria used to identify and select the best model among alternatives.
iii. the reasonableness of model-development decisions, documented assumptions, data adjustments, and model-performance criteria as measured at the time of development.
iv. Process verification to determine the accuracy of data transcription, adjustment, transformation and model code.
RiskSpan produced a written validation report detailing its validation assessments, tests, and findings, and providing a summary assessment of the suitability of the models for their intended uses as an input to the bank’s CECL process, based upon the Conceptual Soundness Review and Process Verification.
Regional Bank: AML/BSA Model Validation
A large regional bank required a qualified, independent third party to perform risk-based procedures designed to provide reasonable assurance that its FCRM anti-money laundering system’s transaction monitoring, customer risk rating, and watch list filtering applications were functioning as designed and intended.
The Solution
RiskSpan reviewed existing materials, past audits and results, testing protocols and all documentation related to the bank’s model risk management standards, model setup and execution. We inventoried all model data sources, scoring processes and outputs related to the AML system.
The solution consisted of testing each of the five model segments: Design and Development; Input Processing; Implementation; Output and Use; and Performance.
The solution also quantified risk and exposure of identified gaps and limitations and presented sound industry practices and resolutions.
Deliverables
- A sustainable and robust transaction monitoring tuning methodology, which documented the bank’s approach, processes to be executed, frequency of execution, and the governance structure for executing tuning and optimization in the AML model. This included collecting and assessing previous regulatory feedback.
- A framework that included a formal, documented, consistent process for sampling and analysis procedures to evaluate the AML system’s scenarios and change control documentation.
- A process for managing model risk consistent with the bank’s examiner expectations and business needs.

