Articles Tagged with: Model Validation

A Discussion About SAS to Python Migration Trends

Recently, quite a few financial institutions have begun migrating their modelling code bases from SAS to Python. There are many reasons for this, some of which may be unique to the organization in question, but many apply universally. RiskSpan is actively helping clients with the switch from SAS to Python since our team has expertise in both coding languages, as well as detailed knowledge about the financial models and their purpose. This article dives into the main reasons for this transition.

  1. Popularity:

In recent years, the popularity of Python has really skyrocketed. It is one of the most user-friendly programming languages in use today, because of the intuitive syntax, as well as the wide array of packages available to aid in development. Due to these properties, many users who may not have a coding background are using this language as a gateway into the world of software development and leveraging it as a building block to expand their toolbox of professional qualifications.

From a company’s perspective, Python is an open source language with tons of resources, and low overhead costs. This, coupled with the popularity among both developers and analysts, makes it a great choice when deciding to migrate your codebase.

Note: R is also a popular, open source language which is a powerful tool for data analytics. However, while R is specifically used for statistical analysis, Python can be used for a whole range of uses (UI design, web development, business applications, etc.). This flexibility makes Python attractive to companies that want synchronicity across their tools, or the ability to have their developers transition between teams relatively seamlessly. R remains popular in academic circles where a powerful, easy to understand tool is needed to perform statistical analysis, but additional flexibility is not necessarily required. Hence, R is not discussed in this blog post.

Python’s open source nature has certain drawbacks, however. There is less oversight when new features or packages are added, so while updates may arrive more quickly, they are also more prone to error than those of a product like SAS, whose software updates are thoroughly tested before release.

  2. Visualization Capabilities:

While SAS and Python support data visualization, the consensus is that SAS has very basic data visualization tools. To create more advanced visuals with SAS, the SAS Visual Analytics platform needs to be used, which comes at an added cost.

Python has packages such as matplotlib, plotly, seaborn, etc. which can be leveraged to create powerful and detailed visualizations by simply importing the libraries into the existing codebase.

  3. Accessibility:

SAS is a command-driven software package used for statistical analysis and data visualization. It runs on Windows as well as on Linux and UNIX servers. It is arguably one of the most widely used statistical software packages in both industry and academia.

For financial institutions with large amounts of data, SAS has been an extremely valuable tool. It is a well-documented language, with many online resources, and is considered relatively intuitive to pick up and understand – especially when users have prior experience with SQL. SAS is also one of the few tools which has a customer support line. Currently, SAS has the biggest market share as far as data analytics software is concerned, for these very reasons.

However, SAS is a paid service, and at a standalone level, the costs can be quite prohibitive. The cost of using SAS is also a barrier for smaller companies and start-ups. Hence, complete access to the full breadth of SAS and its supporting tools is only available to larger and more established organizations. While this makes sense, it has also been the main cause of SAS’s recent drop-off in popularity: new users cannot access it as easily as Python. While an academic/university version of the software is available free of charge for individual use, its feature set is limited. Therefore, for new users and start-up companies alike, SAS may not be the best choice, despite being a powerful tool. Additionally, with the expansion and maturity of the variety of packages that Python offers, many of the analytical abilities of Python rival those of SAS, making it an attractive option for larger organizations as well, since migration from SAS to Python will prove quite cost effective in the long run.

  4. Future of tech:

Many of the expected advances in data analytics, and in tech in general, seem to be pointing toward deep learning and machine learning.

Deep learning is a subset of machine learning, which in turn is a subset of artificial intelligence. Machine learning is the study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead. Deep learning is the process of implementing machine learning practices using artificial neural networks.

To different extents, deep learning and machine learning help computer systems recalibrate constantly so that they learn from updates or changes to the data fed into their models. Since the process should involve minimal input from a developer in the future, it is an attractive option for companies which deal with a large amount of data. While the technology to analyze data completely independently is still some way off, many companies have already begun to lay the groundwork for these future developments. Currently, Python is better placed to make headway in this avenue. With its rapidly expanding libraries for artificial intelligence and machine learning, transferring code bases to Python now may make future transitions to deep learning algorithms more seamless.

While SAS has made some strides toward adding machine learning and deep learning functionalities to its repertoire, Python is far ahead in this sphere. In fact, Python has been consistently ranked as the best language for deep learning and machine learning projects in recent years. This creates a symbiotic relationship between the language and its users: developers use Python to develop ML projects since it is currently best suited for the job, which in turn expands Python’s ML capabilities. This cycle practically cements Python’s position as the best language for future development in the AI sphere.

This factor is likely to have the greatest impact on the long-term technology decisions that data-driven companies will have to make.

Challenges during SAS to Python migrations:

While SAS to Python migrations are a route that some companies are considering, the process comes with its own set of unique challenges. Some of them are as follows:

  1. Python has more memory overhead compared to SAS

While Python might have similar capabilities to SAS regarding data analytics, Python has more memory overhead compared to SAS. As a result, when companies are working with extremely large datasets, they may need to factor in the cost of extra server space.

  2. SAS has its own server

SAS has its own server, which is used to run all SAS commands. This tightly controlled ecosystem makes SAS much faster than Python, which does not have the same infrastructure out of the box. Therefore, optimizing Python code can be a significant challenge during SAS to Python migrations.

  3. SAS packages vs Python packages

Calculations done using SAS packages vs Python packages can have minuscule differences in value. Depending on the type of data, this can pose an issue, since getting an exact match between values calculated in SAS and values calculated in Python may be difficult.

For example, neither language stores most decimal values exactly. A result whose true value is 0 can surface in SAS as a tiny floating-point residual such as 3.552714E-150, while in Python the float 0.1 is stored as the binary fraction 3602879701896397/2^55. For most intents and purposes, these approximations do not create noticeable differences in calculations. However, some financial models can be exacting, and over the course of multiple calculations which build upon each other, you may see differences in fractional values.
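To make the floating-point issue concrete, the short Python sketch below shows how the float 0.1 is stored (as the binary fraction 3602879701896397/2^55) and why exact equality checks tend to fail when calculations build on each other. The `values_match` helper and its tolerances are illustrative assumptions rather than a standard; the right tolerance depends on the model being reconciled.

```python
from fractions import Fraction
import math

# Python stores 0.1 as the nearest binary fraction, not as exactly 1/10.
stored_tenth = Fraction(0.1)
is_documented_fraction = stored_tenth == Fraction(3602879701896397, 2**55)

# Representation errors accumulate when calculations build on each other.
running_total = 0.0
for _ in range(10):
    running_total += 0.1
exact_match = running_total == 1.0  # False: the total lands just below 1.0


def values_match(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Hypothetical reconciliation check: equal within a tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


print(is_documented_fraction, exact_match, values_match(running_total, 1.0))
```

In a migration, a tolerance-based check of this kind is one way to decide whether a SAS output and its Python counterpart agree closely enough, instead of insisting on bit-for-bit equality.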

  4. Comparing large datasets

One of the most common tasks when working with large datasets is examining how they change over time. SAS has a built-in procedure, PROC COMPARE, which compares datasets swiftly and easily. Python has packages that can do this as well, but they are not as robust as their SAS counterparts. This is therefore a possible challenge that needs to be considered.
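As a rough illustration of what a dataset comparison involves, here is a minimal, standard-library-only Python sketch that matches rows on a key and reports rows present on one side only, plus cell-level differences beyond a numeric tolerance. The function name, the `loan_id` key, and the sample rows are all hypothetical, and this is far from a replacement for the SAS procedure.

```python
import math


def compare_datasets(base, other, key, rel_tol=1e-9):
    """Compare two lists of dict rows that share a unique key column.

    Returns (keys only in base, keys only in other, differing cells),
    where each differing cell is (key, column, base_value, other_value).
    """
    base_by_key = {row[key]: row for row in base}
    other_by_key = {row[key]: row for row in other}
    only_base = sorted(base_by_key.keys() - other_by_key.keys())
    only_other = sorted(other_by_key.keys() - base_by_key.keys())

    diffs = []
    for k in sorted(base_by_key.keys() & other_by_key.keys()):
        b, o = base_by_key[k], other_by_key[k]
        for col in sorted(b.keys() | o.keys()):
            bv, ov = b.get(col), o.get(col)
            numeric = isinstance(bv, (int, float)) and isinstance(ov, (int, float))
            # Numbers are compared with a tolerance, everything else exactly.
            same = math.isclose(bv, ov, rel_tol=rel_tol) if numeric else bv == ov
            if not same:
                diffs.append((k, col, bv, ov))
    return only_base, only_other, diffs


# Hypothetical outputs from the legacy and migrated code paths.
sas_rows = [
    {"loan_id": 1, "balance": 100.0, "status": "current"},
    {"loan_id": 2, "balance": 250.5, "status": "current"},
    {"loan_id": 3, "balance": 75.0, "status": "delinquent"},
]
python_rows = [
    {"loan_id": 1, "balance": 100.0 + 1e-13, "status": "current"},
    {"loan_id": 2, "balance": 251.0, "status": "current"},
    {"loan_id": 4, "balance": 10.0, "status": "current"},
]

only_sas, only_python, cell_diffs = compare_datasets(sas_rows, python_rows, "loan_id")
print(only_sas, only_python, cell_diffs)
```

Loan 1 is treated as a match because its balances differ only by floating-point noise, while loan 2 is reported as a genuine difference.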

In conclusion, there are many advantages and many challenges when considering SAS to Python migration, and this article aimed to shed light on some of the most important factors. Ultimately, it will come down to each organization’s specific needs when making the decision about whether to pursue this path.

Changes to Loss Models…and How to Validate Them

So you’re updating all your modeling assumptions. Don’t forget about governance.

Modelers have now been grappling with how COVID-19 should affect assumptions and forecasts for nearly two months. This exercise is raising at least as many questions as it is answering.

No credit model (perhaps no model at all) is immune. Among the latest examples are mortgage servicers having to confront how to bring their forbearance and loss models into alignment with new realities.

These new realities are requiring servicers to model unprecedented macroeconomic conditions in a new and changing regulatory environment. The generous mortgage forbearance provisions ushered in by March’s CARES Act are not tantamount to loan forgiveness. But servicers probably shouldn’t count on reimbursement of their forbearance advances until loan liquidation (irrespective of what form the payoff takes).

The ramifications of these costs and how servicers should model them is a central topic to be addressed in a Mortgage Bankers Association webinar on Wednesday, May 13, “Modeling Forbearance Losses in the COVID-19 world” (free for MBA members). RiskSpan CEO Bernadette Kogler will lead a panel consisting of Faith Schwartz, Suhrud Dagli, and Morgan Snyder in a discussion of forbearance’s regulatory implications, the limitations of existing models, and best practices for modeling forbearance-related advances, losses, and operational costs.

Models, of course, are only as good as their underlying data and assumptions. When it comes to forbearance modeling, those assumptions obviously have a lot to do with unemployment, but also with the forbearance take-up rate layered on top of more conventional assumptions around rates of delinquency, cures, modifications, and bankruptcies.

The unique nature of this crisis requires modelers to expand their horizons in search of applicable data. For example, GSE data showing how delinquencies trend in rising unemployment scenarios might need to be supplemented by data from Greek or other European crises to better simulate extraordinarily high unemployment rates. Expense and liquidation timing assumptions will likely require looking at GSE and private-label data from the 2008 crisis. Having reliable assumptions around these is critically important because liquidity issues associated with servicing advances are often more an issue of timing than of anything else.

Model adjustments of the magnitude necessary to align them with current conditions almost certainly qualify as “material changes” and present a unique set of challenges to model validators. In addition to confronting an expanded workload brought on by having to re-validate models that might have been validated as recently as a few months ago, validators must also effectively challenge the new assumptions themselves. This will likely prove challenging absent historical context.

RiskSpan’s David Andrukonis will address many of these challenges, particularly as they relate to CECL modeling, as he participates in a free webinar, “Model Risk Management and the Impacts of COVID-19,” sponsored by the Risk Management Association. Perhaps fittingly, this webinar will run concurrently with the MBA webinar discussed above.

As is always the case, the smoothness of these model-change validations will depend on the lengths to which modelers are willing to go to thoroughly document their justifications for the new assumptions. This becomes particularly important when introducing assumptions that significantly differ from those that have been used previously. While it will not be difficult to defend the need for changes, justifying the individual changes themselves will prove more challenging. To this end, meticulously documenting every step of feature selection during the modeling process is critical not only in getting to a reliable model but also in ensuring an efficient validation process.

Documenting what they’re doing and why they’re doing it is no modeler’s favorite part of the job—particularly when operating in crisis mode and just trying to stand up a workable solution as quickly as possible. But applying assumptions that have never been used before always attracts increased scrutiny. Modelers will need to get into the habit of memorializing not only the decisions made regarding data and assumptions, but also the other options considered, and why the other considered options were ultimately passed over.

Documenting this decision-making process is far easier at the time it happens, while the details are fresh in a modeler’s mind, than several months down the road when people inevitably start probing.

Invest in the “ounce of prevention” now. You’ll thank yourself when model validation comes knocking.

Webinar: Applying Model Validation Principles to Anti-Money Laundering Tools



This webinar will explore some of the more efficient ways we have encountered for applying model validation principles to AML tools, including:

  • Ensuring that the rationale supporting rules and thresholds is sufficiently documented 
  • Applying above-the-line and below-the-line testing to an effective benchmarking regime 
  • Assessing the relevance of rules that are seldom triggered or frequently overridden 

About The Hosts

Timothy Willis

Managing Director – RiskSpan

Timothy Willis is head of RiskSpan’s Governance and Controls Practice, with a particular focus on model risk management. He is an experienced engagement manager, financial model validator and mortgage industry analyst who regularly authors and oversees the delivery of technical reports tailored to executive management and regulatory audiences.

Tim has directed projects validating virtually every type of model used by banks. He has also developed business requirements and improved processes for commercial banks of all sizes, mortgage banks, mortgage servicers, Federal Home Loan Banks, rating agencies, Fannie Mae, Freddie Mac, and U.S. Government agencies.

Susan Devine, CAMS, CPA

Senior Consultant – Third Pillar Consulting

Susan has more than twenty years of experience as an independent consultant providing business analysis, financial model validations, anti-money laundering reviews in compliance with the Bank Secrecy Act, and technical writing to government and commercial entities. Experience includes developing and documenting business processes, business requirements, security requirements, computer systems, networks, systems development lifecycle activities, and financial models. Experience related to business processes includes business process reviews, security plans in compliance with NIST and GISRA, Sarbanes Oxley compliance documents, Dodd-Frank Annual Stress Testing, functional and technical requirements for application development projects, policies, standards, and operating procedures for business and technology processes.

Chris Marsten

Financial and Data Analyst – RiskSpan

Chris is a financial and data analyst at RiskSpan where he develops automated analytics and reporting for client loan portfolios and provides data analysis in support of model validation projects. He also possesses extensive experience writing ETL code and automating manual processes. Prior to coming to RiskSpan, he developed and managed models for detecting money laundering and terrorist activity for Capital One Financial Corporation, where he also forecasted high-risk customer volumes and created an alert investigation tool for identifying suspicious customers and transactions.

Webinar: Building and Running an Efficient Model Governance Program



Join RiskSpan Model Governance Expert Tim Willis for a webinar about running an efficient program. This webinar will cover essential elements of a model risk management policy, including how to devise policies for open-source models and other applications not easily categorized. He’ll also discuss best practices for building and maintaining a model inventory, along with tips for assigning appropriate risk ratings to models and determining validation frequency.

About The Host

Timothy Willis

Managing Director – RiskSpan

Timothy Willis is head of RiskSpan’s Governance and Controls Practice, with a particular focus on model risk management. He is an experienced engagement manager, financial model validator and mortgage industry analyst who regularly authors and oversees the delivery of technical reports tailored to executive management and regulatory audiences.

Webinar: Managing Down Model Validation Costs



Learn how to make your model validation budget go further. In this webinar, you’ll learn about balancing internal and external resources, prioritizing the models with the most risk, and documenting to facilitate the process.

About The Hosts

Timothy Willis

Managing Director – RiskSpan

Timothy Willis is an experienced engagement manager, financial model validator and mortgage industry analyst who regularly authors and oversees the delivery of technical reports tailored to executive management and regulatory audiences. Tim has directed projects validating virtually every type of model used by banks. He has also developed business requirements and improved processes for commercial banks of all sizes, mortgage banks, mortgage servicers, Federal Home Loan Banks, rating agencies, Fannie Mae, Freddie Mac, and U.S. Government agencies.

Nick Young

Director of Model Risk Management

Nick Young has more than ten years of experience as a quantitative analyst and economist. At RiskSpan, he performs model validation, development and governance on a wide variety of models including those used for Basel capital planning, reserve/impairment, Asset Liability Management (ALM), CCAR/DFAST stress testing, credit origination, default, prepayment, market risk, Anti-Money Laundering (AML), fair lending, fraud and account management.

eBook: A Validator’s Guide to Model Risk Management



Learn from RiskSpan model validation experts what constitutes a model, considerations for validating vendor models, how to prepare, how to determine scope, comparisons of performance metrics, and considerations for evaluating model inputs.

Model Validation Programs – Optimizing Value in Model Risk Groups

Watch RiskSpan Managing Director, Tim Willis, discuss how to optimize model validation programs. RiskSpan’s model risk management practice has experience in both building and validating models, giving us unique expertise to provide very high quality validations without diving into activities and exercises of marginal value.




Here Come the CECL Models: What Model Validators Need to Know

As it turns out, model validation managers at regional banks didn’t get much time to contemplate what they would do with all their newly discovered free time. Passage of the Economic Growth, Regulatory Relief, and Consumer Protection Act appears to have relieved many model validators of the annual DFAST burden. But as one class of models exits the inventory, a new class enters—CECL models.

Banks everywhere are nearing the end of a multi-year scramble to implement a raft of new credit models designed to forecast life-of-loan performance for the purpose of determining appropriate credit-loss allowances under the Financial Accounting Standards Board’s new Current Expected Credit Loss (CECL) standard, which takes full effect in 2020 for public filers and 2021 for others.

The number of new models CECL adds to each bank’s inventory will depend on the diversity of asset portfolios. More asset classes and more segmentation will mean more models to validate. Generally, model risk managers should count on having to validate at least one CECL model for every loan and debt security type (residential mortgage, CRE, plus all the various subcategories of consumer and C&I loans) plus potentially any challenger models the bank may have developed.

In many respects, tomorrow’s CECL model validations will simply replace today’s allowance for loan and lease losses (ALLL) model validations. But CECL models differ from traditional allowance models. Under the current standard, allowance models typically forecast losses over a one-to-two-year horizon. CECL requires a life-of-loan forecast, and a model’s inputs are explicitly constrained by the standard. Accounting rules also dictate how a bank may translate the modeled performance of a financial asset (the CECL model’s outputs) into an allowance. Model validators need to be just as familiar with the standards governing how these inputs and outputs are handled as they are with the conceptual soundness and mathematical theory of the credit models themselves.

CECL Model Inputs – And the Magic of Mean Reversion

Not unlike DFAST models, CECL models rely on a combination of loan-level characteristics and macroeconomic assumptions. Macroeconomic assumptions are problematic with a life-of-loan credit loss model (particularly with long-lived assets—mortgages, for instance) because no one can reasonably forecast what the economy is going to look like six years from now. (No one really knows what it will look like six months from now, either, but we need to start somewhere.) The CECL standard accounts for this reality by requiring modelers to consider macroeconomic input assumptions in two separate phases: 1) a “reasonable and supportable” forecast covering the time frame over which the entity can make or obtain such a forecast (two or three years is emerging as common practice for this time frame), and 2) a “mean reversion” forecast based on long-term historical averages for the out years. As an alternative to mean reverting by the inputs, entities may instead bypass their models in the out years and revert to long-term average performance outcomes by the relevant loan characteristics.

Assessing these assumptions (and others like them) requires a model validator to simultaneously wear a “conceptual soundness” testing hat and an “accounting policy” compliance hat. Because the purpose of the CECL model is to provide an accounting answer and satisfy an accounting requirement, what can validators reasonably conclude when confronted with an assumption that may seem unsound from a purely statistical point of view but nevertheless satisfies the accounting standard?

Taking the mean reversion requirement as an example, the projected performance of loans and securities beyond the “reasonable and supportable” period is permitted to revert to the mean in one of two ways: 1) modelers can feed long-term history into the model by supplying average values for macroeconomic inputs, allowing modeled results to revert to long-term means in that way, or 2) modelers can mean revert “by the outputs,” bypassing the model and populating the remainder of the forecast with long-term average performance outcomes (prepayment, default, recovery and/or loss rates depending on the methodology). Either of these approaches could conceivably result in a modeler relying on assumptions that may be defensible from an accounting perspective despite being statistically dubious, but the first is particularly likely to raise a validator’s eyebrow. The loss rates that a model will predict when fed “average” macroeconomic input assumptions are always going to be uncharacteristically low. (Because credit losses are generally large in bad macroeconomic environments and low in average and good environments, long-term average credit losses are higher than the credit losses that occur during average environments. A model tuned to this reality, and fed one path of “average” macroeconomic inputs, will return credit losses substantially lower than long-term average credit losses.) A credit risk modeler is likely to think that these are not particularly realistic projections, but an auditor following the letter of the standard may choose not to find any fault with them. In such situations, validators need to fall somewhere in between these two extremes, keeping in mind that the underlying purpose of CECL models is to reasonably fulfill an accounting requirement, before hastily issuing a series of high-risk validation findings.
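The parenthetical argument above can be checked with a few lines of Python. The sketch below uses a made-up convex loss curve and a made-up unemployment history (neither is calibrated to any real portfolio) to show that feeding a model the average input produces a lower loss rate than averaging the losses the model would have produced across the history.

```python
from statistics import mean


def annual_loss_rate(unemployment_pct):
    """Hypothetical convex loss curve: losses climb steeply once
    unemployment rises above 5%. Purely illustrative."""
    stress = max(unemployment_pct - 5.0, 0.0)
    return 0.002 + 0.0015 * stress ** 2


# Illustrative long-term history of unemployment rates (percent),
# including a short stress episode.
history = [4.0, 4.5, 5.0, 5.5, 6.0, 9.5, 8.0, 5.0, 4.0, 3.5]

# Mean reversion "by the inputs": run the model once on the average input.
loss_at_average_inputs = annual_loss_rate(mean(history))

# Mean reversion "by the outputs": average the losses across the history.
average_of_losses = mean(annual_loss_rate(u) for u in history)

print(f"loss at average inputs: {loss_at_average_inputs:.4%}")
print(f"average of losses:      {average_of_losses:.4%}")
```

With these assumed numbers, the input-reverted loss rate is well under half of the long-term average loss rate, which is the kind of gap a validator would want to see acknowledged and explained.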

CECL Model Outputs: What are they?

CECL models differ from some other models in that the allowance (the figure that modelers are ultimately tasked with getting to) is not itself a direct output of the underlying credit models being validated. The expected losses that emerge from the model must be subject to a further calculation in order to arrive at the appropriate allowance figure. Whether these subsequent calculations are considered within the scope of a CECL model validation is ultimately going to be an institutional policy question, but it stands to reason that they would be.

Under the CECL standard, banks will have two alternatives for calculating the allowance for credit losses: 1) the allowance can be set equal to the sum of the expected credit losses (as projected by the model), or 2) the allowance can be set equal to the cost basis of the loan minus the present value of expected cash flows. While a validator would theoretically not be in a position to comment on whether the selected approach is better or worse than the alternative, principles of process verification would dictate that the validator ought to determine whether the selected approach is consistent with internal policy and that it was computed accurately.
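For validators who want to recompute the figure independently, the two alternatives can be sketched as below. The three-year loan, its expected cash shortfalls, and the 5% discount rate are hypothetical inputs chosen only to show the mechanics; which cash flows and which rate an institution actually uses is a policy matter.

```python
def allowance_sum_of_losses(expected_losses):
    """Alternative 1: allowance equals the sum of expected credit losses."""
    return sum(expected_losses)


def present_value(cash_flows, rate):
    """Discount end-of-period cash flows at a constant periodic rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


def allowance_discounted(cost_basis, expected_cash_flows, rate):
    """Alternative 2: cost basis minus present value of expected cash flows."""
    return cost_basis - present_value(expected_cash_flows, rate)


# Hypothetical loan held at par: 100 of principal, 5% annual coupon, 3 years.
cost_basis = 100.0
contractual_cash_flows = [5.0, 5.0, 105.0]
expected_losses = [0.5, 1.0, 1.5]  # assumed expected shortfall per year
expected_cash_flows = [c - l for c, l in zip(contractual_cash_flows, expected_losses)]

allowance_1 = allowance_sum_of_losses(expected_losses)
allowance_2 = allowance_discounted(cost_basis, expected_cash_flows, rate=0.05)
print(round(allowance_1, 4), round(allowance_2, 4))
```

Under these assumptions the discounted approach yields a smaller allowance than the simple sum, because later shortfalls are discounted. Either way, the validator's job is to confirm the arithmetic matches the approach the institution's policy selects.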

When Policy Trumps Statistics

The selection of a mean reversion approach is not the only area in which a modeler may make a statistically dubious choice in favor of complying with accounting policy.

Discount Rates

Translating expected losses into an allowance using the present-value-of-future-cash-flows approach (option 2 above) obviously requires selecting an appropriate discount rate. What should it be? The standard stipulates the use of the financial asset’s Effective Interest Rate (or “yield,” i.e., the rate of return that equates an instrument’s cash flows with its amortized cost basis). Subsequent accounting guidance affords quite a bit of flexibility in how this rate is calculated. Institutions may use the yield that equates contractual cash flows with the amortized cost basis (we can call this “contractual yield”), or the rate of return that equates cash flows adjusted for prepayment expectations with the cost basis (“prepayment-adjusted yield”).

The use of the contractual yield (which has been adjusted for neither prepayments nor credit events) to discount cash flows that have been adjusted for both prepayments and credit events will allow the impact of prepayment risk to be commingled with the allowance number. For any instruments where the cost basis is greater than unpaid principal balance (a mortgage instrument purchased at 102, for instance) prepayment risk will exacerbate the allowance. For any instruments where the cost basis is less than the unpaid principal balance, accelerations in repayment will offset the allowance. This flaw has been documented by FASB staff, with the FASB Board subsequently allowing but not requiring the use of a prepay-adjusted yield.
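A small worked example may help show the commingling effect. The Python sketch below assumes a hypothetical five-year, interest-only instrument with a 5% annual coupon that prepays in full at the end of year two with no credit losses at all. It solves for the contractual yield by bisection (a simple stand-in for whatever yield routine an institution actually uses) and then discounts the prepayment-adjusted cash flows at that yield.

```python
def present_value(cash_flows, rate):
    """Discount end-of-period cash flows at a constant periodic rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


def solve_yield(price, cash_flows, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection search for the rate that equates PV of cash flows with price."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if present_value(cash_flows, mid) > price:
            lo = mid  # PV is too high, so the rate must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


contractual = [5.0, 5.0, 5.0, 5.0, 105.0]  # no prepayment, no credit events
prepaid = [5.0, 105.0]                     # full prepayment in year 2, no credit losses


def prepay_driven_allowance(cost_basis):
    """Cost basis minus PV of prepay-adjusted flows at the contractual yield."""
    contractual_yield = solve_yield(cost_basis, contractual)
    return cost_basis - present_value(prepaid, contractual_yield)


premium_case = prepay_driven_allowance(102.0)   # purchased above par
discount_case = prepay_driven_allowance(98.0)   # purchased below par
print(round(premium_case, 4), round(discount_case, 4))
```

Even though expected credit losses are zero in both cases, the premium instrument shows a positive "allowance" and the discount instrument a negative one, purely because of prepayment. A prepayment-adjusted yield would bring both figures back to zero.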

Multiple Scenarios

The accounting standard neither prohibits nor requires the use of multiple scenarios to forecast credit losses. Using multiple scenarios is likely more supportable from a statistical and model validation perspective, but it may be challenging for a validator to determine whether the various scenarios have been weighted properly to arrive at the correct, blended, “expected” outcome.
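The blending arithmetic itself is simple to reproduce, which at least lets a validator confirm that the reported figure follows from the stated weights. The scenario weights and loss estimates in this Python sketch are hypothetical; whether the weights are reasonable remains a matter of judgment.

```python
import math


def blended_expected_loss(scenarios):
    """Probability-weighted loss across (weight, loss) pairs."""
    total_weight = sum(weight for weight, _ in scenarios)
    if not math.isclose(total_weight, 1.0):
        raise ValueError(f"scenario weights sum to {total_weight}, not 1")
    return sum(weight * loss for weight, loss in scenarios)


# Hypothetical weights and lifetime loss estimates (in $ millions)
# for baseline, adverse, and severely adverse scenarios.
scenarios = [(0.6, 1.0), (0.3, 2.5), (0.1, 6.0)]
expected_loss = blended_expected_loss(scenarios)
print(round(expected_loss, 4))
```

Note that the blended figure sits well above the baseline scenario's loss, which is the usual reason a multiple-scenario approach is considered more supportable than a single path.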

Macroeconomic Assumptions During the “Reasonable and Supportable” Period

Attempting to quantitatively support the macro assumptions during the “reasonable and supportable” forecast window (usually two to three years) is likely to be problematic both for the modeler and the validator. Such forecasts tend to be more art than science and validators are likely best off trying to benchmark them against what others are using than attempting to justify them using elaborately contrived quantitative methods. The data that is most likely to be used may turn out to be simply the data that is available. Validators must balance skepticism of such approaches with pragmatism. Modelers have to use something, and they can only use the data they have.

Internal Data vs. Industry Data

The standard allows for modeling using internal data or industry proxy data. Banks often operate under the dogma that internal data (when available) is always preferable to industry data. This seems reasonable on its face, but it only really makes sense for institutions with internal data that is sufficiently robust in terms of quantity and history. And the threshold for what constitutes “sufficiently robust” is not always obvious. Is one business cycle long enough? Is 10,000 loans enough? These questions do not have hard and fast answers.


Many questions pertaining to CECL model validations do not yet have hard and fast answers. In some cases, the answers will vary by institution as different banks adopt different policies. Industry best practices will doubtless emerge in response to others. For the rest, model validators will need to rely on judgment, sometimes having to balance statistical principles with accounting policy realities. The first CECL model validations are around the corner. It’s not too early to begin thinking about how to address these questions.

Applying Machine Learning to Conventional Model Validations

In addition to transforming the way in which financial institutions approach predictive modeling, machine learning techniques are beginning to find their way into how model validators assess conventional, non-machine-learning predictive models. While the array of standard statistical techniques available for validating predictive models remains impressive, the advent of machine learning technology has opened new avenues of possibility for expanding the rigor and depth of insight that can be gained in the course of model validation. In this blog post, we explore how machine learning, in some circumstances, can supplement a model validator’s efforts related to:

  • Outlier detection on model estimation data
  • Clustering of data to better understand model accuracy
  • Feature selection methods to determine the appropriateness of independent variables
  • The use of machine learning algorithms for benchmarking
  • Machine learning techniques for sensitivity analysis and stress testing



Outlier Detection

Conventional model validations include, when practical, an assessment of the dataset from which the model is derived. (This is not always practical—or even possible—when it comes to proprietary, third-party vendor models.) Regardless of a model’s design and purpose, virtually every validation concerns itself with at least a cursory review of where these data are coming from, whether their source is reliable, how they are aggregated, and how they figure into the analysis.

Conventional model validation techniques sometimes overlook (or fail to look deeply enough at) the question of whether the data population used to estimate the model is problematic. Outliers—and the effect they may be having on model estimation—can be difficult to detect using conventional means. Developing descriptive statistics and identifying data points that are one, two, or three standard deviations from the mean (i.e., extreme value analysis) is a straightforward enough exercise, but this does not necessarily tell a modeler (or a model validator) which data points should be excluded.
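To make the extreme value analysis described above concrete, here is a minimal sketch in Python. The function name, the sample balances, and the two-standard-deviation threshold are all hypothetical choices for illustration, not a prescribed validation standard.

```python
from statistics import mean, stdev

def flag_extreme_values(values, threshold=3.0):
    """Return (index, value, z_score) for points more than `threshold`
    sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing to flag
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged

# Hypothetical loan balances (in thousands); the last one looks suspicious.
balances = [210, 195, 205, 220, 190, 200, 215, 198, 207, 202, 1500]
flagged = flag_extreme_values(balances, threshold=2.0)
for index, value, z in flagged:
    print(f"row {index}: value={value}, z-score={z:.2f}")
```

Note that the extreme point itself inflates both the mean and the standard deviation, which is one reason a flag like this is a starting point for review rather than an exclusion rule.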

Machine learning modelers use a variety of proximity and projection methods for filtering outliers from their training data. One proximity method employs the K-means algorithm, which groups data into clusters centered around defined “centroids,” and then identifies data points that do not appear to belong to any particular cluster. Common projection methods include multi-dimensional scaling, which allows analysts to view multi-dimensional relationships among multiple data points in just two or three dimensions. Sophisticated model validators can apply these techniques to identify dataset problems that modelers may have overlooked.
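The K-means proximity approach can be sketched roughly as follows. This is a bare-bones illustration under stated assumptions: the starting centroids are supplied by hand to keep the example deterministic, the sample points are invented, and the "three times the median distance" cutoff is a hypothetical rule of thumb. In practice a library such as scikit-learn would typically be used instead of a hand-written loop.

```python
import math
from statistics import median

def kmeans(points, initial_centroids, max_iterations=100):
    """Plain Lloyd's algorithm; returns the final centroids."""
    centroids = list(initial_centroids)
    k = len(centroids)
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def distance_outliers(points, centroids, factor=3.0):
    """Flag points whose distance to the nearest centroid exceeds
    `factor` times the median such distance (a hypothetical cutoff)."""
    distances = [min(math.dist(p, c) for c in centroids) for p in points]
    cutoff = factor * median(distances)
    return [p for p, d in zip(points, distances) if d > cutoff]

# Invented two-dimensional data: two tight groups plus one stray point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2), (4.9, 4.8),
    (3.0, 9.0),
]
centroids = kmeans(points, initial_centroids=[(1.0, 1.0), (5.0, 5.0)])
outliers = distance_outliers(points, centroids)
print(outliers)
```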


Data Clustering

The tendency of data to cluster presents another opportunity for model validators. Machine learning techniques can be applied to determine the relative compactness of individual clusters and how distinct individual clusters are from one another. Clusters that do not appear well defined and blur into one another are evidence of a potentially problematic dataset—one that may result in non-existent patterns being identified in random data. Such clustering could be the basis of any number of model validation findings.
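One common way to quantify compactness and separation is the silhouette coefficient, which we use here as an illustrative choice (the discussion above does not name a specific metric). The sketch below assumes cluster labels have already been assigned and that there are at least two clusters; the data and labels are invented.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient. Values near 1 suggest compact,
    well-separated clusters; values near 0 suggest overlap."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items() if key != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(scores) / len(scores)

# Two clearly separated groups.
tight_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
tight = silhouette_score(tight_points, [0, 0, 0, 1, 1, 1])

# The same neighborhood split into two labels that blur into one another.
blurred_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2)]
blurred = silhouette_score(blurred_points, [0, 1, 0, 1])

print(f"well-separated: {tight:.2f}, blurred: {blurred:.2f}")
```

A low score on a model's estimation data does not by itself establish a finding, but it gives the validator a specific, reproducible number to discuss with the model owner.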



Feature (Variable) Selection

What conventional predictive modelers typically refer to as variables are commonly referred to by machine learning modelers as features. Features and variables serve essentially the same function, but the way in which they are selected can differ. Conventional modelers tend to select variables using a combination of expert judgment and statistical techniques. Machine learning modelers tend to take a more systematic approach that includes stepwise procedures, criterion-based procedures, lasso and ridge regression, and dimensionality reduction. These methods are designed to ensure that machine learning models achieve their objectives in the simplest way possible, using the fewest possible number of features, and avoiding redundancy. Because model validators frequently encounter black-box applications, directly applying these techniques is not always possible. In some limited circumstances, however, model validators can add to the robustness of their validations by applying machine learning feature selection methods to determine whether conventionally selected model variables resemble those selected by these more advanced means (and if not, why not).
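As an illustration of how lasso regression performs feature selection, the sketch below fits a lasso by coordinate descent and shows a weak feature being shrunk to exactly zero. Everything here is an assumption made for the example: the two invented features are centered and mutually orthogonal, there is no intercept, and the penalty `alpha` is chosen arbitrarily. A production analysis would normally rely on an established library rather than this hand-rolled loop.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; return 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(X, y, alpha, iterations=100):
    """Minimize (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1.
    Assumes centered, non-constant feature columns and a centered target."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iterations):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

# Invented data: the target depends strongly on the first feature
# and only marginally on the second.
strong_driver = [-2, -1, 0, 1, 2]
weak_driver = [2, -1, -2, -1, 2]
X = list(zip(strong_driver, weak_driver))
y = [3 * a + 0.1 * b for a, b in X]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
print(weights)  # the second coefficient is driven to exactly 0.0
```

A validator could compare the set of features that survive a penalty like this with the variables the modeler selected and ask about any differences.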


Benchmarking Applications

Identifying and applying an appropriate benchmarking model can be challenging for model validators. Commercially available alternatives are often difficult to obtain cost-effectively, and building challenger models from scratch can be time-consuming and problematic, particularly when all they do is replicate what the model in question is doing.

While not always feasible, building a machine learning model using the same data that was used to build a conventionally designed predictive model presents a “gold standard” benchmarking opportunity for assessing the conventionally developed model’s outputs. Where significant differences are noted, model validators can investigate the extent to which differences are driven by data/outlier omission, feature/variable selection, or other factors.


Sensitivity Analysis and Stress Testing

The sheer quantity of high-dimensional data very large banks need to process in order to develop their stress testing models makes conventional statistical analysis both computationally expensive and problematic. (This is sometimes referred to as the “curse of dimensionality.”) Machine learning feature selection techniques, described above, are frequently useful in determining whether variables selected for stress testing models are justifiable.

Similarly, machine learning techniques can be employed to isolate, in a systematic way, those variables to which any predictive model is most and least sensitive. Model validators can use this information to quickly ascertain whether these sensitivities are appropriate. A validator, for example, may want to take a closer look at a credit model that is revealed to be more sensitive to, say, zip code, than it is to credit score, debt-to-income ratio, loan-to-value ratio, or any other individual variable or combination of variables. Machine learning techniques make it possible for a model validator to assess a model’s relative sensitivity to virtually any combination of features and make appropriate judgments.
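A simple version of this kind of systematic sensitivity ranking can be sketched as a one-at-a-time shock test against a black-box scoring function. The `predict` function, its coefficients, the sample loans, and the one-standard-deviation shock size are all hypothetical; more elaborate techniques (permutation importance, for example) follow the same basic pattern of perturbing an input and measuring the change in output.

```python
from statistics import mean, stdev

def sensitivity_by_feature(predict, rows, feature_names):
    """Shock each feature by one sample standard deviation, holding the
    others fixed, and report the mean absolute change in the prediction."""
    baseline = [predict(row) for row in rows]
    result = {}
    for name in feature_names:
        shock = stdev([row[name] for row in rows])
        shocked = [predict({**row, name: row[name] + shock}) for row in rows]
        result[name] = mean(abs(s - b) for s, b in zip(shocked, baseline))
    return result

# Hypothetical stand-in for the model under review (treated as a black box).
def predict(loan):
    return 2.0 - 0.002 * loan["credit_score"] + 0.5 * loan["dti"] + 0.3 * loan["ltv"]

# Invented loan-level inputs.
loans = [
    {"credit_score": 620, "dti": 0.45, "ltv": 0.95},
    {"credit_score": 700, "dti": 0.35, "ltv": 0.80},
    {"credit_score": 760, "dti": 0.25, "ltv": 0.70},
    {"credit_score": 680, "dti": 0.40, "ltv": 0.90},
]

sensitivities = sensitivity_by_feature(predict, loans, ["credit_score", "dti", "ltv"])
for name, value in sorted(sensitivities.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.4f}")
```

Because each shock is scaled to the feature's own dispersion, the resulting figures are comparable across features measured in very different units.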



Model validators have many tools at their disposal for assessing the conceptual soundness, theory, and reliability of conventionally developed predictive models. Machine learning is not a substitute for these, but its techniques offer a variety of ways of supplementing traditional model validation approaches and can provide validators with additional tools for ensuring that models are adequately supported by the data that underlies them.

Applying Model Validation Principles to Machine Learning Models

Machine learning models pose a unique set of challenges to model validators. While exponential increases in the availability of data, computational power, and algorithmic sophistication in recent years have enabled banks and other firms to increasingly derive actionable insights from machine learning methods, the significant complexity of these systems introduces new dimensions of risk.

When appropriately implemented, machine learning models greatly improve the accuracy of predictions that are vital to the risk management decisions financial institutions make. The price of this accuracy, however, is complexity and, at times, a lack of transparency. Consequently, machine learning models must be particularly well maintained and their assumptions thoroughly understood and vetted in order to prevent wildly inaccurate predictions. While maintenance remains primarily the responsibility of the model owner and the first line of defense, second-line model validators increasingly must be able to understand machine learning principles well enough to devise effective challenge that includes:

  • Analysis of model estimation data to determine the suitability of the machine learning algorithm
  • Assessment of space and time complexity constraints that inform model training time and scalability
  • Review of model training/testing procedure
  • Determination of whether model hyperparameters are appropriate
  • Calculation of metrics for determining model accuracy and robustness

There is more than one way to organize these considerations along the three pillars of model validation. Here is how we have come to think about it.


Conceptual Soundness

Many of the concepts of reviewing model theory that govern conventional model validations apply equally well to machine learning models. The question of “business fit” and whether the variables the model lands on are reasonable is just as valid when the variables are selected by a machine as it is when they are selected by a human analyst. Assessing the variable selection process “qualitatively” (does it make sense?) as well as quantitatively (measuring goodness of fit by calculating residual errors, among other tests) takes on particular importance when it comes to machine learning models.

Machine learning does not relieve validators of their responsibility to assess the statistical soundness of a model’s data. Machine learning models are not immune to data issues. Validators protect against these by running routine distribution, collinearity, and related tests on model datasets. They must also ensure that the population has been appropriately and reasonably divided into training and holdout/test datasets.

Supplementing these statistical tests should be a thorough assessment of the modeler’s data preparation procedures. In addition to evaluating the ETL process—a common component of all model validations—effective validations of machine learning models take particular notice of variable “scaling” methods. Scaling is important to machine learning algorithms because they generally do not take units into account. Consequently, a machine learning model that relies on borrower income (generally ranging between tens of thousands and hundreds of thousands of dollars), borrower credit score (which generally falls within a range of a few hundred points) and loan-to-value ratio (expressed as a percentage), needs to apply scaling factors to normalize these ranges in order for the model to correctly process each variable’s relative importance. Validators should ensure that scaling and normalizations are reasonable.
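To see why scaling matters for the three variables in that example, the sketch below standardizes each one to zero mean and unit standard deviation (z-score scaling). Standardization is only one of several scaling methods a modeler might choose, and the borrower values shown are invented.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Invented borrower data on three very different scales.
raw = {
    "income": [45000, 82000, 150000, 61000],   # dollars
    "credit_score": [620, 700, 760, 680],      # points
    "ltv": [95, 80, 70, 90],                   # percent
}

scaled = {name: standardize(values) for name, values in raw.items()}
for name, values in scaled.items():
    print(name, [round(v, 2) for v in values])
```

One thing a validator may want to check is that scaling parameters (here, the mean and standard deviation) were estimated on the training data only and then reused on the holdout data, rather than recomputed on the full population.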

Model assumptions, when it comes to machine learning validation, are most frequently addressed by looking at the selection, optimization, and tuning of the model’s hyperparameters. Validators must determine whether the selection/identification process undertaken by the modeler (be it grid search, random search, Bayesian Optimization, or another method—see this blog post for a concise summary of these) is conceptually sound.


Process Verification

Machine learning models are no more immune to overfitting and underfitting (the bias-variance dilemma) than are conventionally developed predictive models. An overfitted model may perform well on the in-sample data, but predict poorly on the out-of-sample data. Complex nonparametric and nonlinear methods used in machine learning algorithms combined with high computing power are likely to contribute to an overfitted machine learning model. An underfitted model, on the other hand, performs poorly in general, mainly due to an overly simplified model algorithm that does a poor job at interpreting the information contained within data.

Cross-validation is a popular technique for detecting and preventing fitting or “generalization capability” issues in machine learning. In K-fold cross-validation, the training data is partitioned into K subsets. The model is trained on all of the training data except one held-out subset, which is then used to validate performance. The procedure is repeated K times so that each subset serves exactly once as the validation set. The model’s generalization capability is low if accuracy is consistently low on both the training and validation sets (underfitted) or high on the training set but markedly lower on the validation set (overfitted). Conventional models, such as regression analysis, can be used to benchmark performance.
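The K-fold procedure can be sketched as follows. The one-variable least-squares "model," the synthetic data, and the use of mean absolute error are placeholders chosen to keep the example short; the fold logic is the part that carries over to a real validation. The folds here are contiguous, which assumes the rows have already been shuffled (or that a different scheme is used for time-ordered data).

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists; each row is held out exactly once."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

def fit_line(xs, ys):
    """Least-squares fit for a single predictor; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def mean_abs_error(model, xs, ys):
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in zip(xs, ys))

def cross_validate(xs, ys, k):
    """Return a (training_error, validation_error) pair for each fold."""
    results = []
    for train, validation in k_fold_indices(len(xs), k):
        train_x, train_y = [xs[i] for i in train], [ys[i] for i in train]
        val_x, val_y = [xs[i] for i in validation], [ys[i] for i in validation]
        model = fit_line(train_x, train_y)
        results.append((mean_abs_error(model, train_x, train_y),
                        mean_abs_error(model, val_x, val_y)))
    return results

# Synthetic data: a straight line with a small alternating disturbance.
xs = list(range(10))
ys = [2 * x + 1 + (0.1 if x % 2 == 0 else -0.1) for x in xs]

results = cross_validate(xs, ys, k=5)
for fold, (train_err, val_err) in enumerate(results):
    print(f"fold {fold}: train MAE={train_err:.3f}, validation MAE={val_err:.3f}")
```

Comparing the two columns fold by fold is what reveals the pattern: errors that are high everywhere point toward underfitting, while low training errors paired with much higher validation errors point toward overfitting.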


Outcomes Analysis

Outcomes analysis enables validators to verify the appropriateness of the model’s performance measure methods. Performance measures (or “scoring methods”) are typically specialized to the algorithm type, such as classification and clustering. Validators can try different scoring methods to test and understand the model’s performance. Sensitivity analyses can be performed on the algorithms, hyperparameters, and seed parameters. Since there is no right or wrong answer, validators should focus on the dispersion of the sensitivity results.
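As a small illustration of why trying more than one scoring method is worthwhile, the sketch below computes four standard classification measures for the same set of predictions. The labels are invented and deliberately imbalanced, as default data often is.

```python
def classification_scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Invented outcomes: 2 defaults among 10 loans; the model catches only one.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

scores = classification_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Here accuracy looks strong while recall shows that half of the defaults were missed, which is the kind of divergence between scoring methods a validator would want to surface.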


Many statistical tactics commonly used to validate conventional models apply equally well to machine learning models. One notable omission is the ability to precisely replicate the model’s outputs. Unlike with an OLS or ARIMA model, for which a validator can reasonably expect to be able to match the model’s coefficients exactly if given the same data, machine learning models can be tested only indirectly—by testing the conceptual soundness of the selected features and assumptions (hyperparameters) and by evaluating the process and outputs. Applying model validation tactics specially tailored to machine learning models allows financial institutions to deploy these powerful tools with greater confidence by demonstrating that they are of sound conceptual design and perform as expected.



