
Articles Tagged with: Model Validation

Validating Vendor Models

RiskSpan validates a diverse range of models, including many developed by third-party vendors. Vendor models present unique challenges for model risk management (MRM). In this article, we describe how we align our approach to validating these models with existing regulatory guidance and explain what financial institutions should expect when it comes time to validate their vendor models.

Our clients use third-party vendor models that touch on virtually every risk function. The most common ones include: 

  • Anti-money laundering (AML) solutions for Suspicious Activity Monitoring (SAM) and Customer Due Diligence (CDD). 
  • Asset-Liability Management models that simulate the whole balance sheet under different interest rate scenarios to provide analytics for interest rate risk monitoring. 
  • Structured assets and mortgage loan analytics platforms (similar to RiskSpan’s Edge Platform). 
  • Mortgage pipeline management platforms, including loan pricing, best execution determination, analytics, and trade tracking. 
  • Climate risk models that quantify the risk associated with the future effects of climate change on assets at different locations. 
  • Artificial intelligence (AI) platforms that help model developers automatically optimize the machine learning (ML) algorithm, feature selection, and hyperparameter tuning against a target performance metric. 

Vendor Models and MRM Considerations

Regardless of whether a model is fully homegrown or a “black box” purchased from a third-party vendor, the same basic MRM principles apply. Banks are expected to validate their own use of vendor products [OCC 2011-12, p.15], and institutions should therefore understand the specific characteristics of vendor models that pose model risk and require special consideration during validation. The following outlines these risks, along with mitigating considerations and strategies model risk managers should weigh. 

Specifics: Complexity
Description: Some vendor models offer many functionalities and sub-models dedicated to different tasks. These models are often highly integrated into the client’s internal systems and databases.
MRM and Validation Implications: Well-crafted model documentation is important to making the validation efficient. Validation requires more time since all model functionalities and components must be mapped.

Specifics: Specialized Expertise
Description: Vendor models are often developed based on accumulated know-how in a specific field of study.
MRM and Validation Implications: Validation requires professionals with experience in that field who understand the model in relation to industry standards.

Specifics: Regulatory Requirements and Compliance
Description: Many models need to comply with existing regulations (e.g., fair lending in credit scoring) or are implemented to ensure compliance (BSA/AML and the PATRIOT Act).
MRM and Validation Implications: Validation requires expertise in the specific regulatory compliance regime.

Specifics: Opaque design, assumptions, and limitations
Description: Vendors usually do not provide code for review, and some aspects of the model may be based on proprietary research or data.
MRM and Validation Implications: Banks should require the vendor to provide developmental evidence explaining the product components, design, and intended use in order to determine whether the model is appropriate for the bank’s products, exposures, and risks. Vendors should also clearly indicate the model’s limitations and assumptions and where the product’s use may be problematic [OCC 2011-12, pp. 15-16].

Specifics: Vague or incomplete documentation from the vendor
Description: Often in the name of protecting IP, model documentation provided by the vendor may be vague or incomplete.
MRM and Validation Implications: Banks should ensure that appropriate documentation of the third-party approach is available so that the model can be appropriately validated [OCC 2011-12, p.21]. Institutions must also develop their own internal documentation that describes the intended use of the model, lists all inputs and outputs, lists model assumptions and limitations, and summarizes all relevant information provided by the vendor, such as model design and methodology.

Specifics: Limited Model Testing
Description: Model testing is critical in assessing whether a model is performing as intended. However, vendors may not provide detailed results of their testing of model performance, outcomes, sensitivity, or assumptions appropriateness, or the results of ongoing monitoring. Moreover, opportunities for the client or the validator to perform testing are usually limited because many parts of the model are proprietary.
MRM and Validation Implications: Vendors should provide appropriate testing results demonstrating that the model works as expected. Banks should expect vendors to conduct ongoing performance monitoring and outcomes analysis, and banks should also conduct ongoing monitoring and outcomes analysis of vendor model performance using their own outcomes [OCC 2011-12, pp. 15-16]. Validation should consist of a review of the testing results provided by the vendor and of any additional testing that is feasible and practical. This usually includes outcomes analysis and benchmarking, and sometimes manual replication, sensitivity analysis, or stress testing. Benchmarking may, however, be limited due to the uniqueness or complexity of the model, or because proprietary data were used for development.

Specifics: Customization
Description: Out-of-the-box solutions often need to be customized to fit the internal systems, policies, and specific intended use of a particular institution.
MRM and Validation Implications: A bank’s customization choices should be documented and justified as part of the validation [OCC 2011-12, p.15].

Specifics: External Data
Description: Vendor models often rely on external input data or on external data used in their development.
MRM and Validation Implications: An important part of any validation is to identify all input data sources and assess the quality, completeness, and appropriateness of the data. OCC 2011-12, p. 16, states that banks should obtain information regarding the data used to develop the model and assess the extent to which that data is representative of the bank’s situation. OCC 2011-12, p.6, stresses that a rigorous review is particularly important for external data and information (from a vendor or outside party), especially as they relate to new products, instruments, or activities. Moreover, AB 2022-03, p.3, states that regulated entities should map their external dependencies to significant internal systems and processes to determine their systemic dependencies and interconnections. In particular, the regulated entities should have an inventory of key dependencies on externally sourced models, data, software, and cloud providers. This inventory should be regularly updated, reviewed by senior management, and presented to the board of directors, as deemed appropriate.

Specifics: Reliance on Vendor Support
Description: Since access to the code and implementation details is limited for vendor models, ongoing servicing and support are necessary.
MRM and Validation Implications: Roles and responsibilities around the model should be defined, and the bank’s point of contact with the vendor should not rely solely on one person. It is also critical that the bank maintain in-house knowledge in case the vendor or the bank terminates the contract for any reason, or the vendor goes out of business or otherwise ceases to support the model [OCC 2011-12, p. 16].

Validation Approach

Validation of vendor models follows the same general principles as validation of any other model. These principles are laid out in regulatory guidance, which also specifically addresses model risk management for vendor and other third-party products. Based on these guidelines and our experience validating numerous vendor models, RiskSpan’s approach includes the following:

  • Request documents and access to:
    • internal model documentation,
    • vendor documentation and user manual,
    • implementation documentation with a description of any customizations to the model (see Customization point in the section above), 
    • performance testing conducted by the model owner or vendor,
    • vendor certifications,
    • the model interface, if applicable, to conduct independent testing. 
  • Documentation review: We review both the internal documentation and the vendor documentation and assess their thoroughness and completeness. According to OCC 2011-12, p.21, documentation should be sufficiently detailed so that parties unfamiliar with a model can understand how the model operates, its limitations, and its key assumptions. For internal documentation, we focus on the statement of model purpose, the list of inputs and their sources, documented assumptions and limitations, the description of outputs and their use, controls and governance, and any testing conducted internally. We also review the documentation of any customizations made to the vendor model. 
  • Conceptual soundness review: Combining information from the internal and vendor documentation, input from the model owner, and the industry expertise of our SMEs, we assess whether the model meets its stated purpose and whether the design, underlying theory, and logic are justifiable and supported by existing research and industry standards. We also critically assess all known model assumptions and limitations and seek to identify hidden assumptions or undocumented limitations. 
  • Data review: We aim to identify all data inputs, their sources, and controls related to gathering, loading, and quality of data. We also assess the quality of data by performing exploratory data analysis. Assessing development data is often not possible as the data are proprietary to the vendor. 
  • Independent testing: To supplement, update, or verify the testing performed by the vendor, we perform internal testing where applicable. Different models allow different testing methods, and validators often need permission to access model interfaces. This is acknowledged in OCC 2011-12, p.15: external models may not allow full access to computer coding and implementation details, so the bank may have to rely more on sensitivity analysis and benchmarking. The following are testing methods we often use to devise effective challenges for specific models in our practice:
    • AML systems for transaction monitoring and customer due diligence: manual replication for a sample of customers/alerts, exploratory data analysis, outcomes analysis  
    • Asset-Liability Management models: outcomes analysis and review of reporting, sensitivity analysis and stress testing 
    • Loan pricing models: manual replication, outcomes analysis, sensitivity analysis, stress testing, benchmarking to RS Edge 
    • Climate risk models that quantify the risk associated with the future effects of climate change on assets at different locations: Outcomes analysis, benchmarking to online services with open access such as National Risk Index, ClimateCheck, and Risk Factor. 
    • ML AI system: outcome analysis based on the performance metrics, manual replication of the final model in Python, benchmarking with the alternative algorithm. 
  • Ongoing monitoring review: As explained in the previous section, vendors are expected to conduct ongoing monitoring of their models, but banks should monitor their own outcomes as well. Our review thus assesses the client’s ongoing monitoring plan along with the results of both the client’s and the vendor’s monitoring. When the validated model does not produce predictions or estimates (as with some AML models), ongoing monitoring typically consists of periodic revalidations and data quality monitoring. 
  • Governance review: We review the client’s policies, roles, and responsibilities defined for the model. We also investigate whether a contingency plan is in place for instances when the vendor is no longer supporting the model. We also typically investigate and assess controls around the model’s access and use. 
  • Compliance review: If a model is implemented to make the institution compliant with certain regulations (BSA/AML, PATRIOT Act), or if the model itself must comply with regulations, we conduct a compliance review with the assistance of subject matter experts (SMEs) who possess industry experience. This review verifies that the model and its implementation align with the requirements and standards set forth by the relevant authorities. The SMEs’ expertise helps ensure that the model effectively addresses compliance concerns and operates within the legal and ethical boundaries of the industry. 
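As an illustration of the sensitivity analysis emphasized in the independent-testing step above, the sketch below perturbs one input of a stand-in pricing function at a time and reports the resulting output change. The `price_loan` function and its coefficients are entirely hypothetical; in practice, a validator would call the vendor model’s actual interface rather than a local stand-in.

```python
# Hypothetical one-at-a-time sensitivity analysis of a "black box" pricing model.
# price_loan stands in for the vendor model's interface; a validator would
# instead call the model's API or UI with perturbed inputs.

def price_loan(rate: float, ltv: float, fico: int) -> float:
    """Stand-in vendor model: returns a loan price (illustrative only)."""
    return 100.0 - 8.0 * (rate - 0.05) * 100 - 0.05 * max(ltv - 80, 0) + 0.01 * (fico - 700)

def sensitivity(base_inputs: dict, param: str, bump: float) -> float:
    """Change in output per unit bump of a single input, holding others fixed."""
    bumped = dict(base_inputs)
    bumped[param] += bump
    return (price_loan(**bumped) - price_loan(**base_inputs)) / bump

base = {"rate": 0.05, "ltv": 85.0, "fico": 720}
for param, bump in [("rate", 0.0001), ("ltv", 1.0), ("fico", 10)]:
    print(f"d(price)/d({param}) per unit: {sensitivity(base, param, bump):.4f}")
```

Reviewing whether each sensitivity has a plausible sign and magnitude (e.g., price falling as rate or LTV rises) is often feasible even when the model internals are proprietary.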

Project Management Considerations

For a validation project to be successful, strong project management discipline must be followed to ensure the project is completed on schedule, stays within budget, and meets all key stakeholder objectives. In addition to adapting our validation approach, we therefore also adapt our project management approach. For vendor model validation projects, we specifically follow these principles: 

  • Schedule periodic status meetings: We typically hold weekly meetings with the client’s MRM team to communicate the status of the validation, align the client’s expectations, discuss observations, and address any concerns. Since vendor models are often complex, these meetings also serve as a forum for resolving roadblocks such as access to the model’s UI, shared folders, and databases. 
  • Schedule a model walkthrough session with the model owner: Vendor models are often complex, and the client may use only specific components or functionalities. In our experience, the most efficient way to understand the big picture and the particular way the model is used is a live (typically remote) session with the model owner. Asking targeted questions at the beginning of the engagement helps us quickly grasp the critical areas to focus on during the validation. 
  • Establish a communication channel with the model owner: Be it direct messages or emails sent to and forwarded by the client’s MRM, it is important to be in touch with the model owner as not every detail may be documented. 


Vendor models pose unique risks and challenges for MRM and validation. Taking additional steps to mitigate these risks is vital to ensuring a well-functioning MRM program. An effective model validation approach takes these unique considerations into account and directly applies the guidelines specific to vendor model validation outlined in SR 11-7 (OCC 2011-12). Effectively carrying out this type of testing often requires adjusting the management of vendor model validation projects. 


OCC 2011-12, p.6: The data and other information used to develop a model are of critical importance; there should be rigorous assessment of data quality and relevance, and appropriate documentation. Developers should be able to demonstrate that such data and information are suitable for the model and that they are consistent with the theory behind the approach and with the chosen methodology. If data proxies are used, they should be carefully identified, justified, and documented. If data and information are not representative of the bank’s portfolio or other characteristics, or if assumptions are made to adjust the data and information, these factors should be properly tracked and analyzed so that users are aware of potential limitations. This is particularly important for external data and information (from a vendor or outside party), especially as they relate to new products, instruments, or activities. 

OCC 2011-12, p.9: All model components, including input, processing, and reporting, should be subject to validation; this applies equally to models developed in-house and to those purchased from or developed by vendors or consultants. The rigor and sophistication of validation should be commensurate with the bank’s overall use of models, the complexity and materiality of its models, and the size and complexity of the bank’s operations. 

OCC 2011-12, p.12: Many of the tests employed as part of model development should be included in ongoing monitoring and be conducted on a regular basis to incorporate additional information as it becomes available. New empirical evidence or theoretical research may suggest the need to modify or even replace original methods. Analysis of the integrity and applicability of internal and external information sources, including information provided by third-party vendors, should be performed regularly. 

OCC 2011-12 dedicates an entire section (pp. 15-16) to the validation of vendor models: 

Validation of Vendor and Other Third-Party Products  

The widespread use of vendor and other third-party products—including data, parameter values, and complete models—poses unique challenges for validation and other model risk management activities because the modeling expertise is external to the user and because some components are considered proprietary. Vendor products should nevertheless be incorporated into a bank’s broader model risk management framework following the same principles as applied to in-house models, although the process may be somewhat modified. 

As a first step, banks should ensure that there are appropriate processes in place for selecting vendor models. Banks should require the vendor to provide developmental evidence explaining the product components, design, and intended use, to determine whether the model is appropriate for the bank’s products, exposures, and risks. Vendors should provide appropriate testing results that show their product works as expected. They should also clearly indicate the model’s limitations and assumptions and where use of the product may be problematic. Banks should expect vendors to conduct ongoing performance monitoring and outcomes analysis, with disclosure to their clients, and to make appropriate modifications and updates over time. Banks are expected to validate their own use of vendor products. External models may not allow full access to computer coding and implementation details, so the bank may have to rely more on sensitivity analysis and benchmarking. Vendor models are often designed to provide a range of capabilities and so may need to be customized by a bank for its particular circumstances. A bank’s customization choices should be documented and justified as part of validation. If vendors provide input data or assumptions, or use them to build models, their relevance to the bank’s situation should be investigated. Banks should obtain information regarding the data used to develop the model and assess the extent to which that data is representative of the bank’s situation. The bank also should conduct ongoing monitoring and outcomes analysis of vendor model performance using the bank’s own outcomes. Systematic procedures for validation help the bank to understand the vendor product and its capabilities, applicability, and limitations. Such detailed knowledge is necessary for basic controls of bank operations. 
It is also very important for the bank to have as much knowledge in-house as possible, in case the vendor or the bank terminates the contract for any reason, or the vendor is no longer in business. Banks should have contingency plans for instances when the vendor model is no longer available or cannot be supported by the vendor. 

OCC 2011-12, p.17: Policies should emphasize testing and analysis, and promote the development of targets for model accuracy, standards for acceptable levels of discrepancies, and procedures for review of and response to unacceptable discrepancies. They should include a description of the processes used to select and retain vendor models, including the people who should be involved in such decisions. 

OCC 2011-12, p.21, Documentation: For cases in which a bank uses models from a vendor or other third party, it should ensure that appropriate documentation of the third-party approach is available so that the model can be appropriately validated. 

AB 2022-03, p.3: Since the publication of AB 2013-07, FHFA has observed a wider adoption of technologies in the mortgage industry. Many of these technologies reside externally to the regulated entities and are largely outside of the regulated entities’ control. Examples of such technologies are cloud servers, vendor models, and external data used by the regulated entities as inputs for their models. Although FHFA has published guidance related to externally sourced technologies, such as AB 2018-04: Cloud Computing Risk Management (Aug. 14, 2018) and AB 2018-08: Oversight of Third-Party Provider Relationships (Sept. 28, 2018), FHFA expects the regulated entities to take a macro-prudential view of the risks posed by externally sourced data and technologies. The regulated entities should map their external dependencies to significant internal systems and processes to determine their systemic dependencies and interconnections. In particular, the regulated entities should have an inventory of key dependencies on externally sourced models, data, software, and cloud providers. This inventory should be regularly updated and reviewed by senior management and presented to the board of directors, as deemed appropriate. 


AB 2022-03, p.5: When using an external vendor to complete an independent model validation, the regulated entity’s model validation group is accountable for the quality, recommendations, and opinions of any third-party review. When evaluating a third-party model validation, a regulated entity should implement model risk management policies and practices that align the vendor-completed specific standards for an independent validation with the specific standards included in AB 2013-07. 

Automating Compliance Risk Analytics

 Recorded: August 4th | 1:00 p.m. EDT

Completing the risk sections of Form PF, AIFMD, Open Protocol and other regulatory filings requires submitters to first compute an extensive battery of risk analytics, often across a wide spectrum of trading strategies and instrument types. This “pre-work” is both painstaking and prone to human error. Automating these upstream analytics greatly simplifies life downstream for those tasked with completing these filings.

RiskSpan’s Marty Kindler walks through a process for streamlining delta equivalent exposure, 10 year bond equivalent exposure, DV01/CS01, option greeks, stress scenario impacts and VaR in support not only of downstream regulatory filings but of an enhanced, overall risk management regime.
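As a taste of the analytics “pre-work” being automated, the sketch below computes DV01 for a single fixed-coupon bond by bump-and-revalue. The instrument terms are hypothetical; a production system would price off a full yield curve and repeat this across every position feeding the filing.

```python
# Minimal bump-and-revalue DV01 sketch for a fixed-coupon bond.
# All instrument terms are hypothetical and pricing is off a flat yield,
# not a full curve, purely to illustrate the mechanics.

def bond_price(face: float, coupon: float, ytm: float, years: int) -> float:
    """Price a bond with annual coupons by discounting cash flows at ytm."""
    cashflows = [face * coupon] * years
    cashflows[-1] += face  # principal returned at maturity
    return sum(cf / (1 + ytm) ** t for t, cf in enumerate(cashflows, start=1))

def dv01(face: float, coupon: float, ytm: float, years: int) -> float:
    """Dollar value of a 1bp yield move, via a symmetric 1bp bump-and-revalue."""
    bp = 0.0001
    return (bond_price(face, coupon, ytm - bp, years)
            - bond_price(face, coupon, ytm + bp, years)) / 2

price = bond_price(100, 0.04, 0.05, 10)  # 10y 4% annual-pay bond at 5% yield
print(f"price = {price:.4f}, DV01 = {dv01(100, 0.04, 0.05, 10):.4f}")
```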

Featured Speaker

Martin Kindler

Managing Director, RiskSpan

Is Your Enterprise Risk Management Keeping Up with Recent Regulatory Changes?

Recorded: June 30th | 1:00 p.m. EDT

Nick Young, Head of RiskSpan’s Model Risk Management Practice, and his team of model validation analysts walk through the most important regulatory updates of the past 18 months from the Federal Reserve, OCC, and FDIC pertaining to enterprise risk management in general (and model risk management in particular).

Nick’s team presents tips for ensuring that your policies and practices are keeping up with recent changes to AML and other regulatory requirements.

Featured Speakers

Nick Young

Head of Model Risk Management, RiskSpan

Three Principles for Effectively Monitoring Machine Learning Models

The recent proliferation in machine learning models in banking and structured finance is becoming impossible to ignore. Rarely does a week pass without a client approaching us to discuss the development or validation (or both) of a model that leverages at least one machine learning technique. RiskSpan’s own model development team has also been swept up in the trend – deep learning techniques have featured prominently in developing the past several versions of our in-house residential mortgage prepayment model.  

Machine learning’s rise in popularity is attributable to multiple underlying trends: 

  1. Quantity and complexity of data. Nowadays, firms store every conceivable type of data relating to their activities and clients, frequently supplementing it with data from any number of third-party providers. The increasing dimensionality of the data available to modelers makes traditional statistical variable selection more difficult, and the tradeoff between a model’s complexity and the rules adopted in variable selection can be hard to balance. An advantage of ML approaches is that they can handle multi-dimensional data more efficiently, identifying trends and patterns without the need for human intervention. 
  2. Better learning algorithms. Because ML algorithms learn to make more accurate projections as new data is introduced to the framework (assuming the new data is free of bias), model features based on newly introduced data are more likely to resemble features created using the model’s training data. 
  3. Cheap computation costs. New techniques, such as XGBoost, are designed to be memory efficient and introduce innovative system designs that reduce computation cost. 
  4. Proliferation breeds proliferation. As the number of machine learning packages in various programming tools increases, it facilitates implementation and promotes further ML model development. 

Addressing Monitoring Challenges 

Notwithstanding these advances, machine learning models are by no means easy to build and maintain. Feature engineering and parameter tuning procedures are time consuming. And once an ML model has been put into production, monitoring activities must be implemented to detect anomalies and make sure the model works as expected (just like with any other model). According to the OCC 2011-12 supervisory guidance on model risk management, ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid. While monitoring ML models resembles monitoring conventional statistical models in many respects, the following activities take on particular importance with ML model monitoring: 

  1. Review the underlying business problem. Defining the business problem is the first step in developing any ML model. This should be carefully articulated in the list of business requirements that the ML model is supposed to follow. Any shift in the underlying business problem will likely create drift in the training data and, as a result, new data coming to the model may no longer be relevant to the original business problem. The ML model becomes degraded and the new process of feature engineering and parameter tuning needs to be considered to remediate the impact. This review should be conducted whenever the underlying problem or requirements change. 
  2. Review of data stability (model input). In the real world, even if the underlying business problem is unchanged, there might be shifts in the prediction data caused by changing borrower behaviors, changes in product offerings, or other unexpected market drift. Any of these could result in the ML model receiving data it has not been trained on. Model developers should measure the population stability between the training dataset and the prediction dataset, and if there is evidence the data has shifted, model recalibration should be considered. This assessment should be done whenever the model user identifies a significant shift in the model’s performance or when a new testing dataset is introduced to the ML model. Where data segmentation has been used in the model development process, this assessment should be performed at the individual segment level as well. 
  3. Review of performance metrics (model output). Performance metrics quantify how well an ML model is trained to explain the data, and they should fit the model’s type. For instance, the developer of a binary classification model could use a Kolmogorov-Smirnov (KS) table, a receiver operating characteristic (ROC) curve, and the area under the curve (AUC) to measure the model’s overall rank-ordering ability and its performance at different cutoffs. Any shift (upward or downward) in performance metrics between a new dataset and the training dataset should raise a flag in monitoring, and all material shifts need to be reviewed by the model developer to determine their cause. Such assessments should be conducted annually or whenever new data is available. 
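The data-stability and performance-metric checks in items 2 and 3 can be sketched with a hand-rolled Population Stability Index (PSI) and a rank-based AUC on made-up score samples. The 0.1 and 0.25 PSI thresholds below are common rules of thumb, not regulatory requirements:

```python
import math

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between training (expected) and new (actual)
    score samples, using quantile bins derived from the training data."""
    srt = sorted(expected)
    cuts = [srt[int(len(srt) * i / bins)] for i in range(1, bins)]
    def shares(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > c for c in cuts)] += 1
        return [max(c / len(sample), 1e-6) for c in counts]  # avoid log(0)
    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def auc(scores: list, labels: list) -> float:
    """Rank-based AUC: probability a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

train = [i / 1000 for i in range(1000)]            # uniform training scores
stable = [i / 1000 + 0.001 for i in range(1000)]   # essentially unchanged
shifted = [(i / 1000) ** 2 for i in range(1000)]   # drifted population

print(f"PSI stable  = {psi(train, stable):.4f}")   # near 0: no action
print(f"PSI shifted = {psi(train, shifted):.4f}")  # above ~0.25: investigate
print(f"AUC = {auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]):.2f}")
```

Comparing AUC (or KS) on the training dataset versus each new dataset, and PSI across input features as well as scores, gives the basic quantitative triggers for the review described above.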

Like all models, ML models are only as good as the data they are fed. But ML models are particularly susceptible to data shifts because their processing components are less transparent. Taking these steps to ensure they are learning from valid and consistent data is essential to managing a functional inventory of ML models. 

Value Beyond Validation: The Future of Automated Continuous Model Monitoring Has Arrived

Imagine the peace of mind that would accompany being able to hand an existing model over to the validators with complete confidence in how the outcomes analysis will turn out. Now imagine being able to do this using a fully automated process.

The industry is closer to this than you might think.

The evolution of ongoing model monitoring away from something that happens only periodically (or, worse, only at validation time) and toward a more continuous process has been underway for some time. Now, thanks to automation and advanced process design, this evolutionary process has reached an inflection point. We stand today at the threshold of a future where:

  • Manual, painful processes to generate testing results for validation are a thing of the past;
  • Models are continuously monitored for fit, and end users are empowered with the tools to fully grasp model strengths and weaknesses;
  • Modeling and MRM experts leverage machine learning to dive more deeply into the model’s underlying data, and;
  • Emerging trends and issues are identified early enough to be addressed before they have time to significantly hamper model performance.

Sound too good to be true? RiskSpan data scientists, beginning with the firm’s own internally developed prepayment and credit models, are laying out a framework for automated, ongoing performance monitoring that has the potential to transform behavioral modeling (and model validation) across the industry.

The framework involves model owners working collaboratively with model validators to create recurring processes for running previously agreed-upon tests continuously and receiving the results automatically. Testing outcomes continuously increases confidence in their reliability. Testing them automatically frees up high-cost modeling and validation resources to spend more time evaluating results and running additional, deeper analyses.

The Process:

Irrespective of the regulator, back-testing, benchmarking, and sensitivity analysis are the three pillars of model outcomes analysis. Automating the data and analytical processes that underlie these three elements is required to get to a fully comprehensive automated ongoing monitoring scheme.

In order to be useful, the process must stage testing results in a central database that can:

  • Automatically generate charts, tables, and statistical tests to populate validation reports;
  • Support dashboard reporting that allows model owners, users and validators to explore test results, and;
  • Feed advanced analytics and machine learning platforms capable of 1) helping with automated model calibration, and 2) identifying model weaknesses and blind spots (as we did with a GSE here).
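As a simplified illustration of what staging test results in a central database might look like, the sketch below uses an in-memory SQLite table. The schema, model name, and cohort labels are hypothetical, not RiskSpan's actual design.

```python
import sqlite3
from datetime import date

# Stage recurring back-test results in a central table so charts and
# dashboards can be generated automatically. Schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE backtest_results (
        run_date TEXT, model TEXT, cohort TEXT,
        actual REAL, predicted REAL, error REAL
    )
""")

def stage_result(run_date, model, cohort, actual, predicted):
    """Insert one back-test observation; error is computed on the way in."""
    conn.execute(
        "INSERT INTO backtest_results VALUES (?, ?, ?, ?, ?, ?)",
        (run_date, model, cohort, actual, predicted, actual - predicted),
    )

stage_result(str(date(2021, 3, 31)), "prepay_v2", "FN30_3.5", 0.12, 0.10)
stage_result(str(date(2021, 3, 31)), "prepay_v2", "FN30_4.0", 0.18, 0.19)

# A typical dashboard query: mean absolute error by cohort.
rows = conn.execute(
    "SELECT cohort, AVG(ABS(error)) FROM backtest_results "
    "GROUP BY cohort ORDER BY cohort"
).fetchall()
```

Once results accumulate in a table like this, the downstream reporting (charts, statistical tests, dashboards) reduces to queries rather than manual spreadsheet work.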

Perhaps not surprisingly, achieving the back-end economies of a fully automated continuous monitoring and reporting regime requires an upfront investment of resources. This investment takes the form of time from model developers and owners as well as (potentially) some capital investment in technology necessary to host and manage the storage of results and output reports.

A good rule of thumb for estimating these upfront costs is between 2 and 3 times the cost of a single annual model test performed on an ad-hoc, manual basis. Consequently, the automation process can generally be expected to pay for itself (in time savings alone) over 2 to 3 cycles of performance testing. But the benefits of automated, continuous model monitoring go far beyond time savings. They invariably result in better models.

Output Applications

Continuous model monitoring produces benefits that extend well beyond satisfying model governance requirements. Indeed, automated monitoring has significantly informed the development process for RiskSpan’s own, internally developed credit and prepayment models – specifically in helping to identify sub-populations where model fit is a problem.

Continuous monitoring also makes it possible to quickly assess the value of newly available data elements. For example, when the GSEs started releasing data on mortgages with property inspection waivers (PIWs) (as opposed to traditional appraisals), we could immediately combine that data element with the results of our automated back-testing to determine whether PIW information helps predict model error. PIW currently appears to have value in predicting our production model error, and so the PIW feature is now slated to be added to a future version of our model. Having an automated framework in place accelerates this process while also enabling us to proceed with confidence that we are only adding variables that improve model performance.

The continuous monitoring results can also be used to develop helpful dashboard reports. These provide model owners and users with deeper insights into a model’s strengths and weaknesses and can be an important tool in model tuning. They can also be shared with model validators, thus facilitating that process as well.

The dashboard below is designed to give our model developers and users a better sense of where model error is greatest. Sub-populations with the highest model error are deep red. This makes it easy for model developers to visualize that the model does not perform well when FICO and LTV data are missing, which happens often in the non-agency space. The model developers now know that they need to adjust their modeling approach when these key data elements are not available.

The dashboard also makes it easy to spot performance disparities by shelf, for example, and can be used as the basis for applying prepayment multipliers to certain shelves in order to align results with actual experience.
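A minimal sketch of the sub-population error rollup that drives such a dashboard, with hypothetical loan records and field names:

```python
# Group loans by which key data elements (FICO, LTV) are present and
# compute mean absolute model error per sub-population. Records are
# illustrative, not actual loan data.
loans = [
    {"fico": 720, "ltv": 80, "error": 0.01},
    {"fico": None, "ltv": 75, "error": 0.06},
    {"fico": 680, "ltv": None, "error": 0.05},
    {"fico": None, "ltv": None, "error": 0.09},
]

def bucket(loan):
    """Label each loan by the availability of its key data elements."""
    fico = "fico_present" if loan["fico"] is not None else "fico_missing"
    ltv = "ltv_present" if loan["ltv"] is not None else "ltv_missing"
    return (fico, ltv)

errors = {}
for loan in loans:
    errors.setdefault(bucket(loan), []).append(abs(loan["error"]))

mean_error = {k: sum(v) / len(v) for k, v in errors.items()}
# Sub-populations with the highest mean error would render deep red.
worst = max(mean_error, key=mean_error.get)
```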

Continuous model monitoring is fast becoming a regulatory expectation and an increasingly vital component of model governance. But the benefits of continuous performance monitoring go far beyond satisfying auditors and regulators. Machine learning and other advanced analytics are also proving to be invaluable tools for better understanding model error within sub-spaces of the population.

Watch this space for a forthcoming post and webinar explaining how RiskSpan leverages its automated model back-testing results and machine learning platform, Edge Studio, to streamline the calibration process for its internally developed residential mortgage prepayment model.

Validating Structured Finance Models

Introduction: Structured Finance Models

Models used to govern the valuation and risk management of structured finance instruments take a variety of forms. Unlike conventional equity investments, structured finance instruments are often customized to meet the unique needs of specific investors. They are tailored to mitigate various types of risks, including interest rate risk, credit risk, market risk and counterparty risks. Therefore, structured finance instruments may be derived from a compilation of loans, stocks, indices, or derivatives. Mortgage-backed securities (MBS) are the most ubiquitous example of this, but structured finance instruments also include:

  • Derivatives
  • Collateralized Mortgage Obligations (CMO)
  • Collateralized Bond Obligations (CBO)
  • Collateralized Debt Obligations (CDO)
  • Credit Default Swaps (CDS)
  • Hybrid Securities

Pricing and measuring the risk of these instruments is typically carried out using an integrated web of models. One set of models might be used to derive a price based on discounted cash flows. Once cash flows and corresponding discounting factors have been established, other models might be used to compute risk metrics (duration and convexity) and financial metrics (NII, etc.).

These models can be grouped into three major categories:

  • Curve Builder and Rate Models: Market rates are fundamental to valuing most structured finance instruments. Curve builders calibrate market curves (treasury yield curve, Libor/Swap Rate curve, or SOFR curve) using the market prices of the underlying bond, future, or swap. Interest rate models take the market curve as an input and generate simulated rate paths as the future evolution of the selected type of the market curve.

  • Projection Models: Using the market curve (or the single simulated rate path), a current coupon projection model projects forward 30-year and 15-year fixed mortgage rates. Macroeconomic models project future home values using a housing-price index (HPI). Prepayment models estimate how quickly loans are likely to pay down based on mortgage rate projections and other macroeconomic projections. And roll-rate models forecast the probability of a loan’s transitioning from one current/default state to another.

  • Cash Flow Models and Risk Metrics: Cash flow models combine the deal information of the underlying structured instrument with related rate projections to derive an interest-rate-path-dependent cash flow.

The following illustrates how the standard discounted cash flow approach works for a mortgage-related structured finance instrument:

Most well-known analytic solutions apply this discounted cash flow approach, or some adaptation of it, in analyzing structured finance instruments.

Derivatives introduce an additional layer of complexity that often calls for approaches and models beyond the standard discounted cash flow approach. Swaptions and interest rate caps and floors, for example, are typically priced with closed-form approaches, such as the Black model. For bond option pricing, lattice models or tree structures are commonly used. The specifics of these models are beyond the scope of this presentation, but many of the general model validation principles applied to discounted cash flow models are equally applicable to derivative pricing models.

Validating Curve Builder and Rate Models

Curve Builders

Let’s begin with the example of a curve builder designed for calibrating the on-the-run U.S. Treasury yield curve. The model takes a list of eligible on-the-run Treasury bonds as its key inputs, which serve as the fitting knots[1]. A proper interpolator that connects all the fitting knots is then used to smooth the curve and generate monthly or quarterly rates for all maturities up to 30 years. If abnormal increments or decrements are observed in the calibrated yield curve, adjustments are made to alleviate deviations between the fitting knots until the fitted yield curve is stable and smooth. A model validation report should include a thorough conceptual review of how the model carries out this task.
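To make the mechanics concrete, here is a minimal curve-fitting sketch using linear interpolation through illustrative knot values. Production curve builders typically use smoother interpolators (e.g., monotone cubic splines), and the knot maturities and yields below are hypothetical.

```python
# (maturity in years, par yield in %) fitting knots -- illustrative only.
knots = [(0.25, 0.02), (2.0, 0.09), (5.0, 0.80), (10.0, 1.60), (30.0, 2.10)]

def fitted_yield(t):
    """Interpolate a yield for maturity t from the surrounding knots."""
    if t <= knots[0][0]:
        return knots[0][1]
    for (t0, y0), (t1, y1) in zip(knots, knots[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return y0 + w * (y1 - y0)
    return knots[-1][1]

# Generate quarterly rates out to 30 years, as the curve builder does.
curve = [(0.25 * i, fitted_yield(0.25 * i)) for i in range(1, 121)]
```

A validator would confirm that the fitted curve reprices the knots exactly and behaves smoothly between them.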

Based on the market-traded securities selected, the curve builder can generate an on-the-run or off-the-run Treasury yield curve, a LIBOR swap curve, a SOFR curve, or whatever else is needed. The curve builder serves as the basis for measuring nominal and option-adjusted spreads for many types of securities and for applying spreads whenever spread is used to determine model price.

A curve builder’s inputs are therefore a set of market-traded securities. To validate the inputs, we take the market price of the fitting knots for three month-end trading dates and compare them against the market price inputs used in the curve builder. We then calibrate the par rate and spot rate based on the retrieved market price and compare it with the fitted curve generated from the curve builder.

To validate the curve builder’s model structure and development, we check the internal transitions among the model-provided par, spot and forward rates on three month-end trading dates. Different compounding frequencies can significantly impact these transitions. We also review the model’s assumptions, limitations and the governance activities established by the model owner.
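The spot-to-forward transition check can be sketched as follows, assuming annual compounding (the convention must match the model's, or the check will show spurious differences):

```python
# Forward rate between t1 and t2 implied by annually compounded spot
# rates s1 and s2. Rates below are illustrative.
def implied_forward(t1, s1, t2, s2):
    """Solve (1+s1)^t1 * (1+f)^(t2-t1) = (1+s2)^t2 for f."""
    growth = (1 + s2) ** t2 / (1 + s1) ** t1
    return growth ** (1 / (t2 - t1)) - 1

# With a flat spot curve, every implied forward equals the spot rate.
assert abs(implied_forward(1.0, 0.03, 2.0, 0.03) - 0.03) < 1e-12

# An upward-sloping spot curve implies forwards above the long spot rate.
fwd = implied_forward(1.0, 0.02, 2.0, 0.03)
```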

Validating model outputs usually begins by benchmarking the outputs against a similar curve provided by Bloomberg or another reputable challenger system. Next, we perform a sensitivity analysis to check the locality and stability of the forward curve by shocking the input fitting knots and analyzing the impact on the model-provided forward curve. For large shocks (i.e., 300 bp or more) we test boundary conditions, paying particular attention to the forward curve. Normally, we expect forwards not to become negative, as negative forwards would breach no-arbitrage conditions.

For the scenario analysis, we test the performance of the curve builder during periods of stress and other significant events, including bond market movement dates, Federal Open Market Committee (FOMC) dates and treasury auction dates. The selected dates cover significant events for Treasury/bond markets and provide meaningful analysis for the validation.

Interest Rate Models

An interest rate model is a mathematical model that is mainly used to describe the future evolution of interest rates. Its principal output is a simulated term structure, which is the fundamental component of a Monte Carlo simulation. Interest rate models typically fall into one of two broad categories:

  • Short-rate models: A short-rate model describes the future evolution of the short rate (the instantaneous spot rate, usually written r(t)).
  • LIBOR Market Model (LMM): An LMM describes the future evolution of forward rates, usually written F_k(t). Unlike the instantaneous spot rate, forward rates can be observed directly in the market, as can their implied volatilities.

This blog post provides additional commentary around interest rate model validations.
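As a stylized illustration of the short-rate category, the sketch below simulates Vasicek dynamics (dr = a(b - r)dt + sigma dW) and derives discount factors consistent with the simulated path. The parameter values are illustrative; production models are calibrated to the market curve and volatility surface.

```python
import math
import random

def simulate_vasicek(r0, a, b, sigma, dt, n_steps, seed=42):
    """Euler simulation of the Vasicek short-rate model (toy parameters)."""
    random.seed(seed)
    path = [r0]
    for _ in range(n_steps):
        r = path[-1]
        dr = a * (b - r) * dt + sigma * math.sqrt(dt) * random.gauss(0, 1)
        path.append(r + dr)
    return path

# 120 monthly steps (10 years) starting from a 2% short rate.
path = simulate_vasicek(r0=0.02, a=0.5, b=0.03, sigma=0.01, dt=1 / 12, n_steps=120)

# Discount factors consistent with the simulated rates.
dfs = []
df = 1.0
for r in path[:-1]:
    df *= math.exp(-r / 12)
    dfs.append(df)
```

A validation would run many such paths and compare the average discount factors against the input curve as a no-arbitrage check.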

Conceptual soundness and model theory reviews are conducted based on the specific interest rate model’s dynamics. The model inputs, regardless of the model structure selected, include the selected underlying curve and its corresponding volatility surface as of the testing date. We normally benchmark model inputs against market data from a challenger system and discuss any observed differences.

We then examine the model’s output, which is the set of stochastic paths comprising a variety of required spot rates or forward LIBOR and swap rates, as well as the discount factors consistent with the simulated rates. To check the no-arbitrage condition in the simulated paths, we compare the mean and median paths with the underlying curve and comment on the differences. We measure the randomness of the simulated paths and compare it against the interest rate model’s volatility parameter inputs.

Based on the simulated paths, an LMM also provides calibrated ATM swaption volatility. We compare the LMM’s implied ATM swaption volatility with its inputs and with the market rates from the challenger system as a review of the model calibration. For the LMM, we also compare the model against history on the correlation of forward swap rates and the serial correlation of a forward LIBOR rate. A well-structured LMM generates realistic swap rates whose correlations are consistent with historical values.

Validating Projection Models

Projection models come in various shapes and sizes.

“Current Coupon” Models

Current coupon models generate mortgage rate projections based on a market curve or a single simulated interest rate path. These projections are a key driver to prepayment projection models and mortgage valuation models. There are a number of model structures that can explain the current coupon projection, ranging from the simple constant-spread method to the recursive forward-simulation method. Since it has been traditionally assumed that the ten-year part of the interest rate curve drives mortgage rates, a common assumption involves holding the spread between current coupon and the ten-year swap or treasury rates constant. However, this simple and intuitive approach has a basic problem: primary market mortgage rates nowadays depend on secondary-market MBS current-coupon yields. Hence, current coupon depends not just on the ten-year part of the curve, but also on other factors that affect MBS current-coupon yields. Such factors include:

  • The shape of the yield curve
  • Tenors on the yield curve
  • Volatilities

A conceptual review of current coupon models includes a discussion around the selected method and comparisons with alternative approaches. To validate model inputs, we focus on the data transition procedures between the curve builder and current coupon model or between the interest rate model and the current coupon model. To validate model outputs, we perform a benchmarking analysis against projections from a challenger approach. We also perform back-testing to measure the differences between model projections and actual data over a testing period, normally 12 months. We use mean absolute error (MAE) to measure the back-testing results. If the MAE is less than 0.5%, we conclude that the model projection falls inside the acceptable range. For the sensitivity analysis, we examine the movements of the current coupon projection under various shock scenarios (including key-rate shocks and parallel shifting) on the rate inputs.
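The MAE back-test described above can be sketched as follows, using illustrative projected and actual rates and the 0.5% acceptance threshold:

```python
# Twelve months of hypothetical current coupon projections vs. actuals
# (rates in %). The values are illustrative, not market data.
projected = [3.10, 3.05, 3.00, 2.95, 2.92, 2.90, 2.88, 2.85, 2.83, 2.80, 2.78, 2.75]
actual    = [3.15, 3.02, 3.05, 2.90, 2.95, 2.88, 2.92, 2.80, 2.85, 2.78, 2.80, 2.72]

# Mean absolute error over the testing window.
mae = sum(abs(p - a) for p, a in zip(projected, actual)) / len(projected)

# Conclusion rule used in the validation: MAE under 0.5% is acceptable.
within_tolerance = mae < 0.5
```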

Prepayment Models

Prepayment models are behavioral models that help investors understand and forecast a loan portfolio’s likely prepayment behavior and identify its major drivers.

The prepayment model’s modeling structure is usually econometric in nature. It assumes that the same set of drivers that affected prepayment and default behavior in the past will drive them in the future under all scenarios, even though the period in the past that is most applicable may vary by scenario in the future.

Major drivers are identified and modeled separately as a function of collateral characteristics and macroeconomic variables. Each type of prepayment effect is then scaled based on the past prepayment and default experience of similar collateral. The underlying assumption is that if the resulting model can explain and reasonably fit historical prepayments, it is likely to be a reasonable model for projecting the future, subject to a careful review of its projections.

Prepayment effects normally include housing turnover, refinancing and burnout[2]. Each prepayment effect is modeled separately and then combined together. A good conceptual review of prepayment modeling methodology will discuss the mathematical fundamentals of the model, including an assessment of the development procedure for each prepayment effect and comparisons with alternative statistical approaches.
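A stylized sketch of how separately modeled prepayment effects might be combined. The functional forms and parameters below are purely illustrative, not an actual production specification.

```python
import math

def turnover_smm(seasoning_months):
    """Housing turnover ramps up with loan age (a PSA-like seasoning ramp)."""
    return 0.005 * min(seasoning_months / 30.0, 1.0)

def refi_smm(coupon, market_rate):
    """Refinancing incentive grows with the rate spread (S-curve shape)."""
    incentive = coupon - market_rate
    return 0.03 / (1 + math.exp(-8 * (incentive - 0.005)))

def total_smm(seasoning_months, coupon, market_rate, burnout=1.0):
    """Combine effects; burnout in (0, 1] damps the refi component."""
    return turnover_smm(seasoning_months) + burnout * refi_smm(coupon, market_rate)

# A seasoned 4.5% loan when market rates drop to 3.5% prepays faster
# than the same loan with no rate incentive.
fast = total_smm(36, 0.045, 0.035)
slow = total_smm(36, 0.045, 0.045)
```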

Take, for example, a model that projects prepayment rates on tradable Agency mortgage collateral (or whole-loan collateral comparable to Agencies) from settlement date to maturity. Its development data includes loan-level or pool-level transition data originally from Fannie Mae, Freddie Mac, Ginnie Mae and third-party servicers. Data obtained from third parties is marked as raw data. We review the data processing procedures used to get from the raw data to the development data. These procedures include reviewing data characteristics, data cleaning, data preparation and data transformation processes.

After the development data preparation, variable selection and loan segmentation become key to explaining each prepayment effect. Model developers seek to select a set of collateral attributes with clear and constant evidence of impact to the given prepayment effect. We validate the loan segmentation process by checking whether the historical prepayment rate from different loan segments demonstrates level differences based on the set of collateral attributes selected.

A prepayment model’s implementation process is normally a black box. This increases the importance of the model output review, which includes performance testing, stress testing, sensitivity analysis, benchmarking and back-testing. An appropriate set of validation tests will capture:

  • Sensitivity to collateral and borrower characteristics (loan-to-value, loan size, etc.)
  • Sensitivity to significant assumptions
  • Benchmarking of prepayment projections
  • Performance during various historical events
  • Back-testing
  • Scenario stability
  • Model projections compared with projections from dealers
  • Performance by different types of mortgages, including CMOs and TBAs

A prepayment model sensitivity analysis might take a TBA security and gradually change the value of input variables, one at a time, to isolate the impact of each variable. This procedure provides an empirical understanding of how the model performs with respect to parameter changes. If the prepayment model has customized tuning functionality, we can apply the sensitivity analysis independently to each prepayment effect by setting the other tuning parameters at zero.

For the benchmarking analysis, we compare the model’s cohort-level, short-term conditional prepayment rate (CPR) projection against other dealer publications, including Barclays and J.P. Morgan (as applicable and available). We also compare the monthly CPR projections against those of a challenger model, such as the Bloomberg Agency Model (BAM), for the full stack of Agency TBAs and discuss the differences. Discrepancies identified during the course of a benchmarking analysis may trigger further investigation into the model’s development. A discrepancy does not necessarily mean that the underlying model is in error, however, since the challenger model itself is simply an alternative projection. Differences might be caused by any number of factors, including different development data or modeling methodologies.

Prepayment model back-testing involves selecting a set of market-traded MBS and a set of hypothetical loan cohorts and comparing the actual monthly CPR against the projected CPR over a prescribed time window (normally one year). Thresholds should be established prior to testing and differences that exceed these thresholds should be investigated and discussed in the model validation report.

Validating Cash Flow Models and Risk Metrics

A cash flow model combines the simulated paths from interest rate, prepayment, default, and delinquency models to compute projected cash flows associated with monthly principal and interest payments.

Cash flow model inputs include the underlying instrument’s characteristics (e.g., outstanding balance, coupon rate, maturity date, day count convention, etc.) and the projected vectors associated with CPR, default rate, delinquency, and severity (if applicable). A conceptual review of a cash flow model involves verifying the data loading procedure to ensure that the instrument’s characteristics are captured correctly within the model. It should also review the underlying mathematical formulas to verify that the projected vectors are correctly applied.
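A minimal sketch of such a cash flow projection, combining scheduled amortization with a constant projected SMM (single monthly mortality) vector; the loan terms are illustrative:

```python
def project_cash_flows(balance, annual_rate, term_months, smm):
    """Return monthly (interest, principal) cash flows; smm is a vector."""
    r = annual_rate / 12
    flows = []
    for m in range(term_months):
        if balance <= 0:
            break
        # Level payment recomputed on the remaining balance and term.
        n = term_months - m
        payment = balance * r / (1 - (1 + r) ** -n)
        interest = balance * r
        scheduled = payment - interest
        # Prepayment applies to the balance remaining after scheduled principal.
        prepay = (balance - scheduled) * smm[m]
        principal = scheduled + prepay
        flows.append((interest, principal))
        balance -= principal
    return flows

# A 30-year 4% loan with a constant 1% SMM projection.
flows = project_cash_flows(100_000, 0.04, 360, [0.01] * 360)
total_principal = sum(p for _, p in flows)
```

In a real engine the SMM, default, and severity vectors would come from the upstream behavioral models rather than a constant assumption.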

Model outputs can be validated via sensitivity analysis. This often involves shocking each input variable, one at a time, and examining the resulting impact on the monthly remaining balance. Benchmarking can be accomplished by developing a challenger model and comparing the resulting cash flows.

Combining the outputs of all the sub-models, a price of the underlying structured finance instrument can be generated (and tested) along with its related risk metrics (duration, convexity, option adjusted spread, etc.).

Using MBS as an example, an option adjusted spread (OAS) analysis is commonly used. Theoretically, OAS is calibrated by matching the model price with the market price. The OAS can be viewed as a constant spread applied to the discounting curve when computing the model price. Because it captures the difference between model price and market price, OAS is particularly useful in MBS valuation and in measuring prepayment risk and market risk. A comprehensive analysis reviews the following:

  • Impact of interest rate shocks on a TBA stack in terms of price, OAS, effective duration, and effective convexity.
  • Impact of projected prepayment rate shock on a TBA stack in terms of price, OAS, effective duration, and effective convexity.
  • Impact of projected prepayment rate shock on the option cost (measured as basis point, zero-volatility spread minus OAS).
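The OAS calibration described above (solving for the constant spread that equates model and market price) can be sketched with a simple bisection search. The bond cash flows and zero rates below are illustrative.

```python
def model_price(cash_flows, discount_rates, oas):
    """PV of annual cash flows discounted at curve rate plus OAS."""
    return sum(
        cf / (1 + r + oas) ** (t + 1)
        for t, (cf, r) in enumerate(zip(cash_flows, discount_rates))
    )

def solve_oas(cash_flows, discount_rates, market_price, lo=-0.05, hi=0.05):
    """Bisection search: price is monotone decreasing in the spread."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if model_price(cash_flows, discount_rates, mid) > market_price:
            lo = mid  # model price too high, widen the spread
        else:
            hi = mid
    return (lo + hi) / 2

cfs = [5.0, 5.0, 105.0]      # simple 5% annual-pay bond
curve = [0.02, 0.025, 0.03]  # zero rates per period (illustrative)
oas = solve_oas(cfs, curve, market_price=102.0)
```

For a path-dependent MBS, the discounting would run along each simulated rate path and the OAS would be the spread that matches the average path price to the market price.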

Beyond OAS, the validation should include independent benchmarking of the model price. Given a sample portfolio that contains the deal information for a list of structured finance instruments, validators derive a model price using the same market rates as the subject model as a basis for comparison. Analyzing the shock profiles enables validators to conclude whether the given discounted cash flow method is generating satisfactory model performance.


Structured finance model validations are complex because they invariably involve testing a complicated array of models, sub-models, and related models. The list of potential sub-models (across all three categories discussed above) significantly exceeds the examples cited.

Validators must design validation tasks specific to each model type in order to adequately assess the risks posed by potential shortcomings associated with model inputs, structure, theory, development, outputs and governance practices.

When it comes to models governing structured finance instruments, validators must identify any model risk not only at the independent sub-model level but at the broader system level, for which the final outputs include model price and risk metrics. This requires a disciplined and integrated approach.



[1] Knots represent a set of predefined points on the curve

[2] Burnout effect describes highly seasoned mortgage pools in which loans likely to repay have already done so, resulting in relatively slow prepayment speeds despite falling interest rates.


Why Model Validators Need to Care About the LIBOR Transition

The transition to the Secured Overnight Financing Rate (SOFR) as a LIBOR replacement after 2021 creates layers of risk for banks. Many of these risks are readily apparent, others less so. But the factors banks must consider when choosing replacement rates and correctly implementing contractual fallback language make a seamless transition a daunting proposition. Though sometimes overlooked, model risk managers have an important role in ensuring this happens correctly and in a way that does not jeopardize the reliability of model outputs.

LIBOR, SOFR and the need for transition

A quick refresher: The London Interbank Offered Rate (LIBOR) currently serves as the benchmark rate at which major global banks lend to one another on a short-term basis in the international interbank market. LIBOR is calculated by the Intercontinental Exchange (ICE) and is published daily for a combination of five currencies and seven maturities. The most common of these is the three-month U.S. Dollar rate.

Accusations of manipulation by major banks going back as early as 2008, however, raised concerns about the sustainability of LIBOR. A committee convened by the Federal Reserve Board and the Federal Reserve Bank of New York in 2017—the Alternative Reference Rates Committee (ARRC)—identified a broad Treasury repurchase agreement (repo) financing rate as its preferred alternative reference rate to replace LIBOR after 2021. This repo rate (now known as SOFR) was chosen for its ability to provide liquidity to underlying markets and because the volumes underlying SOFR are far larger than any other U.S. money market. This combination of size and liquidity contributes to SOFR’s transparency and protects market participants from attempts at manipulation.

What Does This Mean for MRM?

Because the transition has potential bearing on so many layers of risk—market risk, operational risk, strategic risk, reputation risk, compliance risk, not to mention the myriad risks associated with mispricing assets—any model in a bank’s existing inventory that is tasked with gauging or remediating these risks is liable to be impacted. Understanding how, and the extent to which, models account for the LIBOR transition’s effects on pricing and other core processes is (or should be) of principal concern to model validators.

Ongoing Monitoring and Benchmarking

Regulatory guidance and model validation best practices require testing model inputs and benchmarking how the model performs with the selected inputs relative to alternatives. For this reason, the validation of any model whose outputs are sensitive to variable interest rates should include an assessment of how a replacement index (such as SOFR) and adjustment methodology were selected.

Model validators should be able to ascertain whether the model developer has documented enough evidence relating to:

  • Available reference rates and the appropriateness of each to the bank’s specific products
  • System capabilities for using these replacement rates with the bank’s products.
  • Control risks associated with unavailable alternative rates

Fallback Language considerations:

Fallback language—contractual provisions that govern the process for selecting a replacement rate in the event of LIBOR termination—should also factor into a validator’s assessment of model inputs. While many existing fallback provisions can be frustratingly vague when it comes to dealing with a permanent cessation of LIBOR, validators of models that rely on reference rates as inputs have an obligation to determine compliance with fallback language containing clear and executable terms. These include:

  • Specific triggers to enact the replacement rate
  • Clarity regarding the replacement rate and spread adjustments
  • Permissible options under fallback language – and whether other options might be more appropriate than the one ultimately selected based on the potential for valuation changes, liquidity impact, hedging implications, system changes needed, and customer impact

In November 2019, the ARRC published the finalized fallback language for residential adjustable rate mortgages, bilateral business loans, floating rate notes, securitizations, and syndicated loans. It has also actively engaged with the International Swap Derivatives Association (ISDA) to finalize the fallback parameters for derivatives.

The ARRC also recommended benchmark replacement rates adjusted for spread that would replace the current benchmark due to circumstances that trigger the replacement. The recommendation included the following benchmark replacement waterfalls. Validators of models relying on these replacements may choose, as part of their best practices review, to determine the extent to which existing fallback provisions align with the recommendations.

| Replacement | Description |
| --- | --- |
| Term SOFR + spread adjustment | Forward-looking term SOFR for the applicable corresponding tenor. Note: loan recommendations allow use of the next-longest tenor term SOFR rate if the corresponding tenor is unavailable. |
| Compounded SOFR + spread adjustment | Compounded average of daily SOFRs over the relevant period, depending on the tenor of USD LIBOR being replaced. |
| Relevant selected rate + spread adjustment | Rate selected by the Relevant Governmental Body, lender, or borrower and administrative agent. |
| Relevant ISDA replacement rate + spread adjustment | The applicable replacement rate (without spread adjustment) embedded in ISDA’s standard definitions. |
| Issuer, designated transaction representative, or noteholder replacement + spread adjustment | An identified party selects a replacement rate, in some cases considering any industry-accepted rate in the related market. Note: in certain circumstances this step may be omitted. |
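The "Compounded SOFR" replacement in the waterfall can be sketched as follows. The daily fixings are illustrative, and the ACT/360 compounding shown is a common convention, not necessarily the one applicable to a given contract.

```python
def compounded_sofr(daily_rates, day_counts):
    """Compound daily fixings (ACT/360), then annualize over the period.

    day_counts holds the number of calendar days each fixing applies
    (e.g., a Friday fixing applies over the weekend).
    """
    growth = 1.0
    for rate, days in zip(daily_rates, day_counts):
        growth *= 1 + rate * days / 360
    total_days = sum(day_counts)
    return (growth - 1) * 360 / total_days

# Five hypothetical business-day fixings; the Friday fixing applies
# for 3 calendar days.
rates = [0.0005, 0.0005, 0.0006, 0.0006, 0.0005]
days = [1, 1, 1, 1, 3]
rate = compounded_sofr(rates, days)
```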

Model risk managers can sometimes be lulled into believing that the validation of interest rate inputs consists solely of verifying their source and confirming that they have been faithfully brought into the model. Ultimately, however, model validators are responsible for verifying not only the provenance of model inputs but also their appropriateness. Consequently, ensuring a smooth transition to the most appropriate available reference rate replacement is of paramount importance to risk management efforts related to the models these rates feed.



Managing Machine Learning Model Risk

Though the terms are often used interchangeably in casual conversation, machine learning (ML) is a subset of artificial intelligence. Simply put, ML is the process of getting a computer to learn the properties of one dataset and generalize this “knowledge” to other datasets.

ML Financial Models

ML models have crept into virtually every corner of banking and finance — from fraud and money-laundering prevention to credit and prepayment forecasting, trading, servicing, and even marketing. These models take various forms (see Table 1, below). Modelers base their selection of a particular ML technique on a model’s objective and data availability.   

Table 1. ML Models and Application in Finance

| Model | Application |
| --- | --- |
| Linear Regression | Credit Risk; Forecasting |
| Logistic Regression | Credit Risk |
| Monte Carlo Simulation | Capital Markets; ALM |
| Artificial Neural Networks | Scorecards; AML |
| Decision Tree Regression Models (Random Forest, Bagging) | Scorecards |
| Multinomial Logistic Regression | Prepayment Projection |
| Deep Learning | Prepayment Projection |
| Time Series Models | Capital Forecasting; Macroeconomic Forecasting |
| Linear Regression with ARIMA Errors | Capital Forecasting |
| Factor Models | Short-Rate Evolution |
| Fuzzy Matching | AML; OFAC |
| Linear Discriminant Analysis (LDA) | AML; OFAC |
| K-Means Clustering | AML; OFAC |


ML models require large datasets relative to conventional models, as well as more sophisticated computer programming and econometric/statistical skills. ML model developers must have deep knowledge of the ML technique they want to use, its assumptions and limitations, and alternative approaches.


ML Model Risk

ML models present many of the same risks that accompany conventional models. As with any model, errors in design or application can lead to performance issues resulting in financial losses, poor decisions, and damage to reputation.

ML is all about algorithms. Failing to understand the mathematical aspects of these algorithms can lead to adopting inefficient optimization algorithms without knowing the nature or the interpretation of the optimization being solved. Making decisions under these circumstances increases model risk and can lead to unreliable outputs.

As sometimes befalls conventional regression models, ML models may perform well on the training data but not on the test data. Their complexity and high dimensionality make them especially susceptible to overfitting. The poor performance of some ML models when applied beyond the training dataset can translate into a huge source of risk.
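A toy illustration of this overfitting risk: a model that memorizes its training data (here, 1-nearest-neighbor) fits the training sample perfectly but degrades out of sample. All data below are synthetic.

```python
import random

# Synthetic data: y = 2x plus noise, split into train and test halves.
random.seed(0)
data = [(x, 2 * x + random.gauss(0, 1)) for x in [i / 10 for i in range(100)]]
train, test = data[::2], data[1::2]

def knn_predict(x):
    """Memorizing model: return the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, dataset):
    """Mean squared error of a model over a dataset."""
    return sum((model(x) - y) ** 2 for x, y in dataset) / len(dataset)

# In-sample error is exactly zero (each point is its own nearest
# neighbor), but out-of-sample error is not -- the overfitting gap.
train_gap = mse(knn_predict, test) - mse(knn_predict, train)
```

Validators routinely look for exactly this pattern: near-perfect in-sample fit alongside materially worse holdout performance.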

Finally, ML models can give rise to unintended consequences when used inappropriately or incorrectly. Model risk is magnified when the goal of an ML model’s algorithm is not aligned with the business problem or fails to account for all of its relevant dimensions. Model risk also arises when an ML model is used outside the environment for which it was designed. These risks include overstated or understated model outputs and a lack of fairness. Table 2, below, presents a more comprehensive list of these risks.

Table 2. Potential risks from ML models

Bias toward protected groups
Use of poor-quality data
Job displacement
Models may produce socially unacceptable results
Automation may create model governance issues


Managing ML Model Risk


It may seem self-evident, but the first step in managing ML model risk consists of reliably identifying every model in the inventory that relies on machine learning. This exercise is not always as straightforward as it might seem. Successfully identifying all ML models requires MRM departments to incorporate the right information requests into their model determination or model assessment forms. These should include questions designed to identify specific considerations of ML model techniques, algorithms, platforms and capabilities. MRM departments need to adopt a consistent but flexible definition of what constitutes an ML model across the institution. Model developers, owners and users should be trained to identify ML models and the features that need to be reported in the model identification assessment form.

MRM’s next step involves risk-assessing the ML models in the inventory. As with traditional models, ML models should be risk-assessed based on their complexity, materiality and frequency of use. Because of their complexity, however, ML models require an additional level of screening to account for data structure, level of algorithm sophistication, number of hyperparameters, and how the models are calibrated. The questionnaire MRM uses to assess the risk of its conventional models often needs to be enhanced to adequately capture the additional risk dimensions introduced by ML models.

Managing ML model risk also involves ensuring not only that a clear model development and implementation process is in place but also that it is consistent with the business objective and the intended use of the models. Thorough documentation is important for any model, but the need to describe model theory, methodology, design and logic takes on added importance when it comes to ML models. This includes specifying the methodology (regression or classification), the type of model (linear regression, logistic regression, natural language processing, etc.), the resampling method (cross-validation, bootstrap) and the subset selection method, such as backward, forward or stepwise selection. Obviously, simply stating that the model “relies on a variety of machine learning techniques” is not going to pass muster.

As with traditional models, developers must document the data sources, data quality and any transformations performed. This includes listing the data sources, normalization and sampling techniques, training and test data sizes, and the data dimension reduction technique (principal components, partial least squares, etc.), as well as the controls around them. The risk associated with using certain data should also be assessed.

A model implementation plan, along with controls around the model, should also be developed.

Finally, all model performance testing should be clearly stated, and the results documented. This helps assess whether the model is performing as intended and in line with its design and business objective. Limitations and calibrations around the models should also be documented.

Like traditional models, ML models require independent validation to ensure they are sound and performing as intended and to identify potential limitations. All components of ML models should be subject to validation, including conceptual soundness, outcomes analysis and ongoing monitoring.

Validators can assess the conceptual soundness of an ML model by evaluating its design and construction, focusing on the theory, methodology, assumptions and limitations, data quality and integrity, hyperparameter calibration and overlays, bias, and interpretability.

Validators can conduct outcomes analysis by checking whether the model outputs are appropriate and in line with a priori expectations. Results of the performance metrics should also be assessed for accuracy and degree of precision. Performance metrics for ML models vary by model type. As with traditional predictive models, common performance metrics for ML models include mean squared error (MSE), the Gini coefficient, entropy, the confusion matrix, and the receiver operating characteristic (ROC) curve.
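Several of these metrics can be computed in a few lines with scikit-learn. The sketch below is hedged: the labels and scores are toy values, not output from any actual model.

```python
# Toy classification metrics: confusion matrix, ROC AUC, Gini, MSE.
from sklearn.metrics import confusion_matrix, roc_auc_score, mean_squared_error

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]                    # hypothetical actual labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]   # hypothetical predicted probabilities
y_pred  = [int(s >= 0.5) for s in y_score]            # classify at a 0.5 threshold

cm   = confusion_matrix(y_true, y_pred)       # rows: actual, columns: predicted
auc  = roc_auc_score(y_true, y_score)         # area under the ROC curve
gini = 2 * auc - 1                            # Gini coefficient derived from AUC
mse  = mean_squared_error(y_true, y_score)    # mean squared error of the scores
print(cm)
print(f"AUC={auc:.3f}, Gini={gini:.3f}, MSE={mse:.3f}")
```

Validators would compare metrics like these against the thresholds documented in the model's performance-monitoring plan.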

Outcomes analysis should also include out-of-sample testing, which can be conducted using cross-validation techniques. Finally, ongoing monitoring should be reviewed as a core element of the validation process. Validators should evaluate whether model use is appropriate given changes in products, exposures and market conditions. Validators should also ensure performance metrics are being monitored regularly based on the inherent risk of the model and frequency of use. Validators should ensure that a continuous performance monitoring plan exists and captures the most important metrics. Also, a change control document and access control document should be available.  
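The cross-validation approach mentioned above can be sketched as follows (an illustrative example using scikit-learn on synthetic data):

```python
# Out-of-sample testing via k-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=1)

# Five held-out folds: each observation is scored by a model that never saw it.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"mean accuracy={scores.mean():.3f}, std={scores.std():.3f}")
```

A stable mean and low spread across folds suggest the fit generalizes; a high spread points back to the overfitting risks discussed earlier.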

The principles outlined above will sound familiar to any experienced model validator, even one with no ML training or experience. ML models do not upend the framework of MRM best practices but rather add a layer of complexity to their implementation. This complexity requires MRM departments in many cases to adjust their existing procedures to properly identify ML models and suitably capture the risks emerging from them. As is almost always the case, aggressive staff training is needed to ensure that these well-considered process enhancements are faithfully executed and have their desired effect.

September 30 Webinar: Machine Learning in Model Validation

Recorded: September 30th | 1:00 p.m. EDT

Join our panel of experts as they share their latest work using machine learning to identify and validate model inputs.

  • Suhrud Dagli, Co-Founder & Fintech Lead, RiskSpan
  • Jacob Kosoff, Head of Model Risk Management & Validation, Regions Bank
  • Nick Young, Head of Model Validation, RiskSpan
  • Sanjukta Dhar, Consulting Partner, Risk and Regulatory Compliance Strategic Initiative, TCS Canada


The Why and How of a Successful SAS-to-Python Model Migration

A growing number of financial institutions are migrating their modeling codebases from SAS to Python. There are many reasons for this, some of which may be unique to the organization in question, but many apply universally. Because of our familiarity not only with both coding languages but with the financial models they power, my colleagues and I have had occasion to help several clients with this transition.

Here are some things we’ve learned from this experience and what we believe is driving this change.

Python Popularity

The popularity of Python has skyrocketed in recent years. Its intuitive syntax and a wide array of packages available to aid in development make it one of the most user-friendly programming languages in use today. This accessibility allows users who may not have a coding background to use Python as a gateway into the world of software development and expand their toolbox of professional qualifications.

Companies appreciate this as well. As an open-source language with tons of resources and low overhead costs, Python is also attractive from an expense perspective. A cost-conscious option that resonates with developers and analysts is a win-win when deciding on a codebase.

Note: R is another popular and powerful open-source language for data analytics. Unlike R, however, which is used specifically for statistical analysis, Python supports a wider range of uses, including UI design, web development, business applications, and others. This flexibility makes Python attractive to companies seeking versatility: the ability for developers to transition seamlessly among teams. R remains popular in academic circles, where a powerful, easy-to-understand tool is needed to perform statistical analysis but additional flexibility is not necessarily required. Hence, we are limiting our discussion here to Python.

Python is not without its drawbacks. As an open-source language, less oversight governs newly added features and packages. Consequently, while updates may be quicker, they are also more prone to error than SAS’s, which are always thoroughly tested prior to release.


Visualization Capabilities

While both codebases support data visualization, Python’s packages are generally viewed more favorably than SAS’s, which tend to be on the more basic side. More advanced visuals are available from SAS, but they require the SAS Visual Analytics platform, which comes at an added cost.

Python’s popular visualization packages — matplotlib, plotly, and seaborn, among others — can be leveraged to create powerful and detailed visualizations by simply importing the libraries into the existing codebase.
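For instance, a basic chart requires only a few lines of matplotlib. This is a minimal sketch; the headless "Agg" backend is chosen here so the script can run without a display.

```python
# Minimal matplotlib chart written to a PNG file.
import matplotlib
matplotlib.use("Agg")              # headless backend for scripted/server use
import matplotlib.pyplot as plt

x = range(12)
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("example.png")         # writes the chart to disk
```

From here, swapping in seaborn or plotly mainly means importing a different library; the underlying data objects stay the same.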


SAS is a command-driven software package used for statistical analysis and data visualization. Available for Windows and several UNIX/Linux platforms, it remains one of the most widely used statistical software packages in both industry and academia.

It’s not hard to see why. For financial institutions with large amounts of data, SAS has been an extremely valuable tool. It is a well-documented language, with many online resources and is relatively intuitive to pick up and understand – especially when users have prior experience with SQL. SAS is also one of the few tools with a customer support line.

SAS, however, is a paid service, and at a standalone level, the costs can be quite prohibitive, particularly for smaller companies and start-ups. Complete access to the full breadth of SAS and its supporting tools tends to be available only to larger and more established organizations. These costs are likely fueling its recent drop-off in popularity. New users simply cannot access it as easily as they can Python. While an academic/university version of the software is available free of charge for individual use, its feature set is limited. Therefore, for new users and start-up companies, SAS may not be the best choice, despite being a powerful tool. Additionally, with the expansion and maturity of the variety of packages that Python offers, many of the analytical abilities of Python now rival those of SAS, making it an attractive, cost-effective option even for very large firms.

Future of tech

Many of the expected advances in data analytics and tech in general are clearly pointing toward deep learning, machine learning, and artificial intelligence in general. These are especially attractive to companies dealing with large amounts of data.

While the technology to analyze data with complete independence is still emerging, Python is better situated to support companies that have begun laying the groundwork for these developments. Python’s rapidly expanding libraries for artificial intelligence and machine learning will likely make future transitions to deep learning algorithms more seamless.

While SAS has made some strides toward adding machine learning and deep learning functionalities to its repertoire, Python remains ahead and consistently ranks as the best language for deep learning and machine learning projects. This creates a symbiotic relationship between the language and its users. Developers use Python to develop ML projects since it is currently best suited for the job, which in turn expands Python’s ML capabilities — a cycle which practically cements Python’s position as the best language for future development in the AI sphere.

Overcoming the Challenges of a SAS-to-Python Migration

SAS-to-Python migrations bring a unique set of challenges that need to be considered. These include the following.

Memory overhead

Server space is getting cheaper but it’s not free. Although Python’s data analytics capabilities rival SAS’s, Python requires more memory overhead. Companies working with extremely large datasets will likely need to factor in the cost of extra server space. These costs are not likely to alter the decision to migrate, but they also should not be overlooked.
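One common mitigation is to downcast column types so that large datasets occupy less memory. A hedged sketch using pandas follows; the column names and sizes are hypothetical.

```python
# Reducing DataFrame memory footprint by downcasting dtypes.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "balance": np.random.rand(100_000) * 1e6,        # defaults to float64
    "term": np.random.randint(0, 360, 100_000),      # defaults to int64
})
before = df.memory_usage(deep=True).sum()

df["balance"] = df["balance"].astype("float32")      # 8 bytes -> 4 bytes per value
df["term"] = df["term"].astype("int16")              # 8 bytes -> 2 bytes per value
after = df.memory_usage(deep=True).sum()

print(f"memory reduced from {before:,} to {after:,} bytes")
```

Downcasting trades precision for space, so it should be applied only to columns where the reduced range and precision are demonstrably sufficient.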

The SAS server

All SAS commands are run on SAS’s own server. This tightly controlled ecosystem can make SAS faster than Python, which does not come with the same infrastructure out of the box. Therefore, optimizing Python code can be a significant challenge during SAS-to-Python migrations, particularly when tackling it for the first time.

SAS packages vs Python packages

Calculations performed using SAS packages vs. Python packages can result in differences, which, while generally minuscule, cannot always be ignored. Depending on the type of data, this can pose an issue. And getting an exact match between values calculated in SAS and values calculated in Python may be difficult.

For example, a calculation whose true value is 0 may come back from SAS as a tiny floating-point residue such as 3.552714E-15, while Python represents the decimal fraction 0.1 internally as the binary ratio 3602879701896397/2^55. These values do not create noticeable differences in most calculations. But some financial models demand more precision than others. And over the course of multiple calculations that build upon each other, they can create differences in fractional values. These differences must be reconciled and accounted for.
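The practical consequence is that equality checks between SAS and Python outputs should use tolerances rather than exact comparison. A small Python illustration:

```python
# Binary floating point leaves tiny residues, so compare with a tolerance.
import math

residue = 0.1 + 0.2 - 0.3          # mathematically 0, but not in binary float
print(residue)                     # a tiny nonzero residue

# 0.1 is stored as a binary ratio, not the exact decimal:
print((0.1).as_integer_ratio())    # numerator / denominator (a power of 2)

# Tolerance-based comparison is the standard fix when matching SAS vs. Python values:
matches = math.isclose(residue, 0.0, abs_tol=1e-9)
print(matches)
```

The appropriate tolerance is model-specific: a pricing model compounding over hundreds of periods warrants a tighter bound than a one-shot summary statistic.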

Comparing large datasets

One of the most common functions when working with large datasets involves evaluating how they change over time. SAS has a built-in procedure (PROC COMPARE) that compares datasets swiftly and easily as required. Python has packages for this as well; however, these packages are not as robust as their SAS counterparts.
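In pandas, for example, a PROC COMPARE-style difference report can be produced with DataFrame.compare. A minimal sketch on toy data (the column names are hypothetical):

```python
# Cell-level comparison of two DataFrames, similar in spirit to PROC COMPARE.
import pandas as pd

old = pd.DataFrame({"loan_id": [1, 2, 3], "rate": [3.5, 4.0, 4.25]})
new = pd.DataFrame({"loan_id": [1, 2, 3], "rate": [3.5, 4.1, 4.25]})

# DataFrame.compare reports only the cells that differ ("self" vs. "other"):
diff = old.compare(new)
print(diff)

# Or assert equality outright (raises with a detailed message on mismatch):
try:
    pd.testing.assert_frame_equal(old, new)
except AssertionError:
    print("datasets differ")
```

Unlike PROC COMPARE, this reports only value-level differences on identically shaped frames; schema or row-count mismatches need separate checks.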


In most cases, the benefits of migrating from SAS to Python outweigh the challenges associated with going through the process. The envisioned savings can sometimes be attractive enough to cause firms to trivialize the transition costs. This should be avoided. A successful migration requires taking full account of the obstacles and making plans to mitigate them. Involving the right people from the outset — analysts well versed in both languages who have encountered and worked through the pitfalls — is key.

Get Started
Get A Demo

Linkedin    Twitter    Facebook