So you’re updating all your modeling assumptions. Don’t forget about governance.
Modelers have now been grappling with how COVID-19 should affect assumptions and forecasts for nearly two months. This exercise is raising at least as many questions as it is answering.
No credit model (perhaps no model at all) is immune. Among the latest examples are mortgage servicers having to confront how to bring their forbearance and loss models into alignment with new realities.
These new realities are requiring servicers to model unprecedented macroeconomic conditions in a new and changing regulatory environment. The generous mortgage forbearance provisions ushered in by March’s CARES Act are not tantamount to loan forgiveness. But servicers probably shouldn’t count on reimbursement of their forbearance advances until loan liquidation (irrespective of what form the payoff takes).
The ramifications of these costs and how servicers should modeling them is a central topic to be addressed in a Mortgage Bankers Association webinar on Wednesday, May 13, “Modeling Forbearance Losses in the COVID-19 world” (free for MBA members). RiskSpan CEO Bernadette Kogler will lead a panel consisting of Faith Schwartz, Suhrud Dagli, and Morgan Snyder in a discussion of the forbearance’s regulatory implications, the limitations of existing models, and best practices for modeling forbearance-related advances, losses, and operational costs.
Models, of course, are only as good as their underlying data and assumptions. When it comes to forbearance modeling, those assumptions obviously have a lot to do with unemployment, but also with the forbearance take-up rate layered on top of more conventional assumptions around rates of delinquency, cures, modifications, and bankruptcies.
The unique nature of this crisis requires modelers to expand their horizons in search of applicable data. For example, GSE data showing how delinquencies trend in rising unemployment scenarios might need to be supplemented by data from Greek or other European crises to better simulate extraordinarily high unemployment rates. Expense and liquidation timing assumptions will likely require looking at GSE and private-label data from the 2008 crisis. Having reliable assumptions around these is critically important because liquidity issues associated with servicing advances are often more an issue of timing than of anything else.
Model adjustments of the magnitude necessary to align them with current conditions almost certainly qualify as “material changes” and present a unique set of challenges to model validators. In addition to confronting an expanded workload brought on by having to re-validate models that might have been validated as recently as a few months ago, validators must also effectively challenge the new assumptions themselves. This will likely prove challenging absent historical context.
RiskSpan’s David Andrukonis will address many of these challenges—particularly as they relate to CECL modeling—as he participates in a free webinar, “Model Risk Management and the Impacts of COVID-19,” sponsored by the Risk Management Association. Perhaps fittingly, this webinar will run concurrent with the MBA webinar discussed above.
As is always the case, the smoothness of these model-change validations will depend on the lengths to which modelers are willing to go to thoroughly document their justifications for the new assumptions. This becomes particularly important when introducing assumptions that significantly differ from those that have been used previously. While it will not be difficult to defend the need for changes, justifying the individual changes themselves will prove more challenging. To this end, meticulously documenting every step of feature selection during the modeling process is critical not only in getting to a reliable model but also in ensuring an efficient validation process.
Documenting what they’re doing and why they’re doing it is no modeler’s favorite part of the job—particularly when operating in crisis mode and just trying to stand up a workable solution as quickly as possible. But applying assumptions that have never been used before always attracts increased scrutiny. Modelers will need to get into the habit of memorializing not only the decisions made regarding data and assumptions, but also the other options considered, and why the other considered options were ultimately passed over.
Documenting this decision-making process is far easier at the time it happens, while the details are fresh in a modeler’s mind, than several months down the road when people inevitably start probing.
Invest in the “ounce of prevention” now. You’ll thank yourself when model validation comes knocking.
Watch RiskSpan Managing Director, Tim Willis, discuss how to optimize model validation programs. RiskSpan’s model risk management practice has experience in both building and validating models, giving us unique expertise to provide very high quality validations without diving into activities and exercises of marginal value.
In addition to transforming the way in which financial institutions approach predictive modeling, machine learning techniques are beginning to find their way into how model validators assess conventional, non-machine-learning predictive models. While the array of standard statistical techniques available for validating predictive models remains impressive, the advent of machine learning technology has opened new avenues of possibility for expanding the rigor and depth of insight that can be gained in the course of model validation. In this blog post, we explore how machine learning, in some circumstances, can supplement a model validator’s efforts related to:
- Outlier detection on model estimation data
- Clustering of data to better understand model accuracy
- Feature selection methods to determine the appropriateness of independent variables
- The use of machine learning algorithms for benchmarking
- Machine learning techniques for sensitivity analysis and stress testing
Conventional model validations include, when practical, an assessment of the dataset from which the model is derived. (This is not always practical—or even possible—when it comes to proprietary, third-party vendor models.) Regardless of a model’s design and purpose, virtually every validation concerns itself with at least a cursory review of where these data are coming from, whether their source is reliable, how they are aggregated, and how they figure into the analysis.
Conventional model validation techniques sometimes overlook (or fail to look deeply enough at) the question of whether the data population used to estimate the model is problematic. Outliers—and the effect they may be having on model estimation—can be difficult to detect using conventional means. Developing descriptive statistics and identifying data points that are one, two, or three standard deviations from the mean (i.e., extreme value analysis) is a straightforward enough exercise, but this does not necessarily tell a modeler (or a model validator) which data points should be excluded.
Machine learning modelers use a variety of proximity and projection methods for filtering outliers from their training data. One proximity method employs the K-means algorithm, which groups data into clusters centered around defined “centroids,” and then identifies data points that do not appear to belong to any particular cluster. Common projection methods include multi-dimensional scaling, which allows analysts to view multi-dimensional relationships among multiple data points in just two or three dimensions. Sophisticated model validators can apply these techniques to identify dataset problems that modelers may have overlooked.
The tendency of data to cluster presents another opportunity for model validators. Machine learning techniques can be applied to determine the relative compactness of individual clusters and how distinct individual clusters are from one another. Clusters that do not appear well defined and blur into one another are evidence of a potentially problematic dataset—one that may result in non-existent patterns being identified in random data. Such clustering could be the basis of any number of model validation findings.
Feature (Variable) Selection
What conventional predictive modelers typically refer to as variables are commonly referred to by machine learning modelers as features. Features and variables serve essentially the same function, but the way in which they are selected can differ. Conventional modelers tend to select variables using a combination of expert judgment and statistical techniques. Machine learning modelers tend to take a more systematic approach that includes stepwise procedures, criterion-based procedures, lasso and ridge regresssion and dimensionality reduction. These methods are designed to ensure that machine learning models achieve their objectives in the simplest way possible, using the fewest possible number of features, and avoiding redundancy. Because model validators frequently encounter black-box applications, directing applying these techniques is not always possible. In some limited circumstances, however, model validators can add to the robustness of their validations by applying machine learning feature selection methods to determine whether conventionally selected model variables resemble those selected by these more advanced means (and if not, why not).
Identifying and applying an appropriate benchmarking model can be challenging for model validators. Commercially available alternatives are often difficult to (cost effectively) obtain, and building challenger models from scratch can be time-consuming and problematic—particularly when all they do is replicate what the model in question is doing.
While not always feasible, building a machine learning model using the same data that was used to build a conventionally designed predictive model presents a “gold standard” benchmarking opportunity for assessing the conventionally developed model’s outputs. Where significant differences are noted, model validators can investigate the extent to which differences are driven by data/outlier omission, feature/variable selection, or other factors.
Sensitivity Analysis and Stress Testing
The sheer quantity of high-dimensional data very large banks need to process in order to develop their stress testing models makes conventional statistical analysis both computationally expensive and problematic. (This is sometimes referred to as the “curse of dimensionality.”) Machine learning feature selection techniques, described above, are frequently useful in determining whether variables selected for stress testing models are justifiable.
Similarly, machine learning techniques can be employed to isolate, in a systematic way, those variables to which any predictive model is most and least sensitive. Model validators can use this information to quickly ascertain whether these sensitivities are appropriate. A validator, for example, may want to take a closer look at a credit model that is revealed to be more sensitive to, say, zip code, than it is to credit score, debt-to-income ratio, loan-to-value ratio, or any other individual variable or combination of variables. Machine learning techniques make it possible for a model validator to assess a model’s relative sensitivity to virtually any combination of features and make appropriate judgments.
Model validators have many tools at their disposal for assessing the conceptual soundness, theory, and reliability of conventionally developed predictive models. Machine learning is not a substitute for these, but its techniques offer a variety of ways of supplementing traditional model validation approaches and can provide validators with additional tools for ensuring that models are adequately supported by the data that underlies them.
Machine learning models pose a unique set of challenges to model validators. While exponential increases in the availability of data, computational power, and algorithmic sophistication in recent years has enabled banks and other firms to increasingly derive actionable insights from machine learning methods, the significant complexity of these systems introduces new dimensions of risk.
When appropriately implemented, machine learning models greatly improve the accuracy of predictions that are vital to the risk management decisions financial institutions make. The price of this accuracy, however, is complexity and, at times, a lack of transparency. Consequently, machine learning models must be particularly well maintained and their assumptions thoroughly understood and vetted in order to prevent wildly inaccurate predictions. While maintenance remains primarily the responsibility of the model owner and the first line of defense, second-line model validators increasingly must be able to understand machine learning principles well enough to devise effective challenge that includes:
- Analysis of model estimation data to determine the suitability of the machine learning algorithm
- Assessment of space and time complexity constraints that inform model training time and scalability
- Review of model training/testing procedure
- Determination of whether model hyperparameters are appropriate
- Calculation of metrics for determining model accuracy and robustness
More than one way exists of organizing these considerations along the three pillars of model validation. Here is how we have come to think about it.
Many of the concepts of reviewing model theory that govern conventional model validations apply equally well to machine learning models. The question of “business fit” and whether the variables the model lands on are reasonable is just as valid when the variables are selected by a machine as it is when they are selected by a human analyst. Assessing the variable selection process “qualitatively” (does it make sense?) as well as quantitatively (measuring goodness of fit by calculating residual errors, among other tests) takes on particular importance when it comes to machine learning models.
Machine learning does not relieve validators of their responsibility assess the statistical soundness of a model’s data. Machine learning models are not immune to data issues. Validators protect against these by running routine distribution, collinearity, and related tests on model datasets. They must also ensure that the population has been appropriately and reasonably divided into training and holdout/test datasets.
Supplementing these statistical tests should be a thorough assessment of the modeler’s data preparation procedures. In addition to evaluating the ETL process—a common component of all model validations—effective validations of machine learning models take particular notice of variable “scaling” methods. Scaling is important to machine learning algorithms because they generally do not take units into account. Consequently, a machine learning model that relies on borrower income (generally ranging between tens of thousands and hundreds of thousands of dollars), borrower credit score (which generally falls within a range of a few hundred points) and loan-to-value ratio (expressed as a percentage), needs to apply scaling factors to normalize these ranges in order for the model to correctly process each variable’s relative importance. Validators should ensure that scaling and normalizations are reasonable.
Model assumptions, when it comes to machine learning validation, are most frequently addressed by looking at the selection, optimization, and tuning of the model’s hyperparameters. Validators must determine whether the selection/identification process undertaken by the modeler (be it grid search, random search, Bayesian Optimization, or another method—see this blog post for a concise summary of these) is conceptually sound.
Machine learning models are no more immune to overfitting and underfitting (the bias-variance dilemma) than are conventionally developed predictive models. An overfitted model may perform well on the in-sample data, but predict poorly on the out-of-sample data. Complex nonparametric and nonlinear methods used in machine learning algorithms combined with high computing power are likely to contribute to an overfitted machine learning model. An underfitted model, on the other hand, performs poorly in general, mainly due to an overly simplified model algorithm that does a poor job at interpreting the information contained within data.
Cross-validation is a popular technique for detecting and preventing the fitting or “generalization capability” issues in machine learning. In K-Fold cross-validation, the training data is partitioned into K subsets. The model is trained on all training data except the Kth subset, and the Kth subset is used to validate the performance. The model’s generalization capability is low if the accuracy ratios are consistently low (underfitted) or higher on the training set but lower on the validation set (overfitted). Conventional models, such as regression analysis, can be used to benchmark performance.
Outcomes analysis enables validators to verify the appropriateness of the model’s performance measure methods. Performance measures (or “scoring methods”) are typically specialized to the algorithm type, such as classification and clustering. Validators can try different scoring methods to test and understand the model’s performance. Sensitivity analyses can be performed on the algorithms, hyperparameters, and seed parameters. Since there is no right or wrong answer, validators should focus on the dispersion of the sensitivity results.
Many statistical tactics commonly used to validate conventional models apply equally well to machine learning models. One notable omission is the ability to precisely replicate the model’s outputs. Unlike with an OLS or ARIMA model, for which a validator can reasonably expect to be able to match the model’s coefficients exactly if given the same data, machine learning models can be tested only indirectly—by testing the conceptual soundness of the selected features and assumptions (hyperparameters) and by evaluating the process and outputs. Applying model validation tactics specially tailored to machine learning models allows financial institutions to deploy these powerful tools with greater confidence by demonstrating that they are of sound conceptual design and perform as expected.