Validating Model Inputs: How Much Is Enough?
In some respects, the OCC 2011-12/SR 11-7 mandate to verify model inputs could not be any more straightforward: “Process verification … includes verifying that internal and external data inputs continue to be accurate, complete, consistent with model purpose and design, and of the highest quality available.” From a logical perspective, this requirement is unambiguous and non-controversial. After all, the reliability of a model’s outputs cannot be any better than the quality of its inputs. From a functional perspective, however, it raises practical questions around the amount of work that needs to be done in order to consider a particular input “verified.” Take the example of a Housing Price Index (HPI) input assumption. It could be that the modeler obtains the HPI assumption from the bank’s finance department, which purchases it from an analytics firm. What is the model validator’s responsibility? Is it sufficient to verify that the HPI input matches the data of the finance department that supplied it? If not, is it enough to verify that the finance department’s HPI data matches the data provided by its analytics vendor? If not, is it necessary to validate the analytics firm’s model for generating HPI assumptions? It depends. Just as model risk increases with greater model complexity, higher uncertainty about inputs and assumptions, broader use, and larger potential impact, input risk increases with increases in input complexity and uncertainty. The risk of any specific input also rises as model outputs become increasingly sensitive to it.
Validating Model Inputs Best Practices
So how much validation of model inputs is enough? As with the management of other risks, the level of validation or control should be dictated by the magnitude or impact of the risk. Like so much else in model validation, no ‘one size fits all’ approach applies to determining the appropriate level of validation of model inputs and assumptions. In addition to cost/benefit considerations, model validators should consider at least four factors for mitigating the risk of input and assumption errors leading to inaccurate outputs.
- Complexity of inputs
- Manual manipulation of inputs from source system prior to input into model
- Reliability of source system
- Relative importance of the input to the model’s outputs (i.e., sensitivity)
Consideration 1: Complexity of Inputs
The greater the complexity of the model’s inputs and assumptions, the greater the risk of errors. For example, complex yield curves with multiple data points will be inherently subject to greater risk of inaccuracy than binary inputs such as “yes” and “no.” In general, the more complex an input is, the more scrutiny it requires and the “further back” a validator should look to verify its origin and reasonability.
Consideration 2: Manual Manipulation of Inputs from Source System Prior to Input into Model
Input data often requires modification from the source system to facilitate input into the model. More handling and manual modifications increase the likelihood of error. For example, if a position input is manually copied from Bloomberg and then subjected to a manual process of modification of format to enable uploading to the model, there is a greater likelihood of error than if the position input is extracted automatically via an API. The accuracy of the input should be verified in either case, but the more manual handling and manipulation of data that occurs, the more comprehensive the testing should be. In this example, more comprehensive testing would likely take the form of a larger sample size.
In addition, the controls over the processes to extract, transform, and load data from a source system into the model will impact the risk of error. More mature and effective controls, including automation and reconciliation, will decrease the likelihood of error and therefore likely require a lighter verification procedure.
Consideration 3: Reliability of Source Systems
More mature and stable source systems generally produce more consistently reliable results. Conversely, newer systems and those that have produced erroneous results increase the risk of error. The results of previous validation of inputs, from prior model validations or from third parties, including internal audit and compliance, can be used as an indicator of the reliability of information from source systems and the magnitude of input risk. The greater the number of issues identified, the greater the risk, and the more likely it is that a validator should seek to drill deeper into the fundamental sources of source data.
Consideration 4: Output Sensitivity to Inputs
No matter how reliable an input data’s source system is deemed to be, or the amount of manual manipulation to which an input is subjected, perhaps the most important consideration is the individual input’s power to affect the model’s outputs. Returning to our original example, if a 50 percent change in the HPI assumption has only a negligible impact on the model’s outputs, then a quick verification against the report supplied by the finance department may be sufficient. If, however, the model’s outputs are extremely sensitive to even small shifts in the HPI assumption, then additional testing is likely warranted—perhaps even to include a validation of the analytics vendor’s HPI model (along with all of its inputs).
A Cost-Effective Model Input Validation Strategy
When it comes to verifying model inputs, there is no theoretical limitation to the lengths to which a model validator can go. Model risk managers, who do not have unlimited time or budgets, would benefit from applying practical limits to validation procedures using a risk-based approach to determine the most cost-effective strategies to ensure that models are sufficiently validated. Applying the considerations listed above on a case-by-case basis will help validators appropriately define and scope model input reviews in a manner commensurate with appropriate risk management principles.

Let’s first consider the sample required for a single pool of nearly identical loans. In the case of a uniform pool of loans — with the same FICO, loan-to-value (LTV) ratio, loan age, etc. — there is a straightforward formula to calculate the sample size we need to estimate the pool’s default rate, shown in Exhibit 1.1 As the formula shows, the sample size depends on several variables, some of which must be estimated:
In this case, our internal sample of 500 loans is more than enough to give us a statistical confidence interval that is narrower than our materiality thresholds. We do not need proxy data to inform our CECL model in this case.
Again assume we segregate loans into four categories based on this third variable. Now we have 4^3= 64 equal-sized buckets. With loan-level modeling we need around 12,000 loans. With bucketing we need around 100,000 loans, an average of around 1,600 per bucket. As the graph shows in Exhibit 2, a bucketing approach forces us to choose between less insight and an astronomical sample size requirement. As we increase the number of variables used to forecast credit losses, the sample needed for loan-level modeling increases slightly, but the sample needed for bucketing explodes. This points to loan-level modeling as the best solution because well-performing CECL models incorporate many variables. (Another benefit of loan-level credit models, one that is of
We can observe an MI Haircut Rate averaging at 19.50% for vintages 1999 to 2011 with higher haircuts for the distressed vintages 2003 to 2008 at 23.50%. Exhibit 2: Mortgage Insurance Haircut Rate by Disposition Year
Our analysis shows the MI Haircut Rate prior to 2008 on average was 6.5% and steadily increased to an average of 25% from 2009 thru 2014. We will explain below. Exhibit 3: Mortgage Insurance Haircut Rate and Expense to Delinquent UPB Percentage by Months Non-Performing
In this analysis, we observe the MI Haircut Rate steadily increased by the number of months between when a loan was first classified as non-performing and when a loan liquidated. This increase can be explained by increased curtailments tied to expenses that increase over time, such as expenses associated with physical damage of the property, tax penalties, delinquent interest, insurance and taxes outside the coverage period, and excessive maintenance or attorney fees. Interest, taxes, and insurance typically constitute 85% of all loss expenses.

