Articles Tagged with: Credit Analytics

Analytics-as-a-Service – CECL Forecasting

The RiskSpan Edge Platform CECL Module delivers the technology platform and expertise to take you from where you are today to producing audit-ready CECL estimates. Our dedicated CECL Module executes your monthly loss reserving and reporting process under the new CECL standard, covering data intake, segmentation, modeling, and report generation within a single platform. Watch RiskSpan Director David Andrukonis explain the Edge CECL Module in this video.


CRT Deal Monitor: Understanding When Credit Becomes Risky

This analysis tracks several metrics related to deal performance and credit profile, putting them into a historical context by comparing the same metrics for recent-vintage deals against those of ‘similar’ cohorts in the time leading up to the 2008 housing crisis. You’ll see how credit metrics are trending today and understand the significance of today’s shifts in the context of historical data.

Some of the charts in this post have interactive features, so click around! We’ll be tweaking the analysis and adding new metrics in subsequent months. Please shoot us an email if you have an idea for other metrics you’d like us to track.


  • Performance metrics signal steadily increasing credit risk, but no cause for alarm.
    • We’re starting to see the hurricane-related (2017 Harvey and Irma) delinquency spikes subside in the deal data. Investors should expect a similar trend in 2019 due to Hurricane Florence.
    • The overall percentage of delinquent loans is increasing steadily due to the natural age ramp of delinquency rates and the ramp-up of the program over the last 5 years.
    • Overall delinquency levels are still far lower than historical rates.
    • While the share of delinquency is increasing, loans that go delinquent are ending up in default at a lower rate than before.
  • Deal Profiles are becoming riskier as new GSE acquisitions include higher-DTI business.
    • It’s no secret that both GSEs started acquiring a lot of high-DTI loans (for Fannie this moved from around 16% of MBS issuance in Q2 2017 to 30% of issuance as of Q2 this year). We’re starting to see a shift in CRT deal profiles as these loans are making their way into CRT issuance.
    • The credit profile chart toward the end of this post compares the credit profiles of recently issued deals with those of the most recent three months of MBS issuance data to give you a sense of the deal profiles we’re likely to see over the next 3 to 9 months. We also compare these recently issued deals to a similar cohort from 2006 to give some perspective on how much the credit profile has improved since the housing crisis.
    • RiskSpan’s Vintage Quality Index reflects an overall loosening of credit standards–reminiscent of 2003 levels–driven by this increase in high-DTI originations.
  • Fannie and Freddie have fundamental differences in their data disclosures for CAS and STACR.
    • Delinquency rates and loan performance all appear slightly worse for Fannie Mae in both the deal and historical data.
    • Obvious differences in reporting (e.g., STACR reporting a delinquent status in a terminal month) have been corrected in this analysis, but some less obvious differences in reporting between the GSEs may persist.
    • We suspect there is something fundamentally different about how Freddie Mac reports delinquency status—perhaps related to cleaning servicing reporting errors, cleaning hurricane delinquencies, or the way servicing transfers are handled in the data. We are continuing our research on this front and hope to follow up with another post to explain these anomalies.

The exceptionally low rate of delinquency, default, and loss among CRT deals at the moment makes analyzing their credit-risk characteristics relatively boring. Loans in any newly issued deal have already seen between 6 and 12 months of home price growth, and so if the economy remains steady for the first 6 to 12 months after issuance, then that deal is pretty much in the clear from a risk perspective. The danger comes if home prices drift downward right after deal issuance. Our aim with this analysis is to signal when a shift may be occurring in the credit risk inherent in CRT deals.

Many data points related to the overall economy and home prices are available to investors seeking to answer this question. This analysis focuses on what the Agency CRT data—both the deal data and the historical performance datasets—can tell us about the health of the housing market and the potential risks associated with the next deals that are issued.

Current Performance and Credit Metrics

Delinquency Trends

The simplest metric we track is the share of loans across all deals that is 60+ days past due (DPD). The charts below compare STACR (Freddie) vs. CAS (Fannie), with separate charts for high-LTV deals (G2 for CAS and HQA for STACR) vs. low-LTV deals (G1 for CAS and DNA for STACR). Both time series show a steadily increasing share of delinquent loans. This slight upward trend is related to the natural aging curve of delinquency and the ramp-up of the CRT program. Both time series show a significant spike in delinquency around January of this year due to the 2017 hurricane season. Most of these delinquent loans are expected to eventually cure or prepay.
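As a rough illustration of the metric described above, the share of loans 60+ DPD can be computed per reporting month from loan-level records. This is a minimal sketch on made-up data; the record layout and the `share_60_plus_dpd` helper are assumptions for illustration, not the actual CAS or STACR disclosure format.

```python
from collections import defaultdict

# Hypothetical loan-month records: (reporting_month, loan_id, days_past_due).
records = [
    ("2018-01", "A", 0),
    ("2018-01", "B", 60),
    ("2018-01", "C", 30),
    ("2018-01", "D", 0),
    ("2018-02", "A", 0),
    ("2018-02", "B", 90),
    ("2018-02", "C", 60),
    ("2018-02", "D", 0),
]

def share_60_plus_dpd(rows):
    """Return {month: fraction of reported loans that are 60+ days past due}."""
    total = defaultdict(int)
    delinquent = defaultdict(int)
    for month, _loan_id, dpd in rows:
        total[month] += 1
        if dpd >= 60:
            delinquent[month] += 1
    return {m: delinquent[m] / total[m] for m in sorted(total)}

shares = share_60_plus_dpd(records)
for month, share in shares.items():
    print(month, f"{share:.1%}")
```

Computing the same ratio separately for each LTV group and each GSE would give the four series compared in the charts.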

For comparative purposes, we include a historical time series of the share of loans 60+ DPD for each LTV group. These charts are derived from the Fannie Mae and Freddie Mac loan-level performance datasets. Comparatively, today’s deal performance is much better than even the pre-2006 era.

You’ll note the systematically higher delinquency rates of CAS deals. We suspect this is due to reporting differences rather than actual differences in deal performance. We’ll continue to investigate and report back on our findings.

Delinquency Outcome Monitoring

While delinquency rates might be trending up, loans that are rolling to 60-DPD are ultimately defaulting at lower and lower rates. The charts below track the status of loans that were 60+ DPD. Each bar in the chart represents the population of loans that were 60+ DPD exactly 6 months prior to the x-axis date.

Over time, we see growing 60-DPD and 60+ DPD groups, and a shrinking Default group. This indicates that a majority of delinquent loans wind up curing or prepaying, rather than proceeding to default.
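The outcome tracking described above can be sketched as follows: take the loans that were 60+ DPD at a starting month and tabulate their status six months later. The status labels and the `history` structure are hypothetical placeholders, not the fields used in the GSE datasets.

```python
from collections import Counter

# Hypothetical status history per loan: {loan_id: {month_index: status}}.
# Status labels ("current", "60+", "prepaid", "default") are illustrative only.
history = {
    "A": {0: "60+", 6: "current"},
    "B": {0: "60+", 6: "60+"},
    "C": {0: "60+", 6: "default"},
    "D": {0: "current", 6: "current"},  # never delinquent, so excluded from the cohort
    "E": {0: "60+", 6: "prepaid"},
}

def outcome_mix(history, start, horizon=6):
    """Status distribution `horizon` months after `start` for loans 60+ DPD at `start`."""
    cohort = [h for h in history.values() if h.get(start) == "60+"]
    if not cohort:
        return {}
    counts = Counter(h.get(start + horizon, "unknown") for h in cohort)
    return {status: count / len(cohort) for status, count in counts.items()}

mix = outcome_mix(history, start=0)
print(mix)
```

Repeating the calculation for each starting month yields one bar per x-axis date.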

The choppiness and high default rates in the first few observations of the data are related to the very low counts of delinquent loans as the CRT program ramped up.

The following chart repeats the 60-DPD delinquency analysis for the Freddie Mac Loan Level Performance dataset leading up to and following the housing crisis. (The Fannie Mae loan level performance set yields a nearly identical chart.) Note how many more loans in these cohorts remained delinquent (rather than curing or defaulting) relative to the more recent CRT loans.

Vintage Quality Index

RiskSpan’s Vintage Quality Index (VQI) reflects a reversion to the looser underwriting standards of the early 2000s as a result of the GSEs’ expansion of high-DTI lending. RiskSpan introduced the VQI in 2015 as a way of quantifying the underwriting environment of a particular vintage of mortgage originations. We use the metric as an empirically grounded way to control for vintage differences within our credit model.

While both GSEs increased high-DTI lending in 2017, it’s worth noting that Fannie Mae saw a relatively larger surge in loans with DTIs greater than 43%. The chart below shows the share of loans backing MBS with DTI > 43. We use the loan-level MBS issuance data to track what’s being originated and acquired by the GSEs because it is the timeliest data source available. CRT deals are issued with loans that are between 6 and 20 months seasoned, and so tracking MBS issuance provides a preview of what will end up in the next cohort of deals.

Deal Profile Comparison

The tables below compare the credit profiles of recently issued deals. We focus on the key drivers of credit risk, highlighting the comparatively riskier features of a deal. Each table separates the high-LTV (80%+) deals from the low-LTV deals (60%-80%). We add two additional columns for comparison purposes. The first is the ‘Coming Cohort,’ which is meant to give an indication of what upcoming deal profiles will look like. The data in this column is derived from the most recent three months of MBS issuance loan-level data, controlling for the LTV group. These are newly originated and acquired by the GSEs—considering that CRT deals are generally issued with an average loan age between 6 and 15 months, these are the loans that will most likely wind up in future CRT transactions. The second comparison cohort consists of 2006 originations in the historical performance datasets (Fannie and Freddie combined), controlling for the LTV group. We supply this comparison as context for the level of risk that was associated with one of the worst-performing cohorts.

The latest CAS deals—both high- and low-LTV—show the impact of increased >43% DTI loan acquisitions. Until recently, STACR deals typically had a higher share of high-DTI loans, but the latest CAS deals have surpassed STACR in this measure, with nearly 30% of their loans having DTI ratios in excess of 43%.

CAS high-LTV deals carry more risk in LTV metrics, such as the percentage of loans with a CLTV > 90 or CLTV > 95. However, STACR includes a greater share of loans with a less-than-standard level of mortgage insurance, which would provide less loss protection to investors in the event of a default.

Low-LTV deals generally appear more evenly matched in terms of risk factors when comparing STACR and CAS. STACR does display the same DTI imbalance as seen in the high-LTV deals, but that may change as the high-DTI group makes its way into deals.

Deal Tracking Reports

Please note that defaults are reported on a delay for both GSEs, and so while we have CPR numbers available for August, CDR numbers are not provided because they are not fully populated yet. Fannie Mae CAS default data is delayed an additional month relative to STACR. We’ve left loss and severity metrics blank for fixed-loss deals.


Data-as-a-Service – Credit Risk Transfer Data

Watch RiskSpan Managing Director Janet Jozwik explain our recent Credit Risk Transfer data (CRT) additions to the RS Edge Platform.

Each dataset has been normalized to the same standard for simpler analysis in RS Edge, enabling users to compare GSE performance with just a few clicks. The data has also been enhanced to include helpful variables, such as mark-to-market loan-to-value ratios based on the most granular house price indexes provided by the Federal Housing Finance Agency. 


RiskSpan to Offer Credit Risk Transfer Data Through Edge Platform

ARLINGTON, VA, September 6, 2018 — RiskSpan announced today its rollout of Credit Risk Transfer (CRT) datasets available through its RS Edge Platform. The datasets include over seventy million Agency loans that will expand the RS Edge platform’s data library and add key enhancements for credit risk analysis. 

RS Edge is a SaaS platform that integrates normalized data, predictive models and complex scenario analytics for customers in the capital markets, commercial banking, and insurance industries. The Edge Platform solves the hardest data management and analytical problem – affordable off-the-shelf integration of clean data and reliable models. 

New additions to the RS Edge Data Library will include key GSE Loan Level Performance datasets going back eighteen years. RiskSpan is also adding Fannie Mae’s Connecticut Avenue Securities (CAS) and Credit Insurance Risk Transfer (CIRT) datasets as well as the Freddie Mac Structured Agency Credit Risk (STACR) datasets.  

RS Edge Platform – Data Libraries UI

Each dataset has been normalized to the same standard for simpler analysis in RS Edge. This will allow users to compare GSE performance with just a few clicks. The data has also been enhanced to include helpful variables, such as mark-to-market loan-to-value ratios based on the most granular house price indexes provided by the Federal Housing Finance Agency. 

Managing Director and Co-Head of Quantitative Analytics Janet Jozwik said of the new CRT data, “Our data library is a great, cost-effective resource that can be leveraged to build models, understand assumptions around losses on different vintages, and benchmark performance of their own portfolio against the wider universe.” 

RiskSpan’s Edge API also makes it easier than ever to access large datasets for analytics, model development and benchmarking. Major quant teams that prefer APIs now have access to normalized and validated data to run scenario analytics, stress testing or shock analysis. RiskSpan makes data available through its proprietary instance of RStudio and Python.


Here Come the CECL Models: What Model Validators Need to Know

As it turns out, model validation managers at regional banks didn’t get much time to contemplate what they would do with all their newly discovered free time. Passage of the Economic Growth, Regulatory Relief, and Consumer Protection Act appears to have relieved many model validators of the annual DFAST burden. But as one class of models exits the inventory, a new class enters—CECL models.

Banks everywhere are nearing the end of a multi-year scramble to implement a raft of new credit models designed to forecast life-of-loan performance for the purpose of determining appropriate credit-loss allowances under the Financial Accounting Standards Board’s new Current Expected Credit Loss (CECL) standard, which takes full effect in 2020 for public filers and 2021 for others.

The number of new models CECL adds to each bank’s inventory will depend on the diversity of asset portfolios. More asset classes and more segmentation will mean more models to validate. Generally, model risk managers should count on having to validate at least one CECL model for every loan and debt security type (residential mortgage, CRE, plus all the various subcategories of consumer and C&I loans) plus potentially any challenger models the bank may have developed.

In many respects, tomorrow’s CECL model validations will simply replace today’s allowance for loan and lease losses (ALLL) model validations. But CECL models differ from traditional allowance models. Under the current standard, allowance models typically forecast losses over a one-to-two-year horizon. CECL requires a life-of-loan forecast, and a model’s inputs are explicitly constrained by the standard. Accounting rules also dictate how a bank may translate the modeled performance of a financial asset (the CECL model’s outputs) into an allowance. Model validators need to be just as familiar with the standards governing how these inputs and outputs are handled as they are with the conceptual soundness and mathematical theory of the credit models themselves.

CECL Model Inputs – And the Magic of Mean Reversion

Not unlike DFAST models, CECL models rely on a combination of loan-level characteristics and macroeconomic assumptions. Macroeconomic assumptions are problematic with a life-of-loan credit loss model (particularly with long-lived assets—mortgages, for instance) because no one can reasonably forecast what the economy is going to look like six years from now. (No one really knows what it will look like six months from now, either, but we need to start somewhere.) The CECL standard accounts for this reality by requiring modelers to consider macroeconomic input assumptions in two separate phases: 1) a “reasonable and supportable” forecast covering the time frame over which the entity can make or obtain such a forecast (two or three years is emerging as common practice for this time frame), and 2) a “mean reversion” forecast based on long-term historical averages for the out years. As an alternative to mean reverting by the inputs, entities may instead bypass their models in the out years and revert to long-term average performance outcomes by the relevant loan characteristics.

Assessing these assumptions (and others like them) requires a model validator to simultaneously wear a “conceptual soundness” testing hat and an “accounting policy” compliance hat. Because the purpose of the CECL model is to prove an accounting answer and satisfy an accounting requirement, what can validators reasonably conclude when confronted with an assumption that may seem unsound from a purely statistical point of view but nevertheless satisfies the accounting standard?

Taking the mean reversion requirement as an example, the projected performance of loans and securities beyond the “reasonable and supportable” period is permitted to revert to the mean in one of two ways: 1) modelers can feed long-term history into the model by supplying average values for macroeconomic inputs, allowing modeled results to revert to long-term means in that way, or 2) modelers can mean revert “by the outputs” – bypassing the model and populating the remainder of the forecast with long-term average performance outcomes (prepayment, default, recovery and/or loss rates depending on the methodology). Either of these approaches could conceivably result in a modeler relying on assumptions that may be defensible from an accounting perspective despite being statistically dubious, but the first is particularly likely to raise a validator’s eyebrow. The loss rates that a model will predict when fed “average” macroeconomic input assumptions are always going to be uncharacteristically low. (Because credit losses are generally large in bad macroeconomic environments and low in average and good environments, long-term average credit losses are higher than the credit losses that occur during average environments. A model tuned to this reality, and fed one path of “average” macroeconomic inputs, will return credit losses substantially lower than long-term average credit losses.) A credit risk modeler is likely to think that these are not particularly realistic projections, but an auditor following the letter of the standard may choose not to find any fault with them. In such situations, validators need to fall somewhere in between these two extremes, keeping in mind that the underlying purpose of CECL models is to reasonably fulfill an accounting requirement, before hastily issuing a series of high-risk validation findings.
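The gap between reverting “by the inputs” and “by the outputs” can be illustrated with a toy convex loss curve. Everything in this sketch is an assumption for illustration: the exponential `loss_rate` function, its coefficients, and the unemployment history are made up and do not represent any real credit model.

```python
import math

def loss_rate(unemployment):
    """Hypothetical convex loss curve: losses accelerate as unemployment rises."""
    return 0.001 * math.exp(0.5 * unemployment)

# Made-up long-run unemployment history (percent): mostly benign, with one bad stretch.
history = [4.0, 4.5, 5.0, 5.0, 5.5, 6.0, 9.0, 10.0]

# Revert "by the inputs": feed the average macro input into the model.
mean_input = sum(history) / len(history)
loss_at_mean_input = loss_rate(mean_input)

# Revert "by the outputs": average the losses the model produces across history.
mean_of_losses = sum(loss_rate(u) for u in history) / len(history)

print(f"loss at average input: {loss_at_mean_input:.4f}")
print(f"average of losses:     {mean_of_losses:.4f}")
```

Because the curve is convex, the loss at the average input sits below the average of the losses (Jensen's inequality), which is the effect the paragraph describes. How large the gap is in practice depends on the actual model and history.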

CECL Model Outputs: What are they?

CECL models differ from some other models in that the allowance (the figure that modelers are ultimately tasked with getting to) is not itself a direct output of the underlying credit models being validated. The expected losses that emerge from the model must be subject to a further calculation in order to arrive at the appropriate allowance figure. Whether these subsequent calculations are considered within the scope of a CECL model validation is ultimately going to be an institutional policy question, but it stands to reason that they would be.

Under the CECL standard, banks will have two alternatives for calculating the allowance for credit losses: 1) the allowance can be set equal to the sum of the expected credit losses (as projected by the model), or 2) the allowance can be set equal to the cost basis of the loan minus the present value of expected cash flows. While a validator would theoretically not be in a position to comment on whether the selected approach is better or worse than the alternative, principles of process verification would dictate that the validator ought to determine whether the selected approach is consistent with internal policy and that it was computed accurately.
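The two alternatives can be compared on a toy two-period loan. This is a minimal sketch under simplified assumptions (annual periods, a single flat discount rate, made-up cash flows); the function names are hypothetical and the numbers are not drawn from any real portfolio.

```python
def allowance_sum_of_losses(expected_losses):
    """Alternative 1: allowance equals the sum of projected credit losses."""
    return sum(expected_losses)

def allowance_dcf(cost_basis, expected_cash_flows, rate):
    """Alternative 2: cost basis minus the present value of expected cash flows."""
    pv = sum(cf / (1 + rate) ** t
             for t, cf in enumerate(expected_cash_flows, start=1))
    return cost_basis - pv

# Toy loan: cost basis 100, 5% coupon, contractual cash flows of 5 then 105.
# The model is assumed to project a loss of 2 in the final period.
cost_basis = 100.0
expected_losses = [0.0, 2.0]
expected_cash_flows = [5.0, 103.0]

loss_allowance = allowance_sum_of_losses(expected_losses)
dcf_allowance = allowance_dcf(cost_basis, expected_cash_flows, rate=0.05)

print(f"sum of expected losses: {loss_allowance:.4f}")
print(f"discounted cash flow:   {dcf_allowance:.4f}")
```

In this example the discounted figure is smaller because the same loss is measured in present-value terms. A validator checking computational accuracy would reperform whichever calculation the institution's policy selects.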

When Policy Trumps Statistics

The selection of a mean reversion approach is not the only area in which a modeler may make a statistically dubious choice in favor of complying with accounting policy.

Discount Rates

Translating expected losses into an allowance using the present-value-of-future-cash-flows approach (option 2 above) obviously requires selecting an appropriate discount rate. What should it be? The standard stipulates the use of the financial asset’s Effective Interest Rate (or “yield,” i.e., the rate of return that equates an instrument’s cash flows with its amortized cost basis). Subsequent accounting guidance affords quite a bit of flexibility in how this rate is calculated. Institutions may use the yield that equates contractual cash flows with the amortized cost basis (we can call this “contractual yield”), or the rate of return that equates cash flows adjusted for prepayment expectations with the cost basis (“prepayment-adjusted yield”).

The use of the contractual yield (which has been adjusted for neither prepayments nor credit events) to discount cash flows that have been adjusted for both prepayments and credit events will allow the impact of prepayment risk to be commingled with the allowance number. For any instruments where the cost basis is greater than unpaid principal balance (a mortgage instrument purchased at 102, for instance) prepayment risk will exacerbate the allowance. For any instruments where the cost basis is less than the unpaid principal balance, accelerations in repayment will offset the allowance. This flaw has been documented by FASB staff, with the FASB Board subsequently allowing but not requiring the use of a prepay-adjusted yield.
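The commingling effect can be seen in a toy premium instrument with no credit losses at all. The sketch below assumes a three-period instrument bought at 102 that is expected to prepay in full after one period; the cash flows, the bisection solver, and all names are illustrative assumptions rather than a prescribed calculation.

```python
def present_value(cash_flows, rate):
    """Discount per-period cash flows (the first arrives one period out)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

def solve_yield(cash_flows, price, lo=-0.5, hi=1.0, tol=1e-12):
    """Bisection: find the per-period rate at which PV(cash_flows) equals price."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if present_value(cash_flows, mid) > price:
            lo = mid  # PV too high, so the rate must rise
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2

cost_basis = 102.0                  # purchased at a premium to par of 100
contractual = [6.0, 6.0, 106.0]     # contractual coupons and principal
expected = [106.0, 0.0, 0.0]        # assumed full prepayment after one period, no credit loss

contractual_yield = solve_yield(contractual, cost_basis)
prepay_adjusted_yield = solve_yield(expected, cost_basis)

allowance_contractual = cost_basis - present_value(expected, contractual_yield)
allowance_prepay_adjusted = cost_basis - present_value(expected, prepay_adjusted_yield)

print(f"contractual yield {contractual_yield:.4%}, allowance {allowance_contractual:.4f}")
print(f"prepay-adjusted yield {prepay_adjusted_yield:.4%}, allowance {allowance_prepay_adjusted:.4f}")
```

With the contractual yield, the toy instrument shows a positive allowance even though no credit loss is assumed; with the prepayment-adjusted yield, the allowance is essentially zero.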

Multiple Scenarios

The accounting standard neither prohibits nor requires the use of multiple scenarios to forecast credit losses. Using multiple scenarios is likely more supportable from a statistical and model validation perspective, but it may be challenging for a validator to determine whether the various scenarios have been weighted properly to arrive at the correct, blended, “expected” outcome.
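A blended “expected” outcome is a probability-weighted average of the scenario results. The sketch below uses hypothetical scenario names, weights, and loss figures; the one check it encodes, that weights sum to one, is a basic reasonableness test rather than a complete validation of the weighting.

```python
import math

# Hypothetical scenarios: name -> (probability weight, lifetime loss rate in percent).
scenarios = {
    "baseline": (0.60, 1.2),
    "upside": (0.15, 0.8),
    "downside": (0.25, 3.0),
}

def blended_loss(scenarios):
    """Probability-weighted expected loss; rejects weights that do not sum to 1."""
    total_weight = sum(weight for weight, _ in scenarios.values())
    if not math.isclose(total_weight, 1.0, abs_tol=1e-9):
        raise ValueError(f"scenario weights sum to {total_weight}, expected 1.0")
    return sum(weight * loss for weight, loss in scenarios.values())

expected_loss = blended_loss(scenarios)
print(f"blended expected loss: {expected_loss:.2f}%")
```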

Macroeconomic Assumptions During the “Reasonable and Supportable” Period

Attempting to quantitatively support the macro assumptions during the “reasonable and supportable” forecast window (usually two to three years) is likely to be problematic both for the modeler and the validator. Such forecasts tend to be more art than science and validators are likely best off trying to benchmark them against what others are using than attempting to justify them using elaborately contrived quantitative methods. The data that is most likely to be used may turn out to be simply the data that is available. Validators must balance skepticism of such approaches with pragmatism. Modelers have to use something, and they can only use the data they have.

Internal Data vs. Industry Data

The standard allows for modeling using internal data or industry proxy data. Banks often operate under the dogma that internal data (when available) is always preferable to industry data. This seems reasonable on its face, but it only really makes sense for institutions with internal data that is sufficiently robust in terms of quantity and history. And the threshold for what constitutes “sufficiently robust” is not always obvious. Is one business cycle long enough? Is 10,000 loans enough? These questions do not have hard and fast answers.


Many questions pertaining to CECL model validations do not yet have hard and fast answers. In some cases, the answers will vary by institution as different banks adopt different policies. Industry best practices will doubtless emerge in response to others. For the rest, model validators will need to rely on judgment, sometimes having to balance statistical principles with accounting policy realities. The first CECL model validations are around the corner. It’s not too early to begin thinking about how to address these questions.

Augmenting Internal Loan Data to Comply with CECL and Boost Profit

The importance of sound internal data gathering practices cannot be overstated. However, in light of the new CECL standard, many lending institutions have found themselves unable to meet the data requirements. This may have served as a wake-up call for organizations at all levels to look at their internal data warehousing systems and identify and remedy the gaps in their strategies. For some institutions, it may be difficult to consolidate data siloed within various stand-alone systems. Other institutions, even after consolidating all available data, may lack sufficient loan count, timespan, or data elements to meet the CECL standard with internal data alone. This post will discuss some of the strategies to make up for shortfalls while data gathering systems and procedures are built and implemented for the future.


Identify Your Data

The first step is to identify the data that is available. As with many tasks, this is easier said than done. Often, organizations without formal data gathering practices and without a centralized data warehouse find themselves looking at multiple data storage systems across various departments and a multitude of ad-hoc processes implemented in time of need and never upgraded to a standardized solution. However, it is important to begin this process now, if it is not already underway.

As part of the data identification phase, it is important to keep track of not only the available variables, but also the length of time for which the data exists, and whether any time periods have missing or unreliable information. In most circumstances, to meet the CECL standard, institutions should have loan performance data that will cover a full economic cycle by the time of CECL adoption. Such data enables an institution to form grounded expectations of how assets will perform over their full contractual lives, across a variety of potential economic climates.
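One simple way to track coverage is to list the reporting months for which no data exists within the expected history window. This is a minimal sketch; the `(year, month)` representation and the `missing_months` helper are assumptions chosen for illustration.

```python
def missing_months(observed, start, end):
    """Return the (year, month) pairs between start and end (inclusive) with no data."""
    seen = set(observed)
    gaps = []
    year, month = start
    while (year, month) <= end:
        if (year, month) not in seen:
            gaps.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps

# Hypothetical reporting months found in an internal performance file.
observed = [(2017, 11), (2017, 12), (2018, 2)]
gaps = missing_months(observed, start=(2017, 11), end=(2018, 3))
print(gaps)
```

Running the same check per data field, rather than per file, would also flag periods where a variable exists but is unpopulated.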

Some data points are required regardless of the CECL methodology, while others are necessary only for certain approaches. At this part of the data preparation process, it is more important to understand the big picture than it is to confirm only some of the required fields—it is wise to see what information is available, even if it may not appear relevant at this time. This will prove very useful for drafting the data warehousing procedures, and will allow for a more transparent understanding of requirements should the bank decide to use a different methodology in the future.


Choose Your CECL Methodology

There are many intricacies involved in choosing a CECL Methodology. Each organization should determine both its capabilities and its needs. For example, the Vintage method has relatively simple calculations and limited data requirements, but provides little insight and control for management, and does not yield early model performance indicators. On the other hand, the Discounted Cash Flow method provides many insights and controls, and identifies model performance indicators preemptively, but requires more complex calculations and a very robust data history.

It is acceptable to implement a relatively simple methodology at first and begin utilizing more advanced methodologies in the future. Banks with limited historical data, but with procedures in place to ramp up data gathering and data warehousing capabilities, would be well served to implement a method for which all data needs are met. They can then work toward the goal of implementing a more robust methodology once enough historical data is available.

However, if insufficient data exists to effectively implement a satisfactory methodology, it may be necessary to augment existing historical data with proxy data as a bridge solution while your data collections mature.


Augment Your Internal Data

Choose Proxy Data

Search for cost-effective datasets that give historical loan performance information about portfolios that are reasonably similar to your go-forward portfolio. Note that proxy portfolios do not need to perfectly resemble your portfolio, so long as either a) the data provider offers filtering capability that enables you to find the subset of proxy loans that matches your portfolio’s characteristics, or b) you employ segment- or loan-level modeling techniques that apply the observations from the proxy dataset in the proportions that are relevant to your portfolio.

RiskSpan’s Edge platform contains a Data Library that offers historical loan performance datasets from a variety of industry sources covering multiple asset classes:

  • For commercial real estate (CRE) portfolios, we host loan-level data on all CRE loans guaranteed by the Small Business Administration (SBA) dating back to 1990. Data on loans underlying CMBS securitizations dating back to 1998, compiled by Trepp, is also available on the RiskSpan platform.
  • For commercial and industrial (C&I) portfolios, we also host loan-level data on all C&I loans guaranteed by the SBA dating back to 1990.
  • For residential mortgage loan portfolios, we offer large agency datasets (excellent, low-cost options for portfolios that share many characteristics with GSE portfolios) and non-agency datasets (for portfolios with unique characteristics or risks).
  • By Q3 2018, we will also offer data for auto loan portfolios and reverse mortgage portfolios (Home Equity Conversion Mortgages).

Note that for audit purposes, limitations of proxy data and consequent assumptions for a given portfolio need to be clearly outlined, and all severe limitations addressed. In some cases, multiple proxy datasets may be required.

At this stage, it is important to ensure that the proxy data contains all the data required by the chosen CECL methodology. If such proxy data is not available, a different CECL model may be best.


Prepare Your Data

The next step is to prepare internal data for augmentation. This includes standard data-keeping practices, such as accurate and consistent data headers, unique keys such as loan numbers and reporting dates, and confirmation that no duplicates exist. Depending on the quality of internal data, additional analysis may also be required. For example, all data fields need to be displayed in a consistent format according to the data type, and invalid data points, such as FICO scores outside the acceptable range, need to be cleansed. If the data is assembled manually, it is prudent to automate the process to minimize the possibility of user error. If automation is not possible, it is important to implement data quality controls that verify that the dataset is generated according to the metadata rules.
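The checks described above can be automated with a small validation pass. The sketch below assumes hypothetical field names (`loan_id`, `report_date`, `fico`) and an assumed valid score range of 300 to 850; both should be replaced with whatever the institution's own metadata rules specify.

```python
def check_records(rows, fico_min=300, fico_max=850):
    """Flag duplicate loan-month keys and FICO values outside the assumed valid range."""
    seen = set()
    duplicates = []
    invalid_fico = []
    for row in rows:
        key = (row["loan_id"], row["report_date"])
        if key in seen:
            duplicates.append(key)
            continue  # do not re-validate a row already flagged as a duplicate
        seen.add(key)
        fico = row.get("fico")
        if fico is None or not (fico_min <= fico <= fico_max):
            invalid_fico.append(key)
    return duplicates, invalid_fico

# Made-up records for illustration.
rows = [
    {"loan_id": "L1", "report_date": "2018-06", "fico": 720},
    {"loan_id": "L1", "report_date": "2018-06", "fico": 720},   # duplicate key
    {"loan_id": "L2", "report_date": "2018-06", "fico": 9999},  # out of range
    {"loan_id": "L3", "report_date": "2018-06", "fico": None},  # missing
]

duplicates, invalid_fico = check_records(rows)
print("duplicates:", duplicates)
print("invalid FICO:", invalid_fico)
```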

This stage provides the final opportunity to identify any data quality issues that may have been missed. For example, if, after cleansing the data for invalid FICO scores, it appears that the dataset has many invalid entries, further analysis may be required, especially if borrower credit score is one of the risk metrics used for CECL modeling.

Once internal data preparation is complete, proxy metadata may need to be modified to be consistent with internal standards. This includes data labels and field formats, as well as data quality checks to ensure that consistent criteria are used across all datasets.


Identify Your Augmentation Strategy

Once the internal data is ready and its limitations identified, analysts need to confirm that the proxy data addresses these gaps. Note that it is assumed at this stage that the proxy data chosen contains information for loans that are consistent with the internal portfolio, and that all proxy metadata items are consistent with internal metadata.

For example, if internal data is robust, but has a short history, proxy data needs to cover the additional time periods for the life of the asset. In such cases, augmenting internal data is relatively simple: the datasets are joined, and tested to ensure that the join was successful. Testing should also cover the known limitations of the proxy data, such as missing non-required fields or other data quality issues deemed acceptable during the research and analysis phase.
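For the simple case described above, the join and its test might look like the following sketch. It assumes both datasets already share field names, tags each row with its source, and treats a repeated loan-month key within a source as a failed join; the `augment` helper and the sample rows are hypothetical.

```python
def augment(internal, proxy):
    """Stack internal and proxy rows, tagging each with its source, and test the result."""
    combined = ([dict(row, source="internal") for row in internal]
                + [dict(row, source="proxy") for row in proxy])
    keys = [(row["source"], row["loan_id"], row["report_date"]) for row in combined]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate loan-month rows after join")
    if len(combined) != len(internal) + len(proxy):
        raise ValueError("row count changed during join")
    return combined

# Made-up data: internal history is short, proxy history covers earlier periods.
internal = [{"loan_id": "I1", "report_date": "2018-06", "status": "current"}]
proxy = [
    {"loan_id": "P1", "report_date": "2008-06", "status": "default"},
    {"loan_id": "P2", "report_date": "2008-06", "status": "current"},
]

combined = augment(internal, proxy)
print(len(combined), sorted({row["source"] for row in combined}))
```

Keeping the source tag on every row makes it possible to report results for internal and proxy data separately later.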

More often, however, there is a combination of data shortfalls that lead to proxy data needs, which can include either time-related gaps, data element gaps, or both. In such cases, the augmentation strategy is more complex.

In the cases of optional data elements, a decision to exclude certain data columns is acceptable. However, when incorporating required elements that are inputs for the allowance calculation, the data must be used in a way that complies with regulatory requirements. If internal data has incomplete information for a given variable, statistical methods and machine learning tools are useful to incorporate the proxy data with the internal data, and approximate the missing variable fields. Statistical testing is then used to verify that the relationships between actual and approximated figures are consistent with expectation, which are then verified by management or expert analysis. External research on economic or agency data, where applicable, can further be used to justify the estimated data assumptions. While rigorous statistical analysis is integral for the most accurate metrics, the qualitative analysis that follows is imperative for CECL model documentation and review.


Justify Your Proxy Data

Overlaps in time periods between internal loan performance datasets and proxy loan performance datasets are critical in establishing the applicability of the proxy dataset. A variety of similarity metrics can be calculated that compare the performance of the proxy loans with that of the internal loans during the period of overlap. Such similarity metrics can be put forward to justify the use of the proxy dataset. The proxy dataset can be useful for predictions even if the performance of the proxy loans is not identical to the performance of the institution’s loans. As long as there is a reliable pattern linking the performance of the two datasets, and no reason to think that pattern will discontinue, a risk-adjusting calibration can be justified and applied to the proxy data, or to results of models built thereon.
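One simple way to picture a risk-adjusting calibration is as a ratio of internal to proxy loss rates over the overlap period. The sketch below assumes hypothetical annual loss rates and a plain average ratio; an actual calibration could use a different similarity metric or a regression.

```python
# Hypothetical annual loss rates over the overlap period (illustrative values).
internal = {2014: 0.004, 2015: 0.005, 2016: 0.006}
proxy = {2014: 0.008, 2015: 0.010, 2016: 0.012}

overlap_years = sorted(set(internal) & set(proxy))

# Ratio of internal to proxy loss rate in each overlapping year.
ratios = [internal[year] / proxy[year] for year in overlap_years]

# A narrow spread suggests a stable pattern linking the two datasets.
ratio_spread = max(ratios) - min(ratios)
calibration = sum(ratios) / len(ratios)


def calibrate(proxy_rate, factor=calibration):
    """Scale a proxy-based loss rate by the observed internal/proxy ratio."""
    return proxy_rate * factor
```

Here the internal portfolio consistently loses about half of what the proxy loses, so a proxy-based forecast of 2% would be calibrated to roughly 1%.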


Why Augment Internal Data?

While the task of choosing the augmentation strategy may seem daunting, there are concrete benefits to supplementing internal data with a proxy, rather than using simply the proxy data on its own.

Most importantly, for the purpose of calculating the allowance for a given portfolio, incorporating some of the actual values will in most cases produce the most accurate estimate. For example, your institution may underwrite loans conservatively relative to the rest of the industry—incorporating at least some of the actual data associated with the lending practices will make it easier to understand how the proxy data differs from characteristics unique to your business.

More broadly, proxy data is useful beyond CECL reporting, and has other applications that can boost bank profits. For example, lending institutions can build better predictive models based on richer datasets to calibrate loan screening and loan pricing decisions. These datasets can also be built into existing models to provide better insight on risk metrics and other asset characteristics, and to allow for more fine-tuned management decisions.

RiskSpan Director David Andrukonis Featured on The Purposeful Banker Podcast

RiskSpan’s CECL Solution Director David Andrukonis was a featured guest on PrecisionLender’s podcast, The Purposeful Banker, in their recent episode titled “Is your Bank Ready for CECL.”

David summarized the major takeaways from a recent CECL conference, including regulator signals of forthcoming capital relief and emerging practices around reasonable and supportable forecast period length (16:19); outlined how RiskSpan is helping banks prepare for the new accounting standard (3:47); and offered ways that banks can stay current on continuing CECL developments (23:42).

You can listen to the entire episode of the podcast on their SoundCloud account:


Choosing a CECL Methodology

CECL presents institutions with a vast array of choices when it comes to loss estimation methodologies. It can seem a daunting challenge to winnow down the list of possible methods. Institutions must weigh competing concerns – including soundness and auditability, cost and feasibility, and the value of model reusability. Institutions must convince not only themselves but also external stakeholders that their methodology choices are reasonable, often on a segment-by-segment basis, as methodology can vary by segment. It benefits banks, however, to narrow the field of CECL methodology choices soon so that they can finalize data preparation and begin parallel testing (generating CECL results alongside incurred-loss allowance estimates). Parallel testing generates advance signals of CECL impact and may itself play a role in the final choice of allowance methodology. In this post, we provide an overview of some of the most common loss estimation methodologies that banks and credit unions are considering for CECL, and outline the requirements, advantages and challenges of each.

Methods to Estimate Lifetime Losses

The CECL standard explicitly mentions five loss estimation methodologies, and these are the methodologies most commonly considered by practitioners. Different practitioners define them differently. Additionally, many sound approaches combine elements of each method. For this analysis, we will discuss them as separate methods, and use the definitions that most institutions have in mind when referring to them:

  1. Vintage,
  2. Loss Rate,
  3. PDxLGD,
  4. Roll Rate, and
  5. Discounted Cash Flow (DCF).

While CECL allows the use of other methods—for example, for estimating losses on individual collateral-dependent loans—these five methodologies are the most applicable to the largest subset of assets and institutions.  For most loans, the allowance estimation process entails grouping loans into segments, and for each segment, choosing and applying one of the methodologies above. A common theme in FASB’s language regarding CECL methods is flexibility: rather than prescribing a formula, FASB expects that the banks consider historical patterns and the macroeconomic and credit policy drivers thereof, and then extrapolate based on those patterns, as well as each individual institution’s macroeconomic outlook. The discussion that follows demonstrates some of this flexibility within each methodology but focuses on the approach chosen by RiskSpan based on our view of CECL and our industry experience. We will first outline the basics of each methodology, followed by their data requirements, and end with the advantages and challenges of each approach.  

Vintage Method

Using the Vintage method, historical losses are tabulated by vintage and by loan age, as a percentage of origination balances by vintage year. In the example below, known historical values appear in the white cells, and forecasted values appear in shaded cells. We will refer to the entire shaded region as the “forecast triangle” and the cells within the forecast triangle as “forecast cells.”

A simple way to populate the forecast cells is with the simple average of the known values from the same column. In other words, we calculate the average marginal loss rate for loans of each age and extrapolate that forward. The limitation of this approach is that it does not differentiate loss forecasts based on the bank’s macroeconomic outlook, which is a core requirement of CECL, so a bank using this method will need to incorporate its macroeconomic outlook via management adjustments and qualitative factors (Q-factors).
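The simple-average approach can be sketched as follows. The vintages, loan ages, and loss rates are hypothetical, and `None` is used here (as an assumption of this sketch) to mark the forecast cells.

```python
# Rows are vintages, columns are loan ages in years.
# Values are hypothetical marginal loss rates as a share of origination balance;
# None marks a forecast cell in the forecast triangle.
loss_triangle = {
    2015: [0.002, 0.004, 0.003],
    2016: [0.001, 0.006, None],
    2017: [0.003, None, None],
}


def column_average(age):
    """Average the known marginal loss rates for loans of a given age."""
    known = [row[age] for row in loss_triangle.values() if row[age] is not None]
    return sum(known) / len(known)


# Populate each forecast cell with the average of the known values in its column.
filled = {
    vintage: [rate if rate is not None else column_average(age)
              for age, rate in enumerate(row)]
    for vintage, row in loss_triangle.items()
}

# Remaining expected loss rate per vintage: the sum of its forecast cells only.
remaining = {
    vintage: sum(filled[vintage][age]
                 for age, rate in enumerate(row) if rate is None)
    for vintage, row in loss_triangle.items()
}
```

Because every forecast cell is a plain historical average, nothing in this calculation responds to a macroeconomic outlook, which is the limitation noted above.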

As an alternative methodology, RiskSpan has developed an approach to forecast the loss triangle using statistical regression. The regression model estimates the historical loss rates in the vintage matrix as a function of loan age, a credit indicator, and a macroeconomic variable. That regression equation is then applied, along with a forecast for the macroeconomic variable (and a mean-reversion process), to populate the forecast triangle. The forecast cells can still be adjusted by management as desired, and/or Q-factors can be used. We caution, however, that management should take care not to double-count the influence of macroeconomics on allowance estimates (i.e., once via models, and again via Q-factors).

Once the results of the regression are ready and adjustments are applied where needed, the final allowance can be derived as follows:

Loss Rate Method

Using the Loss Rate method, the average lifetime loss rate is calculated for historical static pools within a segment. This average lifetime loss rate is used as the basis to predict the lifetime loss rate of the current static pool, that is, the loans on the reporting-date balance sheet.

In this context, a static pool refers to a group of loans that were on the balance sheet as of a particular date, regardless of when they were originated. For example, within an institution’s owner-occupied commercial real estate portfolio, the 12/31/06 static pool would refer to all such loans that were on the institution’s balance sheet as of December 31, 2006. We would measure the lifetime losses of such a static pool beginning on the static pool date (December 31, 2006, in this example) and express those losses as a percentage of the balance that existed on the static pool date. This premise is consistent with what CECL asks us to do, i.e., estimate all future credit losses on the loans on the reporting-date balance sheet.

A historical static pool is fully aged if all loans that made up the pool are either paid in full or charged off, where payments in full include renewals that satisfy the original contract. We should be wary of including partially aged static pools in the development of average lifetime loss estimates, because the cumulative loss rates of partially aged pools constitute life-to-date loss rates rather than complete lifetime loss rates, and inherently understate the lifetime loss rate that is required by CECL.

To generate the most complete picture of historical losses, RiskSpan constructs multiple overlapping static pools within the historical dataset of a given segment and calculates the average of the lifetime loss rates of all fully aged static pools.  This provides an average lifetime loss rate over a business cycle as the soundest basis for a long-term forecast. This technique also allows, but does not require, the use of statistical techniques to estimate lifetime loss rate as a function of the credit mix of a static pool.
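A minimal sketch of the averaging step follows, using hypothetical pools and balances. It only illustrates excluding partially aged pools before averaging; it is not RiskSpan's implementation.

```python
# Hypothetical static pools: balance on the pool date, lifetime net losses
# measured from that date, and whether every loan has paid off or charged off.
pools = [
    {"pool_date": "2006-12-31", "balance": 1_000_000, "losses": 30_000, "fully_aged": True},
    {"pool_date": "2007-12-31", "balance": 1_200_000, "losses": 48_000, "fully_aged": True},
    {"pool_date": "2016-12-31", "balance": 1_500_000, "losses": 6_000, "fully_aged": False},
]

# Partially aged pools only show life-to-date losses, so they are excluded.
aged_pools = [p for p in pools if p["fully_aged"]]

# Lifetime loss rate of each pool, as a share of its pool-date balance.
lifetime_rates = [p["losses"] / p["balance"] for p in aged_pools]

average_lifetime_loss_rate = sum(lifetime_rates) / len(lifetime_rates)
```

Including the 2016 pool would have pulled the average down with a life-to-date rate of 0.4%, which is the understatement the text warns about.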

After the average lifetime loss rate has been determined, we can incorporate management’s view of how the forward-looking environment will differ from the lookback period over which the lifetime loss rates were calculated, via Q-Factors.

The final allowance can be derived as follows:

PDxLGD Method

Methods ranging from very simple to very sophisticated go by the name “PD×LGD.” At the most sophisticated end of the spectrum are models that calculate loan-by-loan, month-by-month, macro-conditioned probabilities of default and corresponding loss given default estimates. Such estimates can be used in a discounted cash flow context. These estimates can also be used outside of a cash flow context; we can summarize these monthly estimates into a cumulative default probability and corresponding exposure-at-default and loss-given-default estimates, which yield a single lifetime loss rate estimate. At the simpler end of the spectrum are calculations of the lifetime default rates and corresponding loss given default rates of static pools (not marginal monthly or annual default rates). This simpler calculation is the method that most institutions have in mind when referring to “PD×LGD methods,” so it is the definition we will use here.

Using this PDxLGD method, the loss rate is calculated based on the same static pool concept as that of the Loss Rate method. As with the Loss Rate method, we can use the default rates and loss given default rates of different static pools to quantify the relationship between those rates and the credit mix of the segment, and to use that relationship going forward based on the credit mix of today’s portfolio. However, under PDxLGD, the loss rate is a function of two components: the lifetime default rate (PD), and the loss given default (LGD).  The final allowance can be derived as follows:

Because the PDxLGD and Loss Rate methods derive the Expected Loss Rate for the segment using different but related approaches, one of the important quality controls is to verify that the final calculated rates are equal under both methodologies, and that the cause of any discrepancies is investigated.
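The quality control described above can be illustrated on a single static pool. The figures are hypothetical, and PD is measured here on a balance basis (defaulted balance over pool balance), which is an assumption of this sketch.

```python
import math

# Hypothetical fully aged static pool (illustrative figures).
pool = {
    "balance": 1_000_000,        # balance on the static pool date
    "defaulted_balance": 50_000,  # balance that eventually defaulted
    "net_loss": 20_000,           # lifetime net loss on the defaulted balance
}

# Lifetime default rate and loss given default for the pool.
pd_rate = pool["defaulted_balance"] / pool["balance"]
lgd = pool["net_loss"] / pool["defaulted_balance"]

# Expected loss rate under each method.
loss_rate_pd_lgd = pd_rate * lgd
loss_rate_direct = pool["net_loss"] / pool["balance"]

# The two approaches should reconcile; a mismatch is a cue to investigate.
rates_reconcile = math.isclose(loss_rate_pd_lgd, loss_rate_direct)
```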

Roll Rate Method

Using the Roll Rate method, ultimate losses are predicted based on historical roll rates and the historical loss given default estimate.  Roll rates are either (a) the frequency with which loans transition from one delinquency status to another, or (b) the frequency with which loans “migrate” or “transition” from one risk grade to another.  While the former is preferred due to its transparency and objectivity, for institutions with established risk grades, the latter is an appropriate metric.

Under this method, management can apply adjustments for macroeconomic and other factors at the individual roll rate level, as well as on-top adjustments as needed. Roll rate matrices can include prepayment as a possible transition, thereby incorporating prepayment probabilities. Roll rates can be used in a cash flow engine that incorporates contractual loan features and generates probabilistic (expected) cash flows, or outside of a cash flow engine to generate expected chargeoffs of amortized cost. Finally, it is possible to use statistical regression techniques to express roll rates as a function of macroeconomic variables, and thus, to condition future roll rates on macroeconomic expectations.
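Outside of a cash flow engine, a roll rate projection can be sketched as repeated application of a transition matrix. The states, monthly roll rates, starting balances, and LGD below are all hypothetical, and the sketch ignores scheduled amortization for simplicity.

```python
STATES = ["current", "dq30", "dq60", "default", "prepaid"]

# Hypothetical monthly roll rates: ROLL[i][j] is the share of balance in
# state i that moves to state j over one month. Each row sums to 1.
ROLL = [
    [0.96, 0.02, 0.00, 0.00, 0.02],  # current (includes a prepayment transition)
    [0.40, 0.30, 0.30, 0.00, 0.00],  # 30 days delinquent
    [0.10, 0.10, 0.30, 0.50, 0.00],  # 60 days delinquent
    [0.00, 0.00, 0.00, 1.00, 0.00],  # default (absorbing)
    [0.00, 0.00, 0.00, 0.00, 1.00],  # prepaid (absorbing)
]
LGD = 0.35  # assumed historical loss given default


def step(balances):
    """Roll balances forward one month through the transition matrix."""
    n = len(STATES)
    return [sum(balances[i] * ROLL[i][j] for i in range(n)) for j in range(n)]


def project(balances, months):
    """Apply the monthly roll rates repeatedly."""
    for _ in range(months):
        balances = step(balances)
    return balances


start = [950_000.0, 30_000.0, 20_000.0, 0.0, 0.0]
end = project(start, months=120)

# Ultimate expected loss: balance that reached default, times LGD.
expected_loss = end[STATES.index("default")] * LGD
```

Macroeconomic adjustments of the kind described above would enter by modifying individual entries of `ROLL` for future months.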

The final allowance can be derived as follows:

Discounted Cash Flow (DCF) Method

Discounting cash flows is a way of translating expected future cash flows into a present value. DCF is a loan-level method (even for loans grouped into segments), and thus requires loan-by-loan, month-by-month forecasts of prepayment, default, and loss given default to translate contractual cash flows into prepay-, default-, and loss-given-default-adjusted cash flows. Although such loan-level, monthly forecasts could be derived using any method, most institutions have statistical forecasting techniques in mind when thinking about a DCF approach. Thus, even though statistical forecasting techniques and cash flow discounting are not inextricably linked, we will treat them as a pair here.

The most complex, and the most robust, of the five methodologies, DCF (paired with statistical forecasting techniques) is generally used by larger institutions that have the capacity and the need for the greatest amount of insight and control. Critically, DCF capabilities give institutions the ability (when a market-observed discount rate is substituted for the effective interest rate) to generate fair value estimates that can serve a host of accounting and strategic purposes.

To estimate future cash flows, RiskSpan uses statistical models, which comprise:

  • Prepayment sub-models
  • Probability-of-default or roll rate sub-models
  • Loss-given-default sub-models

Allowance is then determined based on the expected cash flows, which, similarly to the Roll Rate method, are generated based on the rates predicted by the statistical models, contractual loan terms, and the loan status at the reporting date.
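As a rough illustration, a DCF allowance can be computed as amortized cost less the present value of expected cash flows discounted at the effective interest rate. The cash flows and rate below are hypothetical, the expected cash flows are simply asserted rather than produced by sub-models, and amortized cost is assumed equal to the present value of contractual cash flows.

```python
def present_value(cash_flows, annual_rate):
    """Discount monthly cash flows (first one due in one month) at annual_rate / 12."""
    monthly_rate = annual_rate / 12
    return sum(cf / (1 + monthly_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


effective_interest_rate = 0.06  # hypothetical

# Contractual payments on a hypothetical 12-month loan.
contractual = [1_000.0] * 12

# Expected payments after prepayment, default, and loss-given-default
# adjustments. In practice these come from the statistical sub-models.
expected = [980.0] * 12

# Simplifying assumption: amortized cost equals the PV of contractual cash flows.
amortized_cost = present_value(contractual, effective_interest_rate)

allowance = amortized_cost - present_value(expected, effective_interest_rate)
```

Swapping `effective_interest_rate` for a market-observed discount rate in `present_value` is what turns the same machinery into a fair value estimate.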

Some argue that an advantage of the discounted cash flow approach is lower Day 1 losses. Whether DCF or non-DCF methods produce a lower Day 1 allowance, all else equal, depends upon the length of the assumed liquidation timeline, the discount rate, and the recovery rate. This is an underdiscussed topic that merits its own blog post. We will cover this fully in a future post.

The statistical models often used with DCF methods use historical data to express the likelihood of default or prepayment as a mathematical function of loan-level credit factors and macroeconomic variables.

For example, the probability of transitioning from “Current” status to “Delinquent” at month t can be calculated as a function of the loan’s age at month t multiplied by a sensitivity factor β1 (derived from the historical dataset), the loan’s FICO score multiplied by a sensitivity factor β2, and the projected unemployment rate at month t (based on management’s macroeconomic assumptions) multiplied by a sensitivity factor β3. Mathematically,
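One plausible form of this relationship is sketched below. The logistic link and the intercept β0 are assumptions made for illustration; the text specifies only the three variables and their sensitivity factors.

```latex
P(\text{Current} \rightarrow \text{Delinquent})_t
  = \frac{1}{1 + \exp\!\left[-\left(\beta_0
      + \beta_1\,\text{LoanAge}_t
      + \beta_2\,\text{FICO}
      + \beta_3\,\text{Unemployment}_t\right)\right]}
```

Under this form, each β measures how strongly the log-odds of the transition respond to its variable, and the macroeconomic forecast enters only through the unemployment term.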

Because macroeconomic and loan-level credit factors are explicitly and transparently incorporated into the forecast, such statistical techniques reduce reliance on Q-Factors. This is one of the reasons why such methods are the most scientific.

Historical Data Requirements

The table below summarizes the historical data requirements for each methodology, including the dataset type, the minimum required data fields, and the timespan.

In conclusion, having the most robust data allows the most options; for institutions with moderately complex historical datasets, Loss Rate, PDxLGD, and Vintage are excellent options.  With limited historical data, the Vintage method can produce a sound allowance under CECL.

While the data requirements may be daunting, it is important to keep in mind that proxy data can be used in place of, or alongside, institutional historical data, and RiskSpan can help identify and fill your data needs.  Some of the proxy data options are summarized below:

Advantages and Challenges of CECL Methodologies

Each methodology has advantages, and each carries its own set of challenges.  While the Vintage method, for example, is forgiving to limited historical data, it also provides limited insight and control for further analysis.  On the other hand, the DCF method provides significant insight and control, as well as early model performance indicators, but requires a robust dataset and advanced statistical expertise.

We have summarized some of the advantages and challenges for each method below.

In addition to the considerations summarized in the table, it is important to consider audit and regulatory requirements. Generally, institutions facing higher audit and regulatory scrutiny will be steered toward more complex methods. Also, bankers who intend to leverage the loan forecasting model they use for CECL for strategic decision-making (for example, loan screening and pricing decisions), and who desire granular insight and dials around their allowance numbers, will gravitate toward methodologies that afford more precision. At the other end of the spectrum, the methods that provide less precision and insight generally come with lighter operational burden.

Choosing Your CECL Methodology

Choosing the method that’s right for you depends on many factors, from historical data availability to management objectives and associated operational costs.

In many cases, management can gain a better understanding of the institutional allowance requirements after analyzing the results determined by multiple complementary approaches.

RiskSpan is willing to talk further with individual institutions about their circumstances, as well as generate sample results using a set of various methodologies.

Hands-On Machine Learning–Predicting Loan Delinquency

The ability of machine learning models to predict loan performance makes them particularly interesting to lenders and fixed-income investors. This expanded post provides an example of applying the machine learning process to a loan-level dataset in order to predict delinquency. The process includes variable selection, model selection, model evaluation, and model tuning.

The data used in this example are from the first quarter of 2005 and come from the publicly available Fannie Mae performance dataset. The data are segmented into two different sets: acquisition and performance. The acquisition dataset contains 217,000 loans (rows) and 25 variables (columns) collected at origination (Q1 2005). The performance dataset contains the same set of 217,000 loans coupled with 31 variables that are updated each month over the life of the loan. Because there are multiple records for each loan, the performance dataset contains approximately 16 million rows.

For this exercise, the problem is to build a model capable of predicting which loans will become severely delinquent, defined as falling behind six or more months on payments. This delinquency variable was calculated from the performance dataset for all loans and merged with the acquisition data based on the loan’s unique identifier. This brings the total number of variables to 26. Plenty of other hypotheses can be tested, but this analysis focuses on just this one.
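The construction of the target variable can be sketched as follows. The loan identifiers, field names, and sample rows are hypothetical stand-ins for the performance and acquisition datasets.

```python
# Hypothetical monthly performance rows: (loan identifier, months delinquent).
performance = [
    ("L1", 0), ("L1", 1), ("L1", 0),
    ("L2", 3), ("L2", 6),
    ("L3", 0),
]

# Hypothetical acquisition data keyed by the loan's unique identifier.
acquisition = {
    "L1": {"credit_score": 740},
    "L2": {"credit_score": 640},
    "L3": {"credit_score": 700},
}

SEVERE_DELINQUENCY_MONTHS = 6  # six or more months behind on payments

# Worst delinquency status each loan ever reached.
worst_status = {}
for loan_id, months_delinquent in performance:
    worst_status[loan_id] = max(worst_status.get(loan_id, 0), months_delinquent)

# Merge the delinquency flag onto the acquisition data by loan identifier.
for loan_id, row in acquisition.items():
    row["delinquent"] = int(worst_status.get(loan_id, 0) >= SEVERE_DELINQUENCY_MONTHS)
```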

1 Variable Selection

An overview of the dataset can be found below, showing the name of each variable as well as the number of observations available.

LOAN_IDENTIFIER                             217088
CHANNEL                                     217088
SELLER_NAME                                 217088
ORIGINAL_INTEREST_RATE                      217088
ORIGINAL_LOAN_TERM                          217088
ORIGINATION_DATE                            217088
FIRST_PAYMENT_DATE                          217088
ORIGINAL_LOAN-TO-VALUE_(LTV)                217088
NUMBER_OF_BORROWERS                         217082
DEBT-TO-INCOME_RATIO_(DTI)                  201580
BORROWER_CREDIT_SCORE                       215114
LOAN_PURPOSE                                217088
PROPERTY_TYPE                               217088
NUMBER_OF_UNITS                             217088
OCCUPANCY_STATUS                            217088
PROPERTY_STATE                              217088
ZIP_(3-DIGIT)                               217088
PRODUCT_TYPE                                217088
CO-BORROWER_CREDIT_SCORE                    100734
MORTGAGE_INSURANCE_TYPE                      34432

Most of the variables in the dataset are fully populated, with the exception of DTI, MI Percentage, MI Type, and Co-Borrower Credit Score. Many options exist for dealing with missing values, including dropping the rows that are missing, eliminating the variable, substituting with a value such as 0 or the mean, or using a model to fill the most likely value.

The following chart plots the frequency of the 34,000 MI Percentage values.

The distribution suggests a decent amount of variability. Most loans that have mortgage insurance are covered at 25%, but there are sizeable populations both above and below. Mortgage insurance is not required for the majority of borrowers, so it makes sense that this value would be missing for most loans.  In this context, it makes the most sense to substitute the missing values with 0, since 0% mortgage insurance is an accurate representation of the state of the loan. An alternative that could be considered is to turn the variable into a binary yes/no variable indicating if the loan has mortgage insurance, though this would result in a loss of information.

The next variable with a large number of missing values is Mortgage Insurance Type. Querying the dataset reveals that of the 34,400 loans that have mortgage insurance, 33,000 have type 1 (borrower-paid) insurance and the remaining 1,400 have type 2 (lender-paid) insurance. As with MI Percentage, the blank values can be filled. This will change the variable to indicate whether the loan has no insurance, type 1, or type 2.

The remaining variable with a significant number of missing values is Co-Borrower Credit Score, with approximately half of its values missing. Unlike MI Percentage, the context does not allow us to substitute missing values with zeroes. The distribution of both borrower and co-borrower credit score as well as their relationship can be found below.

As the plot demonstrates, borrower and co-borrower credit scores are correlated. Because of this, the removal of co-borrower credit score would only result in a minimal loss of information (especially within the context of this example). Most of the variance captured by co-borrower credit score is also captured in borrower credit score. Turning the co-borrower credit score into a binary yes/no ‘has co-borrower’ variable would not be of much use in this scenario as it would not differ significantly from the Number of Borrowers variable. Alternate strategies such as averaging borrower/co-borrower credit score might work, but for this example we will simply drop the variable.
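The three missing-value decisions described above can be sketched on a pair of hypothetical loan rows. The field names are illustrative, and using 0 as the "no insurance" code for MI Type is an assumption of this sketch.

```python
# Hypothetical loan rows; None marks a missing value.
loans = [
    {"mi_pct": 25.0, "mi_type": 1, "borrower_score": 710, "co_borrower_score": 700},
    {"mi_pct": None, "mi_type": None, "borrower_score": 680, "co_borrower_score": None},
]


def clean(row):
    """Fill missing MI fields and drop the co-borrower credit score."""
    cleaned = dict(row)
    # Missing MI Percentage means the loan has no mortgage insurance: 0%.
    cleaned["mi_pct"] = 0.0 if row["mi_pct"] is None else row["mi_pct"]
    # Missing MI Type becomes its own category (0 = no insurance, assumed code).
    cleaned["mi_type"] = 0 if row["mi_type"] is None else row["mi_type"]
    # Co-borrower score is largely redundant with borrower score, so drop it.
    cleaned.pop("co_borrower_score", None)
    return cleaned


cleaned_loans = [clean(row) for row in loans]
```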

In summary, the dataset is now smaller: Co-Borrower Credit Score has been dropped. Additionally, missing values for MI Percentage and MI Type have been filled in. Now that the data have been cleaned up, the values and distributions of the remaining variables can be examined to determine what additional preprocessing steps are required before model building. Scatter matrices of pairs of variables and distribution plots of individual variables along the diagonal can be found below. The scatter plots are helpful for identifying multicollinearity between pairs of variables, and the distributions can show if a variable lacks enough variance that it won’t contribute to model performance.

The third row of scatterplots, above, reflects a lack of variability in the distribution of Original Loan Term. The variance of 3.01 (calculated separately) is very small, and as a result the variable can be removed, since it will not contribute to any model as there is very little information to learn from. This process of inspecting scatterplots and distributions is repeated for the remaining pairs of variables. The Number of Units variable suffers from the same issue and can also be removed.

2 Heatmaps and Pairwise Grids

Matrices of scatterplots are useful for looking at the relationships between variables. Another useful plot is a heatmap and pairwise grid of correlation coefficients. In the plot below a very strong correlation between Original LTV and Original CLTV is identified.

This multicollinearity can be problematic for both the interpretation of the relationship between the variables and delinquency as well as the actual performance of some models. To combat this problem, we remove Original CLTV because Original LTV is a more accurate representation of the loan at origination. Loans in this population that were not refinanced kept their original LTV value as CLTV. If CLTV were included in the model, it would introduce information that was not available at origination. Allowing such unexpected additional information into a dataset is known as leakage, and it will bias the model.
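The numbers behind a correlation heatmap are pairwise correlation coefficients, which can be screened programmatically. The sample values and the 0.9 cutoff below are hypothetical choices for illustration.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for three numeric variables.
columns = {
    "original_ltv": [60, 70, 80, 90, 95],
    "original_cltv": [60, 72, 80, 90, 97],
    "interest_rate": [5.9, 5.5, 6.1, 5.6, 5.8],
}

THRESHOLD = 0.9  # assumed cutoff for flagging a pair as highly correlated

high_pairs = [
    (a, b)
    for a, b in combinations(columns, 2)
    if abs(pearson(columns[a], columns[b])) > THRESHOLD
]
```

A flagged pair is only a prompt for judgment; which variable to drop depends on context, as with the leakage concern for CLTV.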

Now that the numeric variables have been inspected, the remaining categorical variables must be analyzed to ensure that the classes are not significantly unbalanced. Count plots and simple descriptive statistics can be used to identify categorical variables that are problematic. Two examples below show the count of loans by state and by seller.

Inspecting the remaining variables uncovers that Relocation Indicator (indicating a mortgage issued when an employer moves an employee) and Product Type (fixed vs. adjustable rate) must be removed as they are extremely unbalanced and do not contain any information that will help the models learn. We also removed first payment date and origination date, which were largely redundant. The final cleanup results in a dataset that contains the following columns:


The final two steps before model building are to standardize each of the numeric variables and turn each categorical variable into a series of dummy or indicator variables. Numeric variables are scaled to have mean 0 and standard deviation 1 so that it is easier to compare variables that have a different scale (e.g. interest rate vs. LTV). Standardizing is also a requirement for many algorithms (e.g. principal component analysis).

Categorical variables are transformed by turning each value of the variable into its own yes/no feature. For example, Property State originally has 50 possible values, so it will be turned into 50 variables (e.g. Alabama yes/no, Alaska yes/no).  For categorical variables with many values this transformation will significantly increase the number of variables in the model.
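Both transformations can be sketched in plain Python. The sample values are hypothetical, and the use of the population standard deviation is an assumption; a library scaler may differ in that detail.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def one_hot(values):
    """Turn each distinct category into its own yes/no (1/0) indicator."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


interest_rates = [5.5, 6.0, 6.5]   # hypothetical numeric variable
states = ["VA", "MD", "VA"]         # hypothetical categorical variable

scaled_rates = standardize(interest_rates)
state_dummies = one_hot(states)
```

Each distinct category adds one column, which is why a variable such as Property State expands the dataset so noticeably.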

After scaling and transforming the dataset, the final shape is 199,716 rows and 106 columns. The target variable—loan delinquency—has 186,094 ‘no’ values and 13,622 ‘yes’ values. The data are now ready to be used to build, evaluate, and tune machine learning models.

3 Model Selection

Because the target variable loan delinquency is binary (yes/no) the methods available will be classification machine learning models. There are many classification models, including but not limited to: neural networks, logistic regression, support vector machines, decision trees and nearest neighbors. It is always beneficial to seek out domain expertise when tackling a problem to learn best practices and reduce the number of model builds. For this example, two approaches will be tried—nearest neighbors and decision tree.

The first step is to split the dataset into two segments: training and testing. For this example, 40% of the data will be partitioned into the test set, and 60% will remain as the training set. The resulting segmentations are as follows:

  1. 60% of the observations (as training set): X_train
  2. The associated target (loan delinquency) for each observation in X_train: y_train
  3. 40% of the observations (as test set): X_test
  4. The targets associated with the test set: y_test

Data should be randomly shuffled before they are split, as datasets are often in some type of meaningful order. Once the data are segmented the model will first be exposed to the training data to begin learning.
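A shuffled 60/40 split can be sketched with the standard library alone. The function name `shuffle_split`, the fixed seed, and the toy data are assumptions for illustration; a machine learning library would normally provide an equivalent routine.

```python
import random


def shuffle_split(X, y, test_share=0.4, seed=0):
    """Shuffle observations, then split them into training and test sets."""
    indices = list(range(len(X)))
    # A seeded generator makes the shuffle reproducible.
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_share)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data: ten observations with one feature each and a binary target.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = shuffle_split(X, y)
```

Shuffling indices rather than the rows themselves keeps each observation paired with its own target.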

4 K-Nearest Neighbors Classifier

Training a K-neighbors model requires the fitting of the model on X_train (variables) and y_train (target) training observations. Once the model is fit, a summary of the model hyperparameters is returned. Hyperparameters are model parameters not learned automatically but rather are selected by the model creator.


The K-neighbors algorithm searches for the closest (i.e., most similar) training examples for each test observation using a metric that calculates the distance between observations in high-dimensional space.  Once the nearest neighbors are identified, a predicted class label is generated as the class that is most prevalent in the neighbors. The biggest challenge with a K-neighbors classifier is choosing the number of neighbors to use. Another significant consideration is the type of distance metric to use.

To see more clearly how this method works, the 6 nearest neighbors of two random observations from the training set were selected: one non-delinquent (label 0) observation and one delinquent (label 1) observation.

Random delinquent observation: 28919 
Random non delinquent observation: 59504

The indices and Minkowski distances to the 6 nearest neighbors of the two random observations are found below. Unsurprisingly, the first nearest neighbor is always itself and the first distance is 0.

Indices of closest neighbors of obs. 28919 [28919 112677 88645 103919 27218 15512]
Distance of 5 closest neighbor for obs. 28919 [0 0.703 0.842 0.883 0.973 1.011]

Indices of 5 closest neighbors for obs. 59504 [59504 87483 25903 22212 96220 118043]
Distance of 5 closest neighbor for obs. 59504 [0 0.873 1.185 1.186 1.464 1.488]

Recall that in order to make a classification prediction, the K-neighbors algorithm finds the nearest neighbors of each observation. Each neighbor is given a ‘vote’ via its class label, and the majority vote wins. Below are the labels (or votes) of either 0 (non-delinquent) or 1 (delinquent) for the 6 nearest neighbors of the random observations. Based on the voting below, the delinquent observation would be classified correctly as 3 of the 5 nearest neighbors (excluding itself) are also delinquent. The non-delinquent observation would also be classified correctly, with 4 of 5 neighbors voting non-delinquent.

Delinquency label of nearest neighbors- non delinquent observation: [0 1 0 0 0 0]
Delinquency label of nearest neighbors- delinquent observation: [1 0 1 1 0 1]
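The vote counts can be checked directly from the two label lists printed above. The short sketch below tallies the votes after dropping the first entry of each list, which is the observation itself. The helper name majority_vote is an assumption made for this example.

```python
from collections import Counter

# Neighbor labels copied from the output above; the first entry in each
# list is the observation itself, so it is dropped before voting.
labels_non_delinquent_obs = [0, 1, 0, 0, 0, 0]
labels_delinquent_obs = [1, 0, 1, 1, 0, 1]


def majority_vote(neighbor_labels):
    """Return (winning label, vote counts), ignoring the observation itself."""
    votes = Counter(neighbor_labels[1:])
    winner = votes.most_common(1)[0][0]
    return winner, dict(votes)


print(majority_vote(labels_non_delinquent_obs))  # 4 of 5 votes for label 0
print(majority_vote(labels_delinquent_obs))      # 3 of 5 votes for label 1
```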


5          Tree-Based Classifier

Tree-based classifiers learn by segmenting the variable space into a number of distinct regions, or nodes, through a process called recursive binary splitting. At each step, observations are split into two groups by selecting the variable and cutoff value that produce the highest node purity, where purity is measured by the total variance across the two classes within a node. The two most popular purity metrics are the Gini index and cross entropy. A low value for either metric indicates that the resulting node is pure, that is, it contains predominantly observations from the same class. Just like the nearest neighbor classifier, the decision tree classifier makes classification decisions by ‘votes’ from the observations within each final node (known as a leaf node).
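To make the purity metrics concrete, the sketch below computes the Gini index and cross entropy for a two-class node, then performs a single step of binary splitting on one variable. The data values and function names are hypothetical. A full tree would repeat the search over every variable and then recurse into each child node.

```python
import math


def gini(labels):
    """Gini index of a two-class node: 2 * p * (1 - p), where p is the share of 1s."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def cross_entropy(labels):
    """Cross entropy of a node: -sum(p_k * log(p_k)) over the classes present."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log(q) for q in (p, 1 - p) if q > 0)


def best_split(values, labels):
    """Find the cutoff on one variable that minimizes the weighted Gini of the two children."""
    best_cutoff, best_score = None, float("inf")
    # Skip the largest value: splitting there would leave the right node empty.
    for cutoff in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= cutoff]
        right = [y for x, y in zip(values, labels) if x > cutoff]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_cutoff, best_score = cutoff, score
    return best_cutoff, best_score


# Hypothetical standardized credit scores and delinquency labels (1 = delinquent).
scores = [-1.5, -0.9, -0.6, -0.2, 0.3, 0.8, 1.4]
labels = [1, 1, 1, 0, 0, 0, 0]

print(gini(labels), cross_entropy(labels))
print(best_split(scores, labels))
```

In this toy data the classes separate perfectly at a cutoff of -0.6, so both child nodes are pure and the weighted Gini of the split is 0.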

To illustrate how this works, a decision tree was created with its maximum depth (the number of successive splitting rules) limited to 5. An excerpt of this tree can be found below. All 120,000 training examples start together in the top box. From top to bottom, each box shows the variable and splitting rule applied to the observations, the value of the Gini metric, the number of observations the rule was applied to, and the current segmentation of the target variable. The first box indicates that the 6th variable, Borrower Credit Score (zero-based index 5, shown as ‘X[5]’), was used to split the training examples. Observations where the value of Borrower Credit Score was less than or equal to -0.4413 follow the line to the box on the left. This box shows that 40,262 samples met the criteria. This box also holds the next splitting rule, also applied to the Borrower Credit Score variable. This process continues with X[2] (Original LTV) and so on until the tree has grown to its depth of 5. The final segments at the bottom of the tree are the aforementioned leaf nodes, which are used to make classification decisions. When making a prediction on new observations, the same splitting rules are applied and the observation receives the label of the most commonly occurring class in its leaf node.
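The prediction step can be sketched as a walk down a nested set of splitting rules. The tree below is hand-written and hypothetical. Only the first rule (X[5] <= -0.4413) comes from the text, and every other threshold and leaf count is invented for illustration.

```python
# A hand-written, hypothetical two-level tree. Only the first rule
# (X[5] <= -0.4413) comes from the text; every other threshold and
# every leaf count below is invented for illustration.
tree = {
    "feature": 5, "threshold": -0.4413,
    "left": {
        "feature": 5, "threshold": -1.2,
        "left": {"counts": {0: 300, 1: 700}},
        "right": {"counts": {0: 800, 1: 200}},
    },
    "right": {
        "feature": 2, "threshold": 0.5,
        "left": {"counts": {0: 950, 1: 50}},
        "right": {"counts": {0: 600, 1: 400}},
    },
}


def predict(node, x):
    """Follow the splitting rules to a leaf and return its most common class."""
    while "counts" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    counts = node["counts"]
    return max(counts, key=counts.get)


# Hypothetical standardized observation with six variables (indices 0-5).
observation = [0.1, -0.3, 0.2, 1.0, -0.5, -1.6]
print(predict(tree, observation))
```

The observation has X[5] = -1.6, so it goes left at both rules and lands in a leaf where class 1 (delinquent) is the majority.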

A more advanced tree-based classifier is the Random Forest Classifier. The Random Forest works by generating many individual trees, often hundreds or thousands. For each tree, however, the number of variables considered at each split is limited to a random subset. This helps reduce model variance and de-correlate the trees (since each tree will have a different set of available splitting choices). In our example, we fit a random forest classifier on the training data. The resulting hyperparameters and model documentation indicate that by default the model generates 10 trees, considers a random subset of variables the size of the square root of all variables (approximately 10 in this case), has no depth limitation, and only requires each leaf node to have 1 observation.
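The random-subset idea can be shown with a small sketch. It assumes a hypothetical total of 100 variables, chosen so that the square root is 10. The function name and the fixed seed are illustrative only.

```python
import math
import random


def candidate_features(n_features, rng):
    """Pick the random subset of variable indices that one split may consider."""
    subset_size = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), subset_size))


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_features = 100         # hypothetical count; sqrt(100) = 10 candidates
first_split = candidate_features(n_features, rng)
second_split = candidate_features(n_features, rng)
print(first_split)
print(second_split)
```

Because each split draws its own subset, a single dominant variable cannot appear at the top of every tree. This is what de-correlates the trees in the forest.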

Since the random forest contains many trees and does not have a depth limitation, it is incredibly difficult to visualize. To better understand the model, it can be useful to plot which variables were selected for splits and produced the largest drop in the purity metric (Gini index). Below are the top 10 most important variables in the model, ranked by the total (normalized) reduction in the Gini index. Intuitively, this plot shows which variables best segment the observations into groups that are predominantly one class, either delinquent or non-delinquent.


6          Model Evaluation

Now that the models have been fitted, their performance must be evaluated. To do this, each fitted model is first used to generate predictions on the test set (X_test). Next, the predicted class labels are compared to the actual observed class labels (y_test). Three of the most popular classification metrics that can be used to compare the predicted and actual values are recall, precision, and the f1-score. These metrics are calculated for each class, delinquent and non-delinquent.

Recall is calculated for each class as the share of actual events in that class that were correctly predicted. More precisely, it is defined as the number of true positive predictions divided by the number of true positive predictions plus false negative predictions. For example, if the data had 10 delinquent observations and 7 were correctly predicted, recall for delinquent observations would be 7/10 or 70%.

Precision is the number of true positives divided by the number of true positives plus false positives. Precision can be thought of as the ratio of events correctly predicted to the total number of events predicted. In the hypothetical example above, assume that the model made a total of 14 predictions for the label delinquent. If so, then the precision for delinquent predictions would be 7/14 or 50%.

The f1 score is calculated as the harmonic mean of recall and precision: F1 = 2 × (Precision × Recall) / (Precision + Recall).
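The three definitions can be checked against the counts from the hypothetical example above: 7 true positives, 3 false negatives (10 delinquent loans, 7 found), and 7 false positives (14 delinquent predictions, 7 of them correct). The function name below is an assumption made for this sketch.

```python
def classification_metrics(true_pos, false_pos, false_neg):
    """Return (precision, recall, f1) for one class from prediction counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the hypothetical example: 10 delinquent loans, 7 found,
# and 14 delinquent predictions in total (so 7 were false positives).
precision, recall, f1 = classification_metrics(true_pos=7, false_pos=7, false_neg=3)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.3f}")
```

With 50% precision and 70% recall, the f1 score works out to roughly 58%. The harmonic mean pulls it toward the lower of the two values.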

The classification reports for the K-neighbors and decision tree below show the precision, recall, and f1 scores for label 0 (non-delinquent) and 1 (delinquent).


There is no silver bullet for choosing a model; often it comes down to the goals of implementation. In this situation, the tradeoff between identifying more delinquent loans and misclassifying more non-delinquent loans can be analyzed with a tool called an ROC curve. When the model predicts a class label, a probability threshold is used to make the decision. This threshold is set by default at 50%, so an observation is assigned to a class when its predicted probability of membership in that class exceeds 50%.

The majority vote (of the neighbor observations or the leaf node observations) determines the predicted label. ROC curves allow us to see the impact of varying this voting threshold by plotting the true positive prediction rate against the false positive prediction rate for each threshold value between 0% and 100%.

The area under the ROC curve (AUC) quantifies the model’s ability to distinguish between delinquent and non-delinquent observations. A completely uninformative model will have an AUC of 0.5, no better than random guessing. A perfect model will have an AUC of 1, as it is able to perfectly separate the two classes.
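As a rough sketch of how the curve and its area are built, the code below sweeps the threshold across a set of predicted probabilities, records the false positive and true positive rates at each step, and integrates with the trapezoidal rule. The labels and scores are hypothetical, and the function names are assumptions made for this example.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold one distinct score at a time, strictest first.
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical labels (1 = delinquent) and predicted delinquency probabilities.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```

For this toy data the AUC is 0.75. A model that assigns the same score to every observation produces the diagonal from (0, 0) to (1, 1) and an AUC of 0.5.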

To illustrate, the ROC curves below plot the true positive rate against the false positive rate on the held-out test set as the threshold changes.

7          Model Tuning

Up to this point the models have been built and evaluated using a single train/test split of the data. In practice this is often insufficient because a single split does not always provide the most robust estimate of the error on the test set. Additionally, model tuning requires further steps. To solve both of these problems it is common to train multiple instances of a model using cross validation. In K-fold cross validation, a portion of the original training data is held out as a third dataset called the validation set. The model is trained on the remaining training data and then evaluated on the validation set. This process is repeated K times, each time holding out a different portion of the training set to validate against. Once the model has been tuned using the train/validation splits, it is tested against the held-out test set just as before. As a general rule, once data have been used to make a decision about the model they should never be used for evaluation.
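The fold mechanics can be sketched as follows. The function name is hypothetical, and the rows are assumed to be already shuffled, which a real workflow would normally do before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Ten hypothetical training rows split into 3 folds.
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, 3), start=1):
    print(f"fold {fold}: train={train_idx} validate={val_idx}")
```

Every row is used for validation exactly once and for training in the remaining folds, so the averaged validation score draws on all of the training data.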

8          K-Nearest Neighbors Tuning

Below, a grid search approach is used to tune the K-nearest neighbors model. The first step is to define all of the possible hyperparameter values to try in the model. For the KNN model, the list nk = [10, 50, 100, 150, 200, 250] specifies the number of nearest neighbors to try in each model. The list is used by the function GridSearchCV to build a series of models, each using a different value of nk. In the scikit-learn version used here, GridSearchCV defaults to 3-fold cross validation. This means that the model will evaluate 3 train/validate splits of the data for each value of nk. Also specified in GridSearchCV is the scoring parameter used to evaluate each model. In this instance it is set to the metric discussed earlier, the area under the ROC curve. GridSearchCV will return the best performing model by default, which can then be used to generate predictions on the test set as before. Many more values of K could be specified to search through, and the default Minkowski distance could be replaced by a series of metrics to try. However, this comes at a cost of computation time that increases significantly with each added hyperparameter.
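The logic behind such a search can be sketched in plain Python without relying on GridSearchCV itself. The example below is a simplified stand-in, not the code used in this analysis. It uses a single hypothetical variable, a much smaller candidate list than nk, and accuracy as the score instead of the area under the ROC curve.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training rows closest to query (one variable)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(rows, k_neighbors, n_folds=3):
    """Mean validation accuracy over n_folds train/validate splits."""
    fold_scores = []
    for fold in range(n_folds):
        validate = rows[fold::n_folds]  # every n-th row is held out
        train = [r for i, r in enumerate(rows) if i % n_folds != fold]
        hits = sum(knn_predict(train, x, k_neighbors) == y for x, y in validate)
        fold_scores.append(hits / len(validate))
    return sum(fold_scores) / n_folds


# Hypothetical data: label 1 (delinquent) for low values of one standardized
# variable, with a few rows deliberately mislabeled as noise.
rows = [(x / 10.0, 1 if x < 0 else 0) for x in range(-15, 15)]
for noisy in (3, 11, 20, 26):
    x, y = rows[noisy]
    rows[noisy] = (x, 1 - y)

candidates = [1, 3, 5, 7]  # illustrative neighbor counts to search over
results = {k: cv_accuracy(rows, k) for k in candidates}
best_k = max(results, key=results.get)
print(results)
print("best k:", best_k)
```

Each candidate value is scored on the same three train/validate splits, and the value with the highest mean validation score is kept.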


In the plot below, the mean training and validation scores of the 3 cross-validated splits are plotted for each value of K. The plot indicates that for the lower values of K the model was overfitting the training data, resulting in lower validation scores. As K increases, the training score falls but the validation score rises because the model gets better at generalizing to unseen data.

9          Random Forest Tuning

There are many hyperparameters that can be adjusted to tune the random forest model. We use three in our example: n_estimators, max_features, and min_samples_leaf. First, n_estimators refers to the number of trees to be created. This value can be increased substantially, so the search space is set to the list estimators. Random Forests are generally very robust to overfitting, and it is not uncommon to train a classifier with more than 1,000 trees. Second, the number of variables to be randomly considered at each split can be tuned via max_features. Having a smaller value for the number of random features is helpful for decorrelating the trees in the forest, which is especially useful when multicollinearity is present. We tried a number of different values for max_features, which can be found in the list features. Finally, the number of observations required in each leaf node is tuned via the min_samples_leaf parameter and the list samples.
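A quick way to see how the search space grows is to enumerate it. The contents of the three lists are not shown in this post, so the values below are hypothetical placeholders. Only the parameter names and the list names come from the text.

```python
from itertools import product

# Hypothetical search lists; the text names them estimators, features,
# and samples but does not show their contents.
estimators = [10, 100, 500]
features = [5, 10, 15]
samples = [1, 5, 50]
n_folds = 3  # cross-validation folds per combination

grid = [
    {"n_estimators": n, "max_features": f, "min_samples_leaf": s}
    for n, f, s in product(estimators, features, samples)
]

print(len(grid), "parameter combinations")
print(len(grid) * n_folds, "model fits in total")
print(grid[0])
```

Three values for each of three hyperparameters already require 27 combinations and, with 3-fold cross validation, 81 model fits. Adding a fourth hyperparameter with three values would triple that.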


The resulting plot, below, shows a subset of the grid search results. Specifically, it shows the mean test score for each number of trees and leaf size when the number of random features considered at each split is limited to 5. The plot demonstrates that the best performance occurs with 500 trees and a requirement of at least 5 observations per leaf. To see the best performing model from the entire grid space, the best_estimator_ attribute of the fitted GridSearchCV object can be used.

By default, the parameters of the best estimator are assigned to the GridSearch object (cvknc and cvrfc). This object can now be used to generate future predictions or predicted probabilities. In our example, the tuned models are used to generate predicted probabilities on the held-out test set. The resulting ROC curves show an improvement in the KNN model from an AUC of .62 to .75. Likewise, the tuned Random Forest AUC improves from .64 to .77.

Predicting loan delinquency using only origination data is not an easy task. Presumably, if significant signal existed in the data it would trigger a change in strategy by MBS investors and ultimately in origination practices. Nevertheless, this exercise demonstrates the capability of a machine learning approach to deconstruct such an intricate problem and suggests the appropriateness of using machine learning models to tackle these and other risk management data challenges relating to mortgages and a potentially wide range of asset classes.

Basel III Capital Requirements and CECL

With the upcoming implementation of IFRS 9 in 2018, the discussion of Basel III capital requirements and CECL / IFRS 9 is gaining importance. The relationship between capital and loan-loss provisions has been a topic of discussion as the world moves towards mandating loss provisioning that looks out over the life of a financial asset. How will this new credit-loss approach for provisioning affect regulatory capital? The Basel Committee on Banking Supervision (BCBS) has begun addressing this question in a series of documents. This post summarizes some key takeaways from these publications.

“Accounting changes will affect bank regulatory capital.”

— Basel Committee on Banking Supervision

Basel III Capital Requirements Update

  1. Banks and credit unions need to think about the impact of CECL on regulatory capital now and factor it into their capital planning.
  2. As of March 2017, Basel has elected to retain the current regulatory treatment of accounting provisions for an interim period.
  3. At this point, the implementation of CECL is expected to lower CET1 capital because it will increase loss provisions.
  4. The BCBS is setting forth transitional arrangements to take effect January 1, 2018, for jurisdictions that choose to implement them. Under these arrangements, adjustments to CET1 capital (and by extension, to other regulatory capital measures) will be incrementally phased in. It was important for BCBS to establish these before the implementation of IFRS 9.

“In the short term, provisions may rise but the impact on regulatory capital is expected to be limited.”

— BIS Quarterly, March 2017

In the latest of five documents published by BCBS on Basel III capital requirements, the committee identified “a number of reasons why it may be appropriate for a jurisdiction to introduce a transitional arrangement for the impact of ECL (expected credit-loss) accounting on regulatory capital. These include:

  • The possibility that the impact could be significantly more material than currently expected and result in an unexpected decline in capital ratios; and
  • The fact that the Committee has not yet reached a conclusion on what should be the permanent interaction between ECL accounting and the prudential regime.” [1]

The transitional arrangements are as follows:

  • Approach A: Day 1 impact on CET1 capital spread over a specified number of years. If there is a decrease in CET1 capital due to an increase in provisions (net of tax effects), the amount of the decrease would be spread over several years (for regulatory purposes). [2]
  • Approach B: Phased prudential recognition of IFRS 9 Stages 1 and 2 provisions. This approach is for IFRS 9 and provides a phased approach using amortization of the capital differences. [3]
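To give a sense of the arithmetic behind Approach A, the sketch below spreads a Day 1 decrease in CET1 capital evenly over a transition period. The figures, the five-year period, and the straight-line recognition pattern are all assumptions made for illustration. They are not the Committee’s prescribed calculation, and the cited standard should be consulted for the actual mechanics.

```python
def phased_in_impact(day1_decrease, transition_years):
    """Cumulative CET1 decrease recognized at the end of each transition year."""
    per_year = day1_decrease / transition_years
    return [round(per_year * year, 2) for year in range(1, transition_years + 1)]


# Hypothetical figures: a 50.0 (in millions) Day 1 decrease in CET1 capital,
# net of tax, recognized evenly over 5 years.
schedule = phased_in_impact(day1_decrease=50.0, transition_years=5)
print(schedule)
```

Under these assumptions the regulatory capital measure absorbs 10.0 of the decrease in each year rather than the full 50.0 on Day 1.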

IFRS 9 vs. CECL (Expected Credit-Loss Models)

Regulatory Capital and CECL

Future BCBS releases will address the long-term regulatory capital treatment of loss provisions. Given these changes to both the computation of the allowance for loan and lease losses (ALLL) and its regulatory capital treatment, an institution must adopt a more specific quantitative and risk governance framework than in the past. Previously, computing the ALLL according to FASB requirements could be accomplished with adequate historical data and an Excel spreadsheet. By moving to a life-of-loan approach, CECL requires institutions to take a more forward-looking view of their losses. This in turn will have a direct effect on the computation of regulatory capital.

It is not too early for institutions to begin thinking through the ramifications of this—not only how these measures will be accurately quantified and computed, but what their implementation will ultimately mean for P&L and liquidity.

[1] Basel Committee on Banking Supervision, Standards, Regulatory treatment of accounting provisions – interim approach and transitional arrangements, March 2017, pp. 3-4.
[2] Ibid., p. 6.
[3] Ibid., p. 6.



