Linkedin    Twitter   Facebook

Get Started
Log In

Linkedin

Category: Article

Machine Learning Detects Model Validation Blind Spots

Machine learning represents the next frontier in model validation—particularly in the credit and prepayment modeling arena. Financial institutions employ numerous models to make predictions relating to MBS performance. Validating these models by assessing their predictions is of paramount importance, but even models that appear to perform well based upon summary statistics can have subsets of input (input subspaces) for which they tend to perform poorly. Isolating these “blind spots” can be challenging using conventional model validation techniques, but recently developed machine learning algorithms are making the job easier and the results more reliable. 

High-Error Subspace Visualization

RiskSpan’s modeling team has developed a statistical algorithm which identifies high-error subspaces and flags model outputs corresponding to inputs originating from these subspaces, indicating to model users that the results might be unreliable. An extension to this problem that we also address is whether migration of data points to more error-prone subspaces of the input space over time can be indicative of macroeconomic regime shifts and signal a need to re-estimate the model. This will aid in the prevention of declining model efficacy over time.

Due to the high-dimensional nature of the input spaces of many financial models, traditional statistical methods of partitioning data may prove inadequate. Using machine learning techniques, we have developed a more robust method of high-error subspace identification. We develop the algorithm using loan performance model data, but the method is adaptable to generic models.

Data Selection and Preparation

The dataset we use for our analysis is a random sample of the publicly available Freddie Mac Loan-Level Dataset. The entire dataset covers the monthly loan performance for loans originated from 1999 to 2016 (25.4 million fixed-rate mortgages). From this set, one million loans were randomly sampled. Features of this dataset include loan-to-value ratio, borrower debt-to-income ratio, borrower credit score, interest rate, and loan status, among others. We aggregate the monthly status vectors for each loan into a single vector which contains a loan status time series over the life of the loan within the historical period. This aggregated status vector is mapped to a value of 1 if the time series indicates the loan was ever 90 days delinquent within the first three years after its origination, representing a default, and 0 otherwise. This procedure results in 914,802 total records.

Algorithm Framework

Using the prepared loan dataset, we estimate a logistic regression loan performance model. The data is sampled and partitioned into training and test datasets for clustering analysis. The model estimation and training data is taken from loans originating in the period from 1999 to 2007, while loans originating in the period from 2008 to 2016 are used for testing. Once the data has been partitioned into training and test sets, a clustering algorithm is run on the training data.

Two-Dimensional Visualization of Select Clusters

The clustering is evaluated based upon its ability to stratify the loan data into clusters that meaningfully identify regions of the input for which the model performs poorly. This requires the average model performance error associated with certain clusters to be substantially higher than the mean. After the training data is assigned to clusters, cluster-level error is computed for each cluster using the logistic regression model. Clusters with high error are flagged based upon a scoring scheme. Each loan in the test set is assigned to a cluster based upon its proximity to the training cluster centers. Loans in the test set that are assigned to flagged clusters are flagged, indicating that the loan comes from a region for which loan performance model predictions exhibit lower accuracy.

Algorithm Performance Analysis

The clustering algorithm successfully flagged high-error regions of the input space, with flagged test clusters exhibiting accuracy more than one standard deviation below the mean. The high errors associated with clusters flagged during model training were persistent over time, with flagged clusters in the test set having a model accuracy of just 38.7%, compared to an accuracy of 92.1% for unflagged clusters. Failure to address observed high-error clusters in the training set and migration of data to high-error subspaces led to substantially diminished model accuracy, with overall model accuracy dropping from 93.9% in the earlier period to 84.1% in the later period.

Training/Test Cluster Error Comparison

Additionally, the nature of default misclassifications and variables with greatest impact on misclassification were also determined. Cluster FICO scores proved to be a strong indicator of cluster model prediction accuracy. While a relatively large proportion of loans in low-FICO clusters defaulted, the logistic regression model substantially overpredicted the number of defaults for these clusters, leading to a large number of Type I errors (inaccurate default predictions) for these clusters. Type II (inaccurate non-default predictions) errors constituted a smaller proportion of overall model error, and their impact was diminished even further when considering their magnitude relative to the number of true negative predictions (accurate non-default predictions), which are far fewer in number than true positive predictions (accurate default predictions).

FICO vs. Cluster Accuracy

Conclusion

Our application of the subspace error identification algorithm to a loan performance model illustrates the dangers of using high-level summary statistics as the sole determinant of model efficacy and failure to consistently monitor the statistical profile of model input data over time. Often, more advanced statistical analysis is required to comprehensively understand model performance. The algorithm identified sets of loans for which the model was systematically misclassifying default status. These large-scale errors come at a high cost to financial institutions employing such models.

As an extension to this research into high error subspace detection, RiskSpan is currently developing machine learning analytics tools that can detect the root cause of systematic model errors and suggest ways to enhance predictive model performance by alleviating these errors.


Permissioned Blockchains–A Quest for Consensus

 

Conspicuously absent from all the chatter around blockchain’s potential place in structured finance has been much discussion around the thorny matter of consensus. Consensus is at the heart of all distributed ledger networks and is what enables them to function without a trusted central authority. Consensus algorithms are designed to prevent fraud and error. With large, public blockchains, achieving consensus—ensuring that all new information has been examined before is universally accepted—is relatively straightforward. It is achieved either by performing large amounts of work or simply by members who collectively hold a majority stake in the blockchain.

However, when it comes to private (or “permissioned”) blockchains with a relatively small number of interested parties—the kind of blockchains that are currently poised for adoption in the structured finance space—the question of how to obtain consensus takes on an added layer of complexity. Restricting membership greatly reduces the need for elaborate algorithms to prevent fraud on permissioned blockchains. Instead, these applications must ensure that complex workflows and transactions are implemented correctly. They must provide a framework for having members agree to the very structure of the transaction itself. Consensus algorithms complement this by ensuring that the steps performed in verifying transaction data is agreed upon and verified.

With widespread adoption of blockchain in structured finance appearing more and more to be a question of when rather than if, SmartLink Labs, a RiskSpan fintech affiliate, recently embarked on a proof of concept designed to identify and measure the impact of the technology across the structured finance life cycle. The project took a holistic approach, looking at everything from deal issuance to bondholder payments. We sought to understand the benefits, how various roles would change, and the extent to which certain functions might be eliminated altogether. At the heart of virtually every lesson we learned along the way was a common, overriding principle: consensus is hard.

Why is Consensus Hard?

Much of blockchain’s appeal to those of us in the structured finance arena has to do with its potential to lend visibility and transparency to complicated payment rules that govern deals along with dynamic borrower- and collateral-level details that evolve over the lives of the underlying loans. Distributed ledgers facilitate the real-time sharing of these details across all relevant parties—including loan originators, asset servicers, and bond administrators—from deal issuance through the final payment on the transaction. The ledger transactions are synchronized to ensure that ledgers only update when the appropriate participants approve transactions. This is the essence of consensus, and it seems like it ought to be straightforward.

Imagine our surprise when one of the most significant challenges our test implementation encountered was designing the consensus algorithm. Unlike with public blockchains, consensus in a private, or “permissioned,” blockchain is designed for a specific business purpose where the counterparties are known. However, to achieve consensus, the data posted to the blockchain must be verified in an automated manner by the relevant parties to the transaction. One of the challenges with the data and rules that govern most structured transactions is that it is (at best) only partially digital. We approached our project with the premise that most business terms can be translated into a series of logical statements in the form of computer code. Translating unstructured data into structured data in a fully transparent way is problematic, however, and limitations to transparency represent a significant barrier to achieving consensus. In order for a distributed ledger to work in this context, all transaction parties need to reach consensus around how the cash will flow and numerous other business rules throughout the process.

 

A Potential Solution for Structured Finance

To this end, our initial prototype seeks to test our consensus algorithm on the deal waterfall model. If the industry can move to a process where consensus of the deal waterfall model is achieved at deal issuance, the model posted to the blockchain can then serve as an agreed-upon source of truth and perpetuate through the life of the security—from loan administration to master servicer aggregation and bondholder payments. This business function alone could save the industry countless hours and effectively eliminate all of today’s costs associated with having to model and remodel each transaction multiple times.

Those of us who have been in the structured finance business for 25 years or more know how little the fundamental business processes have evolved. They remain manual, governed largely by paper documents, and prone to human error.

The mortgage industry has proven to be particularly problematic. Little to no transparency in the process has fostered a culture of information asymmetry and general mistrust which has predictably given rise to the need to have multiple unrelated parties double-checking data, performing due diligence reviews on virtually all loan files, validating and re-validating cash flow models, and requiring costly layers of legal payment verification. Ten or more parties might contribute in one way or another to verifying and validating data, documents, or cash flow models for a single deal. Upfront consensus via blockchain holds the potential to dramatically reduce or even eliminate almost all of this redundancy.

Transparency and Real-Time Investor Reporting

The issuance process, of course, is only the beginning. The need for consensus does not end when the cash flow model is agreed to and the deal is finalized. Once we complete a verified deal, the focus of our proof of concept will shift to the monthly process of investor reporting and corresponding payments to the bond holders.

The immutability of transactions posted to the ledger is particularly valuable because of the unmistakable audit trail it creates. Rather than compelling master servicers to rely on a monthly servicing snapshot “tape” to try and figure out what happened to a severely delinquent loan with four instances of non-sufficient funds, a partial payment in suspense, and an interest rate change somewhere in the middle. Putting all these transactions on a blockchain creates a relatively straightforward sequence of transactions that everyone can decipher.

Posting borrower payments to a blockchain in real time will also require consensus among transaction parties. Once this is achieved, the antiquated notion of monthly investor reporting will become obsolete. The potential ramifications of this extend to timing of payments to bond holders. No longer needing to wait until the next month to find out what borrowers did the month before means that payments to investors might be accelerated and, in the private-label security markets, perhaps even more often than monthly. With real-time consensus comes the possibility of far more flexibility for issuers and investors in designing the timing of cash flows should they elect to pursue it.

This envisioned future state is not without its detractors. Some ask why servicers would opt for more transparency when they already encounter more scrutiny and criticism than they would like. In many cases, however, it is the lack of transparency, more than a servicer’s actions themselves, that invite the unwanted scrutiny. Servicers that move beyond reporting monthly snapshots and post comprehensive loan activity to a blockchain stand to reap significant competitive advantages. Because of the real-time consensus and sharing of dynamic loan reporting data (and perhaps of accelerated bond payments, as suggested above) investors will quickly gravitate toward deals that are administered by blockchain-enabled servicers. Sooner or later, servicers who fail to adapt will find themselves on the outside looking in.

Less Redundancy; More Trust

Much of blockchain’s appeal is bound up in the promise of an environment in which deal participants can gain reasonable assurance that their counterparts are disclosing information that is both accurate and comprehensive. Visibility is an important component of this, but ultimately, achieving consensus that what is being done is what ought to be done will be necessary in order to fully eliminate redundant functions in business processes and overcome information asymmetry in the private markets.  Sophisticated, well-conceived algorithms that enable private parties to arrive at this consensus in real time will be key.


One of the enduring lessons of our structured finance proof of concept is that consensus is necessary throughout a transaction’s life. The market (i.e., issuers, investors, servicers, and bond administrators) will ultimately determine what gets posted to a blockchain and what remains off-chain, and more than one business model will likely evolve. As data becomes more structured and more reliable, however, competitive advantages will increasingly accrue to those who adopt consensus algorithms capable of infusing trust into the process. The failure of the private-label MBS market to regain its pre-crisis footing is, in large measure, a failure of trust. Nothing repairs trust like consensus.

Talk Scope


A Brief Introduction to Agile Philosophy

Reducing time to delivery by developing in smaller incremental chunks and incorporating an ability to pivot is the cornerstone of Agile software development methodology.

“Agile” software development is a rarity among business buzz words in that it is actually a fitting description of what it seeks to accomplish. Optimally implemented, it is capable of delivering value and efficiency to business-IT partnerships by incorporating flexibility and an ability to pivot rapidly when necessary.

As a technology company with a longstanding management consulting pedigree, RiskSpan values the combination of discipline and flexibility inherent to Agile development and regularly makes use of the philosophy in executing client engagements. Dynamic economic environments contribute to business priorities that are seemingly in a near-constant state of flux. In response to these ever-evolving needs, clients seek to implement applications and application feature changes quickly and efficiently to realize business benefits early.

This growing need for speed and “agility” makes Agile software development methods an increasingly appealing alternative to traditional “waterfall” methodologies. Waterfall approaches move in discrete phases—treating analysis, design, coding, and testing as individual, stand-alone components of a software project. Historically, when the cost of changing plans was high, such a discrete approach worked best. Nowadays, however, technological advances have made changing the plan more cost-feasible. In an environment where changes can be made inexpensively, rigid waterfall methodologies become unnecessarily counterproductive for at least four reasons:

  1. When a project runs out of time (or money), individual critical phases—often testing—must be compressed, and overall project quality suffers.
  2. Because working software isn’t produced until the very end of the project, it is difficult to know whether the project is really on track prior to project completion.
  3. Not knowing whether established deadlines will be met until relatively late in the game can lead to schedule risks.
  4. Most important, discrete phase waterfalls simply do not respond well to the various ripple effects created by change.

 

Continuous Activities vs. Discrete Project Phases

Agile software development methodologies resolve these traditional shortcomings by applying techniques that focus on reducing overhead and time to delivery. Instead of treating fixed development stages as discrete phases, Agile treats them as continuous activities. Doing things simultaneously and continuously—for example, incorporating testing into the development process from day one—improves quality and visibility, while reducing risk. Visibility improves because being halfway through a project means that half of a project’s features have been built and tested, rather than having many partially built features with no way of knowing how they will perform in testing. Risk is reduced because feedback comes in from earliest stages of development and changes without paying exorbitant costs. This makes everybody happy.

 

Flexible but Controlled

Firms sometimes balk at Agile methods because of a tendency to equate “flexibility” and “agility” with a lack of organization and planning, weak governance and controls, and an abandonment of formal documentation. This, however, is a misconception. “Agile” does not mean uncontrolled—on the contrary, it is no more or less controlled than the existing organizational boundaries of standardized processes into which it is integrated. Most Agile methods do not advocate any particular methodology for project management or quality control. Rather, their intent is on simplifying the software development approach, embracing changing business needs, and producing working software as quickly as possible. Thus, Agile frameworks are more like a shell which users of the framework have full flexibility to customize as necessary.

 

Frameworks and Integrated Teams

Agile methodologies can be implemented using a variety of frameworks, including Scrum, Kanban, and XP. Scrum is the most popular of these and is characterized by producing a potentially shippable set of functionalities at the end of every iteration in two-week time boxes called sprints. Delivering high-quality software at the conclusion of such short sprints requires supplementing team activities with additional best practices, such as automated testing, code cleanup and other refactoring, continuous integration, and test-driven or behavior-driven development.

Agile teams are built around motivated individuals subscribing what is commonly referred to as a “lean Agile mindset.” Team members who embrace this mindset share a common vision and are motivated to contribute in ways beyond their defined roles to attain success. In this way, innovation and creativity is supported and encouraged. Perhaps most important, Agile promotes building relationships based on trust among team members and with the end-user customer in providing fast and high-quality delivery of software. When all is said and done, this is the aim of any worthwhile endeavor. When it comes to software development, Agile is showing itself to be an impressive means to this end.


CECL–Why Implement Now?

FASB permits early adoption of CECL for fiscal years beginning after December 15, 2018, including interim periods within the fiscal year. The decision of whether to pursue formal early adoption is a complex one hinging on specific factors that vary among institutions. We are finding, however, that early implementation of a CECL solution offers many potential benefits regardless of whether an institution opts for early adoption.

The benefits of implementing a CECL solution comfortably in advance of adoption are analogous to those of a flight simulator. Both enable professionals to gain insights into how a new system functions and reacts to various influences in a riskless environment. Like a flight simulator, early CECL system implementation enables institutions to move forward with this important standard in a controlled and well-considered manner. It benefits institutions by allowing time for analysis and supporting optimal decision-making.

It accomplishes this in the following ways.

 

Enhanced Data Readiness

Data readiness is a crucial factor in CECL implementation. Analysts and risk managers need to know whether their data will be sufficient to support effective functioning of a credit loss forecasting model specifically designed for their portfolio. Early implementation facilitates answering this question in advance by enabling institutions to assess the following:

  1. Internal data completeness
  2. Internal data reliability
  3. Internal data accessibility
  4. Externally sourced forecasting elements

Data preparation often requires a considerable investment of time and effort that can span months.  Obtaining data and developing interfaces to legacy systems and other sources requires planning and sufficient time to develop and test queries. Early implementation provides a safe environment for exploring data nuances and figuring out what works best.

 

Appropriate Cohort Segmentation

Loans and revolving credits are segmented into risk-related cohorts for credit loss forecasting purposes.  Clients frequently ask whether existing segments should be revisited or revised in a CECL implementation. The answer, like most, depends on many factors. However, when new segmentation is called for, it takes time to analyze the effect of risk factors on losses and to devise the new segmentation strategy. That work would ideally begin well in advance of formal CECL adoption, along with testing methodologies for each segment.

 

Early Operational and Internal Control Insights

Capturing the real-time data necessary to book lifetime losses at origination will pose new and daunting challenges. Traditional ALLL processes historically enabled banks to largely ignore originations occurring near the end of an accounting period. CECL changes all this by requiring day-one losses to be booked upon origination, thus potentially taxing data collection processes that likely will need to be streamlined in order to gather the data necessary to produce loss forecasts in time for the accounting close.

In addition to overcoming the obvious operational obstacles associated with this degree of data processing, early implementation gives banks an advance opportunity to assess the internal controls issues that will inevitably arise. Datasets and models that previously were not subject to SOX and financial reporting control testing may need to be reviewed in a new light. Controls around the databases and models supporting the life-of-loan calculation will almost certainly need to be enhanced to meet financial reporting expectations. Figuring out how to address all this in the relative safety of an early-implementation simulator is preferable to learning it on the fly.

Capital and Strategic Planning Benefits

Because CECL is generally expected to increase loan loss reserves, the effect on bank capital should be understood as soon as possible. Banks will want to understand the potential impact of CECL on capital to inform discussions with the board, auditors, regulators, and shareholders and to factor into capital planning in advance of formal adoption.

It is particularly important for banks that are active in mergers and acquisitions to understand the implications of CECL on earnings. This will inform efforts currently underway or planned for the next couple of years. Early CECL implementation also provides the ability to begin making lending and origination decisions informed by CECL.


Compared with the relatively complex decisions surrounding whether to pursue early CECL adoption, deciding whether to implement a CECL solution early is straightforward. Regardless of whether a bank opts for early adoption, early implementation promises a range of benefits that will support optimal risk management decisions going forward, regardless of when CECL is ultimately adopted.


Hands-On Machine Learning–Predicting Loan Delinquency

The ability of machine learning models to predict loan performance makes them particularly interesting to lenders and fixed-income investors. This expanded post provides an example of applying the machine learning process to a loan-level dataset in order to predict delinquency. The process includes variable selection, model selection, model evaluation, and model tuning.

The data used in this example are from the first quarter of 2005 and come from the publicly available Fannie Mae performance dataset. The data are segmented into two different sets: acquisition and performance. The acquisition dataset contains 217,000 loans (rows) and 25 variables (columns) collected at origination (Q1 2005). The performance dataset contains the same set of 217,000 loans coupled with 31 variables that are updated each month over the life of the loan. Because there are multiple records for each loan, the performance dataset contains approximately 16 million rows.

For this exercise, the problem is to build a model capable of predicting which loans will become severely delinquent, defined as falling behind six or more months on payments. This delinquency variable was calculated from the performance dataset for all loans and merged with the acquisition data based on the loan’s unique identifier. This brings the total number of variables to 26. Plenty of other hypotheses can be tested, but this analysis focuses on just this one.

1          Variable Selection

An overview of the dataset can be found below, showing the name of each variable as well as the number of observations available

                                            Count
LOAN_IDENTIFIER                             217088
CHANNEL                                     217088
SELLER_NAME                                 217088
ORIGINAL_INTEREST_RATE                      217088
ORIGINAL_UNPAID_PRINCIPAL_BALANCE_(UPB)     217088
ORIGINAL_LOAN_TERM                          217088
ORIGINATION_DATE                            217088
FIRST_PAYMENT_DATE                          217088
ORIGINAL_LOAN-TO-VALUE_(LTV)                217088
ORIGINAL_COMBINED_LOAN-TO-VALUE_(CLTV)      217074
NUMBER_OF_BORROWERS                         217082
DEBT-TO-INCOME_RATIO_(DTI)                  201580
BORROWER_CREDIT_SCORE                       215114
FIRST-TIME_HOME_BUYER_INDICATOR             217088
LOAN_PURPOSE                                217088
PROPERTY_TYPE                               217088
NUMBER_OF_UNITS                             217088
OCCUPANCY_STATUS                            217088
PROPERTY_STATE                              217088
ZIP_(3-DIGIT)                               217088
MORTGAGE_INSURANCE_PERCENTAGE                34432
PRODUCT_TYPE                                217088
CO-BORROWER_CREDIT_SCORE                    100734
MORTGAGE_INSURANCE_TYPE                      34432
RELOCATION_MORTGAGE_INDICATOR               217088

Most of the variables in the dataset are fully populated, with the exception of DTI, MI Percentage, MI Type, and Co-Borrower Credit Score. Many options exist for dealing with missing variables, including dropping the rows that are missing, eliminating the variable, substituting with a value such as 0 or the mean, or using a model to fill the most likely value.

The following chart plots the frequency of the 34,000 MI Percentage values.

The distribution suggests a decent amount of variability. Most loans that have mortgage insurance are covered at 25%, but there are sizeable populations both above and below. Mortgage insurance is not required for the majority of borrowers, so it makes sense that this value would be missing for most loans.  In this context, it makes the most sense to substitute the missing values with 0, since 0% mortgage insurance is an accurate representation of the state of the loan. An alternative that could be considered is to turn the variable into a binary yes/no variable indicating if the loan has mortgage insurance, though this would result in a loss of information.

The next variable with a large number of missing values is Mortgage Insurance Type. Querying the dataset reveals that that of the 34,400 loans that have mortgage insurance, 33,000 have type 1 borrower paid insurance and the remaining 1,400 have type 2 lender paid insurance. Like the mortgage insurance variable, the blank values can be filled. This will change the variable to indicate if the loan has no insurance, type 1, or type 2.

The remaining variable with a significant number of missing values is Co-Borrower Credit Score, with approximately half of its values missing. Unlike MI Percentage, the context does not allow us to substitute missing values with zeroes. The distribution of both borrower and co-borrower credit score as well as their relationship can be found below.

As the plot demonstrates, borrower and co-borrower credit scores are correlated. Because of this, the removal of co-borrower credit score would only result in a minimal loss of information (especially within the context of this example). Most of the variance captured by co-borrower credit score is also captured in borrower credit score. Turning the co-borrower credit score into a binary yes/no ‘has co-borrower’ variable would not be of much use in this scenario as it would not differ significantly from the Number of Borrowers variable. Alternate strategies such as averaging borrower/co-borrower credit score might work, but for this example we will simply drop the variable.

In summary, the dataset is now smaller—Co-Borrower Credit Score has been dropped. Additionally, missing values for MI Percentage and MI Type have been filled in. Now that the data have been cleaned up, the values and distributions of the remaining variables can be examined to determine what additional preprocessing steps are required before model building. Scatter matrices of pairs of variables and distribution plots of individual variables along the diagonal can be found below. The scatter plots are helpful for identifying multicollinearity between pairs of variables, and the distributions can show if a variable lacks enough variance that it won’t contribute to model performance.[/vc_column_text][/vc_column][/vc_row][vc_row][vc_column][vc_single_image image=”1089″][/vc_column][/vc_row][vc_row][vc_column][vc_column_text]The third row of scatterplots, above, reflects a lack of variability in the distribution of Original Loan Term. The variance of 3.01 (calculated separately) is very small, and as a result the variable can be removed—it will not contribute to any model as there is very little information to learn from. This process of inspecting scatterplots and distributions is repeated for the remaining pairs of variables. The Number of Units variable suffers from the same issue and can also be removed.

2          Heatmaps and Pairwise Grids

Matrices of scatterplots are useful for looking at the relationships between variables. Another useful plot is a heatmap and pairwise grid of correlation coefficients. In the plot below a very strong correlation between Original LTV and Original CLTV is identified.

This multicollinearity can be problematic for both the interpretation of the relationship between the variables and delinquency as well as the actual performance of some models.  To combat this problem, we remove Original CLTV because Original LTV is a more accurate representation of the loan at origination. Loans in this population that were not refinanced kept their original LTV value as CLTV. If CLTV were included in the model it would introduce information not available at origination to the model. The problem of allowing unexpected additional information in a dataset introduces an issue known as leakage, which will bias the model.

Now that the numeric variables have been inspected, the remaining categorical variables must be analyzed to ensure that the classes are not significantly unbalanced. Count plots and simple descriptive statistics can be used to identify categorical variables are problematic. Two examples below show the count of loans by state and by seller.

Inspecting the remaining variables uncovers that Relocation Indicator (indicating a mortgage issued when an employer moves an employee) and Product Type (fixed vs. adjustable rate) must be removed as they are extremely unbalanced and do not contain any information that will help the models learn. We also removed first payment date and origination date, which were largely redundant. The final cleanup results in a dataset that contains the following columns:

LOAN_IDENTIFIER 
CHANNEL 
SELLER_NAME
ORIGINAL_INTEREST_RATE
ORIGINAL_UNPAID_PRINCIPAL_BALANCE_(UPB) 
ORIGINAL_LOAN-TO-VALUE_(LTV) 
NUMBER_OF_BORROWERS
DEBT-TO-INCOME_RATIO_(DTI) 
BORROWER_CREDIT_SCORE
FIRST-TIME_HOME_BUYER_INDICATOR 
LOAN_PURPOSE
PROPERTY_TYPE 
OCCUPANCY_STATUS 
PROPERTY_STATE
MORTGAGE_INSURANCE_PERCENTAGE 
MORTGAGE_INSURANCE_TYPE 
ZIP_(3-DIGIT)

The final two steps before model building are to standardize each of the numeric variables and turn each categorical variable into a series of dummy or indicator variables. Numeric variables are scaled with mean 0 and standard deviation 1 so that it is easier to compare variables that have a different scale (e.g. interest rate vs. LTV). Additionally, standardizing is also a requirement for many algorithms (e.g. principal component analysis).

Categorical variables are transformed by turning each value of the variable into its own yes/no feature. For example, Property State originally has 50 possible values, so it will be turned into 50 variables (e.g. Alabama yes/no, Alaska yes/no).  For categorical variables with many values this transformation will significantly increase the number of variables in the model.

After scaling and transforming the dataset, the final shape is 199,716 rows and 106 columns. The target variable—loan delinquency—has 186,094 ‘no’ values and 13,622 ‘yes’ values. The data are now ready to be used to build, evaluate, and tune machine learning models.

3          Model Selection

Because the target variable loan delinquency is binary (yes/no) the methods available will be classification machine learning models. There are many classification models, including but not limited to: neural networks, logistic regression, support vector machines, decision trees and nearest neighbors. It is always beneficial to seek out domain expertise when tackling a problem to learn best practices and reduce the number of model builds. For this example, two approaches will be tried—nearest neighbors and decision tree.

The first step is to split the dataset into two segments: training and testing. For this example, 40% of the data will be partitioned into the test set, and 60% will remain as the training set. The resulting segmentations are as follows:

1.       60% of the observations (as training set)- X_train

2.       The associated target (loan delinquency) for each observation in X_train- y_train

3.       40% of the observations (as test set)- X_test

4.        The targets associated with the test set- y_test

Data should be randomly shuffled before they are split, as datasets are often in some type of meaningful order. Once the data are segmented the model will first be exposed to the training data to begin learning.

4          K-Nearest Neighbors Classifier

Training a K-neighbors model requires the fitting of the model on X_train (variables) and y_train (target) training observations. Once the model is fit, a summary of the model hyperparameters is returned. Hyperparameters are model parameters not learned automatically but rather are selected by the model creator.

 

The K-neighbors algorithm searches for the closest (i.e., most similar) training examples for each test observation using a metric that calculates the distance between observations in high-dimensional space.  Once the nearest neighbors are identified, a predicted class label is generated as the class that is most prevalent in the neighbors. The biggest challenge with a K-neighbors classifier is choosing the number of neighbors to use. Another significant consideration is the type of distance metric to use.

To see more clearly how this method works, the 6 nearest neighbors of two random observations from the training set were selected, one that is a non-default (0 label) observation and one that is not.

Random delinquent observation: 28919 
Random non delinquent observation: 59504

The indices and minkowski distances to the 6 nearest neighbors of the two random observations are found below. Unsurprisingly, the first nearest neighbor is always itself and the first distance is 0.

Indices of closest neighbors of obs. 28919 [28919 112677 88645 103919 27218 15512]
Distance of 5 closest neighbor for obs. 28919 [0 0.703 0.842 0.883 0.973 1.011]

Indices of 5 closest neighbors for obs. 59504 [59504 87483 25903 22212 96220 118043]
Distance of 5 closest neighbor for obs. 59504 [0 0.873 1.185 1.186 1.464 1.488]

Recall that in order to make a classification prediction, the kneighbors algorithm finds the nearest neighbors of each observation. Each neighbor is given a ‘vote’ via their class label, and the majority vote wins. Below are the labels (or votes) of either 0 (non-delinquent) or 1 (delinquent) for the 6 nearest neighbors of the random observations. Based on the voting below, the delinquent observation would be classified correctly as 3 of the 5 nearest neighbors (excluding itself) are also delinquent. The non-delinquent observation would also be classified correctly, with 4 of 5 neighbors voting non-delinquent.

Delinquency label of nearest neighbors- non delinquent observation: [0 1 0 0 0 0]
Delinquency label of nearest neighbors- delinquent observation: [1 0 1 1 0 1]

 

5          Tree-Based Classifier

Tree based classifiers learn by segmenting the variable space into a number of distinct regions or nodes. This is accomplished via a process called recursive binary splitting. During this process observations are continuously split into two groups by selecting the variable and cutoff value that results in the highest node purity where purity is defined as the measure of variance across the two classes. The two most popular purity metrics are the gini index and cross entropy. A low value for these metrics indicates that the resulting node is pure and contains predominantly observations from the same class. Just like the nearest neighbor classifier, the decision tree classifier makes classification decisions by ‘votes’ from observations within each final node (known as the leaf node).

To illustrate how this works, a decision tree was created with the number of splitting rules (max depth) limited to 5. An excerpt of this tree can be found below. All 120,000 training examples start together in the top box. From top to bottom, each box shows the variable and splitting rule applied to the observations, the value of the gini metric, the number of observations the rule was applied to, and the current segmentation of the target variable. The first box indicates that the 6th variable (represented by the 5th index ‘X[5]’) Borrower Credit Score was  used to  split  the  training  examples.  Observations where the value of Borrower Credit Score was below or equal to -0.4413 follow the line to the box on the left. This box shows that 40,262 samples met the criteria. This box also holds the next splitting rule, also applied to the Borrower Credit Score variable. This process continues with X[2] (Original LTV) and so on until the tree is finished growing to its depth of 5. The final segments at the bottom of the tree are the aforementioned leaf nodes which are used to make classification decisions.  When making a prediction on new observations, the same splitting rules are applied and the observation receives the label of the most commonly occurring class in its leaf node.

[/vc_column_text][/vc_column][/vc_row][vc_row][vc_column][vc_single_image image=”1086″][/vc_column][/vc_row][vc_row][vc_column][vc_column_text]A more advanced tree based classifier is the Random Forest Classifier. The Random Forest works by generating many individual trees, often hundreds or thousands. However, for each tree, number of variables considered at each split is limited to a random subset. This helps reduce model variance and de-correlate the trees (since each tree will have a different set of available splitting choices). In our example, we fit a random forest classifier on the training data. The resulting hyperparameters and model documentation indicate that by default the model generates 10 trees, considers a random subset of variables the size of the square root of all variables (approximately 10 in this case), has no depth limitation, and only requires each leaf node to have 1 observation.

Since the random forest contains many trees and does not have a depth limitation, it is incredibly difficult to visualize. In order to better understand the model, a plot showing which variables were selected and resulted in the largest drop in the purity metric (gini index) can be useful. Below are the top 10 most important variables in the model, ranked by the total (normalized) reduction to the gini index.  Intuitively, this plot can be described as showing which variables can be used to best segment the observations into groups that are predominantly one class, either delinquent and non-delinquent.

 

6          Model Evaluation

Now that the models have been fitted, their performance must be evaluated. To do this, the fitted model will first be used to generate predictions on the test set (X_test). Next, the predicted class labels are compared to the actual observed class label (y_test). Three of the most popular classification metrics that can be used to compare the predicted and actual values are recall, precision, and the f1-score. These metrics are calculated for each class, delinquent and not-delinquent.

Recall is calculated for each class as the ratio of events that were correctly predicted. More precisely, it is defined as the number of true positive predictions divided by the number of true positive predictions plus false negative predictions. For example, if the data had 10 delinquent observations and 7 were correctly predicted, recall for delinquent observations would be 7/10 or 70%.

Precision is the number of true positives divided by the number of true positives plus false positives. Precision can be thought of as the ratio of events correctly predicted to the total number of events predicted. In the hypothetical example above, assume that the model made a total of 14 predictions for the label delinquent. If so, then the precision for delinquent predictions would be 7/14 or 50%.

The f1 score is calculated as the harmonic mean of recall and precision: (2(Precision*Recall/Precision+Recall)).

The classification reports for the K-neighbors and decision tree below show the precision, recall, and f1 scores for label 0 (non-delinquent) and 1 (delinquent).

 

There is no silver bullet for choosing a model—often it comes down to the goals of implementation. In this situation, the tradeoff between identifying more delinquent loans at the cost of misclassification can be analyzed with a specific tool called a roc curve.  When the model predicts a class label, a probability threshold is used to make the decision. This threshold is set by default at 50% so that observations with more than a 50% chance of membership belong to one class and vice-versa.

The majority vote (of the neighbor observations or the leaf node observations) determines the predicted label. Roc curves allow us to see the impact of varying this voting threshold by plotting the true positive prediction rate against the false positive prediction rate for each threshold value between 0% and 100%.

The area under the ROC curve (AUC) quantifies the model’s ability to distinguish between delinquent and non-delinquent observations.  A completely useless model will have an AUC of .5 as the probability for each event is equal. A perfect model will have an AUC of 1 as it is able to perfectly predict each class.

To better illustrate, the ROC curves plotting the true positive and false positive rate on the held-out test set as the threshold is changed are plotted below.

7          Model Tuning

Up to this point the models have been built and evaluated using a single train/test split of the data. In practice this is often insufficient because a single split does not always provide the most robust estimate of the error on the test set. Additionally, there are more steps required for model tuning. To solve both of these problems it is common to train multiple instances of a model using cross validation. In K-fold cross validation, the training data that was first created gets split into a third dataset called the validation set. The model is trained on the training set and then evaluated on the validation set. This process is repeated times, each time holding out a different portion of the training set to validate against. Once the model has been tuned using the train/validation splits, it is tested against the held out test set just as before. As a general rule, once data have been used to make a decision about the model they should never be used for evaluation.

8          K-Nearest Neighbors Tuning

Below a grid search approach is used to tune the K-nearest neighbors model. The first step is to define all of the possible hyperparameters to try in the model. For the KNN model, the list nk = [10, 50, 100, 150, 200, 250] specifies the number of nearest neighbors to try in each model. The list is used by the function GridSearchCV to build a series of models, each using the different value of nk. By default, GridSearchCV uses 3-fold cross validation. This means that the model will evaluate 3 train/validate splits of the data for each value of nk. Also specified in GridSearchCV is the scoring parameter used to evaluate each model. In this instance it is set to the metric discussed earlier, the area under the roc curve. GridSearchCV will return the best performing model by default, which can then be used to generate predictions on the test set as before. Many more values of could be specified to search through, and the default minkowski distance could be set to a series of metrics to try. However, this comes at a cost of computation time that increases significantly with each added hyperparameter.

 

In the plot below the mean training and validation scores of the 3 cross-validated splits is plotted for each value of K. The plot indicates that for the lower values of the model was overfitting the training data and causing lower validation scores. As increases, the training score lowers but the validation score increases because the model gets better at generalizing to unseen data.

9               Random Forest Tuning

There are many hyperparameters that can be adjusted to tune the random forest model. We use three in our example: n_estimatorsmax_features, and min_samples_leafN_estimators refers to the number of trees to be created. This value can be increased substantially, so the search space is set to list estimators. Random Forests are generally very robust to overfitting, and it is not uncommon to train a classifier with more than 1,000 trees. Second, the number of variables to be randomly considered at each split can be tuned via max_features. Having a smaller value for the number of random features is helpful for decorrelating the trees in the forest, which is especially useful when multicollinearity is present. We tried a number of different values for max_features, which can be found in the list features. Finally, the number of observations required in each leaf node is tuned via the min_samples_leaf parameter and list samples.

 

The resulting plot, below, shows a subset of the grid search results. Specifically, it shows the mean test score for each number of trees and leaf size when the number of random features considered at each split is limited to 5. The plot demonstrates that the best performance occurs with 500 trees and a requirement of at least 5 observations per leaf. To see the best performing model from the entire grid space the best estimator method can be used.

By default, parameters of the best estimator are assigned to the GridSearch object (cvknc and cvrfc). This object can now be used generate future predictions or predicted probabilities. In our example, the tuned models are used to generate predicted probabilities on the held out test set. The resulting

ROC curves show an improvement in the KNN model from an AUC of .62 to .75. Likewise, the tuned Random Forest AUC improves from .64 to .77.

Predicting loan delinquency using only origination data is not an easy task. Presumably, if significant signal existed in the data it would trigger a change in strategy by MBS investors and ultimately origination practices. Nevertheless, this exercise demonstrates the capability of a machine learning approach to deconstruct such an intricate problem and suggests the appropriateness of using machine learning model to tackle these and other risk management data challenges relating to mortgages and a potentially wide range of asset classes.

Talk Scope


Big Data in Small Dimensions: Machine Learning Methods for Data Visualization

Analysts and data scientists are constantly seeking new ways to parse increasingly intricate datasets, many of which are deemed “high dimensional”, i.e., contain many (sometimes hundreds or more) individual variables. Machine learning has recently emerged as one such technique due to its exceptional ability to process massive quantities of data. A particularly useful machine learning method is t-distributed stochastic neighbor embedding (t-SNE), used to summarize very high-dimensional data using comparatively few variables. T-SNE visualizations allow analysts to identify hidden structures that may have otherwise been missed.

Traditional Data Visualization

The first step in tackling any analytical problem is to develop a solid understanding of the dataset in question. This process often begins with calculating descriptive statistics that summarize useful characteristics of each variable, such as the mean and variance. Also critical to this pursuit is the use of data visualizations that can illustrate the relationships between observations and variables and can identify issues that must be corrected. For example, the chart below shows a series of pairwise plots between a set of variables taken from a loan-level dataset. Along the diagonal axis the distribution of each individual variable is plotted.

The plot above is useful for identifying pairs of variables that are highly correlated as well as variables that lack variance, such as original loan term. When dealing with a larger number of variables, heatmaps like the one below can summarize the relationships between the data in a compact way that is also visually intuitive.

The statistics and visualizations described so far are helpful for summarizing and identifying issues, but they often fall short in telling the entire narrative of the data. One issue that remains is a lack of understanding of the underlying structure of the data. Gaining this understanding is often key to selecting the best approach for problem solving.

Enhanced Data Visualization with Machine Learning

Humans can visualize observations plotted with up to three variables (dimensions), but with the exponential rise in data collection it is now abnormal to only be dealing with a handful of variables. Thankfully, there are new machine learning methods that can help overcome our limited capacity and deliver new insights never seen before.

T-SNE is a type of non-linear dimensionality reduction algorithm. While this is a mouthful, the idea behind it is straightforward: t-SNE takes data that exists in very high dimensions and produces a plot in two or three dimensions that can be observed. The plot in low dimensions is created in such a way that observations close to each other in high dimensions remain close together in low dimensions. Additionally, t-SNE has proven to be good at preserving both the global and local structures present within the data1, which is of critical importance.

The full technical details of t-SNE are beyond the scope of this blog, but a simplified version of the steps for t-SNE are as follows:

  1. Compute the Euclidean distance between each pair of observations in high-dimensional space.
  2. Using a Gaussian distribution, convert the distance between each pair of observations into a probability that represents similarity between the points.
  3. Randomly place the observations into low-dimensional space (usually 2 or 3).
  4. Compute the distance and similarity (as in steps 1 and 2) for each pair of observations in the low-dimensional space. Crucially, in this step a Student t-distribution is used instead of a normal Gaussian.
  5. Using gradient based optimization, iteratively nudge the observations in the low-dimensional space in such a way that the probabilities between pairs of observations are as close as possible to the probabilities in high dimensions.

Two key consideration are the use of the Student t-distribution in step four as opposed to the Gaussian in step two, and the random initialization of the data points in low dimensional space. The t-distribution is critical to the success of the algorithm for multiple reasons, but perhaps most importantly in that it allows clusters that initially start far apart to re-converge2. Given the random initialization of the points in low dimensional space, it is common practice to run the algorithm multiple times with the same parameters to observe the best mapping and ensure that the gradient descent optimization does not get stuck in a local minima.

We applied t-SNE to a loan-level dataset comprised of approximately 40 variables. The loans are a random sample of originations from every quarter dating back to 1999. T-SNE was used to map the data into just three dimensions and the resulting plot was color-coded based on the year of origination.

In the interactive visualization below many clusters emerge. Rotating the figure reveals that some clusters are comprised predominantly of loans within similar origination years (groups of same-colored data points). Other clusters are less well-defined or contain a mix of origination years. Using this same method, we could choose to color loans with other information that we may wish to explore. For example, a mapping showing clusters related to delinquencies, foreclosure, or other credit loss events could prove tremendously insightful. For a given problem, using information from a plot such as this can enhance the understanding of the problem separability and enhance the analytical approach.

Crucial to the t-SNE mapping is a parameter set by the analyst called perplexity, which should be roughly equal to the number of expected nearby neighbors for each data point. Therefore, as the value of perplexity increases, the number of resulting clusters should generally decrease and vice versa. When implementing t-SNE, various perplexity parameters should be tried as the appropriate value is generally not known beforehand. The plot below was produced using the same dataset as before but with a larger value of perplexity. In this plot four distinct clusters emerge, and within each cluster loans of similar origination years group closely together.


Private-Label Securities – Technological Solutions to Information Asymmetry and Mistrust

At its heart, the failure of the private-label residential mortgage-backed securities (PLS) market to return to its pre-crisis volume is a failure of trust. Virtually every proposed remedy, in one way or another, seeks to create an environment in which deal participants can gain reasonable assurance that their counterparts are disclosing information that is both accurate and comprehensive. For better or worse, nine-figure transactions whose ultimate performance will be determined by the manner in which hundreds or thousands of anonymous people repay their mortgages cannot be negotiated on the basis of a handshake and reputation alone. The scale of these transactions makes manual verification both impractical and prohibitively expensive. Fortunately, the convergence of a stalled market with new technologies presents an ideal time for change and renewed hope to restore confidence in the system.

 

Trust in Agency-Backed Securities vs Private-Label Securities

Ginnie Mae guaranteed the world’s first mortgage-backed security nearly 50 years ago. The bankers who packaged, issued, and invested in this MBS could scarcely have imagined the technology that is available today. Trust, however, has never been an issue with Ginnie Mae securities, which are collateralized entirely by mortgages backed by the federal government—mortgages whose underwriting requirements are transparent, well understood, and consistently applied.

Further, the security itself is backed by the full faith and credit of the U.S. Government. This degree of “belt-and-suspenders” protection afforded to investors makes trust an afterthought and, as a result, Ginnie Mae securities are among the most liquid instruments in the world.

Contrast this with the private-label market. Private-label securities, by their nature, will always carry a higher degree of uncertainty than Ginnie Mae, Fannie Mae, and Freddie Mac (i.e., “Agency”) products, but uncertainty is not the problem. All lending and investment involves uncertainty. The problem is information asymmetry—where not all parties have equal access to the data necessary to assess risk. This asymmetry makes it challenging to price deals fairly and is a principal driver of illiquidity.

 

Using Technology to Garner Trust in the PLS Market

In many transactions, ten or more parties contribute in some manner to verifying and validating data, documents, or cash flow models. In order to overcome asymmetry and restore liquidity, the market will need to refine (and in some cases identify) technological solutions to, among other challenges, share loan-level data with investors, re-envision the due diligence process, and modernize document custody.

 

Loan-Level Data

During SFIG’s Residential Mortgage Finance symposium last month, RiskSpan moderated a panel that featured significant discussion around loan-level disclosures. At issue was whether the data required by the SEC’s Regulation AB provided investors with all the information necessary to make an investment decision. Specifically debated was the mortgaged property’s zip code, which provides investors valuable information on historical valuation trends for properties in a given geographic area.

Privacy advocates question the wisdom of disclosing full, five-digit zip codes. Particularly in sparsely populated areas where zip codes contain a relatively small number of addresses, knowing the zip code along with the home’s sale price and date (which are both publicly available) can enable unscrupulous data analysts to “triangulate” in on an individual borrower’s identity and link the borrower to other, more sensitive personal information in the loan-level disclosure package.

The SEC’s “compromise” is to require disclosing only the first two digits of the zip code, which provide a sense of a property’s geography without the risk of violating privacy. Investors counter that two-digit zip codes do not provide nearly enough granularity to make an informed judgment about home-price stability (and with good reason—some entire states are covered entirely by a single two-digit zip code).

The competing demands of disclosure and privacy can be satisfied in large measure by technology. Rather than attempting to determine which individual data fields should be included in a loan-level disclosure (and then publishing it on the SEC’s EDGAR site for all the world to see) the market ought to be able to develop a technology where a secure, encrypted, password-protected copy of the loan documents (including the loan application, tax documents, pay-stubs, bank statements, and other relevant income, employment, and asset verifications) is made available on a need-to-know basis to qualified PLS investors who share in the responsibility for safeguarding the information.

 

Due Diligence Review

Technologically improving the transparency of the due diligence process to investors may also increase investor trust, particularly in the representation and warranty review process. Providing investors with a secure view of the loan-level documentation used to underwrite and close the underlying mortgage loan, as described above, may reduce the scope of due diligence review as it exists in today’s market. Technology companies, which today support initiatives such as Fannie Mae’s “Day 1 Certainty” program, promise to further disrupt the due diligence process in the future. Through automation, the due diligence process becomes less burdensome and fosters confidence in the underwriting process while also reducing costs and bringing representation and warranty relief.

Today’s insistence on 100% file reviews in many cases is perhaps the most obvious evidence of the lack of trust across transactions. Investors will likely always require some degree of assurance that they are getting what they pay for in terms of collateral. However, an automated verification process for income, assets, and employment will launch the industry forward with investor confidence. Should any reconciliation of individual loan file documentation with data files be necessary, results of these reconciliations could be automated and added to a secure blockchain accessible only via private permissions. Over time, investors will become more comfortable with the reliability of the electronic data files describing the mortgage loans submitted to them.

The same technology could be implemented to allow investors to view supporting documents when reps and warrants are triggered and a review of the underlying loan documents needs to be completed.

 

Document Custody

Smart document technologies also have the potential to improve the transparency of the document custody process. At some point the industry is going to have to move beyond today’s humidity-controlled file cabinets and vaults, where documents are obtained and viewed only on an exception basis or when loans are paid off. Adding loan documents that have been reviewed and accepted by the securitization’s document custodian to a secure, permissioned blockchain will allow investors in the securities to view and verify collateral documents whenever questions arise without going to the time and expense of retrieving paper from the custodian’s vault.

——————————-

Securitization makes mortgages and other types of borrowing affordable for a greater population by leveraging the power of global capital markets. Few market participants view mortgage loan securitization dominated by government corporations and government-sponsored enterprises as a desirable permanent solution. Private markets, however, are going to continue to lag markets that benefit from implicit and explicit government guarantees until improved processes, supported by enhanced technologies, are successful in bridging gaps in trust and information asymmetry.

With trust restored, verified by technology, the PLS market will be better positioned to support housing financing needs not supported by the Agencies.

Get a Demo


Back-Testing: Using RS Edge to Validate a Prepayment Model

Most asset-liability management (ALM) models contain an embedded prepayment model for residential mortgage loans. To gauge their accuracy, prepayment modelers typically run a back-test comparing model projections to the actual prepayment rates observed. A standard test is to run a portfolio of loans as of a year ago using the actual interest rates experienced during this time as well as any additional economic factors used by the model such as home price appreciation or the unemployment rate. This methodology isolates the model’s ability to estimate voluntary payoffs from its ability to forecast the economic variables.

The graph below was produced from such a back-test. The residential mortgage loans in the bank’s portfolio as of 10/31/2016 were run through the ALM model (projections) and compared with the observed speeds (actuals). It is apparent that the model did not do a particularly good job forecasting the actual CPRs, as the mean absolute error is 5.0%. Prepayment model validators typically prefer to see mean absolute error rates no higher than 1 to 2%.

Does this mean there is something unique with the bank’s loan portfolio or servicing practices that would cause prepays to deviate from expectations, or does the prepayment model require calibration?

Dissecting the Problem

One strategy is to compare the bank’s prepayment experience to that of the market (see below). The “market” is the universe of comparable loans, in this case residential, conventional loans. This assessment should indicate whether the bank’s portfolio is unique or if it behaves similar to the market. Although this comparison looks better, there are still some material differences, especially at the beginning and end of the time series. 

Examining the portfolio composition reveals a number of differences which could be the source of the discrepancy. For example:

  • Larger-balance loans have a greater refinance incentive.
  • California loans historically prepay faster than the rest of the country, while New York loans are historically slower.
  • Broker and correspondent loans typically pay faster than retail originations.

To compensate, the next step is to adjust the market portfolio to more closely mirror the attributes of the bank’s portfolio. Fine-tuning the “market” so that it better aligns with the bank’s channel and geographic breakout, as well as its larger average loan size, results in the following adjusted prepayment speeds.

Conclusion

Prepayments for the bank’s mortgage portfolio track the market speeds reasonably well with no adjustments. Compensating for the differences in composition related to channel, geography, and loan size tracks even better and results in a mean absolute error of only 1.1%. This indicates that there is nothing unique or idiosyncratic with the bank’s portfolio that would cause projections from a market-based prepayment model to deviate significantly from the observed speeds. Consequently, the ALM prepayment model likely needs adjustments to its tuning parameters to better capture the current environment.


Why Model Validation Does Not Eliminate Spreadsheet Risk

Model risk managers invest considerable time in determining which spreadsheets qualify as models, which are end-user computing (EUC) applications, and which are neither. Seldom, however, do model risk managers consider the question of whether a spreadsheet is the appropriate tool for the task at hand.

Perhaps they should start.

Buried in the middle of page seven of the joint Federal Reserve/OCC supervisory guidance on model risk management is this frequently overlooked principle:

“Sound model risk management depends on substantial investment in supporting systems to ensure data and reporting integrity, together with controls and testing to ensure proper implementation of models, effective systems integration, and appropriate use.”

It brings to mind a fairly obvious question: What good is a “substantial investment” in data integrity surrounding the modeling process when the modeling itself is carried out in Excel? Spreadsheets are useful tools, to be sure, but they meet virtually none of the development standards to which traditional production systems are held. What percentage of “spreadsheet models” are subjected to the rigors of the software development life cycle (SDLC) before being put into use?

 

Model Validation vs. SDLC

More often than not, and usually without realizing it, banks use model validation as a substitute for SDLC when it comes to spreadsheet models. The main problem with this approach is that SDLC and model validation are complementary processes and are not designed to stand in for one another. SDLC is a primarily forward-looking process to ensure applications are implemented properly. Model validation is primarily backward looking and seeks to determine whether existing applications are working as they should.

SDLC includes robust planning, design, and implementation—developing business and technical requirements and then developing or selecting the right tool for the job. Model validation may perform a few cursory tests designed to determine whether some semblance of a selection process has taken place, but model validation is not designed to replicate (or actually perform) the selection process.

This presents a problem because spreadsheet models are seldom if ever built with SDLC principles in mind. Rather, they are more likely to evolve organically as analysts seek increasingly innovative ways of automating business tasks. A spreadsheet may begin as a simple calculator, but as analysts become more sophisticated, they gradually introduce increasingly complex functionality and coding into their spreadsheet. And then one day, the spreadsheet gets picked up by an operational risk discovery tool and the analyst suddenly becomes a model owner. Not every spreadsheet model evolves in such an unstructured way, of course, but more than a few do. And even spreadsheet-based applications that are designed to be models from the outset are seldom created according to a disciplined SDLC process.

I am confident that this is the primary reason spreadsheet models are often so poorly documented. They simply weren’t designed to be models. They weren’t really designed at all. A lot of intelligent, critical thought may have gone into their code and formulas, but little if any thought was likely given to the question of whether a spreadsheet is the best tool for what the spreadsheet has evolved to be able to do.
 

Challenging the Spreadsheets Themselves

Outside of banking, a growing number of firms are becoming wary of spreadsheets and attempting to move away from them. A Wall Street Journal article last week cited CFOs from companies as diverse as P.F. Chang’s China Bistro Inc., ABM Industries, and Wintrust Financial Corp. seeking to “reduce how much their finance teams use Excel for financial planning, analysis and reporting.”

Many of the reasons spreadsheets are falling out of favor have little to do with governance and risk management. But one core reason will resonate with anyone who has ever attempted to validate a spreadsheet model. Quoting from the article: “Errors can bloom because data in Excel is separated from other systems and isn’t automatically updated.”

It is precisely this “separation” of spreadsheet data from its sources that is so problematic for model validators. Even if a validator can determine that the input data in the spreadsheet is consistent with the source data at the time of validation, it is difficult to ascertain whether tomorrow’s input data will be. Even spreadsheets that pull input data in via dynamic linking or automated feeds can be problematic because the code governing the links and feeds can so easily become broken or corrupted.
 

An Expanded Way of Thinking About “Conceptual Soundness”

Typically, when model validators speak of evaluating conceptual soundness, they are referring to the model’s underlying theory, how its variables were selected, the reasonableness of its inputs and assumptions, and how well everything is documented. In diving into these details, it is easy to overlook the supervisory guidance’s opening sentence in the Evaluation of Conceptual Soundness section: “This element involves assessing the quality of the model design and construction.”

How often, in assessing a spreadsheet model’s design and construction, do validators ask, “Is Excel even the right application for this?” Not very often, I suspect. When an analyst is assigned to validate a model, the medium is simply a given. In a perfect world, model validators would be empowered to issue a finding along the lines of, “Excel is not an appropriate tool for a high-risk production model of this scope and importance.” Practically speaking, however, few departments will be willing to upend the way they work and analyze data in response to a model validation finding. (In the WSJ article, it took CFOs to affect that kind of change.)

Absent the ability to nudge model owners away from spreadsheets entirely, model validators would do well to incorporate certain additional “best practices” checks into their validation procedures when the model in question is a spreadsheet. These might include the following:

  • Incorporation of a cover sheet on the first tab of the workbook that includes the model’s name, the model’s version, a brief description of what the model does, and a table of contents defining and describing the purpose of each tab
  • Application of a consistent color key so that inputs, assumptions, macros, and formulas can be easily identified
  • Grouping of inputs by source, e.g., raw data versus transformed data versus calculations
  • Grouping of inputs, processing, and output tabs together by color
  • Separate instruction sheets for data import and transformation

Spreadsheets present unique challenges to model validators. By accounting for the additional risk posed by the nature of spreadsheets themselves, model risk managers can contribute value by identifying situations where the effectiveness of sound data, theory, and analysis is blunted by an inadequate tool.


Non-Qualified Mortgage Securitization Market

Since 2015, a new tier of the private-label residential mortgage-backed securities (PLS) market has emerged, with securities collateralized by non-qualified mortgage (non-QM) loans. These securities enable mortgage lenders to serve borrowers with non-traditional credit profiles.

The financial crisis ushered in a sharp reduction in mortgage credit available to certain groups of borrowers. Funding sources, such as the PLS market, which once provided access for borrowers with credit blemishes, non-traditional income sources, or the desire for expanded product features were virtually eliminated.

The limited issuance of private-label RMBS since the financial crisis has generally consisted of new origination jumbo “prime” mortgage loans. These securities have included loans that meet the “qualified mortgage” (QM) standard with strong credit scores, pristine payment history, and fully documented income and assets. The non-QM market addresses a previously underserved market and reflects the expanding credit policies of many institutions.

What is a Non-Qualified Mortgage Loan?

Since the crisis, standards governing the majority of mortgage loan production have generally followed the restrictive credit criteria implemented by the GSEs. This has prompted some consumers and lenders to seek alternative products that may not meet the “qualified mortgage” requirements or the high-credit-quality standards of the GSEs. These tightened credit standards have restricted home ownership opportunities for certain groups of consumers. These groups include self-employed individuals and borrowers with weaker credit or a recent credit event, such as a foreclosure, short sale, or deed in lieu of foreclosure. While many of these potential borrowers can meet the criteria of the ‘ability-to-repay’ rule and have taken steps to improve their credit standing, they nevertheless are not able to meet the very high credit standards that have emerged since the financial crisis.

To meet the demand of these underserved borrowers, a number of lenders have begun to expand their credit parameters. As lenders have sought funding sources for these non-QM originations, a new tier of the PLS market has emerged. While it is difficult to create generic categories that define the origination practices of the various lenders, some high-level similarities can be observed in the following non-QM products and programs established to meet borrower demand:

  • Alternative Documentation – the borrower’s income is assessed through sources other than available tax returns, business earnings, or Appendix Q requirements. Many non-QM lenders offer variations of bank statement programs (e.g., 24-month review and 12-month review) to determine a self-employed borrower’s ability to repay through analysis of their monthly cash flow.
  • Borrowers with Non-Standard Credit Profile
    • Expanded Credit – borrowers with weaker FICO scores, a recent delinquency on a mortgage, a debt-to-income ratio slightly above the qualified mortgage requirements, or higher loan-to-value ratios.
    • Prior Credit Event – borrowers with recent foreclosure, bankruptcy, or other loss mitigation disposition that have not met the seasoning requirements established by GSE guidelines.
  • Investor Program – financing for investors purchasing 1-4 family rental properties that may not meet GSE guidelines.
  • Foreign National Program – financing for borrowers that are not permanent residents or do not have credit history in the United States.
  • Non-QM Product Features – financing for products that do not meet qualified mortgage guidelines, such as loans with interest-only or balloon features.

Each of these programs evaluate many aspects of the loan during the underwriting process but primarily rely on an evaluation of the borrower’s ability to repay the loan to predict loan performance. These mortgage loan products and programs attempt to meet the housing finance needs of underserved borrowers while assessing the increased risk associated with the expanded lending standards.

Non-QM securities are likely to experience more performance volatility and higher realized losses than their jumbo prime counterparts in negative economic scenarios. This is due to weaker credit profiles among non-QM borrowers, product features that do not meet “qualified mortgage” requirements (e.g., interest-only, balloon payments, prepayment penalties), and alternative methods to assess the borrower’s ability-to-repay. Investors in these securities are challenged to assess the magnitude of the increased risk of loss (net of protection provided by credit enhancement levels) versus the incremental yield provided by the securities.

Overview of Non-Prime Issuers

The non-QM sector has been created and led by non-bank financial institutions that have filled the void left by regulated banking entities that have reduced their footprint in the mortgage market. Most financial institutions that have entered the non-QM mortgage space during the past five years have received financial backing from asset managers, hedge funds or private equity firms. Securitization activity for this sector of the PLS market began in 2015 and has increased slowly since. The table below reflects the strong growth in issuance activity for non-QM securitizations between January 2015 and September 2017:

Next Market Phase

The push by mortgage lenders to expand their credit criteria and provide consumers with “affordability” products combined with investor demand for higher yielding investments set the stage for the financial crisis of 2007-2008. Bolstered by strong demand from investors for mortgage-backed securities, mortgage lenders expanded underwriting guidelines to allow borrowers with weaker credit profiles, smaller down-payment amounts, and limited or no verification of income or assets to qualify for mortgages. Weakened underwriting standards were combined with product features that slowed repayment of principal through interest-only, negative amortization and loan term extension features.

History has shown that the combination of these credit guideline expansions with weaker PLS processes resulted in historic losses. As a reaction to the abysmal credit performance of mortgage loans originated between 2005 and 2007, credit availability in the mortgage market contracted dramatically. The swing of the credit pendulum resulted in significant improvement in the credit performance of loans originated since 2008. This improved performance, however, came at the cost of shutting a large segment of the population out of the mortgage market. Now almost a decade later, the pendulum appears to be swinging back in favor expanding credit criteria to accommodate more non-QM borrowers. Time will tell whether the market has learned and will remember the lessons of the financial crisis.


Get Started
Log in

Linkedin   

risktech2024