Blog Archives

Machine Learning Model Selection

Machine learning model selection is the second step of the machine learning process, following variable selection and data cleansing. Selecting the right machine learning model is a critical step, as a model that does not appropriately fit the data will yield inaccurate results. Model selection largely depends on the goal of the model: is the purpose to explore the relationship between the variables or to maximize predictive power? In this blog, we cover a few key concepts of machine learning model selection, including parametric vs. non-parametric models, key metrics for managing the variance-bias tradeoff, and an introduction to a few standard machine learning models.

Parametric vs. Non-Parametric Tradeoffs

One of the first choices to be made in the model selection process pertains to our assumption about the shape of the functional relationship between our explanatory variables (our given, or input, variables) and our response variable (the output that we want to predict). When we choose to assume the shape of our model, we are constructing a parametric model, and our problem reduces to estimating a set of measurable factors, known as parameters.1 One of the most common assumptions is that the data is linear. While we can relax the linear assumption when necessary, we sometimes do not want to assume the shape of the function at all. Non-parametric models help to avoid the case where we incorrectly assume a function that does not match the data. However, a much larger number of observations must be obtained to make non-parametric methods effective, which can be costly or even infeasible.2

In addition to the fact that non-parametric methods are often not practical, there are other tradeoffs to take into consideration. One important tradeoff is between interpretability and flexibility. Since non-parametric models follow the data closely, they often result in abnormally shaped plots, which can be difficult to interpret. If the goal is to make sense of and model the relationship between the explanatory variable and the response, we may be willing to trade some predictive power for a parametric curve that is more understandable. If, however, we are comfortable constructing a "black box" in hopes of maximizing the predictive power of the model, then non-parametric models may be suitable.

Another important tradeoff is that of variance versus bias. Variance, in the context of statistical learning, refers to the amount by which our prediction would change if we had used a different training dataset for our estimation. Bias refers to the error resulting from approximating a complex relationship by using a simplified representation of it. In general, more flexible (non-parametric) methods tend to have higher variance and lower bias, with the opposite being true of less flexible (parametric) models. Ideally, though, we want a model that has low variance and low bias. To find it, we most frequently rely on three important tools: R-squared, residual standard error, and diagnostic plots.

R-Squared, Residual Standard Error, and Plots

R-squared (formally, the "coefficient of determination") measures the amount of variance in the response variable that is explained by the explanatory variables. Constrained between 0 and 1, a very low R-squared can indicate problems with model fit, while a very high R-squared can sometimes indicate overfitting. Residual standard error (RSE) estimates the standard deviation of the model's error term, i.e., the average amount by which the response deviates from the fitted regression line. RSE depends on the residual sum of squares (the variation in the data left unexplained after the regression has been run), the number of observations, and the number of explanatory variables.
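
As a quick illustration of how these two metrics are computed, here is a minimal sketch in Python using simulated data (the variables and coefficients are made up for demonstration only):

```python
# Illustrative sketch: computing R-squared and residual standard error (RSE)
# for a simple linear fit. The data here is simulated purely for demonstration.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)                # explanatory variable
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)  # response with linear signal plus noise

# Fit a one-variable linear model by ordinary least squares.
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
residuals = y - fitted

rss = np.sum(residuals ** 2)              # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)         # total sum of squares
p = 1                                     # number of explanatory variables

r_squared = 1 - rss / tss
rse = np.sqrt(rss / (n - p - 1))          # residual standard error

print(f"R-squared: {r_squared:.3f}, RSE: {rse:.3f}")
```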

Graphical plots complement R-squared and RSE. Plots can be as simple as plotting the response variable against a single explanatory variable or against a fitted linear model. This can be useful for detecting non-linearity, but other plots have broader application.

One such plot is the residual plot, which plots the residuals (the differences between the true response values and the fitted values) against the fitted values themselves. Patterns in residual plots can suggest a lack of model fit, perhaps due to non-constant variance or non-linearity in the data. Outliers and leverage points3 can also be detected through standardized residual plots, normal Q-Q plots, and leverage/Cook's distance plots.

Observing these diagnostic plots enables us to make decisions as to what functional form our variables should take. For instance, by taking the logarithm (a curved function) of our response variable, we can help account for non-constant variance in our model or a non-linear relationship with the explanatory variables. We can also relax the additive assumption in a linear model by adding multiplicative combinations of variables (interaction terms), a technique that helps to model a synergistic relationship between variables.
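
For readers who want to see these transformations in practice, below is a sketch using the statsmodels formula interface; the variable names (price, size, age) and the simulated data are hypothetical:

```python
# Illustrative sketch: relaxing the linearity and additivity assumptions with a
# log transform of the response and an interaction (multiplicative) term.
# Variable names (price, size, age) and the data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"size": rng.uniform(500, 4000, n),
                   "age": rng.uniform(0, 50, n)})
# Simulated response with multiplicative structure, so log(price) is roughly linear.
df["price"] = np.exp(11 + 0.0004 * df["size"] - 0.01 * df["age"]
                     + rng.normal(0, 0.2, n))

# np.log(price) helps address non-constant variance; size:age adds an interaction term.
model = smf.ols("np.log(price) ~ size + age + size:age", data=df).fit()
print(model.summary().tables[1])
```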

Machine Learning Models: Shrinkage Methods, Splines, and Decision Trees

Our goal is to determine the model with the highest probability of having realistically generated the data, and we have summarized above the most important metrics that can help us identify such a model. However, it is also important to be aware of several standard models—to know ahead of time which are likely to be most useful.

Shrinkage methods are an alternative to the standard linear model and most notably include ridge and lasso regressions. While these models are similar to ordinary least squares, they include a shrinkage “penalty” which shrinks the coefficients, as an increasing function of their magnitude, toward zero. Through adding this constraint, the model can offer a sizeable reduction in variance in exchange for a slight increase in bias. A tuning parameter—a coefficient on this penalty—can help us fine-tune the amount of variance we want to eliminate, as well as bias we are willing to accept.4
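
The sketch below illustrates the idea with scikit-learn's Ridge and Lasso estimators on simulated data; the alpha values shown are arbitrary choices of the tuning parameter, not recommendations:

```python
# Illustrative sketch: ridge and lasso regression with a tuning parameter (alpha)
# controlling the strength of the shrinkage penalty. Data is simulated.
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
# Only the first three features carry signal; the rest are noise.
beta = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ beta + rng.normal(0, 1, 300)

X_std = StandardScaler().fit_transform(X)  # shrinkage penalties assume comparable scales

for alpha in (0.1, 1.0, 10.0):
    ridge = Ridge(alpha=alpha).fit(X_std, y)
    lasso = Lasso(alpha=alpha).fit(X_std, y)
    print(f"alpha={alpha}: ridge coefficients shrink toward zero -> {np.round(ridge.coef_, 2)}")
    print(f"           lasso sets some coefficients exactly to zero -> {np.round(lasso.coef_, 2)}")
```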

If we are looking for a model with more flexibility and predictive power, splines may be an avenue to explore. Splines introduce several “knots” into the model, creating a smooth, continuous line with many different slopes. Unsurprisingly, since splines are much more flexible than linear regression or shrinkage methods, they have a lower bias due to following the data more closely. They also do a better job than polynomial regressions, as they provide more consistent estimates.5 

A third option is decision trees, which provide more flexibility, but are also highly interpretable due to the way they segment the problem into a hierarchical structure. The idea is to segment the set of possible values for the explanatory variables into a distinct number of regions and make the same prediction for each observation in a particular region. This is generally done using an algorithm to select the most meaningful way to segment the observations, then the next most, and so on. Once this iterative algorithm is complete, we are left with what is usually a complex, hierarchical tree-like structure that can be readily mapped into a highly intuitive visualization. Decision trees can be very useful for their interpretability, ability to model non-linear data, and arguably more realistic approach to modeling human decision-making.
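
A minimal sketch of this segmentation idea, using a shallow scikit-learn decision tree on simulated (hypothetical) loan data, appears below; the printed rules correspond to the hierarchical structure described above:

```python
# Illustrative sketch: a shallow decision tree that segments observations into
# regions and predicts the majority class in each region. Data is simulated to
# loosely resemble a loan-default classification problem.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
n = 1000
X = np.column_stack([rng.uniform(550, 820, n),     # hypothetical credit score
                     rng.uniform(0.1, 0.6, n)])    # hypothetical debt-to-income ratio
# Simulated default flag: more likely with a low score and a high DTI.
p_default = 1 / (1 + np.exp(0.02 * (X[:, 0] - 650) - 6 * (X[:, 1] - 0.35)))
y = rng.binomial(1, p_default)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# The printed rules map directly to the intuitive, hierarchical visualization
# described above.
print(export_text(tree, feature_names=["credit_score", "dti"]))
```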

Application to Finance and Mortgage Data

We can use machine learning to answer a wide variety of questions related to finance and mortgage data, but it is crucial to understand the model selection process. Strong domain knowledge can help considerably in knowing what assumptions would be plausible, but a knowledge of diagnostic metrics, as well as the different types of models, their strengths, and weaknesses, can help unlock insights and uncover the logic behind processes—especially when answering questions that have yet to be answered. Whether your goal is to identify which customers are most likely to default on a loan, determine the elasticity of demand for a certain type of loan, or cut out some of the noise in the data, a solid grounding in approaches to model selection can help significantly.

 

[1] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An Introduction to Statistical Learning (New York: Springer, 2013), 21-22.
[2] James, Witten, Hastie, and Tibshirani, 23.
[3] Outliers are Y values that are unusual given the explanatory variables. Leverage points are X values that are surprising given the response variables.
[4] James, Witten, Hastie, and Tibshirani, 218.
[5] James, Witten, Hastie, and Tibshirani, 276.


End-User Computing Controls – Building an EUC Inventory

An accounting manager at a mid-sized bank recently wondered aloud to us how to approach implementing end-user computing (EUC) controls. She had recently become responsible for identifying and overseeing her institution's unknown number of EUC applications and had obviously given a lot of thought to the types of applications that needed to be identified and what the review process ought to look like. She recognized that a comprehensive inventory would need to be built, but, like so many others in her position, was uncertain of how to go about it.

We reasoned together that her options fell into two broad categories—each of which has benefits and drawbacks.

The first category of inventory-building options we classified as a top-down approach. This begins with identifying all data contained in financial statements or mission-critical management reports and then working backward from there to identify every model, database, spreadsheet, or other application that is used to generate these reports. The second category is a bottom-up approach, which first identifies every single spreadsheet in use at the bank and then determines which of those rise to the level of EUCs and need to be formally and independently reviewed.

 

Top-Down EUC Inventory Building

The primary advantage of a top-down approach is the comfort of knowing that everything important has been accounted for. An EUC inventory that is built systematically by tracing every figure on every balance sheet, income statement, and footnote back to every spreadsheet that contributed to it is not likely to miss much. Top-down approaches have the added benefit of placing the EUC inventory coordinator firmly in control of the exercise because she knows precisely what she is looking for. “We’re forecasting $23 million in retail deposit runoff next month,” she might observe. “Someone needs to show me the system that generated that figure. And if it’s a spreadsheet, then it needs an EUC review.”

The downside is that this exercise usually turns out to be more complicated than it sounds. One problem with requests that begin with “Somebody needs to show me…” is that “somebody” can often be hard to track down. Also, “somebody” many times is “somebodies.” Individual financial statement line items are often supported by multiple spreadsheets, and those spreadsheets may have data-feed issues of their own. What begins looking like it should be a straightforward exercise quickly evolves into one of those dreaded “spaghetti bowl” problems where attempting to extract a single strand leads to a tangled mess. A single required line item—say, cash required for loan originations in the next 90 days—would likely require input from a half-dozen or more EUCs tracking everything from economic forecasts to pipeline reports for any number of different loan types and origination channels. Before long, the person in charge of end-user computing controls can begin to feel like she’s been placed in charge of auditing not just EUCs, but the entire bank.

 

Bottom-Up EUC Inventory Building

A more common means to building an EUC inventory is a bottom-up approach that identifies every spreadsheet on the network and then relies on a combination of manual and automated methods to sort them into one of three bins:

  1. Models (which have hopefully already been tagged and classified during a separate model-inventory-building process)
  2. Non-computational/non-relevant spreadsheets (spreadsheets that either contain data only and do not perform calculations or spreadsheets that do not contribute to a quantitative business purpose—e.g., leave schedules, org charts, and fantasy football standings)
  3. EUCs (pretty much everything that does not get filtered into the first two bins)

Identifying all the spreadsheets can be done manually or using an automated “discovery” tool. Even in the very smallest institutions, manual discovery is too big a job for a single person. Typically, individual business unit heads will be tasked with identifying all of the EUCs in use within their various realms and reporting them to a central EUC oversight coordinator. The advantage of this approach is that it enables non-EUC spreadsheets to be filtered out before they get to the central EUC oversight coordinator, which makes that person’s job easier. The disadvantage is that it is unlikely to capture every EUC. Business unit heads are incentivized to apply a sub-optimal set of criteria when determining whether a spreadsheet should be classified as an EUC. They are likely to overlook files that an impartial EUC coordinator might wish to review.

An automated discovery tool avoids this problem by grabbing everything—every spreadsheet in a given shared drive or folder structure and then scanning and evaluating them for formulas and levels of complexity that contribute to an EUC’s risk rating. Automated scanning tools have the dual benefit of enabling central EUC coordinators to peer into how individual business units are using spreadsheets without having to rely on the judgment of business unit heads to determine what is worthy of review. The downside is that, even with all the automated filtering discovery tools are capable of, they are likely to result in the “discovery” of a lot of spreadsheets that ultimately do not need to go through an EUC review. Paradoxically, the more automated the discovery process is, the more manual the winnowing needs to be.
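
For illustration only, the sketch below shows the general shape of such an automated scan, assuming the institution's spreadsheets sit on a shared drive as Excel files; the path and the formula-count heuristic are hypothetical, and a production discovery tool would apply far richer complexity and risk scoring:

```python
# Minimal sketch of an automated "discovery" scan, assuming EUC candidates live
# on a shared drive as .xlsx files. The path and risk heuristic are hypothetical.
from pathlib import Path
from openpyxl import load_workbook

SHARED_DRIVE = Path("/mnt/shared_drive")  # hypothetical network location

def scan_for_eucs(root: Path):
    candidates = []
    for path in root.rglob("*.xlsx"):
        wb = load_workbook(path, read_only=True)
        formula_count = 0
        for sheet in wb.worksheets:
            for row in sheet.iter_rows():
                formula_count += sum(
                    1 for cell in row
                    if isinstance(cell.value, str) and cell.value.startswith("=")
                )
        wb.close()
        # Spreadsheets with no formulas are likely data-only or non-relevant;
        # everything else goes to the EUC coordinator for manual winnowing.
        if formula_count > 0:
            candidates.append((str(path), formula_count))
    return sorted(candidates, key=lambda item: item[1], reverse=True)

for path, formulas in scan_for_eucs(SHARED_DRIVE):
    print(f"{formulas:6d} formulas  {path}")
```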

 

A Hybrid Approach to End-User Computing Controls

As with many things, the best solution probably lies somewhere in the middle—drawing from the benefits of both top-down and bottom-up approaches.

While a pure top-down approach is usually too involved to be practical on its own, elements of a top-down approach can enlighten and facilitate a bottom-up process. For example, a bottom-up process may identify several spreadsheets whose complexity and perceived importance to the departments that use them make them appear to be high-risk EUCs in need of review. However, a top-down review may reveal that these spreadsheets ultimately do not contribute to financial or enterprise-wide management reporting. It could be that the importance of some spreadsheets does not extend far enough beyond the business unit that owns them to require an independent review. Furthermore, being able to connect the dots between spreadsheets that are identified using a bottom-up approach and individual financial statement/management report entries can help ensure that all important entries are accounted for.

A hybrid approach—one that is informed both by an understanding of critical reporting items and a series of comprehensive, automated discovery scans—introduces the virtues of both methods and is most likely to yield an EUC inventory that is both comprehensive and aligned with an institution’s risk profile.


Evaluating Supervised and Unsupervised Learning Models

Model evaluation (including evaluating supervised and unsupervised learning models) is the process of objectively measuring how well machine learning models perform the specific tasks they were designed to do—such as predicting a stock price or appropriately flagging credit card transactions as fraud. Because each machine learning model is unique, optimal methods of evaluation vary depending on whether the model in question is “supervised” or “unsupervised.” Supervised machine learning models make specific predictions or classifications based on labeled training data, while unsupervised machine learning models seek to cluster or otherwise find patterns in unlabeled data.

Unsupervised Learning

Common unsupervised learning techniques include clustering, anomaly detection, and neural networks. Each technique calls for a different method of evaluating performance. We'll focus on clustering models as an example. Clustering is the task of grouping a set of objects in such a way that objects in the same cluster are more similar to each other than to objects in other clusters. Various algorithms are capable of clustering, including k-means and hierarchical clustering, which differ in how they define a cluster and how they find one.

Evaluating Unsupervised Learning Models

Let’s assume that we need to cluster banking customers together into groups based on the amount and magnitude of risk they pose. After the clustering algorithm has grouped the customers into distinct clusters, we need to evaluate how well those clusters were formed. The lack of labels on an unsupervised learning model’s training data makes evaluation problematic because there is nothing to which the model’s results can be meaningfully compared.  If we were to manually group these customers, we could then compare our manual groupings with the algorithm’s, but often this is not an option due to time or labor constraints, so we need a more efficient way to determine how well the algorithm performed.

One way would be to determine 1) how close each customer within each cluster is to every other customer in its cluster (the "intra-cluster" distance) and 2) how close each cluster of customers is to other clusters (the "inter-cluster" distance), and then to compare the two distances. Models that produce relatively small intra-cluster distances and relatively large inter-cluster distances evaluate favorably because they appear to be doing a good job of grouping similar customers into distinct, well-separated clusters.
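
One common way to formalize this intra- versus inter-cluster comparison is the silhouette score. The sketch below applies it to simulated customer risk features; the features, cluster counts, and data are hypothetical:

```python
# Illustrative sketch: comparing intra-cluster and inter-cluster distances with
# the silhouette score. The customer risk features here are simulated.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(4)
# Two hypothetical risk features per customer, e.g., exposure and volatility.
customers = np.vstack([rng.normal(loc, 0.5, size=(100, 2))
                       for loc in ([0, 0], [3, 3], [6, 0])])

for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(customers)
    # Silhouette ranges from -1 to 1; higher values mean tight clusters that are
    # well separated from one another.
    print(f"k={k}: silhouette score = {silhouette_score(customers, labels):.3f}")
```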

Supervised Learning

Within supervised learning there are techniques for both regression and classification tasks. Some techniques are suited to only one of the two, while others can be used for both. For example, linear regression can only be used for regression, while support vector machines and random forests can be used for either. Although each of these is a different technique, the metrics we use to evaluate them are the same, so we can even compare these models to one another. In our examples, we'll focus on flagging credit card purchases as fraud, a classification task, and predicting housing prices, a regression task.

 

Evaluating Supervised Models

The task of evaluating how well a supervised learning model performs is more straightforward. Because supervised learning models learn from labeled training data, once they have been fitted using training data, they can be tested against held-out data drawn from the same population, which therefore has the same labels.

For example, let’s say we need to classify whether a credit card transaction is fraudulent and we have a dataset of transactions with labels of either “fraud” or “not fraud.” We can (and sometimes do1) train our model on all the available data, but this prevents us from fairly evaluating it because no “independent” data remains for testing and overfitting2 becomes difficult to detect. This problem can be avoided by splitting the available data into training and testing sets.

This can be accomplished in various ways. For simplicity, we’ll first talk about splitting our dataset into two sets: a training set (typically 70% of the whole dataset) from which the model learns and a test set (the other 30%). Because the test set is withheld from the model during training, it can contribute to an unbiased evaluation of how well a model performs on previously unseen data. This protects against overfitting and allows us to evaluate how our model would perform “in the wild” on new data as it emerges.
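
A minimal sketch of this 70/30 split using scikit-learn appears below; the transaction features and labels are simulated stand-ins, and the random forest is just one plausible classifier choice:

```python
# Illustrative sketch: a 70/30 train/test split for the fraud-classification
# example. The features are simulated stand-ins for real transaction attributes.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(5000, 8))          # hypothetical transaction features
y = rng.binomial(1, 0.05, size=5000)    # 1 = fraud, 0 = not fraud (simulated)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)                               # learn only from the training set
print("Held-out accuracy:", model.score(X_test, y_test))  # evaluate on unseen data
```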

Cross-validation is another antidote for overfitting. Cross-validation involves partitioning data into multiple groups and then training and testing models on different group combinations. For example, in a 5-fold cross-validation we would split our transaction data set into five partitions of equal sizes. We would then train our model on four of those five partitions and test our model on the remaining partition. We would then repeat the process—selecting a different partition to be the test group and training a new model on the remaining set of four partitions. We would repeat three more times, for a total of five rounds of cross-validation, one for each fold. We will then have five different models, each having been trained and tested on a different subset of data and each having their own weights and prediction accuracy. At the end, we combine these models by averaging their weights together to estimate a final predictive model.
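
The sketch below shows a 5-fold cross-validation in scikit-learn, which handles the partitioning, training, and per-fold scoring described above (the data is simulated, and cross_val_score reports one accuracy figure per fold rather than combining model weights):

```python
# Illustrative sketch of 5-fold cross-validation, which trains and scores a
# model on each train/test partition described above. Data is simulated.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(5000, 8))           # hypothetical transaction features
y = rng.binomial(1, 0.05, size=5000)     # simulated fraud labels

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
# One accuracy score per fold; the average summarizes out-of-sample performance.
print("Fold accuracies:", np.round(scores, 3))
print("Mean accuracy:  ", scores.mean().round(3))
```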

Classification metrics are the measures against which models are evaluated. The simplest and most common such metric is accuracy. Accuracy is computed by dividing the number of correct predictions by the total number of predictions. In our supervised transaction classification model example, if we tested our model on one hundred transactions and correctly predicted their label (fraud/not fraud) for ninety-five of them, then the accuracy of our model is 95%.

Accuracy is the simplest, most understandable metric we can use, but we wouldn't want to rely on accuracy alone because it doesn't distinguish between false positives (transactions incorrectly classified as fraud) and false negatives (fraudulent transactions incorrectly classified as non-fraud). For this we need a confusion matrix.

A confusion matrix is a 2-by-2 table that sorts predictions into one of four classifications: true positive, true negative, false positive, and false negative. Our transaction classification model might generate a confusion matrix like this one:

The confusion matrix indicates that, out of 100 total transactions, our model correctly predicted fraud four times and correctly predicted not fraud 91 times, yielding an overall accuracy of 95%. The confusion matrix, however, also enables us to see the number of times the model incorrectly predicted that a transaction was fraud—a false positive which occurred on two out of the 100 transactions. We can also see the number of times the model predicted a transaction was not fraud when it was—a false negative which occurred on three out of the 100 transactions.

While the model appears to boast a fairly strong true negative rate, the percentage of non-fraud transactions correctly classified as such (91/(91+2) = 97.8%), the model's true positive rate, the percentage of fraudulent transactions correctly flagged as such (4/(4+3) = 57.1%), is far less attractive. Breaking down the model's performance in this way paints a different and more complete picture than the 95% accuracy rate alone.
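
These calculations can be reproduced directly from the four confusion-matrix cells. The sketch below reconstructs labels matching the counts in this example (4 true positives, 91 true negatives, 2 false positives, 3 false negatives) and derives the same rates:

```python
# Illustrative sketch: building a confusion matrix and derived rates for the
# 100-transaction example above.
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Reconstruct labels matching the counts in the text (1 = fraud, 0 = not fraud).
y_true = np.array([1] * 4 + [0] * 91 + [0] * 2 + [1] * 3)
y_pred = np.array([1] * 4 + [0] * 91 + [1] * 2 + [0] * 3)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
print(f"Accuracy:           {accuracy_score(y_true, y_pred):.1%}")
print(f"True positive rate: {tp / (tp + fn):.1%}")   # 4 / (4 + 3)
print(f"True negative rate: {tn / (tn + fp):.1%}")   # 91 / (91 + 2)
```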

Evaluation methods apply to regression models as well. Let's assume we have a regression model that's been trained to predict housing prices. The model's predicted prices can be compared with actual prices using the mean squared error, which measures the average of the squared differences between the actual and predicted prices. The lower the mean squared error, the better the model.
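
A minimal sketch of the mean squared error calculation follows; the prices are made-up numbers used only to show the arithmetic:

```python
# Illustrative sketch: mean squared error for a housing-price regression.
# Prices here are made-up numbers purely to show the calculation.
import numpy as np
from sklearn.metrics import mean_squared_error

actual_prices = np.array([310_000, 425_000, 289_000, 512_000, 350_000])
predicted_prices = np.array([298_000, 440_000, 301_000, 495_000, 362_000])

mse = mean_squared_error(actual_prices, predicted_prices)
print(f"Mean squared error: {mse:,.0f}")   # lower is better
```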

All models need to be subjected to evaluation—when they are built and throughout their lives. Supervised and unsupervised learning models pose different sorts of evaluation challenges, and selecting the right type of metrics is key.



[1] Many fraud detection models are also built using neural networks and other unsupervised learning techniques.

[2] Overfitting occurs when a model makes generalizations about coincidental data elements that in reality are not germane to the analysis. Continuing the example of fraud detection, overfitting may occur if model training detects a correlation between the length of a customer’s name (or whether the customer’s name begins with a vowel) and the likelihood that a transaction is fraudulent. Testing is likely to expose random, spurious correlations of this type for what they are, as they are not likely to be replicated in the test data set that has been held out from the training data. A model that has been “overfit” to its training data is likely to return a considerably lower accuracy ratio on the test data.




Feature Selection – Machine Learning Methods

Feature selection in machine learning refers to the process of isolating only those variables (or "features") in a dataset that are pertinent to the analysis. Failure to do this effectively has many drawbacks, including: 1) unnecessarily complex models with difficult-to-interpret outcomes, 2) longer computing time, and 3) collinearity and overfitting. Effective feature selection eliminates redundant variables and keeps only the best subset of predictors in the model, thus making it possible to represent the data in the simplest way. This post begins by identifying steps that must be taken to prepare datasets for meaningful analysis, and how machine learning can help. We then introduce and discuss some commonly used machine learning techniques for variable selection.

Data Cleansing

Real world data contains a wide range of holes, noise, and inconsistencies. Before doing any statistical analysis, it is crucial to ensure that the data can be meaningfully analyzed. In practice, data cleansing is often the most time-consuming part of data analysis. This upfront investment is necessary, however, because the quality of data has a direct bearing on the reliability of model outputs.

Various machine learning projects require different sorts of data cleansing steps, but in general, when people speak of data cleansing, they are referring to the following specific tasks.

Cleaning Missing Values

Many machine learning techniques do not support data with missing values. To address this, we first need to understand why data are missing. Missing values usually occur simply because no information is provided, but other circumstances can lead to data holes as well. For instance, setting incorrect data types for attributes when data is extracted and integrated from multiple sources can cause data loss.

One way to investigate missing values is to identify patterns for missing data. For example, missing answers for certain questions from female respondents in a survey may indicate that those questions are only asked of male respondents. Another example might involve two loan records that share the same ID. If the second record contains blank values for every attribute except ‘Market Price,’ then the second record is likely simply updating the market price of the first record.

Once the early-stage evaluation of missing data is complete, we can set about determining how to address the problem. The easiest way to handle missing values is simply to ignore the records that contain them. However, this solution is not always practical. If a relatively large portion of the dataset contains missing values, then removing all of those records could leave remaining data that is not a good representation of the initial population. In that case, rather than filtering out relevant rows or attributes, a more appropriate approach is to impute missing values with sensible ones.

A typical imputing method for categorical variables involves replacing the missing values with the most frequent value or with a newly created “unknown” category. For numeric variables, missing values might be replaced with mean or median values. Other, more advanced methods for dealing with missing values, e.g., listwise deletion for deleting rows with missing data and multiple imputation for substituting missing values, exist as well.
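
The sketch below shows these basic imputation strategies with pandas on a small, hypothetical loan table:

```python
# Illustrative sketch: simple imputation of missing values with pandas, using
# the approaches described above. The loan dataset is hypothetical.
import numpy as np
import pandas as pd

loans = pd.DataFrame({
    "loan_purpose": ["purchase", "refi", np.nan, "purchase", np.nan],
    "market_price": [210_000, np.nan, 185_000, 240_000, 198_000],
})

# Categorical: fill with the most frequent value (or an explicit "unknown" category).
loans["loan_purpose"] = loans["loan_purpose"].fillna(loans["loan_purpose"].mode()[0])
# Numeric: fill with the median (less sensitive to outliers than the mean).
loans["market_price"] = loans["market_price"].fillna(loans["market_price"].median())

print(loans)
```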

Reducing Noise in Data

“Noise” in data refers to erroneous values and outliers. Noise is an unavoidable problem which can be caused by human mistakes in data entry, technical problems, and many other factors. Noisy data adversely influences model performance, so its detection and removal has a key role to play in the data cleaning process.

There are two major types of noise in data: class noise and attribute noise. Class noise often occurs in categorical variables and can include: 1) non-standardized class labels, 2) duplicate records mapped to different class labels, and 3) mislabeled records. Attribute noise refers to corrupt values and outliers, such as percentages inappropriately greater than 100% and placeholders (e.g., 999,000).1

There are many ways to deal with noisy data. Certain types of noise can be identified simply by sorting the data, which isolates text input where numeric input is expected, placeholders, and similar problems. Other noise can be addressed only using statistical methods. Clustering analysis groups the data by similarity and can help with detecting irrelevant objects and outliers. Data binning reduces the impact of observation errors by combining 'neighborhood' data into a small number of bins. Advanced smoothing algorithms, including moving averages and loess, fit the data to regression functions to eliminate the effect of random variation and allow important patterns to stand out.

Data Normalization

Data normalization converts numerical values into specific ranges to meet the needs of a model. Performing data normalization makes it possible to aggregate data with different scales. Several algorithms require normalized data. For example, it is necessary to normalize data before feeding it into principal component analysis (PCA) so that all variables have zero mean and unit variance and therefore the same weight. This also applies to support vector machines (SVM), which assume that the input data is in the range [0,1] or [-1,1]. Unnormalized data slows model convergence and skews results.

The most common way of normalizing data uses the Z-score. Also known as standard-score normalization, this approach rescales each value by subtracting the mean and dividing by the standard deviation. Z-score normalization is often used when the min and max are unknown. Another common method is feature scaling, which brings all values into the range [0,1] by dividing the difference between each value and the minimum by the difference between the maximum and minimum. Other normalization methods include the studentized residual, t-statistics, and the coefficient of variation.
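
Both transformations reduce to a couple of lines of arithmetic, as in the sketch below (the loan balances are hypothetical):

```python
# Illustrative sketch: z-score normalization and min-max feature scaling applied
# to a small set of hypothetical loan balances.
import numpy as np

balances = np.array([120_000, 250_000, 85_000, 410_000, 175_000], dtype=float)

z_scores = (balances - balances.mean()) / balances.std()                  # standard score
scaled = (balances - balances.min()) / (balances.max() - balances.min())  # into [0, 1]

print("Z-scores:      ", np.round(z_scores, 2))
print("Feature-scaled:", np.round(scaled, 2))
```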

Feature Selection Methods2

Stepwise Procedures

A stepwise procedure adds or subtracts individual features from a model until the optimal mix is identified. Stepwise procedures take three forms: backward elimination, forward selection, and stepwise regression.

Backward elimination is the simplest method. It fits the model using all available features and then systematically removes features one at a time, beginning with the feature with the highest p-value (provided the p-value exceeds a given threshold, usually 5%). The model is refit after each elimination, and the process loops until a model is identified in which each feature's p-value falls below the threshold.

Forward selection is the opposite of backward elimination. It starts with no variables in the model and then systematically adds features one at a time, beginning with the feature with the lowest p-value (provided the p-value falls below a threshold). The model is refit after each addition, and the process loops until additional features no longer improve model performance.

Stepwise regression combines backward elimination and forward selection by allowing a feature to be added or dropped at each iteration. Using this method, a newly added variable in an early stage may be removed later, and vice versa.
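
As an illustration of the first of these procedures, the sketch below implements a simple backward-elimination loop using p-values from statsmodels OLS on simulated data; the 5% threshold follows the description above, and the stopping rule is one reasonable choice among several:

```python
# Minimal sketch of backward elimination using p-values from statsmodels OLS.
# The dataset is simulated; only x0 and x1 actually carry signal.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
X = pd.DataFrame(rng.normal(size=(n, 5)), columns=[f"x{i}" for i in range(5)])
y = 1.5 * X["x0"] - 2.0 * X["x1"] + rng.normal(0, 1, n)

features = list(X.columns)
while features:
    model = sm.OLS(y, sm.add_constant(X[features])).fit()
    pvalues = model.pvalues.drop("const")
    worst = pvalues.idxmax()
    if pvalues[worst] <= 0.05:          # all remaining features are significant
        break
    features.remove(worst)              # drop the least significant feature and refit

print("Selected features:", features)
```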

Criterion-Based Procedures

A variable's p-value is not the only statistic that can be used for feature selection. Penalized-likelihood criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), are also valuable. Lower AIC and BIC values indicate that a model is more likely to be true. They are given as n·log(RSS/n) + k·p, where RSS is the residual sum of squares (which decreases as model complexity increases), n is the sample size, p is the number of predictors, and k is 2 for AIC and log(n) for BIC. Both criteria penalize larger models as p goes up, and BIC penalizes model complexity more heavily, which explains why BIC tends to favor smaller models than AIC does. Other criteria include: 1) adjusted R-squared, which increases only if a new feature improves model performance more than expected, 2) PRESS, the sum of squares of the predicted residuals, and 3) Mallows' Cp statistic, which estimates the average MSE of prediction.

Lasso and Ridge Regression

Lasso and ridge regression are powerful techniques for dealing with large feature coefficients. Both approaches reduce overfitting by penalizing features with large coefficients while minimizing the difference between predicted values and observations, but they differ in the penalty term they add. Lasso adds a penalty equivalent to the absolute value of the magnitude of the coefficients, so it can shrink some coefficients exactly to zero and eliminate those features from the model. Ridge assigns a penalty equivalent to the square of the magnitude of the coefficients. Even though it does not shrink coefficients to zero, it regularizes and constrains them to control variance.

Lasso and ridge regression models have been widely used in finance since their introduction. A recent example used both these methods in predicting corporate bankruptcy.3 In this study, the authors discovered that these regression methods are optimal as they handle multicollinearity and minimize the numerical instability that may occur due to overfitting.

Dimensionality Reduction

“Dimensionality reduction” is a process of transforming an extraordinarily complex, “high-dimensional” dataset (i.e., one with thousands of variables or more) into a dataset that can tell the story using a significantly smaller number of variables.

The most popular linear technique for dimensionality reduction is principal component analysis (PCA). It converts complex dataset features into a new set of coordinates named principal components (PCs). PCs are created in such a way that each succeeding PC preserves the largest possible variance under the condition that it is uncorrelated with the preceding PCs. Keeping only the first several PCs in the model reduces data dimensionality and eliminates multi-collinearity among features.

PCA has a couple of potential pitfalls: 1) it is sensitive to the scale of the original variables (data normalization is required before performing PCA), and 2) applying PCA hurts the ability to interpret the influence of individual features, since the PCs are linear combinations rather than real variables. For these reasons, PCA is not a good choice for feature selection when interpretation of results is important.
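
The sketch below runs PCA on standardized, simulated data and shows how a handful of components can capture most of the variance:

```python
# Illustrative sketch: PCA on standardized data, keeping only the components
# that explain most of the variance. The input matrix is simulated.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
latent = rng.normal(size=(400, 3))
# Build 10 correlated features driven by 3 underlying factors plus noise.
X = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(400, 10))

X_std = StandardScaler().fit_transform(X)   # normalization required before PCA
pca = PCA(n_components=5).fit(X_std)

print("Variance explained by each PC:", np.round(pca.explained_variance_ratio_, 3))
# Keeping the first few PCs captures nearly all the variance with far fewer,
# mutually uncorrelated variables.
X_reduced = pca.transform(X_std)[:, :3]
```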

Dimensionality reduction and specifically PCA have practical applications to fixed income analysis, particularly in explaining term-structure variation in interest rates. Dimensionality reduction has also been applied to portfolio construction and analytics. It is well known that the first eigenvector identified by PCA maximally captures the systematic risk (variation of returns) of a portfolio.4 Quantifying and understanding this risk is essential when balancing a portfolio.


[1] http://sci2s.ugr.es/noisydata
[2] http://www.biostat.jhsph.edu/~iruczins/teaching/jf/ch10.pdf
[3] Pereira, J. M., Basto, M., & da Silva, A. F. (2016). The Logistic Lasso and Ridge Regression in Predicting Corporate Failure. Procedia Economics and Finance, v.39, pp.634-641.
[4] Alexander, C. (2001). Market models: A guide to financial data analysis. John Wiley & Sons.


Fed MBS Runoff Portends More Negative Vega for the Broader Market

With much anticipation and fanfare, the Federal Reserve is finally on track to reduce its MBS holdings. Guidance from the September FOMC meeting reveals that the Fed will allow its MBS holdings to “run off,” reducing its position via prepayments as opposed to selling it off. What does this Fed MBS Runoff mean for the market? In the long-term, it means a large increase in net supply of Agency MBS and with it an increase in overall implied and realized volatility.

MBS: The Largest Net Source of Options in the Fixed-Income Market

We start this analysis with some basic background on the U.S. MBS market. U.S. homeowners, by and large, finance home purchases using fixed-rate 30-year mortgages. These fixed-rate mortgages amortize over time, allowing the homeowner to pay principal and interest in even, monthly payments. A homeowner has the option to pay off this mortgage early for any reason, which they tend to do either when the homeowner moves, often referred to as turnover, or when prevailing mortgage rates drop significantly below the homeowner's current mortgage rate, referred to as refinancing or "refis." As a rough rule of thumb, turnover has varied between 6% and 10% per annum as economic conditions vary, whereas refis can drive prepayments to 40% per annum under current lending conditions.[1] Rate refis account for most of a mortgage's cash flow volatility.

If the homeowner is long the option to refinance, the MBS holder is short that same option. Fixed-rate MBS shorten due to prepayments as rates drop, and extend as rates rise, putting the MBS holder into a short convexity (gamma) and short vega position. Some MBS holders hedge this risk explicitly, buying short- and longer-dated options to cover their short gamma/short vega risk. Others hedge dynamically, including money managers and long-only funds that tend to target a duration bogey. One way or another, the short-volatility risk from MBS is transmitted into the larger fixed-income market. Hence, the rates market is net short vol risk. While not all investors hedge their short-volatility position, the aggregate market tends to hedge a similar amount of the short-options position over time.

Until, of course, the Fed, the largest buyer of MBS, entered the market. From the start of Quantitative Easing, the Fed purchased progressively more of the MBS market, until by the end of 2014 the Fed owned just under 30% of the agency MBS market. Over the course of five years, the effective size of the MBS market ex-Fed shrank by more than a quarter. Since the Fed doesn't hedge its position, either explicitly through options or implicitly through delta-hedging, the size of the market's net-short volatility position dropped by a similar fraction.[2]

The Fed’s Balance Sheet

As of early October 2017, the Federal Reserve owned $1.77 trillion in agency MBS, or just under 30% of the outstanding agency MBS market. The Fed publishes its holdings weekly on the New York Fed's website. In the chart below, we summarize the Fed's 30yr MBS holdings, which make up roughly 90% of the Fed's MBS holdings.[3]

Fed Holdings - 30yr MBS

Runoff from the Fed

Following its September meeting, the Fed announced it will reduce its balance sheet by not reinvesting runoff from its Treasury and MBS portfolio. If the Fed sticks to its plan, monthly runoff from MBS will reach $20B by 2018 Q1. Assuming no growth in the aggregate mortgage market, runoff from these MBS will be replaced with the same amount of new, at-the-money MBS passthroughs. Since the Fed is not reinvesting paydowns, these new passthroughs will re-enter the non-Fed-held MBS market, which does hedge volatility by either buying options or delta-hedging. Given the expected runoff rate of the Fed's portfolio, we can now estimate the vega exposure of new mortgages entering the wider (non-Fed-held) market. When fully implemented, we estimate that $20B in new MBS represents roughly $34 million in vega hitting the market each month. To put that in perspective, that is roughly equivalent to $23 billion notional of 3yr->5yr ATM swaption straddles hitting the market each and every month.

Conclusion

While the Fed isn't selling its MBS holdings, portfolio runoff will have a significant impact on rate volatility. Runoff implies significant net issuance ex-Fed. It's reasonable to expect increased demand for options hedging, as well as increased delta-hedging, which should drive both implied and realized vol higher over time. This change will manifest itself slowly as monthly prepayments shrink the Fed's position. But the reintroduction of negative vega into the wider market represents a paradigm shift that may lead to a more volatile rates market over time.


[1] In the early 2000s, prepayments hit their all-time highs with the aggregate market prepaying in excess of 60% per annum.

[2] This is not entirely accurate. The short-vol position in a mortgage passthrough is also a function of its note rate (GWAC) with respect to the prevailing market rate, and the mortgage market has a distribution of note rates. But the statement is broadly true.

[3] The remaining Fed holdings are primarily 15yr MBS pass-throughs.


What is an “S-Curve” and Does it Matter if it Varies by Servicer?

Mortgage analysts refer to graphs plotting prepayment rates against the interest rate incentive for refinancing as “S-curves” because the resulting curve typically (vaguely) resembles an “S.” The curve takes this shape because prepayment rates vary positively with refinance incentive, but not linearly. Very few borrowers refinance without an interest rate incentive for doing so. Consequently, on the left-hand side of the graph, where the refinance incentive is negative or out of the money, prepayment speeds are both low and fairly flat. This is because a borrower with a rate 1.0% lower than market rates is not very much more likely to refinance than a borrower with a rate 1.5% lower. They are both roughly equally unlikely to do so.

As the refinance incentive crosses over into the money (i.e., when prevailing interest rates fall below rates the borrowers are currently paying), the prepayment rate spikes upward, as a significant number of borrowers take advantage of the opportunity to refinance. But this spike is short-lived. Once the refinance incentive gets above 1.0% or so, prepayment rates begin to flatten out again. This reflects a segment of borrowers that do not refinance even when they have an interest rate incentive to do so. Some of these borrowers have credit or other issues preventing them from refinancing. Others are simply disinclined to go through the trouble. In either case, the growing refinance incentive has little impact and the prepayment rate flattens out.

These two bends—moving from non-incentivized borrowers to incentivized borrowers and then from incentivized borrowers to borrowers who can’t or choose not to refinance—are what gives the S-curve its distinctive shape.

Figure 1: S-Curve Example

An S-Curve Example – Servicer Effects

Interestingly, the shape of a deal’s S-curve tends to vary depending on who is servicing the deal. Many things contribute to this difference, including how actively servicers market refinance opportunities. How important is it to be able to evaluate and analyze the S-curves for the servicers specific to a given deal? It depends, but it could be imperative.

In this example, we’ll analyze a subset of the collateral (“Group 4”) supporting a recently issued Fannie Mae deal, FNR 2017-11. This collateral consists of four Fannie multi-issuer pools of recently originated jumbo-conforming loans with a current weighted average coupon (WAC) of 3.575% and a weighted average maturity (WAM) of 348 months. The table below shows the breakout of the top six servicers in these four pools based on the combined balance.

Figure 2: Breakout of Top Six Servicers

Over half (54%) of the Group 4 collateral is serviced by these six servicers. To begin the analysis, we pulled all jumbo-conforming, 30-year loans originated between 2015 and 2017 for the six servicers and bucketed them based on their refi incentive. A longer timeframe is used to ensure that there are sufficient observations at each point. The graph below shows the prepayment rate relative to the refi incentive for each of the servicers as well as the universe.

Figure 3: S-curve by Servicer

For loans that are at the money (i.e., the point at which the S-curve would be expected to begin spiking upward), only those serviced by IMPAC prepay materially faster than the entire cohort. However, as the refi incentive increases, IMPAC, Seneca Mortgage, and New American Funding all experience a sharp pick-up in speeds, while loans serviced by Pingora, Lakeview, and Wells behave comparably to the market.

The last step is to compute the weighted average S-curve for the top six servicers using the current UPB percentages as the weights, shown in Figure 4 below. On the basis of the individual servicer observations, prepays for out-of-the-money loans should mirror the universe, but as loans become more refinanceable, speeds should accelerate faster than the universe. The difference between the six-servicer average and the universe reaches a peak of approximately 4% CPR between 50 bps and 100 bps in the money. This is valuable information for framing expectations for future prepayment rates. Analysts can calibrate prepayment models (or their outputs) to account for observed differences in CPRs that may be attributable to the servicer, rather than loan characteristics.

Figure 4: Weighted Average vs. Universe
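
For readers who want to replicate this kind of aggregation, the sketch below computes a UPB-weighted average S-curve from a table of CPRs by servicer and incentive bucket; the servicers, weights, and CPR values are hypothetical, not the figures behind Figure 4:

```python
# Minimal sketch of computing a UPB-weighted average S-curve across servicers,
# assuming a table of CPRs by servicer and refi-incentive bucket. The numbers
# and column names here are hypothetical.
import pandas as pd

scurves = pd.DataFrame({
    "servicer":      ["A", "A", "A", "B", "B", "B"],
    "incentive_bps": [0, 50, 100, 0, 50, 100],
    "cpr":           [8.0, 20.0, 35.0, 7.0, 14.0, 25.0],
})
upb_weights = {"A": 0.6, "B": 0.4}   # share of current UPB by servicer

scurves["weight"] = scurves["servicer"].map(upb_weights)
weighted = (scurves.assign(wcpr=scurves["cpr"] * scurves["weight"])
                   .groupby("incentive_bps")["wcpr"].sum())
print(weighted)   # weighted-average CPR at each refi-incentive bucket
```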

This analysis was generated using RiskSpan’s data and analytics platform, RS Edge.


Reviving the Private-Label RMBS Market with Improvements to the Securitization Process

Weaknesses in securitization processes for mortgage loans contributed to the financial crisis of 2007 – 2008 and have led to a decade-long stagnation in the private-label residential mortgage-backed securities (PLS) market.

Although market participants have attempted to improve known weaknesses, lack of demand for private-label RMBS reflects investors’ reluctance to re-enter the market and the need for continued improvements to securitization processes to re-establish market activity.  While significant issues still need to be addressed, promising advances have been made in the PLS market that improve information provided to investors as well as checks and balances designed to boost transaction performance.

Specifically, we are beginning to see significant improvements in the following securitization processes:

  • Due Diligence
  • Rating Agency Assessment
  • Representation and Warranty Framework and Enforcement
  • Loan Quality Standards
  • Risk Retention
  • Bondholder Communication

Enhancements to these processes in the post-crisis PLS market improve transparency; align incentives between issuers, sponsors, and investors; and may lead to increased investor trust in this market segment.

Due Diligence

The due diligence process is intended to provide the purchaser of an asset with an opportunity to assess the asset’s quality. Prior to the financial crisis, investors relied on the underwriter of the securitization (i.e., an investment bank) to perform loan-level due diligence on their behalf and assess the quality of the underlying loans. Limited information about these reviews was made available to investors. The process was opaque and did not provide investors a clear view of the quality of loans underlying a securitization.

Prior to the financial crisis, due diligence was performed on between 5% and 10% of the loans in a securitization. (Slightly larger samples were selected for Alt-A and subprime transactions.) The criteria for selecting the specific loans in the sample were generally not communicated to investors and rating agencies. Even more troubling, the due diligence results were not communicated to key transaction parties (rating agencies and investors), and issuers did not disclose the results in disclosure documents.

Since the crisis, the following improvements to the due diligence process have made it more transparent:

  • While specific due diligence sample sizes have not been mandated, securitizations issued since the financial crisis have significantly increased the percentage of loans being reviewed—in many transactions, issuers have even included all loans. In two recent Prime Jumbo securitizations, Flagstar and JPMorgan Chase performed 100% due diligence on the underlying loans.
  • Rating Agencies have defined requirements for the firms that perform due diligence activities. Market participants have recommended standards for the scope of the due diligence performed. For example, the Structured Finance Industry Group (“SFIG”) has outlined general criteria for the review of credit, property valuation and regulatory compliance on loans reviewed during the due diligence.
  • Due diligence results are provided to all rating agencies under SEC Rule 17g-10. These reports detail the number of loans reviewed, due diligence findings, the number of loans dropped during the due diligence process, and the rationale behind dropping them. The reports summarize grades assigned to each loan based on rating agency criteria and are made available on the Securities and Exchange Commission (“SEC”)’s EDGAR site as well as in securitization disclosure documents.
  • If a transaction is rated, issuers are required to file detailed reports of due diligence results with the SEC (Rule 15Ga-2 filings) at least five business days prior to the first sale of an offered security. Examples of summary reports for both the Flagstar and JPMorgan Chase securitizations show the additional information on due diligence results provided to investors. For those investors interested in more detail, loan-level reporting of the due diligence findings is also available on EDGAR.

This increased transparency enables investors to independently assess the quality of mortgage loans in a private-label RMBS transaction and factor the results of the due diligence process into their investment decision.

Rating Agency Assessment Process

Over-reliance on rating agencies and the conflict of interest caused by the “issuer pay” model for credit ratings is a frequently cited problem with pre-crisis private-label RMBS transactions. Passage of the Dodd-Frank Act is expected to help reduce the blind reliance by investors and regulators on the ratings process by eliminating the use of credit ratings within the regulatory framework and increasing independent due diligence by investors. Despite tremendous criticism of the “issuer pay” model, the system remains intact almost a decade after the financial crisis across multiple asset classes, including corporate bonds and municipal bonds. The Dodd-Frank Act, however, now requires rating agencies to establish “firewalls” between their business development processes and their ratings processes.

Given the criticism levied at the performance and opacity of the rating agency assessment process, SEC Rule 17g-7 requires public disclosures from rating agencies whenever they provide a credit rating. With these new disclosures, rating agencies have increased the transparency of the ratings process by making public the following changes to their assessment process:

  • Assumptions, methodologies, and processes used to rate transactions
  • Pre-Sale Reports that outline how a rating agency reviews the specific transaction, including areas such as the capital structure, cash flow triggers, pool characteristics, loan underwriting criteria, representations and warranties, and origination and servicing practices

While many market participants criticize the pre-crisis methodologies used by rating agencies to establish credit enhancement levels, pre-sale reports detail reviews performed on each rated private-label RMBS transaction and the assessments made by rating agencies to compute the expected credit enhancement requirements to support the securitization ratings.

In response to a weak pre-crisis representation and warranty framework (discussed in greater detail in the following section), rating agencies now publish “market standard” representations and warranties for each asset class and compare the representations and warranties in each private-label RMBS transaction being evaluated against the standard. The rating agencies also assess a transaction’s processes for enforcing representations and warranties (including repurchases) when a breach occurs.

Rating agencies typically publish the pre-sale report and their assessment of the representations and warranties a few days before a new private-label RMBS issuance is priced. Together with the preliminary offering documents, these items provide post-crisis PLS market investors a comprehensive view of the transaction’s risk prior to making a pricing / investment decision.

Finally, in another step to reduce the risk of issuers “shopping” for favorable ratings, SEC Rule 17g-5 requires rating agencies to make information provided to them by an issuer available to all other rating agencies. This allows other rating agencies to assess transactions on an equal basis and reach independent conclusions – using the same data – on credit enhancement requirements.

One measure of whether the rating agency process has changed since the crisis is the credit enhancement levels themselves. Higher credit enhancement levels would tend to suggest more stringent ratings. Credit enhancement levels on prime jumbo private-label RMBS can be observed in the tables below.

Post-Crisis Transaction Summary:

Pre-Crisis Transaction Summary:

In general, post-crisis AAA credit enhancement levels are higher today than pre-crisis AAA credit enhancement levels, which generally ranged between 3.50% and 4.00%. The rating agency assessment process has become more transparent since the crisis, and credit enhancement levels have increased. The future performance of these transactions will determine whether these changes are sufficient.

Representation and Warranty Framework and Enforcement

Representations and warranties are designed to allocate risks associated with a securitization’s underlying loans between issuers and investors. Basic principles of an effective process for allocating risks associated with underwriting standards, collateral value, or regulatory compliance include:

  • Clear rules (i.e., representations and warranties) defining when loans must be repurchased out of the security
  • Transparent and robust methods for identifying loans that may cause losses
  • Financial stability of the entity responsible for funding required loan repurchases

One criticism of the pre-crisis PLS market was the lack of an independent party tasked with identifying rep and warrant breaches. In many cases, the issuers or sponsors themselves were the only transaction parties capable of conducting the type of forensic loan review necessary to discover breaches. However, because these very parties would be on the hook to fund any repurchases required by their analyses, investors had reason to question the thoroughness of these reviews.

In response, the post-crisis PLS market has generally adapted a rules-based approach that relies on delinquency and other objective “triggers” to review loans and identify potential representation and warranty breaches. Once triggered, reviews are often performed by either 1) an independent third-party with forensic review capabilities, or 2) the holder of the most subordinate outstanding security. Reviews are no longer performed or controlled by issuers whose incentive to identify a breach could be questioned.

These process improvements are meant to increase the likelihood that potential representation and warranty breaches are identified and their terms enforced. If a loan meets the contractual requirements for a repurchase, it is critical that the entity responsible for repurchasing it has the financial ability to do so. New SEC disclosure requirements (Rule 15Ga-1) help track and assess an issuer's ability to comply with repurchase requests.

Changes in the representation and warranty framework have improved methods for breach identification, evaluation, and enforcement. These changes have increased transparency, clarified the allocation of risk, contractually established roles for identifying and evaluating potential breaches, and brought about more effective enforcement mechanisms.

Loan Quality Standards

The Dodd-Frank Act requires lenders to make a good faith effort to determine borrowers’ ability to repay (ATR) their mortgage obligations. The ATR rule seeks to discourage some of the practices used to originate pre-crisis mortgage loans and requires lenders to consider certain underwriting criteria, such as the borrower’s assets or income, debt load, and credit history, to determine whether a loan can be repaid.

Lenders are presumed to comply with the ATR rule when they originate a “qualified mortgage” (QM), which meets the requirements of the ATR rule and additional underwriting and pricing standards. These requirements generally include a limit on points and fees, along with various restrictions on loan terms and features.[2]

Risk Retention

The risk retention requirements added by Section 15G of the Securities Exchange Act of 1934 generally require the issuer of securities backed by non-QM loans to retain at least 5 percent of the credit risk of the mortgage loans collateralizing the securities. This rule change helps align the interests of issuers and sponsors with those of investors by requiring issuers and sponsors to retain an economic interest in the credit risk of the assets they securitize. The rule allows issuers and sponsors to retain risk as either a horizontal interest (i.e., retaining the most subordinate 5% of the securitization), a vertical interest (i.e., retaining a “slice” of each security issued), an “L-shaped” interest (i.e., a combination of horizontal and vertical), or a cash reserve account.
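To make the retention options concrete, the short sketch below sizes a 5 percent horizontal interest and a 5 percent vertical interest for a hypothetical capital structure. The tranche names and balances are illustrative assumptions, not figures from any actual transaction.

```python
# Illustrative sketch: sizing a 5% horizontal vs. vertical retained interest
# for a hypothetical securitization. Tranche names and balances are assumptions.

tranches = {          # balances in $ millions, listed senior to subordinate
    "A (AAA)": 425.0,
    "M1 (AA)": 30.0,
    "M2 (A)": 20.0,
    "B (NR)": 25.0,
}
total = sum(tranches.values())
retention = 0.05 * total

# Vertical interest: retain 5% of each class issued.
vertical = {name: 0.05 * bal for name, bal in tranches.items()}

# Horizontal interest: retain the most subordinate classes, working from the
# bottom of the structure up, until 5% of the deal balance is reached.
horizontal, remaining = {}, retention
for name, bal in reversed(list(tranches.items())):
    take = min(bal, remaining)
    if take > 0:
        horizontal[name] = take
    remaining -= take
    if remaining <= 0:
        break

print(f"Total deal balance: ${total:.1f}mm, required retention: ${retention:.1f}mm")
print("Vertical slice:  ", {k: round(v, 2) for k, v in vertical.items()})
print("Horizontal slice:", {k: round(v, 2) for k, v in horizontal.items()})
```

In this stylized example the horizontal option concentrates the retained interest in the first-loss class, while the vertical option spreads it pro rata across the structure, which is why the two choices carry very different disclosure and risk profiles.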

For most non-QM securitizations, the issuers and sponsors have migrated towards the vertical interest, which performs like whole loan exposure and avoids the comprehensive fair value disclosures required for retained horizontal interests. At the margin, this change will create “skin in the game” for non-QM issuers and sponsors and better align their incentives with those of investors.

Bondholder Communication

To address investors’ concerns about the difficulty of locating other investors when seeking to enforce contractual rights, recent private-label RMBS transactions have incorporated mechanisms for investors to communicate with one another. Many transactions allow investors who wish to communicate to be included in a transaction registry, which may help them reach the required percentage of security holders necessary to provide specific direction to the trustee.

Summary

The PLS market has experienced a decade of stagnation since the financial crisis of 2007 – 2008. Notwithstanding new entrants to this market, a persistent lack of investor trust in and demand for private-label RMBS remains a challenge. While opportunities for improvement remain, major improvements to the securitization process are beginning to take hold.  These changes in post-crisis private-label RMBS transactions improve transparency, align the incentives of issuers and sponsors with those of investors, and hold the key to attracting investors back to this once-thriving market segment.


[1] Includes loans with an original term of less than 20 years.

[2] Unpermitted features include negative amortization, interest-only payments, loan terms of more than 30 years, and “back-end” debt-to-income ratios above 43%. (The back-end debt-to-income ratio limit does not apply to 1) loans guaranteed by the Federal Housing Administration and Veterans Administration, 2) loans eligible for purchase by Fannie Mae and Freddie Mac, and 3) portfolio loans made by “small creditors.”)


Machine Learning and Portfolio Performance Analysis

Attribution analysis of portfolios typically aims to discover the impact that a portfolio manager’s investment choices and strategies had on overall profitability. It can help determine whether success was the result of an educated choice or simply good luck. Usually a benchmark is chosen and the portfolio’s performance is assessed relative to it.

This post, however, considers the question of whether a non-referential assessment is possible. That is, can we deconstruct and assess a portfolio’s performance without employing a benchmark? Such an analysis would require access to historical returns, the portfolio’s weights over time, and perhaps the volatility of interest rates, if some of the components exhibit a dependence on them. This list of required variables is by no means exhaustive.

There are two prevalent approaches to attribution analysis—one based on factor models and the other on return decomposition. The factor model approach considers the equities in a portfolio at a single point in time and attributes performance to various macro- and micro-economic factors prevalent at that time. The effects of these factors are aggregated at the portfolio level and a qualitative assessment is done. Return decomposition, on the other hand, explores the manner in which positive portfolio returns are achieved across time. The principal drivers of performance are separated and further analyzed. In addition to a year’s worth of time series data for the variables listed in the previous paragraph, covariance, correlation, and cluster analyses and other mathematical methods would likely be required.
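As a rough illustration of the factor-model flavor of attribution, the sketch below regresses a portfolio’s excess returns on a few factor return series using ordinary least squares and attributes average performance to each exposure. The factor names and the simulated data are placeholders, not an endorsement of any particular factor set or of the analysis described above.

```python
# Minimal sketch of factor-based attribution: regress portfolio excess returns
# on factor returns and attribute average performance to each factor exposure.
# The factors and data here are simulated placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_days = 252
factors = ["market", "size", "value"]
X = rng.normal(0.0, 0.01, size=(n_days, len(factors)))        # daily factor returns
true_betas = np.array([1.1, 0.3, -0.2])
portfolio = X @ true_betas + rng.normal(0.0, 0.005, n_days)   # portfolio excess returns

model = sm.OLS(portfolio, sm.add_constant(X)).fit()
alpha, betas = model.params[0], model.params[1:]

# Attribute average daily return to each factor: beta_i times mean factor return.
contributions = betas * X.mean(axis=0)
for name, beta, contrib in zip(factors, betas, contributions):
    print(f"{name:>6}: beta = {beta:+.2f}, avg daily contribution = {contrib:+.6f}")
print(f"alpha (unexplained): {alpha:+.6f}")
```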

Normality Assumption

Is the normality assumption for stock returns fully justified? Are sample means and variances good proxies for population means and variances? The assumption is worth testing because normality and the Central Limit Theorem are widely invoked when dealing with financial data. The Delta-Normal Value at Risk (VaR) method, which is widely used to compute portfolio VaR, assumes that stock returns and allied risk factors are normally distributed. Normality is also implicitly assumed throughout much of the financial literature. Consider the distribution of S&P returns from May 1980 to May 2017 displayed in Figure 1.

Figure One: Distribution of S&P Returns

Panel (a) is a histogram of S&P daily returns from January 2001 to January 2017. The red curve is a Gaussian fit. Panel (b) shows the same data on a semi-log plot (logarithmic Y axis). The semi-log plot emphasizes the tail events.

The returns displayed in the left panel of Figure 1 have a higher central peak, and the “shoulders” are somewhat wider than what is predicted by the Gaussian fit. The mismatch in the tails is more visible in the semi-log plot shown in panel (b). This demonstrates that a normal distribution is probably not a very accurate assumption. Sigma, the standard deviation, is typically used as a measure of the relative magnitude of market moves and as a rough proxy for the occurrence of such events. The normal distribution places the odds of a minus-5 sigma swing at only 2.86×10⁻⁵%. In other words, assuming 252 trading days per year, a drop of this magnitude should occur once in every 13,000 years! However, an examination of S&P returns over the 37-year period cited shows drops of 5 standard deviations or greater on 15 occasions. Assuming a normal distribution would consistently underestimate the occurrence of tail events.
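These back-of-the-envelope numbers are easy to reproduce; a minimal check using the normal CDF is sketched below.

```python
# Sketch: how often should a -5 sigma daily move occur if returns were normal?
from scipy.stats import norm

p = norm.cdf(-5)                      # probability of a move below -5 sigma
trading_days = 252
expected_years = 1 / (p * trading_days)

print(f"P(return < -5 sigma) = {p:.2e} ({p * 100:.2e}%)")
print(f"Expected frequency: roughly once every {expected_years:,.0f} years")
```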

We conducted a subsequent analysis focusing on the daily returns of SPY, a popular exchange-traded fund (ETF) that tracks 503 component instruments. Using returns from July 1, 2016 through June 30, 2017, we tested each component instrument’s return vector for normality using the Chi-Square test, the kurtosis estimate, and a visual inspection of the Q-Q plot. Brief explanations of these methods are provided below.

Chi-Square Test

This is a goodness-of-fit test that assumes a specific data distribution (Null hypothesis) and then tests that assumption. The test evaluates the deviations of the model predictions (Normal distribution, in this instance) from empirical values. If the resulting computed test statistic is large, then the observed and expected values are not close and the model is deemed a poor fit to the data. Thus, the Null hypothesis assumption of a specific distribution is rejected.
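A minimal sketch of such a chi-square goodness-of-fit check is below: returns are binned into equal-probability intervals under a fitted normal, and scipy’s chisquare statistic compares observed and expected counts. The simulated, fat-tailed returns are a stand-in for an actual instrument’s return vector.

```python
# Sketch: chi-square goodness-of-fit test of a daily return vector against a
# fitted normal distribution. The simulated returns are a stand-in for real data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
returns = rng.standard_t(df=4, size=250) * 0.01

mu, sigma = returns.mean(), returns.std(ddof=1)
n_bins = 10

# Equal-probability bins under the fitted normal, so each bin expects n/10 points.
cut_points = stats.norm.ppf(np.linspace(0, 1, n_bins + 1)[1:-1], loc=mu, scale=sigma)
observed = np.bincount(np.searchsorted(cut_points, returns), minlength=n_bins)
expected = np.full(n_bins, len(returns) / n_bins)

# Two parameters (mu, sigma) were estimated from the data, hence ddof=2.
chi2, p_value = stats.chisquare(observed, expected, ddof=2)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis of normality at the 5% level.")
```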

Kurtosis

The kurtosis of any univariate Normal distribution is 3. Deviations from this value suggest that the data distribution is correspondingly non-Normal. An example is illustrated in Figures 2, 3, and 4, below.

Q-Q Plot

Quantile-quantile (QQ) plots are graphs on which quantiles from two distributions are plotted relative to each other. If the distributions correspond, then the plot appears linear. This is a visual assessment rather than a quantitative estimation. A sample set of results is shown in Figures 2, 3, and 4, below.
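The kurtosis estimate and the Q-Q plot are similarly quick to produce; a minimal sketch, again on simulated stand-in returns rather than the actual SPY components, follows.

```python
# Sketch: kurtosis estimate and Q-Q plot for a vector of daily returns.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
returns = rng.standard_t(df=4, size=250) * 0.01   # stand-in for a real return vector

# fisher=False reports Pearson kurtosis, which equals 3 for a normal distribution.
kurt = stats.kurtosis(returns, fisher=False)
print(f"Kurtosis = {kurt:.2f} (normal distribution: 3.0)")

# Q-Q plot against the normal distribution; points near the line suggest normality.
stats.probplot(returns, dist="norm", plot=plt)
plt.title("Q-Q plot vs. normal")
plt.show()
```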

Figure Two: Year’s Returns for Exxon

Figure 2. The left panel shows the histogram of a year’s returns for Exxon (XOM). The null hypothesis was rejected, leading to the conclusion that the data is not normally distributed. The kurtosis was 6, which implies a deviation from normality. The Q-Q plot in the right panel reinforces these conclusions.

Figure Three: Year’s Returns for Boeing

Figure 3. The left panel shows the histogram of a year’s returns for Boeing (BA). The data is not normally distributed and also shows significant skewness. The kurtosis was 12.83, implying a significant deviation from normality. The Q-Q plot in the right panel confirms this.

For the sake of comparison, we also show returns that exhibit normality in the next figure.

Figure Four: Year’s Returns for Xerox

The left panel shows the histogram of a year’s returns for Xerox (XRX). The data is normally distributed, which is apparent from a visual inspection of both panels. The kurtosis was 3.23, which is very close to the value for a theoretical normal distribution.

Machine learning literature has several suggestions for addressing this problem, including Kernel Density Estimation and Mixture Density Networks. If the data exhibits multi-modal behavior, learning a multi-modal mixture model is a possible approach.
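For example, a kernel density estimate can be fit directly to the observed returns without assuming a functional form. A minimal sketch using scipy’s Gaussian KDE is below; the simulated returns are again a placeholder for a real return vector.

```python
# Sketch: non-parametric kernel density estimate of a return distribution,
# compared against a fitted normal. Returns here are simulated stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
returns = rng.standard_t(df=4, size=250) * 0.01

kde = stats.gaussian_kde(returns)                 # bandwidth chosen by Scott's rule
mu, sigma = returns.mean(), returns.std(ddof=1)

# Compare the left-tail mass implied by the KDE vs. the fitted normal.
tail = mu - 3 * sigma
print(f"P(return < mean - 3*sigma): "
      f"KDE = {kde.integrate_box_1d(-np.inf, tail):.4f}, "
      f"normal = {stats.norm.cdf(tail, mu, sigma):.4f}")
```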

Stationarity Assumption

In addition to normality, we also make untested assumptions regarding stationarity. This critical assumption is implicit when computing covariances and correlations. We also tend to overlook insufficient sample sizes. As observed earlier, the SPY dataset we had at our disposal consisted of 503 instruments, with around 250 returns per instrument. The number of observations is much lower than the dimensionality of the data. This will produce a covariance matrix that is not full-rank and, consequently, its inverse will not exist. Singular covariance matrices are highly problematic when computing the risk-return efficiency loci in the analysis of portfolios. We tested the returns of all instruments for stationarity using the Augmented Dickey-Fuller (ADF) test. Several return vectors were non-stationary. Non-stationarity and sample-size issues cannot simply be wished away: financial markets are fluid, with new firms coming into existence and existing firms disappearing due to bankruptcies or acquisitions. Consequently, limited financial histories will be encountered and must be dealt with.
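A stationarity check of this sort can be scripted with statsmodels’ Augmented Dickey-Fuller implementation; a minimal sketch is below. The DataFrame of returns and the ticker names are assumptions built from simulated data, not the SPY dataset discussed above.

```python
# Sketch: flag non-stationary return series using the Augmented Dickey-Fuller test.
# `returns_df` is an assumed DataFrame with one column of daily returns per ticker.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(4)
returns_df = pd.DataFrame({
    "AAA": rng.normal(0, 0.01, 250),                     # stationary white noise
    "BBB": np.cumsum(rng.normal(0, 0.01, 250)),          # random walk (non-stationary)
})

non_stationary = []
for ticker in returns_df.columns:
    adf_stat, p_value = adfuller(returns_df[ticker].dropna())[:2]
    if p_value > 0.05:                                   # fail to reject a unit root
        non_stationary.append(ticker)
    print(f"{ticker}: ADF = {adf_stat:.2f}, p = {p_value:.3f}")

print("Flagged as non-stationary:", non_stationary)
```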

This is a problem where machine learning can be profitably employed. Shrinkage methods, latent factor models, empirical Bayes estimators, and random matrix theory-based models are widely published techniques that are applicable here.
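Of these, shrinkage estimators are perhaps the most straightforward to apply. The sketch below uses scikit-learn’s Ledoit-Wolf estimator, which yields a well-conditioned, invertible covariance matrix even when instruments outnumber observations; the simulated return matrix simply mirrors the 503-by-250 shape discussed above.

```python
# Sketch: Ledoit-Wolf shrinkage produces an invertible covariance estimate even
# when the number of instruments exceeds the number of observations.
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(5)
n_obs, n_assets = 250, 503                       # more assets than observations
returns = rng.normal(0, 0.01, size=(n_obs, n_assets))

sample_cov = np.cov(returns, rowvar=False)
lw_cov = LedoitWolf().fit(returns).covariance_

print(f"Sample covariance rank:      {np.linalg.matrix_rank(sample_cov)} / {n_assets}")
print(f"Ledoit-Wolf covariance rank: {np.linalg.matrix_rank(lw_cov)} / {n_assets}")

# The shrunk matrix can be inverted for risk-return optimization; the raw sample
# covariance matrix cannot.
inv_lw = np.linalg.inv(lw_cov)
```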

Portfolio Performance Analysis

Once issues surrounding untested assumptions have been addressed, we can focus on portfolio performance analysis, a subject with a vast collection of books and papers devoted to it. We limit our attention here to one aspect of portfolio performance analysis: an inquiry into the clustering behavior of stocks in a portfolio.

Books on portfolio theory devote substantial space to the discussion of asset diversification to achieve an optimum balance of risk and return. To properly diversify assets, we need to know if resources have been over-allocated to a specific sector and, consequently, under-allocated to others. Cluster analysis can help to answer this. A pertinent question is how to best measure the difference or similarity between stocks. One way would be to estimate correlations between stocks. This approach has its own weaknesses, some of which have been discussed in earlier sections. Even if we had a statistically significant set of observations, we are faced with the problem of changing correlations during the course of a year due to structural and regime shifts caused by intermittent periods of stress. Even in the absence of stress, correlations can break down or change due to factors that are endogenous to individual stocks.

We can estimate similarity and visualize clusters using histogram analysis. However, histograms eliminate temporal information. To overcome this constraint, we used Spectral Clustering, which is a machine learning technique that explores cluster formation without neglecting temporal information.
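A minimal sketch of this approach is shown below: a correlation-based affinity matrix is built from the full return time series and passed to scikit-learn’s SpectralClustering. The simulated “sector” blocks are placeholders; real ticker-level return vectors would take their place.

```python
# Sketch: spectral clustering of stocks from the full time series of returns,
# using a correlation-based affinity matrix. Data here are simulated blocks.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(6)
n_days, n_stocks = 250, 12
sector = np.repeat([0, 1, 2], n_stocks // 3)                 # 3 hidden "sectors"
sector_moves = rng.normal(0, 0.01, size=(n_days, 3))
returns = sector_moves[:, sector] + rng.normal(0, 0.005, size=(n_days, n_stocks))

corr = np.corrcoef(returns, rowvar=False)
affinity = (corr + 1) / 2                                    # map [-1, 1] to [0, 1]

labels = SpectralClustering(
    n_clusters=3, affinity="precomputed", random_state=0
).fit_predict(affinity)

print("True sectors:   ", sector)
print("Cluster labels: ", labels)
```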

Figures 5 to 7 display preliminary results from our cluster analysis. Analyses like this will enable portfolio managers to recognize clustering patterns, and their strengths, within their portfolios. They will also help guide decisions on reweighting portfolio components and diversification.

Figures 5-7: Cluster Analyses

Figure 5. Cluster analysis of a limited set of stocks is shown here. The labels indicate the names of the firms. Clusters are illustrated by various colored bullets, and increasing distances indicate decreasing similarities. Within clusters, stronger affinities are indicated by greater connecting line weights.

The following figures display magnified views of individual clusters.

Figure 6. We can see that Procter & Gamble, Kimberly Clark and Colgate Palmolive form a cluster (top left, dark green bullets). Likewise, Bank of America, Wells Fargo and Goldman Sachs form a cluster (top right, light green bullets). This is not surprising as these two clusters represent two sectors: consumer products and banking. Line weights are correlated to affinities within sectors.

Figure 7. The cluster on the left displays stocks in the technology sector, while the clusters on the right represent firms in the defense industry (top) and the energy sector (bottom).

In this post, we raised questions about standard assumptions that are made when analyzing portfolios. We also suggested possible solutions from machine learning literature. We subsequently analyzed one year’s worth of returns of SPY to identify clusters and their strengths and discussed the value of such an analysis to portfolio managers in evaluating risk and reweighting or diversifying their portfolios.


Mitigating EUC Risk Using Model Validation Principles

The challenge of simply gauging the risk associated with “end user computing” applications (EUCs), let alone managing it, is both alarming and overwhelming. Scanning tools designed to detect EUCs can routinely turn up tens of thousands of potential files, even at not especially large financial institutions. Despite the risks inherent in using EUCs for mission-critical calculations, EUCs are prevalent in nearly every institution due to their ease of use and wide-ranging functionality.

This reality has spurred a growing number of operational risk managers to action. And even though EUCs, by definition, do not rise to the level of models, many of these managers are turning to their model risk departments for assistance. This is sensible in many cases because the skills associated with effectively validating a model translate well to reviewing an EUC for reasonableness and accuracy.  Certain model risk management tools can be tailored and scaled to manage burgeoning EUC inventories without breaking the bank.

Identifying an EUC

One risk of reviewing EUCs using personnel accustomed to validating models is the tendency of model validators to do more than is necessary. Subjecting an EUC to a full battery of effective challenges, conceptual soundness assessments, benchmarking, back-testing, and sensitivity analyses is not an efficient use of resources, nor is it typically necessary. To avoid this level of overkill, reviewers ought to be able to quickly recognize when they are looking at an EUC and when they are looking at something else.

Sometimes the simplest definitions work best: an EUC is a spreadsheet.

While neither precise, comprehensive, nor 100 percent accurate, that definition is a reasonable approximation. Not every EUC is a spreadsheet (some are Access databases) but the overwhelming majority of EUCs we see are Excel files. And not every Excel file is an EUC—conference room schedules and other files in Excel that do not do any serious calculating do not pose EUC risk. Some Excel spreadsheets are models, of course, and if an EUC review discovers quantitative estimates in a spreadsheet used to compute forecasts, then analysts should be empowered to flag such applications for review and possible inclusion in the institution’s formal model inventory. Once the dust has settled, however, the final EUC inventory is likely to contain almost exclusively spreadsheets.

Building an EUC Inventory

EUCs are not models, but much of what goes into building a model inventory applies equally well to building an EUC inventory. Because the overwhelming majority of EUCs are Excel files, the search for latent EUCs typically begins with an automated search for files with .xls and .xlsx extensions. Many commercially available tools conduct these sorts of scans. The exercise typically returns an extensive list of files that must be sifted through.

Simple analytical tools, such as Excel’s “Inquire” add-in, are useful for identifying the number and types of unique calculations in a spreadsheet as well as a spreadsheet’s reliance on external data sources. Spreadsheets with no calculations can likely be excluded from further consideration for the EUC inventory. Likewise, spreadsheets with no data connections (i.e., links to or from other spreadsheets) are unlikely to qualify for the EUC inventory because such files do not typically have significant downstream impact. Spreadsheets with many tabs and hundreds of unique calculations are likely to qualify as EUCs (if not as models), regardless of their specific use.
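For .xlsx files specifically, a rough first-pass triage can also be scripted outside of Excel. The sketch below uses the openpyxl package to count formula cells and to flag formulas that appear to reference other workbooks (a bracket in the formula text is the usual marker of an external workbook reference). The directory name is a placeholder, and this crude heuristic is an assumption-laden stand-in for purpose-built scanning tools, not a replacement for them.

```python
# Sketch: crude triage of candidate EUC spreadsheets. Counts formula cells and
# flags formulas that appear to reference external workbooks. Legacy .xls files
# require a different reader; only .xlsx files are scanned here.
from pathlib import Path
from openpyxl import load_workbook

def triage_spreadsheet(path: Path) -> dict:
    wb = load_workbook(path, data_only=False)
    formula_count, external_refs = 0, 0
    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                if cell.data_type == "f":            # cell contains a formula
                    formula_count += 1
                    if "[" in str(cell.value):       # e.g. ='[Other.xlsx]Sheet1'!A1
                        external_refs += 1
    return {"file": path.name, "sheets": len(wb.worksheets),
            "formulas": formula_count, "external_refs": external_refs}

# Scan a (placeholder) directory tree for Excel files and rank by formula count.
candidates = [triage_spreadsheet(p) for p in Path("shared_drive").rglob("*.xlsx")]
for info in sorted(candidates, key=lambda d: d["formulas"], reverse=True):
    print(info)
```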

Most spreadsheets fall somewhere between these two extremes. In many cases, questioning the owners and users of identified spreadsheets is necessary to determine how each is used and to ascertain the institutional risks if the spreadsheet does not work as intended. When making inquiries of spreadsheet owners, open-ended questions may not always be as helpful as those designed to elicit a narrow band of responses. Instead of asking, “What is this spreadsheet used for?” a more effective question would be, “What other systems and files does this spreadsheet populate?”

Answers to these sorts of questions aid not only in determining whether a spreadsheet qualifies as an EUC but the risk-rating of the EUC as well.

Testing Requirements

For now, regulator interest in seeing that EUCs are adequately monitored and controlled appears to be outpacing any formal guidance on how to go about doing it.

Absent such guidance, many institutions have started approaching EUC testing like a limited-scope model validation. Effective reviews include a documentation review, a tie-out of input data to authorized, verified sources, an examination of formulas and coding, a form of benchmarking, and an overview of spreadsheet governance and controls.

Documentation Review

Not unlike a model, each EUC should be accompanied by documentation that explains its purpose and how it accomplishes what it intends to do. Documentation should describe the source of input data and what the EUC does with it. Sufficient information should be provided for a reasonably informed reviewer to re-create the EUC based solely on the documentation. If a reviewer must guess the purpose of any calculation, then the EUC’s documentation is likely deficient.

Input Review

The reviewer should be able to match input data in the EUC back to an authoritative source. This review can be performed manually; however, any automated lookups used to pull data in from other files should be thoroughly reviewed, as well.

Formula and Function Review

Each formula in the EUC should be independently reviewed to verify that it is consistent with its documented purposes. Reviewers do not need to test the functionality of Excel—e.g., they do not need to test arithmetic functions on a calculator—however, formulas and functions should be reviewed for reasonableness.

Benchmarking

A model validation benchmarking exercise generally consists of comparing the subject model’s forecasts with those of a challenger model designed to do the same thing, but perhaps in a different way. Benchmarking an EUC, in contrast, typically involves constructing an independent spreadsheet based on the EUC documentation and making sure it returns the same answers as the EUC.

Governance and Controls

An EUC should ideally be subjected to the same controls requirements as a model. Procedures designed to ensure process checks, access and change control management, output reconciliation, and tolerance levels should be adequately documented.

The extent to which these tools should be applied depends largely on how much risk an EUC poses. Properly classifying EUCs as high-, medium-, or low-risk during the inventory process is critical to determining how much effort to invest in the review.

Other model validation elements, such as back-testing, stress testing, and sensitivity analysis, are typically not applicable to an EUC review. Because EUCs are not predictive by definition, these sorts of analyses are not likely to bring much value to an EUC review.

Striking an appropriate balance — leveraging effective model risk management principles without doing more than needs to be done — is the key to ensuring that EUCs are adequately accounted for, well controlled, and functioning properly without incurring unnecessary costs.


The Non-Agency MBS Market: Re-Assessing Securitization Market Conditions

Since the financial crisis began in 2007, issuance in the “Non-Agency” MBS market, i.e., securities neither issued nor guaranteed by Fannie Mae, Freddie Mac, or Ginnie Mae, has been sporadic and has not rebounded to pre-crisis levels. In recent months, however, activity by large financial institutions, such as AIG and Wells Fargo, has indicated a return to the issuance of Non-Agency MBS. What is contributing to the current state of the securitization market for high-quality mortgage loans? Does the recent, limited-scale return to issuance by these institutions signal an increase in private securitization activity in this sector of the securitization market? If so, what is sparking this renewed interest?

 

The MBS Securitization Market

Three entities – Ginnie Mae, Fannie Mae, and Freddie Mac – have been the dominant engine behind mortgage-backed securities (MBS) issuance since 2007. These entities, two of which remain in federal government conservatorship and the third a federal government corporation, have maintained the flow of capital from investors into guaranteed MBS and ensured that mortgage originators have adequate funds to originate certain types of single-family mortgage loans.

Virtually all mortgage loans backed by federal government insurance or guaranty programs, such as those offered by the Federal Housing Administration and the Department of Veterans Affairs, are issued in Ginnie Mae pools. Mortgage loans that are not eligible for these programs are referred to as “Conventional” mortgage loans. In the current market environment, most Conventional mortgage loans are sold to Fannie Mae and Freddie Mac (i.e. “Conforming” loans) and are securitized in Agency-guaranteed pass-through securities.

 

The Non-Agency MBS Market

Not all Conventional mortgage loans are eligible for purchase by Fannie Mae or Freddie Mac, however, due to collateral restrictions (i.e., their loan balances are too high or they do not meet certain underwriting requirements). These are referred to as “Non-Conforming” loans and, for most of the past decade, have been held in portfolio at large financial institutions, rather than placed in private, Non-Agency MBS. The Non-Agency MBS market is further divided into sectors for “Qualified Mortgage” (QM) loans, non-QM loans, re-performing loans and nonperforming loans. This post deals with the securitization of QM loans through Non-Agency MBS programs.

Since the crisis, Non-Agency MBS issuance has been the exclusive province of JP Morgan and Redwood Trust, both of which continue to issue a relatively small number of deals each year. The recent entry of AIG into the Non-Agency MBS market, combined with Wells Fargo’s announcement that it intends to begin issuing as well, makes this a good time to discuss why institutions with other funding sources available to them are now moving back into this sector of the securitization market.

 

Considerations for Issuing QM Loans

Three potential considerations may lead financial institutions to investigate issuing QM Loans through Non-Agency MBS transactions:

  • “All-In” Economics
  • Portfolio Concentration or Limitations
  • Regulatory Pressures

“All-In” Economics

Over the long term, mortgage originators gravitate to funding sources that provide the lowest cost to borrowers and the greatest profitability for their firms. To improve the “all-in” economics of a Non-Agency MBS transaction, investment banks work closely with issuers to broaden the investor base for each level of the securitization capital structure. Partly due to the success of the Fannie Mae and Freddie Mac Credit Risk Transfer transactions, there appears to be significant interest in higher-yielding mortgage-related securities at the lower-rated (i.e., higher-risk) end of the securitization capital structure. This search for higher-yielding assets has also increased demand for lower-rated securities in the Non-Agency MBS sector.

However, demand from investors at the higher-rated end of the securitization capital structure (i.e. ‘AAA’ and ‘AA’ securities) has not resulted in “all-in” economics for a Non-Agency MBS transaction that surpass the economics of balance sheet financing provided by portfolios funded with low deposit rates or low debt costs. If deposit rates and debt costs remain at historically low levels, the portfolio funding alternative will remain attractive. Notwithstanding the low interest rate environment, some institutions may develop operational capabilities for Non-Agency MBS programs as a risk mitigation process for future periods where balance sheet financing alternatives may not be as beneficial.

 

Portfolio Concentration or Limitations

Due to the lack of robust investor demand and unfavorable economics in Non-Agency MBS, many banks have increased their portfolio exposure to both fixed-rate and intermediate-adjustable-rate QM loans. The ability to hold these mortgage loans in portfolio has provided attractive pricing to a key customer demographic and earned an attractive net interest margin during the historically low-rate environment. While bank portfolios have provided an attractive funding source for Non-Agency QM loans, some financial institutions may attempt to develop diversified funding sources in response to regulatory pressure or self-imposed portfolio concentration limits. Selling existing mortgage portfolio assets into the Non-Agency MBS securitization market is one way in which financial institutions might choose to reduce concentrated mortgage risk exposure.

 

Regulatory Pressure

Some financial institutions may be under pressure from their regulators to demonstrate their ability to sell assets out of their mortgage portfolio as a contingency plan. Issuing in the Non-Agency MBS market is one way of complying with these sorts of regulatory requests. Developing a contingent ability to tap the Non-Agency MBS market builds operational capabilities under less critical circumstances while providing a realistic assessment of the time the institution would need to liquidate such assets through securitization. Establishing these securitization capabilities early is a prudent activity for institutions that foresee the possibility of securitization as a future funding option.

While the Non-Agency MBS market has been dormant for most of the past decade, some financial institutions that have relied upon portfolio funding now appear to be testing its current viability. With continued issuance by JP Morgan and Redwood Trust and new entrants such as AIG and Wells Fargo, other mortgage originators would be wise to take notice of these events, monitor activity in this market, and assess whether securitization has the potential to provide an alternative funding source for their Non-Conforming QM loans and future lending activity.

In our next article on the Non-Agency MBS market, we will review the changes in due diligence practices, loan-level data disclosures, the representation and warranty framework, and the ratings process made by securitization market participants and the impact of these changes on the Non-Agency MBS market segment.

