
What is an “S-Curve” and Does it Matter if it Varies by Servicer?

Mortgage analysts refer to graphs plotting prepayment rates against the interest rate incentive for refinancing as “S-curves” because the resulting curve typically (vaguely) resembles an “S.” The curve takes this shape because prepayment rates vary positively with refinance incentive, but not linearly. Very few borrowers refinance without an interest rate incentive for doing so. Consequently, on the left-hand side of the graph, where the refinance incentive is negative or out of the money, prepayment speeds are both low and fairly flat. This is because a borrower with a rate 1.0% lower than market rates is not very much more likely to refinance than a borrower with a rate 1.5% lower. They are both roughly equally unlikely to do so.

As the refinance incentive crosses over into the money (i.e., when prevailing interest rates fall below rates the borrowers are currently paying), the prepayment rate spikes upward, as a significant number of borrowers take advantage of the opportunity to refinance. But this spike is short-lived. Once the refinance incentive gets above 1.0% or so, prepayment rates begin to flatten out again. This reflects a segment of borrowers that do not refinance even when they have an interest rate incentive to do so. Some of these borrowers have credit or other issues preventing them from refinancing. Others are simply disinclined to go through the trouble. In either case, the growing refinance incentive has little impact and the prepayment rate flattens out.

These two bends—moving from non-incentivized borrowers to incentivized borrowers and then from incentivized borrowers to borrowers who can’t or choose not to refinance—are what gives the S-curve its distinctive shape.
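For readers who prefer a concrete picture, the characteristic shape can be reproduced with a simple logistic function. The sketch below is purely illustrative; the parameter values are hypothetical, not fitted to any deal, and it assumes Python with numpy and matplotlib available.

```python
import numpy as np
import matplotlib.pyplot as plt

# Refinance incentive in percentage points (negative = out of the money)
incentive = np.linspace(-2.0, 3.0, 200)

# Illustrative S-curve: a logistic function scaled to CPR levels.
# base_cpr captures turnover-driven prepayments with no incentive;
# max_pickup, steepness, and midpoint are hypothetical shape parameters.
base_cpr = 6.0      # CPR (%) for deeply out-of-the-money loans
max_pickup = 30.0   # additional CPR (%) for deeply in-the-money loans
steepness = 4.0     # controls how sharply speeds rise near the money
midpoint = 0.5      # incentive (%) at which half the pickup is realized

cpr = base_cpr + max_pickup / (1.0 + np.exp(-steepness * (incentive - midpoint)))

plt.plot(incentive, cpr)
plt.axvline(0.0, linestyle="--", linewidth=0.8)   # the "at the money" point
plt.xlabel("Refinance incentive (%)")
plt.ylabel("Prepayment rate (CPR, %)")
plt.title("Stylized S-curve")
plt.show()
```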

Figure 1: S-Curve Example

An S-Curve Example – Servicer Effects

Interestingly, the shape of a deal’s S-curve tends to vary depending on who is servicing the deal. Many things contribute to this difference, including how actively servicers market refinance opportunities. How important is it to be able to evaluate and analyze the S-curves for the servicers specific to a given deal? It depends, but it could be imperative.

In this example, we’ll analyze a subset of the collateral (“Group 4”) supporting a recently issued Fannie Mae deal, FNR 2017-11. This collateral consists of four Fannie multi-issuer pools of recently originated jumbo-conforming loans with a current weighted average coupon (WAC) of 3.575% and a weighted average maturity (WAM) of 348 months. The table below shows the breakout of the top six servicers in these four pools based on the combined balance.

Figure 2: Breakout of Top Six Servicers

Over half (54%) of the Group 4 collateral is serviced by these six servicers. To begin the analysis, we pulled all jumbo-conforming, 30-year loans originated between 2015 and 2017 for the six servicers and bucketed them based on their refi incentive. A longer timeframe is used to ensure that there are sufficient observations at each point. The graph below shows the prepayment rate relative to the refi incentive for each of the servicers as well as the universe.

Figure 3: S-curve by Servicer

For loans that are at the money—i.e., the point at which the S-curve would be expected to begin spiking upward—only those serviced by IMPAC prepay materially faster than the entire cohort. However, as the refi incentive increases, IMPAC, Seneca Mortgage, and New American Funding all experience a sharp pick-up in speeds while loans serviced by Pingora, Lakeview, and Wells behave comparably to the market.

The last step is to compute the weighted average S-curve for the top six servicers using the current UPB percentages as the weights, shown in Figure 4 below. On the basis of the individual servicer observations, prepays for out-of-the-money loans should mirror the universe, but as loans become more refinanceable, speeds should accelerate faster than the universe. The difference between the six-servicer average and the universe reaches a peak of approximately 4% CPR between 50 bps and 100 bps in the money. This is valuable information for framing expectations for future prepayment rates. Analysts can calibrate prepayment models (or their outputs) to account for observed differences in CPRs that may be attributable to the servicer, rather than loan characteristics.
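A minimal sketch of that weighting step is shown below. The column names, CPR values, and UPB shares are hypothetical placeholders rather than figures from the deal above; the point is simply the mechanics of a UPB-weighted average by incentive bucket.

```python
import pandas as pd

# Hypothetical inputs: per-servicer CPRs by incentive bucket and current UPB weights.
scurves = pd.DataFrame({
    "servicer":  ["A", "A", "A", "B", "B", "B"],
    "incentive": [-0.5, 0.5, 1.0, -0.5, 0.5, 1.0],    # refi incentive bucket (%)
    "cpr":       [6.0, 14.0, 25.0, 7.0, 18.0, 32.0],  # observed CPR (%)
})
weights = pd.Series({"A": 0.60, "B": 0.40})  # share of current UPB by servicer

# Weighted-average S-curve across servicers, weighting each bucket's CPR by UPB share.
scurves["weight"] = scurves["servicer"].map(weights)
weighted = (
    scurves.assign(weighted_cpr=scurves["cpr"] * scurves["weight"])
           .groupby("incentive")[["weighted_cpr", "weight"]]
           .sum()
)
weighted["wavg_cpr"] = weighted["weighted_cpr"] / weighted["weight"]
print(weighted["wavg_cpr"])
```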

Figure 4: Weighted Average vs. Universe

This analysis was generated using RiskSpan’s data and analytics platform, RS Edge.


Validating Interest Rate Models

Many model validations—particularly validations of market risk models, ALM models, and mortgage servicing rights valuation models—require validators to evaluate an array of sub-models. These almost always include at least one interest rate model, which is designed to predict the movement of interest rates.

Validating interest rate models (i.e., short-rate models) can be challenging because many different ways of modeling how interest rates change over time (“interest rate dynamics”) have been developed over the years. Each approach has advantages and shortcomings, and it is critical to understand the limitations and advantages of each in order to determine whether the short-rate model being used is appropriate to the task. This can be accomplished via the basic tenets of model validation—evaluation of conceptual soundness, replication, benchmarking, and outcomes analysis. Applying these concepts to interest rate models, however, poses some unique complications.

A Brief Introduction to the Short-Rate Model

In general, a short-rate model describes the evolution of the short rate as a stochastic differential equation. Short-rate models can be categorized based on their interest rate dynamics.

A one-factor short-rate model has only one diffusion term. The biggest limitation of one-factor models is that the correlation between continuously compounded spot rates at any two maturities is equal to one, which means a shock at one maturity is transmitted uniformly across the entire curve. This is not realistic in the market.

A multi-factor short-rate model, as its name implies, contains more than one diffusion term. Unlike one-factor models, multi-factor models consider the correlation between forward rates, which makes a multi-factor model more realistic and consistent with actual multi-dimensional yield curve movements.

Validating Conceptual Soundness

Validating an interest rate model’s conceptual soundness includes reviewing its data inputs, mean-reversion feature, distributions of short rate, and model selection. Reviewing these items sufficiently requires a validator to possess a basic knowledge of stochastic calculus and stochastic differential equations.

Data Inputs

The fundamental data inputs to the interest rate model could be the zero-coupon curve (also known as the term structure of interest rates) or historical spot rates. Let’s take the Hull-White (H-W) one-factor model, dr_t = k(θ – r_t)dt + σ_t dW_t, as an example. H-W is an affine term structure model, and analytical tractability is one of its most favorable properties. Analytical tractability is a valuable feature to model validators because it enables calculations to be replicated. We can calibrate the level parameter (θ) and the rate parameter (k) from the input curve. Commonly, the volatility parameter (σ_t) is calibrated from historical data or swaption volatilities. In addition, analytical formulas are also available for zero-coupon bonds, caps/floors, and European swaptions.
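To make the dynamics concrete, here is a minimal simulation sketch assuming constant θ and σ (the Vasicek special case of H-W) and an Euler discretization. All parameter values are illustrative rather than calibrated to any market data.

```python
import numpy as np

def simulate_short_rate(r0, k, theta, sigma, dt, n_steps, n_paths, seed=0):
    """Euler simulation of dr_t = k(theta - r_t)dt + sigma dW_t
    (Hull-White with constant theta and sigma, i.e., the Vasicek special case)."""
    rng = np.random.default_rng(seed)
    rates = np.empty((n_paths, n_steps + 1))
    rates[:, 0] = r0
    for i in range(n_steps):
        dw = rng.standard_normal(n_paths) * np.sqrt(dt)
        rates[:, i + 1] = rates[:, i] + k * (theta - rates[:, i]) * dt + sigma * dw
    return rates

# Illustrative parameters only.
paths = simulate_short_rate(r0=0.02, k=0.15, theta=0.03, sigma=0.01,
                            dt=1/252, n_steps=252 * 10, n_paths=5000)
print("Mean terminal rate:", paths[:, -1].mean())   # drifts toward theta
```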

Mean Reversion

Given the nature of mean reversion, both the level parameter and the rate parameter should be positive, so an appropriate calibration method should be selected accordingly. Note that the common approaches for the one-factor model—least squares estimation and maximum likelihood estimation—can generate negative estimates, which are inconsistent with the mean-reversion assumption. The model validator should compare the calibration results produced by different methods to determine which best honors the model’s assumptions.
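As an illustration of the least-squares approach, the sketch below regresses r(t+1) on r(t) using the discretized dynamics, backs out k and θ, and checks that both are positive. The rate series is synthetic, and the whole exercise is a simplified stand-in for a production calibration.

```python
import numpy as np

def calibrate_mean_reversion(rates, dt):
    """Least-squares calibration of k and theta from an observed short-rate series,
    using the discretized dynamics r[t+1] = a + b * r[t] + noise,
    where b = 1 - k*dt and a = k*theta*dt."""
    x, y = rates[:-1], rates[1:]
    b, a = np.polyfit(x, y, 1)          # slope, intercept
    k = (1.0 - b) / dt
    theta = a / (1.0 - b) if abs(1.0 - b) > 1e-12 else np.nan
    return k, theta

# Synthetic example; a validator would use the institution's historical rate series.
rng = np.random.default_rng(1)
dt, k_true, theta_true, sigma = 1/252, 0.20, 0.03, 0.01
r = [0.02]
for _ in range(252 * 20):
    r.append(r[-1] + k_true * (theta_true - r[-1]) * dt
             + sigma * np.sqrt(dt) * rng.standard_normal())
k_hat, theta_hat = calibrate_mean_reversion(np.array(r), dt)

# Mean reversion requires k > 0 and theta > 0; flag estimates that violate this.
print(f"k_hat={k_hat:.3f}, theta_hat={theta_hat:.4f}, "
      f"valid={k_hat > 0 and theta_hat > 0}")
```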

Short-Rate Distribution and Model Selection

The distribution of the short rate is another feature to consider when validating short-rate model assumptions. The original short-rate models—Vasicek and H-W, for example—assume the short rate to be normally distributed, allowing for the possibility of negative rates. Because negative rates were not expected to appear in simulated term structures, the Cox-Ingersoll-Ross model (CIR, non-central chi-squared distributed) and Black-Karasinski model (BK, lognormally distributed) were developed to preclude negative rates. Compared to the normally distributed models, the non-normally distributed models forfeit a certain degree of analytical tractability, which makes validating them less straightforward. In recent years, as negative rates became a reality in the market, the shifted lognormal model was introduced. This model depends on a shift size, which determines the lower bound of rates in the simulation process. Note that there is no analytical formula for the shift size. Ideally, the shift size should equal the absolute value of the minimum negative rate in the historical data. However, not every country has experienced negative interest rates, and so the shift size is generally set by the user’s judgment, informed by fundamental analysis.

The model validator should develop a method to quantify the risk from any analytical judgement. Because the interest rate model often serves as a sub-model in a larger module, the model selection should also be commensurate with the module’s ultimate objectives.

Replication

Effective model validation frequently relies on a replication exercise to determine whether a model follows the building procedures stated in its documentation. In general, the model documentation provides the estimation method and assorted data inputs. The model validator could consider recalibrating the parameters from the provided interest rate curve and volatility structures. This process helps the model validator better understand the model, its limitations, and potential problems.

Ongoing Monitoring & Benchmarking

Interest rate models are generally used to simulate term structures in order to price caps/floors and swaptions and measure the hedge cost. Let’s again take the H-W model as an example. Two standard simulation methods are available for the H-W model: 1) Monte Carlo simulation and 2) trinomial lattice method. The model validator could use these two methods to perform benchmarking analysis against one another.

The Monte Carlo simulation is well suited to path-dependent interest rate derivatives. The Monte Carlo method is mathematically easy to understand and convenient to implement. At each time step, a random variable is simulated and added into the interest rate dynamics. A Monte Carlo simulation is usually considered for products that can only be exercised at maturity. Because the Monte Carlo method simulates future rates, we cannot be sure at which time the rate or the value of an option becomes optimal. Hence, a standard Monte Carlo approach cannot be used for derivatives with early-exercise features.

On the other hand, we can price early-exercise products by means of the trinomial lattice method. The trinomial lattice method constructs a trinomial tree under the risk-neutral measure, in which the value at each node can be computed. Given the tree’s backward-induction feature, at each node we can compare the intrinsic value (the value of immediate exercise) with the backward-inducted value (the continuation value) to determine whether to exercise at that node. The comparison continues backward until it reaches the initial node and returns the final estimated value. The trinomial lattice therefore works well for non-path-dependent interest rate derivatives, though a lattice can also be implemented for path-dependent derivatives for benchmarking purposes.

Normally, we would expect the simulated result from the lattice method to be less accurate and more volatile than the result from the Monte Carlo method, because a larger number of simulated paths can be used in the Monte Carlo method, making its result more stable for the same computing cost and time step.

Outcomes Analysis

The most straightforward method for outcomes analysis is to perform sensitivity tests on the model’s key drivers. A standard one-factor short-rate model usually contains three parameters. For the level parameter (θ), we can calibrate the equilibrium rate level from the simulated term structure and compare it with θ. For the mean-reversion speed parameter (k), we can examine the half-life, which equals ln(2)/k, and compare it with the realized half-life from the simulated term structure. For the volatility parameter (σ_t), we would expect a larger volatility to yield a wider spread in the simulated term structure. We can also recalibrate the volatility surface from the simulated term structure to examine whether the number of simulated paths is sufficient to capture the volatility assumption.
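A quick worked check of the half-life relationship, under the same illustrative constant-parameter assumptions used in the simulation sketch above:

```python
import numpy as np

# Sensitivity check on the mean-reversion speed k: the theoretical half-life is
# ln(2)/k, and the mean path E[r_t] = theta + (r0 - theta) * exp(-k*t)
# should close half the gap to theta at that horizon. Parameters are illustrative.
k, theta, r0 = 0.15, 0.03, 0.02
half_life = np.log(2) / k                      # ~4.62 years for k = 0.15

t = np.linspace(0, 30, 3001)
mean_path = theta + (r0 - theta) * np.exp(-k * t)
# First time at which the remaining gap to theta is half the initial gap.
realized = t[np.argmax(np.abs(mean_path - theta) <= 0.5 * abs(r0 - theta))]

print(f"Theoretical half-life: {half_life:.2f} years; realized: {realized:.2f} years")
```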

As mentioned above, an affine term structure model is analytically tractable, which means we can use analytical formulas to price zero-coupon bonds and other interest rate derivatives. Comparing these model results with market prices provides a further check on the functionality of the given short-rate model.
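The sketch below illustrates this kind of check by benchmarking a Monte Carlo zero-coupon bond price against the closed-form affine price. It assumes constant θ and σ (the Vasicek special case), for which the analytical formula below holds; parameters are illustrative.

```python
import numpy as np

k, theta, sigma, r0, T = 0.15, 0.03, 0.01, 0.02, 5.0   # illustrative parameters

# Closed-form price under constant theta/sigma: P(0,T) = A(T) * exp(-B(T) * r0)
B = (1.0 - np.exp(-k * T)) / k
A = np.exp((theta - sigma**2 / (2 * k**2)) * (B - T) - sigma**2 * B**2 / (4 * k))
analytic_price = A * np.exp(-B * r0)

# Monte Carlo price: average of exp(-integral of the simulated short rate).
rng = np.random.default_rng(42)
n_paths, n_steps = 20000, 500
dt = T / n_steps
r = np.full(n_paths, r0)
integral = np.zeros(n_paths)
for _ in range(n_steps):
    integral += r * dt
    r += k * (theta - r) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
mc_price = np.exp(-integral).mean()

print(f"Analytic: {analytic_price:.5f}  Monte Carlo: {mc_price:.5f}")
```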

Conclusion

The popularity of certain types of interest rate models changes as fast as the economy. In order to keep up, it is important to build a wide range of knowledge and continue learning new perspectives. Validation processes that follow the guidelines set forth in the OCC’s and FRB’s Supervisory Guidance on Model Risk Management (OCC 2011-12 and SR 11-7) seek to answer questions about the model’s conceptual soundness, development, process, implementation, and outcomes.  While the details of the actual validation process vary from bank to bank and from model to model, an interest rate model validation should seek to address these matters by asking the following questions:

  • Are the data inputs consistent with the assumptions of the given short-rate model?
  • What distribution does the interest rate dynamics imply for the short-rate model?
  • What kind of estimation method is applied in the model?
  • Is the model analytically tractable? Are there explicit analytical formulas for zero-coupon bond or bond-option from the model?
  • Is the model suitable for the Monte Carlo simulation or the lattice method?
  • Can we recalibrate the model parameters from the simulated term structures?
  • Does the model address the needs of its users?

These are the fundamental questions that we need to think about when we are trying to validate any interest rate model. Combining these with additional questions specific to the individual rate dynamics in use will yield a robust validation analysis that will satisfy both internal and regulatory demands.


AML Model Validation: Effective Process Verification Requires Thorough Documentation

Increasing regulatory scrutiny due to the catastrophic risk associated with anti-money-laundering (AML) non-compliance is prompting many banks to tighten up their approach to AML model validation. Because AML applications would be better classified as highly specialized, complex systems of algorithms and business rules than as “models,” applying model validation techniques to them presents some unique challenges that make documentation especially important.

In addition to devising effective challenges to determine the “conceptual soundness” of an AML system and whether its approach is defensible, validators must determine the extent to which various rules are firing precisely as designed. Rather than commenting on the general reasonableness of outputs based on back-testing and sensitivity analysis, validators must rely more heavily on a form of process verification that requires precise documentation.

Vendor Documentation of Transaction Monitoring Systems

Above-the-line and below-the-line testing—the backbone of most AML transaction monitoring testing—amounts to a process verification/replication exercise. For any model replication exercise to return meaningful results, the underlying model must be meticulously documented. If not, validators are left to guess at how to fill in the blanks. For some models, guessing can be an effective workaround. But it seldom works well when it comes to a transaction monitoring system and its underlying rules. Absent documentation that describes exactly what rules are supposed to do, and when they are supposed to fire, effective replication becomes nearly impossible.
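To illustrate what such a replication exercise can look like, here is a minimal sketch built around a hypothetical cash-structuring rule ("alert any customer whose aggregate cash deposits over a 5-day look-back reach $10,000"). The rule, threshold, and data are invented for illustration and are not drawn from any vendor's documentation.

```python
import pandas as pd

THRESHOLD = 10_000   # hypothetical production threshold
LOOKBACK = "5D"      # hypothetical 5-day look-back window

txns = pd.DataFrame({
    "customer": ["C1", "C1", "C1", "C2", "C2"],
    "date": pd.to_datetime(["2023-01-02", "2023-01-04", "2023-01-05",
                            "2023-01-03", "2023-01-20"]),
    "cash_in": [4_000, 3_500, 2_600, 9_000, 500],
}).sort_values("date")

def replicate_rule(df, threshold=THRESHOLD, lookback=LOOKBACK):
    """Rolling cash-in total per customer; alert when it meets the threshold."""
    rolled = (df.set_index("date")
                .groupby("customer")["cash_in"]
                .rolling(lookback).sum()
                .reset_index(name="rolling_cash_in"))
    rolled["alert"] = rolled["rolling_cash_in"] >= threshold
    return rolled

print(replicate_rule(txns))

# Above-the-line / below-the-line testing then re-runs the replication at
# thresholds slightly above and below the production setting and reconciles
# the resulting alert populations against the vendor system's output.
```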

Anyone who has validated an AML transaction monitoring system knows that these systems come with a truckload of documentation. Vendor documentation is often quite thorough and does a reasonable job of laying out the solution’s approach to assessing transaction data and generating alerts. It typically explains how relevant transactions are identified, what suspicious activity each rule is seeking to detect, and (usually) provides a reasonably detailed description of the algorithms and logic each rule applies.

This information provided by the vendor is valuable and critical to a validator’s ability to understand how the solution is intended to work. But because so much more is going on than what can reasonably be captured in vendor documentation, it alone provides insufficient information to devise above-the-line and below-the-line testing that will yield worthwhile results.

Why An AML Solution’s Vendor Documentation is Not Enough

Every model validator knows that model owners must supplement vendor-supplied documentation with their own. This is especially true with AML solutions, in which individual user settings—thresholds, triggers, look-back periods, white lists, and learning algorithms—are arguably more crucial to the solution’s overall performance than the rules themselves.

Comprehensive model owner documentation helps validators (and regulatory supervisors) understand not only that AML rules designed to flag suspicious activity are firing correctly, but also that each rule is sufficiently understood by those who use the solution. It also provides the basis for a validator’s testing that rules are calibrated reasonably. Testing these calibrations is analogous to validating the inputs and assumptions of a predictive model. If they are not explicitly spelled out, then they cannot be evaluated.

Here are some examples.

Transaction Input Transformations

Details about how transaction data streams are mapped, transformed, and integrated into the AML system’s database vary by institution and cannot reasonably be described in generic vendor documentation. Consequently, owner documentation needs to fully describe this. To pass model validation muster, the documentation should also describe the review process for input data and field mapping, along with all steps taken to correct inaccuracies or inconsistencies as they are discovered.

Mapping and importing AML transaction data is sometimes an inexact science. To mitigate risks associated with missing fields and customer attributes, risk-based parameters must be established and adequately documented. This documentation enables validators who test the import function to go into the analysis with both eyes open. Validators must be able to understand the circumstances under which proxy data is used in order to make sound judgments about the reasonableness and effectiveness of established proxy parameters and how well they are being adhered to. Ideally, documentation pertaining to transaction input transformation should describe the data validations that are performed and define any error messages that the system might generate.

Risk Scoring Methodologies and Related Monitoring

Specific methodologies used to risk score customers and countries and assign them to various lists (e.g., white, gray, or black lists) also vary enough by institution that vendor documentation cannot be expected to capture them. Processes and standards employed in creating and maintaining these lists must be documented. This documentation should include how customers and countries get on these lists to begin with, how frequently they are monitored once they are on a list, what form that monitoring takes, the circumstances under which they can move between lists, and how these circumstances are ascertained. These details are often known and usually coded (to some degree) in BSA department procedures. This is not sufficient. They should be incorporated in the AML solution’s model documentation and include data sources and a log capturing the history of customers and countries moving to and from the various risk ratings and lists.

Output Overrides

Management overrides are more prevalent with AML solutions than with most models. This is by design. AML solutions are intended to flag suspicious transactions for review, not to make a final judgment about them. That job is left to BSA department analysts. Too often, important metrics about the work of these analysts are not used to their full potential. Regular analysis of these overrides should be performed and documented so that validators can evaluate AML system performance and the justification underlying any tuning decisions based on the frequency and types of overrides.

Successful AML model validations require rule replication, and incompletely documented rules simply cannot be replicated. Transaction monitoring is a complicated, data-intensive process, and getting everything down on paper can be daunting, but AML “model” owners can take stock of where they stand by asking themselves the following questions:

  1. Are my transaction monitoring rules documented thoroughly enough for a qualified third-party validator to replicate them? (Have I included all systematic overrides, such as white lists and learning algorithms?)
  2. Does my documentation give a comprehensive description of how each scenario is intended to work?
  3. Are thresholds adequately defined?
  4. Are the data and parameters required for flagging suspicious transactions described well enough to be replicated?

If the answer to all these questions is yes, then AML solution owners can move into the model validation process reasonably confident that the state of their documentation will not be a hindrance to the AML model validation process.


Machine Learning and Portfolio Performance Analysis

Attribution analysis of portfolios typically aims to discover the impact that a portfolio manager’s investment choices and strategies had on overall profitability. It can help determine whether success was the result of an educated choice or simply good luck. Usually a benchmark is chosen and the portfolio’s performance is assessed relative to it.

This post, however, considers the question of whether a non-referential assessment is possible. That is, can we deconstruct and assess a portfolio’s performance without employing a benchmark? Such an analysis would require access to historical returns as well as the portfolio’s weights and perhaps the volatility of interest rates, if some of the components exhibit a dependence on them. This list of required variables is by no means exhaustive.

There are two prevalent approaches to attribution analysis—one based on factor models and the other on return decomposition. The factor model approach considers the equities in a portfolio at a single point in time and attributes performance to various macro- and micro-economic factors prevalent at that time. The effects of these factors are aggregated at the portfolio level and a qualitative assessment is done. Return decomposition, on the other hand, explores the manner in which positive portfolio returns are achieved across time. The principal drivers of performance are separated and further analyzed. In addition to a year’s worth of time series data for the variables listed in the previous paragraph, covariance, correlation, and cluster analyses and other mathematical methods would likely be required.

Normality Assumption

Is the normality assumption for stock returns fully justified? Are sample means and variances good proxies for population means and variances? This assumption is worth testing because Normality and the Central Limit Theorem are widely assumed when dealing with financial data. The Delta-Normal Value at Risk (VaR) method, which is widely used to compute portfolio VaR, assumes that stock returns and allied risk factors are normally distributed. Normality is also implicitly assumed in financial literature. Consider the distribution of S&P returns from May 1980 to May 2017 displayed in Figure 1.

Figure One: Distribution of S&P Returns

Panel (a) is a histogram of S&P daily returns from January 2001 to January 2017. The red curve is a Gaussian fit. Panel (b) shows the same data on a semi-log plot (logarithmic Y axis). The semi-log plot emphasizes the tail events.

The returns displayed in the left panel of Figure 1 have a higher central peak and the “shoulders” are somewhat wider than what is predicted by the Gaussian fit. This mismatch in the tails is more visible in the semi-log plot shown in panel (b). This demonstrates that a normal distribution is probably not a very accurate assumption. Sigma, the standard deviation, is typically used as a measure of the relative magnitude of market moves and as a rough proxy for the occurrence of such events. The normal distribution places the odds of a minus-5 sigma swing at only 2.86×10⁻⁵%. In other words, assuming 252 trading days per year, a drop of this magnitude should occur once in every 13,000 years! However, an examination of S&P returns over the 37-year period cited shows drops of 5 standard deviations or greater on 15 occasions. Assuming a normal distribution would consistently underestimate the occurrence of tail events.
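The tail probability above is straightforward to reproduce; a quick check, assuming scipy is available:

```python
from scipy.stats import norm

# Probability of a move at or below -5 standard deviations under a normal
# distribution, and the implied waiting time assuming 252 trading days per year.
p = norm.cdf(-5.0)                       # about 2.9e-7, i.e. about 2.9e-5 percent
years_between_events = 1.0 / (p * 252)   # roughly 14,000 years
print(f"P(move <= -5 sigma) = {p:.2e}; roughly once every {years_between_events:,.0f} years")
```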

We conducted a subsequent analysis focusing on the daily returns of SPY, a popular exchange-traded fund (ETF). This ETF tracks 503 component instruments. Using returns from July 1, 2016 through June 30, 2017, we tested each component instrument’s return vector for normality using the Chi-Square Test, the Kurtosis estimate, and a visual inspection of the Q-Q plot. Brief explanations of these methods are provided below.

Chi-Square Test

This is a goodness-of-fit test that assumes a specific data distribution (Null hypothesis) and then tests that assumption. The test evaluates the deviations of the model predictions (Normal distribution, in this instance) from empirical values. If the resulting computed test statistic is large, then the observed and expected values are not close and the model is deemed a poor fit to the data. Thus, the Null hypothesis assumption of a specific distribution is rejected.

Kurtosis

The kurtosis of any univariate Normal distribution is 3. Any deviation from this value implies that the data distribution is correspondingly non-Normal. An example is illustrated in Figures 2, 3, and 4, below.

Q-Q Plot

Quantile-quantile (QQ) plots are graphs on which quantiles from two distributions are plotted relative to each other. If the distributions correspond, then the plot appears linear. This is a visual assessment rather than a quantitative estimation. A sample set of results is shown in Figures 2, 3, and 4, below.
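A minimal sketch of the three checks described above, applied to a single return vector, is shown below. The synthetic fat-tailed data stands in for a component instrument's daily returns, and the bin count and significance threshold are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(7)
returns = rng.standard_t(df=4, size=250) * 0.01   # fat-tailed synthetic returns

# 1) Chi-square goodness-of-fit test against a fitted normal distribution.
counts, edges = np.histogram(returns, bins=10)
mu, sd = returns.mean(), returns.std(ddof=1)
expected = np.diff(stats.norm.cdf(edges, loc=mu, scale=sd)) * len(returns)
expected *= counts.sum() / expected.sum()          # rescale so totals match
chi2_stat, p_value = stats.chisquare(counts, expected)
print(f"Chi-square p-value: {p_value:.4f}")        # small p-value -> reject normality

# 2) Kurtosis (Pearson definition, so a normal distribution scores 3).
print(f"Kurtosis: {stats.kurtosis(returns, fisher=False):.2f}")

# 3) Q-Q plot against the normal distribution (visual check).
stats.probplot(returns, dist="norm", plot=plt)
plt.show()
```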

Figure Two: Year’s Returns for Exxon

Figure 2. The left panel shows the histogram of a year’s returns for Exxon (XOM). The null hypothesis was rejected with the conclusion that the data is not normally distributed. The kurtosis was 6 which implies a deviation from normality. The Q-Q plot in the right panel reinforces these conclusions.

Figure Three: Year’s Returns for Boeing

Figure 3. The left panel shows the histogram of a year’s returns for Boeing (BA). The data is not normally distributed and shows a significant skewness also. The kurtosis was 12.83 and implies a significant deviation from normality. The Q-Q plot in the right panel confirms this.

For the sake of comparison, we also show returns that exhibit normality in the next figure.

Figure Four: Year’s Returns for Xerox

The left panel shows the histogram of a year’s returns for Xerox (XRX). The data is normally distributed, which is apparent from a visual inspection of both panels. The kurtosis was 3.23 which is very close to the value for a theoretical normal distribution.

Machine learning literature has several suggestions for addressing this problem, including Kernel Density Estimation and Mixture Density Networks. If the data exhibits multi-modal behavior, learning a multi-modal mixture model is a possible approach.

Stationarity Assumption

In addition to normality, we also make untested assumptions regarding stationarity. This critical assumption is implicit when computing covariances and correlations. We also tend to overlook insufficient sample sizes. As observed earlier, the SPY dataset we had at our disposal consisted of 503 instruments, with around 250 returns per instrument. The number of observations is much lower than the dimensionality of the data. This produces a covariance matrix that is not full rank, and consequently its inverse does not exist. Singular covariance matrices are highly problematic when computing the risk-return efficiency loci in the analysis of portfolios. We tested the returns of all instruments for stationarity using the Augmented Dickey-Fuller (ADF) test. Several return vectors were non-stationary. Non-stationarity and sample size issues can’t be wished away because the financial markets are fluid, with new firms coming into existence and existing firms disappearing due to bankruptcies or acquisitions. Consequently, limited financial histories will be encountered and must be dealt with.
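The stationarity screen is easy to sketch with statsmodels. The return matrix below is a synthetic placeholder; in practice each column would hold one SPY component's daily returns.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(11)
returns = rng.normal(0, 0.01, size=(250, 5))      # 250 days x 5 instruments (synthetic)

non_stationary = []
for i in range(returns.shape[1]):
    adf_stat, p_value, *_ = adfuller(returns[:, i])
    if p_value > 0.05:                             # fail to reject the unit-root null
        non_stationary.append(i)

print("Instruments flagged as non-stationary:", non_stationary)
```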

This is a problem where machine learning can be profitably employed. Shrinkage methods, latent factor models, empirical Bayes estimators, and random-matrix-theory-based models are widely published techniques that are applicable here.
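As one example of the shrinkage approach, the sketch below uses scikit-learn's Ledoit-Wolf estimator to obtain a well-conditioned covariance matrix when instruments outnumber observations. The data are synthetic placeholders.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

# 250 daily observations on 503 instruments: the sample covariance is singular.
rng = np.random.default_rng(3)
returns = rng.normal(0, 0.01, size=(250, 503))

sample_cov = np.cov(returns, rowvar=False)
shrunk_cov = LedoitWolf().fit(returns).covariance_

print("Sample covariance rank:", np.linalg.matrix_rank(sample_cov))   # deficient (< 503)
print("Shrunk covariance rank:", np.linalg.matrix_rank(shrunk_cov))   # full rank, invertible
```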

Portfolio Performance Analysis

Once issues surrounding untested assumptions have been addressed, we can focus on portfolio performance analysis, a subject with a vast collection of books and papers devoted to it. We limit our attention here to one aspect of portfolio performance analysis: an inquiry into the clustering behavior of stocks in a portfolio.

Books on portfolio theory devote substantial space to the discussion of asset diversification to achieve an optimum balance of risk and return. To properly diversify assets, we need to know if resources have been over-allocated to a specific sector and, consequently, under-allocated to others. Cluster analysis can help to answer this. A pertinent question is how to best measure the difference or similarity between stocks. One way would be to estimate correlations between stocks. This approach has its own weaknesses, some of which have been discussed in earlier sections. Even if we had a statistically significant set of observations, we are faced with the problem of changing correlations during the course of a year due to structural and regime shifts caused by intermittent periods of stress. Even in the absence of stress, correlations can break down or change due to factors that are endogenous to individual stocks.

We can estimate similarity and visualize clusters using histogram analysis. However, histograms eliminate temporal information. To overcome this constraint, we used Spectral Clustering, which is a machine learning technique that explores cluster formation without neglecting temporal information.
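A minimal sketch of the clustering step is shown below, using scikit-learn's SpectralClustering on an affinity matrix built from return correlations. The return matrix and the choice of four clusters are illustrative; this is a simplified stand-in for the analysis behind the figures that follow.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(5)
returns = rng.normal(0, 0.01, size=(250, 20))        # 250 days x 20 stocks (synthetic)

# Build a non-negative affinity matrix from pairwise return correlations.
corr = np.corrcoef(returns, rowvar=False)
affinity = (corr + 1.0) / 2.0                         # map correlations from [-1, 1] to [0, 1]

model = SpectralClustering(n_clusters=4, affinity="precomputed", random_state=0)
labels = model.fit_predict(affinity)
print("Cluster assignments:", labels)
```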

Figures 5 to 7 display preliminary results from our cluster analysis. Analyses like this will enable portfolio managers to realize clustering patterns and their strengths in their portfolios. They will also help guide decisions on reweighting portfolio components and diversification.

Figures 5-7: Cluster Analyses

Figure 5. Cluster analysis of a limited set of stocks is shown here. The labels indicate the names of the firms. Clusters are illustrated by various colored bullets, and increasing distances indicate decreasing similarities. Within clusters, stronger affinities are indicated by greater connecting line weights.

The following figures display magnified views of individual clusters.

Figure 6. We can see that Procter & Gamble, Kimberly Clark and Colgate Palmolive form a cluster (top left, dark green bullets). Likewise, Bank of America, Wells Fargo and Goldman Sachs form a cluster (top right, light green bullets). This is not surprising as these two clusters represent two sectors: consumer products and banking. Line weights are correlated to affinities within sectors.

Figure 7. The cluster on the left displays stocks in the technology sector, while the clusters on the right represent firms in the defense industry (top) and the energy sector (bottom).

In this post, we raised questions about standard assumptions that are made when analyzing portfolios. We also suggested possible solutions from machine learning literature. We subsequently analyzed one year’s worth of returns of SPY to identify clusters and their strengths and discussed the value of such an analysis to portfolio managers in evaluating risk and reweighting or diversifying their portfolios.


Mitigating EUC Risk Using Model Validation Principles

The challenge of simply gauging the risk associated with “end user computing” applications (EUCs)—let alone managing it—is both alarming and overwhelming. Scanning tools designed to detect EUCs can routinely turn up tens of thousands of potential files, even at financial institutions that are not especially large. Despite the risks inherent in using EUCs for mission-critical calculations, EUCs are prevalent in nearly every institution due to their ease of use and wide-ranging functionality.

This reality has spurred a growing number of operational risk managers to action. And even though EUCs, by definition, do not rise to the level of models, many of these managers are turning to their model risk departments for assistance. This is sensible in many cases because the skills associated with effectively validating a model translate well to reviewing an EUC for reasonableness and accuracy.  Certain model risk management tools can be tailored and scaled to manage burgeoning EUC inventories without breaking the bank.

Identifying an EUC

One risk of reviewing EUCs using personnel accustomed to validating models is the tendency of model validators to do more than is necessary. Subjecting an EUC to a full battery of effective challenges, conceptual soundness assessments, benchmarking, back-testing, and sensitivity analyses is not an efficient use of resources, nor is it typically necessary. To avoid this level of overkill, reviewers ought to be able to quickly recognize when they are looking at an EUC and when they are looking at something else.

Sometimes the simplest definitions work best: an EUC is a spreadsheet.

While neither precise, comprehensive, nor 100 percent accurate, that definition is a reasonable approximation. Not every EUC is a spreadsheet (some are Access databases) but the overwhelming majority of EUCs we see are Excel files. And not every Excel file is an EUC—conference room schedules and other files in Excel that do not do any serious calculating do not pose EUC risk. Some Excel spreadsheets are models, of course, and if an EUC review discovers quantitative estimates in a spreadsheet used to compute forecasts, then analysts should be empowered to flag such applications for review and possible inclusion in the institution’s formal model inventory. Once the dust has settled, however, the final EUC inventory is likely to contain almost exclusively spreadsheets.

Building an EUC Inventory

EUCs are not models, but much of what goes into building a model inventory applies equally well to building an EUC inventory. Because the overwhelming majority of EUCs are Excel files, the search for latent EUCs typically begins with an automated search for files with .xls and .xlsx extensions. Many commercially available tools conduct these sorts of scans. The exercise typically returns an extensive list of files that must be sifted through.

Simple analytical tools, such as Excel’s “Inquire” add-in, are useful for identifying the number and types of unique calculations in a spreadsheet as well as a spreadsheet’s reliance on external data sources. Spreadsheets with no calculations can likely be excluded from further consideration for the EUC inventory. Likewise, spreadsheets with no data connections (i.e., links to or from other spreadsheets) are unlikely to qualify for the EUC inventory because such files do not typically have significant downstream impact. Spreadsheets with many tabs and hundreds of unique calculations are likely to qualify as EUCs (if not as models) regardless of their specific use.
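This triage can also be scripted. The sketch below uses openpyxl to count formula cells and flag external references in each workbook found by a directory scan; the path, the thresholds, and the bracket heuristic for external links are all illustrative choices rather than a prescribed standard.

```python
import glob
from openpyxl import load_workbook

for path in glob.glob(r"shared_drive/**/*.xlsx", recursive=True):   # hypothetical scan root
    wb = load_workbook(path, data_only=False, read_only=True)
    formula_count, external_refs = 0, 0
    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                # With data_only=False, formula cells return the formula text.
                if isinstance(cell.value, str) and cell.value.startswith("="):
                    formula_count += 1
                    if "[" in cell.value:        # e.g. ='[Other.xlsx]Sheet1'!A1
                        external_refs += 1
    if formula_count == 0:
        verdict = "likely exclude (no calculations)"
    elif formula_count > 500 or external_refs > 0:
        verdict = "likely EUC (review with owner)"
    else:
        verdict = "needs owner interview"
    print(f"{path}: {formula_count} formulas, {external_refs} external refs -> {verdict}")
```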

Most spreadsheets fall somewhere between these two extremes. In many cases, questioning the owners and users of identified spreadsheets is necessary to determine how each is used and to help ascertain the potential institutional risks if the spreadsheet does not work as intended. When making inquiries of spreadsheet owners, open-ended questions may not always be as helpful as those designed to elicit a narrow band of responses. Instead of asking, “What is this spreadsheet used for?” a more effective request would be, “What other systems and files is this spreadsheet used to populate?”

Answers to these sorts of questions aid not only in determining whether a spreadsheet qualifies as an EUC but the risk-rating of the EUC as well.

Testing Requirements

For now, regulator interest in seeing that EUCs are adequately monitored and controlled appears to be outpacing any formal guidance on how to go about doing it.

Absent such guidance, many institutions have started approaching EUC testing like a limited-scope model validation. Effective reviews include a documentation review, a tie-out of input data to authorized, verified sources, an examination of formulas and coding, a form of benchmarking, and an overview of spreadsheet governance and controls.

Documentation Review

Not unlike a model, each EUC should be accompanied by documentation that explains its purpose and how it accomplishes what it intends to do. Documentation should describe the source of input data and what the EUC does with it. Sufficient information should be provided for a reasonably informed reviewer to re-create the EUC based solely on the documentation. If a reviewer must guess the purpose of any calculation, then the EUC’s documentation is likely deficient.

Input Review

The reviewer should be able to match input data in the EUC back to an authoritative source. This review can be performed manually; however, any automated lookups used to pull data in from other files should be thoroughly reviewed, as well.

Formula and Function Review

Each formula in the EUC should be independently reviewed to verify that it is consistent with its documented purposes. Reviewers do not need to test the functionality of Excel—e.g., they do not need to test arithmetic functions on a calculator—however, formulas and functions should be reviewed for reasonableness.

Benchmarking

A model validation benchmarking exercise generally consists of comparing the subject model’s forecasts with those of a challenger model designed to do the same thing, but perhaps in a different way. Benchmarking an EUC, in contrast, typically involves constructing an independent spreadsheet based on the EUC documentation and making sure it returns the same answers as the EUC.
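When the independent rebuild is done in code rather than in a second spreadsheet, the comparison can look like the sketch below. The file name, sheet layout, and the weighted-average metric are hypothetical stand-ins for whatever the EUC's documentation actually describes.

```python
import pandas as pd

# Read the EUC's inputs and its reported result (hypothetical workbook layout).
euc_inputs = pd.read_excel("reserve_calc.xlsx", sheet_name="Inputs")       # balance, rate columns
euc_reported = pd.read_excel("reserve_calc.xlsx", sheet_name="Summary").iloc[0, 0]

# Independently recompute the documented metric (here, a balance-weighted average rate).
independent = (euc_inputs["balance"] * euc_inputs["rate"]).sum() / euc_inputs["balance"].sum()

tolerance = 1e-6
if abs(independent - euc_reported) > tolerance:
    print(f"Mismatch: EUC reports {euc_reported}, benchmark computes {independent}")
else:
    print("EUC output matches the independent benchmark within tolerance")
```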

Governance and Controls

An EUC should ideally be subjected to the same controls requirements as a model. Procedures designed to ensure process checks, access and change control management, output reconciliation, and tolerance levels should be adequately documented.

The extent to which these tools should be applied depends largely on how much risk an EUC poses. Properly classifying EUCs as high-, medium-, or low-risk during the inventory process is critical to determining how much effort to invest in the review.

Other model validation elements, such as back-testing, stress testing, and sensitivity analysis, are typically not applicable to an EUC review. Because EUCs are not predictive by definition, these sorts of analyses are not likely to add much value.

Striking an appropriate balance — leveraging effective model risk management principles without doing more than needs to be done — is the key to ensuring that EUCs are adequately accounted for, well controlled, and functioning properly without incurring unnecessary costs.


The Non-Agency MBS Market: Re-Assessing Securitization Market Conditions

Since the financial crisis began in 2007, the “Non-Agency” MBS market, i.e., securities neither issued nor guaranteed by Fannie Mae, Freddie Mac, or Ginnie Mae, has been sporadic and has not rebounded from pre-crisis levels. In recent months, however, activity by large financial institutions, such as AIG and Wells Fargo, has indicated a return to the issuance of Non-Agency MBS. What is contributing to the current state of the securitization market for high-quality mortgage loans? Does the recent, limited-scale return to issuance by these institutions signal an increase in private securitization activity in this sector of the securitization market? If so, what is sparking this renewed interest?


The MBS Securitization Market

Three entities – Ginnie Mae, Fannie Mae, and Freddie Mac – have been the dominant engine behind mortgage-backed securities (MBS) issuance since 2007. These entities, two of which remain in federal government conservatorship and the third a federal government corporation, have maintained the flow of capital from investors into guaranteed MBS and ensured that mortgage originators have adequate funds to originate certain types of single-family mortgage loans.

Virtually all mortgage loans backed by federal government insurance or guaranty programs, such as those offered by the Federal Housing Administration and the Department of Veterans Affairs, are issued in Ginnie Mae pools. Mortgage loans that are not eligible for these programs are referred to as “Conventional” mortgage loans. In the current market environment, most Conventional mortgage loans are sold to Fannie Mae and Freddie Mac (i.e. “Conforming” loans) and are securitized in Agency-guaranteed pass-through securities.


The Non-Agency MBS Market

Not all Conventional mortgage loans are eligible for purchase by Fannie Mae or Freddie Mac, however, due to collateral restrictions (i.e., their loan balances are too high or they do not meet certain underwriting requirements). These are referred to as “Non-Conforming” loans and, for most of the past decade, have been held in portfolio at large financial institutions, rather than placed in private, Non-Agency MBS. The Non-Agency MBS market is further divided into sectors for “Qualified Mortgage” (QM) loans, non-QM loans, re-performing loans and nonperforming loans. This post deals with the securitization of QM loans through Non-Agency MBS programs.

Since the crisis, Non-Agency MBS issuance has been the exclusive province of JP Morgan and Redwood Trust, both of which continue to issue a relatively small number of deals each year. The recent entry of AIG into the Non-Agency MBS market, combined with Wells Fargo’s announcement that it intends to begin issuing as well, makes this a good time to discuss why institutions with other funding sources available to them are now moving back into this sector of the securitization market.


Considerations for Issuing QM Loans

Three potential considerations may lead financial institutions to investigate issuing QM Loans through Non-Agency MBS transactions:

  • “All-In” Economics
  • Portfolio Concentration or Limitations
  • Regulatory Pressures

“All-In” Economics

Over the long term, mortgage originators gravitate toward funding sources that provide the lowest cost to borrowers and the greatest profitability for their firms. To improve the “all-in” economics of a Non-Agency MBS transaction, investment banks work closely with issuers to broaden the investor base for each level of the securitization capital structure. Partly due to the success of the Fannie Mae and Freddie Mac Credit Risk Transfer transactions, there appears to be significant interest in higher-yielding mortgage-related securities at the lower-rated (i.e., higher-risk) end of the securitization capital structure. This appetite for higher-yielding assets has also increased demand for lower-rated securities in the Non-Agency MBS sector.

However, demand from investors at the higher-rated end of the securitization capital structure (i.e. ‘AAA’ and ‘AA’ securities) has not resulted in “all-in” economics for a Non-Agency MBS transaction that surpass the economics of balance sheet financing provided by portfolios funded with low deposit rates or low debt costs. If deposit rates and debt costs remain at historically low levels, the portfolio funding alternative will remain attractive. Notwithstanding the low interest rate environment, some institutions may develop operational capabilities for Non-Agency MBS programs as a risk mitigation process for future periods where balance sheet financing alternatives may not be as beneficial.


Portfolio Concentration or Limitations

Due to the lack of robust investor demand and unfavorable economics in Non-Agency MBS, many banks have increased their portfolio exposure to both fixed-rate and intermediate-adjustable-rate QM loans. The ability to hold these mortgage loans in portfolio has provided attractive pricing to a key customer demographic and earned a healthy net interest margin during the historically low-rate environment. While bank portfolios have provided an attractive funding source for Non-Agency QM loans, some financial institutions may attempt to develop diversified funding sources in response to regulatory pressure or self-imposed portfolio concentration limits. Selling existing mortgage portfolio assets into the Non-Agency MBS securitization market is one way in which financial institutions might choose to reduce concentrated mortgage risk exposure.


Regulatory Pressure

Some financial institutions may be under pressure from their regulators to demonstrate their ability to sell assets out of their mortgage portfolios as a contingency plan. The Non-Agency MBS market is one way of complying with these sorts of regulatory requests. Building a contingent ability to tap the Non-Agency MBS market develops operational capabilities under less critical circumstances, while also revealing how much time the institution would need to liquidate such assets through securitization. Establishing securitization capabilities early is a prudent activity for institutions that foresee the possibility of securitization as a future funding option.

While the Non-Agency MBS market has been dormant for most of the past decade, some financial institutions that have relied upon portfolio funding now appear to be testing its current viability. With continued issuance by JP Morgan and Redwood Trust and new entrants such as AIG and Wells Fargo, other mortgage originators would be wise to take notice, monitor activity in this market, and assess whether securitization has the potential to provide an alternative funding source for their Non-Conforming QM loans and future lending activity.

In our next article on the Non-Agency MBS market, we will review the changes in due diligence practices, loan-level data disclosures, the representation and warranty framework, and the ratings process made by securitization market participants and the impact of these changes on the Non-Agency MBS market segment.


Open Source Governance: Three Potential Risks

For many companies, the question is no longer whether to use open-source tools, but rather how to implement them with the appropriate governance and controls. Have security concerns been accounted for?  How does one effectively institute controls over bad code?  Are there legal implications for using open-source software?

Open Source Security Risks

Open-source software is not inherently more or less prone to malicious code injections than proprietary software. It is true that anyone can push a code enhancement for a new version, and it may be possible for the senior contributors to miss intentional malware. In these circumstances, however, open source has an advantage over proprietary software, articulated in 1999 by Eric S. Raymond as Linus’s Law: “Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone.” It is unlikely that a deliberate security error will go unnoticed by the many pairs of eyes on each release. Security issues nevertheless persist.

Open-Source Security – An Example

Debian, a Unix-like computer operating system, was one of the first to be based on the Linux kernel. Like many systems, it utilizes OpenSSL, a software library that provides an open-source implementation of the Secure Sockets Layer (SSL) protocol, commonly used by applications that require secure communications over a network.

In 2006, a snippet of code was removed from Debian’s OpenSSL package after one of the contributors found that it caused runtime warnings generated by other packages. After the removal, the pseudorandom number generator (PRNG) generated SSL keys using only the process ID (in Linux, a number up to 32,768) to the exclusion of all other random data. Since a relatively small number of values was used, the keys created over a period of almost two years were too predictable to be used securely. Users became aware of the issue 20 months after the bug was introduced, leading to costly security resolutions for companies and individuals who relied on Debian’s OpenSSL implementation.[1]

OpenSSL was again the subject of negative attention when a bug dubbed ‘Heartbleed’ was introduced to the code in 2012 and disclosed to the public in 2014. A fixed version of OpenSSL was released on the same day the issue was announced. More than a month after the release, however, 1.5% of the 800,000 most popular affected websites were still vulnerable to the security bug. [2]

The good news is that such vulnerabilities are documented in the Common Vulnerabilities and Exposures (CVE) system, and they are not so common. For Python 2.7, the popular version released in 2010, 15 vulnerabilities were recorded from 2010 to 2016, only one of which is considered ‘High’ severity, with a CVSS score of 7.5.  jQuery, a JavaScript library that simplifies some components of web application development and the most common open-source component identified in the latest Open Source Security and Risk Analysis (OSSRA) report, only has four known vulnerabilities from 2007 to 2017, none of which rank higher than a ‘Medium’. The CVE is just one tool available for improving the security profile of software applications, but technologists must remain vigilant and abreast of known issues. Corporate IT governance frameworks should be continuously updated to keep up with the changing structure of the underlying technology itself.

Bad Code

Serious security vulnerabilities may not be a daily occurrence, but bad code can affect software at any time. pandas, a popular open-source software library used in Python implementations for data manipulation and analysis, was first released in 2009. Since then, its contributors have identified over 10,000 issues, 1,933 of which are currently considered unresolved.[3] A company that relies on accurate output from a codebase that uses pandas needs to be vigilant not only in testing the code written by its in-house developers, but also in verifying that all outstanding known pandas issues are covered by workarounds and the rest of the functionality is sound. Developers and testers who are not intimately familiar with the pandas source code must devise creative testing tools to ensure complete integrity of applications that rely on it.

Bad Code – An Example

The Comma-Separated Values (CSV) file is one of many data formats that can be loaded for data manipulation and analysis by pandas, in this case using the built-in read_csv function. read_csv accepts a number of helper parameters intended to simplify the data import, one of which is parse_dates, which, as the name implies, tells pandas to parse dates automatically, using a recognition algorithm to determine the format of each date-populated column.

However, if a row of data contains a blank value where a date is expected, pandas may populate that field with today’s date — a bug first reported against version 0.9 in 2012 [4] (and closed three days after it was opened) and reported again in 2014.[5] The issue was not resolved until the end of 2016, when one of the contributors noted that the tests passed for version 0.19, stating that he was “not sure when this was fixed, but it doesn’t seem like it occurred recently.” [6]

In the meantime, pandas versions prior to 0.19 may have resulted in incorrect date-related parameters if blank fields were fed to the system. For example, a mortgage-backed security may have had an incorrect calculated weighted average loan age if some of its loans had blank first payment dates, causing these rows to have a loan age of zero.
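A defensive pattern for this class of problem is to parse dates on import and then verify that blank inputs actually surfaced as missing values rather than being silently filled. The sketch below uses hypothetical column names and in-line data for illustration.

```python
import io
import pandas as pd

csv_data = io.StringIO(
    "loan_id,first_payment_date,balance\n"
    "1,2016-07-01,250000\n"
    "2,,310000\n"                      # blank first payment date
    "3,2016-08-01,180000\n"
)

loans = pd.read_csv(csv_data, parse_dates=["first_payment_date"])

# Blank inputs should come through as NaT so downstream calculations
# (e.g., weighted average loan age) can exclude or explicitly impute them.
blank_dates = loans["first_payment_date"].isna().sum()
print(f"Rows with missing first payment date: {blank_dates}")

# Fail loudly if any parsed date suspiciously equals today's date.
today = pd.Timestamp.today().normalize()
suspicious = (loans["first_payment_date"] == today).sum()
if suspicious:
    raise ValueError(f"{suspicious} parsed dates equal today's date; check parse_dates behavior")
```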

In addition to implementing security testing, IT controls must include a clear framework for testing both in-house and open-source components of all applications, especially high-impact programs.

Open-Source Licensing

Finally, it is important to be aware of open-source licensing constraints and to maintain active licensing governance to avoid legal issues in the future. Playing on the concept of copyright, some open-source creators have adopted ‘copyleft’ licensing to ensure that “anyone who redistributes the software, with or without changes, must pass along the freedom to further copy and change it.”[7] This means that, legally, for any software that contains a copylefted open-source component, whether it comprises 99% or 0.1% of the application code, the entire source code must be distributed with the software or be made available upon request. This is not an issue when the software is distributed internally among corporate users, but it can become more problematic when the company intends to sell or otherwise provide the software without revealing the internally developed codebase. Not all open-source software is copylefted – in fact, many popular licenses are highly permissive with very few restrictions. Below is a summary of the four most popular open-source licenses.[8]

Of the four, only the GNU General Public License (GPL, all versions) requires the creators to disclose the source code.  Between 20% and 25% of all open-source software is covered by the GNU GPL.

OSSRA found that 75% of applications contained at least some components under the GPL family of licenses, and that only 45% of those applications complied with the GPL copyleft obligations. Overall, the Financial Services and FinTech industries maintained 89% of all applications with at least one licensing conflict.

Most open-source software, even that which is licensed under the GNU GPL, can be used commercially. For example, a company can use and internally distribute a financial model written in R, an open-source programming language licensed under the GNU GPL 2.0. However, important legal consequences must be considered if the developed code will be later distributed outside of the company as a proprietary application. If the organization were to sell the R-based model, the entire source code would have to be made available to the paying user, who would also be free to distribute the code, for free or at a price. Alternatively, a model implemented in Python, which is licensed under a Berkeley Software Distribution (BSD)-like agreement, could be distributed without exposing the source code.

Open-Source Governance and Controls

Governance risks are specific to how open-source tools are integrated into existing operations. These risks can stem from a lack of formal training, lack of service and support, violations of third-party intellectual property rights, or instability and incompatibility with existing operating environments. Successful users of open-source code and tools devise effective means of identifying and measuring these risks. They ensure that these risks are included in process risk assessments to facilitate identification and mitigation of potential control weaknesses.

Security vulnerabilities, code issues, and software licensing should not deter developers from using the plethora of useful open-source tools. Open-source issues and bugs are viewed and tested by thousands of capable developers, increasing the likelihood of a speedy resolution. In addition, a company’s own development team has full access to the source code, making it possible to fix issues without relying on anyone else. As with any application, effective governance and controls are essential to a successful open-source application. These ensure that software is used securely and appropriately and that a comprehensive testing framework is applied to minimize inaccuracies. The world of open source is changing constantly, and we all just need to keep up.

Advantages and Disadvantages of Open Source Data Modeling Tools

Using open source data modeling tools has been a topic of debate as large organizations, including government agencies and financial institutions, are under increasing pressure to keep up with technological innovation to maintain competitiveness. Organizations must be flexible in development and identify cost-efficient gains to reach their organizational goals, and using the right tools is crucial. Organizations must often choose between open source software, i.e., software whose source code can be modified by anyone, and closed software, i.e., proprietary software with no permissions to alter or distribute the underlying code.

Mature institutions often have employees, systems, and proprietary models entrenched in closed source platforms. For example, SAS Analytics is a popular provider of proprietary data analysis and statistical software for enterprise data operations among financial institutions. But several core computations SAS performs can also be carried out using open source data modeling tools, such as Python and R. The data wrangling and statistical calculations are often fungible and, given the proper resources, will yield the same result across platforms.

Open source is not always a viable replacement for proprietary software, however. Factors such as cost, security, control, and flexibility must all be taken into consideration. The challenge for institutions is picking the right mix of platforms to streamline software development.  This involves weighing benefits and drawbacks.

Advantages of Open Source Programs

The Cost of Open Source Software

The low cost of open source software is an obvious advantage. Compared with the upfront cost of purchasing a proprietary software license, using open source programs seems like a no-brainer. Open source programs can be distributed freely (subject to some possible restrictions on copyrighted work), resulting in virtually no direct costs. Indirect costs, however, can be difficult to quantify. Downloading open source programs and installing the necessary packages is easy, and adopting this process can expedite development and lower costs. On the other hand, a proprietary software license may bundle setup and maintenance fees covering the operational capacity for daily use, the support needed to solve unexpected issues, and a guarantee of full implementation of the promised capabilities. Enterprise applications, while accompanied by a high price tag, provide ongoing and in-depth support for their products. The comparable cost of managing and servicing open source programs, which often have no dedicated support, is difficult to determine.

Open Source Talent Considerations

Another advantage of open source is that it attracts talent drawn to the idea of shareable, community-driven code. Students and developers outside of large institutions are more likely to have experience with open source applications because access is widespread and readily available. Open source developers are free to experiment and innovate, gain experience, and create value outside of the conventional industry focus. This flexibility naturally produces more broadly skilled, interdisciplinary developers. The chart below from Indeed’s Job Trend Analytics tool reflects strong growth in open source talent, especially among Python developers.

From an organizational perspective, the pool of potential applicants with relevant programming experience widens significantly compared to the limited pool of developers with closed source experience. For example, one may be hard-pressed to find a new applicant with development experience in SAS, since comparatively few have had the opportunity to work with the application. Key-person dependencies become increasingly problematic as knowledge of the proprietary software becomes concentrated in a shrinking handful of developers.

Job Seeker Interest via Indeed

*Indeed searches millions of jobs from thousands of job sites. The jobseeker interest graph shows the percentage of jobseekers who have searched for SAS, R, and Python jobs.


Support and Collaboration

The collaborative nature of open source facilitates learning and adapting to new programming languages. While open source programs are usually not accompanied by the extensive documentation and user guides typical of proprietary software, the constant peer review from the contributions of other developers can be more valuable than a user guide. In this regard, adopters of open source may have the talent to learn, experiment with, and become knowledgeable in the software without formal training.

Still, the lack of support can pose a challenge. In some cases, the documentation accompanying open source packages and the paucity of usage examples in forums do not offer a full picture. For example, RiskSpan built a model in R whose design was driven by the available data-infrastructure packages (a precursor to performing statistical analysis) and their functionality. R does not have a dedicated support line, and the odds of receiving a response from a package’s author are low. This required RiskSpan to vet packages thoroughly.

Flexibility and Innovation

Another attractive feature of open source is its inherent flexibility. Python lets users choose among integrated development environments (IDEs) with different characteristics and functionality, whereas SAS Analytics provides only SAS Enterprise Guide or Base SAS. R makes possible web-based interfaces for server-based deployments. These capabilities grant more users access at a lower cost, enabling broader firm-wide participation in development. The ability to change the underlying structure of open source software also makes it possible to mold it to the organization’s goals and improve efficiency.

Another advantage of open source is the sheer number of developers trying to improve the software by creating functionality not found in closed source equivalents. For example, R and Python can usually perform many of the same functions available in SAS but also offer many capabilities SAS lacks: specialized packages for industry-specific tasks, scraping the internet for data, or web development (in Python). These specialized packages are built by programmers seeking to address the inefficiencies of common problems. A proprietary software vendor has neither the expertise nor the incentive to build equivalent specialized packages, since its product aims to be broad enough to suit uses across multiple industries.

RiskSpan uses open source data modeling tools and operating systems for data management, modeling, and enterprise applications. R and Python have proven particularly cost effective in modeling. R, for example, offers an extensive archive of packages devoted to estimating statistical relationships among variables using an array of specialized techniques, which cuts down on development time. Searching for these packages, downloading them, and researching their use incurs almost no cost.
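The sketch below illustrates the kind of “statistical relationship among variables” estimation these packages provide, using the open-source NumPy and statsmodels libraries in Python. The data are simulated and the specification is arbitrary; it shows how little code such an estimation requires and is not one of RiskSpan’s production models.

```python
# A minimal sketch, using open-source NumPy and statsmodels, of estimating a
# statistical relationship among variables. Data are simulated for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=42)

# Two simulated explanatory variables and a noisy linear response.
x1 = rng.normal(size=500)
x2 = rng.uniform(-1.0, 1.0, size=500)
y = 1.5 + 2.0 * x1 - 0.75 * x2 + rng.normal(scale=0.5, size=500)

X = sm.add_constant(np.column_stack([x1, x2]))  # add an intercept column explicitly
results = sm.OLS(y, X).fit()

print(results.params)      # estimated intercept and slopes
print(results.summary())   # full regression diagnostics
```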

Open source also makes it possible for RiskSpan to expand on the tools available in the financial services space. For example, a leading cash flow analytics software firm that offers several proprietary solutions for modeling structured finance transactions lacked the full functionality RiskSpan was seeking. Seeking to reduce licensing fees and gain flexibility in structuring deals, RiskSpan developed deal cash flow programs in Python for STACR, CAS, CIRT, and other consumer lending deals. The flexibility of Python allowed us to define our own cash flow formats and build additional functionality into the software. Python, unlike closed source applications, allowed us to focus on innovating the ways we interact with the cash flow waterfall.
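For readers unfamiliar with waterfall logic, the simplified sketch below shows how naturally a sequential-pay principal allocation can be expressed in Python. It is a generic, hypothetical illustration of the pattern, not RiskSpan’s deal programs for STACR, CAS, or CIRT.

```python
# A simplified, hypothetical sketch of a sequential-pay principal allocation.
# Senior tranches are paid down in order before junior tranches receive cash.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Tranche:
    name: str
    balance: float
    paid: List[float] = field(default_factory=list)

def run_sequential_waterfall(tranches: List[Tranche],
                             principal_collections: List[float]) -> None:
    """Allocate each period's principal sequentially: senior tranche first."""
    for period_cash in principal_collections:
        remaining = period_cash
        for tranche in tranches:
            payment = min(tranche.balance, remaining)
            tranche.balance -= payment
            tranche.paid.append(payment)
            remaining -= payment

if __name__ == "__main__":
    deal = [Tranche("A", 750.0), Tranche("B", 250.0)]
    run_sequential_waterfall(deal, principal_collections=[300.0, 300.0, 300.0, 200.0])
    for t in deal:
        print(t.name, "payments by period:", t.paid, "remaining balance:", t.balance)
```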

Disadvantages of Open Source Programs

Deploying open source solutions also carries intrinsic challenges. While users may have a conceptual understanding of the task at hand, knowing which tools yield correct results, whether derived from open or closed source, is another dimension to consider. Default parameters may differ, new limitations may arise during development, or code structures may be entirely different. Further challenges can arise when translating a closed source program to an open source platform. Introducing open source requires new controls, requirements, and development methods.

Redundant code is an issue that can arise if a firm does not use open source strategically. Across different departments, functionally equivalent tools may be built on distinct packages or code libraries. Several packages offer the ability to run a linear regression, for example, but nuanced differences in a function’s initial setup or syntax can propagate problems down the line. In addition to redundant code, users must be wary of “forking,” in which the development community splits and maintains divergent versions of an open source project. The R ecosystem, for example, contains multiple packages that perform the same tasks and calculations, sometimes derived from the same code base, and users must confirm that the package they rely on has not been abandoned by its developers.
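The sketch below makes the “nuanced differences in initial setup” point concrete: two widely used open-source Python packages fit the same linear regression, but scikit-learn’s LinearRegression adds an intercept by default, while statsmodels’ OLS includes one only if the user adds a constant column explicitly. The simulated data are for illustration only.

```python
# Two open-source packages, same regression, different defaults: scikit-learn
# fits an intercept automatically; statsmodels requires an explicit constant.

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = 2.0 + 3.0 * x[:, 0] + rng.normal(scale=0.5, size=200)

# scikit-learn: intercept fitted by default (fit_intercept=True)
sk_fit = LinearRegression().fit(x, y)
print("sklearn:     intercept=%.3f slope=%.3f" % (sk_fit.intercept_, sk_fit.coef_[0]))

# statsmodels: no intercept unless the design matrix includes a constant column
naive_fit = sm.OLS(y, x).fit()                      # biased slope, no intercept
correct_fit = sm.OLS(y, sm.add_constant(x)).fit()   # matches scikit-learn
print("statsmodels (no constant):   slope=%.3f" % naive_fit.params[0])
print("statsmodels (with constant): intercept=%.3f slope=%.3f"
      % (correct_fit.params[0], correct_fit.params[1]))
```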

Users must also take care to track the changes and evolution of open source programs. The core calculations of commonly used functions, or of functions specific to regular tasks, can change from release to release. Maintaining a working understanding of these functions in the face of continual modification is crucial to ensuring consistent output. Open source documentation is frequently lacking, which can be problematic in financial services when seeking to demonstrate a clear audit trail for regulators. Verifying that the right function is being sourced from a specific package or repository, rather than from another function that happens to share its name, places necessary guardrails on the unfettered use of these functions within code. Proprietary software, on the other hand, provides a static set of tools, which allows analysts to more easily determine how legacy code has behaved over time.
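One lightweight way to support such an audit trail in Python is to record, at run time, exactly which package, version, and source file each critical function comes from. The sketch below uses only the standard library’s inspect and importlib.metadata modules; the functions it inspects are arbitrary examples.

```python
# A minimal provenance-logging sketch: record the package, version, and source
# file of each critical function so an audit trail can show what was called.

import inspect
from importlib.metadata import version

def describe_callable(func) -> dict:
    module = inspect.getmodule(func)
    top_level = module.__name__.split(".")[0]
    return {
        "qualified_name": f"{module.__name__}.{func.__name__}",
        "package_version": version(top_level),
        "source_file": inspect.getsourcefile(func),
    }

if __name__ == "__main__":
    import numpy as np
    import scipy.stats
    for f in (np.percentile, scipy.stats.linregress):
        print(describe_callable(f))
```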

Using Open Source Data Modeling Tools

Deciding whether to go with open source programs directly impacts financial services firms as they compete to deliver applications to the market. Open source data modeling tools are attractive because of their natural tendency to spur innovation and to ingrain adaptability and flexibility throughout a firm. But proprietary software solutions are also attractive because they provide the support and well-defined capabilities that may fit neatly within an organization’s goals. The considerations offered here should be weighed carefully when deciding between open source and proprietary data modeling tools.

Questions to consider before switching platforms include:

  • How does one quantify the management and service costs for using open source programs? Who would work on servicing it, and, once all-in expenses are considered, is it still more cost-effective than a vendor solution?
  • When might it be prudent to move away from proprietary software? In a scenario where moving to a newer open source technology appears to yield significant efficiency gains, when would it make sense to end terms with a vendor?
  • Does the institution have the resources to institute new controls, requirements, and development methods when introducing open source applications?
  • Does the open source application or function have the necessary documentation required for regulatory and audit purposes?

Open source is certainly on the rise as more professionals enter the space with the necessary technical skills and a new perspective on the goals financial institutions want to pursue. As competitive pressures mount, financial institutions are faced with a difficult yet critical decision of whether open source is appropriate for them. Open source may not be a viable solution for everyone—the considerations discussed above may block the adoption of open source for some organizations. However, often the pros outweigh the cons, and there are strategic precautions that can be taken to mitigate any potential risks.


References

 https://www.redhat.com/en/open-source/open-source-way

http://www.stackoverflow.blog/code-for-a-living/how-i-open-sourced-my-way-to-my-dream-job-mohamed-said

https://www.redhat.com/f/pdf/whitepapers/WHITEpapr2.pdf

http://www.forbes.com/sites/benkepes/2013/10/02/open-source-is-good-and-all-but-proprietary-is-still-winning/#7d4d544059e9

https://www.indeed.com/jobtrends/q-SAS-q-R-q-python.html


Open Source Software for Mortgage Data Analysis

While open source has been around for decades, using open source software for mortgage data analysis is a recent trend. Financial institutions have traditionally been slow to adopt the latest data and technology innovations due to the strict regulatory and risk-averse nature of the industry, and open source has been no exception. As open source becomes more mainstream, however, many of our clients have come to us with questions regarding its viability within the mortgage industry.

The short answer is simple: open source has a lot of potential for the financial services and mortgage industries, particularly for data modeling and data analysis. Within our own organization, we frequently use open source data modeling tools for our proprietary models as well as models built for clients. While a degree of risk is inherent, prudent steps can be taken to mitigate it and to profit from the many worthwhile benefits of open source.


To address the common concerns that arise with open source, we’ll be publishing a series of blog posts aimed at alleviating these concerns and providing guidelines for utilizing open source software for data analysis within your organization. Some of the questions we’ll address include:

  • Can open source programming languages be applied to mortgage data modeling and data analysis?
  • What risks does open source expose me to and what can I do to mitigate them?
  • What are the pitfalls of open source and do the benefits outweigh them?
  • How does using open source software for mortgage data analysis affect the control and governance of my models?
  • What factors do I need to consider when deciding whether to use open source at my institution?

Throughout the series, we’ll also include examples of how RiskSpan has used open source software for mortgage data analysis, why we chose to use it, and what factors were considered. Before we dive in on the considerations for open source, we thought it would be helpful to offer an introduction to open source and provide some context around its birth and development within the financial industry.

What Is Open Source Software?

Software has conventionally been considered open source when the original code is made publicly available so that anyone can edit, enhance, or modify it freely. This original concept has recently been expanded to incorporate a larger movement built on values of collaboration, transparency, and community.

Open Source Software Vs Proprietary Software

Proprietary software refers to applications for which the source code is only accessible to those who created it. Thus, only the original author(s) has control over any updates or modifications. Outside players are barred from even viewing the code to protect the owners from copying and theft. To use proprietary software, users agree to a licensing agreement and typically pay a fee. The agreement legally binds the user to the owners’ terms and prevents the user from any actions the owners have not expressly permitted.

Open source software, on the other hand, gives any user free rein to view, copy, or modify it. The idea is to foster a community built on collaboration, allowing users to learn from each other and build on each other’s work. As with proprietary software, open source users must still agree to a licensing agreement, but the terms differ significantly from those of a proprietary license.1

History of Open Source Software

The idea of open source software first developed in the 1950s, when much of software development was done by computer scientists in higher education. In line with the value of sharing knowledge among academics, source code was openly accessible. By the 1960s, however, as the cost of software development increased, hardware companies were charging additional fees for software that used to be bundled with their products.

Change came again in the 1980s. At this point, it was clear that technology and software were important factors of the growing business economy. Technology leaders were frustrated with the increasing costs of software. In 1984, Richard Stallman launched the GNU Project with the purpose of creating a complete computer operating system with no limitations on the use of its source code. In 1991, the operating system now referred to as Linux was released.

The final tipping point came in 1997, when Eric Raymond published The Cathedral and the Bazaar, in which he articulated the underlying principles behind open source software. It was a driving factor in Netscape’s decision to release its source code to the public, inspired by the idea that allowing more people to find and fix bugs would improve the system for everyone. Following Netscape’s release, the term “open source software” was introduced in 1998.

In the data-driven economy of the past two decades, open source has played an ever-increasing role. The field of software development has evolved to embrace the values of open source. Open source has made it not only possible but easy for anyone to access and manipulate source code, improving our ability to create and share valuable software.2

Adoption of Open Source Software in Business

The growing relevance of open source software has also changed the way large organizations approach their software solutions. While open source software was at one point rare in an enterprise’s system, it’s now the norm. A survey conducted by Black Duck Software revealed that fewer than 3% of companies don’t rely on open source at all. Even the most conservative organizations are hopping on board the open source trend.3

In a blog post from June 2016, TechCrunch writes:

“Open software has already rooted itself deep within today’s Fortune 500, with many contributing back to the projects they adopt. We’re not just talking stalwarts like Google and Facebook; big companies like Walmart, GE, Merck, Goldman Sachs — even the federal government — are fleeing the safety of established tech vendors for the promises of greater control and capability with open software. These are real customers with real budgets demanding a new model of software.”4

The expected benefits of open source software are attracting all types of institutions, from small businesses to technology giants to governments. This shift away from proprietary software in favor of open source is streamlining business operations. As more companies make the switch, those that don’t will fall behind and likely find themselves at a serious competitive disadvantage.

Open Source Software for Mortgage Data Analysis

Open source software is slowly finding its way into the financial services industry as well. We’ve observed that smaller entities that don’t have the budgets to buy expensive proprietary software have been turning to open source as a viable substitute. Smaller companies are either building software in house or turning to companies like RiskSpan to achieve a cost-effective solution. On the other hand, bigger companies with the resources to spare are also dabbling in open source. These companies have the technical expertise in house and give their skilled workers the freedom to experiment with open source software.

Within our own work, we see tremendous potential for open source software for mortgage data analysis. Open source data modeling tools like Python, R, and Julia are useful for analyzing mortgage loan and securitization data and identifying historical trends. We’ve used R to build models for our clients and we’re not the only ones: several of our clients are now building their DFAST challenger models using R.
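As a simple illustration of the kind of loan-level analysis these tools make easy, the sketch below uses the open-source pandas library to summarize a handful of hypothetical loan records by origination vintage. The column names and values are placeholders, not an actual dataset or a RiskSpan model.

```python
# A minimal sketch of loan-level trend analysis with pandas. The records,
# column names, and values are hypothetical placeholders.

import pandas as pd

loans = pd.DataFrame({
    "origination_year": [2015, 2015, 2016, 2016, 2017, 2017],
    "current_balance":  [310_000, 295_000, 402_000, 388_000, 415_000, 430_000],
    "note_rate":        [3.875, 4.000, 3.625, 3.750, 4.125, 4.250],
})

# Historical trend: average note rate and total balance by origination vintage.
trend = (loans.groupby("origination_year")
              .agg(avg_note_rate=("note_rate", "mean"),
                   total_balance=("current_balance", "sum")))
print(trend)
```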

Open source has grown enough in the past few years that more and more financial institutions will make the switch. While the risks associated with open source software will continue to give some organizations pause, the benefits of open source will soon outweigh those concerns. It seems open source is a trend that is here to stay, and luckily, it is a trend ripe with opportunity.


[1] https://opensource.com/resources/what-open-source

[2] https://www.longsight.com/learning-center/history-open-source

[3] https://techcrunch.com/2016/06/19/the-next-wave-in-software-is-open-adoption-software/

[4] https://techcrunch.com/2016/06/19/the-next-wave-in-software-is-open-adoption-software/


Balancing Internal and External Model Validation Resources

The question of “build versus buy” is every bit as applicable and challenging to model validation departments as it is to other areas of a financial institution. With no “one-size-fits-all” solution, banks are frequently faced with a balancing act between the use of internal and external model validation resources. This article is a guide for deciding between staffing a fully independent internal model validation department, outsourcing the entire operation, or a combination of the two.

Striking the appropriate balance is a function of at least five factors:

  1. control and independence
  2. hiring constraints
  3. cost
  4. financial risk
  5. external (regulatory, market, and other) considerations

Control and Independence

Internal validations bring a measure of control to the operation. Institutions understand the specific skill sets of their internal validation team beyond their resumes and can select the proper team for the needs of each model. Control also extends to the final report, its contents, and how findings are described and rated.

Further, the outcome and quality of internal validations may be more reliable. Because a bank must present and defend validation work to its regulators, low-quality work submitted by an external validator may need to be redone by yet another external validator, often on short notice, in order to bring the initial external model validation up to spec.

Elements of control, however, must sometimes be sacrificed in order to achieve independence. Institutions must be able to prove that the validator’s interests are independent from the model validation outcomes. While larger banks frequently have large, freestanding internal model validation departments whose organizational independence is clear and distinct, quantitative experts at smaller institutions must often wear multiple hats by necessity.

Ultimately the question of balancing control and independence can only be suitably addressed by determining whether internal personnel qualified to perform model validations are capable of operating without any stake in the outcome (and persuading examiners that this is, in fact, the case).

Hiring Constraints

Practically speaking, hiring constraints represent a major consideration. Hiring limitations may result from budgetary or other less obvious factors. Organizational limits aside, it is not always possible to hire employees with a needed skill set at a workable salary range at the time when they are needed. For smaller banks with limited bandwidth or larger banks that need to further expand, external model validation resources may be sought out of sheer necessity.

Cost

Cost is an important factor that can be tricky to quantify. Model validators tend to be highly specialized; many typically work on one type of model, for example, Basel models. If your bank is large enough and has enough Basel models to keep a Basel model validator busy with internal model validations all year, then it may be cost effective to have a Basel model validator on staff. But if your Basel model validator is only busy for six months of the year, then a full-time Basel validator is only efficient if you have other projects that are suited to that validator’s experience and cost. To complicate things further, an employee’s cost is typically housed in one department, making it difficult from a budget perspective to balance an employee’s time and cost across departments.

If we were building a cost model to determine how many internal validators we should hire, the input variables would include:

  1. the number of models in our inventory
  2. the skills required to validate each model
  3. the risk classification of each model (i.e., how often validations must be completed)
  4. the average fully loaded salary expense for a model validator with those specific skills

Only by comparing the cost of external validations to the year-round costs associated with hiring personnel with the specialized knowledge required to validate a given type of model (e.g., credit models, market risk models, operational risk models, ALM models, Basel models, and BSA/AML models) can a bank arrive at a true apples-to-apples comparison.

Financial Risk

While cost is the upfront expense of internal or external model validations, financial risk accounts for the probability of unforeseen circumstances. Assume that your bank is staffed with internal validators and that the team can handle the normal schedule of model validations (i.e., validation projects are spaced evenly throughout the year). Operations, however, may need to deploy a new model, or a new version of an existing model, on a schedule that requires a validation at a previously unscheduled time with no flexibility. In this case, your bank may need to pay for an external validation in addition to managing and paying a fully staffed team of internal validators.

A cost model for determining whether to hire additional internal validators should include a factor for the probability that models will need to be validated off-schedule, resulting in unforeseen external validation costs. On the other hand, a cost model might also consider the probability that an external validator’s product will be inferior and incur costs associated with required remediation.
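A stylized version of such a cost model is sketched below in Python. Every input value is a placeholder for illustration; the structure simply combines the inputs listed earlier (inventory size, required skills and effort, validation frequency, and fully loaded salary) with an expected cost for off-schedule validations that must be sent to an external provider.

```python
# A stylized internal-vs-external validation cost comparison. All numbers are
# illustrative placeholders, not actual salaries or vendor fees.

import math
from dataclasses import dataclass

@dataclass
class ModelGroup:
    name: str
    validations_per_year: float         # inventory size x required validation frequency
    analyst_days_per_validation: float  # effort given the skills each model requires
    external_cost_per_validation: float

def internal_cost(groups, loaded_salary, workdays_per_analyst=220):
    """Fully loaded cost of staffing enough specialized validators in-house."""
    total_days = sum(g.validations_per_year * g.analyst_days_per_validation for g in groups)
    analysts_needed = math.ceil(total_days / workdays_per_analyst)  # headcount is lumpy
    return analysts_needed * loaded_salary

def external_cost(groups):
    """Cost of outsourcing every scheduled validation."""
    return sum(g.validations_per_year * g.external_cost_per_validation for g in groups)

def expected_off_schedule_cost(groups, probability_off_schedule):
    """Expected extra spend when an unscheduled validation must go external."""
    return probability_off_schedule * external_cost(groups)

if __name__ == "__main__":
    inventory = [
        ModelGroup("Credit", validations_per_year=4, analyst_days_per_validation=30,
                   external_cost_per_validation=60_000),
        ModelGroup("ALM", validations_per_year=2, analyst_days_per_validation=40,
                   external_cost_per_validation=80_000),
    ]
    staffed = internal_cost(inventory, loaded_salary=250_000)
    outsourced = external_cost(inventory)
    print("internal staffing:", staffed)
    print("fully outsourced:", outsourced)
    print("internal + expected off-schedule external:",
          staffed + expected_off_schedule_cost(inventory, probability_off_schedule=0.25))
```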

External Risks

External risks are typically financial risks caused by regulatory, market, and other factors outside an institution’s direct control. The risk of a changing regulatory environment under a new presidential administration is always real and uncertainty clearly abounds as market participants (and others) attempt to predict President Trump’s priorities. Changes may include exemptions for regional banks from certain Dodd-Frank requirements; the administration has clearly signaled its intent to loosen regulations generally. Even though model validation will always be a best practice, these possibilities may influence a bank’s decision to staff an internal model validation team.

Recent regulatory trends can also influence validator hiring decisions. For example, our work with various banks over the past 12-18 months has revealed that regulators are trending toward requiring larger sample sizes for benchmarking and back-testing. Given the significant effort already associated with these activities, larger sample sizes could ultimately lower the number of model validations internal resources can complete per year. Funding external validations may become more expensive, as well.

Another industry trend is the growing acceptance of limited-scope validations. If only minimal model changes have occurred since a prior validation, the scope of a scheduled validation may be limited to the impact of these changes. If remediation activities were required by a prior validation, the scope may be limited to confirming that these changes were effectively implemented. This seemingly common-sense approach to model validations by regulators is a welcome trend and could reduce the number of internal and external validations required.

Joint Validations

In addition to reduced-scope validations, some of our clients have sought to reduce costs by combining internal and external resources. This enables institutions to limit hiring to validators without model-specific or highly quantitative skills. Such internal validators can typically validate a large number of lower-risk, less technical models independently.

For higher-risk, more technical models, such as ALM models, the internal validator may review the controls and documentation sufficiently, leaving the more technical portions of the validation—conceptual soundness, process verification, benchmarking, and back-testing, for example—to an external validator with specific expertise. In these cases, reports are produced jointly with internal and external validators each contributing the sections pertaining to procedures that they performed.

The resulting report often has the dual benefit of being more economical than a report generated externally and more defensible than one that relies solely on internal resources who may lack the specific domain knowledge necessary.

Conclusion

Model risk managers have limited time, resources, and budgets and face unending pressure from management and regulators. Striking an efficient resource-balancing strategy is critically important to consistently producing high-quality model validation reports on schedule and within budgets. The question of using internal vs. external model validation resources is not an either/or proposition. In considering it, we recommend that model risk management (MRM) professionals

  • consider the points above and initiate risk tolerance and budget conversations within the MRM framework.
  • reach out to vendors who have the skills to assist with your high-risk models, even if there is not an immediate need. Some institutions like to try out a model validation provider on a few low- or moderate-risk models to get a sense of their capabilities.
  • consider internal staffing to meet basic model validation needs and external vendors (either for full validations or outsourced staff) to fill gaps in expertise.
