A Primer on HECM Loans
In September, RiskSpan announced the addition of Ginnie Mae’s loan-level Home Equity Conversion Mortgage (“HECM”) dataset to the Edge platform. The dataset contains over 330,000 HECM loans with origination dates from 2000 to 2018 and reporting periods from August 2013 to October 2018. This post is a primer on HECM loans, the HMBS securities they collateralize, and the structure of the new dataset.
What is a HECM?
HECMs are FHA-insured reverse mortgages that provide people 62 and older with cash payments or a line of credit in exchange for equity in their homes. Borrowers are not required to make any payments on HECM balances until the house ceases to be their primary residence. In contrast to traditional mortgages that amortize down over time, reverse mortgage balances usually grow over time as accrued interest is added to the loan. The Federal Housing Administration (FHA) insures HECM lenders against default and loss and is paid a mortgage insurance premium in return.
Because borrowers do not make principal and interest payments, the concept of HECM default differs from that of traditional forward mortgages. HECM default most commonly occurs when borrowers fail to keep current on property tax payments and insurance premiums or otherwise jeopardize the lender’s lien position on the property. Initial loan-to-value (LTV) ratios for HECMs average between 60% and 70% to allow for the balance to grow over time (taking into account borrower age and interest rate).
The number of borrowers is arguably a more important factor when predicting HECM performance than when predicting traditional mortgage performance. Because reverse mortgages do not become due until all borrowers have left the property, reverse mortgages with multiple borrowers tend to have longer tenures and consequently run a higher risk of growing beyond the point where the balance and accrued interest are supported by the underlying property’s value.
Like traditional mortgages, HECM interest rates may be fixed or adjustable. Fixed-rate HECMs disburse a single, initial advance, while adjustable-rate HECMs combine a line of credit or monthly advance with an initial advance. Figure 1 (below), which was constructed using data from the newly available dataset, illustrates a steady increase in the share of ARM loans since 2013.
Figure 1
One net result of this trend is fewer one-time lump-sum distributions and more line-of-credit (LOC) distributions over time. LOCs give borrowers access to a source of funds that they can draw upon as needed. While LOCs constitute (by far) the most common type of HECM, two other loan types, “term” and “tenure,” also occupy the HECM landscape. “Term” loans provide monthly payments for a set period of time. “Tenure” loans provide monthly payments for as long as the borrower lives in the home as a primary residence. The lender receives principal, interest and possibly a share of the home appreciation upon expiry of the fixed term (in the case of term loans) or upon borrower’s death or move-out (in the case of either loan type). The dominance of the LOC loan type relative to term and tenure HECMs is depicted in Figure 2, below.
Figure 2
Fannie Mae had traditionally functioned as the primary investor in reverse mortgages for most of these loans’ 25-year existence. Since 2009, however, Fannie Mae has significantly scaled back its reverse mortgage portfolio, leaving the majority of reverse mortgages to be picked up by the Ginnie Mae HMBS market.
What is an HMBS?
HECM loans are pooled into HECM mortgage-backed securities (HMBS) within the Ginnie Mae II MBS program. HMBS are made up of pools of participations in HECM loans. A participation in a HECM loan is a pro-rata share of the loan that is securitized in an HMBS. As explained above, many HECM loans are structured as a line of credit, which allows borrowers to draw on their lines as needed. When these draws occur, the loan balance grows, so each previously securitized participation becomes a smaller pro-rata share of the loan while its own balance does not change. HMBS participations have a mandatory repurchase clause requiring a lender to buy back all the participations of a HECM loan when its LTV reaches 98%. For HECM loans, LTV is calculated as the ratio of the current HECM balance to the maximum claim amount.
As of June 2018, participation unpaid balance (UPB) stood at approximately $56.18 billion with 11,380,452 active participations. Figures 3 and 4, below, show the trend of participation composition (by number of participations and UPB) over time. These reflect the shift toward ARM lines of credit (and away from fixed-rate lump sum disbursements) illustrated in Figures 1 and 2.
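The LTV definition and the 98% repurchase trigger described above lend themselves to a short worked example. The following is a minimal sketch rather than Ginnie Mae's or RiskSpan's implementation: the function names, the illustrative balance and maximum claim amount, and the flat 5% annual accrual rate (with no further draws) are all assumptions made for illustration.

```python
REPURCHASE_LTV = 0.98  # mandatory repurchase threshold stated in the text


def hecm_ltv(current_balance: float, max_claim_amount: float) -> float:
    """HECM LTV: current loan balance divided by the maximum claim amount."""
    if max_claim_amount <= 0:
        raise ValueError("max_claim_amount must be positive")
    return current_balance / max_claim_amount


def must_repurchase(current_balance: float, max_claim_amount: float) -> bool:
    """True once the loan has reached the repurchase threshold."""
    return hecm_ltv(current_balance, max_claim_amount) >= REPURCHASE_LTV


def months_until_repurchase(current_balance, max_claim_amount,
                            annual_accrual_rate, max_months=1200):
    """Months of monthly compounding until the threshold is reached.

    Assumes (hypothetically) a constant accrual rate and no further draws.
    Returns None if the threshold is not reached within max_months.
    """
    monthly_rate = annual_accrual_rate / 12
    balance = current_balance
    for month in range(max_months + 1):
        if must_repurchase(balance, max_claim_amount):
            return month
        balance *= 1 + monthly_rate
    return None


# Illustrative figures only.
balance, max_claim = 260_784.73, 365_000.00
ltv = hecm_ltv(balance, max_claim)
print(f"LTV: {ltv:.2%}")
print("Repurchase required now:", must_repurchase(balance, max_claim))
print("Months to threshold at an assumed 5% accrual rate:",
      months_until_repurchase(balance, max_claim, 0.05))
```

Because the denominator is the maximum claim amount rather than the current property value, the ratio depends only on how the balance grows, which is why a simple compounding loop is enough for a rough time-to-buyout estimate.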
Figure 3
Figure 4
HMBS Dataset
Ginnie Mae provides two monthly loan-level files related to the HECMs that collateralize its HMBS offering. One of these files contains fixed-rate and annually adjusting rate loans, and the other contains monthly adjusting rate loans. Because individual security participations are spread across several different pools (often with several column values repeating for a single loan), working with this dataset can be challenging. An example of a single loan spread across multiple security participations is illustrated in the table below. Note that for a single loan ID, the Current HECM UPB and Max Claim Amount columns are repeated for each participation.
| Loan ID | Current HECM UPB | Max Claim Amount | Participation UPB |
| 1000033608 | 260,784.73 | 365,000.00 | 860.70 |
| 1000033608 | 260,784.73 | 365,000.00 | 321.87 |
| 1000033608 | 260,784.73 | 365,000.00 | 12,079.98 |
| 1000033608 | 260,784.73 | 365,000.00 | 483.81 |
Table 1
The most important risk factors associated with HECMs relate to borrower mortality and mobility (i.e., borrowers’ remaining in their homes until the increasing mortgage balance exceeds the value of the property). Borrowers are more likely to move out of their homes for health reasons as they age, but they become less likely to move out for other reasons. Having more than one borrower tends to extend the life of a HECM because the loan does not become due until the last surviving borrower leaves the property. As of the most recent reporting period, about 43% of the aggregate HMBS balance was associated with HECMs with more than one borrower.
In order to calculate HECM prepayment speeds, we use the zero balance codes provided in the dataset to exclude loans that have reached a 98% LTV from the opening balance. (As noted earlier, loans must be purchased out of the HMBS once they reach this threshold.) Because interest is deferred in HECM loans, it is added to the opening balance. We then total the prepayments, compute the single monthly mortality (SMM), and annualize it to obtain the conditional prepayment rate (CPR). Figure 5, below, shows the one-month CPR by vintage over the past five years.
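Two steps described above can be sketched in a few lines: collapsing the participation-level rows of Table 1 to one record per loan, and converting a monthly prepayment total into SMM and CPR. This is a minimal sketch under stated assumptions. The field names are hypothetical (they are not the dataset's actual column names), the prepayment inputs in the usage lines are made-up numbers, and the exclusion of 98% LTV buyouts is assumed to have happened upstream. The annualization uses the standard relationship CPR = 1 - (1 - SMM)^12.

```python
# Rows mirror Table 1; the dictionary keys are hypothetical field names.
participations = [
    {"loan_id": "1000033608", "current_upb": 260_784.73, "max_claim": 365_000.00, "participation_upb": 860.70},
    {"loan_id": "1000033608", "current_upb": 260_784.73, "max_claim": 365_000.00, "participation_upb": 321.87},
    {"loan_id": "1000033608", "current_upb": 260_784.73, "max_claim": 365_000.00, "participation_upb": 12_079.98},
    {"loan_id": "1000033608", "current_upb": 260_784.73, "max_claim": 365_000.00, "participation_upb": 483.81},
]


def collapse_to_loans(rows):
    """Keep loan-level columns once per loan and sum the participation UPBs."""
    loans = {}
    for row in rows:
        loan = loans.setdefault(row["loan_id"], {
            "current_upb": row["current_upb"],   # repeated on every row, so take it once
            "max_claim": row["max_claim"],       # likewise repeated
            "participation_upb": 0.0,
            "n_participations": 0,
        })
        loan["participation_upb"] += row["participation_upb"]
        loan["n_participations"] += 1
    return loans


def single_monthly_mortality(prepaid_balance, opening_balance, accrued_interest):
    """SMM with deferred interest added to the opening balance, as the text describes."""
    base = opening_balance + accrued_interest
    if base <= 0:
        raise ValueError("opening balance plus accrued interest must be positive")
    return prepaid_balance / base


def annualize_smm(smm):
    """Standard conversion from a monthly rate to an annualized CPR."""
    return 1 - (1 - smm) ** 12


loans = collapse_to_loans(participations)
print(loans["1000033608"])

# Hypothetical monthly totals, for illustration only.
smm = single_monthly_mortality(prepaid_balance=5.0, opening_balance=995.0, accrued_interest=5.0)
print(f"SMM {smm:.4%} -> CPR {annualize_smm(smm):.4%}")
```

Summing only the participation column avoids double-counting the loan-level balance, which would otherwise be multiplied by the number of participations.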
Figure 5
Because borrower mortality and mobility tend to remain stable over time, HECM prepayment speeds exhibit less variability than those of traditional mortgages. An important aspect of evaluating CPR is looking at the outstanding participation balance relative to borrower age. Figure 6 contains a heatmap plotting borrower age against HECM purpose for the most recent reporting period (July 2018).
Figure 6
Because most HECM borrowers are younger than age 80, prepayments are likely to increase as this cohort ages and becomes more likely to move out or pass away. Figure 7 below shows the five largest HMBS originators by participation as of July 2018. As discussed above, lines of credit (LOCs) are the most popular HECM type with Single Disbursement Lump Sum the next most frequent.
Stay tuned for future blog posts in which we will use the Edge platform to glean additional insights from this newly available and very interesting dataset. For information on how to use the Edge platform to conduct your own analyses of this or any other dataset, please contact us.
CRT Exposure to Hurricane Michael

With Hurricane Michael approaching the Gulf Coast, we put together some interactive charts looking at the affected metro areas and their related CRT exposure (both CAS and STACR). Given the large area of impact with Hurricane Michael, we have included a nearly exhaustive selection of MSAs. Click on a deal ID along the left-hand side of the plot to view its exposure to each MSA. Most of the mortgage delinquencies in the wake of Hurricane Harvey quickly cured. Holders of securities backed by loans that ultimately defaulted (typically because the property was completely destroyed) had much of their exposure mitigated by insurance proceeds, government intervention, and other relief provisions.
CRT Deal Monitor: Understanding When Credit Becomes Risky
This analysis tracks several metrics related to deal performance and credit profile, putting them into a historical context by comparing the same metrics for recent-vintage deals against those of ‘similar’ cohorts in the time leading up to the 2008 housing crisis. You’ll see how credit metrics are trending today and understand the significance of today’s shifts in the context of historical data. Some of the charts in this post have interactive features, so click around! We’ll be tweaking the analysis and adding new metrics in subsequent months. Please shoot us an email if you have an idea for other metrics you’d like us to track.
Highlights
- Performance metrics signal steadily increasing credit risk, but no cause for alarm.
  - We’re starting to see the hurricane-related (2017 Harvey and Irma) delinquency spikes subside in the deal data. Investors should expect a similar trend in 2019 due to Hurricane Florence.
  - The overall percentage of delinquent loans is increasing steadily due to the natural age ramp of delinquency rates and the ramp-up of the program over the last 5 years.
  - Overall delinquency levels are still far lower than historical rates.
  - While the share of delinquency is increasing, loans that go delinquent are ending up in default at a lower rate than before.
- Deal Profiles are becoming riskier as new GSE acquisitions include higher-DTI business.
  - It’s no secret that both GSEs started acquiring a lot of high-DTI loans (for Fannie this moved from around 16% of MBS issuance in Q2 2017 to 30% of issuance as of Q2 this year). We’re starting to see a shift in CRT deal profiles as these loans are making their way into CRT issuance.
  - The credit profile chart toward the end of this post compares the credit profiles of recently issued deals with those of the most recent three months of MBS issuance data to give you a sense of the deal profiles we’re likely to see over the next 3 to 9 months. We also compare these recently issued deals to a similar cohort from 2006 to give some perspective on how much the credit profile has improved since the housing crisis.
  - RiskSpan’s Vintage Quality Index reflects an overall loosening of credit standards–reminiscent of 2003 levels–driven by this increase in high-DTI originations.
- Fannie and Freddie have fundamental differences in their data disclosures for CAS and STACR.
  - Delinquency rates and loan performance all appear slightly worse for Fannie Mae in both the deal and historical data.
  - Obvious differences in reporting (e.g., STACR reporting a delinquent status in a terminal month) have been corrected in this analysis, but some less obvious differences in reporting between the GSEs may persist.
  - We suspect there is something fundamentally different about how Freddie Mac reports delinquency status, perhaps related to cleaning servicing reporting errors, cleaning hurricane delinquencies, or the way servicing transfers are handled in the data. We are continuing our research on this front and hope to follow up with another post to explain these anomalies.
The exceptionally low rate of delinquency, default, and loss among CRT deals at the moment makes analyzing their credit-risk characteristics relatively boring. Loans in any newly issued deal have already seen between 6 and 12 months of home price growth, and so if the economy remains steady for the first 6 to 12 months after issuance, then that deal is pretty much in the clear from a risk perspective. The danger comes if home prices drift downward right after deal issuance. Our aim with this analysis is to signal when a shift may be occurring in the credit risk inherent in CRT deals. Many data points related to the overall economy and home prices are available to investors seeking to answer this question. This analysis focuses on what the Agency CRT data—both the deal data and the historical performance datasets—can tell us about the health of the housing market and the potential risks associated with the next deals that are issued.
Current Performance and Credit Metrics
Delinquency Trends
The simplest metric we track is the share of loans across all deals that is 60+ days past due (DPD). The charts below compare STACR (Freddie) vs. CAS (Fannie), with separate charts for high-LTV deals (G2 for CAS and HQA for STACR) vs. low-LTV deals (G1 for CAS and DNA for STACR). Both time series show a steadily increasing share of delinquent loans. This slight upward trend is related to the natural aging curve of delinquency and the ramp-up of the CRT program. Both time series show a significant spike in delinquency around January of this year due to the 2017 hurricane season. Most of these delinquent loans are expected to eventually cure or prepay. For comparative purposes, we include a historical time series of the share of loans 60+ DPD for each LTV group. These charts are derived from the Fannie Mae and Freddie Mac loan-level performance datasets. Comparatively, today’s deal performance is much better than even the pre-2006 era. You’ll note the systematically higher delinquency rates of CAS deals. We suspect this is due to reporting differences rather than actual differences in deal performance. We’ll continue to investigate and report back on our findings.
Delinquency Outcome Monitoring
While delinquency rates might be trending up, loans that are rolling to 60-DPD are ultimately defaulting at lower and lower rates. The tables below track the status of loans that were 60+ DPD. Each bar in the chart represents the population of loans that were 60+ DPD exactly 6 months prior to the x-axis date. Over time, we see growing 60-DPD and 60+ DPD groups, and a shrinking Default group. This indicates that a majority of delinquent loans wind up curing or prepaying, rather than proceeding to default. The choppiness and high default rates in the first few observations of the data are related to the very low counts of delinquent loans as the CRT program ramped up. The following table repeats the 60-DPD delinquency analysis for the Freddie Mac Loan Level Performance dataset leading up to and following the housing crisis. (The Fannie Mae loan level performance set yields a nearly identical chart.) Note how many more loans in these cohorts remained delinquent (rather than curing or defaulting) relative to the more recent CRT loans.
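The cohort logic described above (take the loans that were 60+ DPD six months before an observation date, then tabulate where they stand on that date) can be sketched as follows. This is an illustrative sketch, not the code behind the charts: the status labels, the integer month index, and the toy loan histories are all hypothetical, and carrying a terminal status forward is an assumption about how loans that drop out of reporting should be treated.

```python
from collections import Counter


def status_as_of(loan_history, month):
    """Most recent reported status at or before `month`.

    Assumption: once a loan stops reporting (e.g. after default or prepayment),
    its last reported status carries forward.
    """
    known = [m for m in loan_history if m <= month]
    return loan_history[max(known)] if known else None


def outcome_shares(histories, as_of, lookback=6, delinquent="60+ DPD"):
    """Share of each outcome among loans that were delinquent `lookback` months earlier."""
    cohort = [loan_id for loan_id, history in histories.items()
              if history.get(as_of - lookback) == delinquent]
    if not cohort:
        return {}
    counts = Counter(status_as_of(histories[loan_id], as_of) for loan_id in cohort)
    return {status: count / len(cohort) for status, count in counts.items()}


# Toy data: {loan_id: {month_index: status}}; labels are hypothetical.
histories = {
    "A": {0: "60+ DPD", 3: "Current", 6: "Current"},
    "B": {0: "60+ DPD", 2: "Default"},
    "C": {0: "60+ DPD", 6: "60+ DPD"},
    "D": {0: "60+ DPD", 4: "Prepaid"},
    "E": {0: "Current", 6: "60+ DPD"},   # not in the month-0 delinquent cohort
}

print(outcome_shares(histories, as_of=6))
```

Repeating the call for each observation month yields one stacked bar per date; with very small cohorts the shares will be choppy, which matches the behavior noted for the early months of the CRT program.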
Vintage Quality Index
RiskSpan’s Vintage Quality Index (VQI) reflects a reversion to the looser underwriting standards of the early 2000s as a result of the GSEs’ expansion of high-DTI lending. RiskSpan introduced the VQI in 2015 as a way of quantifying the underwriting environment of a particular vintage of mortgage originations. We use the metric as an empirically grounded way to control for vintage differences within our credit model.
While both GSEs increased high-DTI lending in 2017, it’s worth noting that Fannie Mae saw a relatively larger surge in loans with DTIs greater than 43%. The chart below shows the share of loans backing MBS with DTI > 43. We use the loan-level MBS issuance data to track what’s being originated and acquired by the GSEs because it is the timeliest data source available. CRT deals are issued with loans that are between 6 and 20 months seasoned, and so tracking MBS issuance provides a preview of what will end up in the next cohort of deals. 
Deal Profile Comparison
The tables below compare the credit profiles of recently issued deals. We focus on the key drivers of credit risk, highlighting the comparatively riskier features of a deal. Each table separates the high-LTV (80%+) deals from the low-LTV deals (60%-80%). We add two additional columns for comparison purposes. The first is the ‘Coming Cohort,’ which is meant to give an indication of what upcoming deal profiles will look like. The data in this column is derived from the most recent three months of MBS issuance loan-level data, controlling for the LTV group. These are newly originated and acquired by the GSEs—considering that CRT deals are generally issued with an average loan age between 6 and 15 months, these are the loans that will most likely wind up in future CRT transactions. The second comparison cohort consists of 2006 originations in the historical performance datasets (Fannie and Freddie combined), controlling for the LTV group. We supply this comparison as context for the level of risk that was associated with one of the worst-performing cohorts. The latest CAS deals—both high- and low-LTV—show the impact of increased >43% DTI loan acquisitions. Until recently, STACR deals typically had a higher share of high-DTI loans, but the latest CAS deals have surpassed STACR in this measure, with nearly 30% of their loans having DTI ratios in excess of 43%. CAS high-LTV deals carry more risk in LTV metrics, such as the percentage of loans with a CLTV > 90 or CLTV > 95. However, STACR includes a greater share of loans with a less-than-standard level of mortgage insurance, which would provide less loss protection to investors in the event of a default.
Low-LTV deals generally appear more evenly matched in terms of risk factors when comparing STACR and CAS. STACR does display the same DTI imbalance as seen in the high-LTV deals, but that may change as the high-DTI group makes its way into deals. 
Deal Tracking Reports
Please note that defaults are reported on a delay for both GSEs, and so while we have CPR numbers available for August, CDR numbers are not provided because they are not fully populated yet. Fannie Mae CAS default data is delayed an additional month relative to STACR. We’ve left loss and severity metrics blank for fixed-loss deals.

Big Companies; Big Data Issues
Data issues plague organizations of all sorts and sizes. But generally, the bigger the dataset, and the more transformations the data goes through, the greater the likelihood of problems. Organizations take in data from many different sources, including social media, third-party vendors and other structured and unstructured origins, resulting in massive and complex data storage and management challenges. This post presents ideas to keep in mind when seeking to address these.
First, a couple of definitions:
Data quality generally refers to the fitness of a dataset for its purpose in a given context. Data quality encompasses many related aspects, including:
- Accuracy,
- Completeness,
- Update status,
- Relevance,
- Consistency across data sources,
- Reliability,
- Appropriateness of presentation, and
- Accessibility
Data lineage tracks data movement, including its origin and where it moves over time. Data lineage can be represented visually to depict how data flows from its source to its destination via various changes and hops.
The challenges facing many organizations relate to both data quality and data lineage issues, and a considerable amount of time and effort is spent both in tracing the source of data (i.e., its lineage) and correcting errors (i.e., ensuring its quality). Business intelligence and data visualization tools can do a magnificent job of teasing stories out of data, but these stories are only valuable when they are true. It is becoming increasingly vital to adopt best practices to ensure that the massive amounts of data feeding downstream processes and presentation engines are both reliable and properly understood.
Financial institutions must frequently deal with disparate systems either because of mergers and acquisitions or in order to support different product types—consumer lending, commercial banking and credit cards, for example. Disparate systems tend to result in data silos, and substantial time and effort must go into providing compliance reports and meeting the various regulatory requirements associated with analyzing data provenance (from source to destination). Understanding the workflow of data and access controls around security are also vital applications of data lineage and help ensure data quality.
In addition to the obvious need for financial reporting accuracy, maintaining data lineage and quality is vital to identifying redundant business rules and data and to ensuring that reliable, analyzable data is constantly available and accessible. It also helps to improve the data governance ecosystem, enabling data owners to focus on gleaning business insights from their data rather than focusing attention on rectifying data issues.
Common Data Lineage Issues
A surprising number of data issues emerge simply from uncertainty surrounding a dataset’s provenance. Many of the most common data issues stem from one or more of the following categories:
- Human error: “Fat fingering” is just the tip of the iceberg. Misconstruing and other issues arising from human intervention are at the heart of virtually all data issues.
- Incomplete Data: Whether it’s drawing conclusions based on incomplete data or relying on generalizations and judgment to fill in the gaps, many data issues are caused by missing data.
- Data format: Systems expect to receive data in a certain format. Issues arise when the actual input data departs from these expectations.
- Data consolidation: Migrating data from legacy systems or attempting to integrate newly acquired data (from a merger, for instance) frequently leads to post-consolidation issues.
- Data processing: Calculation engines, data aggregators, or any other program designed to transform raw data into something more “usable” always run the risk of creating output data with quality issues.
Addressing Issues
Issues relating to data lineage and data quality are best addressed by employing some combination of the following approaches. The specific blend of approaches depends on the types of issues and data in question, but these principles are broadly applicable.
Employing a top-down discovery approach enables data analysts to understand the key business systems and business data models that drive an application. This approach is most effective when logical data models are linked to the physical data and systems.
Creating a rich metadata repository for all the data elements flowing from the source to destination can be an effective way of heading off potential data lineage issues. Because data lineage is dependent on the metadata information, creating a robust repository from the outset often helps preserve data lineage throughout the life cycle.
Imposing useful data quality rules is an important element in establishing a framework in which data is always validated against a set of well-conceived business rules. Ensuring not only that data passes comprehensive rule sets but also that remediation factors are in place for appropriately dealing with data that fails quality control checks is crucial for ensuring end-to-end data quality.
Data lineage and data quality both require continuous monitoring by a defined stewardship council to ensure that data owners are taking appropriate steps to understand and manage the idiosyncrasies of the datasets they oversee.
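The idea of validating every record against a set of business rules, with a remediation path attached to each rule, can be illustrated with a small sketch. Everything here is hypothetical: the `Rule` structure, the rule names, the field names, and the remediation labels are invented for illustration and do not describe any particular tool or client rule set.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


def is_iso_date(value) -> bool:
    """True if value parses as an ISO-8601 calendar date such as 2018-07-01."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]   # returns True when the record passes
    remediation: str                # what to do with a record that fails


# Hypothetical rule set for a loan record.
RULES = [
    Rule("loan_id_present", lambda r: bool(r.get("loan_id")),
         "reject and return to source system"),
    Rule("upb_non_negative",
         lambda r: isinstance(r.get("upb"), (int, float)) and r["upb"] >= 0,
         "flag for data owner review"),
    Rule("origination_date_iso", lambda r: is_iso_date(r.get("origination_date")),
         "reformat at ingestion"),
]


def validate(records, rules=RULES):
    """Split records into passing ones and failures annotated with remediation steps."""
    passed, failed = [], []
    for record in records:
        broken = [(rule.name, rule.remediation) for rule in rules if not rule.check(record)]
        if broken:
            failed.append((record, broken))
        else:
            passed.append(record)
    return passed, failed


records = [
    {"loan_id": "L1", "upb": 250_000.0, "origination_date": "2018-07-01"},
    {"loan_id": "", "upb": -5.0, "origination_date": "07/01/2018"},
]
passed, failed = validate(records)
print(len(passed), "passed;", len(failed), "failed")
for record, problems in failed:
    print(record, problems)
```

Keeping the remediation next to the rule means a failed check always comes with a defined next step, which is the point of pairing rule sets with remediation factors.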
Our Data Lineage and Data Quality Background
RiskSpan’s diverse client base includes several large banks (which we define as banks with assets totaling in excess of $50 billion). Large banks are characterized by a complicated web of departments and sub-organizations, each offering multiple products, sometimes to the same base of customers. Different sub-organizations frequently rely on disparate systems (sometimes due to mergers/acquisitions; sometimes just because they develop their businesses independent of one another). Either way, data silos inevitably result.
RiskSpan has worked closely with chief data officers of large banks to help establish data stewardship teams charged with taking ownership of the various “areas” of data within the bank. This involves the identification of data “curators” within each line of business to coordinate with the CDO’s office and be the advocate (and ultimately the responsible party) for the data they “own.” In best practice scenarios, a “data curator” group is formed to facilitate collaboration and effective communication for data work across the line of business.
We have found that a combination of top-down and bottom-up data discovery approaches is most effective when working across stakeholders to understand existing systems and enterprise data assets. RiskSpan has helped create logical data flow diagrams (based on the top-down approach) and assisted with linking physical data models to the logical data models. We have found Informatica and Collibra tools to be particularly useful in creating data lineage, tracking data owners, and tracing data flow from source to destination.
Complementing our work with financial clients to devise LOB-based data quality rules, we have built data quality dashboards using these same tools to enable data owners and curators to rectify and monitor data quality issues. These projects typically include elements of the following components.
- Initial assessment review of the current data landscape.
- Establishment of a logical data flow model using both top-down and bottom-up data discovery approaches.
- Coordination with the CDO / CIO office to set up a data governance stewardship team and to identify data owners and curators from all parts of the organization.
- Delineation of data policies, data rules and controls associated with different consumers of the data.
- Development of a target state model for data lineage and data quality by outlining the process changes from a business perspective.
- Development of future-state data architecture and associated technology tools for implementing data lineage and data quality.
- Invitation to client stakeholders to reach a consensus related to future-state model and technology architecture.
- Creation of a project team to execute data lineage and data quality projects by incorporating the appropriate resources and client stakeholders.
- Development of a change management and migration strategy to enable users and stakeholders to use data lineage and data quality tools.
Ensuring data quality and lineage is ultimately the responsibility of business lines that own and use the data. Because “data management” is not the principal aim of most businesses, it often behooves them to leverage the principles outlined in this post (sometimes along with outside assistance) to implement tactics that will help ensure that the stories their data tell are reliable.
Here Come the CECL Models: What Model Validators Need to Know
As it turns out, model validation managers at regional banks didn’t get much time to contemplate what they would do with all their newly discovered free time. Passage of the Economic Growth, Regulatory Relief, and Consumer Protection Act appears to have relieved many model validators of the annual DFAST burden. But as one class of models exits the inventory, a new class enters—CECL models.
Banks everywhere are nearing the end of a multi-year scramble to implement a raft of new credit models designed to forecast life-of-loan performance for the purpose of determining appropriate credit-loss allowances under the Financial Accounting Standards Board’s new Current Expected Credit Loss (CECL) standard, which takes full effect in 2020 for public filers and 2021 for others.
The number of new models CECL adds to each bank’s inventory will depend on the diversity of asset portfolios. More asset classes and more segmentation will mean more models to validate. Generally, model risk managers should count on having to validate at least one CECL model for every loan and debt security type (residential mortgage, CRE, plus all the various subcategories of consumer and C&I loans) plus potentially any challenger models the bank may have developed.
In many respects, tomorrow’s CECL model validations will simply replace today’s allowance for loan and lease losses (ALLL) model validations. But CECL models differ from traditional allowance models. Under the current standard, allowance models typically forecast losses over a one-to-two-year horizon. CECL requires a life-of-loan forecast, and a model’s inputs are explicitly constrained by the standard. Accounting rules also dictate how a bank may translate the modeled performance of a financial asset (the CECL model’s outputs) into an allowance. Model validators need to be just as familiar with the standards governing how these inputs and outputs are handled as they are with the conceptual soundness and mathematical theory of the credit models themselves.
CECL Model Inputs – And the Magic of Mean Reversion
Not unlike DFAST models, CECL models rely on a combination of loan-level characteristics and macroeconomic assumptions. Macroeconomic assumptions are problematic with a life-of-loan credit loss model (particularly with long-lived assets—mortgages, for instance) because no one can reasonably forecast what the economy is going to look like six years from now. (No one really knows what it will look like six months from now, either, but we need to start somewhere.) The CECL standard accounts for this reality by requiring modelers to consider macroeconomic input assumptions in two separate phases: 1) a “reasonable and supportable” forecast covering the time frame over which the entity can make or obtain such a forecast (two or three years is emerging as common practice for this time frame), and 2) a “mean reversion” forecast based on long-term historical averages for the out years. As an alternative to mean reverting by the inputs, entities may instead bypass their models in the out years and revert to long-term average performance outcomes by the relevant loan characteristics.
Assessing these assumptions (and others like them) requires a model validator to simultaneously wear a “conceptual soundness” testing hat and an “accounting policy” compliance hat. Because the purpose of the CECL model is to prove an accounting answer and satisfy an accounting requirement, what can validators reasonably conclude when confronted with an assumption that may seem unsound from a purely statistical point of view but nevertheless satisfies the accounting standard?
Taking the mean reversion requirement as an example, the projected performance of loans and securities beyond the “reasonable and supportable” period is permitted to revert to the mean in one of two ways: 1) modelers can feed long-term history into the model by supplying average values for macroeconomic inputs, allowing modeled results to revert to long-term means in that way, or 2) modelers can mean revert “by the outputs” – bypassing the model and populating the remainder of the forecast with long-term average performance outcomes (prepayment, default, recovery and/or loss rates depending on the methodology). Either of these approaches could conceivably result in a modeler relying on assumptions that may be defensible from an accounting perspective despite being statistically dubious, but the first is particularly likely to raise a validator’s eyebrow. The loss rates that a model will predict when fed “average” macroeconomic input assumptions are always going to be uncharacteristically low. (Because credit losses are generally large in bad macroeconomic environments and low in average and good environments, long-term average credit losses are higher than the credit losses that occur during average environments. A model tuned to this reality, and fed one path of “average” macroeconomic inputs, will return credit losses substantially lower than long-term average credit losses.) A credit risk modeler is likely to think that these are not particularly realistic projections, but an auditor following the letter of the standard may choose not to find any fault with them. In such situations, validators need to fall somewhere in between these two extremes, keeping in mind that the underlying purpose of CECL models is to reasonably fulfill an accounting requirement, before hastily issuing a series of high-risk validation findings.
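The gap between reverting by the inputs and reverting by the outputs is easy to see with a toy model. The sketch below is purely illustrative: the exponential loss curve, its coefficients, and the unemployment history are invented, not taken from any CECL model. It shows that when losses respond convexly to a macroeconomic input, the loss at the average input sits below the average of historical losses (Jensen's inequality), and it includes a simple linear path for reverting an input toward its long-run mean, which is one assumed way (among several) to implement the transition.

```python
import math
from statistics import mean


def loss_rate(unemployment: float) -> float:
    """Hypothetical convex loss curve: losses rise sharply in bad environments."""
    return 0.002 * math.exp(0.35 * (unemployment - 5.0))


def revert_inputs(last_forecast: float, long_run_mean: float, n_periods: int):
    """Linear path from the last reasonable-and-supportable value to the long-run mean."""
    step = (long_run_mean - last_forecast) / n_periods
    return [last_forecast + step * i for i in range(1, n_periods + 1)]


# Made-up unemployment history (percent) containing a stress episode.
history = [4.0, 4.5, 5.0, 5.5, 6.0, 9.0, 10.0]
long_run_mean = mean(history)

# Option 1: revert by the inputs, feeding the model the average input.
loss_by_inputs = loss_rate(long_run_mean)

# Option 2: revert by the outputs, using the long-term average of losses.
loss_by_outputs = mean([loss_rate(u) for u in history])

print(f"Loss at average input:    {loss_by_inputs:.4%}")
print(f"Average historical loss:  {loss_by_outputs:.4%}")
print("Input reversion path:", [round(u, 2) for u in revert_inputs(4.0, long_run_mean, 4)])
```

The size of the gap depends entirely on how convex the loss response is and how severe the historical stress periods were, so the toy numbers should not be read as an estimate of the effect in practice.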
CECL Model Outputs: What are they?
CECL models differ from some other models in that the allowance (the figure that modelers are ultimately tasked with getting to) is not itself a direct output of the underlying credit models being validated. The expected losses that emerge from the model must be subject to a further calculation in order to arrive at the appropriate allowance figure. Whether these subsequent calculations are considered within the scope of a CECL model validation is ultimately going to be an institutional policy question, but it stands to reason that they would be.
Under the CECL standard, banks will have two alternatives for calculating the allowance for credit losses: 1) the allowance can be set equal to the sum of the expected credit losses (as projected by the model), or 2) the allowance can be set equal to the cost basis of the loan minus the present value of expected cash flows. While a validator would theoretically not be in a position to comment on whether the selected approach is better or worse than the alternative, principles of process verification would dictate that the validator ought to determine whether the selected approach is consistent with internal policy and that it was computed accurately.
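As a rough sketch of the two alternatives, the functions below compute an allowance both ways for a single hypothetical three-year loan. The function names, cash flows, and the 5% rate are illustrative assumptions, not anything prescribed by the standard.

```python
def allowance_sum_of_losses(expected_losses):
    """Alternative 1: allowance equals the sum of projected credit losses."""
    return sum(expected_losses)


def allowance_discounted_cash_flow(cost_basis, expected_cash_flows, rate):
    """Alternative 2: cost basis minus the present value of expected cash flows."""
    pv = sum(cf / (1 + rate) ** t
             for t, cf in enumerate(expected_cash_flows, start=1))
    return cost_basis - pv


# Hypothetical loan: par and cost basis of 100, 5% annual coupon, 3 years.
# The model projects a credit loss of 3 at maturity.
expected_losses = [0.0, 0.0, 3.0]
expected_cash_flows = [5.0, 5.0, 102.0]

method_1 = allowance_sum_of_losses(expected_losses)
method_2 = allowance_discounted_cash_flow(100.0, expected_cash_flows, 0.05)

print(round(method_1, 4), round(method_2, 4))
```

The two figures differ only because the second approach discounts the year-three shortfall, which is one reason a validator would want to confirm which approach internal policy calls for before checking the arithmetic.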
When Policy Trumps Statistics
The selection of a mean reversion approach is not the only area in which a modeler may make a statistically dubious choice in favor of complying with accounting policy.
Discount Rates
Translating expected losses into an allowance using the present-value-of-future-cash-flows approach (option 2—above) obviously requires selecting an appropriate discount rate. What should it be? The standard stipulates the use of the financial asset’s Effective Interest Rate (or “yield,” i.e., the rate of return that equates an instrument’s cash flows with its amortized cost basis). Subsequent accounting guidance affords quite a bit of flexibility in how this rate is calculated. Institutions may use the yield that equates contractual cash flows with the amortized cost basis (we can call this “contractual yield”), or the rate of return that equates cash flows adjusted for prepayment expectations with the cost basis (“prepayment-adjusted yield”).
The use of the contractual yield (which has been adjusted for neither prepayments nor credit events) to discount cash flows that have been adjusted for both prepayments and credit events will allow the impact of prepayment risk to be commingled with the allowance number. For any instruments where the cost basis is greater than unpaid principal balance (a mortgage instrument purchased at 102, for instance) prepayment risk will exacerbate the allowance. For any instruments where the cost basis is less than the unpaid principal balance, accelerations in repayment will offset the allowance. This flaw has been documented by FASB staff, with the FASB Board subsequently allowing but not requiring the use of a prepay-adjusted yield.
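The commingling effect can be seen in a simplified sketch. It assumes a hypothetical five-year, interest-only instrument with a 5% annual coupon, a 20% annual prepayment rate, and no credit losses at all, so any nonzero "allowance" below comes purely from prepayments. All names and parameters are illustrative.

```python
def present_value(cash_flows, rate):
    """Discount annual cash flows (first one a year out) at a flat rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


def solve_yield(cash_flows, price, lo=0.0, hi=1.0):
    """Bisection search for the rate at which PV equals price."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if present_value(cash_flows, mid) > price:
            lo = mid  # PV too high, so the rate must be higher
        else:
            hi = mid
    return (lo + hi) / 2


def project_cash_flows(balance, coupon, years, prepay_rate=0.0):
    """Interest-only with a balloon; a share of the balance prepays each year."""
    flows = []
    for year in range(1, years + 1):
        interest = balance * coupon
        principal = balance if year == years else balance * prepay_rate
        flows.append(interest + principal)
        balance -= principal
    return flows


def prepayment_only_allowance(cost_basis):
    """Cost basis minus PV of prepay-adjusted flows at the contractual yield."""
    contractual = project_cash_flows(100.0, 0.05, 5)
    expected = project_cash_flows(100.0, 0.05, 5, prepay_rate=0.20)
    contractual_yield = solve_yield(contractual, cost_basis)
    return cost_basis - present_value(expected, contractual_yield)


premium_gap = prepayment_only_allowance(102.0)   # bought above par
discount_gap = prepayment_only_allowance(98.0)   # bought below par

# Using the prepayment-adjusted yield instead removes the distortion.
expected = project_cash_flows(100.0, 0.05, 5, prepay_rate=0.20)
adjusted_gap = 102.0 - present_value(expected, solve_yield(expected, 102.0))

print(premium_gap, discount_gap, adjusted_gap)
```

Under these assumptions the premium instrument shows a positive allowance and the discount instrument a negative one even though no credit loss is projected, while the prepayment-adjusted yield brings the figure back to zero.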
Multiple Scenarios
The accounting standard neither prohibits nor requires the use of multiple scenarios to forecast credit losses. Using multiple scenarios is likely more supportable from a statistical and model validation perspective, but it may be challenging for a validator to determine whether the various scenarios have been weighted properly to arrive at the correct, blended, “expected” outcome.
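One basic check a validator might run on a multi-scenario approach is sketched below: confirm that the scenario weights sum to one, then recompute the blended figure. The scenario names, weights, and loss figures are hypothetical.

```python
def blended_expected_loss(scenarios):
    """Probability-weighted loss across scenarios given as {name: (weight, loss)}."""
    total_weight = sum(weight for weight, _ in scenarios.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"scenario weights sum to {total_weight}, not 1")
    return sum(weight * loss for weight, loss in scenarios.values())


# Hypothetical scenario set: (weight, projected lifetime loss in $MM).
scenarios = {
    "baseline": (0.6, 1.2),
    "upside": (0.1, 0.8),
    "downside": (0.3, 3.0),
}

blended = blended_expected_loss(scenarios)
print(round(blended, 2))
```

Recomputing the blend is the easy part; whether the weights themselves are reasonable remains a judgment call that this sketch does not address.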
Macroeconomic Assumptions During the “Reasonable and Supportable” Period
Attempting to quantitatively support the macro assumptions during the “reasonable and supportable” forecast window (usually two to three years) is likely to be problematic both for the modeler and the validator. Such forecasts tend to be more art than science and validators are likely best off trying to benchmark them against what others are using than attempting to justify them using elaborately contrived quantitative methods. The data that is most likely to be used may turn out to be simply the data that is available. Validators must balance skepticism of such approaches with pragmatism. Modelers have to use something, and they can only use the data they have.
Internal Data vs. Industry Data
The standard allows for modeling using internal data or industry proxy data. Banks often operate under the dogma that internal data (when available) is always preferable to industry data. This seems reasonable on its face, but it only really makes sense for institutions with internal data that is sufficiently robust in terms of quantity and history. And the threshold for what constitutes “sufficiently robust” is not always obvious. Is one business cycle long enough? Is 10,000 loans enough? These questions do not have hard and fast answers.
———-
Many questions pertaining to CECL model validations do not yet have hard and fast answers. In some cases, the answers will vary by institution as different banks adopt different policies. Industry best practices will doubtless emerge in response to others. For the rest, model validators will need to rely on judgment, sometimes having to balance statistical principles with accounting policy realities. The first CECL model validations are around the corner. It’s not too early to begin thinking about how to address these questions.
Houston Strong: Communities Recover from Hurricanes. Do Mortgages?
The 2017 hurricane season devastated individual lives, communities, and entire regions. As one would expect, dramatic increases in mortgage delinquencies accompanied these events. But the subsequent recoveries are a testament both to the resilience of the people living in these areas and to relief mechanisms put into place by the mortgage holders.
Now, nearly a year later, we wanted to see what the credit-risk transfer data (as reported by Fannie Mae CAS and Freddie Mac STACR) could tell us about how these borrowers’ mortgage payments are coming along.
The timing of the hurricanes’ impact on mortgage payments can be approximated by identifying when Current-to-30 days past due (DPD) roll rates began to spike. Barring other major macroeconomic events, we can reasonably assume that most of this increase is directly due to hurricane-related complications for the borrowers.
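A Current-to-30 DPD roll rate for a given month can be computed as the share of loans that were Current in the prior month and are 30 DPD in the given month. The sketch below assumes a simplified, hypothetical record layout of (loan id, month index, status) rather than the actual CAS or STACR file formats.

```python
from collections import defaultdict


def current_to_30_roll_rates(records):
    """Return {month: share of prior-month Current loans now 30 DPD}."""
    status = {(loan, month): s for loan, month, s in records}
    was_current = defaultdict(int)
    rolled = defaultdict(int)
    for (loan, month), s in status.items():
        if s != "C":
            continue
        next_status = status.get((loan, month + 1))
        if next_status is None:
            continue  # loan not observed in the following month
        was_current[month + 1] += 1
        if next_status == "30":
            rolled[month + 1] += 1
    return {m: rolled[m] / was_current[m] for m in sorted(was_current)}


# Hypothetical panel: "C" = Current, "30" and "60" = days past due buckets.
records = [
    ("A", 1, "C"), ("A", 2, "C"), ("A", 3, "30"),
    ("B", 1, "C"), ("B", 2, "30"), ("B", 3, "60"),
    ("C", 1, "C"), ("C", 2, "C"), ("C", 3, "C"),
    ("D", 1, "C"), ("D", 2, "C"), ("D", 3, "30"),
]

rates = current_to_30_roll_rates(records)
print(rates)
```

A spike in this series relative to its recent history is the kind of signal used above to date the onset of a hurricane's impact.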

The effect of the hurricanes is clear—Puerto Rico, the U.S. Virgin Islands, and Houston all experienced delinquency spikes in September. Puerto Rico and the Virgin Islands then experienced a second wave of delinquencies in October due to Hurricanes Irma and Maria.
But what has been happening to these loans since entering delinquency? Have they been getting further delinquent and eventually defaulting, or are they curing? We focus our attention on loans in Houston (specifically the Houston-The Woodlands-Sugar Land Metropolitan Statistical Area) and Puerto Rico because of the large number of observable mortgages in those areas.
First, we look at Houston. Because the 30-DPD peak was in September, we track that bucket of loans. To help us understand the path 30-DPD might reasonably be expected to take, we compared the Houston delinquencies to 30-DPD loans in the 48 states other than Texas and Florida.
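Tracking a delinquency bucket in this way amounts to fixing the set of loans that were 30 DPD in the peak month and tabulating their statuses in each later month. The sketch below uses the same hypothetical (loan id, month index, status) layout as an assumption; it is not the Edge Platform's actual query.

```python
from collections import Counter


def track_cohort(records, cohort_month, cohort_status="30"):
    """Status counts by later month for loans in cohort_status at cohort_month."""
    cohort = {loan for loan, month, s in records
              if month == cohort_month and s == cohort_status}
    by_month = {}
    for loan, month, s in records:
        if loan in cohort and month > cohort_month:
            by_month.setdefault(month, Counter())[s] += 1
    return {m: dict(counts) for m, counts in sorted(by_month.items())}


# Hypothetical panel; month 9 stands in for September.
records = [
    ("A", 9, "30"), ("A", 10, "C"), ("A", 11, "C"),
    ("B", 9, "30"), ("B", 10, "60+"), ("B", 11, "C"),
    ("C", 9, "30"), ("C", 10, "30"), ("C", 11, "60+"),
    ("D", 9, "C"), ("D", 10, "30"),  # not in the September 30-DPD cohort
]

paths = track_cohort(records, cohort_month=9)
print(paths)
```

Running the same function on a comparison population gives the benchmark path against which the hurricane-affected cohort can be read.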


Of this group of loans in Houston that were 30 DPD in September, we see that while many go on to be 60+ DPD in October, over time this cohort is decreasing in size.
Recovery is slower than for the non-hurricane-affected U.S. loans, but persistent. The biggest difference is that a significant number of 30-day delinquencies in the rest of the country continue to hover at 30 DPD (rather than curing or progressing to 60 DPD) while the Houston cohort is more evenly split between the growing number of loans that cure and the shrinking number of loans progressing to 60+ DPD.
Puerto Rico (which experienced its 30 DPD peak in October) shows a similar trend:


To examine loans even more affected by the hurricanes, we can perform the same analysis on loans that reached 60 DPD status.

Here, Houston’s peak is in October while Puerto Rico’s is in November.
Houston vs. the non-hurricane-affected U.S.:


Puerto Rico vs. the non-hurricane-affected U.S.:


In both Houston and Puerto Rico, we see a relatively small 30-DPD cohort across all months and a growing Current cohort. This indicates many people paying their way to Current from 60+ DPD status. Compare this to the rest of the US where more people pay off just enough to become 30 DPD, but not enough to become Current.
The lack of defaults in post-hurricane Houston and Puerto Rico can be explained by several relief mechanisms Fannie Mae and Freddie Mac have in place. Chiefly, disaster forbearance gives borrowers some breathing room with regard to payment. The difference is even more striking among loans that were 90 days delinquent, where eventual default is not uncommon in the non-hurricane-affected U.S. grouping:


And so, both 30-DPD and 60-DPD loans in Houston and Puerto Rico proceed to more serious levels of delinquency at a much lower rate than similarly delinquent loans in the rest of the U.S. To see if this is typical for areas affected by hurricanes of a similar scale, we looked at Fannie Mae loan-level performance data for the New Orleans MSA after Hurricane Katrina in August 2005.
As the following chart illustrates, current-to-30 DPD roll rates peaked in New Orleans in the month following the hurricane:

What happened to these loans?

Here we see a relatively speedy recovery, with large decreases in the number of 60+ DPD loans and a sharp increase in prepayments. Compare this to non-hurricane-affected states over the same period, where the number of 60+ DPD loans held relatively constant, and the number of prepayments grew at a noticeably slower rate than in New Orleans.

The remarkable number of prepayments in New Orleans was largely due to flood insurance payouts, which effectively prepay delinquent loans. Government assistance lifted many others back to current. As of March, we do not see this behavior in Houston and Puerto Rico, where recovery is moving much more slowly. Flood insurance incidence rates are known to have been low in both areas, a likely suspect for this discrepancy.
While loans are clearly moving out of delinquency in these areas, it is at a much slower rate than the historical precedent of Hurricane Katrina. In the coming months we can expect securitized mortgages in Houston and Puerto Rico to continue to improve, but getting back to normal will likely take longer than what was observed in New Orleans following Katrina. Of course, the impending 2018 hurricane season may complicate this matter.
—————————————————————————————————————-
Note: The analysis in this blog post was developed using RiskSpan’s Edge Platform. The RiskSpan Edge Platform is a module-based data management, modeling, and predictive analytics software platform for loans and fixed-income securities.
From Main Street to King Abdullah Financial District: Lessons Learned in International Mortgage Finance
In December 2016, I was asked to consult on a start-up real estate refinance company located in Saudi Arabia. I wasn’t sure I understood what I was being asked. To someone who has worked in the U.S. mortgage business since college, the word “refinance” has very strong connotations, but its use seemed wrong in this context. As it turned out, in overseas mortgage markets the phrase “real estate refinance” refers to “providing funding” or “purchasing mortgage assets.” And that started my quick introduction into the world of international mortgage finance, where “everything is different but in the end it’s all the same.”
By early January 2017 I found myself in Riyadh, Saudi Arabia, working as an adviser to a consulting firm contracted to manage the start-up of the new enterprise. Riyadh in January is nice—cool temperatures and low humidity. In the summer it’s another story. Our client was the Ministry of Housing and the Saudi Sovereign Wealth fund. One of the goals of Saudi Arabia’s ambitious Vision 2030 is the creation of its own secondary mortgage company. Saudi Arabia has 18 banks and finance companies originating Islamic mortgages, but the future growth of the economy and population is expected to create demand for mortgages that far exceeds the current financial system’s capacity. The travel and hotel accommodations were delightful. The jet lag and working hours were not.
My foremost motivation for taking the project was to check off “worked overseas” from my career bucket list. Having spent my entire career in the U.S. mortgage business, this had always seemed too distant an opportunity. The project was supposed to last three months, but seventeen months later I’m writing this article in a hotel room overlooking downtown Riyadh. The cultural experience of living and working in Saudi Arabia is something I have spent hours discussing with family and friends.
But the goal of this article is not to describe my cultural experiences but to write about the lessons I’ve learned about the U.S. mortgage business sitting 7,000 miles away. Below, I’ve laid out some of my observations.
Underwriting is underwriting
As simple as that. Facts, practices and circumstances may be local, but the principles of sound mortgage underwriting are universal: 1) develop your risk criteria, 2) validate and verify the supporting documentation, 3) underwrite the file and 4) capture performance data to confirm your risk criteria. Although mortgage lending is only 10 years old in Saudi Arabia, underwriting criteria and methodologies here strongly resemble those in the USA. Loan-to-value ratios, use of appraisals, asset verification, and debt-to-income (DTI) determination—it’s basically the same. All mortgages are fully documented.
But it is different. In Saudi Arabia, where macro-economic issues—i.e., oil prices and lack of economic diversification—dominate the economy, lenders need to find alternatives in underwriting. For example, the use of credit scores takes a back seat to employment stability. To lenders, a borrower’s employer—i.e., government or the military—is more important than a high credit score. Why? Lower oil prices can crush economic growth, leading to higher unemployment with little opportunity for displaced workers to find new jobs. The lack of a diversified economy makes lenders wary of lending to employees of private-sector companies, hence their focus on lending to government employees. The result is that whole segments of potential borrowers are left out of the mortgage market.
The cold reality in emerging economies like Saudi Arabia is that only the best borrowers can get loans. Even then, lenders may require a “salary assignment,” in which a borrower’s employer pays the lender directly. The lesson is that the primary credit risk strategy in Saudi Arabia is to avoid credit losses by all means—the best way to manage credit risk is to avoid it.
Finance is finance
Finance is the same everywhere and concepts of cash flow and return analysis are universal, whether the transaction is Islamic or conventional. There’s lots of confusion about what Islamic finance is and how it works. Many people misunderstand shariah law and its rules on paying interest. Not all banks in Saudi Arabia are Islamic, although many are. And while paying interest on debt is not shariah compliant, leases and equity returns are. The key to Islamic finance is selecting appropriate finance products that comply with shariah but also meet the needs of lenders.
In Saudi Arabia, most lenders originate Islamic mortgages called Ijarah. With an Ijarah mortgage the borrower selects a property to purchase and then goes to the lender. At closing the lender accepts a down payment from the borrower and the lender purchases the property directly from the seller. The lender then executes an agreement to lease the property to the borrower for the life of the mortgage. This looks a lot like a long-term lease. Instead of paying an interest rate, the borrower pays an APR on a stated equity return or “profit rate” to the lender on the lease arrangement.
Similarly, Islamic warehouse lending on mortgage collateral resembles a traditional repo transaction—an agreed upon sale price and repurchase price and a bunch of commodity trades linked to the transaction. In Islamic finance, the art relies on a sound understanding of the cash flows, the collateral limitations, the needs of all parties, and Islamic law. Over the past decade, the needs of the lenders, investors and intermediaries have evolved into a set of standardized transactions that meet the financing needs of the market.
People are people
People are the same everywhere—good, bad and otherwise—and it’s no different overseas. And there is a lot of great talent out there. The people I have worked with are talented, motivated and educated. I have had the opportunity to work with Saudis and people from at least 15 other countries. Fortunately for me, English is the operating business language in Saudi Arabia and no one is any wiser to whether my explanations of the U.S. mortgage market are accurate or not. The international consulting and accounting firms have done a tremendous job creating strong business models to identify, hire, train and manage employees, cultivating a rich talent pool of consultants and future employees. A rich country like Saudi Arabia is a magnet for expats—it has both the money and vision to afford talent. In addition, Saudi Arabia’s rapid population growth and strong education system have added to a homegrown pool of talented employees.
Standardization is a benefit worth fighting for
One of the primary goals of any international refinance or secondary market company is standardization. The benefits of standardization extend to all market participants—borrowers, lenders and investors. Secondary market companies thrive where transactions are cheaper, faster and better, making it an easy choice for government policymakers to support. For consumers, rates are lower, the choices of lenders and products are better, and the origination process is more transparent. For investors, the standardization of structures, cash flows and obligations improves liquidity, increases the number of active market participants and ultimately lowers the transactional bid/ask spreads and yields.
However, the benefits of standardization are less clear for the primary customer they are meant to help—the lenders. While standardization can lower operating expenses or improve business processes, it does little to increase the comparative advantages of each lender.
Saudi lenders are focused on customer service and product design, leaving price aside. This focus has led lenders to design mortgage products with unique interest rate adjustment periods, payment options and one-of-a-kind mortgage notes and customized purchase and sale agreements.[1] This degree of customization can be a recipe for disaster, leading to endless negotiations, misunderstandings of rate reset mechanisms, extended deal timelines, and differences of opinion among shariah advisers. When negotiations are culturally a zero-sum game, trying to persuade lenders of the rationale for advancing monthly payments by the 10th of each month is exhausting.
Saudi lenders see the long-term benefits of increased volume, selling credit exposure and servicing income. But they haven’t figured out that strong secondary markets lead to the development of tertiary markets like forward trading in MBS, trading of Mortgage Servicing Rights (MSRs) or better terms for warehouse lending.
Mortgages are sold, not purchased
It’s a universal tenet throughout the world: buying real estate and financing it with a mortgage is a complex transaction. It requires experienced and well-trained loan officers to aid and walk the consumers through the process. A loan officer’s skill at persuading a potential customer to submit a loan application is every bit as important as his knowledge of mortgages. It’s no different in Saudi Arabia. While building relationships with realtors is important, the Saudi market is more of a construction-to-permanent market than a resale market. Individual builders are simply too small to be able to channel consumers to lenders.
What to do? The Saudi mortgage origination market has quickly evolved to using alternatives like social media to capture consumer traffic. Saudi citizens are some of the most active users of social media in the world.[2] (How active? From my experience, 9 out of 10 drivers on the road are reading their smart phones instead of looking at the road—it’s downright scary.) Lenders have developed sophisticated media campaigns using Twitter, YouTube and other platforms to drive traffic to their call centers where loan officers can sell mortgages to potential borrowers.
Whatever the language, closing lines are the same everywhere.
Regulation – A necessary evil
Saudi Arabia’s is a highly regulated financial market. Its primary financial regulator is the Saudi Arabian Monetary Authority, better known as SAMA. Regulation and oversight are centrally controlled and have been in place for almost 70 years. SAMA has placed a premium on well-capitalized financial institutions and closely monitors transactions and the liquidity of its institutions. The approval process is detailed and time consuming, but it has resulted in well-capitalized institutions. The minimum capital of the country’s five non-bank mortgage lenders exceeds $100MM USD.
A secondary role of SAMA has been to maintain stability within the financial markets—protecting consumers against bad actors and minimizing the market’s systemic risks. Financial literacy among Saudi citizens is low and comprehensive consumer protections akin to the Real Estate Settlement Procedures Act (RESPA) in the U.S. don’t exist here. SAMA fills this role, resulting in an ad hoc mix of consumer protections with mixed enforcement actions. Sometimes the cost of the protection is greater than the evil it’s ostensibly protecting against.
As examples, SAMA regulates the maximum LTVs for the mortgage market and limits the consumer’s out-of-pocket cash fees to $1,250 USD. Managing LTV limits for the market goes a long way toward preventing over-lending when the markets are speculative. This was extremely beneficial in cooling down a hot Saudi real estate market in 2013.
Capping a borrower’s out-of-pocket expenses makes sense to limit unscrupulous market players from hustling borrowers. But the downside is the inability of lenders to monetize their transactions—i.e., to get cash from borrowers, sell mortgages at premium prices or sell servicing rights. This results in higher mortgage rates as lenders push up their mortgage coupons to generate cash to reimburse them for the higher costs associated with originating the mortgage. It is also a factor in the lenders’ use of prepayment penalties.
External constraints affect the design of local mortgage products
Ultimately, mortgage financing products available to consumers in any country are a function of the maturity level and the previous legacy development of its financial and capital markets. In Saudi Arabia, where large banks dominate, the deposit funding strategies determine mortgage product design. Capital markets are relatively new in the Kingdom. Only in the past several years has the Saudi government issued enough Sukuks to fill the Saudi Arabian yield curve out to ten years. While the government has plenty of buyers for its debt, the primary mortgage lenders do not. The concept of amortizing debt products is anathema to the market’s debt investors. Without access to longer-term debt buyers, the mortgage market products are primarily linked to 1-year SAIBOR (the Saudi version of LIBOR). This inability to secure long-term funding impacts amortization periods the lenders can offer, with most mortgages limited to a maximum amortization period of 20 years. The high mortgage rates, short-fixed payment tenors and short amortization periods all contribute to affordability issues for the average Saudi citizen.
Affordable Housing is an issue everywhere
Over the past 50 years Saudi Arabia’s vast oil wealth has enabled it to become an educated, middle-class society. The trillions of dollars in oil revenues have enabled the country to transform from a nomadic culture to a modern economy with growth centered in its primary cities. But its population growth rate and urban migration have created a mismatch of affordable housing in the growth centers of the country. The lack of affordable urban housing, outdated government housing policies and restrictive mortgage lending policies have stifled both the demand and supply of affordable housing units.
While well-functioning capital markets can help to lower mortgage rates and improve credit terms, they are only a small part of the solution for helping people afford and remain in housing. In this regard, Saudi Arabia looks a lot like the United States. With entities like the Real Estate Development Fund (REDF), Saudi Arabia is trying to manage the challenges of creating housing programs that solve housing issues for all, as opposed to subsidy programs that only help a small minority of people, operating with the high cost of program administration and with nominal benefits to its participants.
Concluding Thoughts
The past year and a half have been both personally and professionally rewarding. The opportunity to live and work abroad and to become immersed in another culture has been gratifying. Professionally, it’s been eye-opening to see the limits of my previous experiences and the need to recalibrate my core assumptions and thinking.
I maintain that the United States absolutely has the best mortgage finance system in the world. The ability of our secondary markets to provide consumers with low mortgage rates and a 30-yr fixed rate mortgage has no match in the world. The modern U.S. mortgage market, with its century of history and supportive policy decisions, has the luxury of scale, government guarantees and depth of investor classes.
Saudi Arabia’s own mortgage solutions are mostly a result of necessity. For the country, it has been more important to build a stable and well-capitalized banking system—and then to provide affordable mortgage products and terms. Think of it in terms of airline safety instructions—secure your own oxygen mask first, and then take care of your children.
Housing finance systems aren’t like building smart phone networks. You can’t just import the technology and billing systems and flip a switch. It’s a long-cycle development that requires the legal systems, regulatory framework and entities and a mature finance industry before you can start contemplating and building a secondary market.
As I reflect on my experiences in Saudi Arabia, I would describe the role I have played as that of an intermediary—applying proven “best in class” secondary market and risk management approaches I learned at home to Saudi Arabia. And then trying to understand their limits and coming up with Plan B. And sometimes Plan C…
[1] Competition has not prompted an expansion of the credit box, as lenders are generally risk averse and their regulators are hyper diligent on credit standards.
MDM to the Rescue for Financial Institutions
Data becomes an asset only when it is efficiently harnessed and managed. Because firms tend to evolve into silos, their data often gets organized that way as well, resulting in multiple references and unnecessary duplication of data that dilute its value. Master Data Management (MDM) architecture helps to avoid these and other pitfalls by applying best practices to maximize data efficiency, controls, and insights.
MDM has particular appeal to banks and other financial institutions where non-integrated systems often make it difficult to maintain a comprehensive, 360-degree view of a customer who simultaneously has, for example, multiple deposit accounts, a mortgage, and a credit card. MDM provides a single, common data reference across systems that traditionally have not communicated well with each other. Customer-level reports can point to one central database instead of searching for data across multiple sources.
Financial institutions also derive considerable benefit from MDM when seeking to comply with regulatory reporting requirements and when generating reports for auditors and other examiners. Mobile banking and the growing number of new payment mechanisms make it increasingly important for financial institutions to have a central source of data intelligence. An MDM strategy enables financial institutions to harness their data and generate more meaningful insights from it by:
- Eliminating data redundancy and providing one central repository for common data;
- Cutting across data “silos” (and different versions of the same data) by providing a single source of truth;
- Streamlining compliance reporting (through the use of a common data source);
- Increasing operational and business efficiency;
- Providing robust tools to secure and encrypt sensitive data;
- Providing a comprehensive 360-degree view of customer data;
- Fostering data quality and reducing the risks associated with stale or inaccurate data; and
- Reducing operating costs associated with data management.
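The first two items in the list above (one central repository and a single source of truth) can be sketched as a consolidation step that merges records about the same customer from several systems into one master record. The field names, source systems, and the "most recent non-empty value wins" survivorship rule below are illustrative assumptions; the sketch also assumes every system already shares a common customer_id, which real MDM tools usually have to establish through record matching.

```python
def build_golden_records(source_records):
    """Merge per-system records into one master record per customer_id."""
    bookkeeping = ("customer_id", "source", "updated")
    golden = {}
    # Process oldest first so that newer non-empty values overwrite older ones.
    for rec in sorted(source_records, key=lambda r: r["updated"]):
        master = golden.setdefault(
            rec["customer_id"],
            {"customer_id": rec["customer_id"], "sources": []},
        )
        master["sources"].append(rec["source"])
        for field, value in rec.items():
            if field not in bookkeeping and value not in (None, ""):
                master[field] = value
    return golden


# Hypothetical records for one customer held in three separate systems.
source_records = [
    {"customer_id": "C001", "source": "deposits", "updated": "2018-01-15",
     "name": "Jane Doe", "email": "jane@old.example", "phone": ""},
    {"customer_id": "C001", "source": "mortgage", "updated": "2018-06-01",
     "name": "Jane Doe", "email": "jane@new.example", "phone": "555-0100"},
    {"customer_id": "C001", "source": "cards", "updated": "2018-03-10",
     "name": "Jane Doe", "email": "", "phone": "555-0199"},
]

golden = build_golden_records(source_records)
print(golden["C001"])
```

Customer-level reports could then read from the merged record instead of querying each system separately, which is the 360-degree view described above.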
Not surprisingly, there’s a lot to think about when contemplating and implementing a new MDM solution. In this post, we lay out some of the most important things for financial institutions to keep in mind.
MDM Choice and Implementation Priorities
MDM is only as good as the data it can see. To this end, the first step is to ensure that all of the institution’s data owners are on board. Obtaining management buy-in to the process and involving all relevant stakeholders is critical to developing a viable solution. This includes ensuring that everyone is “speaking the same language”—that everyone understands the benefits related to MDM in the same way—and establishing shared goals across the different business units.
Once all the relevant parties are on board, it’s important to identify the scope of the business process within the organization that needs data refinement through MDM. Assess the current state of data quality (including any known data issues) within the process area. Then, identify all master data assets related to the process improvement. This generally involves identifying all necessary data integration for systems of record and the respective subscribing systems that would benefit from MDM’s consistent data. The selected MDM solution should be sufficiently flexible and versatile that it can govern and link any sharable enterprise data and connect to any business domain, including reference data, metadata and any hierarchies.
An MDM “stewardship team” can add value to the process by taking ownership of the various areas within the MDM implementation plan. MDM is not just about technology itself but also involves business and analytical thinking around grouping data for efficient usage. Members of this team need to have the requisite business and technical acumen in order for MDM implementation to be successful. Ideally this team would be responsible for identifying data commonalities across groups and laying out a plan for consolidating them. Understanding the extent of these commonalities helps to optimize architecture-related decisions.
Architecture-related decisions are also a function of how the data is currently stored. Data stored in heterogeneous legacy systems calls for a different sort of MDM solution than does a modern data lake architecture housing big data. The solutions should be sufficiently flexible and scalable to support future growth. Many tools in the marketplace offer MDM solutions. Landing on the right tool requires a fair amount of due diligence and analysis. The following evaluation criteria are often helpful:
- Enterprise Integration: Seamless integration into the existing enterprise set of tools and workflows is an important consideration for an MDM solution. Solutions that require large-scale customization efforts tend to carry additional hidden costs.
- Support for Multiple Devices: Because modern enterprise data must be consumable by a variety of devices (e.g., desktop, tablet and mobile), the selected MDM architecture must support each of these platforms and have multi-device access capability.
- Cloud and Scalability: With most of today’s technology moving to the cloud, an MDM solution must be able to support a hybrid environment (cloud as well as on-premises). The architecture should be sufficiently scalable to accommodate seasonal and future growth.
- Security and Compliance: With cyber-attacks becoming more prevalent and compliance and regulatory requirements continuing to proliferate, the MDM architecture must demonstrate capabilities in these areas.
Start Small; Build Gradually; Measure Success
MDM implementation can be segmented into small, logical projects based on business units or departments within an organization. Ideally, these projects should be prioritized so that quick wins (with obvious ROI) are achieved in problem areas first, with the implementation then scaling outward to other parts of the organization. This sort of stepwise approach may take longer overall but is ultimately more likely to be successful because it demonstrates success early and gives stakeholders confidence about MDM’s benefits.
The success of smaller implementations is easier to measure and see. A small-scale implementation also provides immediate feedback on the technology tool used for MDM—whether it’s fulfilling the needs as envisioned. The larger the implementation, the longer it takes to know whether the process is succeeding or failing and whether alternative tools should be pursued and adopted. The success of the implementation can be measured using the following criteria:
- Savings on data storage—a result of eliminating data redundancy.
- Increased ease of data access/search by downstream data consumers.
- Enhanced data quality—a result of common data centralization.
- More compact data lineage across the enterprise—a result of standardizing data in one place.
Practical Case Studies
RiskSpan has helped several large banks consolidate multiple data stores across different lines of business. Our MDM professionals work across heterogeneous data sets and teams to create a common reference data architecture that eliminates data duplication and thereby improves data efficiency. These professionals have accomplished this using a variety of technologies, including Informatica, Collibra and IBM InfoSphere.
Any successful project begins with a survey of the current data landscape and an assessment of existing solutions. Working collaboratively to turn this information into an approach for implementing a best-practice MDM strategy is the most likely path to success.
Making Data Dictionaries Beautiful Using Graph Databases
Most analysts estimate that, for a given project, well over half of the time is spent collecting, transforming, and cleaning data in preparation for analysis. This task is generally regarded as one of the least appetizing portions of the data analysis process, and yet it is the most crucial, as trustworthy analyses are born of clean, reliable data. Gathering and preparing data for analysis can be either enhanced or hindered by the data management practices in place at a firm. Data that are readily available, clearly defined, and well documented lead to faster and higher-quality insights. As the size and variability of data grow, however, so too does the challenge of storing and managing them.
Like many firms, RiskSpan manages a multitude of large, complex datasets with varying degrees of similarity and connectedness. To streamline the analysis process and improve the quantity and quality of our insights, we have made our datasets, their attributes, and their relationships transparent and quickly accessible using graph database technology.
Graph databases differ significantly from traditional relational databases because data are not stored in tables. Instead, data are stored in either a node or a relationship (also called an edge), which is a connection between two nodes. The image below contains a grey node labeled as a dataset and a blue node labeled as a column. The line connecting these two nodes is a relationship which, in this instance, signifies that the dataset contains the column.
There are many advantages to this data structure, including decreased redundancy. Rather than storing the same “Column1” in a separate table for each dataset that contains it (as you would in a relational database), you can store the column once and simply create more relationships from the datasets to it, as demonstrated below:
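The same idea can be sketched in a few lines of plain Python. This is only an illustration of the node-and-relationship model, not the graph database RiskSpan uses; the dataset names and the `add_node` and `add_contains` helpers are hypothetical.

```python
# Minimal in-memory property graph: nodes keyed by id, edges as (source, type, target).
nodes = {}
edges = set()

def add_node(node_id, label, **props):
    """Create the node on first use; later calls with the same id reuse it."""
    nodes.setdefault(node_id, {"label": label, **props})
    return node_id

def add_contains(dataset, column):
    """Record that a dataset contains a column."""
    edges.add((add_node(dataset, "Dataset"), "CONTAINS", add_node(column, "Column")))

# Hypothetical datasets that all carry the same column.
for ds in ("DatasetA", "DatasetB", "DatasetC"):
    add_contains(ds, "Column1")

column_nodes = [n for n, v in nodes.items() if v["label"] == "Column"]
print(column_nodes)  # a single Column node, however many datasets point at it
print(len(edges))    # one CONTAINS relationship per dataset
```

The column is stored once, and each additional dataset that contains it costs only one more relationship.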
With this flexible structure it is possible to create complex schema that remain visually intuitive. In the image below the same grey (dataset) -contains-> blue (column) format is displayed for a large collection of datasets and columns. Even at such a high level, the relationships between datasets and columns reveal patterns about the data. Here are three quick observations:
- In the top right corner there is a dataset with many unique columns.
- There are two datasets that share many columns between them and have limited connectivity to the other datasets.
- Many ubiquitous columns have been pulled to the center of the star pattern via the relationships to the multiple datasets on the outer rim.
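Observations like these can also be derived without the picture by counting relationships per column. The sketch below assumes a small, made-up set of dataset-to-column relationships; only “orig_upb” and “loan_channel” are column names taken from this post, and the rest are hypothetical.

```python
from collections import Counter

# Hypothetical (dataset, column) CONTAINS relationships.
contains = [
    ("DatasetA", "loan_id"), ("DatasetA", "orig_upb"), ("DatasetA", "loan_channel"),
    ("DatasetB", "loan_id"), ("DatasetB", "orig_upb"), ("DatasetB", "fico"),
    ("DatasetC", "loan_id"), ("DatasetC", "servicer_note"),
]

# Number of datasets pointing at each column (the column node's degree).
degree = Counter(column for _, column in contains)
n_datasets = len({dataset for dataset, _ in contains})

# Columns every dataset shares sit at the center of the star pattern.
ubiquitous = sorted(c for c, d in degree.items() if d == n_datasets)
# Columns with a single relationship are unique to one dataset.
unique = sorted(c for c, d in degree.items() if d == 1)

print(ubiquitous)
print(unique)
```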
In addition to containing labels, nodes can store data as key-value pairs. The image below displays the column “orig_upb” from dataset “FNMA_LLP”, which is one of Fannie Mae’s public datasets that is available on RiskSpan’s Edge Platform. Hovering over the column node displays some information about it, including the name of the field in the RiskSpan Edge platform, its column type, format, and data type.
Relationships can also store data in the same key-value format. This is an incredibly useful property which, for the database in this example, can be used to store information specific to a dataset and its relationship to a column. One of the ways in which RiskSpan has utilized this capability is to hold information pertinent to data normalization in the relationships. To make our datasets easier to analyze and combine, we have normalized the formats and values of columns found in multiple datasets. For example, the field “loan_channel” has been mapped from many unique inputs across datasets to a set of standardized values. In the images below, the relationships between two datasets and loan_channel are highlighted. The relationship key-value pairs contain a list of “mapped_values” identifying the initial values from the raw data that have been transformed. The dataset on the left contains the list [“BROKER”, “CORRESPONDENT”, “RETAIL”], while the dataset on the right contains [“R”, “B”, “C”, “T”, “9”].
We can easily merge these lists with a node containing a map of all the recognized enumerations for the field. This central repository of truth allows us to deploy easy and robust changes to the ETL processes for all datasets. It also allows analysts to easily query information related to data availability, formats, and values.
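A rough sketch of how such a merge might look is shown here. The two “mapped_values” lists come from the example above; the dataset keys, the `normalize` helper, and especially the standardized targets in `enumerations` (including the meanings given to “T” and “9”) are assumptions made for illustration, not RiskSpan’s actual mapping.

```python
# Relationship properties: raw loan_channel values found in each dataset.
mapped_values = {
    "dataset_left": ["BROKER", "CORRESPONDENT", "RETAIL"],
    "dataset_right": ["R", "B", "C", "T", "9"],
}

# Assumed contents of the central enumeration node (illustrative targets only).
enumerations = {
    "BROKER": "Broker", "B": "Broker",
    "CORRESPONDENT": "Correspondent", "C": "Correspondent",
    "RETAIL": "Retail", "R": "Retail",
    "T": "TPO Not Specified",  # assumption
    "9": "Not Available",      # assumption
}

def normalize(dataset, raw_value):
    """Map a raw value to its standardized value, rejecting undeclared inputs."""
    if raw_value not in mapped_values[dataset]:
        raise ValueError(f"{raw_value!r} is not a declared value for {dataset}")
    return enumerations[raw_value]

# Raw values that no enumeration covers yet; an ETL job could flag these.
unmapped = {
    ds: [v for v in values if v not in enumerations]
    for ds, values in mapped_values.items()
}

print(normalize("dataset_left", "BROKER"), normalize("dataset_right", "B"))
print(unmapped)
```

Because every dataset resolves through the same map, adding or renaming a standardized value is a change in one place rather than in each dataset’s ETL code.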
In addition to queries specific to a column, this structure allows an analyst to answer questions about data availability across datasets with ease. Normally, comparing PDF data dictionaries, Excel worksheets, or database tables can be a painstaking process. Using the graph database, however, a simple query can return the intersection of three datasets as shown below. The resulting graph is easy to analyze and use to define the steps required to obtain and manipulate the data.
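Conceptually, such a query returns the columns reachable from every one of the chosen datasets. A minimal Python equivalent, using hypothetical dataset contents and a hypothetical `shared_columns` helper rather than an actual graph query language, might look like this:

```python
# Hypothetical dataset -> columns relationships.
contains = {
    "DatasetA": {"loan_id", "orig_upb", "loan_channel", "fico"},
    "DatasetB": {"loan_id", "orig_upb", "loan_channel"},
    "DatasetC": {"loan_id", "orig_upb", "servicer_note"},
}

def shared_columns(*datasets):
    """Return the columns contained in every one of the named datasets."""
    return set.intersection(*(contains[d] for d in datasets))

shared = shared_columns("DatasetA", "DatasetB", "DatasetC")
print(sorted(shared))
```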
In addition to these benefits for analysts and end users, utilizing graph database technology for data management comes with benefits from a data governance perspective. Within the realm of data stewardship, ownership and accountability of datasets can be assigned and managed within a graph database like the one in this blog. The ability to store any attribute in a node and create any desired relationship makes it simple to add nodes representing data owners and curators connected to their respective datasets.
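As a sketch of what this could look like, the example below adds hypothetical owner and curator nodes (“alice” and “bob”) with assumed OWNS and CURATES relationship types, then asks who is accountable for a given column. None of these names come from RiskSpan’s actual graph.

```python
# (source, relationship, target) triples; people and relationship types are hypothetical.
edges = [
    ("alice", "OWNS", "DatasetA"),
    ("bob", "CURATES", "DatasetA"),
    ("alice", "OWNS", "DatasetB"),
    ("DatasetA", "CONTAINS", "orig_upb"),
    ("DatasetB", "CONTAINS", "orig_upb"),
]

def accountable_for(column):
    """List the stewardship relationships on every dataset containing the column."""
    datasets = {s for s, r, t in edges if r == "CONTAINS" and t == column}
    return sorted(
        (s, r, t) for s, r, t in edges
        if r in ("OWNS", "CURATES") and t in datasets
    )

for person, role, dataset in accountable_for("orig_upb"):
    print(person, role, dataset)
```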
The ease and transparency with which any data-related information can be stored makes graph databases very attractive. Graph databases can also scale to very large numbers of nodes and relationships while remaining fast. While every technology has a learning curve, the intuitive nature of graphs combined with their flexibility makes them an intriguing and viable option for data management.


