RiskSpan Adds Home Equity Conversion Mortgage Data to Edge Platform
ARLINGTON, VA, September 12, 2018 – RiskSpan, a leading mortgage data analytics provider, has added Home Equity Conversion Mortgage (HECM) data to the library of datasets available through its RS Edge Platform. The dataset includes over half a billion records from Ginnie Mae and will expand the RS Edge Platform’s applications in reverse-mortgage analysis. RS Edge is a SaaS platform that integrates normalized data, predictive models, and complex scenario analytics for customers in the capital markets, commercial banking, and insurance industries. The Edge Platform solves the hardest data management and analytical problem: affordable, off-the-shelf integration of clean data and reliable models.
The HECM dataset is the latest in a series of recent additions to the RS Edge data libraries. The platform now holds over five billion records across decades of collection and is the solution of choice for whole loan and securities analytics. “RiskSpan’s data strategy is simple. Provide our customers with normalized, tested, analysis-ready data that their enterprise modeling and analytics teams can leverage for faster, more reliable insight. We do the grunt work so that you don’t have to,” said Patrick Doherty, RiskSpan’s Chief Operating Officer.
The HECM dataset has been subjected to RiskSpan’s comprehensive data normalization process for simpler analysis in RS Edge. Edge users will be able to drill down to snapshot and historical data through the UI. Users will also be able to benchmark the HECM data against their own portfolios and use it to develop and deploy more sophisticated credit models.
RiskSpan’s Edge API also makes it easier than ever to access large datasets for analytics, model development, and benchmarking. Major quant teams that prefer APIs now have access to normalized and validated data for scenario analytics, stress testing, or shock analysis. RiskSpan makes data available through its proprietary instances of RStudio and Python.

There are many advantages to this data structure, including decreased redundancy. Rather than storing the same “Column1” in a separate table for each dataset that contains it (as you would in a relational database), you can store the column once as a node and simply create a relationship to it from each dataset, as demonstrated below:
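The idea can be sketched in a few lines of plain Python. This is a minimal illustration rather than the actual database: the dataset names, the second column, and the "contains" edge list are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical datasets and columns, used only for illustration.
# Each tuple is one relationship: (dataset) -contains-> (column).
edges = [
    ("DatasetA", "contains", "Column1"),
    ("DatasetB", "contains", "Column1"),
    ("DatasetB", "contains", "Column2"),
    ("DatasetC", "contains", "Column1"),
]

# Every column exists exactly once as a node, no matter how many
# datasets point at it.
column_nodes = {column for _, _, column in edges}

# Index the relationships so we can ask which datasets share a column.
datasets_by_column = defaultdict(set)
for dataset, _, column in edges:
    datasets_by_column[column].add(dataset)

print(sorted(column_nodes))
print(sorted(datasets_by_column["Column1"]))
```

Adding a fourth dataset that contains “Column1” means appending one more edge; no new copy of the column is created.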
With this flexible structure it is possible to create complex schemas that remain visually intuitive. In the image below, the same grey (dataset) -contains-> blue (column) format is displayed for a large collection of datasets and columns. Even at such a high level, the relationships between datasets and columns reveal patterns in the data. Here are three quick observations:
In addition to containing labels, nodes can store data as key-value pairs. The image below displays the column “orig_upb” from dataset “FNMA_LLP”, which is one of Fannie Mae’s public datasets that is available on RiskSpan’s Edge Platform. Hovering over the column node displays some information about it, including the name of the field in the RiskSpan Edge platform, its column type, format, and data type.
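As a rough sketch of what such a node might hold, the snippet below models it as a label plus a dictionary of properties. The property keys follow the attributes listed above, but the key names and every value shown are assumptions for illustration, not the contents of the actual node.

```python
# A node is a label plus arbitrary key-value properties.
# All property values below are hypothetical placeholders.
orig_upb_node = {
    "label": "Column",
    "properties": {
        "name": "orig_upb",
        "edge_field_name": "orig_upb",  # assumed name of the field in Edge
        "column_type": "numeric",       # assumed
        "format": "0.00",               # assumed
        "data_type": "float",           # assumed
    },
}


def describe(node):
    """Return a one-line summary, similar to a hover tooltip."""
    pairs = ", ".join(f"{key}={value}" for key, value in node["properties"].items())
    return f'{node["label"]}({pairs})'


print(describe(orig_upb_node))
```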
Relationships can also store data in the same key-value format. This is an incredibly useful property which, for the database in this example, can be used to store information specific to a dataset and its relationship to a column. One of the ways in which RiskSpan has used this capability is to hold information pertinent to data normalization in the relationships. To make our datasets easier to analyze and combine, we have normalized the formats and values of columns found in multiple datasets. For example, the field “loan_channel” has been mapped from many unique inputs across datasets to a set of standardized values. In the images below, the relationships between two datasets and loan_channel are highlighted. The relationship key-value pairs contain a list of “mapped_values” identifying the initial values from the raw data that have been transformed. The dataset on the left contains the list [“BROKER”, “CORRESPONDENT”, “RETAIL”], while the dataset on the right contains [“R”, “B”, “C”, “T”, “9”].
We can easily merge these lists with a node containing a map of all the recognized enumerations for the field. This central repository of truth allows us to deploy easy and robust changes to the ETL processes for all datasets. It also allows analysts to easily query information related to data availability, formats, and values.
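A minimal sketch of that merge follows. The two “mapped_values” lists come from the example above; the dataset names and the standardized values in the central map are assumptions chosen for illustration and may not match the enumerations RiskSpan actually uses.

```python
# Relationship properties, keyed by (dataset, column).
# "dataset_left" and "dataset_right" are hypothetical names.
relationships = {
    ("dataset_left", "loan_channel"): {
        "mapped_values": ["BROKER", "CORRESPONDENT", "RETAIL"],
    },
    ("dataset_right", "loan_channel"): {
        "mapped_values": ["R", "B", "C", "T", "9"],
    },
}

# Central node holding every recognized raw value for loan_channel.
# The standardized values on the right-hand side are assumed.
LOAN_CHANNEL_ENUM = {
    "RETAIL": "Retail", "R": "Retail",
    "BROKER": "Broker", "B": "Broker",
    "CORRESPONDENT": "Correspondent", "C": "Correspondent",
    "T": "TPO Not Specified",
    "9": "Unknown",
}


def standardized_values(dataset, column):
    """Map each raw value recorded on the relationship to its standard value."""
    raw_values = relationships[(dataset, column)]["mapped_values"]
    return {raw: LOAN_CHANNEL_ENUM[raw] for raw in raw_values}


def unrecognized_values(column):
    """List raw values seen in any dataset that the central map lacks."""
    return sorted(
        raw
        for (_, col), props in relationships.items()
        if col == column
        for raw in props["mapped_values"]
        if raw not in LOAN_CHANNEL_ENUM
    )


print(standardized_values("dataset_right", "loan_channel"))
print(unrecognized_values("loan_channel"))
```

Because the map lives in one place, supporting a new raw code is a single change to the central node, and a check like `unrecognized_values` can flag datasets whose raw values are not yet covered.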
In addition to queries specific to a column, this structure allows an analyst to answer questions about data availability across datasets with ease. Normally, comparing PDF data dictionaries, Excel worksheets, or database tables can be a painstaking process. Using the graph database, however, a simple query can return the intersection of three datasets, as shown below. The resulting graph is easy to analyze and can be used to define the steps required to obtain and manipulate the data.
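The logic behind such an intersection query can be sketched with Python sets. “FNMA_LLP”, “orig_upb”, and “loan_channel” appear in this post; the other dataset and column names, and which dataset contains which column, are assumptions for the example.

```python
# (dataset) -contains-> (column) relationships, grouped by dataset.
# Column membership here is illustrative, not the real schema.
contains = {
    "FNMA_LLP": {"orig_upb", "loan_channel", "orig_rate"},
    "DATASET_B": {"orig_upb", "loan_channel", "fico"},
    "DATASET_C": {"orig_upb", "loan_channel", "orig_rate", "fico"},
}


def shared_columns(*datasets):
    """Return the columns that every named dataset contains."""
    column_sets = [contains[name] for name in datasets]
    return set.intersection(*column_sets)


common = shared_columns("FNMA_LLP", "DATASET_B", "DATASET_C")
print(sorted(common))
```

In a graph database the same question is typically asked as a pattern match on the column nodes that all three datasets point to, rather than by comparing dictionaries side by side.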
In addition to these benefits for analysts and end users, using graph database technology for data management has advantages from a data governance perspective. Within the realm of data stewardship, ownership of and accountability for datasets can be assigned and managed within a graph database like the one in this blog post. The ability to store any attribute in a node and create any desired relationship makes it simple to add nodes representing data owners and curators connected to their respective datasets.
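As a sketch of how stewardship could be layered on, the snippet below adds owner and curator nodes as ordinary relationships. The people, the relationship names “OWNS” and “CURATES”, and all dataset names other than “FNMA_LLP” are hypothetical.

```python
# (person) -OWNS/CURATES-> (dataset) relationships; all names are illustrative.
stewardship_edges = [
    ("owner_a", "OWNS", "FNMA_LLP"),
    ("curator_b", "CURATES", "FNMA_LLP"),
    ("owner_a", "OWNS", "DATASET_B"),
]


def stewards_of(dataset):
    """Return (relationship, person) pairs attached to a dataset."""
    return sorted(
        (relationship, person)
        for person, relationship, target in stewardship_edges
        if target == dataset
    )


def datasets_without_owner(all_datasets):
    """Flag datasets that no one is accountable for."""
    owned = {target for _, rel, target in stewardship_edges if rel == "OWNS"}
    return [name for name in all_datasets if name not in owned]


print(stewards_of("FNMA_LLP"))
print(datasets_without_owner(["FNMA_LLP", "DATASET_B", "DATASET_C"]))
```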
The ease and transparency with which any data-related information can be stored makes graph databases very attractive. Graph databases can also scale to very large numbers of nodes and relationships while remaining fast. While every technology has a learning curve, the intuitive nature of graphs combined with their flexibility makes them an intriguing and viable option for data management.
