Articles Tagged with: Data Governance

Is Free Public Data Worth the Cost?

No such thing as a free lunch.

The world is full of free (and semi-free) datasets ripe for the picking. If it’s not going to cost you anything, why not supercharge your data and achieve clarity where once there was only darkness?

But is it really not going to cost you anything? What is the total cost of ownership for a public dataset, and what does it take to distill truly valuable insights from publicly available data? Setting aside the reliability of the public source (a topic for another blog post), free data is anything but free. Let us discuss both the power and the cost of working with public data.

To illustrate the point, we borrow from a classic RiskSpan example: anticipating losses to a portfolio of mortgage loans due to a hurricane—a salient example as we are in the early days of the 2020 hurricane season (and the National Oceanic and Atmospheric Administration (NOAA) predicts a busy one). In this example, you own a portfolio of loans and would like to understand the possible impacts to that portfolio (in terms of delinquencies, defaults, and losses) of a recent hurricane. You know this will likely require an external data source because you do not work for NOAA, your firm is new to owning loans in coastal areas, and you currently have no internal data for loans impacted by hurricanes.

Know the Data.

The first step in using external data is understanding your own data. This may seem like a simple task. But data, its source, its lineage, and its nuanced meaning can be difficult to communicate inside an organization. Unless you work with a dataset regularly (i.e., often), you should approach your own data as if it were provided by an external source. The goal is a full understanding of the data, the data’s meaning, and the data’s limitations, all of which should have a direct impact on the types of analysis you attempt.

Understanding the structure of your data and the limitations it puts on your analysis involves questions like:

  • What objects does your data track?
  • Do you have time series records for these objects?
  • Do you only have the most recent record? The most recent 12 records?
  • Do you have one record that tries to capture life-to-date information?

Understanding the meaning of each attribute captured in your data involves questions like:

  • What attributes are we tracking?
  • Which attributes are updated (monthly or quarterly) and which remain static?
  • What are the nuances in our categorical variables? How exactly did we assign the zero-balance code?
  • Is original balance the loan’s balance at mortgage origination, or the balance when we purchased the loan/pool?
  • Do our loss numbers include forgone interest?

These same types of questions also apply to understanding external data sources, but the answers are not always as readily available. Depending on the quality and availability of the documentation for a public dataset, this exercise may be as simple as just reading the data dictionary, or as labor intensive as generating analytics for individual attributes, such as mean, standard deviation, mode, or even histograms, to attempt to derive an attribute’s meaning directly from the delivered data. This is the not-free part of “free” data, and skipping this step can have negative consequences for the quality of analysis you can perform later.
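
Where the documentation falls short, a few lines of exploratory code can go a long way toward deriving an attribute's meaning. Below is a minimal sketch using pandas; the file name and column names are hypothetical placeholders, not any particular public dataset's layout.

    import pandas as pd

    # Hypothetical file and column names; substitute your own dataset.
    loans = pd.read_csv("external_loan_data.csv")

    # Numeric attributes: count, mean, standard deviation, quartiles
    print(loans["current_balance"].describe())

    # Categorical attributes: frequency counts often expose undocumented codes
    print(loans["zero_balance_code"].value_counts(dropna=False))

    # A quick histogram can reveal unit or scaling surprises
    # (e.g., a rate delivered as 4.5 in some rows and 0.045 in others)
    loans["interest_rate"].plot(kind="hist", bins=50)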

Returning to our example, we require at least two external data sets:  

  1. where and when hurricanes have struck, and
  2. loan performance data for mortgages active in those areas at those times.

The obvious choice for loan performance data is the historical performance datasets from the GSEs (Fannie Mae and Freddie Mac). Providing monthly performance information and loss information for defaulted loans for a huge sample of mortgage loans over a 20-year period, these two datasets are perfect for our analysis. For hurricanes, some manual effort is required to extract date, severity, and location from published NOAA maps (you could get really fancy and gather zip codes covered in the landfall area—which, by leaving out homes hundreds of miles away from expected landfall, would likely give you a much better view of what happens to loans actually impacted by a hurricane—but we will stick to the state level in this simple example).

Make new data your own.

So you’ve downloaded the historical datasets, you’ve read the data dictionaries cover-to-cover, you’ve studied historical NOAA maps, and you’ve interrogated your own data teams for the meaning of internal loan data. Now what? This is yet another cost of “free” data: after all your effort to understand and ingest the new data, all you have is another dataset. A clean, well-understood, well-documented (you’ve thoroughly documented it, haven’t you?) dataset, but a dataset nonetheless. Getting the insights you seek requires a separate effort to merge the old with the new. Let us look at a simplified flow for our hurricane example (a rough code sketch follows the list):

  • Subset the GSE data for active loans in hurricane-related states in the month prior to landfall. Extract information for these loans for 12 months after landfall.
  • Bucket the historical loans by the characteristics you use to bucket your own loans (LTV, FICO, delinquency status before landfall, etc.).
  • Derive delinquency and loss information for the buckets for the 12 months after the hurricane.
  • Apply the observed delinquency and loss information to your loan portfolio (bucketed using the same scheme you used for the historical loans).
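
Here is that rough sketch of the flow in pandas. It is purely illustrative: the file names, column names (state, period, loan_id, fico, ltv, upb, ever_d90, loss_amt), and bucket boundaries are hypothetical placeholders, and the period columns are assumed to hold monthly pandas Periods.

    import pandas as pd

    # Hypothetical inputs: GSE performance history and your own portfolio.
    gse_perf = pd.read_parquet("gse_performance.parquet")
    portfolio = pd.read_parquet("my_portfolio.parquet")

    HURRICANE_STATES = ["FL", "TX", "LA"]        # taken from the NOAA maps
    LANDFALL = pd.Period("2017-09", freq="M")    # example landfall month

    # 1. Loans active in affected states the month before landfall,
    #    and their performance records for the 12 months after landfall.
    active_ids = gse_perf.loc[
        gse_perf["state"].isin(HURRICANE_STATES)
        & (gse_perf["period"] == LANDFALL - 1), "loan_id"
    ].unique()
    post = gse_perf[
        gse_perf["loan_id"].isin(active_ids)
        & gse_perf["period"].between(LANDFALL, LANDFALL + 11)
    ].copy()

    # 2. Bucket the historical loans the same way you bucket your own loans.
    FICO_BINS, LTV_BINS = [300, 680, 740, 850], [0, 80, 95, 200]
    def add_bucket(df):
        df["bucket"] = (
            pd.cut(df["fico"], FICO_BINS).astype(str)
            + " / " + pd.cut(df["ltv"], LTV_BINS).astype(str)
        )
        return df
    post, portfolio = add_bucket(post), add_bucket(portfolio)

    # 3. Observed 12-month delinquency and loss rates by bucket.
    observed = post.groupby("bucket").apply(
        lambda g: pd.Series({
            "dq_rate": g["ever_d90"].mean(),                   # hypothetical delinquency flag
            "loss_rate": g["loss_amt"].sum() / g["upb"].sum(),
        })
    ).reset_index()

    # 4. Apply the observed rates to your identically bucketed portfolio.
    expected = portfolio.merge(observed, on="bucket", how="left")
    expected["expected_loss"] = expected["upb"] * expected["loss_rate"]

Each step mirrors a bullet above; in practice the GSE files are large enough that you would process them in chunks or in a database rather than entirely in memory.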

And there you have it—not a model, but a grounded expectation of loan performance following a hurricane. You have stepped out of the darkness and into the data-driven light. And all using free (or “free”) data!

Hyperbole aside, nothing about our example analysis is easy, but it plainly illustrates the power and cost of publicly available data. The power is obvious in our example: without the external data, we have no basis for generating an expectation of losses after a hurricane. While we should be wary of the impacts of factors not captured by our datasets (like the amount and effectiveness of government intervention after each storm – which does vary widely), the historical precedent we find by averaging many storms can form the basis for a robust and defensible expectation. Even if your firm has had experience with loans in hurricane-impacted areas, expanding the sample size through this exercise bolsters confidence in the outcomes. Generally speaking, the use of public data can provide grounded expectations where there had been only anecdotes.

But this power does come at a price—a price that should be appreciated and factored into the decision whether to use external data in the first place. What is worse than not knowing what to expect after a hurricane? Having an expectation based on bad or misunderstood data. Failing to account for the effort required to ingest and use free data can lead to bad analysis and the temptation to cut corners. The effort required in our example is significant: the GSE data is huge, complicated, and will melt your laptop’s RAM if you are not careful. Turning NOAA PDF maps into usable data is not a trivial task, especially if you want to go deeper than the state level. Understanding your own data can be a challenge. Applying an appropriate bucketing to the loans can make or break the analysis. Not all public datasets present these same challenges, but all public datasets present costs. There simply is no such thing as a free lunch. The returns on free data frequently justify these costs. But they should be understood before unwittingly incurring them.


Webinar: Data Analytics and Modeling in the Cloud – June 24th

On Wednesday, June 24th, at 1:00 PM EDT, join Suhrud Dagli, RiskSpan’s co-founder and chief innovator, and Gary Maier, managing principal of Fintova, for a free RiskSpan webinar.

Suhrud and Gary will contrast the pros and cons of analytic solutions native to leading cloud platforms and share tips for ensuring data security and managing costs.

Click here to register for the webinar.


Webinar: Using Machine Learning in Whole Loan Data Prep


Using Machine Learning in Whole Loan Data Prep

Tackle one of your biggest obstacles: Curating and normalizing multiple, disparate data sets.

Learn from RiskSpan experts:

  • How to leverage machine learning to help streamline whole loan data prep
  • Innovative ways to manage the differences in large data sets
  • How to automate ‘the boring stuff’


About The Hosts

LC Yarnelle

Director – RiskSpan

LC Yarnelle is a Director with experience in financial modeling, business operations, requirements gathering and process design. At RiskSpan, LC has worked on model validation and business process improvement/documentation projects. He also led the development of one of RiskSpan’s software offerings, and has led multiple development projects for clients, utilizing both Waterfall and Agile frameworks. Prior to RiskSpan, LC was an analyst at NVR Mortgage in the secondary marketing group in Reston, VA, where he was responsible for daily pricing, as well as on-going process improvement activities. Before a career move into finance, LC was the director of operations and a minority owner of a small business in Fort Wayne, IN. He holds a BA from Wittenberg University, as well as an MBA from Ohio State University.

Matt Steele

Senior Analyst – RiskSpan



Automate Your Data Normalization and Validation Processes

Robotic Process Automation (RPA) is the solution for automating mundane, business-rule-based processes so that organizations' high-value business users can be deployed to more valuable work.

McKinsey defines RPA as “software that performs redundant tasks on a timed basis and ensures that they are completed quickly, efficiently, and without error.” RPA has enormous savings potential. In RiskSpan’s experience, RPA reduces staff time spent on the target-state process by an average of 95 percent. On recent projects, RiskSpan RPA clients on average saved more than 500 staff hours per year through simple automation. That calculation does not include the potential additional savings gained from the improved accuracy of source data and downstream data-driven processes, which greatly reduces the need for rework. 

The tedious, error-ridden, and time-consuming process of data normalization is familiar to almost all organizations. Complex data systems and downstream analytics are ubiquitous in today’s workplace. Staff that are tasked with data onboarding must verify that source data is complete and mappable to the target system. For example, they might ensure that original balance is expressed as dollar currency figures or that interest rates are expressed as percentages with three decimal places. 

Effective data visualizations sometimes require additional steps, such as adding calculated columns or resorting data according to custom criteria. Staff must match the data formatting requirements with the requirements of the analytics engine and verify that the normalization allows the engine to interact with the dataset. When completed manually, all of these steps are susceptible to human error or oversight. This often results in a need for rework downstream and even more staff hours. 
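
To make this concrete, here is a minimal, hypothetical sketch of the kinds of rule-based checks described above, using pandas. The file name, column names, and rules are placeholders rather than any client's specification.

    import pandas as pd

    df = pd.read_csv("source_extract.csv")   # hypothetical source file
    flagged = set()

    # Original balance should arrive as a numeric dollar amount, not text like "$250,000"
    df["original_balance"] = pd.to_numeric(
        df["original_balance"].astype(str).str.replace(r"[$,]", "", regex=True),
        errors="coerce",
    )
    flagged |= set(df.index[df["original_balance"].isna()])

    # Interest rates should be percentages with three decimal places
    df["interest_rate"] = pd.to_numeric(df["interest_rate"], errors="coerce").round(3)
    # Values below 1 likely arrived as decimals (0.045) rather than percents (4.500)
    flagged |= set(df.index[df["interest_rate"] < 1])

    # Example of a calculated column added for downstream visualization
    df["balance_bucket"] = pd.cut(df["original_balance"], [0, 100_000, 250_000, 10_000_000])

    print(f"{len(flagged)} rows flagged for review")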

Recently, a client with a proprietary datastore approached RiskSpan with the challenge of normalizing and integrating irregular datasets to comply with their data engine. The non-standard original format and the size of the data made normalization difficult and time consuming. 

After ensuring that the normalization process was optimized for automation, RiskSpan set to work automating data normalization and validation. Expert data consultants automated the process of restructuring data in the required format so that it could be easily ingested by the proprietary engine.  

Our consultants built an automated process that normalized and merged disparate datasets, compared internal and external datasets, and added calculated columns to the data. The processed dataset comprised more than 100 million loans and more than 4 billion records. To optimize for speed, our team programmed a highly resilient validation process that included automated validation checks, error logging (for client staff review), and data correction routines for post-processing and post-validation.

This custom solution reduced the time spent onboarding data from one month of staff work to two days. The end result is a fully functional, normalized dataset that can be trusted for use with downstream applications.

RiskSpan’s experience automating routine business processes reduced redundancies, eliminated errors, and saved staff time. This solution reduced resources wasted on rework and its associated operational risk and key-person dependencies. Routine tasks were automated with customized validations. This customization effectively eliminated the need for staff intervention until certain error thresholds were breached. The client determined and set these thresholds during the design process. 
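
Purely as an illustration (not the client implementation), the threshold concept can be as simple as logging every failed check for later review and escalating to staff only when a configurable error rate is breached:

    import logging

    logging.basicConfig(filename="validation_errors.log", level=logging.INFO)

    ERROR_RATE_THRESHOLD = 0.02   # hypothetical threshold agreed during design

    # Hypothetical business-rule checks applied to each record
    CHECKS = {
        "balance_positive": lambda r: r.get("original_balance", 0) > 0,
        "rate_in_range": lambda r: 0 < r.get("interest_rate", -1) < 25,
    }

    def validate(records):
        """Log every failed check for later review and return the failures."""
        failures = []
        for i, record in enumerate(records):
            for name, check in CHECKS.items():
                if not check(record):
                    logging.info("row %d failed check %s", i, name)
                    failures.append((i, name))
        return failures

    records = [
        {"original_balance": 250_000, "interest_rate": 4.5},
        {"original_balance": -100, "interest_rate": 4.5},
    ]
    failures = validate(records)
    if len(failures) / len(records) > ERROR_RATE_THRESHOLD:
        print("Error threshold breached; routing batch to staff review")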

RiskSpan data and analytics consultants are experienced in helping clients develop robotic process automation solutions for normalizing and aggregating data, creating routine, reliable data outputs, executing business rules, and automating quality control testing. Automating these processes addresses a wide range of business challenges and is particularly useful in routine reporting and analysis.

Talk to RiskSpan today about how custom solutions in robotic process automation can save time and money in your organization. 


Case Study: Datamart Design and Build

The Client

Government Sponsored Enterprise (GSE)

The Problem

A GSE needed a centralized data solution for its forecasting process, which involved cross-functional teams from different business lines (Single Family, Multi Family, Capital Markets).

The client also needed a cloud-based data warehouse to host forecasting outputs for reporting purposes, with faster querying and processing speeds.

The input and output files and datasets came from different sources and/or in different formats. Analysis and transformation were required prior to designing, developing and loading tables. The client was also migrating data from legacy data sources to new datamarts. 

The Solution

RiskSpan was engaged to build and maintain a new centralized datamart (in both Oracle and Amazon Web Services) for the client’s revenue and loss forecasting processes. This included data modeling, historical data upload as well as the monthly recurring data process.

The Deliverables

  • Analyzed the end-to-end data flow and data elements
  • Designed data models satisfying business requirements
  • Processed and mapped forecasting input and output files
  • Migrated data from legacy databases to the new sources
  • Built an Oracle datamart and a cloud-based data warehouse (Amazon Web Services)
  • Led the development team in building schemas, tables, and views; process scripts to maintain data updates; and table partitioning logic
  • Resolved data issues with the source and assisted in reconciliation of results

Case Study: ETL Solutions

The Client

Government Sponsored Enterprise (GSE)

The Problem

The client needed ETL solutions for handling data of any complexity or size, in a variety of formats, and from different upstream sources.

The client's data management team extracted and processed data from different sources and different types of databases (e.g., Oracle, Netezza, Excel files, SAS datasets) and needed to load it into the Oracle and AWS datamarts used for its revenue and loss forecasting processes.

The client’s forecasting process used very complex large-scale datasets in different formats which needed to be consumed and loaded in an automated and timely manner.

The Solution

RiskSpan was engaged to design, develop and implement ETL (Extract, Transform and Load) solutions for handling input and output data for the client's revenue and loss forecasting processes. This included dealing with large volumes of data and multiple source systems, transforming and loading data to and from data marts and data warehouses.
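
The engagement itself used Informatica, but as a conceptual sketch of the extract-transform-load pattern, an equivalent step might look like the following in Python. The connection strings, table names, and column mappings are hypothetical.

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connections to an upstream Oracle source and an AWS-hosted target
    source = create_engine("oracle+oracledb://user:pwd@source-host/orcl")
    target = create_engine("postgresql://user:pwd@target-host/forecasting")

    # Extract: pull raw forecasting inputs from the upstream system
    raw = pd.read_sql("SELECT * FROM forecast_inputs", source)

    # Transform: apply the source-to-target mapping and basic normalization
    mapped = raw.rename(columns={"LOAN_BAL": "current_balance", "INT_RT": "interest_rate"})
    mapped["interest_rate"] = mapped["interest_rate"].round(3)

    # Load: append into the target datamart table
    mapped.to_sql("fct_forecast_inputs", target, if_exists="append", index=False)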

The Deliverables

  • Analyzed data sources and developed ETL strategies for different data types and sources
  • Performed source-to-target mapping in support of report and warehouse technical designs
  • Implemented business-driven requirements using Informatica
  • Collaborated with cross-functional business and development teams to document ETL requirements and turn them into ETL jobs
  • Optimized, developed, and maintained integration solutions as necessary to connect legacy data stores and the data warehouses

Case Study: Web Based Data Application Build

The Client

Government Sponsored Enterprise (GSE)

The Problem

The Structured Transactions group of a GSE needed to offer broker-dealers a simpler, easier-to-use way to create new restructured securities, one that provided the flexibility to do business at any hour and reduced dependence on Structured Transactions team members' availability.

The Solution

RiskSpan led the development of a customer-facing web-based application for a GSE. The GSE's structured transactions clients use the application to independently create pools of pools and re-combinable REMIC exchanges (RCRs) with existing pooling and pricing requirements.

RiskSpan delivered the complete end-to-end technical implementation of the new portal.

The Deliverables

  • Developed a self-service web portal that provides RCR and pool-of-pool exchange capabilities and reporting features
  • Managed data flows from various internal sources to the portal, providing real-time calculations
  • Technology stack included Angular 2.0 and Java for web services
  • Development, testing, and config control methodology featured DevOps practices, a CI/CD pipeline, and 100% automated testing with Cucumber and Selenium
  • Used Git, JIRA, Gherkin, Jenkins, Fisheye/Crucible, and SauceLabs for config control, testing, and deployment


SOFR, So Good? The Main Anxieties Around the LIBOR Transition

SOFR Replacing LIBOR

The London Interbank Offered Rate (LIBOR) is going away, and the international financial community is working hard to plan for and mitigate risks to make a smooth transition. In the United States, the Federal Reserve’s Alternative Reference Rates Committee (ARRC) has recommended the Secured Overnight Financing Rate (SOFR) as the preferred replacement rate. The New York Fed began publishing SOFR regularly on April 3, 2018. In July 2018, Fannie Mae issued $6 billion in SOFR-denominated securities, leading the way for other institutions that have since followed suit. In November 2018, the Federal Home Loan (FHL) Banks issued $4 billion in debt tied to SOFR. CME Group, a derivatives and futures exchange company, launched 3-month and 1-month SOFR futures contracts in 2018. All of these steps to support liquidity and demonstrate SOFR demand are designed to create a rate more robust than LIBOR—the transaction volume underpinning SOFR rates is around $750 billion daily, compared to USD LIBOR’s estimated $500 million in daily transaction volume.

USD LIBOR is referenced in an estimated $200 trillion of financial contracts, of which 95 percent is derivatives. However, the remaining cash market is not small. USD LIBOR is referenced in an estimated: $3.4 trillion in business loans, $1.3 trillion in retail mortgages and other consumer loans, $1.8 trillion in floating rate debt, and $1.8 trillion in securitized products. 

The ARRC has held consultations on its recommended fallback language for floating rate notes and syndicated business loans—the responses are viewable on the ARRC website. On December 7, the ARRC published consultations on securitizations and bilateral business loans, which are both open for comment through February 5, 2019.  

Amid the flurry of positive momentum in the transition towards SOFR, anxiety remains that the broader market is not moving quickly enough. ARRC consultations and working groups indicate that these anxieties derive primarily from a few specific points of debate: development of term rates, consistency of contracts, and implementation timing.

Term Rates

Because the SOFR futures market remains immature, term rates cannot be developed without significant market engagement with the newly created futures. The ARRC Paced Transition Plan includes a goal to create a forward-looking reference rate by end-of-year 2021 – just as LIBOR is scheduled to phase out. In the interim, financial institutions must figure out how to build into existing contracts fallback language or amendments that include a viable alternative to LIBOR term rates.  

The nascent SOFR futures market is growing quickly, with December 2018 daily trade volumes at nearly 16,000. However, these volumes pale in comparison to Eurodollar futures, which logged daily averages of around 5 million at CME Group alone. This puts SOFR on track according to the ARRC plan, but it means institutions remain in limbo until the futures market is more mature and term SOFR rates can be developed.

In July 2018, the Financial Stability Board (FSB) stated their support for employment of term rates primarily in cash markets, while arguing that spreads are tightest in derivative markets focused around overnight risk-free rates (RFRs), which therefore are preferred. An International Swaps and Derivatives Association (ISDA) FAQ document published in September 2018 explained the FSB’s request that “ISDA should develop fallbacks that could be used in the absence of suitable term rates and, in doing so, should focus on calculations based on the overnight RFRs.” This marks a major change, given that derivatives commonly reference 3-month LIBOR, and cash products are dependent on forward-looking term rates. Despite the magnitude of change, transition from LIBOR term rates to an alternative term rate based on limited underlying transactions would be undesirable.

The FSB explained:

Moving the bulk of current exposures referencing term IBOR benchmarks that are not sufficiently anchored in transactions to alternative term rates that also suffer from thin underlying markets would not be effective in reducing risks and vulnerabilities in the financial system. Therefore, the FSB does not expect such RFR-derived term rates to be as robust as the RFRs themselves, and they should be used only where necessary.

In a consultation report published December 20, 2018, ISDA stated that the overwhelming majority of respondents preferred fallback language with a compounded setting-in-arrears rate for the adjusted RFR, and that a significant and diverse majority preferred the historical mean/median approach for the spread adjustment.

Though ISDA’s consultation report noted some drawbacks to the historical mean/median approach for the spread adjustment, the diversity of supporters – in all regions of the world, representing many types of financial institutions – was a strong indicator of market preference. By comparison, there was no ambiguity about preference for the RFR in fallback language: In almost 90 percent of ISDA respondent rankings, the compounded setting in arrears rate was selected as the top preference for the adjusted RFR. 

In its LIBOR Task Force Green Paper, the Structured Finance Industry Group (SFIG) indicates a strong preference for viable term rates and leaves open the question of whether such calculations should be done in advance or in arrears, while favoring the current practice of determining rates prospectively at the start of each term. The group's preferred waterfall lists, first, an endorsed forward-looking term SOFR rate and, second, a compounded or average daily SOFR. SFIG is currently drafting its response to the ARRC Securitization Consultation, which will be made public on the ARRC website after submission.

Despite stated preferences, working groups are making a concerted effort to follow the ARRC’s guidance to strive for consistency across cash and derivative products. Given the concerns about a viable term rate, some market participants in cash products are also exploring the realities of implementing ISDA’s recommended fallback language and intend to incorporate those considerations into their response to the ARRC consultations. 

In the absence of an endorsed term rate, pricing of other securities such as fixed-rate bonds is difficult, if not impossible. Additionally, the absence of an endorsed term rate creates issues of consistency within the rate itself (i.e., market standards will need to be developed around how and over what periods the rate is compounded). The currently predominant recommendation of a compounding-in-arrears overnight risk-free rate would also add complexity compared with any forward-looking rate, a problem that is exacerbated in cash markets with consumer products, where changes must be fully disclosed and explained. Compounding in arrears would require a lock-out period at the end of a term to allow institutions time to calculate the compounded interest. Market standards and consumer agreement around the specific terms governing the lock-out period would be difficult to establish.
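
To see why the lock-out matters, consider how a compounded setting-in-arrears rate is built up from the daily settings. The sketch below is a simplified actual/360 illustration with hypothetical overnight rates over a handful of days, not a full statement of market convention.

    # (annualized overnight rate, calendar days it applies; e.g., 3 over a weekend)
    daily_rates = [(0.0240, 1), (0.0241, 1), (0.0242, 3), (0.0243, 1), (0.0244, 1)]

    growth, total_days = 1.0, 0
    for rate, days in daily_rates:
        growth *= 1 + rate * days / 360      # compound each overnight setting
        total_days += days

    compounded_in_arrears = (growth - 1) * 360 / total_days
    print(f"Compounded setting-in-arrears rate: {compounded_in_arrears:.5%}")

Because the final daily settings are not known until the period ends, the accrued interest cannot be calculated, or disclosed to a borrower, in advance; hence the need for a lock-out or similar mechanism.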

Consistency

While ISDA has not yet completed formal consultation specific to USD LIBOR and SOFR, and their analysis is only applicable to derivatives and swaps, there are several benefits to consistency across cash and derivatives markets. Consistency of contract terms across all asset classes during the transition away from USD LIBOR lowers operational, accounting, legal, and basis risk, according to the ARRC, and makes the change easier to communicate and negotiate with stakeholders.  

Though it is an easy case to make that consistency is advantageous, achieving it is not. For example, the Mortgage Bankers Association points out that the ISDA-selected compounding in arrears approach to interest accrual periods “would be a very material change from current practice as period interest expenses would not be determined until the end of the relevant period.” The nature of the historical mean/median spread adjustment does not come without drawbacks. ISDA’s consultation acknowledges that the approach is “likely to lead to value transfers and potential market disruption by not capturing contemporaneous market conditions at the trigger event, as well as creating potential issues with hedging.” Additionally, respondents acknowledge that relevant data may not yet be available for long lookback periods with the newly created overnight risk-free rates.  

The effort to achieve some level of consistency across the transition away from LIBOR poses several challenges related to timing. Because LIBOR will only be unsupported (rather than definitively discontinued) by the Financial Conduct Authority (FCA) at the end of 2021, some in the market retain a small hope that production of LIBOR rates could continue. The continuation of LIBOR is possible, but betting a portfolio of contracts on its continuation is an unnecessarily high-risk decision. That said, transition plans remain ambiguous about timing, and implementation of any contract changes is ultimately at the sole discretion of the contract holder. Earlier ARRC consultations acknowledged two possible implementation arrangements:   

  1. An “amendment approach,” which would provide a streamlined amendment mechanism for negotiating a replacement benchmark in the future and could serve as an initial step towards adopting a hardwired approach.  
  2. A “hardwired approach,” which would provide market participants with more clarity as to how a potential replacement rate will be identified and implemented. 

However, the currently open-for-comment securitizations consultation has dropped the “amendment” and “hardwired” terminology and now describes what amounts to the hardwired approach as defined above – a waterfall of options that is implemented upon occurrence of a predefined set of “trigger” events. Given that the securitizations consultation is still open for comment, it remains possible that market respondents will bring the amendment approach back into discussions.  

Importantly, in the U.S. there are currently no legally binding obligations for organizations to plan for the cessation of LIBOR, nor policy governing how that plan be made. In contrast, the European Union has begun to require that institutions submit written plans to governing bodies.

Timing

Because the terms of implementation remain open for discussion and organizational preference, there is some ambiguity about when organizations will begin transitioning contracts away from LIBOR to the preferred risk-free rates. In the structured finance market, this compounds the challenge of consistency with timing. For commercial real estate securities, for example, there is a possibility of mismatch in the process and timing of transition for rates in the index and for the underlying assets and resulting certificates or bonds. This potential challenge has not yet been addressed by the ARRC or other advisory bodies.

Mortgage Market

The mortgage market is still awaiting formal guidance. While the contributions by Fannie Mae and the FHLBanks to the SOFR market signal government sponsored entity (GSE) support for the newly selected reference rate, none of the GSEs has issued any commentary about recommended fallback language specific to mortgages or guidance on how to navigate the fact that SOFR does not yet have a viable term rate. An additional concern for consumer loan products, including mortgages, is the need to explain the contract changes to consumers. As a result, the ARRC Securitization consultation hypothesizes that consumer products are “likely to be simpler and involve less optionality and complexity, and any proposals would only be made after wide consultation with consumer advocacy groups, market participants, and the official sector.”  

For now, the Mortgage Bankers Association has recommended institutions develop a preliminary transition plan, beginning with a detailed assessment of exposures to LIBOR.

How can RiskSpan Help?

At any phase in the transition away from LIBOR, RiskSpan can provide institutions with analysts experienced in contract review, experts in model risk management, and sophisticated technical tools (including machine learning capabilities) to streamline the process of identifying and remediating LIBOR exposure. Our diverse team of professionals is available to deliver resources that help financial institutions mitigate risk and smooth this forthcoming transition.


Case Study: Loan-Level Capital Reporting Environment​

The Client

Government Sponsored Enterprise (GSE)

The Problem

A GSE and large mortgage securitizer maintained data from multiple work streams in several disparate systems, provided at different frequencies. Quarterly and ad-hoc data aggregation, consolidation, reporting and analytics required a significant amount of time and personnel hours. ​

The client desired configurable integration with source systems, automated acquisition of over 375 million records and performance improvements in report development.

The Solution

The client engaged RiskSpan Consulting Services to develop a reporting environment backed by an ETL Engine to automate data acquisition from multiple sources. 

The Deliverables

  • Reviewed system architecture, security protocols, user requirements, and data dictionaries to determine feasibility and approach
  • Developed a user-configurable ETL Engine in Python to load data from different sources into a PostgreSQL data repository hosted on a Linux server; the engine provides real-time status updates and error tracking
  • Developed the reporting module of the ETL Engine in Python to automatically generate client-defined Excel reports, reducing report development time from days to minutes
  • Made raw and aggregated data available for internal users to connect virtually any reporting tool, including Python, R, Tableau, and Excel
  • Developed a user interface, leveraging the API exposed by the ETL Engine, allowing users to create and schedule jobs as well as stand up user-controlled reporting environments
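
As a rough, hypothetical illustration of the kind of load-and-report step such an engine performs (the connection string, table names, and report query below are placeholders, not the client's code):

    import logging
    import pandas as pd
    from sqlalchemy import create_engine

    logging.basicConfig(level=logging.INFO)
    engine = create_engine("postgresql://user:pwd@reporting-host/capital")  # hypothetical

    def load_source(path, table):
        """Load one source extract into the repository, logging status and errors."""
        try:
            df = pd.read_csv(path)
            df.to_sql(table, engine, if_exists="append", index=False)
            logging.info("Loaded %d rows from %s into %s", len(df), path, table)
        except Exception:
            logging.exception("Load failed for %s", path)
            raise

    def build_report(query, out_path):
        """Generate a client-defined Excel report from the aggregated data."""
        pd.read_sql(query, engine).to_excel(out_path, index=False)

    load_source("loan_level_extract.csv", "loan_level")                   # hypothetical file
    build_report("SELECT * FROM capital_summary", "capital_report.xlsx")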
