Articles Tagged with: Data Governance

Webinar: Using Machine Learning in Whole Loan Data Prep


Tackle one of your biggest obstacles: Curating and normalizing multiple, disparate data sets.

Learn from RiskSpan experts:

  • How to leverage machine learning to help streamline whole loan data prep
  • Innovative ways to manage the differences in large data sets
  • How to automate ‘the boring stuff’


About The Hosts

LC Yarnelle

Director – RiskSpan

LC Yarnelle is a Director with experience in financial modeling, business operations, requirements gathering and process design. At RiskSpan, LC has worked on model validation and business process improvement/documentation projects. He also led the development of one of RiskSpan’s software offerings and has led multiple development projects for clients, utilizing both Waterfall and Agile frameworks. Prior to RiskSpan, LC was an analyst at NVR Mortgage in the secondary marketing group in Reston, VA, where he was responsible for daily pricing as well as ongoing process improvement activities. Before a career move into finance, LC was the director of operations and a minority owner of a small business in Fort Wayne, IN. He holds a BA from Wittenberg University, as well as an MBA from Ohio State University.

Matt Steele

Senior Analyst – RiskSpan



Automate Your Data Normalization and Validation Processes

Robotic Process Automation (RPA) automates mundane, rules-based processes so that an organization’s high-value business users can be redeployed to more valuable work.

McKinsey defines RPA as “software that performs redundant tasks on a timed basis and ensures that they are completed quickly, efficiently, and without error.” RPA has enormous savings potential. In RiskSpan’s experience, RPA reduces staff time spent on the target-state process by an average of 95 percent. On recent projects, RiskSpan RPA clients on average saved more than 500 staff hours per year through simple automation. That calculation does not include the potential additional savings gained from the improved accuracy of source data and downstream data-driven processes, which greatly reduces the need for rework. 

The tedious, error-ridden, and time-consuming process of data normalization is familiar to almost all organizations. Complex data systems and downstream analytics are ubiquitous in today’s workplace. Staff that are tasked with data onboarding must verify that source data is complete and mappable to the target system. For example, they might ensure that original balance is expressed as dollar currency figures or that interest rates are expressed as percentages with three decimal places. 
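Field-level normalization rules like the two examples above can be captured as small, testable functions. The sketch below is illustrative only; the function names and the cutoff of 1.0 for detecting decimal-form rates are assumptions, not part of any RiskSpan tooling:

```python
from decimal import Decimal, ROUND_HALF_UP

def normalize_rate(raw):
    """Normalize an interest rate to a percentage with three decimal places.

    Accepts either a decimal fraction (0.04125) or a percentage (4.125);
    treating values below 1.0 as decimal form is an illustrative assumption.
    """
    value = Decimal(str(raw))
    if value < 1:  # assumed convention: fractions below 1.0 are decimal form
        value *= 100
    return value.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)

def normalize_balance(raw):
    """Normalize an original balance to a dollar figure with two decimals."""
    cleaned = str(raw).replace("$", "").replace(",", "")
    return Decimal(cleaned).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Expressing each rule as a pure function makes the mapping from source to target system easy to verify in isolation before any data is loaded.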

Effective data visualizations sometimes require additional steps, such as adding calculated columns or resorting data according to custom criteria. Staff must match the data formatting requirements with the requirements of the analytics engine and verify that the normalization allows the engine to interact with the dataset. When completed manually, all of these steps are susceptible to human error or oversight. This often results in a need for rework downstream and even more staff hours. 
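The "calculated columns and custom re-sorting" steps above can be sketched with plain Python records; all names here are illustrative, and real pipelines would operate on the actual source schema:

```python
def add_calculated_column(rows, name, fn):
    """Append a calculated column to each record (rows are dicts)."""
    for row in rows:
        row[name] = fn(row)
    return rows

def custom_sort(rows, criteria):
    """Sort records by a list of (column, descending) pairs.

    Applying the keys in reverse order relies on Python's stable sort
    to produce a correct multi-key ordering.
    """
    for column, descending in reversed(criteria):
        rows.sort(key=lambda r: r[column], reverse=descending)
    return rows
```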

Recently, a client with a proprietary datastore approached RiskSpan with the challenge of normalizing and integrating irregular datasets to comply with its data engine. The non-standard original format and the sheer size of the data made normalization difficult and time-consuming.

After ensuring that the normalization process was optimized for automation, RiskSpan set to work automating data normalization and validation. Expert data consultants automated the process of restructuring data in the required format so that it could be easily ingested by the proprietary engine.  

Our consultants built an automated process that normalized and merged disparate datasets, compared internal and external datasets, and added calculated columns to the data. The processed dataset comprised more than 100 million loans and more than 4 billion records. To optimize for speed, our team programmed a highly resilient validation process that included automated validation checks, error logging (for client staff review), and data correction routines for post-processing and post-validation.

This custom solution reduced data onboarding from one month of staff work down to two days. The end result is a fully functional, normalized dataset that can be trusted for use with downstream applications.

RiskSpan’s experience automating routine business processes reduced redundancies, eliminated errors, and saved staff time. This solution reduced resources wasted on rework and its associated operational risk and key-person dependencies. Routine tasks were automated with customized validations. This customization effectively eliminated the need for staff intervention until certain error thresholds were breached. The client determined and set these thresholds during the design process. 
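The threshold-driven escalation described above can be sketched as follows. The rule names, log format, and 2 percent default threshold are illustrative assumptions; in practice the client sets the thresholds during design:

```python
def run_validations(records, checks, error_threshold=0.02):
    """Run validation checks, log failures, and flag for human review
    only when the failure rate breaches the configured threshold.

    `checks` maps a rule name to a predicate returning True when a
    record passes. The 2% default threshold is illustrative.
    """
    error_log = []
    for i, record in enumerate(records):
        for rule, predicate in checks.items():
            if not predicate(record):
                error_log.append({"row": i, "rule": rule})
    error_rate = len(error_log) / max(len(records), 1)
    return {
        "errors": error_log,
        "error_rate": error_rate,
        "needs_review": error_rate > error_threshold,
    }
```

Staff intervene only when `needs_review` is raised; everything below the threshold flows through the automated correction routines untouched.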

RiskSpan data and analytics consultants are experienced in helping clients develop robotic process automation solutions for normalizing and aggregating data, creating routine, reliable data outputs, executing business rules, and automating quality control testing. Automating these processes addresses a wide range of business challenges and is particularly useful in routine reporting and analysis.

Talk to RiskSpan today about how custom solutions in robotic process automation can save time and money in your organization. 


Case Study: Datamart Design and Build

The Client

Government Sponsored Enterprise (GSE)

The Problem

A GSE needed a centralized data solution for its forecasting process which involved cross-functional teams from different business lines (Single Family, Multi Family, Capital Markets).​

The client also needed a cloud-based data warehouse to host forecasting outputs for reporting purposes, with faster querying and processing speeds.

The input and output files and datasets came from different sources and/or in different formats. Analysis and transformation were required prior to designing, developing and loading tables. The client was also migrating data from legacy data sources to new datamarts. 

The Solution

RiskSpan was engaged to build and maintain a new centralized datamart (in both Oracle and Amazon Web Services) for the client’s revenue and loss forecasting processes. This included data modeling, historical data upload as well as the monthly recurring data process.

The Deliverables

  • Analyzed the end-to-end data flow and data elements​
  • Designed data models satisfying business requirements​
  • Processed and mapped forecasting input and output files​
  • Migrated data from legacy databases to the new sources ​
  • Built an Oracle datamart and a cloud-based data warehouse (Amazon Web Services) ​
  • Led development team to develop schemas, tables and views, process scripts to maintain data updates and table partitioning logic​
  • Resolved data issues with the source and assisted in reconciliation of results

Case Study: ETL Solutions

The Client

Government Sponsored Enterprise (GSE)

The Problem

The client needed ETL solutions for handling data of any complexity or size in a variety of formats and/or from different upstream sources.​

The client’s data management team extracted and processed data from different sources and different types of databases (e.g., Oracle, Netezza, Excel files, SAS datasets), and needed to load it into Oracle and AWS datamarts for its revenue and loss forecasting processes.

The client’s forecasting process used very complex large-scale datasets in different formats which needed to be consumed and loaded in an automated and timely manner.

The Solution

RiskSpan was engaged to design, develop and implement ETL (Extract, Transform and Load) solutions for handling input and output data for the client’s revenue and loss forecasting processes. This included handling large volumes of data from multiple source systems and transforming and loading data to and from data marts and data warehouses.

The Deliverables

  • Analyzed data sources and developed ETL strategies for different data types and sources​
  • Performed source-to-target mapping in support of report and warehouse technical designs​
  • Implemented business-driven requirements using Informatica ​
  • Collaborated with cross-functional business and development teams to document ETL requirements and turn them into ETL jobs ​
  • Optimized, developed, and maintained integration solutions as necessary to connect legacy data stores and the data warehouses
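The core of an ETL job like those described above can be sketched as a chunked extract-transform-load loop, which keeps memory use flat regardless of dataset size. This is a generic sketch, not the Informatica implementation used on the engagement:

```python
def etl(extract, transform, load, chunk_size=50000):
    """Stream records from a source, transform each one, and load them
    in fixed-size chunks so large datasets never sit in memory at once.

    `extract` yields raw records, `transform` maps one record to its
    target shape, and `load` writes a batch to the destination.
    """
    chunk = []
    total = 0
    for record in extract():
        chunk.append(transform(record))
        if len(chunk) >= chunk_size:
            load(chunk)
            total += len(chunk)
            chunk = []
    if chunk:  # flush the final partial batch
        load(chunk)
        total += len(chunk)
    return total
```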

Case Study: Web Based Data Application Build

The Client

Government Sponsored Enterprise (GSE)

The Problem

The Structured Transactions group of a GSE needed to offer broker-dealers a simpler way to create new restructured securities, with the flexibility to do business at any hour and reduced dependence on Structured Transactions team members’ availability.

The Solution

RiskSpan led the development of a customer-facing web-based application for a GSE. The GSE’s structured transactions clients use the application to independently create pools of pools and re-combinable REMIC exchanges (RCRs) with existing pooling and pricing requirements.

RiskSpan delivered the complete end-to-end technical implementation of the new portal.

The Deliverables

  • Developed a self-service web portal that provides RCR and pool-of-pool exchange capabilities and reporting features​
  • Managed data flows from various internal sources to the portal, providing real-time calculations​
  • Latest technology stack included Angular 2.0, Java for web services​
  • Development, testing, and config control methodology featured DevOps practices, CI/CD pipeline, 100% automated testing with Cucumber, Selenium​
  • GIT, JIRA, Gherkin, Jenkins, Fisheye/Crucible, SauceLabs, for config control, testing, deployment


Case Study: Loan-Level Capital Reporting Environment​

The Client

Government Sponsored Enterprise (GSE)

The Problem

A GSE and large mortgage securitizer maintained data from multiple work streams in several disparate systems, provided at different frequencies. Quarterly and ad-hoc data aggregation, consolidation, reporting and analytics required a significant amount of time and personnel hours. ​

The client desired configurable integration with source systems, automated acquisition of over 375 million records and performance improvements in report development.

The Solution

The client engaged RiskSpan Consulting Services to develop a reporting environment backed by an ETL Engine to automate data acquisition from multiple sources. 

The Deliverables

  • Reviewed system architecture, security protocol, user requirements and data dictionaries to determine feasibility and approach.​
  • Developed a user-configurable ETL Engine in Python to load data from different sources into a PostgreSQL data repository hosted on a Linux server. The engine provides real-time status updates and error tracking.​
  • Developed the reporting module of the ETL Engine in Python to automatically generate client-defined Excel reports, reducing report development time from days to minutes​
  • Made raw and aggregated data available for internal users to connect virtually any reporting tool, including Python, R, Tableau and Excel​
  • Developed a user interface, leveraging the API exposed by the ETL Engine, allowing users to create and schedule jobs as well as stand up user-controlled reporting environments​
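In miniature, a client-defined report of the kind the reporting module generated can be driven by a declarative spec; the spec format and column names below are assumptions for illustration, not the engine's actual configuration schema:

```python
from collections import defaultdict

def generate_report(records, spec):
    """Build a summary report from raw records using a declarative spec.

    `spec` names a group-by column and the columns to sum, mirroring
    (as an assumption) how a report definition can drive output
    without new code for each report.
    """
    groups = defaultdict(lambda: defaultdict(float))
    for record in records:
        key = record[spec["group_by"]]
        for column in spec["sum"]:
            groups[key][column] += record[column]
    return {key: dict(totals) for key, totals in groups.items()}
```

Because the report shape lives in data rather than code, adding a new client-defined report is a configuration change, which is what collapses report development time from days to minutes.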

RiskSpan Edge Platform API

The RiskSpan Edge Platform API enables direct access to all data on the RS Edge Platform, including both aggregate analytics and loan- and pool-level data. Standard licensed users may build queries in our browser-based graphical interface, while the API serves power users with programming skills (Python, R, even Excel) and production systems that incorporate RS Edge Platform components as part of their Service Oriented Architecture (SOA).
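As a sketch of what programmatic access to a REST-style API can look like from Python: the base URL, endpoint path, and parameter names below are hypothetical illustrations, not the Edge API's documented interface.

```python
from urllib.parse import urlencode

class EdgeAPIClient:
    """Minimal sketch of a programmatic query builder for a REST-style API.

    The base URL, endpoint names, and parameters are hypothetical; consult
    the actual API documentation for the real interface.
    """
    def __init__(self, base_url, token):
        self.base_url = base_url.rstrip("/")
        self.token = token  # would be sent as a bearer token in a real request

    def build_query(self, endpoint, **params):
        """Return the full request URL for a loan- or pool-level query."""
        query = urlencode(sorted(params.items()))
        return f"{self.base_url}/{endpoint}?{query}"
```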

Watch RiskSpan Director LC Yarnelle explain the Edge API in this video!



Data-as-a-Service – Credit Risk Transfer Data

Watch RiskSpan Managing Director Janet Jozwik explain our recent Credit Risk Transfer data (CRT) additions to the RS Edge Platform.

Each dataset has been normalized to the same standard for simpler analysis in RS Edge, enabling users to compare GSE performance with just a few clicks. The data has also been enhanced to include helpful variables, such as mark-to-market loan-to-value ratios based on the most granular house price indexes provided by the Federal Housing Finance Agency. 
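The mark-to-market loan-to-value calculation mentioned above rolls the original property value forward by the ratio of the current house price index to the index at origination. A minimal sketch, with illustrative variable names:

```python
def mark_to_market_ltv(current_upb, original_value, hpi_at_origination, hpi_current):
    """Mark-to-market LTV: current unpaid balance over the HPI-adjusted
    property value, where the value is rolled forward by the ratio of
    the current house price index to the index at origination."""
    current_value = original_value * (hpi_current / hpi_at_origination)
    return current_upb / current_value
```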


