Automate Your Data Normalization and Validation Processes
Robotic Process Automation (RPA) is the solution for automating mundane, business-rule-based processes so that an organization's high-value business users can be redeployed to more valuable work.
McKinsey defines RPA as “software that performs redundant tasks on a timed basis and ensures that they are completed quickly, efficiently, and without error.” RPA has enormous savings potential. In RiskSpan’s experience, RPA reduces staff time spent on the target-state process by an average of 95 percent. On recent projects, RiskSpan RPA clients on average saved more than 500 staff hours per year through simple automation. That calculation does not include the potential additional savings gained from the improved accuracy of source data and downstream data-driven processes, which greatly reduces the need for rework.
The tedious, error-ridden, and time-consuming process of data normalization is familiar to almost all organizations. Complex data systems and downstream analytics are ubiquitous in today's workplace. Staff who are tasked with data onboarding must verify that source data is complete and mappable to the target system. For example, they might ensure that original balance is expressed as a dollar figure or that interest rates are expressed as percentages with three decimal places.
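To illustrate, a simple normalization pass of this kind might look like the following Python sketch. The column names and source formats are invented for the example, not taken from any client dataset:

```python
import pandas as pd

# Hypothetical loan tape with inconsistent source formatting.
loans = pd.DataFrame({
    "original_balance": ["$250,000.00", "310000", "$98,500.00"],
    "interest_rate": ["4.125%", "0.03875", "5.250%"],
})

def normalize_balance(value: str) -> float:
    """Strip currency symbols and commas; return a dollar figure."""
    return float(str(value).replace("$", "").replace(",", ""))

def normalize_rate(value: str) -> float:
    """Express a rate as a percentage with three decimal places."""
    s = str(value).strip()
    if s.endswith("%"):
        return round(float(s[:-1]), 3)
    return round(float(s) * 100, 3)  # decimal form -> percentage

loans["original_balance"] = loans["original_balance"].map(normalize_balance)
loans["interest_rate"] = loans["interest_rate"].map(normalize_rate)
```

In a real onboarding pipeline, each target-system rule would get its own normalization function of this shape, applied column by column.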
Effective data visualizations sometimes require additional steps, such as adding calculated columns or re-sorting data according to custom criteria. Staff must match the data formatting requirements with the requirements of the analytics engine and verify that the normalization allows the engine to interact with the dataset. When completed manually, all of these steps are susceptible to human error and oversight, which often results in downstream rework and even more staff hours.
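A calculated column and a custom sort can be added in the same pass. A minimal sketch, again with invented fields and values:

```python
import pandas as pd

# Hypothetical normalized loan data ready for the analytics engine.
df = pd.DataFrame({
    "loan_id": ["A1", "A2", "A3"],
    "original_balance": [250000.0, 310000.0, 98500.0],
    "current_balance": [240000.0, 305000.0, 80000.0],
})

# Add a calculated column the downstream engine expects.
df["paydown_pct"] = (1 - df["current_balance"] / df["original_balance"]) * 100

# Re-sort by a custom criterion: largest paydown first.
df = df.sort_values("paydown_pct", ascending=False).reset_index(drop=True)
```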
Recently, a client with a proprietary datastore approached RiskSpan with the challenge of normalizing and integrating irregular datasets so they could be ingested by its data engine. The data's non-standard original format and sheer size made normalization difficult and time consuming.
After ensuring that the normalization process was optimized for automation, RiskSpan set to work automating data normalization and validation. Expert data consultants automated the process of restructuring data in the required format so that it could be easily ingested by the proprietary engine.
Our consultants built an automated process that normalized and merged disparate datasets, compared internal and external datasets, and added calculated columns to the data. The processed dataset comprised more than 100 million loans and more than 4 billion records. To optimize for speed, our team programmed a highly resilient validation process that included automated validation checks, error logging (for client staff review), and data correction routines for post-processing and post-validation.
This custom solution reduced the time spent onboarding data from one month of staff work to two days. The end result is a fully functional, normalized dataset that can be trusted for use with downstream applications.
RiskSpan’s experience automating routine business processes reduced redundancies, eliminated errors, and saved staff time. This solution reduced resources wasted on rework and its associated operational risk and key-person dependencies. Routine tasks were automated with customized validations. This customization effectively eliminated the need for staff intervention until certain error thresholds were breached. The client determined and set these thresholds during the design process.
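The threshold logic itself can be as simple as the following sketch, where the limit is an illustrative value standing in for a client-configured setting:

```python
# Hypothetical threshold check: escalate to staff only when the error
# rate exceeds a limit the client configured during the design process.
ERROR_RATE_THRESHOLD = 0.02  # 2 percent, illustrative value

def needs_staff_review(error_count: int, total_records: int) -> bool:
    """Return True when the run's error rate breaches the threshold."""
    if total_records == 0:
        return False
    return error_count / total_records > ERROR_RATE_THRESHOLD
```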
RiskSpan data and analytics consultants are experienced in helping clients develop robotic process automation solutions for normalizing and aggregating data, creating routine, reliable data outputs, executing business rules, and automating quality control testing. Automating these processes addresses a wide range of business challenges and is particularly useful in routine reporting and analysis.
Talk to RiskSpan today about how custom solutions in robotic process automation can save time and money in your organization.

All of these features are built into RS Edge, a cloud-native data and analytics platform for loans and securities. The RS Edge user interface is accessible via any web browser, and the processing engine is accessible via an application programming interface (API). Accessing RS Edge via the API exposes the full functionality of the platform, with direct integration into existing workflows in tools such as Excel, Python, and R. To tailor RS Edge to the specific needs of a CRT investor, RiskSpan is rolling out a series of Excel tools, built using our APIs, that allow for powerful loan-level analysis from the tool everyone knows and loves. Accessing RS Edge via our new Excel templates, users can:
The images are examples of a RiskSpan template for CRT deal comparison: profile comparison, loan credit score distribution, and delinquency performance for five Agency credit risk transfer deals, pulled via the RiskSpan Data API and rendered in Excel.
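For readers integrating programmatically rather than through the Excel templates, a Python call to the Data API might look like this sketch. The host, route, and authentication scheme shown are assumptions for illustration only; consult RiskSpan's API documentation for the actual endpoints:

```python
import requests

BASE_URL = "https://api.riskspan.example/v1"  # hypothetical host

def performance_url(deal_id: str) -> str:
    """Build an (assumed) Data API route for a deal's performance data."""
    return f"{BASE_URL}/deals/{deal_id}/performance"

def get_deal_performance(deal_id: str, token: str) -> dict:
    """Pull loan-level performance for a CRT deal via the Data API."""
    resp = requests.get(
        performance_url(deal_id),
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```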



[1] Commercial real estate
[2] Commercial and industrial loans

To help customers choose their performance estimation methods, we walk them through the decision tree shown in Figure 3. These steps should be followed for each portfolio segment, one at a time. As shown, the first step to shorten the menu of methods is to choose between Practical Methods and Premier Methods. Premier Methods available today in the RS Edge Platform include both methods built by RiskSpan (prefixed RS) and methods built by our partner, S&P Global Market Intelligence (S&P). The choice between Premier Methods and Practical Methods is primarily a tradeoff between instrument-level precision and scientific incorporation of macroeconomic scenarios on the Premier side versus lower operational costs on the Practical side. Because Premier Methods produce instrument-specific forecasts, they can be leveraged to accelerate and improve credit screening and pricing decisions in addition to solving CECL. The results of Premier Methods reflect the macroeconomic outlook using consensus statistical techniques, whereas Practical Methods generate average, segment-level historical performance that management then adjusts via Q-Factors. Such adjustments may not withstand the intense audit and regulatory scrutiny that larger institutions face. Also implicit in instrument-level precision and scientific macroeconomic conditioning is that Premier Methods are built on large-count, multi-cycle, granular performance datasets. While there are Practical Methods that reference third-party data like Call Reports, Call Report data covers a shorter economic period and lacks granularity by credit attributes. The Practical Methods have two advantages. First, they are easier for non-technical stakeholders to understand. Second, license fees for Practical Methods are lower than for Premier Methods. Suppose that for a particular asset class, an institution wants a Premier Method.
For most asset classes, RiskSpan's CECL Module selectively features one Premier Method, as shown in Figure 1. In cases where the asset class is not covered by a Premier Method in Edge, the next question becomes: does a suitable, affordable vendor model exist? We are familiar with many models in the marketplace and can advise on the benefits, drawbacks, and pricing of each. Vendor models come with explanatory documentation that institutions can review pre-purchase to determine their comfort level. Where a viable vendor model exists, we assist institutions by integrating that model as a new Premier Method, accessible within their CECL workflow. Where no viable vendor model exists, institutions must evaluate their internal historical performance data. Does it contain
[3] Denotes fields required to perform the method with the customer's historical performance data. If the customer's data lacks the necessary fields, this method can alternatively be performed using Call Report data.

Figure 3 – Methodology Selection Framework
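The selection framework in Figure 3 can be sketched as a simple decision function. The branch criteria below are illustrative simplifications of the full framework, not the complete checklist we walk through with customers:

```python
def select_method(needs_instrument_precision: bool,
                  edge_has_premier_method: bool,
                  viable_vendor_model: bool,
                  has_internal_history: bool) -> str:
    """Simplified sketch of the Figure 3 decision tree for one segment."""
    if not needs_instrument_precision:
        return "Practical Method"            # lower operational cost
    if edge_has_premier_method:
        return "Premier Method (RS Edge)"    # built-in RS or S&P method
    if viable_vendor_model:
        return "Premier Method (vendor model integrated into Edge)"
    if has_internal_history:
        return "Custom model on internal performance data"
    return "Practical Method with Q-Factor adjustments"
```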
Selecting Your Allowance Calculation

After selecting a performance estimation method for each portfolio segment, we must select the corresponding allowance calculation. Note that all performance estimation methods in RS Edge generate, among their outputs, undiscounted expected credit losses of amortized cost. Therefore, users can elect the non-DCF allowance calculation for any portfolio segment, regardless of the performance estimation method, as Figure 5 shows. A DCF allowance calculation requires the elements shown in Figure 4. Among the Premier (performance estimation) Methods, RS Resi, RS RMBS, and RS Structured Finance require contractual features as inputs and generate among their outputs the other elements of a DCF allowance calculation. Therefore, users can elect the DCF allowance calculation in combination with any of these methods without providing additional inputs or assumptions. For these methods, the choice between the DCF and non-DCF allowance calculation often comes down to anticipated
Figure 5 – Allowance Calculations Compatible with Each Performance Estimation Method

Once you have selected a performance estimation method and allowance calculation for each segment, you can begin the next phase: comparing modeled results to expectations and historical performance, and tuning model settings and management inputs accordingly. We are available to discuss CECL methodology further with you; don't hesitate to get in touch!
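For intuition on the DCF versus non-DCF choice discussed above, the following sketch compares the two allowance calculations on a single instrument, using made-up expected loss cash flows and an assumed effective interest rate:

```python
# Illustrative comparison of the two allowance calculations.
expected_losses = [100.0, 80.0, 60.0]  # expected credit losses per period
effective_rate = 0.05                  # effective interest rate per period

# Non-DCF: sum of undiscounted expected credit losses.
allowance_non_dcf = sum(expected_losses)

# DCF: present value of expected shortfalls at the effective rate.
allowance_dcf = sum(
    loss / (1 + effective_rate) ** (t + 1)
    for t, loss in enumerate(expected_losses)
)
```

With a positive discount rate, the DCF allowance is always below the undiscounted figure, which is one reason the choice hinges on the anticipated rate environment.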
