Articles Tagged with: Data Management

RiskSpan’s Snowflake Tutorial Series: Ep. 2

Learn how to use Python User-Defined Functions in Snowflake SQL

Using CPR computation for a pool of mortgage loans as an example, this six-minute tutorial succinctly demonstrates how to:

  1. Query Snowflake data using SQL
  2. Write and execute Python user-defined functions inside Snowflake
  3. Compute CPR using a Python UDF inside Snowflake SQL (a minimal sketch of this pattern follows the list)
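
For readers who want to see the pattern before watching, here is a minimal sketch (using Snowflake's Python connector) of registering a Python UDF that annualizes a single monthly mortality (SMM) rate into CPR and calling it from SQL. The connection parameters, table, and column names (LOAN_PERF, POOL_ID, BEGIN_BAL, PREPAID_BAL) are illustrative assumptions, not the tutorial's actual schema.

    import os
    import snowflake.connector  # Snowflake's Python connector

    # Connection details are assumed to come from environment variables.
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="ANALYTICS_WH",  # hypothetical warehouse name
        database="MORTGAGE_DB",    # hypothetical database name
        schema="PUBLIC",
    )
    cur = conn.cursor()

    # Register a Python UDF that annualizes SMM into CPR: CPR = 1 - (1 - SMM)^12.
    cur.execute("""
    CREATE OR REPLACE FUNCTION CPR_FROM_SMM(SMM FLOAT)
    RETURNS FLOAT
    LANGUAGE PYTHON
    RUNTIME_VERSION = '3.8'
    HANDLER = 'cpr'
    AS $$
    def cpr(smm):
        return 1.0 - (1.0 - smm) ** 12
    $$
    """)

    # Call the UDF from ordinary Snowflake SQL (illustrative table and column names).
    cur.execute("""
    SELECT POOL_ID,
           CPR_FROM_SMM(SUM(PREPAID_BAL) / NULLIF(SUM(BEGIN_BAL), 0)) AS POOL_CPR
    FROM LOAN_PERF
    GROUP BY POOL_ID
    """)
    for pool_id, pool_cpr in cur.fetchall():
        print(pool_id, pool_cpr)

    cur.close()
    conn.close()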

This is the second in a 10-part tutorial series demonstrating how RiskSpan's Snowflake integration makes mortgage and structured finance analytics easier than ever before.

Episode 1, Setting Up a Database and Uploading 28 Million Mortgage Loans, is available here.

Future topics will include:

  • External Tables (accessing data without a database)
  • OLAP vs OLTP and hybrid tables in Snowflake
  • Time Travel functionality, clone and data replication
  • Normalizing data and creating a single materialized view
  • Dynamic tables data concepts in Snowflake
  • Data share
  • Data masking
  • Snowpark: Data analysis (pandas) functionality in Snowflake

RiskSpan Adds CRE, C&I Loan Analytics to Edge Platform

ARLINGTON, Va., March 23, 2023 – RiskSpan, a leading technology company and the most comprehensive source for data management and analytics for mortgage and structured products, has announced the addition of commercial real estate (CRE) and commercial and industrial (C&I) loan data intake, valuation, and risk analytics to its award-winning Edge Platform. This enhancement complements RiskSpan’s existing residential mortgage tools and provides clients with a comprehensive toolbox for constructing and managing diverse credit portfolios.

Now more than ever, banks and credit portfolio managers need tools to construct well-diversified credit portfolios that are resilient to rate moves, and to know the fair market values of their diverse credit assets.

The new support for CRE and C&I loans on the Edge Platform further cements RiskSpan’s position as a single-source provider for loan pricing and risk management analytics across multiple asset classes. The Edge Platform’s AI-driven Smart Mapping (tape cracking) tool lets clients easily work with CRE and C&I loan data from any format. Its forecasting tools let clients flexibly segment loan datasets and apply performance and pricing assumptions by segment to generate cash flows, pricing and risk analytics.

CRE and C&I loans have long been supported by the Edge Platform’s credit loss accounting module, where users provided such loans in the Edge standard data format. The new Smart Mapping support simplifies data intake, and the new support for valuation and risk (including market risk) analytics for these assets makes Edge a complete toolbox for constructing and managing diverse portfolios that include CRE and C&I loans. These tools include cash flow projections with loan-level precision and stress testing capabilities. They empower traders and asset managers to visualize the risks associated with their portfolios like never before and make more informed decisions about their investments.

Comprehensive details of this and other new capabilities are available by requesting a no-obligation demo at riskspan.com.

### 

About RiskSpan, Inc. 

RiskSpan offers cloud-native SaaS analytics for on-demand market risk, credit risk, pricing and trading. With our data science experts and technologists, we are the leader in data as a service and end-to-end solutions for loan-level data management and analytics.

Our mission is to be the most trusted and comprehensive source of data and analytics for loans and structured finance investments. Learn more at www.riskspan.com.


RiskSpan Incorporates Flexible Loan Segmentation into Edge Platform

ARLINGTON, Va., March 3, 2023 — RiskSpan, a leading technology company and the most comprehensive source for data management and analytics for residential mortgage and structured products, has announced the incorporation of Flexible Loan Segmentation functionality into its award-winning Edge Platform.

The new functionality makes Edge the only analytical platform offering users the option of alternating between the speed and convenience of rep-line-level analysis and the unmatched precision of loan-level analytics, depending on the purpose of their analysis.

For years, the cloud-native Edge Platform has stood alone in its ability to offer the computational scale necessary to perform loan-level analyses and fully consider each loan’s individual contribution to a mortgage or MSR portfolio’s cash flows. This level of granularity is of paramount importance when pricing new portfolios, taking property-level considerations into account, and managing tail risks from a credit/servicing cost perspective.

Not every analytical use case justifies the computational cost of a full loan-level analysis, however. For situations where speed requirements dictate the use of rep lines (such as for daily or intra-day hedging needs), the Edge Platform’s new Flexible Loan Segmentation affords users the option to perform valuation and risk analysis at the rep line level.

Analysts, traders and investors take advantage of Edge’s flexible calculation specification to run various rate and HPI scenarios, key rate durations, and other calculation-intensive metrics in an efficient and timely manner. Segment-level results run at both the loan and rep-line levels can be easily compared to assess the impacts of each approach. Individual rep lines are easily rolled up to quickly view results on portfolio subcomponents and on the portfolio as a whole.

Comprehensive details of this and other new capabilities are available by requesting a no-obligation demo at riskspan.com.

This new functionality is the latest in a series of enhancements that further the Edge Platform’s objective of providing frictionless insight to Agency MBS traders and investors, knocking down barriers to efficient, clear and data-driven valuation and risk assessment.

###

About RiskSpan, Inc. 

RiskSpan offers cloud-native SaaS analytics for on-demand market risk, credit risk, pricing and trading. With our data science experts and technologists, we are the leader in data as a service and end-to-end solutions for loan-level data management and analytics. Our mission is to be the most trusted and comprehensive source of data and analytics for loans and structured finance investments. Learn more at www.riskspan.com.


RiskSpan’s Snowflake Tutorial Series: Ep. 1

Learn how to create a new Snowflake database and upload large loan-level datasets

The first episode of RiskSpan’s Snowflake Tutorial Series has dropped!

This six-minute tutorial succinctly demonstrates how to:

  1. Set up a new Snowflake database
  2. Use SnowSQL to load large datasets (28 million mortgage loans in this example)
  3. Use internal staging (without a cloud provider); a minimal sketch of these steps follows the list
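
As a rough companion to those steps, the sketch below uses Snowflake's Python connector (the equivalent SnowSQL commands look essentially the same) to create a database and table, stage a local loan file in the table's internal stage, and bulk load it with COPY INTO. The file, table, and column names are illustrative assumptions, not the ones used in the video.

    import os
    import snowflake.connector  # Snowflake's Python connector

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="LOAD_WH",  # hypothetical warehouse name
    )
    cur = conn.cursor()

    # 1. Set up a new database and a target table (illustrative schema).
    cur.execute("CREATE DATABASE IF NOT EXISTS MORTGAGE_DB")
    cur.execute("USE SCHEMA MORTGAGE_DB.PUBLIC")
    cur.execute("""
    CREATE TABLE IF NOT EXISTS LOANS (
        LOAN_ID STRING,
        ORIG_BAL NUMBER(18,2),
        NOTE_RATE FLOAT,
        ORIG_DATE DATE
    )
    """)

    # 2. Stage the local file in the table's internal stage -- no external cloud bucket required.
    cur.execute("PUT file:///data/loans.csv @%LOANS AUTO_COMPRESS=TRUE")

    # 3. Bulk load the staged file into the table.
    cur.execute("""
    COPY INTO LOANS
    FROM @%LOANS
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)

    cur.close()
    conn.close()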

This is the first in what is expected to be a 10-part tutorial series demonstrating how RiskSpan’s Snowflake integration makes mortgage and structured finance analytics easier than ever before.

Future topics will include:

  • Executing complex queries using python functions in Snowflake’s SQL
  • External Tables (accessing data without a database)
  • OLAP vs OLTP and hybrid tables in Snowflake
  • Time Travel functionality, clone and data replication
  • Normalizing data and creating a single materialized view
  • Dynamic tables data concepts in Snowflake
  • Data share
  • Data masking
  • Snowpark: Data analysis (pandas) functionality in Snowflake

RiskSpan Unveils New “Reverse ETL” Mortgage Data Mapping and Extract Functionality

ARLINGTON, Va., October 19, 2022 – Subscribers to RiskSpan’s Mortgage Data Management product can now not only leverage machine learning to streamline the intake of loan data from any format, but also define any target format for data extraction and sharing.

A recent enhancement to RiskSpan’s award-winning Edge Platform enables users to take in unformatted datasets from mortgage servicers, sellers and other counterparties and convert them into their preferred data format on the fly for sharing with accounting, client, and other downstream systems.

Analysts, traders, and portfolio managers have long used Edge to take in and store datasets, enabling them to analyze historical performance of custom cohorts using limitless combinations of mortgage loan characteristics and run predictive analytics on segments defined on the fly. With Edge’s novel “Reverse ETL” data extract functionality, these Platform users can now also easily design a custom format for exporting their data, creating the functional equivalent of a full integration node for sharing data with virtually any system on or off the Edge Platform.

Market participants tout the revolutionary technology as the end of having to share cumbersome and unformatted CSV files with counterparties. Now, the same smart mapping technology that for years has facilitated the ingestion of mortgage data onto the Edge Platform makes extracting and sharing mortgage data with downstream users just as easy.   

Comprehensive details of this and other new capabilities using RiskSpan’s Edge Platform are available by requesting a no-obligation live demo at riskspan.com.


This new functionality is the latest in a series of enhancements that is making the Edge Platform’s Data as a Service increasingly indispensable for mortgage loan and MSR traders and investors.

### 

About RiskSpan, Inc. 

RiskSpan is a leading technology company and the most comprehensive source for data management and analytics for residential mortgage and structured products. The company offers cloud-native SaaS analytics for on-demand market risk, credit risk, pricing and trading. With our data science experts and technologists, we are the leader in data as a service and end-to-end solutions for loan-level data management and analytics.

Our mission is to be the most trusted and comprehensive source of data and analytics for loans and structured finance investments.

Rethink loan and structured finance data. Rethink your analytics. Learn more at www.riskspan.com.

Media contact: Timothy Willis



Optimizing Analytics Computational Processing 

We met with RiskSpan’s Head of Engineering and Development, Praveen Vairavan, to understand how his team set about optimizing analytics computational processing for a portfolio of 4 million mortgage loans using a cloud-based compute farm.

This interview dives deeper into a case study we discussed in a recent interview with RiskSpan’s co-founder, Suhrud Dagli.

Here is what we learned from Praveen. 



Could you begin by summarizing for us the technical challenge this optimization was seeking to overcome? 

PV: The main challenge related to an investor’s MSR portfolio, specifically the volume of loans we were trying to run. The client has close to 4 million loans spread across nine different servicers. This presented two related but separate sets of challenges. 

The first set of challenges stemmed from needing to consume data from different servicers whose file formats not only differed from one another but also often lacked internal consistency. By that, I mean even the file formats from a single given servicer tended to change from time to time. This required us to continuously update our data mapping and (because the servicer reporting data is not always clean) modify our QC rules to keep up with evolving file formats.  
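
To make that concrete, the snippet below shows the general shape of a servicer-specific column mapping plus a simple QC rule, applied with pandas. The field names, file name, and the rule itself are illustrative assumptions, not RiskSpan's actual mappings or QC logic.

    import pandas as pd

    # Illustrative: each servicer gets its own mapping from raw tape headers to a standard schema.
    SERVICER_MAPS = {
        "servicer_a": {"Loan Number": "loan_id", "Curr UPB": "current_balance", "Int Rate": "note_rate"},
        "servicer_b": {"LOAN_ID": "loan_id", "UPB": "current_balance", "RATE": "note_rate"},
    }

    def standardize(raw: pd.DataFrame, servicer: str) -> pd.DataFrame:
        # Rename the servicer's columns to the standard schema and keep only the mapped fields.
        df = raw.rename(columns=SERVICER_MAPS[servicer])
        return df[["loan_id", "current_balance", "note_rate"]]

    def run_qc(df: pd.DataFrame) -> pd.DataFrame:
        # Flag records that fail simple plausibility checks (assumed thresholds).
        issues = pd.DataFrame(index=df.index)
        issues["missing_balance"] = df["current_balance"].isna()
        issues["rate_out_of_range"] = ~df["note_rate"].between(0.0, 0.15)
        return issues

    raw_tape = pd.read_csv("servicer_a_tape.csv")  # hypothetical file name
    clean = standardize(raw_tape, "servicer_a")
    print(run_qc(clean).sum())  # count of loans failing each rule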

The second challenge relates to the sheer volume of compute power necessary to run stochastic paths of Monte Carlo rate simulations on 4 million individual loans and then discount the resulting cash flows based on option adjusted yield across multiple scenarios. 

And so you have 4 million loans times multiple paths times one basic cash flow, one basic option-adjusted case, one up case, and one down case, and you can see how quickly the workload adds up. And all this needed to happen on a daily basis. 

To help minimize the computing workload, our client had been running all these daily analytics at a rep-line level—stratifying and condensing everything down to between 70,000 and 75,000 rep lines. This alleviated the computing burden but at the cost of decreased accuracy because they couldn’t look at the loans individually. 
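
To put rough numbers on that workload (using the 50 Monte Carlo paths and 4 scenarios referenced in the next question), the back-of-the-envelope arithmetic looks like this:

    # Back-of-the-envelope workload comparison, using figures cited in the interview.
    loans = 4_000_000      # loan-level portfolio
    rep_lines = 75_000     # condensed rep-line portfolio (upper end of the 70,000-75,000 range)
    paths = 50             # Monte Carlo rate paths
    scenarios = 4          # base cash flow, base option-adjusted, up, and down cases

    loan_level_runs = loans * paths * scenarios      # 800,000,000 loan-path-scenario projections per day
    rep_line_runs = rep_lines * paths * scenarios    #  15,000,000 projections per day
    print(loan_level_runs, rep_line_runs, loan_level_runs // rep_line_runs)  # roughly 53x more work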

What technology enabled you to optimize the computational process of running 50 paths and 4 scenarios for 4 million individual loans?

PV: With the cloud, you have the advantage of spawning a bunch of servers on the fly (just long enough to run all the necessary analytics) and then shutting them down once the analytics are done. 

This sounds simple enough. But in order to use that many compute servers, we needed to figure out how to distribute the 4 million loans across all these different servers so they could run in parallel (and then get the results back so we could aggregate them). We did this using what is known as a MapReduce approach. 

Say we want to run a particular cohort of this dataset with 50,000 loans in it. If we were using a single server, it would run them one after the other – generate all the cash flows for loan 1, then for loan 2, and so on. As you would expect, that is very time-consuming. So, we decided to break down the loans into smaller chunks. We experimented with various chunk sizes. We started with 1,000 – we ran 50 chunks of 1,000 loans each in parallel across the AWS cloud and then aggregated all those results.  

That was an improvement, but the 50 parallel jobs were still taking longer than we wanted. And so, we experimented further before ultimately determining that the “sweet spot” was something closer to 5,000 parallel jobs of 100 loans each. 

Only in the cloud is it practical to run 5,000 servers in parallel. But this of course raises the question: Why not just go all the way and run 50,000 parallel jobs of one loan each? Well, as it happens, running an excessively large number of jobs carries overhead burdens of its own. And we found that the extra time needed to manage that many jobs more than offset the compute time savings. And so, using a fair bit of trial and error, we determined that 100-loan jobs maximized the runtime savings without creating an overly burdensome number of jobs running in parallel.  
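
The chunk-and-aggregate pattern described above can be sketched in a few lines of Python. The example below fans 100-loan chunks out to a local process pool rather than a fleet of cloud servers, and run_cash_flows is a placeholder for the real analytics engine, so treat it as an illustration of the MapReduce shape rather than RiskSpan's implementation.

    from concurrent.futures import ProcessPoolExecutor

    CHUNK_SIZE = 100  # the "sweet spot" job size described above

    def run_cash_flows(loan_chunk):
        # Placeholder for the real analytics: project cash flows / value each loan in the chunk.
        return [{"loan_id": loan["loan_id"], "value": loan["balance"] * 0.98} for loan in loan_chunk]

    def chunked(loans, size):
        # "Map" step: split the loan list into fixed-size chunks.
        for i in range(0, len(loans), size):
            yield loans[i:i + size]

    def price_portfolio(loans):
        # Run each chunk in parallel, then "reduce" by aggregating the per-chunk results.
        with ProcessPoolExecutor() as pool:
            chunk_results = pool.map(run_cash_flows, chunked(loans, CHUNK_SIZE))
        return [row for chunk in chunk_results for row in chunk]

    if __name__ == "__main__":
        loans = [{"loan_id": i, "balance": 250_000.0} for i in range(50_000)]  # toy cohort
        results = price_portfolio(loans)
        print(len(results), "loans priced")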


You mentioned the challenge of having to manage a large number of parallel processes. What tools do you employ to work around these and other bottlenecks? 

PV: The most significant bottleneck associated with this process is finding the “sweet spot” number of parallel processes I mentioned above. As I said, we could theoretically break it down into 4 million single-loan processes all running in parallel. But managing this amount of distributed computation, even in the cloud, invariably creates a degree of overhead which ultimately degrades performance. 

And so how do we find that sweet spot – how do we optimize the number of servers on the distributed computation engine? 

As I alluded to earlier, the process involved an element of trial and error. But we also developed some home-grown tools (and leveraged some tools available in AWS) to help us. These tools enable us to visualize computation server performance – how much of a load they can take, how much memory they use, etc. These helped eliminate some of the optimization guesswork.   

Is this optimization primarily hardware based?

PV: AWS provides essentially two “flavors” of machines. One “flavor” offers a large amount of memory. This enables you to keep a whole lot of loans in memory so the analytics run faster. The other flavor of hardware is more processor based (compute intensive). These machines provide a lot of CPU power so that you can run a lot of processes in parallel on a single machine and still get the required performance. 

We have done a lot of R&D on this hardware. We experimented with many different instance types to determine which works best for us and optimizes our output: lots of memory but smaller CPUs vs. CPU-intensive machines with less (but still a reasonable amount of) memory. 

We ultimately landed on a machine with 96 cores and about 240 GB of memory. This was the balance that enabled us to run portfolios at speeds consistent with our SLAs. For us, this translated to a server farm of 50 machines running 70 processes each, which works out to 3,500 workers helping us to process the entire 4-million-loan portfolio (across 50 Monte Carlo simulation paths and 4 different scenarios) within the established SLA.  

What software-based optimization made this possible? 

PV: Even optimized in the cloud, hardware can get pricey – on the order of $4.50 per hour in this example. And so, we supplemented our hardware optimization with some software-based optimization as well. 

We were able to optimize our software to a point where we could use a machine with just 30 cores (rather than 96) and 64 GB of RAM (rather than 240). Using 80 of these machines running 40 processes each gives us 2,400 workers (rather than 3,500). Software optimization enabled us to run the same number of loans in roughly the same amount of time (slightly faster, actually) but using fewer hardware resources. And our cost to use these machines was just one-third what we were paying for the more resource-intensive hardware. 

All this, and our compute time actually declined by 10 percent.  

The software optimization that made this possible has two parts: 

The first part (as we discussed earlier) is using the MapReduce methodology to break down jobs into optimally sized chunks. 

The second part involved optimizing how we read loan-level information into the analytical engine. Reading in loan-level data (especially for 4 million loans) is a huge bottleneck. We got around this by implementing a “pre-processing” procedure. For each individual servicer, we created a set of optimized loan files that can be read and rendered “analytics ready” very quickly. This enables the loan-level data to be quickly consumed and immediately used for analytics without having to read all the loan tapes and convert them into a format that the analytics engine can understand. Because we have “pre-processed” all this loan information, it is immediately available in a format that the engine can easily digest and run analytics on. 
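
One common way to implement this kind of pre-processing is to convert each servicer's mapped loan tape, once, into a typed columnar file (Parquet, for example) that the analytics engine can load without re-parsing raw tapes. The snippet below is a sketch of that idea using pandas with assumed column names, not a description of RiskSpan's actual pipeline.

    import pandas as pd

    def preprocess_servicer_tape(raw_csv_path: str, parquet_path: str) -> None:
        # Read and parse the raw servicer tape once, up front.
        raw = pd.read_csv(raw_csv_path, low_memory=False)

        # Normalize types so the analytics engine never has to re-parse strings (assumed column names).
        raw["orig_date"] = pd.to_datetime(raw["orig_date"])
        raw["note_rate"] = raw["note_rate"].astype("float64")
        raw["current_balance"] = raw["current_balance"].astype("float64")

        # Write an analytics-ready, columnar file that loads far faster than the original CSV.
        raw.to_parquet(parquet_path, index=False)

    def load_for_analytics(parquet_path: str, columns: list) -> pd.DataFrame:
        # The engine reads only the columns it needs, already typed and ready to use.
        return pd.read_parquet(parquet_path, columns=columns)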

This software-based optimization is what ultimately enabled us to optimize our hardware usage (and save time and cost in the process).  

Contact us to learn more about how we can help you optimize your mortgage analytics computational processing.


Asset Managers Improving Yields With Resi Whole Loans

An unmistakable transformation is underway among asset managers and insurance companies with respect to whole loan investments. Whereas residential mortgage loan investing has historically been the exclusive province of commercial banks, a growing number of other institutional investors – notably life insurance companies and third-party asset managers – have shifted their attention toward this often-overlooked asset class.

Life companies and other asset managers with primarily long-term, risk-sensitive objectives are no strangers to residential mortgages. Their exposure, however, has traditionally been in the form of mortgage-backed securities, generally taking refuge in the highest-rated bonds. Investors accustomed to the AAA and AA tranches may understandably be leery of whole-loan credit exposure. Infrastructure investments necessary for managing a loan portfolio and the related credit-focused surveillance can also seem burdensome. But a new generation of tech is alleviating more of the burden than ever before and making this less familiar and sometimes misunderstood asset class increasingly accessible to a growing cadre of investors.

Maximizing Yield

Following a period of low interest rates, life companies and other investment managers are increasingly embracing residential whole-loan mortgages as they seek assets with higher returns relative to traditional fixed-income investments. As highlighted in the chart below, residential mortgage portfolios, on a loss-adjusted basis, consistently outperform other investments, such as corporate bonds, and look increasingly attractive relative to private-label residential mortgage-backed securities as well.

Nearly one-third of the $12 trillion in U.S. residential mortgage debt outstanding is currently held in the form of loans.

And while most whole loans continue to be held in commercial bank portfolios, a growing number of third-party asset managers have entered the fray as well, often on behalf of their life insurance company clients.

Investing in loans introduces a dimension of credit risk that investors need to understand and manage through thoughtful surveillance practices. As the chart below (generated using RiskSpan’s Edge Platform) highlights, when evaluated on a loss-adjusted basis, resi whole loans routinely generate superior yields.


In addition to higher yields, whole loan investments offer investors other key advantages over securities. Notably:

Data Transparency

Although transparency into private label RMBS has improved dramatically since the 2008 crisis, nothing compares to the degree of loan-level detail afforded whole-loan investors. Loan investors typically have access to complete loan files and therefore complete loan-level datasets. This allows for running analytics based on virtually any borrower, property, or loan characteristic and contributes to a better risk management environment overall. The deeper analysis enabled by loan-level and property-specific information also permits investors to delve into ESG matters and better assess climate risk.

Daily Servicer Updates

Advancements in investor reporting are increasingly granting whole loan investors access to daily updates on their portfolio performance. Daily updating provides investors near real-time information on prepayments and curtailments, as well as details regarding problem loans that are seriously delinquent or in foreclosure and the associated loss mitigation strategies. Eliminating the various “middlemen” between primary servicers and investors is one of the things that makes daily updates possible; many of the securitization costs outlined below (master servicers, trustees, and various deal and data “agents”) have the added negative effect of adding layers between security investors and the underlying loans.

Lower Transaction Costs

Driven largely by a lack of trust in the system and a lack of transparency into the underlying loan collateral, private-label securities investments incur a series of yield-eroding transaction costs that whole-loan investors can largely avoid. Consider the following transaction costs in a typical securitization:

  • Loan Data Agent costs: The concept of a loan data agent is unique to securitization. Data agents function essentially as middlemen responsible for validating the performance of other vendors (such as the Trustee). The fee for this service is avoided entirely by whole loan investors, which generally do not require an intermediary to get regularly updated loan-level data from servicers.
  • Securities Administrator/Custodian/Trustee costs: These roles present yet another layer of intermediary costs between the borrower/servicer and securities investors that are not incurred in whole loan investing.
  • Deal Agent costs: Deal agents are third party vendors typically charged with enhancing transparency in a mortgage security and ensuring that all parties’ interests are protected. The deal agent typically performs a surveillance role and charges investors ongoing annual fees plus additional fees for individual loan file reviews. These costs are not borne by whole loan investors.
  • Due diligence costs: While due diligence costs factor into loan and security investments alike, the additional layers of review required for agency ratings tend to drive these costs higher for securities. While individual file reviews are also required for both types of investments, purchasing loans only from trusted originators allows investors to get comfortable with reviewing a smaller sample of new loans. This can push due diligence costs on loan portfolios to much lower levels when compared to securities.
  • Servicing costs: Mortgage servicing costs are largely unavoidable regardless of how the asset is held. Loan investors, however, tend to have more options at their disposal. Servicing fees for securities vary from transaction to transaction with little negotiating power for the security investors. Further, securities investors incur master servicing fees, a function that is generally not required for managing whole loan investments.

Emerging technology is streamlining the process of data cleansing, normalization and aggregation, greatly reducing the operational burden of these processes, particularly for whole loan investors, who can cut out many of these intermediary parties entirely.

Overcoming Operational Hurdles

Much of investor reluctance to delve into loans has historically stemmed from the operational challenges (real and perceived) associated with having to manage and make sense of the underlying mountain of loan, borrower, and property data tied to each individual loan. But forward-thinking asset managers are increasingly finding it possible to offload and outsource much of this burden to cloud-native solutions purpose-built to store, manage, and provide analytics on loan-level mortgage data, such as RiskSpan’s Edge Platform. RiskSpan solutions make it easy to mine available loan portfolios for profitable sub-cohorts, spot risky loans for exclusion, apply a host of credit and prepay scenario analyses, and parse static and performance data in any way imaginable.

At an increasing number of institutions, demonstrating the power of analytical tools and the feasibility of applying them to the operational and risk management challenges at hand will solve many if not most of the hurdles standing in the way of obtaining asset class approval for mortgage loans. The barriers to access are coming down, and the future is brighter than ever for this fascinating, dynamic and profitable asset class.


RiskSpan a Winner of 2022 HousingWire’s Tech100 Mortgage Award

RiskSpan has been named to HousingWire’s Tech100 for a fourth consecutive year — recognition of the firm’s continuous commitment to advancing mortgage technology, data and analytics.

Our cloud-native data and predictive modeling analytical platform uncovers insights and mitigates risks for loans and structured products.


HousingWire is the most influential source of news and information for the U.S. mortgage and housing markets. Built on a foundation of independent and original journalism, HousingWire reaches over 60,000 newsletter subscribers daily and over 1.0 million unique visitors each month. Its audience of mortgage, real estate and fintech professionals relies on HousingWire to Move Markets Forward.


Mortgage Data and the Cloud – Now is the Time

As the trend toward cloud computing continues its march across an ever-expanding set of industries, it is worth pausing briefly to contemplate how it can benefit those of us who work with mortgage data for a living.  

The inherent flexibility, efficiency and scalability afforded by cloud-native systems driving this trend are clearly of value to users of financial services data. Mortgages in particular, each accompanied by a dizzying array of static and dynamic data about borrower incomes, employment, assets, property valuations, payment histories, and detailed loan terms, stand to reap the benefits of cloud and the shift to this new form of computing.  

And yet, many of my colleagues still catch themselves referring to mortgage data files as “tapes.” 

Migrating to the cloud evokes some of the shiniest words in the world of computing – cost reduction, security, reliability, agility – and that undoubtedly creates a stir. The cloud’s ability to provide on-demand access to servers, storage, databases, software and applications via the internet, along with the promise to ‘only pay for what you use,’ further contributes to its popularity. 

These benefits are especially well suited to mortgage data. They include:  

  • On-demand self-service and the ability to provision resources without human intervention – of particular use for mortgage portfolios that are constantly changing in both size and composition. 
  • Broad network access, with diverse platforms able to reach the same resources over the network – valuable when origination, secondary marketing, structuring, servicing, and modeling tools all need simultaneous access to the same evolving datasets for different purposes. 
  • Multi-tenancy and resource pooling, allowing resource sharing while maintaining privacy and security. 
  • Rapid elasticity and scalability, quickly acquiring and releasing resources to allow fast but measured scaling based on demand. 

Cloud-native systems reduce ownership and operational expenses, increase speed and agility, facilitate innovation, improve client experience, and even enhance security controls. 

There is nothing quite like mortgage portfolios when it comes to massive quantities of financial data, often PII-laden, with high security requirements. The responsibility for protecting borrower privacy is the most frequently cited reason for financial institutions’ reluctance when it comes to cloud adoption. But perhaps counterintuitively, migrating on-premises applications to the cloud actually results in a more controlled environment, as it provides for backup and access protocols that are not as easily implemented with on-premises solutions. 

The cloud affords a sophisticated and more efficient way of securing mortgage data. In addition to eliminating costs associated with running and maintaining data centers, the cloud enables easy and fast access to data and applications anywhere and at any time. As remote work takes hold as a more long-term norm, cloud-native platforms help ensure employees can work effectively regardless of their location. Furthermore, the scalability of cloud-native data centers allows holders of mortgage assets to expand storage capabilities as the portfolio grows and reduce them when it contracts. The cloud protects mortgage data from security breaches or disaster events because the loan files are (by definition) backed up in a secure, remote location and easily restored without having to invest in expensive data retrieval methods. 

This is not to say that migrating to the cloud is without its challenges. Entrusting sensitive data to a new third-party partner and relying on its tech to remain online will always carry some measure of risk. Cloud computing, like any other innovation, comes with its own advantages and disadvantages, but redundancies mitigate virtually all of these uncertainties. Ultimately, the upside of being able to work with mortgage data on cloud-native solutions far outweighs the drawbacks. The cloud makes it possible for processes to become more efficient in real time, without having to undergo expensive hardware enhancements. This in turn creates a more productive environment for data analysts and modelers seeking to give portfolio managers, servicers, securitizers, and others who routinely deal with mortgage assets the edge they are looking for.

Kriti Asrani is an associate data analyst at RiskSpan.


Want to read more on this topic? Check out COVID-19 and the Cloud.


Anomaly Detection and Quality Control

In our most recent workshop on Anomaly Detection and Quality Control (Part I), we discussed how clean market data is an integral part of producing accurate market risk results. Because incorrect and inconsistent market data is so prevalent in the industry, it is not surprising that the U.S. spends over $3 trillion on processes to identify and correct bad data.

Taking a step back, it is worth noting what drives accurate market risk analytics. Clearly, having accurate portfolio holdings, with correct terms and conditions for over-the-counter trades, is central to calculating consistent risk measures that are scaled to the market value of the portfolio. The use of well-tested and integrated industry-standard pricing models is another key factor in producing reliable analytics. Compared with those two categories, however, market data issues are the largest contributor to poor market risk analytics. The key factor driving the detection and correction (or transformation) of market data is risk and portfolio managers’ expectation that risk results will be accurate at the start of the business day, with no need to perform time-consuming re-runs during the day to correct issues.

Broadly defined, market data is any data used as an input to re-valuation models. This includes equity prices, interest rates, credit spreads, FX rates, volatility surfaces, etc.

Market data needs to be:

  • Complete – no true gaps when looking back historically.
  • Accurate
  • Consistent – data must be viewed across other data points to determine its accuracy (e.g., interest rates across tenor buckets, volatilities across volatility surface)

Anomaly types can be broken down into four major categories:

  • Spikes
  • Stale data
  • Missing data
  • Inconsistencies

Here are three examples of “bad” market data:

Credit Spreads

The following chart depicts day-over-day changes in credit spreads for the 10-year consumer cyclical time series, returned from an external vendor. The changes indicate a significant spike on 12/3 that caused big swings, up and down, across multiple rating buckets. Without an adjustment to this data, key risk measures would show significant jumps, up and down, depending on the dollar value of positions on two consecutive days.

Swaption Volatilities

Market data also includes volatilities, which drive delta calculations and possible hedging. The following chart shows implied swaption volatilities for different maturities of swaptions and their underlying swaps. Note the spikes in the 7×10 and 10×10 swaptions. The chart also highlights inconsistencies between different tenors and maturities.

Equity Implied Volatilities

The 146 and 148 strikes in the table below reflect inconsistent vol data, as often occurs around expiration.

The detection of market data inconsistencies needs to be an automated process with multiple approaches targeted at specific types of market data. The detection models need to evolve over time as added information is gathered, with the goal of reducing false negatives to a manageable level. Once the models detect the anomalies, the next step is to automate the transformation of the market data (e.g., backfill, interpolate, or use the prior day’s value). Alongside the transformation, the process must record exactly which values were changed or populated when data was not available. This record should be shared with clients, which could lead to alternative transformations or detection routines.
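
As a rough illustration of the transformation step (not RiskSpan's production logic), the pandas snippet below flags a gap and a suspected spike in a daily spread series, repairs the flagged points, and records exactly what was changed:

    import numpy as np
    import pandas as pd

    # Illustrative daily series of 10-year credit spreads (bps) with a spike and a missing value.
    dates = pd.date_range("2020-12-01", periods=6, freq="B")
    spreads = pd.Series([105.0, 106.0, 410.0, np.nan, 108.0, 107.0], index=dates)

    # Flag missing points and day-over-day moves larger than an assumed 50% threshold.
    flagged = spreads.isna() | (spreads.pct_change().abs() > 0.5)

    # Transform: drop the flagged points, then repair them (pick one of the options below).
    cleaned = spreads.mask(flagged)
    cleaned = cleaned.interpolate()   # option 1: interpolate across the gap
    # cleaned = cleaned.ffill()       # option 2: carry the prior day's value forward
    # cleaned = cleaned.bfill()       # option 3: backfill from the next good value

    # Transparency: record which dates were altered and what the original values were.
    audit = pd.DataFrame({"original": spreads, "cleaned": cleaned, "adjusted": flagged})
    print(audit)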

Detector types typically fall into the following categories (a simplified sketch of two of them follows the list):

  • Extreme Studentized Deviate (ESD): finds outliers in a single data series (helpful for extreme cases).
  • Level Shift: detects a change in level by comparing the means of two sliding time windows (useful for local outliers).
  • Local Outliers: detects spikes relative to neighboring values.
  • Seasonal Detector: detects seasonal patterns and anomalies (used for contract expirations and other events).
  • Volatility Shift: detects shifts in volatility by tracking changes in standard deviation.
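
The pandas sketch below shows simplified versions of two of these detectors: a spike detector based on a rolling z-score (a rough stand-in for the ESD approach) and a level-shift detector that compares the means of two sliding windows. It illustrates the concepts only; the workshop's actual detectors are more sophisticated.

    import pandas as pd

    def spike_flags(series: pd.Series, window: int = 20, z: float = 4.0) -> pd.Series:
        # Flag points that sit far from a rolling mean, measured in rolling-standard-deviation units.
        mean = series.rolling(window).mean()
        std = series.rolling(window).std()
        return (series - mean).abs() > z * std

    def level_shift_flags(series: pd.Series, window: int = 10, threshold: float = 3.0) -> pd.Series:
        # Compare the mean of the trailing window with the mean of the leading window at each point.
        trailing = series.rolling(window).mean()
        leading = series[::-1].rolling(window).mean()[::-1]
        scale = series.rolling(window).std()
        return (leading - trailing).abs() > threshold * scale

    # Usage on a daily credit-spread series loaded from an assumed CSV layout:
    # spreads = pd.read_csv("cc_10y_spreads.csv", index_col="date", parse_dates=True)["spread"]
    # anomalies = spreads[spike_flags(spreads) | level_shift_flags(spreads)]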

On Wednesday, May 19th, we will present a follow-up workshop focusing on:

  • Coding examples
    • Application of outlier detection and pipelines
    • PCA
  • Specific loan use cases
    • Loan performance
    • Entity correction
  • Novelty Detection
    • Anomalies are not always “bad”
    • Market monitoring models

You can register for this complimentary workshop here.

