Data Management for a Robust Risk Framework

In an article published last year, the Harvard Business Review cited IBM research estimating that bad data costs US businesses $3 trillion per year. Although it is difficult to isolate the specific cost of bad data in market-risk management, it is clear that managing data has never been more important.

The success of a market-risk management implementation is largely dependent on a validated, scalable, and well-governed data management process.

The data for market-risk management may come from many disparate sources. To examine the details, we broadly classify these types of data as:

  • Terms and Conditions
  • Market Data
  • Model Inputs
  • User Assumptions and Configurations
  • Calculated Results and Projections

Below, we introduce some of the challenges found with each data type. We will cover each data type more deeply in subsequent sections.


Terms and conditions

The terms and conditions of an instrument drive nearly everything in risk analytics, including valuation models, risk reporting, and regulatory filings. Accurate results depend on accurate indicative information for securities and custom structures. The identification and classification of securities remain an ongoing challenge, and interpreting and normalizing data elements consistently across instrument classes helps avoid application errors. In the following chapter, we will discuss some approaches to identify, validate, and correct missing data elements.


Market data

Both scenario analysis and VaR depend on accurately valuing an instrument using its relevant risk factors. These risk factors may be measured using both current and historical market data, making accurate market data vital to risk analytics. Market data for OTC instruments and exchange-traded benchmarks alike can be erroneous and inconsistent. It is therefore critical to implement a framework that uses pricing algorithms to uncover market data inconsistencies. In Chapter 3, we explore some of the methods that are useful to promote more consistent market data.
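
As a minimal illustration of a pricing-based consistency check, the sketch below reprices a plain fixed-coupon bond from its quoted yield and flags a quoted price that disagrees beyond tolerance. The function names, annual-coupon assumption, and tolerance are hypothetical simplifications, not a production pricer.

```python
# Hypothetical sketch: flag a quoted bond price that disagrees with the
# price implied by its quoted yield (annual coupons, annual compounding).
def price_from_yield(face, coupon_rate, years, ytm):
    """Present value of annual coupons plus principal repayment."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + ytm) ** t for t in range(1, years + 1))
    return pv_coupons + face / (1 + ytm) ** years

def is_consistent(quoted_price, face, coupon_rate, years, ytm, tol=0.5):
    """True if the quoted price is within `tol` of the yield-implied price."""
    implied = price_from_yield(face, coupon_rate, years, ytm)
    return abs(quoted_price - implied) <= tol
```

For example, a 5-year 5% bond quoted at a 5% yield should price at par, so `is_consistent(100.0, 100, 0.05, 5, 0.05)` passes, while a quote of 90.0 against the same yield would be flagged for review.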


Model inputs

Models are algorithms that are only as good as the accuracy, interpretation, and timeliness of their input data. Input data may consist of security terms and conditions, market data, macroeconomic data, and user overlays or configurations. Each data source may update at different times and with varying frequencies, and a mismatch in update timing can result in inconsistent model projections.
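
One simple safeguard against timing mismatches is to verify that every input feed carries the same as-of date before a model run. The sketch below, with hypothetical feed names, returns the feeds that are stale rather than letting them be used silently.

```python
from datetime import date

# Hypothetical sketch: confirm all model inputs share the same as-of date
# before running a valuation, so stale feeds are caught up front.
def check_alignment(inputs, expected_asof):
    """Return the names of inputs whose as-of date differs from `expected_asof`."""
    return [name for name, asof in inputs.items() if asof != expected_asof]

inputs = {
    "terms_and_conditions": date(2023, 6, 30),
    "market_data": date(2023, 6, 30),
    "macro_forecast": date(2023, 6, 29),  # updated on a different cycle
}
stale = check_alignment(inputs, date(2023, 6, 30))
```

Here `stale` would contain only `"macro_forecast"`, prompting either a refresh of that feed or an explicit decision to proceed with a documented lag.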


User assumptions and configuration

Investment theses often vary in subtle but important ways across managers and strategies. Robust risk management needs to reflect these differences by stressing the relevant factors and providing the necessary reporting. Data enrichment may also be needed to augment market data so that risk is not overstated due to mismatches between asset and hedge representations. For instance, in an index arbitrage strategy, risk would be misstated if the index components are not independently simulated.


Risk output and projections

Daily risk simulations on even a modest-sized portfolio of complex instruments can generate billions of data points. A risk platform needs to capture this data and make it available, giving risk and portfolio managers the granularity needed to parse the portfolio and explain risk measures. Portfolio-level metrics can exhibit unexpected day-over-day swings, and without the ability to quickly drill down into the underlying data, risk managers would have a difficult time explaining results to front-office users and regulators.


Each of these data classes presents its own challenges and can introduce different types of input errors. Most of the data errors that challenge market-risk efforts originate from one of the following sources:


Errors in source data

The time-series input data used in risk analytics derives from multiple sources and can be riddled with errors, both for thinly traded OTC instruments and for exchange-traded instruments such as equities. It is critical to implement validation algorithms to detect and eliminate bad data points. Additionally, terms and conditions for instruments with schedules and structures can be inconsistent with their original structure, and these inconsistencies are difficult to detect.
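
A common validation technique for time series is to flag points whose one-day returns are extreme relative to a robust dispersion measure. The sketch below uses the median absolute deviation (MAD) of returns; the threshold `k` and the toy series are illustrative choices, not calibrated values.

```python
import statistics

# Hypothetical sketch: flag points in a price series whose one-day return
# deviates from the median return by more than k median absolute deviations.
def flag_spikes(prices, k=5.0):
    """Return indices into `prices` that immediately follow a suspect return."""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    med = statistics.median(returns)
    mad = statistics.median(abs(r - med) for r in returns)
    if mad == 0:  # flat series: nothing to compare against
        return []
    return [i + 1 for i, r in enumerate(returns) if abs(r - med) > k * mad]
```

Note that a single bad print produces two extreme returns (into and out of the spike), so both the bad point and its successor are typically flagged for review; for example, `flag_spikes([100, 101, 100.5, 150, 100.8, 101.2])` flags the 150 print.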


Missing data elements

Missing data can lead to many problems. For example, missing terms and conditions may prevent an instrument from being analyzed at all or, in the worst case, generate an incorrect valuation that is difficult to detect. In certain cases, an algorithmic approach can be employed to infer missing data elements. Offering documents for certain classes of listed instruments can also assist in populating missing data elements. We will discuss a few use cases where machine learning and other techniques can be used to populate missing data elements.
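
As a small example of algorithmic inference, a missing payment frequency can often be recovered from the spacing of known coupon dates. The sketch below assumes a regular schedule and uses the median gap so a single bad date does not skew the result; it is an illustration, not a schedule-generation engine.

```python
from datetime import date

# Hypothetical sketch: infer a bond's missing payment frequency from the
# spacing of its known coupon dates (assumes a roughly regular schedule).
def infer_frequency(coupon_dates):
    """Return payments per year implied by the typical gap between coupon dates."""
    sorted_dates = sorted(coupon_dates)
    gaps = [(b - a).days for a, b in zip(sorted_dates, sorted_dates[1:])]
    typical_gap = sorted(gaps)[len(gaps) // 2]  # median gap, robust to one outlier
    return round(365 / typical_gap)

dates = [date(2023, 1, 15), date(2023, 7, 15), date(2024, 1, 15)]
freq = infer_frequency(dates)  # semiannual spacing implies 2 payments per year
```

Inferred values like this should still be flagged as derived rather than observed, so they can be confirmed against offering documents when those become available.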


User input errors

One of the most difficult sets of errors to detect are those arising from user input. Unlike observed indicative and market data, there are no benchmarks or alternate sources to check against. Guardrails and validation algorithms are among the ways of avoiding and detecting user errors, and data governance supported by a rigorous audit feature can help recover from them swiftly.
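
The sketch below shows one way guardrails and auditing can work together: a user-supplied volatility override is accepted only within a plausible range, and every attempt, accepted or not, is appended to an audit trail. The function name, bounds, and log shape are hypothetical.

```python
# Hypothetical sketch: guardrails on a user-supplied volatility override.
# Every attempt is recorded so bad inputs can be traced and reversed.
audit_log = []

def set_vol_override(security_id, vol, user, lower=0.0, upper=2.0):
    """Accept the override only if `vol` lies within [lower, upper]."""
    accepted = lower <= vol <= upper
    audit_log.append({"security": security_id, "value": vol,
                      "user": user, "accepted": accepted})
    return accepted
```

A plausible entry such as `set_vol_override("XYZ", 0.25, "trader1")` is accepted, whereas `set_vol_override("XYZ", 25.0, "trader1")` is rejected, catching a common percent-versus-decimal slip while leaving a record of who attempted it.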


Incorrect implementation and interpretation of data

Some input errors come from faulty interpretation of data. Units, relative measures, reporting frequency, compounding, day counts, and many other data conventions must be understood correctly by the application. For example, when bootstrapping a yield curve from deposits, futures, swaps, and bonds, an incorrect implementation of discounting or day counts can produce an inconsistent spot curve and make downstream dependencies error-prone. A well-managed risk process makes data units and conventions explicit, leaving no room for user interpretation.
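
To see how much a day-count mix-up matters, the sketch below computes the same accrual period under two common conventions. The 30/360 implementation is deliberately simplified (it ignores end-of-month rules) and is meant only to show that the two conventions yield different year fractions for identical dates.

```python
from datetime import date

# Hypothetical sketch: the same accrual period under two day-count
# conventions. Applying the wrong one shifts every discount factor
# built from the curve.
def actual_360(start, end):
    """ACT/360: actual calendar days over a 360-day year."""
    return (end - start).days / 360.0

def thirty_360(start, end):
    """Simplified US 30/360: each month counts 30 days (end-of-month rules omitted)."""
    d1, d2 = min(start.day, 30), min(end.day, 30)
    days = 360 * (end.year - start.year) + 30 * (end.month - start.month) + (d2 - d1)
    return days / 360.0

start, end = date(2023, 1, 31), date(2023, 7, 31)
```

For this six-month period, ACT/360 gives 181/360 ≈ 0.5028 while 30/360 gives exactly 0.5, so swapping the conventions silently changes every accrual and discount factor derived from the curve.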

In the next few sections, we cover the various sources of errors and a few of the methods users can implement to ensure high-quality inputs.