Articles Tagged with: Data Management

SOFR, So Good? The Main Anxieties Around the LIBOR Transition

SOFR Replacing LIBOR

The London Interbank Offered Rate (LIBOR) is going away, and the international financial community is working hard to plan for and mitigate risks to make a smooth transition. In the United States, the Federal Reserve’s Alternative Reference Rates Committee (ARRC) has recommended the Secured Overnight Financing Rate (SOFR) as the preferred replacement rate. The New York Fed began publishing SOFR regularly on April 3, 2018. In July 2018, Fannie Mae issued $6 billion in SOFR-denominated securities, leading the way for other institutions that have since followed suit. In November 2018, the Federal Home Loan (FHL) Banks issued $4 billion in debt tied to SOFR. CME Group, a derivatives and futures exchange company, launched 3-month and 1-month SOFR futures contracts in 2018. All of these steps to support liquidity and demonstrate SOFR demand are designed to create a rate more robust than LIBOR: the transaction volume underpinning SOFR rates is around $750 billion daily, compared to USD LIBOR’s estimated $500 million in daily transaction volume.

USD LIBOR is referenced in an estimated $200 trillion of financial contracts, of which 95 percent is derivatives. However, the remaining cash market is not small. It includes an estimated $3.4 trillion in business loans, $1.3 trillion in retail mortgages and other consumer loans, $1.8 trillion in floating rate debt, and $1.8 trillion in securitized products.

The ARRC has held consultations on its recommended fallback language for floating rate notes and syndicated business loans; the responses are viewable on the ARRC website. On December 7, 2018, the ARRC published consultations on securitizations and bilateral business loans, which are both open for comment through February 5, 2019.

Amid the flurry of positive momentum in the transition towards SOFR, anxiety remains that the broader market is not moving quickly enough. ARRC consultations and working groups indicate that these anxieties derive primarily from a few specific points of debate: development of term rates, consistency of contracts, and implementation timing.

Term Rates

Because the SOFR futures market remains immature, term rates cannot be developed without significant market engagement with the newly created futures. The ARRC Paced Transition Plan includes a goal to create a forward-looking reference rate by end-of-year 2021 – just as LIBOR is scheduled to phase out. In the interim, financial institutions must figure out how to build into existing contracts fallback language or amendments that include a viable alternative to LIBOR term rates.  

The nascent SOFR futures market is growing quickly, with December 2018 daily trade volumes at nearly 16,000. However, these volumes pale in comparison to Eurodollar futures volumes, which averaged around 5 million per day at CME Group alone. This puts SOFR on track according to the ARRC plan, but it means institutions remain in limbo until the futures market is more mature and term SOFR rates can be developed.

In July 2018, the Financial Stability Board (FSB) stated its support for the use of term rates primarily in cash markets, while arguing that spreads are tightest in derivative markets focused around overnight risk-free rates (RFRs), which therefore are preferred. An International Swaps and Derivatives Association (ISDA) FAQ document published in September 2018 explained the FSB’s request that “ISDA should develop fallbacks that could be used in the absence of suitable term rates and, in doing so, should focus on calculations based on the overnight RFRs.” This marks a major change, given that derivatives commonly reference 3-month LIBOR, and cash products are dependent on forward-looking term rates. Despite the magnitude of the change, a transition from LIBOR term rates to an alternative term rate based on limited underlying transactions would be undesirable.

The FSB explained:

Moving the bulk of current exposures referencing term IBOR benchmarks that are not sufficiently anchored in transactions to alternative term rates that also suffer from thin underlying markets would not be effective in reducing risks and vulnerabilities in the financial system. Therefore, the FSB does not expect such RFR-derived term rates to be as robust as the RFRs themselves, and they should be used only where necessary.

In a consultation report published December 20, 2018, ISDA stated that the overwhelming majority of respondents preferred fallback language with a compounded setting in arrears rate for the adjusted RFR, with a significant and diverse majority preferring the historical mean/median approach for the spread adjustment.

Though ISDA’s consultation report noted some drawbacks to the historical mean/median approach for the spread adjustment, the diversity of supporters – in all regions of the world, representing many types of financial institutions – was a strong indicator of market preference. By comparison, there was no ambiguity about preference for the RFR in fallback language: In almost 90 percent of ISDA respondent rankings, the compounded setting in arrears rate was selected as the top preference for the adjusted RFR. 

In the Structured Finance Industry Group (SFIG) LIBOR Task Force Green Paper, the group indicates a strong preference for viable term rates. It leaves open the question of whether such calculations should be done in advance or in arrears, while indicating a preference for continuing to determine rates prospectively at the start of each term. The group lists its waterfall preferences as, first, an endorsed forward-looking term SOFR rate and, second, a compounded or average daily SOFR. SFIG is currently drafting its response to the ARRC Securitization Consultation, which will be made public on the ARRC website after submission.

Despite stated preferences, working groups are making a concerted effort to follow the ARRC’s guidance to strive for consistency across cash and derivative products. Given the concerns about a viable term rate, some market participants in cash products are also exploring the realities of implementing ISDA’s recommended fallback language and intend to incorporate those considerations into their response to the ARRC consultations. 

In the absence of an endorsed term rate, pricing of other securities such as fixed-rate bonds is difficult, if not impossible. Additionally, the absence of an endorsed term rate creates issues of consistency within the rate itself (i.e., market standards will need to be developed around how and over what periods the rate is compounded). The currently predominant recommendation of a compounding in arrears overnight risk-free rate would also add complexity when compared with any forward-looking rate, which is exacerbated in the cash markets with consumer products where changes must be fully disclosed and explained. Compounding in arrears would require a lock-out period at the end of a term to allow institutions time to calculate the compounded interest. Market standards and consumer agreement around the specific terms governing the lock-out period would be difficult to establish.
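To make the mechanics concrete, the following is a minimal sketch of a compounded in arrears calculation with an optional lock-out. It assumes an actual/360 day count and assumes that the lock-out freezes the last rate observed before the lock-out begins; the rates shown are hypothetical, and the function names are illustrative rather than part of any published standard.

```python
def apply_lockout(daily_rates, lockout_days):
    """Freeze the rate over the last `lockout_days` observations.

    daily_rates: list of (annualized_rate, calendar_days_applied) tuples.
    Assumed convention: the frozen value is the last rate observed
    before the lock-out starts.
    """
    if lockout_days <= 0 or lockout_days >= len(daily_rates):
        return list(daily_rates)
    cutoff = len(daily_rates) - lockout_days
    frozen_rate = daily_rates[cutoff - 1][0]
    locked = [(frozen_rate, days) for _, days in daily_rates[cutoff:]]
    return list(daily_rates[:cutoff]) + locked


def compounded_in_arrears(daily_rates, day_count_basis=360):
    """Annualized rate from compounding each overnight rate over the period."""
    growth = 1.0
    total_days = 0
    for rate, days in daily_rates:
        # Each overnight rate accrues for the calendar days it applies to.
        growth *= 1.0 + rate * days / day_count_basis
        total_days += days
    return (growth - 1.0) * day_count_basis / total_days


# Hypothetical one-week accrual period; Friday's rate applies for three days.
week = [(0.0240, 1), (0.0241, 1), (0.0239, 1), (0.0240, 1), (0.0242, 3)]
period_rate = compounded_in_arrears(apply_lockout(week, lockout_days=1))
print(f"Compounded rate with a one-day lock-out: {period_rate:.6f}")
```

The rate for the period is not known until the final observation is in, which is why a lock-out (or a similar convention) is needed to leave time for the interest calculation before payment.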

Consistency

While ISDA has not yet completed formal consultation specific to USD LIBOR and SOFR, and their analysis is only applicable to derivatives and swaps, there are several benefits to consistency across cash and derivatives markets. Consistency of contract terms across all asset classes during the transition away from USD LIBOR lowers operational, accounting, legal, and basis risk, according to the ARRC, and makes the change easier to communicate and negotiate with stakeholders.  

Though it is an easy case to make that consistency is advantageous, achieving it is not. For example, the Mortgage Bankers Association points out that the ISDA-selected compounding in arrears approach to interest accrual periods “would be a very material change from current practice as period interest expenses would not be determined until the end of the relevant period.” The historical mean/median spread adjustment also has drawbacks. ISDA’s consultation acknowledges that the approach is “likely to lead to value transfers and potential market disruption by not capturing contemporaneous market conditions at the trigger event, as well as creating potential issues with hedging.” Additionally, respondents acknowledge that relevant data may not yet be available for long lookback periods with the newly created overnight risk-free rates.
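The drawback ISDA describes can be illustrated with a small sketch of a historical mean/median spread adjustment. The rate histories below are hypothetical and expressed in basis points, and the function name and signature are assumptions made for illustration; the actual lookback length and calculation details were still under consultation.

```python
from statistics import mean, median


def historical_spread_adjustment(libor_bps, rfr_bps, method="median"):
    """Spread adjustment from paired historical observations (basis points)."""
    if not libor_bps or len(libor_bps) != len(rfr_bps):
        raise ValueError("rate histories must be non-empty and the same length")
    spreads = [libor - rfr for libor, rfr in zip(libor_bps, rfr_bps)]
    return median(spreads) if method == "median" else mean(spreads)


# Hypothetical lookback window in which the spread widens toward the end.
libor_history = [260, 265, 270, 280]
rfr_history = [240, 242, 243, 245]

median_adjustment = historical_spread_adjustment(libor_history, rfr_history)
mean_adjustment = historical_spread_adjustment(libor_history, rfr_history, "mean")
latest_spread = libor_history[-1] - rfr_history[-1]

print(median_adjustment, mean_adjustment, latest_spread)
```

In this made-up window the most recent spread is wider than either historical statistic, so a contract that falls back at that moment would be adjusted by less than the prevailing spread. That gap is the kind of value transfer the consultation warns about.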

The effort to achieve some level of consistency across the transition away from LIBOR poses several challenges related to timing. Because LIBOR will only be unsupported (rather than definitively discontinued) by the Financial Conduct Authority (FCA) at the end of 2021, some in the market retain a small hope that production of LIBOR rates could continue. The continuation of LIBOR is possible, but betting a portfolio of contracts on its continuation is an unnecessarily high-risk decision. That said, transition plans remain ambiguous about timing, and implementation of any contract changes is ultimately at the sole discretion of the contract holder. Earlier ARRC consultations acknowledged two possible implementation arrangements:   

  1. An “amendment approach,” which would provide a streamlined amendment mechanism for negotiating a replacement benchmark in the future and could serve as an initial step towards adopting a hardwired approach.  
  2. A “hardwired approach,” which would provide market participants with more clarity as to how a potential replacement rate will be identified and implemented.

However, the currently open-for-comment securitizations consultation has dropped the “amendment” and “hardwired” terminology and now describes what amounts to the hardwired approach as defined above – a waterfall of options that is implemented upon occurrence of a predefined set of “trigger” events. Given that the securitizations consultation is still open for comment, it remains possible that market respondents will bring the amendment approach back into discussions.  
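A waterfall of this kind can be sketched as an ordered list that is consulted only after a trigger event. The ordering below borrows the preference stated in the SFIG Green Paper discussed earlier and is purely illustrative; the option labels and the function are hypothetical and are not the ARRC’s proposed contract language.

```python
# Hypothetical waterfall, ordered from most to least preferred.
WATERFALL = [
    "forward-looking term SOFR",
    "compounded or average daily SOFR",
]


def select_benchmark(trigger_occurred, available_rates, waterfall=WATERFALL):
    """Return the benchmark a contract would reference under the waterfall."""
    if not trigger_occurred:
        # No trigger event: the contract keeps referencing LIBOR.
        return "USD LIBOR"
    for option in waterfall:
        if option in available_rates:
            return option
    raise LookupError("no fallback rate in the waterfall is available")


# Before an endorsed term rate exists, the second step applies.
print(select_benchmark(True, {"compounded or average daily SOFR"}))
```

Because every step is defined up front, the outcome after a trigger depends only on which rates exist at that time, which is the clarity the hardwired approach is meant to provide.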

Importantly, in the U.S. there are currently no legally binding obligations for organizations to plan for the cessation of LIBOR, nor is there any policy governing how such a plan should be made. In contrast, the European Union has begun to require that institutions submit written plans to governing bodies.

Timing

Because the terms of implementation remain open for discussion and organizational preference, there is some ambiguity about when organizations will begin transitioning contracts away from LIBOR to the preferred risk-free rates. In the structured finance market, this compounds the challenge of consistency with timing. For commercial real estate securities, for example, there is a possibility of mismatch in the process and timing of transition for rates in the index and for the underlying assets and resulting certificates or bonds. This potential challenge has not yet been addressed by the ARRC or other advisory bodies.

Mortgage Market

The mortgage market is still awaiting formal guidance. While the contributions by Fannie Mae and the FHLBanks to the SOFR market signal government-sponsored enterprise (GSE) support for the newly selected reference rate, none of the GSEs has issued any commentary about recommended fallback language specific to mortgages or guidance on how to navigate the fact that SOFR does not yet have a viable term rate. An additional concern for consumer loan products, including mortgages, is the need to explain the contract changes to consumers. As a result, the ARRC Securitization consultation hypothesizes that consumer products are “likely to be simpler and involve less optionality and complexity, and any proposals would only be made after wide consultation with consumer advocacy groups, market participants, and the official sector.”

For now, the Mortgage Bankers Association has recommended institutions develop a preliminary transition plan, beginning with a detailed assessment of exposures to LIBOR.

How can RiskSpan Help?

At any phase in the transition away from LIBOR, RiskSpan can provide institutions with analysts experienced in contract review, experts in model risk management and sophisticated technical tools—including machine learning capabilities—to streamline the process to identify and remediate LIBOR exposure. Our diverse team of professionals is available to deliver resources to financial institutions that will mitigate risks and streamline this forthcoming transition.


Case Study: Loan-Level Capital Reporting Environment

The Client

Government Sponsored Enterprise (GSE)

The Problem

A GSE and large mortgage securitizer maintained data from multiple work streams in several disparate systems, provided at different frequencies. Quarterly and ad-hoc data aggregation, consolidation, reporting and analytics required a significant amount of time and personnel hours.

The client desired configurable integration with source systems, automated acquisition of over 375 million records and performance improvements in report development.

 

The Solution

The client engaged RiskSpan Consulting Services to develop a reporting environment backed by an ETL Engine to automate data acquisition from multiple sources. 

The Deliverables

  • Reviewed system architecture, security protocol, user requirements and data dictionaries to determine feasibility and approach.
  • Developed a user-configurable ETL Engine in Python to load data from different sources into a PostgreSQL data repository hosted on a Linux server. The engine provides real-time status updates and error tracking.
  • Developed the reporting module of the ETL Engine in Python to automatically generate client-defined Excel reports, reducing report development time from days to minutes.
  • Made raw and aggregated data available for internal users to connect virtually any reporting tool, including Python, R, Tableau and Excel.
  • Developed a user interface, leveraging the API exposed by the ETL Engine, allowing users to create and schedule jobs as well as stand up user-controlled reporting environments.
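The general shape of a configuration-driven load with status and error tracking can be sketched in a few lines. This is not the engine built for the client: the job definitions, field names and `run_job` function are hypothetical, the sources are inline CSV text, and SQLite from the Python standard library stands in for PostgreSQL so that the example is self-contained.

```python
import csv
import io
import sqlite3


def run_job(conn, job):
    """Load one configured source into its target table and report status."""
    status = {"job": job["name"], "state": "running", "rows": 0, "error": None}
    try:
        columns = job["columns"]
        reader = csv.DictReader(io.StringIO(job["source_text"]))
        # Building rows first means a bad mapping fails before any write.
        rows = [tuple(record[c] for c in columns) for record in reader]
        column_list = ", ".join(columns)
        placeholders = ", ".join("?" for _ in columns)
        # Table and column names come from trusted job configuration.
        conn.execute(f"CREATE TABLE IF NOT EXISTS {job['table']} ({column_list})")
        conn.executemany(
            f"INSERT INTO {job['table']} ({column_list}) VALUES ({placeholders})",
            rows,
        )
        conn.commit()
        status.update(state="succeeded", rows=len(rows))
    except Exception as exc:
        conn.rollback()
        status.update(state="failed", error=repr(exc))
    return status


# Hypothetical job configuration; the second job maps a column that is missing.
JOBS = [
    {"name": "servicing_feed", "table": "loans", "columns": ["loan_id", "balance"],
     "source_text": "loan_id,balance\nL1,100000\nL2,250000\n"},
    {"name": "broken_feed", "table": "loans", "columns": ["loan_id", "upb"],
     "source_text": "loan_id,balance\nL3,90000\n"},
]

conn = sqlite3.connect(":memory:")
statuses = [run_job(conn, job) for job in JOBS]
for status in statuses:
    print(status)
```

Keeping the source-to-table mapping in configuration rather than in code is what makes such an engine user-configurable, and returning a status record per job gives a user interface something to display and schedule against.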

RiskSpan Edge Platform API

The RiskSpan Edge Platform API enables direct access to all data from the RS Edge Platform. This includes both aggregate analytics and loan- and pool-level data. Standard licensed users may build queries in our browser-based graphical interface, but our API is a channel for power users with programming skills (Python, R, even Excel) and for production systems that are incorporating RS Edge Platform components as part of their Service Oriented Architecture (SOA).

Watch RiskSpan Director LC Yarnelle explain the Edge API in this video!



Data-as-a-Service – Credit Risk Transfer Data

Watch RiskSpan Managing Director Janet Jozwik explain our recent Credit Risk Transfer data (CRT) additions to the RS Edge Platform.

Each dataset has been normalized to the same standard for simpler analysis in RS Edge, enabling users to compare GSE performance with just a few clicks. The data has also been enhanced to include helpful variables, such as mark-to-market loan-to-value ratios based on the most granular house price indexes provided by the Federal Housing Finance Agency. 
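A mark-to-market loan-to-value ratio is commonly derived by scaling the original property value by the change in a house price index and dividing the current balance by the result. The sketch below assumes that approach; the exact method and index granularity used in RS Edge are not described here, and the function name and inputs are hypothetical.

```python
def mark_to_market_ltv(current_balance, original_value, hpi_at_origination, hpi_current):
    """Current balance divided by the index-adjusted property value."""
    if hpi_at_origination <= 0 or original_value <= 0:
        raise ValueError("original value and origination index must be positive")
    # Roll the original value forward by the change in the house price index.
    current_value = original_value * hpi_current / hpi_at_origination
    return current_balance / current_value


# Hypothetical loan: the index has risen from 200 to 250 since origination.
ltv = mark_to_market_ltv(
    current_balance=180_000,
    original_value=250_000,
    hpi_at_origination=200.0,
    hpi_current=250.0,
)
print(f"Mark-to-market LTV: {ltv:.3f}")
```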



Big Companies; Big Data Issues

Data issues plague organizations of all sorts and sizes. But generally, the bigger the dataset, and the more transformations the data goes through, the greater the likelihood of problems. Organizations take in data from many different sources, including social media, third-party vendors and other structured and unstructured origins, resulting in massive and complex data storage and management challenges. This post presents ideas to keep in mind when seeking to address these.

First, a couple of definitions:

Data quality generally refers to the fitness of a dataset for its purpose in a given context. Data quality encompasses many related aspects, including:

  • Accuracy,
  • Completeness,
  • Update status,
  • Relevance,
  • Consistency across data sources,
  • Reliability,
  • Appropriateness of presentation, and
  • Accessibility

Data lineage tracks data movement, including its origin and where it moves over time. Data lineage can be represented visually to depict how data flows from its source to its destination via various changes and hops.
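One simple way to represent lineage is a mapping from each dataset to the datasets that feed it directly; walking that mapping recovers every upstream source. The dataset names below are hypothetical and the function is an illustrative sketch, not a description of any particular lineage tool.

```python
def trace_upstream(lineage, target):
    """Return every dataset that feeds `target`, directly or through hops."""
    found = []
    to_visit = [target]
    while to_visit:
        node = to_visit.pop()
        for source in lineage.get(node, []):
            if source not in found:  # avoids revisiting shared sources
                found.append(source)
                to_visit.append(source)
    return found


# Hypothetical lineage: each key lists the datasets it is built from.
LINEAGE = {
    "regulatory_report": ["risk_mart"],
    "risk_mart": ["loan_servicing", "card_system"],
}

print(trace_upstream(LINEAGE, "regulatory_report"))
```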

The challenges facing many organizations relate to both data quality and data lineage issues, and a considerable amount of time and effort is spent both in tracing the source of data (i.e., its lineage) and correcting errors (i.e., ensuring its quality). Business intelligence and data visualization tools can do a magnificent job of teasing stories out of data, but these stories are only valuable when they are true. It is becoming increasingly vital to adopt best practices to ensure that the massive amounts of data feeding downstream processes and presentation engines are both reliable and properly understood.

Financial institutions must frequently deal with disparate systems either because of mergers and acquisitions or in order to support different product types—consumer lending, commercial banking and credit cards, for example. Disparate systems tend to result in data silos, and substantial time and effort must go into providing compliance reports and meeting the various regulatory requirements associated with analyzing data provenance (from source to destination). Understanding the workflow of data and access controls around security are also vital applications of data lineage and help ensure data quality.

In addition to the obvious need for financial reporting accuracy, maintaining data lineage and quality is vital to identifying redundant business rules and data and to ensuring that reliable, analyzable data is constantly available and accessible. It also helps to improve the data governance ecosystem, enabling data owners to focus on gleaning business insights from their data rather than on rectifying data issues.

Common Data Lineage Issues

A surprising number of data issues emerge simply from uncertainty surrounding a dataset’s provenance. Many of the most common data issues stem from one or more of the following categories:

  • Human error: “Fat fingering” is just the tip of the iceberg. Misinterpretation and other issues arising from human intervention are at the heart of virtually all data issues.
  • Incomplete Data: Whether it’s drawing conclusions based on incomplete data or relying on generalizations and judgment to fill in the gaps, many data issues are caused by missing data.
  • Data format: Systems expect to receive data in a certain format. Issues arise when the actual input data departs from these expectations.
  • Data consolidation: Migrating data from legacy systems or attempting to integrate newly acquired data (from a merger, for instance) frequently leads to post-consolidation issues.
  • Data processing: Calculation engines, data aggregators, or any other program designed to transform raw data into something more “usable” always run the risk of creating output data with quality issues.

Addressing Issues

Issues relating to data lineage and data quality are best addressed by employing some combination of the following approaches. The specific blend of approaches depends on the types of issues and data in question, but these principles are broadly applicable.

Employing a top-down discovery approach enables data analysts to understand the key business systems and business data models that drive an application. This approach is most effective when logical data models are linked to the physical data and systems.

Creating a rich metadata repository for all the data elements flowing from the source to destination can be an effective way of heading off potential data lineage issues. Because data lineage is dependent on the metadata information, creating a robust repository from the outset often helps preserve data lineage throughout the life cycle.

Imposing useful data quality rules is an important element in establishing a framework in which data is always validated against a set of well-conceived business rules. Ensuring not only that data passes comprehensive rule sets but also that remediation factors are in place for appropriately dealing with data that fails quality control checks is crucial for ensuring end-to-end data quality.
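A rule framework of this kind can be sketched as a set of named checks, with failing records routed to a quarantine list that records which rules they broke so they can be remediated. The rules, field names and records below are hypothetical.

```python
def validate(records, rules):
    """Split records into those passing every rule and those needing remediation."""
    passed, quarantined = [], []
    for record in records:
        failures = [name for name, check in rules.items() if not check(record)]
        if failures:
            # Keep the failed rule names so data owners know what to fix.
            quarantined.append({"record": record, "failed_rules": failures})
        else:
            passed.append(record)
    return passed, quarantined


# Hypothetical business rules keyed by a descriptive name.
RULES = {
    "balance_non_negative": lambda r: r.get("balance") is not None and r["balance"] >= 0,
    "state_present": lambda r: bool(r.get("state")),
}

records = [
    {"loan_id": "L1", "balance": 100000, "state": "VA"},
    {"loan_id": "L2", "balance": -50, "state": "MD"},
    {"loan_id": "L3", "balance": 75000, "state": ""},
]

passed, quarantined = validate(records, RULES)
print(len(passed), [q["failed_rules"] for q in quarantined])
```

Quarantining rather than silently dropping failed records is one possible remediation path; it keeps the bad data out of downstream reports while preserving it for correction.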

Data lineage and data quality both require continuous monitoring by a defined stewardship council to ensure that data owners are taking appropriate steps to understand and manage the idiosyncrasies of the datasets they oversee.

Our Data Lineage and Data Quality Background

RiskSpan’s diverse client base includes several large banks (which we define as banks with assets totaling in excess of $50 billion). Large banks are characterized by a complicated web of departments and sub-organizations, each offering multiple products, sometimes to the same base of customers. Different sub-organizations frequently rely on disparate systems (sometimes due to mergers/acquisitions; sometimes just because they develop their businesses independently of one another). Either way, data silos inevitably result.

RiskSpan has worked closely with chief data officers of large banks to help establish data stewardship teams charged with taking ownership of the various “areas” of data within the bank. This involves the identification of data “curators” within each line of business to coordinate with the CDO’s office and be the advocate (and ultimately the responsible party) for the data they “own.” In best practice scenarios, a “data curator” group is formed to facilitate collaboration and effective communication for data work across the line of business.

We have found that a combination of top-down and bottom-up data discovery approaches is most effective when working across stakeholders to understand existing systems and enterprise data assets. RiskSpan has helped create logical data flow diagrams (based on the top-down approach) and assisted with linking physical data models to the logical data models. We have found Informatica and Collibra tools to be particularly useful in creating data lineage, tracking data owners, and tracing data flow from source to destination.

Complementing our work with financial clients to devise LOB-based data quality rules, we have built data quality dashboards using these same tools to enable data owners and curators to rectify and monitor data quality issues. These projects typically include elements of the following components.

  • Initial assessment review of the current data landscape.
  • Establishment of a logical data flow model using both top-down and bottom-up data discovery approaches.
  • Coordination with the CDO / CIO office to set up a data governance stewardship team and to identify data owners and curators from all parts of the organization.
  • Delineation of data policies, data rules and controls associated with different consumers of the data.
  • Development of a target state model for data lineage and data quality by outlining the process changes from a business perspective.
  • Development of future-state data architecture and associated technology tools for implementing data lineage and data quality.
  • Invitation to client stakeholders to reach a consensus related to future-state model and technology architecture.
  • Creation of a project team to execute data lineage and data quality projects by incorporating the appropriate resources and client stakeholders.
  • Development of a change management and migration strategy to enable users and stakeholders to use data lineage and data quality tools.

Ensuring data quality and lineage is ultimately the responsibility of business lines that own and use the data. Because “data management” is not the principal aim of most businesses, it often behooves them to leverage the principles outlined in this post (sometimes along with outside assistance) to implement tactics that will help ensure that the stories their data tell are reliable.


MDM to the Rescue for Financial Institutions

Data becomes an asset only when it is efficiently harnessed and managed. Because firms tend to evolve into silos, their data often gets organized that way as well, resulting in multiple references and unnecessary duplication of data that dilute its value. Master Data Management (MDM) architecture helps to avoid these and other pitfalls by applying best practices to maximize data efficiency, controls, and insights.

MDM has particular appeal to banks and other financial institutions where non-integrated systems often make it difficult to maintain a comprehensive, 360-degree view of a customer who simultaneously has, for example, multiple deposit accounts, a mortgage, and a credit card. MDM provides a single, common data reference across systems that traditionally have not communicated well with each other. Customer-level reports can point to one central database instead of searching for data across multiple sources.
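The idea of a single customer reference can be sketched by consolidating records from separate product systems into one entry per customer. The sketch assumes, for simplicity, that every system already shares a common `customer_id`; in practice, matching customers across systems is usually a significant part of the MDM effort. All system names, fields and records are hypothetical.

```python
def build_customer_view(system_records):
    """Consolidate per-system records into one master entry per customer."""
    master = {}
    for system, records in system_records.items():
        for record in records:
            entry = master.setdefault(
                record["customer_id"], {"name": record["name"], "products": []}
            )
            # Track which system each product relationship came from.
            entry["products"].append((system, record["product"]))
    return master


# Hypothetical extracts from three systems that do not talk to each other.
SYSTEM_RECORDS = {
    "deposits": [
        {"customer_id": "C001", "name": "Jane Doe", "product": "checking"},
        {"customer_id": "C001", "name": "Jane Doe", "product": "savings"},
    ],
    "mortgage": [{"customer_id": "C001", "name": "Jane Doe", "product": "30-year fixed"}],
    "cards": [{"customer_id": "C002", "name": "John Roe", "product": "rewards card"}],
}

customer_view = build_customer_view(SYSTEM_RECORDS)
print(customer_view["C001"])
```

A customer-level report can then read from `customer_view` alone instead of querying each source system separately.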

Financial institutions also derive considerable benefit from MDM when seeking to comply with regulatory reporting requirements and when generating reports for auditors and other examiners. Mobile banking and the growing number of new payment mechanisms make it increasingly important for financial institutions to have a central source of data intelligence. An MDM strategy enables financial institutions to harness their data and generate more meaningful insights from it by:

  • Eliminating data redundancy and providing one central repository for common data;
  • Cutting across data “silos” (and different versions of the same data) by providing a single source of truth;
  • Streamlining compliance reporting (through the use of a common data source);
  • Increasing operational and business efficiency;
  • Providing robust tools to secure and encrypt sensitive data;
  • Providing a comprehensive 360-degree view of customer data;
  • Fostering data quality and reducing the risks associated with stale or inaccurate data; and
  • Reducing operating costs associated with data management.

Not surprisingly, there’s a lot to think about when contemplating and implementing a new MDM solution. In this post, we lay out some of the most important things for financial institutions to keep in mind.

 

MDM Choice and Implementation Priorities

MDM is only as good as the data it can see. To this end, the first step is to ensure that all of the institution’s data owners are on board. Obtaining management buy-in to the process and involving all relevant stakeholders is critical to developing a viable solution. This includes ensuring that everyone is “speaking the same language”—that everyone understands the benefits related to MDM in the same way—and  establishing shared goals across the different business units.

Once all the relevant parties are on board, it’s important to identify the scope of the business process within the organization that needs data refinement through MDM. Assess the current state of data quality (including any known data issues) within the process area. Then, identify all master data assets related to the process improvement. This generally involves identifying all necessary data integration for systems of record and the respective subscribing systems that would benefit from MDM’s consistent data. The selected MDM solution should be sufficiently flexible and versatile that it can govern and link any sharable enterprise data and connect to any business domain, including reference data, metadata and any hierarchies.

An MDM “stewardship team” can add value to the process by taking ownership of the various areas within the MDM implementation plan. MDM is not just about technology; it also involves business and analytical thinking around grouping data for efficient usage. Members of this team need to have the requisite business and technical acumen in order for MDM implementation to be successful. Ideally this team would be responsible for identifying data commonalities across groups and laying out a plan for consolidating them. Understanding the extent of these commonalities helps to optimize architecture-related decisions.

Architecture-related decisions are also a function of how the data is currently stored. Data stored in heterogeneous legacy systems calls for a different sort of MDM solution than does a modern data lake architecture housing big data. The solutions should be sufficiently flexible and scalable to support future growth. Many tools in the marketplace offer MDM solutions. Landing on the right tool requires a fair amount of due diligence and analysis. The following evaluation criteria are often helpful:

  • Enterprise Integration: Seamless integration into the existing enterprise set of tools and workflows is an important consideration for an MDM solution.  Solutions that require large-scale customization efforts tend to carry additional hidden costs.
  • Support for Multiple Devices: Because modern enterprise data must be consumable by a variety of devices (e.g., desktop, tablet and mobile), the selected MDM architecture must support each of these platforms and have multi-device access capability.
  • Cloud and Scalability: With most of today’s technology moving to the cloud, an MDM solution must be able to support a hybrid environment (cloud as well as on-premise). The architecture should be sufficiently scalable to accommodate seasonal and future growth.
  • Security and Compliance: With cyber-attacks becoming more prevalent and compliance and regulatory requirements continuing to proliferate, the MDM architecture must demonstrate capabilities in these areas.

 

Start Small; Build Gradually; Measure Success

MDM implementation can be segmented into small, logical projects based on business units or departments within an organization. Ideally, these projects should be prioritized so that quick wins (with obvious ROI) are achieved in problem areas first, before scaling outward to other parts of the organization. This sort of stepwise approach may take longer overall but is ultimately more likely to be successful because it demonstrates success early and gives stakeholders confidence about MDM’s benefits.

The success of smaller implementations is easier to measure and see. A small-scale implementation also provides immediate feedback on the technology tool used for MDM—whether it’s fulfilling the needs as envisioned. The larger the implementation, the longer it takes to know whether the process is succeeding or failing and whether alternative tools should be pursued and adopted. The success of the implementation can be measured using the following criteria:

  • Savings on data storage—a result of eliminating data redundancy.
  • Increased ease of data access/search by downstream data consumers.
  • Enhanced data quality—a result of common data centralization.
  • More compact data lineage across the enterprise—a result of standardizing data in one place.

Practical Case Studies

RiskSpan has helped several large banks consolidate multiple data stores across different lines of business. Our MDM professionals work across heterogeneous data sets and teams to create a common reference data architecture that eliminates data duplication and thereby improves data efficiency. These professionals have accomplished this using a variety of technologies, including Informatica, Collibra and IBM InfoSphere.

Any successful project begins with a survey of the current data landscape and an assessment of existing solutions. Working collaboratively to use this information to form the basis of an approach for implementing a best-practice MDM strategy is the most likely path to success.


Making Data Dictionaries Beautiful Using Graph Databases

Most analysts estimate that for a given project well over half of the time is spent on collecting, transforming, and cleaning data in preparation for analysis. This task is generally regarded as one of the least appetizing portions of the data analysis process, and yet it is the most crucial, as trustworthy analyses are born of clean, reliable data. Gathering and preparing data for analysis can be either enhanced or hindered by the data management practices in place at a firm. When data are readily available, clearly defined, and well documented, they lead to faster and higher-quality insights. As the size and variability of data grow, however, so too does the challenge of storing and managing them.

Like many firms, RiskSpan manages a multitude of large, complex datasets with varying degrees of similarity and connectedness. To streamline the analysis process and improve the quantity and quality of our insights, we have made our datasets, their attributes, and relationships transparent and quickly accessible using graph database technology.

Graph databases differ significantly from traditional relational databases because data are not stored in tables. Instead, data are stored in either a node or a relationship (also called an edge), which is a connection between two nodes. The image below contains a grey node labeled as a dataset and a blue node labeled as a column. The line connecting these two nodes is a relationship which, in this instance, signifies that the dataset contains the column.

Graph 1

There are many advantages to this data structure, including decreased redundancy. Rather than storing the same “Column1” in a separate table for each dataset that contains it (as you would in a relational database), you can simply create more relationships between the datasets and the column, as demonstrated below:

Graph 2

With this flexible structure it is possible to create complex schemas that remain visually intuitive.
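
To make the node-and-relationship idea concrete, here is a minimal in-memory sketch in Python. It is not how any particular graph database stores data; the `Node` and `Graph` classes, the `CONTAINS` relationship type, and the dataset names are all hypothetical names chosen for illustration. The point it shows is that a column shared by two datasets is stored once and connected by two relationships.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    label: str  # e.g. "Dataset" or "Column"
    name: str


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    relationships: set = field(default_factory=set)  # (source, type, target)

    def relate(self, source, rel_type, target):
        """Add both nodes (if new) and a directed relationship between them."""
        self.nodes.update((source, target))
        self.relationships.add((source, rel_type, target))

    def datasets_containing(self, column):
        """Names of every dataset with a CONTAINS relationship to the column."""
        return sorted(
            source.name
            for source, rel_type, target in self.relationships
            if rel_type == "CONTAINS" and target == column
        )


graph = Graph()
column1 = Node("Column", "Column1")
for dataset_name in ("DatasetA", "DatasetB"):
    graph.relate(Node("Dataset", dataset_name), "CONTAINS", column1)
```

Here the graph ends up with three nodes and two relationships, and `graph.datasets_containing(column1)` lists both datasets without the column ever being duplicated.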
In the image below the same grey (dataset) -contains-> blue (column) format is displayed for a large collection of datasets and columns. Even at such a high level, the relationships between datasets and columns reveal patterns about the data. Here are three quick observations:

  1. In the top right corner there is a dataset with many unique columns.
  2. There are two datasets that share many columns between them and have limited connectivity to the other datasets.
  3. Many ubiquitous columns have been pulled to the center of the star pattern via the relationships to the multiple datasets on the outer rim.

Graph 3

In addition to containing labels, nodes can store data as key-value pairs. The image below displays the column “orig_upb” from dataset “FNMA_LLP”, one of Fannie Mae’s public datasets available on the RiskSpan Edge Platform. Hovering over the column node displays some information about it, including the name of the field in the RiskSpan Edge Platform, its column type, format, and data type.

Graph 4

Relationships can also store data in the same key-value format. This is an incredibly useful property which, for the database in this example, can be used to store information specific to a dataset and its relationship to a column. One of the ways in which RiskSpan has used this capability is to hold information pertinent to data normalization in the relationships. To make our datasets easier to analyze and combine, we have normalized the formats and values of columns found in multiple datasets. For example, the field “loan_channel” has been mapped from many unique inputs across datasets to a set of standardized values. In the images below, the relationships between two datasets and loan_channel are highlighted. The relationship key-value pairs contain a list of “mapped_values” identifying the initial values from the raw data that have been transformed. The dataset on the left contains the list: [“BROKER”, “CORRESPONDENT”, “RETAIL”]

Graph 5

While the dataset on the right contains: [“R”, “B”, “C”, “T”, “9”]

Graph 6

We can easily merge these lists with a node containing a map of all the recognized enumerations for the field. This central repository of truth allows us to deploy easy and robust changes to the ETL processes for all datasets. It also allows analysts to easily query information related to data availability, formats, and values.

Graph 7

In addition to queries specific to a column, this structure allows an analyst to answer questions about data availability across datasets with ease.
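
The normalization idea described above can be sketched in a few lines of Python. The two `mapped_values` lists are the ones quoted in the text; everything else is an assumption for illustration, including the dataset names and, in particular, the meaning assigned to each raw code (for example, that “T” and “9” map to `TPO` and `UNKNOWN`). The actual mappings are not given in this post.

```python
# Properties stored on each dataset -> column relationship.
contains = {
    ("dataset_left", "loan_channel"): {
        "mapped_values": ["BROKER", "CORRESPONDENT", "RETAIL"],
    },
    ("dataset_right", "loan_channel"): {
        "mapped_values": ["R", "B", "C", "T", "9"],
    },
}

# Central enumeration node: raw value -> standardized value (assumed mapping).
loan_channel_enum = {
    "BROKER": "BROKER", "B": "BROKER",
    "CORRESPONDENT": "CORRESPONDENT", "C": "CORRESPONDENT",
    "RETAIL": "RETAIL", "R": "RETAIL",
    "T": "TPO",
    "9": "UNKNOWN",
}


def unmapped_values(relationships, enumeration, column):
    """Raw values seen on any relationship to `column` that the enumeration lacks."""
    seen = set()
    for (_dataset, col), props in relationships.items():
        if col == column:
            seen.update(props["mapped_values"])
    return sorted(seen - set(enumeration))


def normalize(raw_value, enumeration):
    """Translate a raw value to its standardized form."""
    return enumeration[raw_value]


gaps = unmapped_values(contains, loan_channel_enum, "loan_channel")
```

Because every dataset's raw values are checked against one enumeration, a new code appearing in any dataset shows up as a gap in a single place, which is what makes ETL changes easy to roll out consistently.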
Normally, comparing PDF data dictionaries, Excel worksheets, or database tables can be a painstaking process. Using the graph database, however, a simple query can return the intersection of three datasets as shown below. The resulting graph is easy to analyze and use to define the steps required to obtain and manipulate the data.

Graph 8

In addition to these benefits for analysts and end users, using graph database technology for data management comes with benefits from a data governance perspective. Within the realm of data stewardship, ownership and accountability of datasets can be assigned and managed within a graph database like the one in this post. The ability to store any attribute in a node and create any desired relationship makes it simple to add nodes representing data owners and curators connected to their respective datasets.

Graph 9

The ease and transparency with which any data-related information can be stored makes graph databases very attractive. Graph databases can also support very large numbers of nodes and relationships while remaining fast. While every technology has a learning curve, the intuitive nature of graphs combined with their flexibility makes them an intriguing and viable option for data management.
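
The intersection query and the ownership relationships mentioned above can be approximated with plain Python sets. This is only a sketch of the logic, not the query language of any specific graph database. The dataset names, owner names, and most column names are hypothetical (only `orig_upb` and `loan_channel` appear in the text).

```python
# Dataset -> set of columns it CONTAINS (hypothetical contents).
contains = {
    "dataset_a": {"loan_id", "orig_upb", "loan_channel", "fico"},
    "dataset_b": {"loan_id", "orig_upb", "loan_channel", "ltv"},
    "dataset_c": {"loan_id", "orig_upb", "dti"},
}

# Dataset -> data owner, mirroring an OWNED_BY style relationship.
owned_by = {
    "dataset_a": "owner_1",
    "dataset_b": "owner_1",
    "dataset_c": "owner_2",
}


def shared_columns(contains, *datasets):
    """Columns present in every one of the named datasets."""
    return set.intersection(*(contains[name] for name in datasets))


def datasets_owned_by(owned_by, owner):
    """All datasets for which the given owner is accountable."""
    return sorted(name for name, who in owned_by.items() if who == owner)


common = shared_columns(contains, "dataset_a", "dataset_b", "dataset_c")
```

In this toy example `common` contains only `loan_id` and `orig_upb`, the two columns that all three datasets share, and `datasets_owned_by(owned_by, "owner_1")` returns the two datasets assigned to that owner.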


A Brief Introduction to Agile Philosophy

Reducing time to delivery by developing in smaller incremental chunks and incorporating an ability to pivot is the cornerstone of Agile software development methodology.

“Agile” software development is a rarity among business buzz words in that it is actually a fitting description of what it seeks to accomplish. Optimally implemented, it is capable of delivering value and efficiency to business-IT partnerships by incorporating flexibility and an ability to pivot rapidly when necessary.

As a technology company with a longstanding management consulting pedigree, RiskSpan values the combination of discipline and flexibility inherent to Agile development and regularly makes use of the philosophy in executing client engagements. Dynamic economic environments contribute to business priorities that are seemingly in a near-constant state of flux. In response to these ever-evolving needs, clients seek to implement applications and application feature changes quickly and efficiently to realize business benefits early.

This growing need for speed and “agility” makes Agile software development methods an increasingly appealing alternative to traditional “waterfall” methodologies. Waterfall approaches move in discrete phases—treating analysis, design, coding, and testing as individual, stand-alone components of a software project. Historically, when the cost of changing plans was high, such a discrete approach worked best. Nowadays, however, technological advances have made changing the plan more cost-feasible. In an environment where changes can be made inexpensively, rigid waterfall methodologies become unnecessarily counterproductive for at least four reasons:

  1. When a project runs out of time (or money), individual critical phases—often testing—must be compressed, and overall project quality suffers.
  2. Because working software isn’t produced until the very end of the project, it is difficult to know whether the project is really on track prior to project completion.
  3. Not knowing whether established deadlines will be met until relatively late in the game can lead to schedule risks.
  4. Most important, discrete phase waterfalls simply do not respond well to the various ripple effects created by change.

 

Continuous Activities vs. Discrete Project Phases

Agile software development methodologies resolve these traditional shortcomings by applying techniques that focus on reducing overhead and time to delivery. Instead of treating fixed development stages as discrete phases, Agile treats them as continuous activities. Doing things simultaneously and continuously (for example, incorporating testing into the development process from day one) improves quality and visibility while reducing risk. Visibility improves because being halfway through a project means that half of a project’s features have been built and tested, rather than having many partially built features with no way of knowing how they will perform in testing. Risk is reduced because feedback comes in from the earliest stages of development, so changes can be made without paying exorbitant costs. This makes everybody happy.

 

Flexible but Controlled

Firms sometimes balk at Agile methods because of a tendency to equate “flexibility” and “agility” with a lack of organization and planning, weak governance and controls, and an abandonment of formal documentation. This, however, is a misconception. “Agile” does not mean uncontrolled. On the contrary, it is no more or less controlled than the existing organizational boundaries of standardized processes into which it is integrated. Most Agile methods do not advocate any particular methodology for project management or quality control. Rather, their focus is on simplifying the software development approach, embracing changing business needs, and producing working software as quickly as possible. Thus, Agile frameworks are more like a shell that users of the framework have full flexibility to customize as necessary.

 

Frameworks and Integrated Teams

Agile methodologies can be implemented using a variety of frameworks, including Scrum, Kanban, and XP. Scrum is the most popular of these and is characterized by producing a potentially shippable set of functionalities at the end of every iteration, with iterations run in short, fixed time boxes (commonly two weeks) called sprints. Delivering high-quality software at the conclusion of such short sprints requires supplementing team activities with additional best practices, such as automated testing, code cleanup and other refactoring, continuous integration, and test-driven or behavior-driven development.

Agile teams are built around motivated individuals subscribing to what is commonly referred to as a “lean Agile mindset.” Team members who embrace this mindset share a common vision and are motivated to contribute in ways beyond their defined roles to attain success. In this way, innovation and creativity are supported and encouraged. Perhaps most important, Agile promotes building relationships based on trust among team members and with the end-user customer in providing fast and high-quality delivery of software. When all is said and done, this is the aim of any worthwhile endeavor. When it comes to software development, Agile is showing itself to be an impressive means to this end.


Advantages and Disadvantages of Open Source Data Modeling Tools

Using open source data modeling tools has been a topic of debate as large organizations, including government agencies and financial institutions, are under increasing pressure to keep up with technological innovation to maintain competitiveness. Organizations must be flexible in development and identify cost-efficient gains to reach their organizational goals, and using the right tools is crucial. Organizations must often choose between open source software, i.e., software whose source code can be modified by anyone, and closed software, i.e., proprietary software with no permissions to alter or distribute the underlying code.

Mature institutions often have employees, systems, and proprietary models entrenched in closed source platforms. For example, SAS Analytics is a popular provider of proprietary data analysis and statistical software for enterprise data operations among financial institutions. But several core computations SAS performs can also be carried out using open source data modeling tools, such as Python and R. The data wrangling and statistical calculations are often fungible and, given the proper resources, will yield the same result across platforms.

Open source is not always a viable replacement for proprietary software, however. Factors such as cost, security, control, and flexibility must all be taken into consideration. The challenge for institutions is picking the right mix of platforms to streamline software development.  This involves weighing benefits and drawbacks.

Advantages of Open Source Programs

The Cost of Open Source Software

The low cost of open source software is an obvious advantage. Compared to the upfront cost of purchasing a proprietary software license, using open source programs seems like a no-brainer. Open source programs can be distributed freely (with some possible restrictions to copyrighted work), resulting in virtually no direct costs. However, indirect costs can be difficult to quantify. Downloading open source programs and installing the necessary packages is easy and adopting this process can expedite development and lower costs. On the other hand, a proprietary software license may bundle setup and maintenance fees for the operational capacity of daily use, the support needed to solve unexpected issues, and a guarantee of full implementation of the promised capabilities. Enterprise applications, while accompanied by a high price tag, provide ongoing and in-depth support of their products. The comparable cost of managing and servicing open source programs that often have no dedicated support is difficult to determine.

Open Source Talent Considerations

Another advantage of open source is that it attracts talent who are drawn to the idea of shareable, community-driven code. Students and developers outside of large institutions are more likely to have experience with open source applications since access is widespread and easily available. Open source developers are free to experiment and innovate, gain experience, and create value outside of the conventional industry focus. This flexibility naturally leads to more broadly skilled inter-disciplinarians. The chart below from Indeed’s Job Trend Analytics tool reflects strong growth in open source talent, especially Python developers.

From an organizational perspective, the pool of potential applicants with relevant programming experience widens significantly compared to the limited pool of developers with closed source experience. For example, one may be hard-pressed to find a new applicant with development experience in SAS since comparatively few have had the ability to work with the application. Key-person dependencies become increasingly problematic as the talent or knowledge of the proprietary software erodes down to a shrinking handful of developers.

Job Seekers Interests via Indeed

*Indeed searches millions of jobs from thousands of job sites. The jobseeker interest graph shows the percentage of jobseekers who have searched for SAS, R, and Python jobs.


Support and Collaboration

The collaborative nature of open source facilitates learning and adapting to new programming languages. While open source programs are usually not accompanied by the extensive documentation and user guides typical of proprietary software, the constant peer review from the contributions of other developers can be more valuable than a user guide. In this regard, adopters of open source may have the talent to learn, experiment with, and become knowledgeable in the software without formal training.

Still, the lack of support can pose a challenge. In some cases, the documentation accompanying open source packages and the paucity of usage examples in forums do not offer a full picture. For example, RiskSpan built a model in R that was driven by the available packages for data infrastructure (a precursor to performing statistical analysis) and their functionality. R does not have an active support solutions line, and a response from the author of a package is unlikely. This required RiskSpan to thoroughly vet packages.

Flexibility and Innovation

Another attractive feature of open source is its inherent flexibility. Python allows users to use different integrated development environments (IDEs) that have multiple different characteristics or functions, as compared to SAS Analytics, which only provides SAS EG or Base SAS. R makes possible web-based interfaces for server-based deployments. These functionalities grant more access to users at a lower cost. Thus, there can be more firm-wide development and participation in development. The ability to change the underlying structure of open source makes it possible to mold it per the organization’s goals and improve efficiency.

Another advantage of open source is the sheer number of developers trying to improve the software by creating many functionalities not found in their closed source equivalents. For example, R and Python can usually perform many functions like those available in SAS, but also have many capabilities not found in SAS: downloading specific packages for industry-specific tasks, scraping the internet for data, or web development (Python). These specialized packages are built by programmers seeking to address the inefficiencies of common problems. A proprietary software vendor has neither the expertise nor the incentive to build equivalent specialized packages since its product aims to be broad enough to suit uses across multiple industries.

RiskSpan uses open source data modeling tools and operating systems for data management, modeling, and enterprise applications. R and Python have proven to be particularly cost effective in modeling. R provides several packages that serve specialized techniques. These include an archive of packages devoted to estimating the statistical relationship among variables using an array of techniques, which cuts down on development time. The ease of searching for these packages, downloading them, and researching their use incurs nearly no cost.

Open source makes it possible for RiskSpan to expand on the tools available in the financial services space. For example, a leading cash flow analytics software firm that offers several proprietary solutions in modeling structured finance transactions lacked the full functionality RiskSpan needed. To reduce licensing fees and gain flexibility in structuring deals, RiskSpan developed deal cashflow programs in Python for STACR, CAS, CIRT, and other consumer lending deals. The flexibility of Python allowed us to choose our own formatted cashflows and build different functionalities into the software. Python, unlike closed source applications, allowed us to focus on innovating ways to interact with the cash flow waterfall.

Disadvantages of Open Source Programs

Deploying open source solutions also carries intrinsic challenges. While users may have a conceptual understanding of the task at hand, knowing which tools yield correct results, whether derived from open or closed source, is another dimension to consider. Different parameters may be set as default, new limitations may arise during development, or code structures may be entirely different. Different challenges may arise from translating a closed source program to an open source platform. Introducing open source requires new controls, requirements, and development methods.

Redundant code is an issue that might arise if a firm does not use open source strategically. Across different departments, functionally equivalent tools may be derived from distinct packages or code libraries. There are several packages offering the ability to run a linear regression, for example. However, there may be nuanced differences in the initial setup or syntax of the function that can propagate problems down the line. In addition to redundant code, users must be wary of “forking,” where the development community splits on an open source application. For example, the R community maintains multiple packages that perform the same tasks or calculations, sometimes derived from the same code base, and users must confirm that the package they rely on has not been abandoned by its developers.
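
A small sketch shows how two “equivalent” regression routines can disagree purely because of a default. Both functions below are hypothetical stand-ins written for this illustration, not the API of any real package: one fits an intercept and the other assumes the line passes through the origin, which is the kind of setup difference that can go unnoticed when departments pick different libraries.

```python
def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_through_origin(x, y):
    """Least squares for y = b*x with no intercept; returns (0.0, b)."""
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return 0.0, slope


# Illustrative data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

intercept_a, slope_a = fit_with_intercept(x, y)
intercept_b, slope_b = fit_through_origin(x, y)
```

On this data the first routine recovers a slope of 2 and an intercept of 1, while the second reports a slope of about 2.33. Neither is a bug, which is why agreeing on one vetted package and documenting its settings matters.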

Users must also take care to track the changes and evolution of open source programs. The core calculations of commonly used functions or those specific to regular tasks can change. Maintaining a working understanding of these functions in the face of continual modification is crucial to ensure consistent output. Open source documentation is frequently lacking. In financial services, this can be problematic when seeking to demonstrate a clear audit trail for regulators. The need to verify that the right function is being sourced from a specific package or repository of authored functions, as opposed to another function with an identical name, constrains the unfettered use of these functions within code. Proprietary software, on the other hand, provides a static set of tools, which allows analysts to more easily determine how legacy code has worked over time.
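
One possible way to support that kind of audit trail is to record, alongside model output, which function was actually called and a fingerprint of its implementation. The sketch below is an assumption-laden illustration rather than an established control: the `fingerprint` and `describe` helpers are hypothetical, they only work for functions written in Python (not compiled built-ins), and the fingerprint can change between Python versions, so it should be compared only within one pinned environment.

```python
import hashlib


def fingerprint(func):
    """Hash a Python function's compiled bytecode and constants."""
    code = func.__code__
    payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def describe(func):
    """Audit record: where the function came from and what it currently does."""
    return {
        "module": func.__module__,
        "name": func.__name__,
        "fingerprint": fingerprint(func),
    }


def _arithmetic():
    def mean(values):
        return sum(values) / len(values)
    return mean


def _trimmed():
    def mean(values):
        ordered = sorted(values)[1:-1]  # drop the smallest and largest value
        return sum(ordered) / len(ordered)
    return mean


# Two functions that share a name but not an implementation.
mean_a, mean_b = _arithmetic(), _trimmed()
record_a, record_b = describe(mean_a), describe(mean_b)
```

Both records carry the name `mean`, yet their fingerprints differ, so a stored record makes it possible to detect when a same-named function has been swapped or modified.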

Using Open Source Data Modeling Tools

Deciding on whether to go with open source programs directly impacts financial services firms as they compete to deliver applications to the market. Open source data modeling tools are attractive because of their natural tendency to spur innovation, ingrain adaptability, and propagate flexibility throughout a firm. But proprietary software solutions are also attractive because they provide the support and hard-line uses that may neatly fit within an organization’s goals. The considerations offered here should be weighed appropriately when deciding between open source and proprietary data modeling tools.

Questions to consider before switching platforms include:

  • How does one quantify the management and service costs for using open source programs? Who would work on servicing it, and, once all-in expenses are considered, is it still more cost-effective than a vendor solution?
  • When might it be prudent to move away from proprietary software? In a scenario where moving to a newer open source technology appears to yield significant efficiency gains, when would it make sense to end terms with a vendor?
  • Does the institution have the resources to institute new controls, requirements, and development methods when introducing open source applications?
  • Does the open source application or function have the necessary documentation required for regulatory and audit purposes?

Open source is certainly on the rise as more professionals enter the space with the necessary technical skills and a new perspective on the goals financial institutions want to pursue. As competitive pressures mount, financial institutions are faced with a difficult yet critical decision of whether open source is appropriate for them. Open source may not be a viable solution for everyone—the considerations discussed above may block the adoption of open source for some organizations. However, often the pros outweigh the cons, and there are strategic precautions that can be taken to mitigate any potential risks.


References

 https://www.redhat.com/en/open-source/open-source-way

http://www.stackoverflow.blog/code-for-a-living/how-i-open-sourced-my-way-to-my-dream-job-mohamed-said

https://www.redhat.com/f/pdf/whitepapers/WHITEpapr2.pdf

http://www.forbes.com/sites/benkepes/2013/10/02/open-source-is-good-and-all-but-proprietary-is-still-winning/#7d4d544059e9

https://www.indeed.com/jobtrends/q-SAS-q-R-q-python.html


Open Source Software for Mortgage Data Analysis

While open source has been around for decades, using open source software for mortgage data analysis is a recent trend. Financial institutions have traditionally been slow to adopt the latest data and technology innovations due to the strict regulatory and risk-averse nature of the industry, and open source has been no exception. As open source becomes more mainstream, however, many of our clients have come to us with questions regarding its viability within the mortgage industry.

The short answer is simple: open source has a lot of potential for the financial services and mortgage industries, particularly for data modeling and data analysis. Within our own organization, we frequently use open source data modeling tools for our proprietary models as well as models built for clients. While a degree of risk is inherent, prudent steps can be taken to mitigate it and profit from the many worthwhile benefits of open source.

To address the common concerns that arise with open source, we’ll be publishing a series of blog posts aimed at alleviating these concerns and providing guidelines for utilizing open source software for data analysis within your organization. Some of the questions we’ll address include:

  • Can open source programming languages be applied to mortgage data modeling and data analysis?
  • What risks does open source expose me to and what can I do to mitigate them?
  • What are the pitfalls of open source and do the benefits outweigh them?
  • How does using open source software for mortgage data analysis affect the control and governance of my models?
  • What factors do I need to consider when deciding whether to use open source at my institution?

Throughout the series, we’ll also include examples of how RiskSpan has used open source software for mortgage data analysis, why we chose to use it, and what factors were considered. Before we dive in on the considerations for open source, we thought it would be helpful to offer an introduction to open source and provide some context around its birth and development within the financial industry.

What Is Open Source Software?

Software has conventionally been considered open source when the original code is made publicly available so that anyone can edit, enhance, or modify it freely. This original concept has recently been expanded to incorporate a larger movement built on values of collaboration, transparency, and community.

Open Source Software Vs Proprietary Software

Proprietary software refers to applications for which the source code is only accessible to those who created it. Thus, only the original author(s) has control over any updates or modifications. Outside players are barred from even viewing the code to protect the owners from copying and theft. To use proprietary software, users agree to a licensing agreement and typically pay a fee. The agreement legally binds the user to the owners’ terms and prevents the user from any actions the owners have not expressly permitted.

Open source software, on the other hand, gives any user free rein to view, copy, or modify it. The idea is to foster a community built on collaboration, allowing users to learn from each other and build on each other’s work. As with proprietary software, open source users must still agree to a licensing agreement, but the terms differ significantly from those of a proprietary license.1

History of Open Source Software

The idea of open source software first developed in the 1950s, when much of software development was done by computer scientists in higher education. In line with the value of sharing knowledge among academics, source code was openly accessible. By the 1960s, however, as the cost of software development increased, hardware companies were charging additional fees for software that used to be bundled with their products.

Change came again in the 1980s. At this point, it was clear that technology and software were important factors of the growing business economy. Technology leaders were frustrated with the increasing costs of software. In 1984, Richard Stallman launched the GNU Project with the purpose of creating a complete computer operating system with no limitations on the use of its source code. In 1991, the operating system now referred to as Linux was released.

The final tipping point came in 1997, when Eric Raymond first presented his essay, The Cathedral and the Bazaar (later published as a book), in which he articulated the underlying principles behind open source software. The essay was a driving factor in Netscape’s decision to release its source code to the public, inspired by the idea that allowing more people to find and fix bugs will improve the system for everyone. Following Netscape’s announcement, the term “open source software” was introduced in 1998.

In the data-driven economy of the past two decades, open source has played an ever-increasing role. The field of software development has evolved to embrace the values of open source. Open source has made it not only possible but easy for anyone to access and manipulate source code, improving our ability to create and share valuable software.2

Adoption of Open Source Software in Business

The growing relevance of open source software has also changed the way large organizations approach their software solutions. While open source software was at one point rare in an enterprise’s system, it’s now the norm. A survey conducted by Black Duck Software revealed that fewer than 3% of companies don’t rely on open source at all. Even the most conservative organizations are hopping on board the open source trend.3

In a blog post from June 2016, TechCrunch writes:

“Open software has already rooted itself deep within today’s Fortune 500, with many contributing back to the projects they adopt. We’re not just talking stalwarts like Google and Facebook; big companies like Walmart, GE, Merck, Goldman Sachs — even the federal government — are fleeing the safety of established tech vendors for the promises of greater control and capability with open software. These are real customers with real budgets demanding a new model of software.”4

The expected benefits of open source software are attracting all types of institutions, from small businesses to technology giants to governments. This shift away from proprietary software in favor of open source is streamlining business operations. As more companies make the switch, those who don’t will fall behind the times and likely be at a serious competitive disadvantage.

Open Source Software for Mortgage Data Analysis

Open source software is slowly finding its way into the financial services industry as well. We’ve observed that smaller entities that don’t have the budgets to buy expensive proprietary software have been turning to open source as a viable substitute. Smaller companies are either building software in house or turning to companies like RiskSpan to achieve a cost-effective solution. On the other hand, bigger companies with the resources to spare are also dabbling in open source. These companies have the technical expertise in house and give their skilled workers the freedom to experiment with open source software.

Within our own work, we see tremendous potential for open source software for mortgage data analysis. Open source data modeling tools like Python, R, and Julia are useful for analyzing mortgage loan and securitization data and identifying historical trends. We’ve used R to build models for our clients and we’re not the only ones: several of our clients are now building their DFAST challenger models using R.

Open source has grown enough in the past few years that more and more financial institutions will make the switch. While the risks associated with open source software will continue to give some organizations pause, the benefits of open source will soon outweigh those concerns. It seems open source is a trend that is here to stay, and luckily, it is a trend ripe with opportunity.


[1] https://opensource.com/resources/what-open-source

[2] https://www.longsight.com/learning-center/history-open-source

[3] https://techcrunch.com/2016/06/19/the-next-wave-in-software-is-open-adoption-software/

[4] https://techcrunch.com/2016/06/19/the-next-wave-in-software-is-open-adoption-software/

