Articles Tagged with: Credit Analytics

EDGE: An Update on GNMA Delinquencies

In this short post, we update the state of delinquencies for GNMA multi-lender cohorts, by vintage and coupon. As the Ginnie market has shifted away from bank servicers, non-bank servicers now account for more than 75% of GNMA servicing, and even higher percentages in recent-vintage cohorts.  

The table below summarizes delinquencies for GN2 cohorts with outstanding balances greater than $10 billion. The table also highlights, in red, cohorts where delinquencies are more than 85% attributable to non-bank servicers. That non-banks are servicing so many delinquencies is not surprising, given the historical reluctance (or inability) of these servicers to repurchase delinquent mortgages out of pools (see our recent analysis on this here). This is contributing to an extreme overhang of non-bank-serviced delinquencies in recent-vintage GNMA cohorts.

The 60-day+ delinquencies for 2018 GN2 3.5s get honorable mention, with non-bank delinquencies totaling 84% of all delinquencies, just below our 85% threshold. At the upper end, delinquencies in 2017 30yr 4s were 93% attributable to non-bank servicers, and non-banks serviced nearly 90% of 2019-vintage delinquencies across all coupons.

The delinquencies in this analysis are predominantly loans that are six months or more delinquent and in COVID forbearance.[1] Current guidance from GNMA gives servicers the latitude to leave these loans in pools without exceeding their seriously delinquent threshold.[2] However, as noted in our previous research, several non-bank servicers have started to increase their buyout activity, driven by joint ventures with GNMA EBO investors combined with a premium bid for reperforming GNMA RG pools. While we saw a modest pullback in recent buyout activity from Lakeview,[3] which has been at the vanguard of the activity, the positive economics of the trade indicate that we will likely see continued increases in repurchases, with 2018-19 production premiums bearing the brunt of involuntary speed increases.


Contact us if you are interested in seeing variations on this theme. Using Edge, we can examine any loan characteristic and generate an S-curve, aging curve, or time series.


[1] Breakdown of delinquencies available on request.

[2] GNMA APM 2020-17 extended to July 31st the exemption from counting post-COVID delinquencies as part of the servicer’s Seriously Delinquent count.

[3] Lakeview repurchased 15% of seriously delinquent loans in January, down from 22% in December. Penny Mac and Carrington continued their repurchases at their recent pace.


Overcoming Data Limitations (and Inertia) to Factor Climate into Credit Risk Modeling

With each passing year, it is becoming increasingly clear to mortgage credit investors that climate change is emerging as a non-trivial risk factor that must be accounted for. Questions around how precisely to account for this risk, however, and who should ultimately bear it, remain unanswered. 

Current market dynamics further complicate these questions. Late last year, Politico published this special report laying out the issues surrounding climate risk as it relates to mortgage finance. Even though almost everyone agrees that underinsured natural disaster risk is a problem, the Politico report outlines several forces that make it difficult for anyone to do anything about it. The massive undertaking of bringing old flood zone maps up to date is just one example. As Politico puts it: 

The result, many current and former federal housing officials acknowledge, is a peculiar kind of stasis — a crisis that everyone sees coming but no one feels empowered to prevent, even as banks and investors grow far savvier about assessing climate risk. 

At some point, however, we will reach a tipping point – perhaps a particularly devastating event (or series of events) triggering significant losses. As homeowners, the GSEs, and other mortgage credit investors point fingers at one another (and inevitably at the federal government), a major policy update will become necessary to identify who ultimately bears the brunt of mispriced climate risk in the market. Once that risk is quantified and properly assigned, the GSEs will price in climate risk in the same way they bake in other contributors to credit risk — via higher guarantee fees. For non-GSE (and CRT) loans, losses will continue to be borne by whoever holds the credit risk.

Recognizing that such an event may not be far off, the GSEs, their regulator, and everyone else with credit exposure are beginning to appreciate the importance of understanding the impact of climate events on mortgage performance. This is not easily inferred from the historical data record, however. And those assessing risk need to make informed assumptions about how historically observed impacts will change in the future. 

The first step in constructing these assumptions is to compile a robust historical dataset. To this end, RiskSpan began exploring the impact of certain hurricanes a few years ago. This initial analysis revealed a significant impact on short-term mortgage delinquency rates (not surprisingly) but less of an impact on default rates. In other words, affected borrowers encountered hardship but ultimately recovered. 

This research is preliminary, however, and more data will be necessary to build scenario assumptions as climate events become more severe and widespread. As more data covering more events—including wildfires—becomes available, RiskSpan is engaged in ongoing research to tease out the impact each of these events has on mortgage performance.  

It goes without saying that climate scenario assumptions need to be grounded in reality to be useful to credit investors. Because time-series data relationships are not always detectable using conventional means, especially when data is sparse, we are beginning to see promise in leveraging various machine learning techniques to this end. We believe this historical, machine-learning-based research will provide the backbone for an approach that merges the historical effects of events with inputs about the increasing frequency and severity of these events as they become better understood and more quantifiable. 

Precise forecasting of severe climate events by zip code in any given year is not here yet. But an increasingly reliable framework for gauging the likely impact of these events on mortgage performance is on the horizon.  


RiskSpan’s Edge Platform Wins 2021 Buy-Side Market Risk Management Product of the Year

RiskSpan, a leading SaaS provider of risk management, data, and analytics, has been awarded Buy-Side Market Risk Management Product of the Year for its Edge Platform at Risk.net’s 2021 Risk Markets Technology Awards. The honor marks Edge’s second major industry award in 2021; the platform was also named the winner of Chartis Research’s Risk-as-a-Service category.

Licensed by some of the largest asset managers and insurance companies in the U.S., the Edge Platform derives a significant component of its value from its ability to serve as a one-stop shop for research, pre-trade analytics, pricing and risk quantification, and reporting. Edge’s cloud-native infrastructure allows RiskSpan clients to scale as needs change and is supported by RiskSpan’s unparalleled team of domain experts — seasoned practitioners who know the needs and pain points of the industry firsthand.

Adjudicators cited the platform’s “strong data management and overall technology” and “best-practice quant design for MBS, structured products and loans” as key factors in the designation.


Edge’s flexible configurability enables users to create custom views of their portfolio or potential trades at any level of granularity and down to the loan level. The platform enables researchers and analysts to integrate conventional and alternative data from an impressive array of sources to identify impacts that might otherwise go overlooked.

For clients requiring a fully supported risk-analytics-as-a-service offering, the Edge Platform provides a comprehensive data analysis, predictive modeling, portfolio benchmarking and reporting solution tailored to individual client needs.

An optional studio-level tier incorporates machine learning and data scientist support in order to leverage unstructured and alternative datasets in the analysis.


Contact us to learn how Edge’s capabilities can transform your mortgage and structured product analytics. 

Learn more about Edge at https://riskspan.com/edge-platform/ 


The NRI: An Emerging Tool for Quantifying Climate Risk in Mortgage Credit

Climate change is affecting investment across virtually every sector in a growing number of mostly secondary ways. Its impact on mortgage credit investors, however, is beginning to be felt more directly.

Mortgage credit investors are investors in housing. Because housing is subject to climate risk and borrowers whose houses are destroyed by natural disasters are unlikely to continue paying their mortgages, credit investors have a vested interest in quantifying the risk of these disasters.

To this end, RiskSpan is engaged in leveraging the National Risk Index (NRI) to assess the natural disaster and climate risk exposure of mortgage portfolios.

This post introduces the NRI data in the context of mortgage portfolio analysis (loans or mortgage-backed securities), including what the data contain and key considerations when putting together an analysis. A future post will outline an approach for integrating the NRI into a scenario analysis framework that combines it with traditional mortgage credit models.

The National Risk Index

The National Risk Index (NRI) was released in October 2020 through a collaboration led by FEMA. It provides a wealth of new geographically specific data on natural hazard risks across the country. The index and its underlying data were designed to help local governments and emergency planners to better understand these risks and to plan and prepare for the future.

The NRI provides information on both the frequency and severity of natural risk events, and the level of detail in the underlying data is astounding. The NRI focuses on 18 natural risks (discussed below) and provides detailed underlying components for each. The severity of an event is broken out by damage to buildings, agriculture, and loss of life. This breakdown lets us focus on the severity of events relative to buildings. While the definition of building here includes all types of real estate—houses, commercial, rental, etc.—having the breakdown provides an extra level of granularity to help inform our analysis of mortgages.

The key fields that provide important information for a mortgage portfolio analysis are bulleted below. The NRI provides these data points for each of the 18 natural hazards and each geography included in its analysis. (A sketch of how these fields can be joined to a loan portfolio follows the list.)

  • Annualized Event Frequency
  • Exposure to Buildings: Total dollar amount of exposed buildings
  • Historical Loss Ratio for Buildings (derived using Bayesian methods, such that every geography is covered for its relevant risks)
  • Expected Annual Loss for Buildings
  • Population estimates (helpful for geography weighting)
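To make this concrete, the sketch below shows one way these NRI fields might be joined to a loan table by county FIPS code using pandas. The file names and column labels (county_fips, upb, building_exposure_usd, and so on) are illustrative assumptions, not the NRI's actual field codes, which would need to be mapped after download.

```python
import pandas as pd

# Illustrative only: the NRI download uses its own field codes, which should be
# renamed to the labels below before running this join.
nri = pd.read_csv("nri_county_hazards.csv", dtype={"county_fips": str})
loans = pd.read_csv("loan_portfolio.csv", dtype={"county_fips": str})

nri_fields = [
    "county_fips",
    "hazard",                             # e.g., hurricane, wildfire, riverine flooding
    "annualized_event_frequency",
    "building_exposure_usd",              # total dollar amount of exposed buildings
    "building_historic_loss_ratio",
    "building_expected_annual_loss_usd",
]

# Left-join so every loan keeps its county-level hazard profile (one row per loan x hazard).
loans_nri = loans.merge(nri[nri_fields], on="county_fips", how="left")

# Example summary: UPB-weighted expected annual building loss ratio by hazard.
loans_nri["eal_ratio"] = (
    loans_nri["building_expected_annual_loss_usd"] / loans_nri["building_exposure_usd"]
)
by_hazard = loans_nri.groupby("hazard").apply(
    lambda g: (g["eal_ratio"] * g["upb"]).sum() / g["upb"].sum()
)
print(by_hazard)
```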

Grouping Natural Disaster Risks for Mortgage Analysis

The NRI data covers 18 natural hazards, which pose varying degrees of risk to housing. We have found the framework below to be helpful when considering which risks to include in an analysis. We group the 18 risks along two axes:

1) The extent to which an event is impacted by climate change, and

2) An event’s potential to completely destroy a home.

Earthquakes, for example, have significant destructive potential, but climate change is not a major contributor to earthquakes. Conversely, heat waves and droughts wrought by climate change generally do not pose significant risk to housing structures.

When assessing climate risk, RiskSpan typically focuses on the five natural hazard risks in the top right quadrant below.

Immediate Event Risk versus Cumulative Event Risk

Two related but distinct risks inform climate risk analysis.

  1. Immediate Event Risk: The risk of mortgage delinquency and default resulting directly from a natural disaster event (a home severely damaged or destroyed by a hurricane, for example).  
  2. Cumulative Event Risk: Less direct than immediate event risk, this is the risk of widespread home price declines across entire communities because of increasing natural hazard risk brought on by climate change. These secondary effects include: 
    • Heightened homebuyer awareness or perception of increasing natural hazard risk,
    • Property insurance premium increases or areas becoming ‘self-insured’, 
    • Government policy impacts (e.g., potential flood zone remapping), and 
    • Potential policy changes related to insurance from key players in the mortgage market (i.e., Fannie Mae, Freddie Mac, FHFA, etc.). 

NRI data provides an indication of the probability of immediate event occurrence and its historic severity in terms of property losses. We can also empirically observe historical mortgage performance in the wake of previous natural disaster events. Data covering several hurricane and wildfire events are available.

Cumulative event risk is less observable. A few academic papers attempt to tease out these impacts, but the risk of broader home price declines typically needs to be incorporated into a risk assessment framework through transparent scenario overlays. Examples of such scenarios include home price declines of as much as 20% in newly flood-exposed areas of South Florida. There is also research suggesting that natural disasters often have long-term impacts on consumer credit.

Geography Normalization

Linking to the NRI is simple when detailed loan pool geographic data are available. Analysts can merge by census tract or county code. Census tract is the more geographically granular measure and provides a more detailed analysis.

For many capital markets participants, however, that level of geographic detail is not available. At best, an investor may have a 5-digit or 3-digit zip code. Zip codes do not map cleanly to a single county or census tract and can span several of them.

There is no perfect way to perform the data link when zip code is the only available geographic marker. We take an approach that leverages the other data on housing stock by census tract to weight mortgage portfolio data when multiple census tracts map to a zip code.
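A minimal sketch of that weighting approach, assuming a ZIP-to-census-tract crosswalk file and a tract-level NRI extract are on hand (the file names and columns such as building_exposure_usd are hypothetical):

```python
import pandas as pd

# Hypothetical inputs: a ZIP-to-tract crosswalk and a tract-level NRI extract.
crosswalk = pd.read_csv("zip_to_tract.csv", dtype={"zip": str, "census_tract": str})
tracts = pd.read_csv("nri_tracts.csv", dtype={"census_tract": str})

xw = crosswalk.merge(
    tracts[["census_tract", "building_exposure_usd", "building_expected_annual_loss_usd"]],
    on="census_tract",
    how="left",
)

# Weight each tract by its share of the building stock (exposure) within its zip code.
xw["weight"] = (
    xw["building_exposure_usd"] / xw.groupby("zip")["building_exposure_usd"].transform("sum")
)
xw["eal_ratio"] = xw["building_expected_annual_loss_usd"] / xw["building_exposure_usd"]

# Blend the tract-level loss ratios into a single zip-level measure.
zip_risk = (
    xw.groupby("zip")
    .apply(lambda g: (g["eal_ratio"] * g["weight"]).sum())
    .rename("weighted_eal_ratio")
    .reset_index()
)

# Loans carrying only a 5-digit zip can now be scored with the blended tract-level risk.
loans = pd.read_csv("loan_portfolio.csv", dtype={"zip": str})
loans = loans.merge(zip_risk, on="zip", how="left")
```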

Other Data Limitations

The loss information available represents a simple historical average loss rate given an event. But hurricanes (and hurricane seasons) are not all created equal. The same is true of other natural disasters. Relying on averages may work over long time horizons but could significantly underpredict or overpredict loss in a particular year. Further, the frequency of events is rising, so that what used to be considered a 100-year event may be closer to a 10- or 20-year event. Lacking data about what losses might look like under extreme scenarios makes modeling such events problematic.

The data also make it difficult to take correlation into account. Hurricanes and coastal flooding are independent events in the dataset but are obviously highly correlated with one another. The impact of a large storm on one geographic area is likely to be correlated with that of nearby areas (such as when a hurricane makes its way up the Eastern Seaboard).

The workarounds for these limitations have limitations of their own. But one solution involves designing transparent assumptions and scenarios related to the probability, severity, and correlation of stress events. We can model outlier events by assuming that losses for a particular peril follow a normal distribution with set standard deviations. Other assumptions can be made about correlations between perils and geographies. Using these assumptions, stress scenarios can be derived by picking a particular percentile along the loss distribution.
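As a simple illustration of the percentile approach (the numbers below are placeholders, not calibrated values):

```python
from scipy.stats import norm

def stressed_loss_ratio(mean_loss_ratio: float, loss_std: float, percentile: float) -> float:
    """Loss ratio at a chosen percentile, assuming losses for the peril are normally distributed."""
    return norm.ppf(percentile, loc=mean_loss_ratio, scale=loss_std)

# Placeholder hurricane assumptions: a 2% average annual building loss ratio and a
# 1.5% standard deviation, stressed to the 99th percentile of the loss distribution.
base = 0.02
stress = stressed_loss_ratio(base, loss_std=0.015, percentile=0.99)
print(f"Base loss ratio: {base:.2%}; 99th-percentile stress: {stress:.2%}")
```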

A Promising New Credit Analysis Tool for Mortgages

Notwithstanding its limitations, the new NRI data is a rich source of information that can be leveraged to help augment credit risk analysis of mortgage and mortgage-backed security portfolios. The data holds great promise as a starting point (and perhaps more) for risk teams starting to put together climate risk and other ESG analysis frameworks.


Cash-out Refis, Investment Properties Contribute to Uptick in Agency Mortgage Risk Profile

RiskSpan’s Vintage Quality Index is a monthly measure of the relative risk profile of Agency mortgages. Higher VQI levels are associated with mortgage vintages containing higher-than-average percentages of loans with one or more “risk layers.”

These risk layers, summarized below, reflect the percentage of loans with low FICO scores (below 660), high loan-to-value ratios (above 80%), high debt-to-income ratios (above 45%), adjustable rate features, subordinate financing, cash-out refis, investment properties, multi-unit properties, and loans with only one borrower.
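For illustration, the risk-layer shares behind an index like this can be computed from loan-level issuance data roughly as follows. The file, column names, and codes are assumptions about a generic issuance extract, and the actual VQI methodology may weight or combine layers differently.

```python
import pandas as pd

# Hypothetical loan-level Agency issuance file; column names and codes are illustrative.
loans = pd.read_csv("agency_issuance.csv")

layers = {
    "low_fico":        loans["fico"] < 660,
    "high_ltv":        loans["ltv"] > 80,
    "high_dti":        loans["dti"] > 45,
    "adjustable_rate": loans["amortization_type"] == "ARM",
    "subordinate_fin": loans["subordinate_upb"] > 0,
    "cash_out_refi":   loans["loan_purpose"] == "CASH_OUT",
    "investor":        loans["occupancy"] == "INVESTOR",
    "multi_unit":      loans["number_of_units"] > 1,
    "one_borrower":    loans["number_of_borrowers"] == 1,
}

# Share of issued loans (by count) carrying each risk layer in each issuance month.
flags = pd.DataFrame(layers)
flags["issue_month"] = loans["issue_month"]
layer_shares = flags.groupby("issue_month").mean() * 100
print(layer_shares.round(1))
```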

The RiskSpan VQI rose 4.2 points at the end of 2020, reflecting a modest increase in the risk profile of loans originated during the fourth quarter relative to the early stages of the pandemic.

The first rise in the index since February was driven by modest increases across several risk layers. These included cash-out refinances (up 2.5% to a 20.2% share in December), single borrower loans (up 1.8% to 52.0%) and investor loans (up 1.4% to 6.0%). Still, the December VQI sits more than 13 points below its local high in February 2020, and more than 28 points below a peak seen in January 2019.

While the share of cash-out refinances has risen some from these highs, the risk layers that have driven most of the downward trend in the overall VQI – percentage of loans with low FICO scores and high LTV and DTI ratios – remain relatively low. These layers have been trending downward for a number of years now, reflecting a tighter credit box, and the pandemic has only exacerbated tightening.

Population assumptions:

  • Monthly data for Fannie Mae and Freddie Mac.
  • Loans originated more than three months prior to issuance are excluded because the index is meant to reflect current market conditions.
  • Loans likely to have been originated through the HARP program, as identified by LTV, MI coverage percentage, and loan purpose, are also excluded. These loans do not represent credit availability in the market as they likely would not have been originated today but for the existence of HARP.

Data assumptions:

  • Freddie Mac data goes back to 12/2005. Fannie Mae only back to 12/2014.
  • Certain fields for Freddie Mac data were missing prior to 6/2008.
  • GSE historical loan performance data release in support of GSE Risk Transfer activities was used to help back-fill data where it was missing.

This analysis is developed using RiskSpan’s Edge Platform. To learn more or see a free, no-obligation demo of Edge’s unique data and modeling capabilities, please contact us.


RiskSpan VQI: Current Underwriting Standards Q3 2020

[Charts: September 2020 Vintage Quality Index and RiskSpan VQI historical trend]

RiskSpan’s Vintage Quality Index, which had declined sharply in the first half of the year, leveled off somewhat in the third quarter, falling just 2.8 points between June and September, in contrast to its 12-point drop in Q2.

This change, which reflects a relative slowdown in the tightening of underwriting standards, suggests something of a return to stability in the Agency origination market.

Driven by a drop in cash-out refinances (down 2.3% in the quarter), the VQI’s gradual decline left the standard credit-related risk attributes (FICO, LTV, and DTI) largely unchanged.

The share of high-LTV loans (loans with loan-to-value ratios over 80%), which fell 1.3% in Q3, has fallen dramatically over the last year – 11.7% in total. More than half of this drop (6.1%) occurred before the start of the COVID-19 crisis. This suggests that, even though the Q3 VQI reflects tightening underwriting standards, the stability of the credit-related components, coupled with huge volumes from the GSEs, reflects a measure of stability in credit availability.

[Charts: Risk layers historical trend; September 2020 shares of issued loans by count for each risk layer – FICO < 660, LTV > 80, DTI > 45, adjustable-rate, subordinate financing, cash-out refinance, occupancy (investor), multi-unit, and one-borrower loans]

Analytical And Data Assumptions

Population assumptions:

  • Monthly data for Fannie Mae and Freddie Mac.

  • Loans originated more than three months prior to issuance are excluded because the index is meant to reflect current market conditions.

  • Loans likely to have been originated through the HARP program, as identified by LTV, MI coverage percentage, and loan purpose are also excluded. These loans do not represent credit availability in the market as they likely would not have been originated today but for the existence of HARP.                                                                                                                          

Data assumptions:

  • Freddie Mac data goes back to 12/2005. Fannie Mae only back to 12/2014.

  • Certain fields for Freddie Mac data were missing prior to 6/2008.   

  • GSE historical loan performance data release in support of GSE Risk Transfer activities was used to help back-fill data where it was missing.

An outline of our approach to data imputation can be found in our VQI Blog Post from October 28, 2015.                                                


Consistent & Transparent Forbearance Reporting Needed in the PLS Market

There is justified concern within the investor community regarding the residential mortgage loans currently in forbearance and their ultimate resolution. Although most of the 4M loans in forbearance are in securities backed by the Federal Government (Fannie Mae, Freddie Mac or Ginnie Mae), approximately 400,000 loans currently in forbearance represent collateral that backs private-label residential mortgage-backed securities (PLS). The PLS market operates without clear, articulated standards for forbearance programs and lacks the reporting practices that exist in Agency markets. This leads to disparate practices for granting forbearance to borrowers and a broad range of investor reporting by different servicers. COVID-19 has highlighted the need for transparent, consistent reporting of forbearance data to investors to support a more efficient PLS market.

Inconsistent investor reporting leaves too much open to interpretation. It creates investor angst while making it harder to understand the credit risk associated with the underlying mortgage loans. RiskSpan performed an analysis of 2,542 PLS deals (U.S. only) for which loan-level forbearance metrics are available. The data shows that approximately 78% of loans reported to be in forbearance were backing deals originated between 2005 and 2008 (“Legacy Bonds”). As you would expect, new-issue PLS has a smaller percentage of loans reported to be in forbearance.

[Chart: % of total forbearance UPB]

Not all loans in forbearance will perform the same, and it is critical for investors to receive transparent reporting of the underlying collateral in forbearance within their PLS portfolios. These are uncharted times and, unlike historic observations of borrowers requesting forbearance, many loans presently in forbearance are still current on their mortgage payments. In these cases, borrowers have elected to join a forbearance program in case they need it at some future point. Improved forbearance reporting will help investors better understand whether borrowers will eventually need to defer payments, modify loan terms, or default, leading to foreclosure or sale of the property.

In practice, servicers have followed GSE guidance when conducting forbearance reviews and approvals. However, without specific guidance, servicers are working with inconsistent policies and procedures developed on a company-by-company basis to support the COVID forbearance process. For example, borrowers can be forborne for 12 months according to FHFA guidance. Some servicers have elected to take a more conservative approach and are providing forbearance in 3-month increments, with extensions possible once a borrower confirms they remain financially impacted by the COVID pandemic.

Servicers have the data that investors want to analyze. Inconsistent practices in the reporting of COVID forbearances by servicers and trustees have resulted in forbearance data being unavailable on certain transactions. This means investors are not able to get a clear picture of the financial health of borrowers in those transactions. In some cases, trustees are not reporting forbearance information to investors, which makes it nearly impossible to obtain a reliable credit assessment of the underlying collateral.

The PLS market has attempted to identify best practices for monthly loan-level reporting to properly assess the risk of loans where forbearance has been granted. Unfortunately, the current market crisis has highlighted that not all market participants have adopted these best practices, and there are no clear advantages for issuers and servicers in providing clear, transparent forbearance reporting. At a minimum, RiskSpan recommends that servicers report the following forbearance data elements for PLS transactions (an illustrative record layout follows the list):

  • Last Payment Date: The last contractual payment date for a loan (i.e., the loan’s “paid-through date”).
  • Loss Mitigation Type: A code indicating the type of loss mitigation the servicer is pursuing with the borrower, loan, or property.
  • Forbearance Plan Start Date: The start date when either a) no payment or b) a payment amount less than the contractual obligation has been granted to the borrower.
  • Forbearance Plan Scheduled End Date: The date on which a Forbearance Plan is scheduled to end.
  • Forbearance Exit – Reason Code: The reason provided by the borrower for exiting a forbearance plan.
  • Forbearance Extension Requested: Flag indicating the borrower has requested one or more forbearance extensions.
  • Repayment Plan Start Date: The start date for when a borrower has agreed to make monthly mortgage payments greater than the contractual installment in an effort to repay amounts due during a Forbearance Plan.
  • Repayment Plan Scheduled End Date: The date at which a Repayment Plan is scheduled to end.
  • Repayment Plan Violation Date: The date when the borrower ceased complying with the terms of a defined repayment plan.
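A minimal sketch of what a loan-level record carrying these fields might look like; the field names and types below are illustrative, not a prescribed industry schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ForbearanceRecord:
    """Illustrative monthly loan-level reporting record for the fields listed above."""
    loan_id: str
    last_payment_date: date                      # the loan's "paid-through date"
    loss_mitigation_type: Optional[str]          # code for forbearance, repayment plan, modification, etc.
    forbearance_plan_start_date: Optional[date]
    forbearance_plan_scheduled_end_date: Optional[date]
    forbearance_exit_reason_code: Optional[str]
    forbearance_extension_requested: bool
    repayment_plan_start_date: Optional[date]
    repayment_plan_scheduled_end_date: Optional[date]
    repayment_plan_violation_date: Optional[date]
```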

The COVID pandemic has highlighted monthly reporting weaknesses by servicers in PLS transactions. Based on investor discussions, additional information is needed to accurately assess the financial health of the underlying collateral. Market participants should take the lessons learned from the current crisis to re-examine prior attempts to define monthly reporting best practices. This includes working with industry groups and regulators to implement consistent, transparent reporting policies and procedures that provide investors with improved forbearance data.


The Why and How of a Successful SAS-to-Python Model Migration

A growing number of financial institutions are migrating their modeling codebases from SAS to Python. There are many reasons for this, some of which may be unique to the organization in question, but many apply universally. Because of our familiarity not only with both coding languages but with the financial models they power, my colleagues and I have had occasion to help several clients with this transition.

Here are some things we’ve learned from this experience and what we believe is driving this change.

Python Popularity

The popularity of Python has skyrocketed in recent years. Its intuitive syntax and a wide array of packages available to aid in development make it one of the most user-friendly programming languages in use today. This accessibility allows users who may not have a coding background to use Python as a gateway into the world of software development and expand their toolbox of professional qualifications.

Companies appreciate this as well. As an open-source language with tons of resources and low overhead costs, Python is also attractive from an expense perspective. A cost-conscious option that resonates with developers and analysts is a win-win when deciding on a codebase.

Note: R is another popular and powerful open-source language for data analytics. Unlike R, however, which is specifically used for statistical analysis, Python can be used for a wider range of uses, including UI design, web development, business applications, and others. This flexibility makes Python attractive to companies seeking synchronicity — the ability for developers to transition seamlessly among teams. R remains popular in academic circles where a powerful, easy-to-understand tool is needed to perform statistical analysis, but additional flexibility is not necessarily required. Hence, we are limiting our discussion here to Python.

Python is not without its drawbacks. As an open-source language, less oversight governs newly added features and packages. Consequently, while updates may be quicker, they are also more prone to error than SAS’s, which are always thoroughly tested prior to release.


Visualization Capabilities

While both codebases support data visualization, Python’s packages are generally viewed more favorably than SAS’s, which tend to be on the more basic side. More advanced visuals are available from SAS, but they require the SAS Visual Analytics platform, which comes at an added cost.

Python’s popular visualization packages — matplotlib, plotly, and seaborn, among others — can be leveraged to create powerful and detailed visualizations by simply importing the libraries into the existing codebase.
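For example, a handful of lines is enough to turn tabular model output into a chart (the data file and column names below are placeholders):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Placeholder model output: prepayment speed by rate incentive and vintage.
results = pd.read_csv("model_output.csv")

sns.lineplot(data=results, x="rate_incentive_bps", y="cpr", hue="vintage")
plt.title("Prepayment S-curves by vintage")
plt.xlabel("Rate incentive (bps)")
plt.ylabel("CPR (%)")
plt.tight_layout()
plt.show()
```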

Accessibility

SAS is a command-driven software package used for statistical analysis and data visualization. Though available only for Windows operating systems, it remains one of the most widely used statistical software packages in both industry and academia.

It’s not hard to see why. For financial institutions with large amounts of data, SAS has been an extremely valuable tool. It is a well-documented language, with many online resources and is relatively intuitive to pick up and understand – especially when users have prior experience with SQL. SAS is also one of the few tools with a customer support line.

SAS, however, is a paid service, and at a standalone level, the costs can be quite prohibitive, particularly for smaller companies and start-ups. Complete access to the full breadth of SAS and its supporting tools tends to be available only to larger and more established organizations. These costs are likely fueling its recent drop-off in popularity. New users simply cannot access it as easily as they can Python. While an academic/university version of the software is available free of charge for individual use, its feature set is limited. Therefore, for new users and start-up companies, SAS may not be the best choice, despite being a powerful tool. Additionally, with the expansion and maturity of the variety of packages that Python offers, many of the analytical abilities of Python now rival those of SAS, making it an attractive, cost-effective option even for very large firms.

Future of tech

Many of the expected advances in data analytics and tech in general are clearly pointing toward deep learning, machine learning, and artificial intelligence in general. These are especially attractive to companies dealing with large amounts of data.

While the technology to analyze data with complete independence is still emerging, Python is better situated to support companies that have begun laying the groundwork for these developments. Python’s rapidly expanding libraries for artificial intelligence and machine learning will likely make future transitions to deep learning algorithms more seamless.

While SAS has made some strides toward adding machine learning and deep learning functionalities to its repertoire, Python remains ahead and consistently ranks as the best language for deep learning and machine learning projects. This creates a symbiotic relationship between the language and its users. Developers use Python to develop ML projects since it is currently best suited for the job, which in turn expands Python’s ML capabilities — a cycle which practically cements Python’s position as the best language for future development in the AI sphere.

Overcoming the Challenges of a SAS-to-Python Migration

SAS-to-Python migrations bring a unique set of challenges that need to be considered. These include the following.

Memory overhead

Server space is getting cheaper but it’s not free. Although Python’s data analytics capabilities rival SAS’s, Python requires more memory overhead. Companies working with extremely large datasets will likely need to factor in the cost of extra server space. These costs are not likely to alter the decision to migrate, but they also should not be overlooked.

The SAS server

All SAS commands are run on SAS’s own server. This tightly controlled ecosystem makes SAS much faster than Python, which does not have the same infrastructure out of the box. Therefore, optimizing Python code can be a significant challenge during SAS-to-Python migrations, particularly when tackling it for the first time.

SAS packages vs Python packages

Calculations performed using SAS packages vs. Python packages can result in differences, which, while generally minuscule, cannot always be ignored. Depending on the type of data, this can pose an issue. And getting an exact match between values calculated in SAS and values calculated in Python may be difficult.

For example, the float datatype does not represent all decimal values exactly: Python stores the float “0.1” as 3602879701896397/2^55 rather than exactly 0.1, and a SAS calculation that should produce exactly “0” can instead return a tiny residue such as 3.552714E-15. These values do not create noticeable differences in most calculations. But some financial models demand more precision than others. And over the course of multiple calculations which build upon each other, they can create differences in fractional values. These differences must be reconciled and accounted for.
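In practice, the reconciliation usually comes down to agreeing on a tolerance and flagging values that fall outside it. A minimal sketch, with hypothetical file names:

```python
import numpy as np

# Hypothetical reconciliation: compare a vector of model outputs produced by the
# legacy SAS code against the migrated Python values, within an agreed tolerance.
sas_values = np.loadtxt("sas_model_output.csv", delimiter=",")
python_values = np.loadtxt("python_model_output.csv", delimiter=",")

tolerance = 1e-9  # set to the precision the model actually requires
mismatch = ~np.isclose(sas_values, python_values, rtol=0.0, atol=tolerance)

print(f"{mismatch.sum()} of {mismatch.size} values differ by more than {tolerance}")
```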

Comparing large datasets

One of the most common functions when working with large datasets involves evaluating how they change over time. SAS has a built-in procedure (PROC COMPARE) which compares datasets swiftly and easily as required. Python has packages for this as well; however, these packages are not as robust as their SAS counterparts. 
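One workable pandas pattern, sketched below with hypothetical file and key names, aligns the two snapshots on a key, reports rows present in only one of them, and then uses DataFrame.compare for cell-level differences:

```python
import pandas as pd

# Hypothetical snapshots of the same dataset at two points in time, keyed by loan_id.
old = pd.read_csv("portfolio_prior_month.csv").set_index("loan_id").sort_index()
new = pd.read_csv("portfolio_current_month.csv").set_index("loan_id").sort_index()

added = new.index.difference(old.index)     # loans only in the new snapshot
dropped = old.index.difference(new.index)   # loans only in the old snapshot

# Cell-level differences for loans present in both snapshots (requires matching columns).
common = old.index.intersection(new.index)
diffs = old.loc[common].compare(new.loc[common])

print(f"{len(added)} added, {len(dropped)} dropped, {len(diffs)} loans with changed values")
```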

Conclusion

In most cases, the benefits of migrating from SAS to Python outweigh the challenges associated with going through the process. The envisioned savings can sometimes be attractive enough to cause firms to trivialize the transition costs. This should be avoided. A successful migration requires taking full account of the obstacles and making plans to mitigate them. Involving the right people from the outset — analysts well versed in both languages who have encountered and worked through the pitfalls — is key.


August 12 Webinar: Good Models, Bad Scenarios? Delinquency, Forbearance, and COVID

Recorded: August 12th | 1:00 p.m. EDT

Business-as-usual macroeconomic scenarios that seemed sensible a few months ago are now obviously incorrect. Off-the-shelf models likely need enhancements. How can institutions adapt? 

Credit modelers don’t need to predict the future. They just need to forecast how borrowers are likely to respond to changing economic conditions. This requires robust datasets and insightful scenario building.

Let our panel of experts walk you through how they approach scenario building, including:

  • How mortgage delinquencies have traditionally tracked unemployment and how these assumptions may need to be altered when unemployment is concentrated in non-homeowning population segments.
  • The likely impacts of home purchases and HPI on credit performance.
  • Techniques for translating macroeconomic scenarios into prepayment and default vectors.


Featured Speakers

Shelley Klein, VP of Loss Forecast and Allowance, Fannie Mae

Janet Jozwik, Managing Director, RiskSpan

Suhrud Dagli, Co-founder and CIO, RiskSpan

Michael Neal, Senior Research Associate, The Urban Institute


COVID-19 and the Cloud

COVID-19 creates a need for analytics in real time

Regarding the COVID-19 pandemic, Warren Buffett has observed that “we haven’t faced anything that quite resembles this problem” and that the fallout is “still hard to evaluate.” 

The pandemic has created an unprecedented shock to economies and asset performance. The recent unemployment data, although encouraging, has only added to the uncertainty. Furthermore, impact and recovery are uneven, often varying considerably from county to county and city to city. Consider: 

  1. COVID-19 cases and fatalities were initially concentrated in just a few cities and counties resulting in almost a total shutdown of these regions. 
  2. Certain sectors, such as travel and leisure, have been hit harder than others, while sectors such as oil and gas face additional issues. Regions with exposure to these sectors have higher unemployment rates even with fewer COVID-19 cases. 
  3. Timing of reopening and recoveries has also varied due to regional and political factors. 

Regional employment, business activity, consumer spending and several other macro factors are changing in real time. This information is available through several non-traditional data sources. 

Legacy models are not working, and several known correlations are broken. 

Determining value and risk in this environment is requiring unprecedented quantities of analytics and on-demand computational bandwidth. 

COVID-19 in the Cloud

Need for on-demand computation and storage across the organization 

“I don’t need a hard disk in my computer if I can get to the server faster… carrying around these non-connected computers is byzantine by comparison.” ~ Steve Jobs 


Front office, risk management, quants and model risk management – every aspect of the analytics ecosystem requires the ability to run a large number of scenarios quickly. 

Portfolio managers need to recalibrate asset valuations, manage hedges and answer questions from senior management, all while looking for opportunities to find cheap assets. Risk managers are working closely with quants and portfolio managers to better understand the impact of this unprecedented environment on assets. Quants must not only support existing risk and valuation processes but also be able to run new estimations and explain model behavior as data streams in from a variety of sources. 

These activities require several processors and large storage units to be stood up on demand. Even in normal times, infrastructure teams require at least 10 to 12 weeks to procure and deploy additional hardware. With most of the financial services world now working remotely, this time lag is further exacerbated.  

No individual firm maintains enough excess capacity to accommodate such a large and urgent need for data and computation. 

The work-from-home model has proven that we have sufficient internet bandwidth to enable the fast access required to host and use data on the cloud. 

Cloud is about how you do computing

“Cloud is about how you do computing, not where you do computing.” ~ Paul Maritz, CEO of VMware 


Cloud computing is now part of everyday vocabulary and powers even the most common consumer devices. However, financial services firms are still in early stages of evaluating and transitioning to a cloud-based computing environment. 

Cloud is the only way to procure the level of surge capacity required today. At RiskSpan we are computing an average of a half-million additional scenarios per client on demand. Users don’t have the luxury of waiting for an overnight batch process to react to changing market conditions. End users fire off a new scenario assuming that the hardware will scale up automagically. 

When searching Google’s large dataset or using Salesforce to run analytics we expect the hardware scaling to be limitless. Unfortunately, valuation and risk management software are typically built to run on a pre-defined hardware configuration.  

Cloud native applications, in contrast, are designed and built to leverage the on-demand scaling of a cloud platform. Valuation and risk management products offered as SaaS scale on-demand, managing the integration with cloud platforms. 

Financial services firms don’t need to take on the burden of rewriting their software to work on the cloud. Platforms such as RS Edge enable clients to plug their existing data, assumptions and models into a cloud-native platform. This enables them to get all the analytics they’ve always had—just faster and cheaper.  

Serverless access can also help companies provide access to their quant groups without incurring additional IT resource expense. 

A recent survey from Flexera shows that 30% of enterprises have increased their cloud usage significantly due to COVID-19.

COVID-19 in the Cloud

Cloud is cost effective 

“In 2000, when my partner Ben Horowitz was CEO of the first cloud computing company, Loudcloud, the cost of a customer running a basic Internet application was approximately $150,000 a month.”  ~ Marc Andreessen, Co-founder of Netscape, Board Member of Facebook 


Cloud hardware is cost effective, primarily due to the on-demand nature of the pricing model. A $250B asset manager uses RS Edge to run millions of scenarios in a 45-minute window every day. The analysis is performed across more than a thousand servers at a cost of $500 per month. The same hardware, if deployed around the clock, would cost $27,000 per month. 

Cloud is not free and can be a two-edged sword. The same on-demand aspect that enables end users to spin up servers as needed can, if not monitored, cause the cost of those servers to accumulate to undesirable levels. One of the benefits of a cloud-native platform is built-in procedures to drop unused servers, which minimizes the risk of paying for unused bandwidth. 

And yes, Mr. Andreessen’s basic application can be hosted today for less than $100 per month. 

The same survey from Flexera shows that organizations plan to increase public cloud spending by 47% over the next 12 months. 

COVID-19 in the Cloud

Alternate data analysis

“The temptation to form premature theories upon insufficient data is the bane of our profession.” ~ Sir Arthur Conan Doyle, Sherlock Holmes.


Alternate data sources are not always easily accessible and available within analytic applications. The effort and time required to integrate them can be wasted if the usefulness of the information cannot be determined upfront. Timing of analyzing and applying the data is key. 

Machine learning techniques offer quick and robust ways of analyzing data. Tools to run these algorithms are not readily available on a desktop computer.  

Every major cloud platform provides a wealth of tools, algorithms and pre-trained models to integrate and analyze large and messy alternate datasets. 

Join fintova’s Gary Maier and me at 1 p.m. EDT on June 24th as we discuss other important factors to consider when performing analytics in the cloud. Register now.

