The Why and How of a Successful SAS-to-Python Model Migration
A growing number of financial institutions are migrating their modeling codebases from SAS to Python. There are many reasons for this, some of which may be unique to the organization in question, but many apply universally. Because of our familiarity not only with both coding languages but with the financial models they power, my colleagues and I have had occasion to help several clients with this transition.
Here are some things we’ve learned from this experience and what we believe is driving this change.
Python Popularity
The popularity of Python has skyrocketed in recent years. Its intuitive syntax and a wide array of packages available to aid in development make it one of the most user-friendly programming languages in use today. This accessibility allows users who may not have a coding background to use Python as a gateway into the world of software development and expand their toolbox of professional qualifications.
Companies appreciate this as well. As an open-source language with tons of resources and low overhead costs, Python is also attractive from an expense perspective. A cost-conscious option that resonates with developers and analysts is a win-win when deciding on a codebase.
Note: R is another popular and powerful open-source language for data analytics. Unlike R, which is used primarily for statistical analysis, Python serves a much wider range of purposes, including UI design, web development, and business applications. This flexibility makes Python attractive to companies seeking synchronicity, meaning the ability for developers to transition seamlessly among teams. R remains popular in academic circles, where a powerful, easy-to-understand tool is needed for statistical analysis and additional flexibility is not necessarily required. Hence, we are limiting our discussion here to Python.
Python is not without its drawbacks. As an open-source language, less centralized oversight governs newly added features and packages. Consequently, while updates may arrive more quickly, they can also be more prone to error than SAS's releases, which pass through a single vendor's testing process before they ship.
Visualization Capabilities
While both codebases support data visualization, Python’s packages are generally viewed more favorably than SAS’s, which tend to be on the more basic side. More advanced visuals are available from SAS, but they require the SAS Visual Analytics platform, which comes at an added cost.
Python’s popular visualization packages — matplotlib, plotly, and seaborn, among others — can be leveraged to create powerful and detailed visualizations by simply importing the libraries into the existing codebase.
Accessibility
SAS is a command-driven software package used for statistical analysis and data visualization. It runs on Windows and on Linux and UNIX servers, and it remains one of the most widely used statistical software packages in both industry and academia.
It's not hard to see why. For financial institutions with large amounts of data, SAS has been an extremely valuable tool. It is a well-documented language with many online resources, and it is relatively intuitive to pick up and understand, especially for users with prior SQL experience. SAS is also one of the few tools with a customer support line.
SAS, however, is a paid service, and at a standalone level, the costs can be prohibitive, particularly for smaller companies and start-ups. Complete access to the full breadth of SAS and its supporting tools tends to be available only to larger and more established organizations. These costs are likely fueling its recent drop-off in popularity. New users simply cannot access it as easily as they can Python. While an academic/university version of the software is available free of charge for individual use, its feature set is limited. Therefore, for new users and start-up companies, SAS may not be the best choice, despite being a powerful tool. Additionally, as Python's packages have expanded and matured, many of its analytical capabilities now rival those of SAS, making it an attractive, cost-effective option even for very large firms.
Future of Tech
Many of the expected advances in data analytics, and in tech more broadly, point toward machine learning, deep learning, and artificial intelligence. These are especially attractive to companies dealing with large amounts of data.
While the technology to analyze data with complete independence is still emerging, Python is better situated to support companies that have begun laying the groundwork for these developments. Python’s rapidly expanding libraries for artificial intelligence and machine learning will likely make future transitions to deep learning algorithms more seamless.
While SAS has made some strides toward adding machine learning and deep learning functionality to its repertoire, Python remains ahead and is widely regarded as the leading language for deep learning and machine learning projects. This creates a self-reinforcing relationship between the language and its users. Developers use Python for ML projects because it is currently best suited for the job, which in turn expands Python's ML capabilities. That cycle practically cements Python's position as the leading language for future development in the AI sphere.
Overcoming the Challenges of a SAS-to-Python Migration
SAS-to-Python migrations bring a unique set of challenges that need to be considered. These include the following.
Memory overhead
Server space is getting cheaper, but it's not free. Although Python's data analytics capabilities rival SAS's, Python requires more memory overhead, largely because common Python data libraries such as pandas hold a dataset in memory while SAS typically reads it from disk. Companies working with extremely large datasets will likely need to factor in the cost of extra server capacity. These costs are not likely to alter the decision to migrate, but they should not be overlooked.
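To make the overhead concrete, here is a minimal sketch using only the standard library. It compares the memory taken by a plain Python list of floats with a packed array holding the same values. The one-million-row size and the variable names are illustrative assumptions, not figures from an actual migration.

```python
import sys
from array import array

N_ROWS = 1_000_000  # illustrative size, not taken from a real dataset

# A plain list stores a pointer to a separate float object for every value.
as_list = [float(i) for i in range(N_ROWS)]
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)

# A typed array packs the same values as raw 8-byte doubles.
as_array = array("d", as_list)
array_bytes = sys.getsizeof(as_array)

print(f"list of floats : {list_bytes / 1e6:.1f} MB")
print(f"packed array   : {array_bytes / 1e6:.1f} MB")
print(f"ratio          : {list_bytes / array_bytes:.1f}x")
```

Packed numeric storage of this kind is what libraries such as NumPy and pandas use for numeric columns, so the choice of data structure often matters as much as the choice of language when sizing a server.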
The SAS server
All SAS commands are run on SAS's own server. This tightly controlled ecosystem often makes SAS faster than Python, which does not have the same infrastructure out of the box. Therefore, optimizing Python code can be a significant challenge during SAS-to-Python migrations, particularly when tackling it for the first time.
SAS packages vs Python packages
Calculations performed using SAS packages vs. Python packages can result in differences, which, while generally minuscule, cannot always be ignored. Depending on the type of data, this can pose an issue. And getting an exact match between values calculated in SAS and values calculated in Python may be difficult.
For example, neither language stores most decimal fractions exactly. In Python, the float 0.1 is actually held as the binary fraction 3602879701896397/2^55, and in SAS a result that should be exactly zero can surface as a tiny residual such as 3.552714E-15. These discrepancies do not create noticeable differences in most calculations. But some financial models demand more precision than others, and over the course of multiple calculations that build upon each other, they can create differences in fractional values. These differences must be reconciled and accounted for.
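The kind of discrepancy described above can be reproduced with the standard library alone. The sketch below shows the fraction Python actually stores for the literal 0.1, how the error accumulates across repeated additions, and one way to reconcile two values within a stated tolerance. The function values_match and its default tolerances are hypothetical; appropriate tolerances depend on the model being migrated.

```python
import math
from decimal import Decimal
from fractions import Fraction

# 0.1 has no exact binary representation; Python stores this nearby fraction.
stored = Fraction(0.1)
print(stored == Fraction(3602879701896397, 2**55))

# Small representation errors accumulate across chained calculations.
payments = [0.1] * 10
running_total = 0.0
for payment in payments:
    running_total += payment          # plain float addition, one step at a time
exact_total = math.fsum(payments)     # tracks partial sums to avoid drift
print(running_total, exact_total)

def values_match(sas_value, python_value, rel_tol=1e-9, abs_tol=1e-12):
    """Hypothetical reconciliation check: equal within a stated tolerance."""
    return math.isclose(sas_value, python_value, rel_tol=rel_tol, abs_tol=abs_tol)

print(values_match(0.1 + 0.2, 0.3))

# Where exact decimal arithmetic is required, Decimal avoids binary rounding.
decimal_total = sum([Decimal("0.10")] * 10, Decimal("0"))
print(decimal_total)
```

Agreeing on a tolerance with model owners before the migration starts tends to be easier than chasing an exact match value by value.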
Comparing large datasets
One of the most common tasks when working with large datasets involves evaluating how they change over time. SAS has a built-in procedure (PROC COMPARE) that compares datasets swiftly and easily as required. Python has packages for this as well; however, these packages are not as robust as their SAS counterparts.
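As a rough illustration of what such a comparison involves, the sketch below diffs two small datasets by key using only the standard library. The function compare_datasets, the loan_id key, and the sample rows are all hypothetical; this is not the API of any existing package, and it covers only a fraction of what PROC COMPARE reports.

```python
import math

def compare_datasets(base, compare, key, rel_tol=1e-9, abs_tol=0.0):
    """Hypothetical helper, not a library API: diff two lists of row dicts by key."""
    base_rows = {row[key]: row for row in base}
    comp_rows = {row[key]: row for row in compare}

    report = {
        "only_in_base": sorted(base_rows.keys() - comp_rows.keys()),
        "only_in_compare": sorted(comp_rows.keys() - base_rows.keys()),
        "value_differences": [],
    }

    # For rows present in both datasets, compare every shared column.
    for row_key in sorted(base_rows.keys() & comp_rows.keys()):
        old, new = base_rows[row_key], comp_rows[row_key]
        for column in sorted(old.keys() & new.keys()):
            a, b = old[column], new[column]
            numeric = isinstance(a, (int, float)) and isinstance(b, (int, float))
            # Numbers are compared within a tolerance; everything else exactly.
            same = math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol) if numeric else a == b
            if not same:
                report["value_differences"].append((row_key, column, a, b))
    return report

last_month = [
    {"loan_id": 1, "balance": 100000.00, "status": "current"},
    {"loan_id": 2, "balance": 250000.00, "status": "current"},
    {"loan_id": 3, "balance": 75000.00, "status": "delinquent"},
]
this_month = [
    {"loan_id": 1, "balance": 99500.00, "status": "current"},
    {"loan_id": 2, "balance": 250000.00 + 1e-9, "status": "current"},
    {"loan_id": 4, "balance": 310000.00, "status": "current"},
]

result = compare_datasets(last_month, this_month, key="loan_id")
print(result)
```

The tolerance arguments matter here: without them, floating-point noise like the tiny change on loan 2 would be reported as a difference alongside the real balance change on loan 1.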
Conclusion
In most cases, the benefits of migrating from SAS to Python outweigh the challenges associated with going through the process. The envisioned savings can sometimes be attractive enough to cause firms to trivialize the transition costs. This should be avoided. A successful migration requires taking full account of the obstacles and making plans to mitigate them. Involving the right people from the outset — analysts well versed in both languages who have encountered and worked through the pitfalls — is key.