
Legacy code to open source: Generative AI to the rescue

Yuval Zukerman | 2024-06-11 | 5 min read


Many leading companies have an extensive portfolio of analytics and predictive models. Your team built these assets over the years, and they are now indispensable to your company's operations. Yet the technology you used to create them is starting to lag, and the number of people proficient in it is shrinking. As a data, AI, or IT leader, you have been standing at a crossroads. Until recently, you faced the unappetizing prospect of training recent graduates in a technology that is approaching a dead end, all while paying exorbitant license fees.

You knew open source with Python or R is where you needed to be. But transitioning your models' code was too risky and time-consuming. Now, Generative AI (GenAI) can help automate this transition. It can save you time and, most importantly, a lot of money. It may even help with retention and team morale. Let's explore how!

Beyond just technical debt

Fifteen years ago, your company started collecting increasing amounts of data across its operations. You also adopted the latest and greatest statistics, analytics, and reporting tools. These commercial tools made exploring and processing data easy and efficient, and they helped your team build its first models. Connecting to data sources and other applications required yet another set of proprietary tools.

Your company now embraces cloud-scale data and processing. As a result, that reliable, decades-old analytics technology is hitting a wall, and hitting it hard. Tools like these are slow to give you access to the cloud, let alone to the newest advancements, and GenAI techniques are out of the question. When old tech slows your company down, management starts to care.

Relying on older technology also makes finding new talent harder than before. As you may have noticed, data and AI specialists are in high demand. They get to pick and choose their opportunities. Finding suitable candidates who know your old tech, and are willing to use it, keeps getting harder. Job seekers want to advance their skills, not take a trip back into data science history.

You also need to think about security. Older software products are more difficult to patch. Many were written before modern, memory-safe programming languages were available, which leaves them more exposed to vulnerabilities. As a result, security patches for these tools can take longer to arrive. Those delays also affect your models: securing their source code will take longer as well. You need to decide whether those risks, delays, and pain are ones you're willing to accept.

Your booster into the future

Two years ago, your only choice would have been to hire a team to port your code from the old language to a modern one. That was not a realistic option for many companies, so they kept paying maintenance fees for their legacy tools. Now, LLMs from OpenAI, Meta, and others are finally giving you a cost-effective way out.

You have probably heard of Microsoft's GitHub Copilot or Jupyter AI, which help companies develop faster and better with AI assistants. These assistants help developers avoid common errors and even let them describe, in plain English, the code they want. Tools like Copilot or Jupyter AI rely on LLM adaptations known as coder models. Vendors and open source teams fine-tune these LLMs on vast bodies of source code. As a result, the models can generate source code, not just respond to prompts in English. And just as LLMs do a phenomenal job translating between human languages, they can also translate between programming languages.

Code translation, like any LLM output, requires human review, feedback, and correction. At the same time, it can dramatically cut the time it takes to move forward: forward with open source, rapid innovation, and faster security update cycles. It will also be easier to tap into the biggest data science talent pool, and your team retention efforts will benefit too.

Sound interesting? The video below shows how SAS-based analytics code can be transformed in minutes into a Python or R snippet that runs and produces the same results.
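To make the idea a little more concrete, here is a minimal, hypothetical sketch of what one translated fragment and its review could look like. Everything in it is an illustrative assumption, not output from the demo or any real tool: the SAS step (a PROC MEANS computing mean revenue per region), the `sales` rows, the function names `mean_revenue_by_region` and `matches_legacy`, and the "legacy" numbers. The Python side uses only the standard library. The parity check reflects the point that every translation still needs human review: before trusting the new code, you compare its results with numbers exported from the legacy run.

```python
from collections import defaultdict
from statistics import mean
import math

# Hypothetical legacy SAS step being translated (shown for reference only):
#   proc means data=sales mean;
#     class region;
#     var revenue;
#   run;

def mean_revenue_by_region(rows):
    """Hypothetical Python translation: mean of `revenue` for each `region`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["region"]].append(row["revenue"])
    # Sort by region so the output order is stable, similar to PROC MEANS class levels.
    return {region: mean(values) for region, values in sorted(groups.items())}

def matches_legacy(translated, legacy, rel_tol=1e-9):
    """Review aid: True if the translated results reproduce the legacy results."""
    if translated.keys() != legacy.keys():
        return False
    return all(math.isclose(translated[k], legacy[k], rel_tol=rel_tol) for k in legacy)

if __name__ == "__main__":
    sales = [
        {"region": "East", "revenue": 120.0},
        {"region": "East", "revenue": 80.0},
        {"region": "West", "revenue": 200.0},
    ]
    # Illustrative values standing in for numbers exported from the legacy SAS run.
    legacy_output = {"East": 100.0, "West": 200.0}
    result = mean_revenue_by_region(sales)
    print(result)                                  # {'East': 100.0, 'West': 200.0}
    print(matches_legacy(result, legacy_output))   # True
```

The sketch deliberately ignores details such as how SAS handles missing values, which is exactly the kind of subtlety a human reviewer should check before signing off on a translation.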

Learn more about how Domino can help you move forward. For further insights, register to watch Domino's full platform demo in action.

As Domino's content lead, Yuval makes AI technology concepts more human-friendly. Throughout his career, Yuval worked with companies of all sizes and across industries. His unique perspective comes from holding roles ranging from software engineer and project manager to technology consultant, sales leader, and partner manager.
