Crush Pandas Speed Barriers with NVIDIA GPUs on Domino

Yuval Zukerman | 2023-12-04 | 4 min read

Speeding cars at night in Hong Kong (Photo by Jimmy Chan: https://www.pexels.com/photo/time-lapse-photography-of-road-near-buildings-2552595/)

AI workloads in a world of generative AI and large language models are growing in size and complexity. Just ask a data scientist. She will tell you that data volumes are rapidly moving from gigabytes to terabytes and even petabytes. Bigger data typically means slower and more expensive processing. It's more challenging to wrangle data and reshape it into a usable form. More data makes operations like filtering, sorting, and aggregating grind to a halt. And in this world, those who move fast win. They win by gleaning insights from more data more quickly, releasing models sooner, and delighting customers with new AI-driven features.

In reality, fast is difficult. While some projects (Vaex, Polars) focus on CPU-based improvements, the true power is in the GPU. NVIDIA's newest RAPIDS release helps you overcome many of these challenges by streamlining GPU acceleration of the Pandas library. Give us 4 minutes to learn more.

Why Pandas on GPU matters

Pandas is a critical building block in the data science and AI toolchain. The library introduced the DataFrame, the foundation for much of today's analytics. DataFrames simplify loading, manipulating, and processing data in Python, and virtually all AI packages and workflows depend on them. DataFrames have a table structure, just like a spreadsheet. GPUs process data in parallel, and tables fit that bill perfectly.

Until now, leveraging GPUs with Pandas was tricky. NVIDIA's cuDF library provided the necessary power but acted as a Pandas stand-in. There was concern about its Pandas functionality coverage, and you had to maintain separate code branches if you needed your code to run on both GPU- and CPU-first environments. Worse, using other PyData libraries would have required you to swap cuDF in and out with Pandas.

RAPIDS v23.10 is a game changer

NVIDIA's RAPIDS v23.10 release removes many of those barriers to speed. Data scientists no longer need to struggle with re-coding, testing, and resolving conflicts just to use the GPU. Add NVIDIA's RAPIDS to your environment, then either load a notebook extension or run your script through a Python module to begin Pandas acceleration. Even if you don’t have a GPU, your code will still work. NVIDIA's benchmarks show up to 150x acceleration in tabular data processing. Better yet, RAPIDS accelerates many other AI libraries with no additional effort.
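As a rough illustration of what "load an extension or use a module" looks like in practice: the accelerator ships in cuDF as `cudf.pandas`. In a notebook you would run `%load_ext cudf.pandas` before importing Pandas, and for a script you would run `python -m cudf.pandas your_script.py`. The sketch below shows a third, programmatic route via `cudf.pandas.install()`. The helper name `enable_gpu_pandas` and the tiny sample data are made up for illustration, and the fallback logic is an assumption about how you might want a script to behave where RAPIDS is absent.

```python
import importlib.util


def enable_gpu_pandas() -> bool:
    """Hypothetical helper: switch on cudf.pandas if possible.

    Returns True only when the accelerator was installed.
    """
    if importlib.util.find_spec("cudf") is None:
        return False  # RAPIDS cuDF is not installed in this environment
    try:
        import cudf.pandas  # available in RAPIDS cuDF 23.10 and later

        cudf.pandas.install()  # should run before pandas is first imported
        return True
    except Exception:
        return False  # e.g. no usable GPU or driver: stay on plain pandas


accelerated = enable_gpu_pandas()
print("cudf.pandas active:", accelerated)

# Guarded so the sketch also runs where pandas itself is not installed.
if importlib.util.find_spec("pandas") is not None:
    import pandas as pd  # unchanged pandas code from here on

    # Made-up sample data, for illustration only.
    df = pd.DataFrame(
        {"state": ["NY", "NJ", "NY", "CT"], "fine": [50, 65, 115, 50]}
    )
    totals = df.groupby("state")["fine"].sum().sort_values(ascending=False)
    print(totals)
```

The point of the design is that everything after the `import pandas as pd` line is ordinary Pandas code, so the same script can run in both GPU and CPU environments.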

Unfortunately, it may take many organizations time to adopt this innovation. They are stuck with data science offerings that lack GPU support, or with free-to-access offerings from the cloud platforms. Because Domino is an open platform, its users can harness this power today. Let's see how.

Domino ♥ RAPIDS

Domino was one of the earliest platforms to embrace RAPIDS. We ran a webinar highlighting its impact, and NVIDIA offers documentation on enabling RAPIDS in Domino. Getting started with version 23.10 is easy, and you can get your entire team up and running in minutes by defining a new Domino environment. A Domino environment defines what OS, code, packages, and tools you want for your project.

NVIDIA RAPIDS environment setup in Domino

Once the RAPIDS environment is available, you can use it with your projects. With Domino, selecting between CPU and GPU infrastructure is a matter of a single click, and as we mentioned, RAPIDS can run on both. We used a notebook from NVIDIA to try out the Pandas updates on Domino. The results we saw on a Tesla V100 confirmed the promises.

Listing of NVIDIA Hardware inside of a Jupyter Notebook
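If you want to reproduce a hardware listing like this from your own notebook or script, one possible approach is to call the `nvidia-smi` command-line tool, which ships with the NVIDIA driver. The sketch below is not taken from NVIDIA's notebook. The helper name `list_nvidia_gpus` is hypothetical, and the code assumes `nvidia-smi` is on the `PATH` when a GPU is present. It returns an empty list otherwise.

```python
import shutil
import subprocess


def list_nvidia_gpus() -> list[str]:
    """Return one line per visible NVIDIA GPU, or [] if none can be queried."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # driver tools not installed, so most likely a CPU-only machine
    try:
        # "nvidia-smi -L" prints one line per GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
print(gpus if gpus else "No NVIDIA GPU detected.")
```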

The notebook runs three comparison tests on common pipelines using a New York City parking violation dataset. The results look compelling:

| Pipeline | CPU Time | GPU Time | Acceleration |
| --- | --- | --- | --- |
| Counts, groupby and sort | 15400 ms | 1350 ms | 11.4x |
| Groupby, aggregate and sort | 3900 ms | 19.8 ms | 197x |
| Type casting, count and sort | 7220 ms | 90 ms | 80x |
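The acceleration figures above are simply the CPU time divided by the GPU time for each pipeline. A short sketch of that arithmetic, using the timings reported in this post (the variable and function names are only illustrative):

```python
# (CPU ms, GPU ms) for each pipeline, as reported in this post.
results_ms = {
    "counts, groupby and sort": (15400, 1350),
    "groupby, aggregate and sort": (3900, 19.8),
    "type casting, count and sort": (7220, 90),
}


def speedup(cpu_ms: float, gpu_ms: float) -> float:
    """Acceleration factor: how many times faster the GPU run was."""
    return cpu_ms / gpu_ms


speedups = {name: speedup(cpu, gpu) for name, (cpu, gpu) in results_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

Rounded, the ratios come out to about 11.4x, 197x, and 80x, matching the reported results.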

Ready to try it for yourself? Go to NVIDIA's Launchpad and take Domino for a spin on the world's most advanced computing infrastructure. Want to know more? Sign up for our demos today!


As Domino's content lead, Yuval makes AI technology concepts more human-friendly. Throughout his career, Yuval worked with companies of all sizes and across industries. His unique perspective comes from holding roles ranging from software engineer and project manager to technology consultant, sales leader, and partner manager.