Beyond Spark: Dask and Ray as Multi-node Accelerated Compute Frameworks
Apache Spark has been the incumbent distributed compute framework for more than a decade. But Spark's overhead and complexity have allowed newer frameworks like Dask and Ray to eclipse it.
We'll discuss the history of the three, their intended use cases, and their strengths and weaknesses. Building on these trade-offs, we'll consider how to select the right framework based on the available infrastructure, data volumes, workload complexity, and other factors.
Finally, we’ll explore their capabilities in the context of GPU-accelerated computing, presenting an integrated solution that enables data scientists to easily provision a Spark/Ray/Dask cluster and access it through an integrated development environment.
Speaker: Nikolay Manchev - Principal Data Scientist for EMEA, Domino Data Lab
Related Resources
Webinar
Run complex AI training workloads using on-demand GPU-accelerated Spark/RAPIDS clusters