Subject archive for "spark"


Spark, Dask, and Ray: choosing the right framework

Apache Spark, Dask, and Ray are three of the most popular frameworks for distributed computing. In this blog post, we look at their history, intended use cases, strengths, and weaknesses to understand how to select the most appropriate one for a specific data science use case.

By Nikolay Manchev | 15 min read

Machine Learning

Domino Unlocks the Power of Data Science with Ray 2 Clusters

OpenAI demonstrated the profound impact generative AI could have. Such techniques turn datasets into transformative tools and products: tangible AI projects that can both inspire and save your company time and money. Better yet, you are in a good position to aim high.

By Thomas Dinsmore and Yuval Zukerman | 5 min read

Data Science

Making PySpark work with spaCy: Overcoming serialization errors

In this guest post, Holden Karau, an Apache Spark committer, provides insights on how to use spaCy with PySpark to process text data. Karau is a Developer Advocate at Google and a co-author of "High Performance Spark" and "Learning Spark". She maintains a repository of her talks, code reviews, and code sessions on Twitch and YouTube. She is also working on Distributed Computing 4 Kids.

By Domino | 8 min read

Data Science

Themes and Conferences per Pacoid, Episode 13

Paco Nathan's latest article covers data practices from the National Oceanic and Atmospheric Administration (NOAA) Environmental Data Management (EDM) workshop, as well as updates from the AI Conference.

By Paco Nathan | 17 min read

Machine Learning

Creating Multi-language Pipelines with Apache Spark or Avoid Having to Rewrite spaCy into Java

In this guest post, Holden Karau, an Apache Spark committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid rewriting spaCy into Java. She has already written a complementary blog post for Domino on using spaCy to process text data. Karau is a Developer Advocate at Google and a co-author of "High Performance Spark" and "Learning Spark". She also maintains a repository of her talks, code reviews, and code sessions on Twitch and YouTube.

By Holden Karau | 5 min read
