Better Interactive Data Science with Beaker and Rodeo
By Nick Elprin | 2015-09-15 | 4 min read
Domino has offered support for IPython/Jupyter Notebook for a while, but we recently added support for two newer, up-and-coming tools for interactive data science: Beaker Notebook and Rodeo. This post gives a brief overview of each tool and describes how to use them on Domino.
Power tools at your fingertips
The motivation behind Domino is twofold: to make data scientists more productive by letting them focus on their analysis without worrying about infrastructure and configuration, and to facilitate collaboration and sharing among teams by keeping work organized and tracked in a central place.
To that end, Domino now lets you spin up Rodeo or Beaker sessions on big machines, and keeps your files and notebooks stored centrally so it's easier to track, share, and comment on them.
Beaker Notebook is a notebook application from the team at Two Sigma Open Source, in some ways similar to Jupyter/IPython notebooks. But in addition to supporting inline code, documentation, and visualization in many different languages, Beaker lets you mix languages. That's right: one notebook can mix code from any of the languages it supports, and Beaker's slick interop capabilities seamlessly translate data between languages. This even works for DataFrames and more complex types.
There's a lot going on under the hood to make that work — it's pretty magical.
This makes Beaker the ultimate weapon for those who believe in "using the best tool for the job": a single analytical workflow can use Python for data prep, R for sophisticated statistical analysis, and HTML with D3 or LaTeX for beautiful visualization and presentation.
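To get a feel for what translating data between languages involves, here is a minimal sketch in plain Python. It is not Beaker's implementation or API: the function names `export_table` and `import_table` are hypothetical, and JSON is only an assumed stand-in for a language-neutral interchange format that another language's runtime could read.

```python
import json


def export_table(columns, rows):
    """Serialize a small table to a language-neutral JSON string.

    Hypothetical helper for illustration; not part of Beaker.
    """
    payload = {
        "type": "table",
        "columns": list(columns),
        "rows": [list(row) for row in rows],
    }
    return json.dumps(payload)


def import_table(text):
    """Rebuild the table as a list of per-row dictionaries."""
    payload = json.loads(text)
    if payload.get("type") != "table":
        raise ValueError("not a table payload")
    return [dict(zip(payload["columns"], row)) for row in payload["rows"]]


# One side of the workflow prepares the data and exports it...
wire = export_table(["city", "temp_c"], [["Boston", 21.5], ["Austin", 33.0]])

# ...and the other side reads the same payload back into native structures.
records = import_table(wire)
print(records)
```

The point of the sketch is only that both sides agree on a shared, self-describing format, so each language can rebuild the data in its own native types.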
You can watch a video of one of Beaker's creators speaking about it at SciPy 2015. You can also play with Beaker yourself on Domino, without any installation or setup, by creating your own project and following these steps:
1. Start a Beaker session by clicking on the "Notebook" menu on your "Runs" dashboard.
2. When the server is ready, click the "Open session" button in the right pane.
3. Create a new notebook, or import one of Beaker's examples, or use the file menu to browse to "/mnt" and choose one of the files in our project. The interop.viz notebook shows some nice examples of Beaker's flexibility for translating data between languages.
Rodeo is an open source Python IDE from the folks at yHat. It answers the question, "is there anything like RStudio for Python?"
Rodeo is just that: it's a web-based IDE for editing Python files that gives you a code editor along with a plot viewer and a file browser in one interface. Unlike Python editors designed for building large software systems, Rodeo is tailored for doing data science in Python — especially with its built-in plot viewer.
You can read more about our support for Rodeo on our help site.
Nick Elprin is the CEO and co-founder of Domino Data Lab, provider of the open data science platform that powers model-driven enterprises such as Allstate, Bristol Myers Squibb, Dell and Lockheed Martin. Before starting Domino, Nick built tools for quantitative researchers at Bridgewater, one of the world's largest hedge funds. He has over a decade of experience working with data scientists at advanced enterprises. He holds a BA and MS in computer science from Harvard.