Intel’s Python Distribution is Smoking Fast, and Now it’s in Domino
By Domino Data Lab, 2017-12-27, 3 min read
Domino just finished benchmarking Intel’s Python Distribution, and it is fast, very fast. Intel’s Python Distribution is now available for use in Domino.
Intel’s Python Distribution
You may not know that Intel has a Python Distribution. It is based on the Anaconda Distribution, and Intel’s engineers have optimized popular math and statistical packages such as NumPy, SciPy, and scikit-learn using the Intel® Math Kernel Library and the Intel® Data Analytics Acceleration Library. Intel continues to collaborate with all major Python distributions, including Anaconda, to make these Intel Architecture (IA) optimizations and performance accelerations widely available to all Python users.
Intel’s benchmark results indicate that batch runs that might have taken over an hour can now complete in as little as two minutes. In a Jupyter Notebook, the speedups mean that cells that used to take minutes to compute now finish in seconds.
Domino Benchmarks Intel’s Python Distribution
At Domino, we wanted to run the benchmarks in a few different scenarios to see how the speedups would affect real-world data science programs. For each benchmark, we ran identical experiments in which the only variable changed was the Python distribution being used.
It is easy to change environments and hardware in Domino, so switching between them to run benchmarks takes just a few seconds.
Once we had an environment with Intel Python, we could kick off all the benchmarks at the same time and be confident that the environment was the only variable. We could also run complex jobs on smaller and larger machines to see how that changed the results.
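When comparing distributions this way, it can help to record which interpreter and NumPy build produced each result. The following is a minimal sketch of that idea, not the tooling Domino used; the helper name `describe_environment` and the fields it collects are assumptions made for illustration.

```python
import os
import platform
import sys

import numpy as np


def describe_environment():
    """Collect basic facts about the Python environment running a benchmark.

    Hypothetical helper for illustration; the fields are an assumption about
    what is useful to log next to each timing result.
    """
    return {
        # The version string usually identifies the distribution's build.
        "python": sys.version.replace("\n", " "),
        "numpy": np.__version__,
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")

# np.show_config() prints NumPy's build configuration, including the
# BLAS/LAPACK libraries it was built against.
```

Storing this dictionary alongside each timing makes it easier to confirm afterwards that two runs differed only in the distribution.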
The first benchmark we ran used scikit-learn to compute a pairwise distance matrix from a vector array. Each benchmark was run three times on a box with 16 cores and 120 GB of RAM.
Intel Python consistently completed the runs in less than 20% of the time taken by the standard Python distribution.
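To give a sense of what a distance-matrix benchmark of this kind looks like, here is a minimal sketch that times scikit-learn's `pairwise_distances` over three runs. It is not the benchmark script Domino ran: the function name `time_pairwise_distances`, the array sizes, the random input, and the Euclidean metric are all assumptions chosen for illustration.

```python
import time

import numpy as np
from sklearn.metrics import pairwise_distances


def time_pairwise_distances(n_samples=2000, n_features=50, repeats=3, seed=0):
    """Time the computation of a pairwise distance matrix.

    Sizes and metric are illustrative assumptions, not the original
    benchmark's settings.
    """
    rng = np.random.default_rng(seed)
    X = rng.random((n_samples, n_features))  # the "vector array"

    timings = []
    distances = None
    for _ in range(repeats):
        start = time.perf_counter()
        # Result has shape (n_samples, n_samples).
        distances = pairwise_distances(X, metric="euclidean")
        timings.append(time.perf_counter() - start)
    return distances, timings


distances, timings = time_pairwise_distances()
print("matrix shape:", distances.shape)
print("seconds per run:", [round(t, 4) for t in timings])
```

Running the same script unchanged under each distribution, on the same hardware, keeps the distribution as the only variable between the timings.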
The second test we ran used a Black-Scholes benchmark on a smaller, shared box. The server had four CPUs and 16 GB of RAM.
Again, the time savings from using Intel’s Python Distribution were substantial. Even saving seven or eight minutes per experiment leads to a substantial improvement in research results. Faster results allow for more iterations and help ensure that researchers won’t be distracted and pulled away in the middle of their work. When runs are shortened from hours to just a few minutes, the difference is even more valuable.
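For readers unfamiliar with the Black-Scholes workload mentioned above, the following is a minimal vectorized sketch of the standard Black-Scholes formula for European call and put prices. It is not the benchmark code that was run; the function name `black_scholes` and the sample inputs are assumptions, and `scipy.special.erf` is used here to compute the normal CDF.

```python
import numpy as np
from scipy.special import erf


def black_scholes(spot, strike, maturity, rate, volatility):
    """Return (call, put) prices for European options.

    All arguments may be scalars or NumPy arrays of matching shape.
    Illustrative sketch under the standard Black-Scholes assumptions.
    """
    spot = np.asarray(spot, dtype=float)
    strike = np.asarray(strike, dtype=float)
    maturity = np.asarray(maturity, dtype=float)

    sqrt_t = np.sqrt(maturity)
    d1 = (np.log(spot / strike)
          + (rate + 0.5 * volatility ** 2) * maturity) / (volatility * sqrt_t)
    d2 = d1 - volatility * sqrt_t

    def norm_cdf(x):
        # Standard normal CDF expressed through the error function.
        return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

    discount = np.exp(-rate * maturity)
    call = spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    put = strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call, put


call, put = black_scholes(spot=100.0, strike=100.0, maturity=1.0,
                          rate=0.05, volatility=0.2)
print("call:", float(call), "put:", float(put))
```

Because every step is an elementwise array operation, pricing millions of options in one call spends most of its time in functions such as `log`, `exp`, and `erf`, which is the kind of math-heavy work that optimized numerical libraries target.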
Intel Python Environments Available in Domino
Domino customers can benefit from Intel’s Python Distribution right away. We’ve already created Intel Python environments in both our trial environment and our cloud production environment.