Helping Farmers Grow Crops More Efficiently through Innovation and Sustainability
Testing more seed variants, producing more seed with less land, and serving customers better while generating additional value for the company.
Data Science at Bayer
With roots tracing back more than a century, Bayer has always seen innovation as key to its mission of improving farmers’ harvests while balancing the needs of humanity with our planet’s limited resources. A world-leading provider of agricultural products, Bayer places data science at its core, supporting use cases such as maximizing crop yields, improving customer experience, and optimizing supply chain operations. The output of data science is a model, and with models built by Bayer’s 500-plus-strong data science community helping to improve more than 100 decisions, the company exemplifies what it means to be model-driven.
Bayer has adopted Domino as part of its “science@scale” data science platform to further enhance visibility and collaboration, accelerating the pace of research across hundreds of simultaneous projects and multiple business units. The platform is making a big impact on the business: combined with investments in an enterprise-wide data strategy and digital platforms, it has helped Bayer realize significant cost savings by reducing cost of goods and increasing operational efficiency.
Bayer has a multi-year research pipeline to develop new products, including seeds that maximize crop productivity and protect against insect pests, as well as herbicides needed to combat yield-robbing weeds in the field. The process is expensive and time-consuming, and there is little margin for error.
“Each year, we have one chance in each hemisphere’s growing season to collect data on the seeds we develop,” explained Naveen Singla, Data Science Center of Excellence lead at Bayer. “We manage an incredible amount of data to help produce high-quality results, but we also know there’s always opportunity to improve how we manage and leverage the data.”
The company applies highly complex models at each stage of the agricultural process, from early breeding to in-field testing, to increase the probability and pace of breakthroughs that will maximize output while conserving environmental resources.
Bayer realized early in its data science journey that managing the development, production, and ongoing improvement of models requires a different approach than the established disciplines of software engineering and data management. Bayer started developing an internal, cloud-based data science platform called “science@scale” to ingest data and provide access to widely used data science tools. While the platform sped up data analysis, the unique characteristics of models called for additional support for collaboration.
Models are built differently, and they serve a different purpose.
Unlike software engineering or data management, models (and Bayer’s business) require a research-based approach built on constant exploration, iteration, and agility. Models are probabilistic, not deterministic. The nature of data scientists’ work is experimental and collaborative; models must constantly be tracked, retrained, and iterated on to reflect changing data and other factors that lead to model drift.
Bayer had the opportunity to augment and amplify its research-based approach for even greater success across its global data science community.
Models have different ingredients.
The landscape of data science tools and technologies (i.e., the “ingredients” that go into models) is highly heterogeneous and constantly evolving. A data science platform must provide flexibility, agility, and scalability to support a dynamic tooling environment and diverse skill sets and preferences. The ability to quickly retrain, validate, and deploy models was a must for Bayer.
The science@scale solution included tools such as RStudio, Jupyter, and Flask, catering to data scientists comfortable with modern software programming paradigms. Domino has also made the big data technology stack easier to access for Bayer’s broader data science community, which has benefited Bayer’s diverse research team while delivering business value.
“We needed a platform that could abstract away complexities and allow all users to do analysis at scale, utilizing the modern tech stack and getting better insights from data,” said Singla.
The Domino Effect
Bayer leadership recognized an opportunity to enhance science@scale with Domino. Domino is a purpose-built data science platform that supports diverse tools, automates hardware infrastructure provisioning (so data scientists can run experiments in parallel and at scale), and facilitates rapid iteration and deployment of models. The critical features provided by Domino include:
- Open, flexible, and easy to use: Domino allows Bayer’s hundreds of data scientists to focus on driving innovation using their preferred hardware, software, tools, and languages (including RStudio, Python, Flask, and Shiny) under centralized management. The platform allows team members who are relatively unfamiliar with the big data technology stack to process, explore, and model data using the latest packages. Data scientists at every level are empowered to control their own environments and hardware.
- Collaboration: Domino automatically versions not just code, but entire experiments along with the data, the environments, discussion threads, and necessary artifacts, meaning work is never lost and is always reproducible. “It’s invaluable to be able to compare your current result with one from five experiments ago and see what’s changed,” said Singla. Data scientists across the globe can collaborate and build on past work rather than reinventing the wheel, and data science leadership is confident in the team’s ability to deliver business results efficiently and at scale.
- Adoption: Bayer set up Domino within a robust discovery environment in science@scale, where it accelerates model development and supports model delivery via the Domino API and Shiny apps. More than 75% of the company’s 500-strong community of data scientists now actively use Domino, and adoption continues to expand. As the team grows, expert data scientists create templates in Domino that help ingrain and share best practices with more junior colleagues.
Bayer’s Data Science Journey: Mission-Driven
Bayer’s large data science community works as a cohesive, high-performing team. They build models that both drive agricultural breakthroughs and optimize efficiencies of everyday business operations.
Digital innovations across the company, enabled by a combination of investments in data, platforms, and people, have helped Bayer realize value and deliver agricultural products to farmers around the world more efficiently.
- Using machine learning via Domino’s platform, Bayer can better understand, model, and predict the impact that seed genetics, environmental conditions, and agronomic practices have on crop yield within its supply chain operations. Yield performance depends on the interaction between the crop genotype and environmental factors (such as topography, soil, and climate conditions at the location where the crop is planted). Leveraging the platform has resulted in a significant increase in seed production yield simply by planting each product in the best zones within the company’s existing seed production network. This increase can be used to reduce either production acres or the level of uncertainty within existing production acres.
- Rather than using static models during online operations, researchers can now adapt models as updated data flows in. Rapid iteration, validation, and delivery using Domino allow them to conduct field operations more efficiently.
- Domino allows Bayer’s sales teams to access more detailed information about customers’ specific needs, in order to recommend the best products for their fields. This personalized approach improves customer success and satisfaction.
- The platform automatically tracks the full testing record for R&D projects and deploys it as APIs for consumption by downstream systems, allowing new team members to contribute immediately.
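To make the genotype-by-environment point above concrete, here is a minimal Python sketch, assuming a simple additive yield model with an interaction term. The function names (`predicted_yield`, `best_zone`), the hybrids, the zones, and every effect size are hypothetical values invented for illustration. This is not Bayer’s model or any Domino API. It shows why the interaction term matters for zone placement: with main effects alone, both hybrids would be placed in `zone_1`, but once the interaction is included, each hybrid’s best zone is different.

```python
# Hypothetical additive yield model with a genotype-by-environment (G x E) term.
# All names and numbers are illustrative assumptions, not real Bayer data.
BASELINE = 50.0
GENOTYPE_EFFECT = {"hybrid_a": 2.0, "hybrid_b": 1.5}
ZONE_EFFECT = {"zone_1": 10.0, "zone_2": 9.0, "zone_3": 8.5}
# Interaction effects; any (genotype, zone) pair not listed contributes 0.
GXE_EFFECT = {
    ("hybrid_a", "zone_1"): -1.0,
    ("hybrid_a", "zone_2"): 1.5,
    ("hybrid_b", "zone_3"): 2.5,
}


def predicted_yield(genotype: str, zone: str) -> float:
    """Baseline + genotype main effect + zone main effect + G x E interaction."""
    return (
        BASELINE
        + GENOTYPE_EFFECT[genotype]
        + ZONE_EFFECT[zone]
        + GXE_EFFECT.get((genotype, zone), 0.0)
    )


def best_zone(genotype: str, zones) -> str:
    """Return the zone with the highest predicted yield for this genotype."""
    return max(zones, key=lambda zone: predicted_yield(genotype, zone))


if __name__ == "__main__":
    for genotype in GENOTYPE_EFFECT:
        print(genotype, best_zone(genotype, ZONE_EFFECT))
    # prints "hybrid_a zone_2" then "hybrid_b zone_3"
```

A production model would estimate these effects from multi-location field trials rather than hard-coding them. The selection step, which chooses the zone that maximizes predicted yield for each product within an existing network, is the part the sketch is meant to illustrate.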
“Domino has made it easier for users across the global enterprise, using different tools and with varied backgrounds and skill sets, to work with each other, leverage past work, and collaborate quickly. This ultimately results in more models being delivered and deployed in a shorter window of time, which is empowering Bayer to be a model-driven company that’s at the forefront of farming,” Singla said.
Industry: Agriculture / Biotech
Headquarters: St. Louis, MO
Use Cases:
- Product and crop yield optimization
- Customer segmentation and churn prediction
- Supply chain efficiencies
Results:
- Significant increase in supply chain seed production yield
- Collaboration among a diverse team with varied skill sets and tools of choice
- Significant value realized with digital solutions
- 500-strong data science community embedded across multiple global lines of business
Data Science Tool(s): RStudio, Python, Flask, Shiny
Server / Cloud Infrastructure: AWS