What is vibe modeling? Code less, analyze more
Etan Lightstone | 2025-07-10 | 10 min read
Vibe modeling: The future of high-impact data science
If you're a data scientist, you are likely already using AI coding tools to accelerate parts of your workflow. But as someone who has been focused on this space for years, I’ve seen that writing the initial code is only a tiny part of the data science job. The real bottleneck, and where the most time is spent, is the iterative process of experimentation: everything from finding, analyzing, and transforming data to running countless variations of a model. At Domino Data Lab, we’re passionate about helping teams build AI models more quickly and safely, and we have over a decade of experience doing this for large enterprises.
This is where the paradigm shifts. Yes, we can use natural language to have a coding assistant generate a model architecture. But the real opportunity is in describing the predictive outcomes we want and involving an AI assistant not only in the development of our models, but in their experimentation and evaluation as well. What if we could accelerate that entire process in a safe, reproducible, and enterprise-ready way? That is the vision behind vibe modeling, and it's what we are building in the open at Domino Data Lab.
Vibe modeling is an approach to data science that leverages AI assistance throughout the complete model development lifecycle, not just initial coding. Rather than simply generating code from prompts, it also involves AI in the iterative experimentation process, helping with data transformations, model experimentation, and evaluation cycles. This enables data scientists to describe their analytical intent and desired outcomes while AI accelerates the time-consuming experimental phases of model building.
Where data science really happens
Unlike traditional software development, our work is defined by constant workflow interruptions. We are always switching between high-level thinking about model optimization and low-level details like fixing syntax errors or writing boilerplate code. Think about the effort required to clean messy data, set up cross-validation, or adjust model hyperparameters.
Existing tools like notebooks and code libraries help, but they still require us to manually translate our analytical thinking into exact code instructions. Vibe modeling changes this relationship. Instead of specifying how to code something, you can simply describe what you want to accomplish. This fundamentally alters the nature of data science work, allowing you to spend more time on analytical thinking and domain expertise, and less time wrestling with implementation details.
From instruction to conversation: A practical example
This approach allows you to express complex model architectures using natural language. For example, instead of writing out the Python for a static model, you can ask for a flexible starting point:
“Create a PyTorch model architecture for this classification task. Make it consist of four layers with configurable hidden dimensions, dropout rate, and activation function. Also, create a training script that takes arguments for batch size, number of epochs, and learning rate.”
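What comes back might look something like the following. This is a minimal sketch of the kind of code such a prompt could produce; the class name, defaults, and dimensions here are illustrative assumptions, not actual assistant output.

```python
import torch.nn as nn

class ConfigurableClassifier(nn.Module):
    """Four-layer MLP with configurable widths, dropout rate, and activation."""

    def __init__(self, in_features, num_classes,
                 hidden_dims=(128, 64, 32), dropout=0.2, activation=nn.ReLU):
        super().__init__()
        dims = [in_features, *hidden_dims]
        layers = []
        for d_in, d_out in zip(dims, dims[1:]):
            # Three hidden blocks: linear -> activation -> dropout
            layers += [nn.Linear(d_in, d_out), activation(), nn.Dropout(dropout)]
        layers.append(nn.Linear(dims[-1], num_classes))  # fourth (output) layer
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```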
Code along those lines comes back in seconds, and you would normally review, adjust, and run it yourself. However, if we pair our coding agent with Domino’s MCP Server add-on, we can involve it in the full training, evaluation, and optimization loop using a prompt like this:
“Understand how the training script works and use it to train a small neural network for the diabetes model. Iterate on the model parameters or architecture as needed until the model is optimized enough. Explain the results when you are finished.”
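For context, the training script that prompt refers to might expose its knobs through a small CLI like this. The flag names and defaults are assumptions for illustration; the essential point is that each attempt the agent runs is just another invocation with different arguments.

```python
# train.py -- sketch of a training entry point an agent could drive
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Train the diabetes classifier")
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning-rate", type=float, default=1e-3)
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    # ... load data, build the model, and run the training loop here ...
    print(f"epochs={args.epochs} batch={args.batch_size} lr={args.learning_rate}")
```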
This conversational approach streamlines the iterative process of experimenting with different architectures and parameters, all executed inside the Domino platform via the MCP Server. You can quickly refine the network or coach it further by saying, "run another training attempt with a bigger model and a reduced learning rate," and the assistant may come back with its own suggestions, such as adding a residual block or switching from a neural net to a random forest.
Crucially, you always maintain complete control. You review the generated code, make adjustments when needed, and ensure the analysis meets your standards. It’s an approach that makes complex analysis more accessible while keeping data scientists and ML engineers firmly in charge of the analytical direction.
This is more than just code generation. Your prompts create a clear record of your analytical decisions. For maximum reproducibility and shareability, you could even export the prompt history into the code repo or log your prompts in the experiment runs themselves. In this new workflow, the prompts become the new IP, arguably more important than the code itself.
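As a sketch of what logging a prompt alongside a run could look like, assuming an MLflow-style tracking API:

```python
import mlflow

PROMPT = (
    "Understand how the training script works and use it to train a small "
    "neural network for the diabetes model. Iterate on the model parameters "
    "or architecture as needed until the model is optimized enough."
)

with mlflow.start_run(run_name="vibe-modeling-attempt-1"):
    # Store the exact prompt next to the run's metrics and parameters,
    # so the analytical intent is versioned along with the results
    mlflow.log_text(PROMPT, artifact_file="prompts/attempt_1.txt")
    mlflow.set_tag("prompt_driven", "true")
    # ... training and metric logging happen here as usual ...
```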
Why an integrated platform is essential
This new, agentic way of working raises an interesting question: how can we hand the keys over to a coding LLM while ensuring everything we do is explainable and reproducible?
This is where a platform with built-in experiment tracking, reproducibility, and standardized, shareable environments becomes essential. Using vibe modeling with Domino’s Enterprise AI Platform gives you both the speed of agentic iteration and the guardrails of a governed platform: every iteration an AI co-pilot performs is automatically tracked and reproducible in Domino. This addresses the core challenge of governance in the AI era.
This is the value of an integrated platform. It provides a secure, IT-approved environment connected to enterprise assets and governed by quality and regulatory policies. This ensures the work data scientists do via vibe modeling is more than just a pilot; it is ready for production. This means moving beyond the constraints of no-code options, which can create non-portable solutions and vendor lock-in, and away from proprietary languages with long learning curves. Vibe modeling is about making data science open, frictionless, and impactful.
Putting it to the test: Model refinement and troubleshooting
Let's walk through a diabetes prediction project using a CSV dataset to train a classification model. After I asked the AI assistant to perform exploratory data analysis and data transformations, it generated the required code and ran it as a Domino job. It correctly identified a class imbalance and significant feature correlations. It then handled outliers, created derived features for exercise and body weight, and performed one-hot encoding and feature scaling. Like any Domino job, this was fully tracked and reproducible on the platform.
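Those transformations boil down to a few lines of pandas and scikit-learn. The sketch below shows one plausible version; the file name, column names, and outlier strategy are hypothetical stand-ins, not the code the assistant actually produced.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("diabetes.csv")  # hypothetical file name

# Diagnose the issues the assistant flagged (assumes a 0/1 "diabetes" target)
print(df["diabetes"].value_counts(normalize=True))            # class imbalance
print(df.corr(numeric_only=True)["diabetes"].sort_values())   # correlations

# Handle outliers by clipping to the 1st/99th percentiles (one common strategy)
numeric_cols = df.select_dtypes("number").columns.drop("diabetes")
df[numeric_cols] = df[numeric_cols].clip(
    lower=df[numeric_cols].quantile(0.01),
    upper=df[numeric_cols].quantile(0.99),
    axis=1,
)

# Derived feature combining body weight and exercise (illustrative definition)
df["bmi_x_activity"] = df["bmi"] * df["weekly_exercise_hours"]

# One-hot encode categoricals, then scale the numeric features
df = pd.get_dummies(df, columns=df.select_dtypes("object").columns.tolist())
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
```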
Next, I used a simple prompt asking the assistant to understand our training script and optimize the model. The assistant, using its configured tools, automatically started running training attempts as Domino jobs.
The first attempt, with 10 epochs and a small model, achieved a respectable 96% accuracy. For the second run, the assistant decided to use more epochs, a bigger model, and a reduced learning rate, which resulted in a higher accuracy.
This is where the process became a true partnership. The assistant noted it was worried about overfitting and proposed a code change to the model architecture to increase the dropout rate. Following that, it made one last change to the model code, adding weight decay to the Adam optimizer. The final attempt was a good candidate, achieving over 97% accuracy.
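Both fixes the assistant proposed map to small, reviewable code edits like these. The dropout value, layer sizes, and weight decay coefficient below are illustrative, not the assistant's actual numbers.

```python
import torch
import torch.nn as nn

# Attempt 3: raise the dropout rate to curb the overfitting the assistant flagged
model = nn.Sequential(
    nn.Linear(16, 128), nn.ReLU(), nn.Dropout(0.4),  # e.g., raised from 0.2
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.4),
    nn.Linear(64, 2),
)

# Attempt 4: add weight decay (L2 regularization) to the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-4)
```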
The future of data science starts today
This approach is about changing the focus of our work. By reducing time spent on implementation, data scientists can concentrate on domain understanding, problem formulation, and interpreting results — the activities that create the most value. Effective prompting is becoming an important professional skill, and those who get comfortable with this way of working now will have an advantage as these tools become more common.
For Domino customers, this future starts today. We have released our open source MCP Server, Cursor rules, and a few suggested prompts for Domino users to leverage, all at https://github.com/dominodatalab/domino_mcp_server. As you get more sophisticated, there's a lot more opportunity to prime your coding agent for your specific needs. For example, consider drafting additional Cursor rules for how you'd like the agent to experiment with different model architectures or feature engineering approaches, increasing model complexity only when the accuracy gains justify it.
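As one hypothetical example of what such a rule might say (the wording and structure below are mine, not from the repository linked above):

```text
# Cursor rule (sketch): how to run model experiments

When asked to optimize a model:
1. Start with the smallest architecture that trains in a few minutes.
2. Change one thing per attempt (layer width, learning rate, or regularization)
   and run every attempt as a Domino job so it is tracked and reproducible.
3. Increase model complexity only if the previous attempt's validation
   accuracy improved meaningfully; otherwise revisit the features.
4. Summarize each attempt's parameters and metrics before proposing the next.
```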
This is part of our commitment to innovating in the open, building on our platform to define what’s next for data science. I'm looking forward to seeing what the community can do by working with AI agents that leverage Domino's platform capabilities.
A product design leader specializing in building and leading teams, Etan Lightstone focuses on shaping design strategy and vision for AI, MLOps, and data science software. As the VP, Head of Product Design at Domino Data Lab, he leverages a hybrid background in Design and Software Engineering to guide his team and design software experiences. Prior to Domino, he held key product design leadership roles at New Relic, Inc., and ShiftLeft, a cybersecurity company.