New! Domino Nexus now supports versioned file storage with Domino Datasets
Leila Nouri2024-11-21 | 5 min read
Domino Nexus hybrid and multicloud solution lets enterprises move compute closer to their data, to accommodate data gravity, avoid the time and costs of data transfers, and restrict access to data by region in order to protect data locality for compliance. And a Domino Nexus deployment consists of a “control plane,” which is a Kubernetes cluster hosting Domino platform services, and many “data planes” which are distinct Kubernetes clusters that run a small set of Domino services and are used for executing user workloads.
Previously, customers would only use local dataplanes to access larger files on-premises, using Domino Datasets. Now, customers can leverage Domino Nexus remote dataplanes with Domino Datasets, to store more data (beyond 8GB) and larger files (beyond the limitation of 10,000 files), and access files faster in public clouds.
Data science teams benefit by having the visibility, permissioning, reproducibility, and traceability of datasets in remote planes, and IT teams benefit by maintaining control over where the data resides, without having to move it on-premises (to a local data plane), or find other ad-hoc methods to enable data scientists to access that data.
Nexus customers can now take advantage of Datasets to build and share smaller curated datasets for customers and stakeholders, easily track production and testing states of their data, track data changes and enable reproducibility with snapshots, and automatically pipe data from external sources into Domino.
Store larger files and access them faster without moving data
Domino Datasets do not restrict the number of files you can store in a Domino Dataset; there is no limit to the size of any individual file stored in a Domino Dataset or switch projects to use from Datasets to EDVs, which can lack granular permissioning that is often a prerequisite for traceability, reproducibility and compliance.
Second, Domino Datasets are mounted to executions as networked filesystems, removing the need to transfer their contents to the executor when starting a run or workspace. This eliminates the need to pull the data from elsewhere during the execution, using the API — now all data is readily available, and no longer needs to be moved.
Granular permissioning for traceability
Datasets have their own independent permissioning that can be leveraged to make sure that only the correct data gets accessed by the correct person, helping IT maintain data security and privacy.
Build smaller, curated datasets for consumers
Using Domino Datasets, you can now build multiple curated collections of shareable data in your project, sharing data with members of your team and those who consume your projects receive the entire content of your project files in their Runs and Workspaces — allowing you to expose smaller, curated data within large projects, to your consumers.
Track data changes with snapshots
When your project requires reproducible data, you can create Dataset snapshots that are immutable and versioned, so you have a consistent view of data, can track data changes, ensure reproducibility for compliance, and perform analysis on historical data.
Track production and testing states of your data
If you have a Dataset that is being used by downstream consumers for critical work, tagging allows you to continue to improve, process, and experiment with new Snapshots without impacting those consumers. When you have improved data ready for use, you can switch which Snapshot is tagged, and your tag consumers will automatically start getting your new data.
Automatically pipe data from external sources into Domino
Suppose you have data stored in an external data source that is periodically updated and you have derived data that is an input into a downstream processing step in a pipeline. You can fetch the latest state of that file once per week and load it into a Domino Dataset with a scheduled job.
Takeaways
Domino Datasets for Nexus remote dataplanes optimizes data management by enabling faster access to larger files without moving data. With features like immutable snapshots, dataset tagging, and automated data imports, Domino empowers teams to manage, share, and update their datasets seamlessly, improving overall efficiency in data-driven projects. This flexibility allows enterprises to store and access vast amounts of data in both local and remote dataplanes, reducing costs and improving compliance. For more insights, check out the latest Domino Datasets demo and see it in action.
Leila Nouri, Director of Product Marketing at Domino Data Lab, is an innovative and data-driven product marketing leader with 15+ years of experience building high-performing teams, go-to-market campaigns, and new revenue streams for startups and Fortune 500 companies.