Domino Expands Unified Data Access and Governance
By Tim Law | 2023-10-31 | 7 min read
In today's fast-paced world of data-driven decision-making, having a streamlined and efficient approach to managing data is essential. That's why we're excited to announce the expansion of Domino's Unified Data Access and Governance capabilities. At Domino, we understand data's critical role in every data science project, especially the requirement for large volumes of unstructured data for Generative AI. We aim to provide a comprehensive interface to access and govern your data, no matter where it resides.
Access All Your Data Anywhere
Data is the lifeblood of any data science project. It's the fuel that powers insights and innovation. However, data can be scattered across an organization in various formats and locations. Domino Data Sources provide secure, shareable, and authenticated access to data platforms from all major vendors, spanning databases, data warehouses, data lakes, and data lakehouses. To enhance flexibility, Domino Data Sources offer multiple methods for teams to connect quickly and easily to enterprise data anywhere.
Domino Data Source Connectors
Our Domino Data Source Connectors make it easier to connect to popular data services such as IBM DB2, Netezza, SAP HANA, Databricks, and more. Whether you're working with data in the cloud or on-premises, Domino Data Source Connectors streamline the setup process, eliminating the hassle of manual configurations and driver installations. These connectors let you access data from various sources seamlessly within a single interface. The Domino Platform provides complete credential management, data source sharing, and audit logging capabilities. To meet data governance needs, Domino Data Sources support multiple authentication methods, such as username and password, API keys, and OAuth integration.
Domino Data Sources can be configured to use either a service account that only the administrator is aware of or use each user's credentials. Once created, Domino Data Sources can be shared across all users on the platform or a select set of users, enabling them to reuse the same data source, thus preventing connection sprawl. All access to data sources is audited and logged so that admins can monitor and analyze all accesses for compliance with organizational data access policies.
Data Connectors for External Data
But what if you're working with a data source that isn't on our predefined list? No worries. Domino allows you to connect directly to any data service using the same code you'd use in your local environment. This flexibility means that there are virtually no constraints on how you can access your data.
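As a minimal sketch of what "the same code you'd use locally" can look like, the example below opens a connection with a standard Python DB-API driver and runs a parameterized query. It uses the built-in `sqlite3` module purely as a stand-in for whichever vendor driver you would normally install; the `EXAMPLE_DB_PATH` variable, the `orders` table, and both function names are hypothetical and chosen only for illustration.

```python
import os
import sqlite3  # stand-in for a vendor driver such as psycopg2 or pyodbc


def open_connection(env=None):
    """Open a DB-API connection using settings read from the environment."""
    env = os.environ if env is None else env
    # EXAMPLE_DB_PATH is a hypothetical variable name; fall back to an
    # in-memory database so the sketch runs anywhere.
    db_path = env.get("EXAMPLE_DB_PATH", ":memory:")
    return sqlite3.connect(db_path)


def top_orders(conn, limit=3):
    """Return the largest orders as (name, total) tuples."""
    cursor = conn.execute(
        "SELECT name, total FROM orders ORDER BY total DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()
```

Because the connection details come from the environment rather than from the code, the same script can run unchanged on a laptop and in a Domino workspace, provided the driver is installed in both places.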
Domino data sources have a global scope within a deployment and are accessible to anyone with the appropriate permissions in any project. You can explicitly add data sources to a project, or they can be implicitly included when a data source is used directly in the project's code.
Domino Data Sets
Domino offers Data Sets: read/write managed folders that can be versioned with snapshots. These Data Sets can be shared within Domino and are ideal for use cases where you need training data that can't easily be shared or controlled outside the platform. Additionally, they are perfect for storing derived data used in downstream processing steps within a data pipeline.
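Because a Data Set behaves like an ordinary read/write folder, storing derived data can be as simple as writing a file. The sketch below assumes the Data Set is mounted at a directory you pass in; the mount location depends on your deployment and project type, so a path such as `/domino/datasets/local/my-dataset` is an assumption, and the function name is hypothetical.

```python
import csv
from pathlib import Path


def write_derived_rows(dataset_dir, filename, header, rows):
    """Write derived rows as a CSV file inside a mounted Data Set folder."""
    target = Path(dataset_dir) / filename
    # Create intermediate folders if the Data Set does not have them yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return target
```

A downstream pipeline step can then read the same file from the Data Set, and a snapshot taken afterward preserves that version of the derived data.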
External Data Volumes
External Data Volumes (EDVs) represent network-attached storage systems that Domino can register and attach to projects. When an EDV is attached to a project, Domino mounts its file system when running code in that project. This capability is especially valuable for exposing your existing IT data storage within Domino.
Project Artifacts
Project Artifacts are stored in a version-controlled folder, similar to Git LFS, within the Domino File System (DFS). This is where you can store various outputs, including charts, serialized model files, and output CSVs. Having these artifacts stored within Domino ensures version control and easy access.
Use the Domino Data API
After properly configuring a data source, use the Domino Data API to retrieve data without installing drivers or data source-specific libraries. The auto-generated code snippets in your workspace are based on the Domino Data API. The API supports tabular and file-based data sources and is available for many popular data sources, including Snowflake, Oracle, PostgreSQL, S3, Redshift, and more.
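To make the retrieval step concrete, here is a minimal sketch of querying a tabular data source and listing files in an object store. It relies on `DataSourceClient`, `get_datasource`, `query`, `to_pandas`, and `list_objects` as I understand them from the Data API's Python package, so check the names against the snippets generated in your workspace. The helper functions and the data source names in the comments are hypothetical.

```python
def open_datasource(name):
    """Return a handle to a configured Domino Data Source by name."""
    # Imported lazily so the helpers below can be loaded outside Domino,
    # where the Data API package may not be installed.
    from domino_data.data_sources import DataSourceClient

    return DataSourceClient().get_datasource(name)


def query_to_dataframe(datasource, sql):
    """Run SQL against a tabular data source and return a pandas DataFrame."""
    return datasource.query(sql).to_pandas()


def list_object_keys(datasource, prefix=""):
    """Return the keys in a file-based data source that start with prefix."""
    # Filtering happens client-side here to keep the sketch simple.
    return [
        obj.key for obj in datasource.list_objects() if obj.key.startswith(prefix)
    ]


# Inside a Domino workspace (data source names are hypothetical):
#   warehouse = open_datasource("sales-warehouse")
#   frame = query_to_dataframe(warehouse, "SELECT * FROM orders LIMIT 10")
#   bucket = open_datasource("raw-files")
#   keys = list_object_keys(bucket, prefix="2023/")
```

No database driver is imported anywhere in this code; the data source handle carries the connection details that were set up when the data source was configured.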
The API supports Python and R. The Data API comes pre-packaged in the Domino Standard Environment (DSE). If you are using a custom environment that doesn’t have the Data API, you can install it.
The Data API’s data source client uses environment variables available in the workspace to automatically authenticate your identity. You can override this behavior using custom authentication.
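The sketch below illustrates both halves of that behavior under stated assumptions. It assumes the workspace exposes credentials through the `DOMINO_TOKEN_FILE` and `DOMINO_USER_API_KEY` environment variables, and that the client accepts `api_key` and `token_file` keyword arguments for custom authentication. The order in which the variables are checked is my own choice for illustration, not a statement about the client's internals, and both function names are hypothetical.

```python
import os


def detect_auth_source(env=None):
    """Report which workspace credential is available, if any."""
    env = os.environ if env is None else env
    # Assumed variable names; the check order is illustrative only.
    if env.get("DOMINO_TOKEN_FILE"):
        return "token_file"
    if env.get("DOMINO_USER_API_KEY"):
        return "api_key"
    return "none"


def make_client(api_key=None, token_file=None):
    """Build a data source client, optionally with explicit credentials."""
    # Imported lazily so this sketch can be loaded outside Domino.
    from domino_data.data_sources import DataSourceClient

    overrides = {}
    if api_key is not None:
        overrides["api_key"] = api_key  # assumed keyword argument
    if token_file is not None:
        overrides["token_file"] = token_file  # assumed keyword argument
    # With no overrides, the client falls back to the workspace environment.
    return DataSourceClient(**overrides)
```

Calling `detect_auth_source()` before building a client is a quick way to see why authentication fails when code runs outside a Domino workspace.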
Domino Feature Store
One of our most exciting additions is the Domino Feature Store. This feature allows you to create, publish, and share features in a global registry. It leverages the Feast open-source feature store library and offers additional installation, feature cataloging, search, and reuse capabilities. With the Domino Feature Store, you can create and publish features others can quickly discover and incorporate into their projects. This is an invaluable asset for collaboration and efficiency.
Data Reproducibility and Security
Domino places a strong emphasis on data reproducibility. You can create Dataset snapshots that are immutable and versioned, ensuring that your data remains consistent across projects. Additionally, the Data API enables you to create training sets based on your data from any location.
Data security is paramount. Domino controls who can access and edit data, making it easy to collaborate securely. We automatically create audit trails to monitor data source activity, ensuring individual accountability, facilitating the reconstruction of events, and aiding in intrusion detection and problem analysis. This feature also helps in ensuring compliance with industry regulations.
In conclusion, Domino's expanded Unified Data Access and Governance capabilities offer data scientists and teams the tools to access, manage, and govern their data efficiently. Whether you're working with structured databases, unstructured files, or external data sources, Domino provides a unified, secure, and streamlined solution for your data needs. We are excited about the possibilities this expansion brings to the data science community and look forward to helping you make the most of your data. Stay tuned for more updates as we continue to innovate and improve our platform.
Learn more about working with data in Domino.
Tim Law, Director of Product Marketing at Domino Data Lab, is an innovative technology marketer with 20+ years of experience in data analytics, AI, and machine learning.