Data Mesh in short
The concept of Data Mesh is relatively new: it was introduced in 2019 by Zhamak Dehghani, a pioneer in managed data decentralization. Data Mesh is an enterprise data architecture that, in contrast to traditional “monolithic”, centralized data lakes and warehouses, takes an intentionally distributed approach. It is meant especially for large and complex organizations dealing with big data.
Dehghani’s key message was that implementing data warehouses or lakes as big, centralized structures has not unleashed the true value of the data. Instead, it has produced large, complex and expensive data architectures full of delivery bottlenecks, especially in large organizations with rich domains, many sources and a diverse set of consumers.
To harness the full potential of data, the Data Mesh approach advocates distributing data ownership and governance to individual domain teams, enabling them to take ownership of their data and work in an agile way. The four core principles of Data Mesh are domain thinking, data product thinking, self-serve data platforms, and federated data governance. Let’s dig a bit deeper into these core principles.
Domain Thinking
Domain thinking is a fundamental aspect of Data Mesh. It involves aligning data infrastructure, tools, and processes with specific business domains rather than treating data as a monolithic entity. Each domain team becomes responsible for its data products, including data collection, processing, storage, and analytics.
This approach promotes a deep understanding of domain-specific data requirements, leading to better insights and faster decision-making. In large organizations, a single platform or data integration team becomes a bottleneck and hinders getting business value out of data. Integration work also requires expertise in the data itself, which is hard for a small, centralized team to build up when the organization has a large number of data sources. Distributing responsibility to domains follows the way business domains are naturally divided in organizations. Data domains and the teams around them should be long-lived.
Data Products
Data Mesh introduces a product-oriented mindset to data management. Each domain team treats its data products as assets and focuses on delivering products that support the specific needs of their users. This approach encourages teams to think beyond data pipelines and storage and to consider the end-to-end data product lifecycle, including data discovery, documentation, accessibility, and continuous improvement, so that the products bring long-term value to the business. The customers of the data products delivered by the domain teams can be data scientists, data engineers or business users within the organization. Data products can be, for example, APIs, reports, tables or datasets. When needed, data can also be shared between domain teams through data products.
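To make the product mindset a bit more concrete, here is a minimal sketch of the metadata a domain team might publish alongside a data product. The class names, fields and the discoverability rule are hypothetical illustrations, not part of Data Mesh itself or of any specific tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class OutputPort(Enum):
    """Forms a data product can take, following the examples in the text."""
    API = "api"
    REPORT = "report"
    TABLE = "table"
    DATASET = "dataset"


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes for one data product."""
    name: str
    domain: str                      # owning business domain
    owner: str                       # accountable team or contact
    description: str                 # documentation aimed at consumers
    output_port: OutputPort          # how consumers access the product
    consumers: tuple[str, ...] = ()  # known consumers, e.g. other domains

    def is_discoverable(self) -> bool:
        # Assumed rule for this sketch: a product counts as discoverable
        # once it names an owner and carries a description.
        return bool(self.owner.strip()) and bool(self.description.strip())


orders = DataProduct(
    name="daily_orders",
    domain="sales",
    owner="sales-data-team",
    description="One row per confirmed order, refreshed daily.",
    output_port=OutputPort.TABLE,
    consumers=("finance", "marketing"),
)
```

The point of the sketch is that ownership, documentation and the access form travel with the product itself, rather than living only in the heads of a pipeline team.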
Self-Serve Data Platform
Data Mesh encourages the creation of self-serve data infrastructure as a platform for the domain teams within the organization. Domain teams have the autonomy to choose and use the data storage, processing, and analysis tools they need to deliver successful data products, but managing the underlying technical infrastructure is not their job. That job belongs to a centralized platform team, which is responsible for creating a domain-agnostic infrastructure platform that lets domain teams create their data products with low lead time. Automation capabilities are one of the key features of the platform.
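The division of labour between the platform team and the domain teams can be sketched roughly as follows. This is only an illustration under simple assumptions: the `SelfServePlatform` class, the `WorkspaceRequest` type and the supported options are invented for the example and do not correspond to any real product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceRequest:
    """What a domain team asks for: choices, not infrastructure details."""
    domain: str
    storage: str
    compute: str


class SelfServePlatform:
    """Hypothetical domain-agnostic platform run by the central platform team."""

    # Options the platform team has decided to support (assumed values).
    SUPPORTED_STORAGE = {"lakehouse", "warehouse"}
    SUPPORTED_COMPUTE = {"sql", "spark"}

    def __init__(self) -> None:
        self._workspaces: dict[str, WorkspaceRequest] = {}

    def provision(self, request: WorkspaceRequest) -> str:
        """Automated provisioning: validate the request and register a workspace."""
        if request.storage not in self.SUPPORTED_STORAGE:
            raise ValueError(f"unsupported storage: {request.storage}")
        if request.compute not in self.SUPPORTED_COMPUTE:
            raise ValueError(f"unsupported compute: {request.compute}")
        workspace_id = f"{request.domain}-{request.storage}-{request.compute}"
        self._workspaces[workspace_id] = request
        return workspace_id


platform = SelfServePlatform()
sales_workspace = platform.provision(WorkspaceRequest("sales", "lakehouse", "spark"))
```

The domain team only states what it needs; the same automated path serves every domain, which is what keeps lead times low.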
Federated Data Governance
Data governance plays a crucial role in Data Mesh. In a distributed domain approach it is very important to make sure that we don’t fall back into creating silos and duplicating data, turning the enterprise data architecture into a wild west.
As a recap, data governance is a set of processes, policies, and guidelines that ensure the effective and secure management of an organization’s data assets. Instead of relying solely on a centralized data governance model, Data Mesh promotes a federated data governance approach. The federated approach means that a set of data governance guidelines and standards is defined and managed centrally in the organization, and each data domain team must comply with these rules. However, each domain is free to decide how it will comply with the governance rules in practice, taking into account domain-specific requirements.
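The federated idea, central rules with local freedom in how to meet them, can be sketched like this. The rule names, metadata fields and classification levels below are assumptions made up for the illustration.

```python
from typing import Callable, Dict, List

# A central rule inspects a product's metadata and returns True when it complies.
Rule = Callable[[Dict[str, object]], bool]

# Defined and managed centrally (hypothetical rules for this sketch).
CENTRAL_RULES: Dict[str, Rule] = {
    "has_owner": lambda p: bool(p.get("owner")),
    "has_classification": lambda p: p.get("classification")
    in {"public", "internal", "confidential"},
    "has_schema": lambda p: bool(p.get("schema")),
}


def violations(product: Dict[str, object]) -> List[str]:
    """Return the names of the central rules that a product does not satisfy."""
    return [name for name, rule in CENTRAL_RULES.items() if not rule(product)]


# Each domain decides *how* to comply, e.g. which storage it uses.
sales_product = {
    "name": "daily_orders",
    "owner": "sales-data-team",
    "classification": "internal",
    "schema": {"order_id": "string", "amount": "decimal"},
    "storage": "lakehouse table",  # domain-specific choice, not governed centrally
}
hr_product = {
    "name": "headcount",
    "owner": "hr-data-team",
    "schema": {"employee_id": "string"},
    "storage": "warehouse view",   # a different choice, equally allowed
}

print(violations(sales_product))  # []
print(violations(hr_product))     # ['has_classification']
```

Note that the central rules say nothing about storage or tooling: those stay with the domain, while ownership, classification and schema are checked the same way everywhere.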
It is important to make the distinction between dusty data silos and the decentralized data domains in the data mesh. Data silos refer to isolated data storage where data is stored within individual teams or departments in an organization. Each silo typically has its own data formats, definitions, and access controls, making it challenging to share and integrate data across different silos. This results in data duplication, inconsistencies, and limited data accessibility, hindering collaboration and a holistic view of data across the organization.
The key difference between data silos and decentralized data domains lies in their approach to data management and governance. Data silos isolate data within specific teams or departments, which leads to fragmentation and limited data sharing. Decentralized data domains, in contrast, emphasize a culture of collaboration within the organization: they follow standardized common practices while keeping the autonomy to define their data products, data schemas, and access controls in the way that best supports their use cases.
Fabric & Data Mesh
Ok, so now we know what a Data Mesh is, but how does it relate to Microsoft Fabric? It is important to remember that Data Mesh itself is not a technology, nor is it coupled with any tech provider. It is an architectural paradigm that can be implemented in many ways, and multiple paths can lead to a Data Mesh. Currently, the Fabric Preview makes it possible to organize data into domains, which supports the domain thinking of Data Mesh. Federated governance capabilities are expected to be enabled in future releases. More generally, Microsoft’s data architecture guidance now presents Data Mesh as one possible approach and gives implementation instructions for the technical side as well as for the concept and change management perspective. Data Mesh is here to stay as one of the enterprise data architectures.
Conclusion
Even though Fabric eases the technical requirements of Data Mesh, no tool is going to do the actual groundwork of setting up the working methods, defining and organizing the teams, and generally tuning the mindset of the organization to the Data Mesh frequency. Changing the way people work is never an easy task. Organizations don’t necessarily need to start building a Data Mesh from scratch. Maybe you have a solid existing implementation, and you only change the way it is governed, managed, and developed, and by what kind of teams.
It is also important to remember that Data Mesh is not always the best approach, as it requires autonomous domain teams that can work independently. The biggest benefits of Data Mesh are achieved in large and complex organizations with a rich data landscape. For smaller organizations, a single centralized team might be a better alternative from the team setup perspective.
But still, it is not a waste of time to understand the concepts of product thinking, general technical requirements of a data mesh platform or the importance of data governance.
References:
Martin Fowler: How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Medium: Why We Started Nextdata
Mesh-AI: Data Mesh 101: Why Federated Data Governance Is the Secret Sauce of Data Innovation
Microsoft: What is Data Mesh
Microsoft: Design considerations for self-serve data platforms
Microsoft: Domains (preview)