Data mesh is a sociotechnical approach to building a decentralized data architecture using a domain-oriented, self-serve design (from a software development perspective). It borrows from Eric Evans’ theory of domain-driven design[1] and Manuel Pais’ and Matthew Skelton’s theory of team topologies.[2] Data mesh is concerned primarily with the data itself and treats the data lake and the pipelines as secondary concerns.[3] Its main proposition is scaling analytical data through domain-oriented decentralization.[4] With data mesh, the responsibility for analytical data shifts from the central data team to the domain teams, supported by a data platform team that provides a domain-agnostic data platform.[5] This reduces data disorder and isolated data silos, because a centralized system ensures that fundamental principles are shared consistently across the nodes of the data mesh and allows data to be shared across different areas.[6]
History
The term data mesh was first defined by Zhamak Dehghani in 2019[7] while she was working as a principal consultant at the technology company Thoughtworks.[8][9] She provided greater detail on its principles and logical architecture throughout 2020. The approach was predicted to be a “big contender” for companies in 2022.[10][11] Data meshes have been implemented by companies such as Zalando,[12] Netflix,[13] Intuit,[14] VistaPrint, PayPal[15] and others.
In 2022, Dehghani left Thoughtworks to found Nextdata Technologies to focus on decentralized data.[16]
In addition to the core data mesh principles, Dehghani writes that the data products created by each domain team should be discoverable, addressable, trustworthy, self-describing in their semantics and syntax, interoperable, secure, and governed by global standards and access controls.[19] In other words, the data should be treated as a product that is ready to use and reliable.[20][21]
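As an illustration only, the properties listed above can be pictured as fields of a descriptor that a domain team publishes alongside its data. The following minimal Python sketch is not taken from Dehghani's writing or from any data mesh tool: the `DataProduct` class, its field names, the `mesh://` address scheme and the `missing_properties` helper are all hypothetical. It checks only the properties that can be read off metadata (discoverable, addressable, self-describing, secure); trustworthiness, interoperability and governance would need checks against real data and policies.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor published by a domain team for one data product."""
    name: str
    domain: str
    owner: str = ""            # team accountable for the product
    address: str = ""          # stable, unique location (assumed URI scheme)
    schema: dict = field(default_factory=dict)  # column name -> type (syntax)
    description: str = ""      # plain-language meaning of the data (semantics)
    access_policy: str = ""    # name of a globally defined access policy


def missing_properties(product):
    """Return the listed properties that the descriptor does not yet document."""
    checks = {
        "discoverable": bool(product.name and product.domain and product.owner),
        "addressable": bool(product.address),
        "self-describing": bool(product.schema and product.description),
        "secure": bool(product.access_policy),
    }
    return [prop for prop, ok in checks.items() if not ok]


# Example: a descriptor that has no access policy assigned yet.
orders = DataProduct(
    name="orders",
    domain="sales",
    owner="sales-team",
    address="mesh://sales/orders",
    schema={"order_id": "string", "amount": "decimal"},
    description="One row per confirmed customer order.",
)
print(missing_properties(orders))
```

In this example only the access policy is absent, so the helper reports `secure` as the one undocumented property.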
In practice
After its introduction in 2019,[7] multiple companies started to implement a data mesh[12][14][15] and share their experiences. Challenges (C) and best practices (BP) for practitioners include:
C1. Federated data governance
Companies report difficulties adopting a federated governance structure for activities and processes that were previously centrally owned and enforced. This is especially true for security, privacy, and regulatory topics.[22][23][24]
C2. Responsibility shift
In a data mesh, individuals within domains are responsible for data products end to end. This new responsibility can be challenging because it is rarely compensated and usually benefits other domains.[22][23]
C3. Comprehension
Research has shown a severe lack of comprehension of the data mesh paradigm among employees of companies implementing a data mesh.[22]
BP1. Cross-domain unit
Addressing C1, organizations should introduce a cross-domain steering unit responsible for strategic planning, use case prioritization, and the enforcement of specific governance rules—especially concerning security, regulatory, and privacy-related topics. Nevertheless, a cross-domain steering unit can only complement and support the federated governance structure and may grow obsolete with the increasing maturity of the data mesh.[22][25]
BP2. Track and observe
Addressing C2, organizations should observe and score data product quality, because tracking and ranking key data products can encourage high-quality offerings, motivate domain owners, and support budget negotiations.[22]
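A minimal sketch of what such scoring and ranking could look like is given below. The cited sources do not prescribe particular metrics or a formula, so the three metrics (completeness, freshness, availability), their weights and all names in the code are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class QualityMetrics:
    """Hypothetical quality measurements for one data product, each in [0, 1]."""
    product: str
    completeness: float  # share of required fields that are populated
    freshness: float     # share of updates delivered on the agreed schedule
    availability: float  # share of time the product was reachable

# Assumed weights; an organization would agree on its own and they sum to 1.
WEIGHTS = {"completeness": 0.4, "freshness": 0.3, "availability": 0.3}


def quality_score(metrics):
    """Weighted average of the metrics, so the score also lies in [0, 1]."""
    return sum(weight * getattr(metrics, name) for name, weight in WEIGHTS.items())


def rank_products(all_metrics):
    """Order data products from highest to lowest quality score."""
    return sorted(all_metrics, key=quality_score, reverse=True)


observed = [
    QualityMetrics("customers", completeness=0.5, freshness=1.0, availability=1.0),
    QualityMetrics("orders", completeness=1.0, freshness=0.5, availability=1.0),
]
for m in rank_products(observed):
    print(f"{m.product}: {quality_score(m):.2f}")
```

A ranking of this kind could be published on a shared dashboard so that domain owners see how their products compare, which is the motivational effect the best practice describes.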
BP3. Conscious adoption
Organizations should thoroughly assess and evaluate their existing data systems, consider organizational factors, and weigh the potential benefits before implementing a data mesh. Addressing C3, they are advised to introduce data mesh terminology carefully and consciously to ensure a clear understanding of the concept.[22]
Community
Scott Hirleman started a data mesh community with over 7,500 members in its Slack channel.[26]
Data vault modeling, a method of data modeling that stores data from various operational systems and traces data origin, facilitating auditing, loading speeds and resilience
Data warehouse, a well-established type of database system for organizing data in a thematic way
Vestues, Kathrine; Hanssen, Geir Kjetil; Mikalsen, Marius; Buan, Thor Aleksander; Conboy, Kieran (2022). "Agile Data Management in NAV: A Case Study". Agile Processes in Software Engineering and Extreme Programming. Lecture Notes in Business Information Processing. Vol. 445. Springer. pp. 220–235. doi:10.1007/978-3-031-08169-9_14. ISBN 978-3-031-08168-2.
Joshi, Divya; Pratik, Sheetal; Rao, Madhu Podila (2021). "Data Governance in Data Mesh Infrastructures: The Saxo Bank Case Study". Proceedings of the International Conference on Electronic Business (ICEB). Vol. 21. pp. 599–604.