Data Mesh is based on four core principles, the first being domain-oriented ownership and architecture. In this post, we'll explore what this means and look in detail at the fundamental shift that supports a decentralized data ecosystem.
What is a domain?
A domain is typically a group of people organized around a common business goal. Domains usually start by mirroring the organizational structure and then iterate from there. Examples of domains for an ecommerce site might include users, sellers, products, and marketing. From a functional point of view, a domain can serve a variety of purposes: for example, the sellers domain might manage partner relationships with sellers, track their products, and arrange their payments. Ideally, each domain handles data generation (acquisition), transformation, and the serving of data products to downstream analytics, which is how data ultimately provides business value.
Challenges with centralized data ownership
As countless data teams have seen over time, any disconnect between data producers and data consumers makes it harder to derive business value from data. Whenever ownership of data changes hands, some signal is naturally lost, which reduces the value of the data. In a centralized data environment, it is often unclear who ultimately owns and who is responsible for the data generated by a domain. These responsibilities include data generation, acquisition, transformation, quality assurance and presentation. Laura Smith, CIO of healthcare provider UnityPoint Health, comments: “One of the biggest challenges for organizations is not collecting data, but developing a team to implement the data and drive change across the organization.”
It is now common for companies to play hot potato when trying to figure out who is responsible for their data sets. The engineering team that generates the data focuses solely on the operational system and the business function of the product it develops; the data it generates is an afterthought. That the data ultimately increases business value and provides context for business decisions is a bonus, but it is beyond the scope of the development team's interest: the team is evaluated on the product it builds, not on the data.
According to research conducted by Forrester, 60 to 73 percent of all data in an organization goes unused for analytics. Meanwhile, in a recent Accenture survey, only 32 percent of companies reported being able to derive tangible and measurable value from data, while only 27 percent said data and analytics projects generate a high level of actionable insights and recommendations.
In the centralized model, data is ultimately handed over to a data team outside the operational function, and data engineers and analysts then try to understand and derive value from this data and from the data of every other function. This is problematic because it is the data producers who have the breadth and depth of context surrounding the data; they are the ones who know it best. In this model, analysts are far removed from data generation and from the people who understand the data. Moreover, data engineering becomes a clear bottleneck between data producers and analysts. This inefficiency results in a vicious cycle in which any change or additional information requested by analytics takes too long to produce; by the time the data is updated to the analysts' specifications, it is often no longer needed or further changes have been identified. When there is no clear connection between data producers and data consumers, feedback is lost and the data loses value.
Domain-driven data ownership
Data Mesh relies on shifting ownership of data from an external data team to the operational domain. Without this shift, you will keep repeating the challenges described above, where value disappears each time data ownership changes hands. Data Mesh essentially implements a domain-oriented decomposition and ownership of an organization's data. Domains are responsible for the data they generate: ingesting it, transforming it, and making it available to end users. When ownership and responsibility for the data move back to the domain, there is no handoff of data ownership and therefore no loss of value; the people who know the most about the data are the people who prepare and provide it for analysis. Data becomes just another product that the domain produces and is responsible for, and data engineers focus on data in a single domain, working closely with the domain's other subject-matter experts (SMEs) to produce valuable data products.
In particular, domain data product ownership means that product owners and developers have both responsibility and accountability for:
- Creating data products and serving them to other domains and end users
- Ensuring that data is easily accessible, usable, ready, and meets defined quality criteria
- Evolving the data product based on user feedback and decommissioning it when it is no longer used or relevant
- Promoting and “marketing” these data products to the rest of the organization
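As a rough illustration of what "meets defined quality criteria" might look like when a domain team checks its own data product before publishing it, here is a minimal Python sketch. The names `QualityCriteria` and `check_quality`, and the choice of a maximum null fraction as the criterion, are assumptions made for this example and are not part of any Data Mesh specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityCriteria:
    """Hypothetical quality criteria a domain defines for one data product."""
    required_columns: tuple[str, ...]
    max_null_fraction: float  # allowed share of missing values per required column


def check_quality(rows: list[dict], criteria: QualityCriteria) -> list[str]:
    """Return human-readable violations; an empty list means the criteria are met."""
    if not rows:
        return ["data product is empty"]
    violations = []
    for column in criteria.required_columns:
        # A value counts as missing if the key is absent or explicitly None.
        nulls = sum(1 for row in rows if row.get(column) is None)
        fraction = nulls / len(rows)
        if fraction > criteria.max_null_fraction:
            violations.append(
                f"{column}: {fraction:.0%} missing exceeds {criteria.max_null_fraction:.0%}"
            )
    return violations


# Illustrative rows from an imagined "sellers" domain.
sample_rows = [
    {"order_id": 1, "seller_id": "s-1"},
    {"order_id": 2, "seller_id": None},
]
criteria = QualityCriteria(required_columns=("order_id", "seller_id"), max_null_fraction=0.25)
print(check_quality(sample_rows, criteria))
```

Because the domain team owns the check, it can tighten or relax the criteria as it learns from consumer feedback.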
Domain-oriented technology capabilities
While the social aspect of data responsibility is important, producing a data product also requires certain technological capabilities. These capabilities are determined by the domain, so the domain drives the adoption of technology. For example, a domain may need a more secure upstream environment for PII or financial data, or may be pulling data from third-party partners. Domains must use data acquisition, transformation, and presentation tools that make sense for their particular data. However, the data product format must be standardized and presented in a standardized way in the analytical plane (aka the Data Mesh experience plane), which ensures a smooth experience for data product consumers. The domain must decide on the data technologies that will enable it to develop its own data product in the domain environment.
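One way to picture the split between domain-chosen tooling and a standardized presentation is a small, shared descriptor that every domain fills in when it publishes a data product. The sketch below is only an assumption about what such a descriptor could contain; the field names, the `ALLOWED_FORMATS` set, and the `validate` helper are hypothetical and do not come from any particular platform.

```python
from dataclasses import dataclass

# Assumption: the organization has agreed on a small set of published formats.
ALLOWED_FORMATS = {"parquet", "iceberg"}


@dataclass(frozen=True)
class DataProductDescriptor:
    """Hypothetical standardized metadata every domain publishes with a data product."""
    name: str
    domain: str
    owner: str
    output_format: str
    schema: dict[str, str]  # column name -> logical type
    contains_pii: bool = False


def validate(descriptor: DataProductDescriptor) -> list[str]:
    """Return problems that would block publishing to the analytical plane."""
    problems = []
    if descriptor.output_format not in ALLOWED_FORMATS:
        problems.append(f"unsupported output format: {descriptor.output_format}")
    if not descriptor.schema:
        problems.append("schema must declare at least one column")
    if not descriptor.owner:
        problems.append("owner is required")
    return problems


# The domain is free to build this product with any internal tooling;
# only the published descriptor has to follow the shared shape.
payouts = DataProductDescriptor(
    name="seller_payouts",
    domain="sellers",
    owner="sellers-data@example.com",
    output_format="csv",
    schema={"seller_id": "string", "amount": "decimal"},
    contains_pii=True,
)
print(validate(payouts))
```

The descriptor deliberately says nothing about how the data was acquired or transformed, which stays a domain decision.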
What does it look like in practice?
In practice, domains must include people and processes capable of obtaining data from the operational and analytical planes and of producing data products informed by expert knowledge and work experience. Data products from each domain need to be published to the analytical environment for use by analysts and other domains, which means the domain must describe the data in a way that users outside the domain can understand and easily use.
Shifting ownership of domain data in this way ultimately expands the domain's responsibilities and asks more of the people who work in it. Data engineers therefore need to move from their previous positions in a centralized data organization into the domains. Reorganizing data engineering under the authority of a CTO or CIO is a familiar leadership challenge for many companies struggling to generate enough value from a centralized data organization. To make this work, domains must be given incentives to take ownership of data products.
This is ultimately a good career and organizational step for data engineers, who can focus more on data modeling and the production of high-quality data products instead of spreading themselves too thin.
There is also an opportunity for software engineers to become “citizen data engineers” in their domain, which lets them apply their domain knowledge to the development of data products and advance their careers at the same time. Analysts, in turn, can become more like data engineers as they build up domain-specific knowledge. A non-trivial skill overlap (e.g. SQL) gives analysts and data engineers a common language and provides career momentum for both.
So how does this make Data Mesh possible?
Domain-driven data ownership and architecture are key to enabling and driving the other three principles that govern Data Mesh:
- Domains are the explicit owners and producers of data products
- When domains consume data products from other domains (during product development or when producing additional data products), there must be a contract that governs the collaborative relationship between the domains involved
- Data products that combine various other data products shorten time to insight, thereby increasing the total value to the business and narrowing the gap between data and value
- Domains control governance guidelines, including authorization specific to each data product
- Domains operate within the security, compliance, and regulatory framework defined and implemented by the central IT organization
- Domains create data products on a self-service infrastructure provided by the central IT organization
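The contract between a producing and a consuming domain mentioned in the list above can take many forms. As a minimal sketch, assume the contract is simply the set of columns and types the consumer relies on; the function name `contract_violations` and the example schemas are invented for illustration.

```python
def contract_violations(
    producer_schema: dict[str, str], contract: dict[str, str]
) -> list[str]:
    """Compare what the producer publishes against what the consumer relies on.

    Extra producer columns are allowed; only missing or retyped columns count.
    """
    violations = []
    for column, expected_type in contract.items():
        actual_type = producer_schema.get(column)
        if actual_type is None:
            violations.append(f"missing column: {column}")
        elif actual_type != expected_type:
            violations.append(f"{column}: expected {expected_type}, got {actual_type}")
    return violations


# Hypothetical example: the marketing domain consumes a product from the users domain.
users_product_schema = {"user_id": "string", "signup_date": "date", "country": "string"}
marketing_contract = {"user_id": "string", "signup_date": "timestamp"}

print(contract_violations(users_product_schema, marketing_contract))
```

Running a check like this whenever the producing domain changes its data product gives both sides an early signal that the agreement has been broken, before downstream products are affected.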
How does Starburst support domain-oriented ownership?
In essence, Starburst shortens the path between data and the business value derived from it. In the context of producing data products, this means that a domain can rely on Starburst so that its data engineers spend less time building infrastructure and processes to support data engineering efforts. Data engineers can instead focus on using simple tools they already know, such as SQL, to prepare high-quality, low-latency data products for end users. Analysts and data scientists can also use Starburst in the cross-domain analytical layer as a query engine that simplifies access to data products.
Starburst also reduces the total number of vendors (and the vendor-specific knowledge) required, and with its large set of connectors, it allows each domain to connect to its data wherever it lives and in whatever format it is stored. With its SQL-based interface, Starburst provides a consistent and familiar way of working with data, enabling “citizen data engineers” as well as analysts across the organization. Plus, no matter where you are on your cloud or microservices architecture journey, Starburst not only supports data from different architectures, it is also flexible enough to move with you throughout that journey: it is easy to add new data sources or adjust existing ones.