Many businesses are on a journey to enable self-service analytics. They want to provide self-service solutions that let all their users apply data and analysis for organizational benefit. A growing number of businesses are using technologies such as cloud data lakes and cloud data warehouses to drive their digital transformation. However, technical professionals are struggling to consolidate business definitions in one place to provide a single source of truth that is trustworthy, understandable, discoverable and cost-effective.
On the other hand, business professionals struggle to get reliable data using the business metrics they are familiar with. In addition, the business relies heavily on IT for analytics-based decision making.
Take the example of a technology company. After rapid business growth and two years of building its data platform, the company has accumulated data at a massive scale. The company’s challenges include:
- The data platform can’t effectively support strategy at the company level.
- No unified business semantics.
- Low adoption of the data platform because the business does not trust the data.
- A massive scale of data: 5.7K Operational Data Store (ODS) tables have grown into 1 million data warehouse (DW) tables.
- Lineage out of control: a single main table, TX_ORDERS, has 10,000 direct descendants.
- Many duplicate ETL jobs and wasted compute.
These problems are so harmful that the company has fallen into a data swamp: it does not trust its own data, let alone use it for self-service business analysis.
The company expects to replace its existing self-service ETL approach with governed metrics. Once the metrics are governed, this will help the company save millions from its IT budget each year.
What is a metrics store?
An excellent solution to these problems is the metrics store. A metrics store is a middle layer between upstream data warehouses and data sources and downstream business applications.
The metrics store decouples metric definitions from both BI reporting and data warehouses. Teams that own metrics define them only once in the metrics store, forming a single source of truth, and can then consistently reuse those metrics in BI, automation tools, business workflows, or even advanced analytics.
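The "define once, reuse everywhere" idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular metrics store product works: the `Metric` and `MetricsStore` classes, the `amount` and `region` columns, and the `finance` owner are all hypothetical, and only the `TX_ORDERS` table name comes from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A governed metric definition (hypothetical structure)."""
    name: str
    expression: str  # SQL aggregate expression, e.g. "SUM(amount)"
    table: str       # upstream table the metric is computed from
    owner: str       # team that owns the definition


class MetricsStore:
    """Holds exactly one definition per metric and renders it for any consumer."""

    def __init__(self):
        self._metrics = {}

    def define(self, metric):
        # Refuse a second definition so each metric keeps one source of truth.
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._metrics[metric.name] = metric

    def to_sql(self, name, group_by=()):
        # Every consumer gets SQL generated from the same stored definition.
        m = self._metrics[name]
        dims = ", ".join(group_by)
        select = f"{dims}, " if dims else ""
        sql = f"SELECT {select}{m.expression} AS {m.name} FROM {m.table}"
        if dims:
            sql += f" GROUP BY {dims}"
        return sql


store = MetricsStore()
store.define(Metric("total_revenue", "SUM(amount)", "tx_orders", owner="finance"))

# Two different consumers reuse the same definition.
dashboard_sql = store.to_sql("total_revenue", group_by=("region",))
alerting_sql = store.to_sql("total_revenue")
print(dashboard_sql)
print(alerting_sql)
```

Because the aggregate expression lives only in the store, changing how revenue is calculated means changing one definition rather than every report that uses it.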
Metrics matter
“When it comes to managing business processes or any production process, unless performance is being tracked continuously, how do you know if you’re improving?” This famous quote from Peter Drucker captures the idea that if you can’t measure it, you can’t improve it.
A metrics store is a management system first and a data system second. As with ERP, its core purpose is to improve management; what big data technology improves is the accuracy of measurement and the efficiency of management. Big data technologies, data warehouses, data lakes, ETL/ELT, and the various BI tools and reports all exist to guide management decisions; the technology itself is not the goal. If companies want to optimize their management system, metrics are the cornerstone.
Placing metrics in analytics and BI tools is a natural option. After all, it is intuitive to put metrics where they will be consumed. However, this introduces discrepancies. Metric definitions in BI applications are isolated and difficult to reuse across applications. When an organization has many BI tools (a typical case, since each business unit tends to have its own BI preference), it is hard to standardize metrics across BI platforms.
Another typical solution is to place definitions and calculations for metrics in a data warehouse. However, this option also raises two issues:
- As with BI tools, organizations use a wide range of analytical engines to support various use cases. For this reason, a single unified metrics layer across all of them is unlikely to be achievable.
- Every data warehouse expert understands how difficult a data warehouse is for business users to understand. The learning curve for the business is steep when metrics are placed in the data warehouse.
How does a metrics store help?
So how does a metrics store help solve these problems? Let’s go back to the example of the technology company. Instead of creating an excessive number of ungoverned aggregate tables in the data warehouse, the technology company can standardize business requirements by placing a metrics store between the ODS tables and the business applications. The IT team can manage metrics in one place and bring standardization to the metrics of all business teams. Business requirements can be standardized and reused in the metrics store as 2,000 base metrics. This standardization can save as much as 90% to 95% of the overall ETL processing.
Business users can create their own self-service metrics on the consumption side, where the last mile of analysis happens. This is governed self-service innovation, because business users generate derived metrics based on well-governed core metrics.
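The split between governed core metrics and self-service derived metrics can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the metric names, the `define_derived` and `evaluate` helpers, and the sample order amounts are hypothetical and not taken from any real product.

```python
# Governed base metrics, owned by IT (names and data are illustrative only).
BASE_METRICS = {
    "total_revenue": lambda rows: sum(r["amount"] for r in rows),
    "order_count": lambda rows: len(rows),
}

# Self-service metrics created by business users.
DERIVED_METRICS = {}


def define_derived(name, inputs, formula):
    """Register a derived metric that may only reference governed base metrics."""
    unknown = [i for i in inputs if i not in BASE_METRICS]
    if unknown:
        raise KeyError(f"not governed base metrics: {unknown}")
    DERIVED_METRICS[name] = (tuple(inputs), formula)


def evaluate(name, rows):
    """Compute a base metric directly, or a derived metric from its base inputs."""
    if name in BASE_METRICS:
        return BASE_METRICS[name](rows)
    inputs, formula = DERIVED_METRICS[name]
    return formula(*(BASE_METRICS[i](rows) for i in inputs))


# A business user derives average order value without touching any ETL.
define_derived(
    "avg_order_value",
    ["total_revenue", "order_count"],
    lambda revenue, orders: revenue / orders if orders else 0.0,
)

orders = [{"amount": 120.0}, {"amount": 80.0}, {"amount": 100.0}]
print(evaluate("avg_order_value", orders))  # 100.0
```

The check in `define_derived` is the governance step in this sketch: a derived metric is rejected unless every input is already a governed base metric.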
Benefits of IT governance and business innovation
A middle ground between placing metrics in the data warehouse or data lake and placing them in the BI layer is to place them in a standalone metrics repository: the metrics store. The metrics store helps businesses solve some of the silo and trust issues by offering the following benefits:
Self-service business analysis: Business users can easily reuse existing metrics and create their own self-service metrics without the involvement of IT.
Data trust: The metrics store provides a single source of truth for the business. Because metrics are standardized and well-governed, business teams regain their trust and confidence in the data.
Data governance: In the previous approach, anyone could create metrics in BI tools or the DW, which led to chaos and poor data governance. With a single store of reusable metrics, IT can easily track data lineage and usage.
Cost-effectiveness of data management: The metrics store helps reduce the chaos and governance burden of ETL processes, saving the enterprise significant IT effort.
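The lineage tracking mentioned under data governance can be sketched as a small dependency graph. This is only an illustration under stated assumptions: the `LineageGraph` class and every node name except the `TX_ORDERS` table are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Records which metrics and reports are built on which upstream objects."""

    def __init__(self):
        self._children = defaultdict(set)

    def add_dependency(self, upstream, downstream):
        self._children[upstream].add(downstream)

    def direct_descendants(self, node):
        return sorted(self._children.get(node, ()))

    def all_descendants(self, node):
        # Breadth-first walk over everything that depends on `node`.
        seen = set()
        queue = deque([node])
        while queue:
            current = queue.popleft()
            for child in self._children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return sorted(seen)


lineage = LineageGraph()
lineage.add_dependency("tx_orders", "total_revenue")
lineage.add_dependency("tx_orders", "order_count")
lineage.add_dependency("total_revenue", "avg_order_value")
lineage.add_dependency("order_count", "avg_order_value")
lineage.add_dependency("avg_order_value", "sales_dashboard")

print(lineage.direct_descendants("tx_orders"))
print(lineage.all_descendants("tx_orders"))
```

With all metrics registered in one place, a question such as "what breaks if this table changes?" becomes a graph traversal rather than a search across thousands of ad hoc tables.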
Luke Han is co-founder and CEO of Kyligence, as well as co-founder and Project Management Committee member of Apache Kylin. He was the first VP of an Apache Software Foundation top-level project from China. He is also a Microsoft Regional Director (RD) and a Tencent Cloud Valuable Professional (TVP).