A lot is happening in the data world these days, ever since Werner Vogels made the case that a one-size-fits-all database fits no one [0]: specialized databases for specialized workloads. A natural consequence of this specialization is the proliferation of databases in a typical enterprise today, sometimes up to 10 different types of databases [6]. Data teams have built one-to-one, one-to-many, and many-to-many connections between these databases, leading to hundreds of data pipelines. There is a Jevons Paradox at play here as well: as databases proliferate, the pipelines that connect them multiply, and because pipelines have become declarative and engineers are empowered with better tools, it is easier than ever to create even more of them.

A typical data landscape in an enterprise looks like the figure below – this one is from a leading ride-sharing company.

Chief Data Officers (CDOs) are under tremendous pressure to derive value out of this data, especially in times of crisis like today [1]. Analytics and AI use cases push the limits of data management as well, and the complexity and heterogeneity of enterprise data keep increasing. Case in point: data engineering [2,3] is *the* fastest-growing tech job in the US.

Cloud data lakes have emerged as a popular architectural pattern to help organizations get value out of this data. Schema-on-read, the paradigm that data lakes introduced, lets organizations approach data challenges in a crawl-walk-run fashion, unlike data warehouses, where more up-front planning is needed [4]. However, there is no free lunch: the agility, flexibility, and cost advantages of data lakes come at a price. Data in the lake often lacks context, doesn't meet the quality bar that applications require, and isn't easily understandable or discoverable by users. Problems of data consistency and accuracy make it hard to derive value from data lakes and to trust the analytics built on this data. Traditional methods of manually documenting, classifying, and assessing the data don't scale to the volume of cloud-based data lakes.
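To make schema-on-read concrete, here is a minimal sketch in Python (pandas is chosen purely for illustration; the event fields and values are hypothetical, not drawn from the article): raw events land in the lake untouched, and a schema is imposed only when a particular use case reads them.

```python
import io
import pandas as pd

# Schema-on-write (warehouse style): the schema must be agreed on before loading.
# Schema-on-read (lake style): land the raw events first, impose structure later.

# Raw events as they might land in a data lake (hypothetical ride-sharing events).
raw_events = io.StringIO(
    '{"trip_id": "t1", "fare": "12.50", "city": "SF", "surge": "1.2"}\n'
    '{"trip_id": "t2", "fare": "8.00", "city": "NYC"}\n'
    '{"trip_id": "t3", "fare": null, "city": "SF", "surge": "1.0"}\n'
)

# Nothing is rejected at write time; every record is stored as-is.
df = pd.read_json(raw_events, lines=True)

# A schema is applied only at read time, per use case: project the columns
# this use case needs and cast them, tolerating missing or malformed values.
trips = df[["trip_id", "fare", "surge"]].assign(
    fare=pd.to_numeric(df["fare"], errors="coerce"),
    surge=pd.to_numeric(df["surge"], errors="coerce").fillna(1.0),
)
print(trips)
```

The flip side is exactly what the paragraph above describes: because nothing enforces completeness or consistency at write time, context and quality have to be recovered later, at much larger scale.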

To exacerbate the problem, sometime during the mid-2000s, data (particularly as it relates to business decision making) crossed an important line. Previously, the majority of such data was sourced internally and its quality and reliability were in the hands of the internal systems and IT developers that created and maintained it. Since then, an increasing proportion of data comes from external sources. While internal data quality has often been questioned, it certainly far exceeds that of external data [5].

To summarize, enterprises are swimming in data. However, to truly enable data-driven enterprises, you need to let a lot of people use this data. This is the problem that Data Observability aims to solve. Data Observability enables enterprises to run predictable data pipelines by providing contextual information for data monitoring and data quality.
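What that contextual information can look like in practice is sketched below, assuming a pandas DataFrame as the output of a pipeline run; the table, thresholds, and check names here are hypothetical rather than any specific product's API. The idea is to compute freshness, volume, and completeness signals that an observability layer would track over time and alert on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

import pandas as pd


@dataclass
class TableHealth:
    """Basic observability signals for one table produced by a pipeline."""
    row_count: int
    null_rate: dict[str, float]      # column -> fraction of null values
    minutes_since_update: float


def check_table(df: pd.DataFrame, updated_at: datetime) -> TableHealth:
    """Compute simple monitoring signals; an observability system would
    record these over time and alert on anomalies (e.g. sudden drops)."""
    age = datetime.now(timezone.utc) - updated_at
    return TableHealth(
        row_count=len(df),
        null_rate=df.isna().mean().to_dict(),
        minutes_since_update=age.total_seconds() / 60,
    )


# Hypothetical pipeline output: trips loaded into the lake.
trips = pd.DataFrame({
    "trip_id": ["t1", "t2", "t3"],
    "fare": [12.5, None, 8.0],
    "city": ["SF", "NYC", None],
})

health = check_table(trips, updated_at=datetime.now(timezone.utc) - timedelta(hours=2))

# Simple, declarative expectations layered on top of the signals.
problems = []
if health.row_count < 1_000:
    problems.append(f"low volume: {health.row_count} rows")
if health.minutes_since_update > 60:
    problems.append(f"stale: last update {health.minutes_since_update:.0f} minutes ago")
problems += [f"nulls in {col}: {rate:.0%}"
             for col, rate in health.null_rate.items() if rate > 0.2]

print(problems or "all checks passed")
```

In a real deployment, the thresholds would typically be learned from each table's historical behavior rather than hard-coded, but the underlying signals of freshness, volume, and quality are the kind of contextual information referred to above.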

The ultimate goal of this endeavor is to make Big Data work the way you’d imagine it worked if you’d only used Small Data before. In effect, to operationalize data trustworthiness in enterprises.

References
[0] Werner Vogels, "A one size fits all database doesn't fit anyone."
[1] Are we asking too much of Chief Data Officers and their data teams?
[2] Data Engineers are part of the Analytics “dream team”
[3] What does a Data Engineer do?
[4] Data Lakes and Data Warehouses – It’s not either/or but both!
[5] The Data Quality market is expected to reach $2.5B within the next 8 years.
[6] Database of Databases – www.dbdb.io
