Glossary: Data Lake

What is a data lake?

In information systems, a data lake is a system or repository of data stored in its raw format, usually as blobs or files. Typically, a data lake is a single repository for all enterprise data, including raw copies of source system data and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. A data lake can contain structured data from relational databases (rows and columns), semi-structured data (CSV, XML, JSON), unstructured data (emails, documents, PDF files) and binary data (images, audio, memory images).
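To illustrate what "stored in raw format" can mean in practice, here is a minimal sketch in Python. It assumes a file-system-based lake; the function name `ingest_raw`, the folder layout `raw/<source system>/<date>/` and the `.meta.json` sidecar file are hypothetical choices made for this example, not a standard.

```python
import hashlib
import json
import shutil
from datetime import date
from pathlib import Path


def ingest_raw(source_file: Path, lake_root: Path, source_system: str, ingest_date: date) -> Path:
    """Copy a file into the lake unchanged and record minimal metadata next to it."""
    # Hypothetical layout: <lake_root>/raw/<source_system>/<YYYY-MM-DD>/<file name>
    target_dir = lake_root / "raw" / source_system / ingest_date.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / source_file.name
    shutil.copy2(source_file, target)  # byte-for-byte copy, no parsing or transformation

    # Sidecar metadata (assumed convention) so the file stays findable later.
    metadata = {
        "source_system": source_system,
        "ingest_date": ingest_date.isoformat(),
        "original_name": source_file.name,
        "size_bytes": target.stat().st_size,
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
    }
    sidecar = target_dir / (target.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target
```

The same function works for a CSV export, a PDF or an image, because the file is never parsed on the way in. Interpretation is deferred until the data is read for a specific analysis.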

Data lakes are used in industries such as retail, banking, hospitality and travel, for example to track and predict customer preferences and improve the overall customer experience.

Generic analysis methods are also stored alongside the data, so they are available for all centrally stored data and do not have to be assembled before each analysis. Because data is kept in its raw, unprocessed form, data lakes usually require much more storage capacity than data warehouses. In return, raw data remains malleable: it can be analyzed quickly for a wide variety of purposes and is well suited to machine learning.

A data swamp is an unmanaged data lake that is either inaccessible to its intended users or offers little value. A data lake turns into a data swamp when appropriate data quality and data governance measures are not implemented.
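One simple governance measure is to check regularly that every file in the lake is documented. The following Python sketch assumes a purely hypothetical convention in which each data file has a metadata file with the same name plus the suffix `.meta.json` next to it; the function name `find_undocumented_files` is likewise made up for this example. Real data lakes typically use a dedicated data catalog for this purpose.

```python
from pathlib import Path

# Hypothetical naming convention for metadata files, assumed for this example only.
SIDECAR_SUFFIX = ".meta.json"


def find_undocumented_files(lake_root: Path) -> list[Path]:
    """Return all data files under lake_root that have no metadata sidecar."""
    undocumented = []
    for path in sorted(lake_root.rglob("*")):
        # Skip directories and the metadata files themselves.
        if not path.is_file() or path.name.endswith(SIDECAR_SUFFIX):
            continue
        sidecar = path.with_name(path.name + SIDECAR_SUFFIX)
        if not sidecar.exists():
            undocumented.append(path)
    return undocumented
```

A growing list of undocumented files is an early warning sign: data whose origin and meaning nobody can reconstruct offers little value to its intended users.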

