For companies to get the most out of their data, they need a sustainable backbone that can support new business models. A data platform is exactly what the name suggests: a platform to ingest, store, connect, transform, model, and serve data. Thriving companies have already put data at the core of their businesses, and recent developments have made data platforms accessible and affordable for businesses of any size.
The core question is not whether you should put data at the core of your company, but how fast you can start.
The best of both worlds
Data warehouses have a long history in decision support and business intelligence applications, but they were poorly suited to, or prohibitively expensive for, handling unstructured data, semi-structured data, and data with high variety, velocity, and volume.
Delta Lake was launched in early 2019 by Databricks, an Apache Spark company, as a cloud table format built on open standards and partially open-source.
The idea behind Delta Lake was to support the most frequently requested features in the modern data platform (or should we say Big Data) ecosystem: data mutability, a point-in-time view of data (aka Time Travel), ACID guarantees with concurrent writes, and more.
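To make these features concrete, here is a minimal sketch in Delta Lake's SQL dialect. The table names `events` and `updates` are hypothetical placeholders, and the snippet assumes they already exist as Delta tables:

```sql
-- Time Travel: read the table as it existed at an earlier version or moment
SELECT * FROM events VERSION AS OF 5;
SELECT * FROM events TIMESTAMP AS OF '2023-01-01';

-- Data mutability with ACID guarantees: upsert new rows via MERGE
MERGE INTO events AS target
USING updates AS source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

Because every write produces a new, atomically committed table version, point-in-time reads and concurrent writers fall out of the same transaction log design.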
Data lakehouses are enabled by a new, open system design: implementing similar data structures and data management features to those in a data warehouse, directly on the kind of low-cost storage used for data lakes. Merging them together into a single system means that data teams can move faster as they are able to use data without needing to access multiple systems. Data lakehouses also ensure that teams have the most complete and up-to-date data available for data science, machine learning, and business analytics projects.
At the time of its launch, Delta Lake shipped with some features behind a “paywall”: you had to be a Databricks customer to use them on the Databricks Platform. It was only in 2022 that Databricks fully open-sourced the entire product and committed to contributing future enhancements through the Linux Foundation.
Even as an open-source project, Databricks is investing in Delta Lake and aligning its features with the rest of its products and cloud offering (and understandably so). Additionally, because it is maintained by a large and well-known commercial entity, Delta Lake is likely to receive more development effort than other data platform technologies, especially now that it is an integral part of Delta Live Tables, yet another hot product from Databricks.
Delta Lake is mostly a Java and Apache Spark effort, but it is developed against an open standard, so there are, and will be, integrations with other languages and tools, for example native integration with SQL, the popular language of business analytics.
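As a sketch of that SQL integration: on an engine with Delta support, a Delta table can be created and inspected with plain SQL, no Spark or Java code required. The table name and columns below are illustrative assumptions:

```sql
-- Create a Delta-backed table directly from SQL
CREATE TABLE events (
  id      BIGINT,
  ts      TIMESTAMP,
  payload STRING
) USING DELTA;

-- Inspect the table's transaction log (versions, operations, timestamps)
DESCRIBE HISTORY events;
```

The `USING DELTA` clause is all that distinguishes this from an ordinary table definition, which is much of the format's appeal for SQL-first teams.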