April 10, 2024

Reading Delta Lake with Daft

Announcing the launch of Daft's Delta Lake read support

by Jay Chia

The Daft team is excited to announce that we now support reading from Delta Lake!

We released a blogpost on the Delta Lake blog, but here’s the TLDR.

What is Delta Lake?

Delta Lake is an open table format that provides a table abstraction over a collection of data files. Under the hood, the data is stored as Parquet files (which Daft is able to read extremely efficiently), and Delta Lake keeps track of metadata about these files so that users of the table can efficiently query large, terabyte-scale tables!
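To make the role of that metadata concrete, here is a minimal sketch in plain Python of how per-file metadata can let a query engine skip files it does not need to read. It is a toy model, not Daft's or Delta Lake's actual implementation: the `FileEntry` class, the `files_to_scan` helper, the `country` partition column, and the `id` column are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Toy stand-in for one Parquet file tracked in table metadata (hypothetical shape)."""
    path: str
    partition: str  # value of a hypothetical partition column, e.g. country
    min_id: int     # smallest value of a hypothetical "id" column in this file
    max_id: int     # largest value of "id" in this file


def files_to_scan(entries, country, id_value):
    """Return only the files that could contain rows matching country and id."""
    return [
        e.path
        for e in entries
        # Skip files in other partitions, then skip files whose id range excludes the value.
        if e.partition == country and e.min_id <= id_value <= e.max_id
    ]


# Hypothetical metadata for a table made of four Parquet files.
table_metadata = [
    FileEntry("country=US/part-0.parquet", "US", 0, 999),
    FileEntry("country=US/part-1.parquet", "US", 1000, 1999),
    FileEntry("country=DE/part-0.parquet", "DE", 0, 999),
    FileEntry("country=DE/part-1.parquet", "DE", 1000, 1999),
]

selected = files_to_scan(table_metadata, country="US", id_value=1500)
print(selected)  # ['country=US/part-1.parquet']
```

In this toy example only one of the four files has to be opened. The same idea, applied to thousands of files, is one reason metadata-aware reads can be much cheaper than scanning every Parquet file in a directory.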

What’s Cool?

In the blogpost, we go through some local benchmarks against other local engines (Polars, pandas, and DuckDB). Daft’s integration with Delta Lake allows it to outperform pandas by 15.8x, DuckDB by 2.3x, and Polars by 2x for partitioned and z-ordered Delta Lake tables.

What’s Coming Up?

We have some exciting new features around Delta Lake in the pipeline too. These include:

P.S., if you’re interested in exploring the intersection of modern data and ML stacks, our team is hiring! :)
