Eventual Logo
Back to Blog

Introducing Daft-SQL

Subtitle: A SQL API enabling users to interact with their data in a new but familiar way!


What is Daft-SQL?

Our new Daft-SQL API is designed to let users interact with their data using traditional SQL queries within the daft ecosystem. Whether you’re analyzing large datasets or performing simple queries, Daft-SQL offers a streamlined, SQL-based interface that simplifies your workflow.

[daft sql.gif]


Our SQL dialect is based on PostgreSQL, one of the most widely used dialects in the industry. This means that if you’re already familiar with PostgreSQL, you'll feel right at home using Daft-SQL. Common SQL commands and functions, such as SELECT, WHERE, and JOIN, behave as you’d expect, ensuring a low learning curve for those accustomed to SQL.


Why Use Daft-SQL?

For many users, learning a new DataFrame API can be challenging—and being forced to do so can be frustrating, especially when you’re already proficient in SQL. Daft-SQL bridges this gap by allowing you to query your data with SQL commands, eliminating the need to learn new DataFrame-specific methods and syntax. This makes it particularly appealing for SQL professionals and data analysts who want to leverage the power of daft without having to dive into the intricacies of a new API.


Daft-SQL is built on the same query execution engine as the native daft DataFrame API, so there are no performance trade-offs between using SQL and using the DataFrame methods. You get the same speed and efficiency both ways. This allows you to choose the interface that works best for you, without worrying about sacrificing performance.


How to Use Daft-SQL

Using Daft-SQL is simple. It’s already integrated into the daft library, so no additional setup is required. Here’s a basic example to get you started:


1import daft
2df = daft.read_csv('data.csv')
3
4# Run a SQL query
5daft.sql('SELECT * FROM df').collect()
6
7# Alternatively, you can manually construct a `SQLCatalog` for more control over the SQL context
8from daft.sql import SQLCatalog
9
10# Create a SQLCatalog
11catalog = SQLCatalog({'table_1': df})
12
13daft.sql('SELECT * FROM table_1', catalog).collect()


In this example, we import the necessary components from daft, create a DataFrame by reading a CSV file, and then run a simple SQL query to select all records from the DataFrame. Daft-SQL supports all the usual SQL operations, making it extremely flexible for a variety of use cases. Whether you need to filter data, aggregate results, or join tables, Daft-SQL has you covered!


Tips and Tricks

We’ve designed Daft-SQL to stay as close to our DataFrame API as possible. However, there are a few key differences in function names that users should be aware of.


In Daft-SQL, all .str namespaced functions from the DataFrame API are available globally, and we use snake_case for function names to maintain consistency. This ensures that functions are easy to understand and use, while also aligning with SQL conventions.


Some notable differences include:

- str.endswith is written as ends_with in Daft-SQL.

- str.startswith is written as starts_with in Daft-SQL.


Aside from these small variations, most functions have a direct mapping between the DataFrame API and the SQL API, so switching between the two is straightforward.


Here’s an example to illustrate this:

1# DataFrame API
2df.select(
3 col('name').str.endswith('x'),
4 col('lst').list.count(),
5 col('img').image.decode()
6)
7
8# SQL API
9daft.sql("""
10 SELECT
11 ends_with(name, 'x'),
12 list_count(lst),
13 image_decode(img)
14 FROM df
15""")


In this example, you can see how similar the functionality is across both APIs. This makes it easy to switch between the two depending on your preferred style or the task at hand.


Furthermore, certain features, like aliasing columns and conditional logic, are fully supported using standard SQL syntax. For instance:


1# DataFrame API
2df.select(col('name').alias('my_alias'))
3df.select((df["A"] > df["B"]).if_else(df["A"], df["B"]))
4
5# SQL API
6daft.sql('SELECT name AS my_alias FROM df')
7daft.sql('SELECT CASE WHEN A > B THEN A ELSE B END FROM df')


With Daft-SQL, we’re excited to offer a tool that brings together the power of SQL and the flexibility of daft. Happy querying!


Need Help?

We’re here to support you as you explore the full potential of Daft-SQL → See Daft-SQL Documentation


If you run into any issues or have questions, feel free to reach out to us via our Distributed Data Slack Community or open an issue on our Daft GitHub. We’re always looking for feedback to make Daft-SQL even better, so don’t hesitate to share your thoughts and experiences!