Posted to commits@drill.apache.org by cg...@apache.org on 2023/02/26 16:00:30 UTC

[drill-site] branch master updated: Create 126-delta-lake-format-plugin.md

This is an automated email from the ASF dual-hosted git repository.

cgivre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/drill-site.git


The following commit(s) were added to refs/heads/master by this push:
     new 1f751423a Create 126-delta-lake-format-plugin.md
1f751423a is described below

commit 1f751423a17c517b4f534106476fdc085b7591a3
Author: Charles S. Givre <cg...@apache.org>
AuthorDate: Sun Feb 26 11:00:25 2023 -0500

    Create 126-delta-lake-format-plugin.md
---
 .../126-delta-lake-format-plugin.md                | 61 ++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/_docs/en/data-sources-and-file-formats/126-delta-lake-format-plugin.md b/_docs/en/data-sources-and-file-formats/126-delta-lake-format-plugin.md
new file mode 100644
index 000000000..171f1f373
--- /dev/null
+++ b/_docs/en/data-sources-and-file-formats/126-delta-lake-format-plugin.md
@@ -0,0 +1,61 @@
+---
+title: "Delta Lake Format Plugin"
+slug: "Delta Lake Format Plugin"
+parent: "Data Sources and File Formats"
+---
+
+**Introduced in release:** 1.21
+
+This format plugin enables Drill to query Delta Lake tables.
+
+## Supported optimizations and features
+
+This format plugin supports project and filter pushdown optimizations.
+
+### Project pushdown
+
+With project pushdown, Drill reads only the columns specified in the query, even when they are nested columns.
+
+### Filter pushdown
+
+With filter pushdown, all expressions supported by the Delta Lake API are pushed down, so only data that
+matches the filter expression is read. Additionally, filtering logic for Parquet files is enabled
+so that Parquet files that do not match the filter expression are pruned.
+
+### Querying specific table versions (snapshots)
+
+Delta Lake supports time travel: a query can read a table as of a specific data version.
+
+The data version can be specified in either of the following ways:
+
+- `version` - the version number of the specific snapshot
+- `timestamp` - the timestamp in milliseconds at or before which the specific snapshot was generated
+
+A table function can be used to specify one of the above options, as follows:
+
+```sql
+SELECT *
+FROM table(dfs.tmp.testAllTypes(type => 'delta', version => 0));
+
+SELECT *
+FROM table(dfs.tmp.testAllTypes(type => 'delta', timestamp => 1636231332000));
+```
+
+## Configuration
+
+The format plugin has the following configuration options:
+
+- `type` - the format plugin type; must be `'delta'`
+
+### Format config example
+
+```json
+{
+  "type": "file",
+  "formats": {
+    "delta": {
+      "type": "delta"
+    }
+  }
+}
+```