You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@drill.apache.org by dz...@apache.org on 2023/02/21 08:06:02 UTC

[drill-site] branch master updated (958774133 -> d1dede5c1)

This is an automated email from the ASF dual-hosted git repository.

dzamo pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/drill-site.git


    from 958774133 Grammatical fixes in using-jdbc-driver.
     new d6ef361c7 Document the need for a default format when querying an empty dir.
     new d1dede5c1 Document INTERSECT and EXCEPT.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 ...0-data-sources-and-file-formats-introduction.md |   6 +-
 .../odbc-jdbc-interfaces/015-using-jdbc-driver.md  |   5 +-
 .../sql-commands/087-set-operators.md              | 110 +++++++++++++++++++++
 .../sql-commands/087-union-set-operator.md         |  55 -----------
 4 files changed, 118 insertions(+), 58 deletions(-)
 create mode 100644 _docs/en/sql-reference/sql-commands/087-set-operators.md
 delete mode 100644 _docs/en/sql-reference/sql-commands/087-union-set-operator.md


[drill-site] 02/02: Document INTERSECT and EXCEPT.

Posted by dz...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

dzamo pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/drill-site.git

commit d1dede5c1b352e54d5838849507a3e43f1587b86
Author: James Turton <ja...@somecomputer.xyz>
AuthorDate: Tue Feb 21 10:05:41 2023 +0200

    Document INTERSECT and EXCEPT.
---
 .../sql-commands/087-set-operators.md              | 110 +++++++++++++++++++++
 .../sql-commands/087-union-set-operator.md         |  55 -----------
 2 files changed, 110 insertions(+), 55 deletions(-)

diff --git a/_docs/en/sql-reference/sql-commands/087-set-operators.md b/_docs/en/sql-reference/sql-commands/087-set-operators.md
new file mode 100644
index 000000000..8907101e4
--- /dev/null
+++ b/_docs/en/sql-reference/sql-commands/087-set-operators.md
@@ -0,0 +1,110 @@
+---
+title: "Set Operators"
+slug: "Set Operators"
+parent: "SQL Commands"
+---
+
+The UNION, INTERSECT and EXCEPT set operators combine the result sets of two separate query expressions. The result set of each query must have the same number of columns and compatible data types.
+
+# UNION
+
+The UNION operator computes the set union of its two arguments, automatically removing duplicate records from the result set. UNION ALL returns all duplicate records.
+
+## Syntax
+The UNION operator supports the following syntax:
+
+       query
+       { UNION [ ALL ] }
+       another_query
+
+## Parameters
+*query*, *another_query*
+
+Any SELECT query that Drill supports. See [SELECT]({{site.baseurl}}/docs/select/).
+
+## Examples
+The following example uses the UNION ALL set operator to combine click activity data before and after a marketing campaign. The data in the example exists in the `dfs.clicks workspace`.
+
+       0: jdbc:drill:> SELECT t.trans_id transaction, t.user_info.cust_id customer
+       FROM `clicks/clicks.campaign.json` t
+       UNION ALL
+       SELECT u.trans_id, u.user_info.cust_id FROM `clicks/clicks.json` u LIMIT 5;
+       |-------------|------------|
+       | transaction |  customer  |
+       |-------------|------------|
+       | 35232       | 18520      |
+       | 31995       | 17182      |
+       | 35760       | 18228      |
+       | 37090       | 17015      |
+       | 37838       | 18737      |
+       |-------------|------------|
+
+This UNION ALL query returns rows that exist in two files (and includes any duplicate rows from those files): `clicks.campaign.json` and `clicks.json`
+
+If a query on either side of the UNION operator queries an empty directory, as shown in the following example where empty_DIR is an empty directory:
+
+       0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL select cast(columns[0] as int) c1 from `testWindow.csv`;
+
+Drill treats the empty directory as a schemaless table and returns results as if the UNION operator is not included in the query.
+
+# INTERSECT and EXCEPT
+
+**Introduced in release: 1.21**
+The INTERSECT and EXCEPT operators respectively compute the set intersection and the set difference of their two arguments. As with the UNION operator, duplicate records are auotmatically removed from the result set.
+
+## Syntax
+
+These set operators support the following syntax:
+
+```
+query
+{ INTERSECT | EXCEPT }
+another_query
+```
+
+## Parameters
+*query*, *another_query*
+
+Any SELECT query that Drill supports. See [SELECT]({{site.baseurl}}/docs/select/).
+
+## Examples
+
+### Compute { 1, 2, 4 } ∩ { 4, 5, 6 }.
+```
+apache drill> WITH
+	X AS (SELECT employee_id AS col FROM cp.`employee.json` LIMIT 3 OFFSET 0),
+	Y AS (SELECT employee_id AS col FROM cp.`employee.json` LIMIT 3 OFFSET 2)
+SELECT COL FROM X
+INTERSECT
+SELECT col FROM Y;
+
+col  4
+
+1 row selected (0.449 seconds)
+```
+
+### Compute { 1, 2, 4 } ∖ { 4, 5, 6 }
+
+```
+apache drill> WITH
+	X AS (SELECT employee_id AS col FROM cp.`employee.json` LIMIT 3 OFFSET 0),
+	Y AS (SELECT employee_id AS col FROM cp.`employee.json` LIMIT 3 OFFSET 2)
+SELECT COL FROM X
+EXCEPT
+SELECT col FROM Y;
+
+col  1
+
+col  2
+
+2 rows selected (0.41 seconds)
+```
+
+# Set operator usage notes
+   * The two SELECT query expressions that represent the direct operands of the set operator must produce the same number of columns. Corresponding columns must contain compatible data types. See [Supported Data Types]({{site.baseurl}}/docs/supported-data-types/).
+   * Multiple set operators in the same SELECT statement are evaluated left to right, unless otherwise indicated by parentheses.
+   * You can only use * on either side of a set operator when the data source has a defined schema, such as data in Hive or views.
+   * You must explicitly specify columns.
+   * Drill 1.13 and later supports queries on empty directories. An empty directory in a query interacts with set operators in the same way is if it is the empty set. See [Schemaless Tables]({{site.baseurl}}/docs/data-sources-and-file-formats-introduction/#schemaless-tables) for more information.
+
+
diff --git a/_docs/en/sql-reference/sql-commands/087-union-set-operator.md b/_docs/en/sql-reference/sql-commands/087-union-set-operator.md
deleted file mode 100644
index cd9d8a865..000000000
--- a/_docs/en/sql-reference/sql-commands/087-union-set-operator.md
+++ /dev/null
@@ -1,55 +0,0 @@
----
-title: "UNION Set Operator"
-slug: "UNION Set Operator"
-parent: "SQL Commands"
----
-The UNION set operator combines the result sets of two separate query expressions. The result set of each query must have the same number of columns and compatible data types. UNION automatically removes duplicate records from the result set. UNION ALL returns all duplicate records.
-
-
-## Syntax
-The UNION set operator supports the following syntax:
-
-       query
-       { UNION [ ALL ] }
-       query
-
-
-## Parameters
-*query*
-
-Any SELECT query that Drill supports. See [SELECT]({{site.baseurl}}/docs/select/).
-
-## Usage Notes
-   * The two SELECT query expressions that represent the direct operands of the UNION must produce the same number of columns. Corresponding columns must contain compatible data types. See [Supported Data Types]({{site.baseurl}}/docs/supported-data-types/).
-   * Multiple UNION operators in the same SELECT statement are evaluated left to right, unless otherwise indicated by parentheses.
-   * You can only use * on either side of UNION when the data source has a defined schema, such as data in Hive or views.
-   * You must explicitly specify columns.
-   * Drill 1.13 and later supports queries on empty directories. An empty directory in a query does not change the results; Drill returns results as if the query does not contain the UNION operator. See [Schemaless Tables]({{site.baseurl}}/docs/data-sources-and-file-formats-introduction/#schemaless-tables) for more information.
-
-## Examples
-The following example uses the UNION ALL set operator to combine click activity data before and after a marketing campaign. The data in the example exists in the `dfs.clicks workspace`.
-
-       0: jdbc:drill:> SELECT t.trans_id transaction, t.user_info.cust_id customer
-       FROM `clicks/clicks.campaign.json` t
-       UNION ALL
-       SELECT u.trans_id, u.user_info.cust_id FROM `clicks/clicks.json` u LIMIT 5;
-       |-------------|------------|
-       | transaction |  customer  |
-       |-------------|------------|
-       | 35232       | 18520      |
-       | 31995       | 17182      |
-       | 35760       | 18228      |
-       | 37090       | 17015      |
-       | 37838       | 18737      |
-       |-------------|------------|
-
-This UNION ALL query returns rows that exist in two files (and includes any duplicate rows from those files): `clicks.campaign.json` and `clicks.json`
-
-If a query on either side of the UNION operator queries an empty directory, as shown in the following example where empty_DIR is an empty directory:
-
-       0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL select cast(columns[0] as int) c1 from `testWindow.csv`;
-
- Drill treats the empty directory as a schemaless table and returns results as if the UNION operator is not included in the query.
-
-
-


[drill-site] 01/02: Document the need for a default format when querying an empty dir.

Posted by dz...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

dzamo pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/drill-site.git

commit d6ef361c7f12ca671b6eaf10135c011ea3fb71fe
Author: James Turton <ja...@somecomputer.xyz>
AuthorDate: Mon Feb 20 16:32:53 2023 +0200

    Document the need for a default format when querying an empty dir.
---
 .../010-data-sources-and-file-formats-introduction.md               | 6 +++++-
 _docs/en/odbc-jdbc-interfaces/015-using-jdbc-driver.md              | 5 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/_docs/en/data-sources-and-file-formats/010-data-sources-and-file-formats-introduction.md b/_docs/en/data-sources-and-file-formats/010-data-sources-and-file-formats-introduction.md
index a38347b7f..7b86f6f31 100644
--- a/_docs/en/data-sources-and-file-formats/010-data-sources-and-file-formats-introduction.md
+++ b/_docs/en/data-sources-and-file-formats/010-data-sources-and-file-formats-introduction.md
@@ -49,7 +49,10 @@ You set the input format for data coming from data sources to Drill in the works
 You change one of the `store` properties in the [sys.options table]({{ site.baseurl }}/docs/configuration-options-introduction/) to set the output format of Drill data. The default storage format for Drill CREATE TABLE AS (CTAS) statements is Parquet.
 
 ## Schemaless Tables
-As of Drill 1.13, Drill supports queries on empty directories. Empty directories are directories that exist, but do not contain files. Currently, an empty directory in Drill is a Drill table without a schema, or a “schemaless” table. An empty directory with Parquet metadata cache files is also a schemaless table in Drill.
+
+**Introduced in release: 1.13**
+
+Drill supports queries on empty directories i.e. directories that exist but do not contain files.  In Drill 1.21 the constraint that the enclosing workspace must have a default format set for a query against an empty directory to be considered valid was added. Currently, an empty directory in Drill is a Drill table without a schema, or a “schemaless” table. An empty directory with Parquet metadata cache files is also a schemaless table in Drill.
 
 Drill supports queries with JOIN and UNION [ALL] operators on empty directories. For example, if you issue the following queries with the UNION ALL operator, Drill queries the empty directory (empty_DIR) as a schemaless table and returns results for the query on the right side of the operator:
 
@@ -57,6 +60,7 @@ Drill supports queries with JOIN and UNION [ALL] operators on empty directories.
 
 ### Usage Notes
 
+- Drill 1.21 and later: if the workspace configuration specifies no default format, i.e. `"defaultInputFormat": null`, then queries on an empty directory fail with a `VALIDATION ERROR`.
 - Queries with stars (*) on an empty directory return an empty result set.
 - Fields indicated in the SELECT statement are returned as INT-OPTIONAL types.
 - The empty directory in a query with the UNION operator does not change the results; Drill returns results as if the query does not contain the UNION operator.
diff --git a/_docs/en/odbc-jdbc-interfaces/015-using-jdbc-driver.md b/_docs/en/odbc-jdbc-interfaces/015-using-jdbc-driver.md
index ae50136e0..81df2bf84 100644
--- a/_docs/en/odbc-jdbc-interfaces/015-using-jdbc-driver.md
+++ b/_docs/en/odbc-jdbc-interfaces/015-using-jdbc-driver.md
@@ -87,7 +87,7 @@ where
 
 `drillbit=<node name>` specifies one or more host names or IP addresses of cluster nodes running Drill.
 
-### `tries` Parameter
+### The `tries` Parameter
 
 As of Drill 1.10, you can include the optional `tries=<value>` parameter in the connection string, as shown in the following URL:
 
@@ -153,4 +153,5 @@ try (
   while (ctasResults.next()); // scroll through results to ensure that we wait for CTAS completion
 }
 ```
-      
+
+See also: the `exec.query.return_result_set_for_ddl` config option.