You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by ja...@apache.org on 2022/09/20 01:45:14 UTC
[flink] 02/25: [FLINK-29025][docs] add overview page of queries for Hive dialect
This is an automated email from the ASF dual-hosted git repository.
jark pushed a commit to branch release-1.16
in repository https://gitbox.apache.org/repos/asf/flink.git
commit 282a3f19eb35c48e149035ddf0460ef6e1d52d0d
Author: luoyuxia <lu...@alumni.sjtu.edu.cn>
AuthorDate: Mon Aug 29 15:01:37 2022 +0800
[FLINK-29025][docs] add overview page of queries for Hive dialect
---
.../hiveDialect/Queries/_index.md | 21 +++
.../hiveDialect/Queries/overview.md | 163 +++++++++++++++++++++
.../hiveDialect/Queries/_index.md | 21 +++
.../hiveDialect/Queries/overview.md | 163 +++++++++++++++++++++
4 files changed, 368 insertions(+)
diff --git a/docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/_index.md b/docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/_index.md
new file mode 100644
index 00000000000..d3ec8ca31d9
--- /dev/null
+++ b/docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/_index.md
@@ -0,0 +1,21 @@
+---
+title: Queries
+bookCollapseSection: true
+weight: 1
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
diff --git a/docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md b/docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md
new file mode 100644
index 00000000000..90e6a062223
--- /dev/null
+++ b/docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md
@@ -0,0 +1,163 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /dev/table/hiveCompatibility/hiveDialect/Queries/overview
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Queries
+
+## Description
+
+Hive dialect supports a commonly-used subset of Hive’s [DQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select).
+The following lists some parts of HiveQL supported by the Hive dialect.
+
+- [Sort/Cluster/Distributed BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}})
+- [Group By]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by" >}})
+- [Join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}})
+- [Set Operation]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}})
+- [Lateral View]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/lateral-view" >}})
+- [Window Functions]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/window-functions" >}})
+- [SubQueries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}})
+- [CTE]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/cte" >}})
+- [Transform]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/transform" >}})
+- [Table Sample]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/table-sample" >}})
+
+## Syntax
+
+The following section describes the overall query syntax.
+The SELECT clause can be part of a query which also includes common table expressions (CTE), set operations, and various other clauses.
+
+```sql
+[WITH CommonTableExpression (, CommonTableExpression)*]
+SELECT [ALL | DISTINCT] select_expr, select_expr, ...
+ FROM table_reference
+ [WHERE where_condition]
+ [GROUP BY col_list]
+ [ORDER BY col_list]
+ [CLUSTER BY col_list
+ | [DISTRIBUTE BY col_list] [SORT BY col_list]
+ ]
+ [LIMIT [offset,] rows]
+```
+- A `SELECT` statement can be part of a [set]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}}) query or a [subquery]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) of another query
+- `table_reference` indicates the input to the query. It can be a regular table, a view, a [join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}}) or a [subquery]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}).
+- Table names and column names are case-insensitive
+
+### WHERE Clause
+
+The `WHERE` condition is a boolean expression. Hive dialect supports a number of [operators and UDFS](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF)
+in the `WHERE` clause. Some types of [sub queries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) are supported in `WHERE` clause.
+
+### GROUP BY Clause
+
+Please refer to [GROUP BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by" >}}) for more details.
+
+### ORDER BY Clause
+
+The `ORDER BY` clause is used to return the result rows in a sorted manner in the user specified order.
+Different from [SORT BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}}#sort-by), `ORDER BY` clause guarantees
+a total order in the output.
+
+{{< hint warning >}}
+**Note:**
+To guarantee global order, there has to be single one task to sort the final output.
+So if the number of rows in the output is too large, it could take a very long time to finish.
+{{< /hint >}}
+
+## CLUSTER/DISTRIBUTE/SORT BY
+
+Please refer to [Sort/Cluster/Distributed BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}}) for more details.
+
+### ALL and DISTINCT Clauses
+
+The `ALL` and `DISTINCT` options specify whether duplicate rows should be returned or not.
+If none of these two options are given, the default is `ALL` (all matching rows are returned).
+`DISTINCT` specifies removal of duplicate rows from the result set.
+
+### LIMIT Clause
+
+The `LIMIT` clause can be used to constrain the number of rows returned by the `SELECT` statement.
+
+`LIMIT` takes one or two numeric arguments, which must both be non-negative integer constants.
+The first argument specifies the offset of the first row to return and the second specifies the maximum number of rows to return.
+When a single argument is given, it stands for the maximum number of rows and the offset defaults to 0.
+
+## Examples
+
+Following is an example of using hive dialect to run some queries.
+
+{{< hint warning >}}
+**Note:** Hive dialect no longer supports [Flink SQL queries]({{< ref "docs/dev/table/sql/queries/overview" >}}). Please switch to default dialect if you’d like to write in Flink syntax.
+{{< /hint >}}
+
+```bash
+Flink SQL> create catalog myhive with ('type' = 'hive', 'hive-conf-dir' = '/opt/hive-conf');
+[INFO] Execute statement succeed.
+
+Flink SQL> use catalog myhive;
+[INFO] Execute statement succeed.
+
+Flink SQL> load module hive;
+[INFO] Execute statement succeed.
+
+Flink SQL> use modules hive,core;
+[INFO] Execute statement succeed.
+
+Flink SQL> set table.sql-dialect=hive;
+[INFO] Session property has been set.
+
+FLINK SQL> set sql-client.execution.result-mode=tableau;
+
+Flink SQL> select explode(array(1,2,3)); -- call hive udtf
++----+-------------+
+| op | col |
++----+-------------+
+| +I | 1 |
+| +I | 2 |
+| +I | 3 |
++----+-------------+
+Received a total of 3 rows
+
+Flink SQL> create table tbl (key int,value string);
+[INFO] Execute statement succeed.
+
+Flink SQL> insert into table tbl values (5,'e'),(1,'a'),(1,'a'),(3,'c'),(2,'b'),(3,'c'),(3,'c'),(4,'d');
+[INFO] Submitting SQL update statement to the cluster...
+[INFO] SQL update statement has been successfully submitted to the cluster:
+
+FLINK SQL> set execution.runtime-mode=batch; -- change to batch mode
+
+Flink SQL> select * from tbl cluster by key; -- run cluster by
+2021-04-22 16:13:57,005 INFO org.apache.hadoop.mapred.FileInputFormat [] - Total input paths to process : 1
++-----+-------+
+| key | value |
++-----+-------+
+| 1 | a |
+| 1 | a |
+| 5 | e |
+| 2 | b |
+| 3 | c |
+| 3 | c |
+| 3 | c |
+| 4 | d |
++-----+-------+
+Received a total of 8 rows
+```
diff --git a/docs/content/docs/dev/table/hiveCompatibility/hiveDialect/Queries/_index.md b/docs/content/docs/dev/table/hiveCompatibility/hiveDialect/Queries/_index.md
new file mode 100644
index 00000000000..d3ec8ca31d9
--- /dev/null
+++ b/docs/content/docs/dev/table/hiveCompatibility/hiveDialect/Queries/_index.md
@@ -0,0 +1,21 @@
+---
+title: Queries
+bookCollapseSection: true
+weight: 1
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
diff --git a/docs/content/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md b/docs/content/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md
new file mode 100644
index 00000000000..90e6a062223
--- /dev/null
+++ b/docs/content/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md
@@ -0,0 +1,163 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /dev/table/hiveCompatibility/hiveDialect/Queries/overview
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Queries
+
+## Description
+
+Hive dialect supports a commonly-used subset of Hive’s [DQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select).
+The following lists some parts of HiveQL supported by the Hive dialect.
+
+- [Sort/Cluster/Distributed BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}})
+- [Group By]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by" >}})
+- [Join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}})
+- [Set Operation]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}})
+- [Lateral View]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/lateral-view" >}})
+- [Window Functions]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/window-functions" >}})
+- [SubQueries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}})
+- [CTE]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/cte" >}})
+- [Transform]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/transform" >}})
+- [Table Sample]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/table-sample" >}})
+
+## Syntax
+
+The following section describes the overall query syntax.
+The SELECT clause can be part of a query which also includes common table expressions (CTE), set operations, and various other clauses.
+
+```sql
+[WITH CommonTableExpression (, CommonTableExpression)*]
+SELECT [ALL | DISTINCT] select_expr, select_expr, ...
+ FROM table_reference
+ [WHERE where_condition]
+ [GROUP BY col_list]
+ [ORDER BY col_list]
+ [CLUSTER BY col_list
+ | [DISTRIBUTE BY col_list] [SORT BY col_list]
+ ]
+ [LIMIT [offset,] rows]
+```
+- A `SELECT` statement can be part of a [set]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}}) query or a [subquery]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) of another query
+- `table_reference` indicates the input to the query. It can be a regular table, a view, a [join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}}) or a [subquery]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}).
+- Table names and column names are case-insensitive
+
+### WHERE Clause
+
+The `WHERE` condition is a boolean expression. Hive dialect supports a number of [operators and UDFS](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF)
+in the `WHERE` clause. Some types of [sub queries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) are supported in `WHERE` clause.
+
+### GROUP BY Clause
+
+Please refer to [GROUP BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by" >}}) for more details.
+
+### ORDER BY Clause
+
+The `ORDER BY` clause is used to return the result rows in a sorted manner in the user specified order.
+Different from [SORT BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}}#sort-by), `ORDER BY` clause guarantees
+a total order in the output.
+
+{{< hint warning >}}
+**Note:**
+To guarantee global order, there has to be single one task to sort the final output.
+So if the number of rows in the output is too large, it could take a very long time to finish.
+{{< /hint >}}
+
+## CLUSTER/DISTRIBUTE/SORT BY
+
+Please refer to [Sort/Cluster/Distributed BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}}) for more details.
+
+### ALL and DISTINCT Clauses
+
+The `ALL` and `DISTINCT` options specify whether duplicate rows should be returned or not.
+If none of these two options are given, the default is `ALL` (all matching rows are returned).
+`DISTINCT` specifies removal of duplicate rows from the result set.
+
+### LIMIT Clause
+
+The `LIMIT` clause can be used to constrain the number of rows returned by the `SELECT` statement.
+
+`LIMIT` takes one or two numeric arguments, which must both be non-negative integer constants.
+The first argument specifies the offset of the first row to return and the second specifies the maximum number of rows to return.
+When a single argument is given, it stands for the maximum number of rows and the offset defaults to 0.
+
+## Examples
+
+Following is an example of using hive dialect to run some queries.
+
+{{< hint warning >}}
+**Note:** Hive dialect no longer supports [Flink SQL queries]({{< ref "docs/dev/table/sql/queries/overview" >}}). Please switch to default dialect if you’d like to write in Flink syntax.
+{{< /hint >}}
+
+```bash
+Flink SQL> create catalog myhive with ('type' = 'hive', 'hive-conf-dir' = '/opt/hive-conf');
+[INFO] Execute statement succeed.
+
+Flink SQL> use catalog myhive;
+[INFO] Execute statement succeed.
+
+Flink SQL> load module hive;
+[INFO] Execute statement succeed.
+
+Flink SQL> use modules hive,core;
+[INFO] Execute statement succeed.
+
+Flink SQL> set table.sql-dialect=hive;
+[INFO] Session property has been set.
+
+FLINK SQL> set sql-client.execution.result-mode=tableau;
+
+Flink SQL> select explode(array(1,2,3)); -- call hive udtf
++----+-------------+
+| op | col |
++----+-------------+
+| +I | 1 |
+| +I | 2 |
+| +I | 3 |
++----+-------------+
+Received a total of 3 rows
+
+Flink SQL> create table tbl (key int,value string);
+[INFO] Execute statement succeed.
+
+Flink SQL> insert into table tbl values (5,'e'),(1,'a'),(1,'a'),(3,'c'),(2,'b'),(3,'c'),(3,'c'),(4,'d');
+[INFO] Submitting SQL update statement to the cluster...
+[INFO] SQL update statement has been successfully submitted to the cluster:
+
+FLINK SQL> set execution.runtime-mode=batch; -- change to batch mode
+
+Flink SQL> select * from tbl cluster by key; -- run cluster by
+2021-04-22 16:13:57,005 INFO org.apache.hadoop.mapred.FileInputFormat [] - Total input paths to process : 1
++-----+-------+
+| key | value |
++-----+-------+
+| 1 | a |
+| 1 | a |
+| 5 | e |
+| 2 | b |
+| 3 | c |
+| 3 | c |
+| 3 | c |
+| 4 | d |
++-----+-------+
+Received a total of 8 rows
+```