Posted to commits@spark.apache.org by ya...@apache.org on 2020/09/30 23:26:24 UTC
[spark] branch branch-3.0 updated:
[SPARK-31753][SQL][DOCS][FOLLOW-UP] Add missing keywords in the SQL docs
This is an automated email from the ASF dual-hosted git repository.
yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new db6ba04 [SPARK-31753][SQL][DOCS][FOLLOW-UP] Add missing keywords in the SQL docs
db6ba04 is described below
commit db6ba049c43e2aa1521ed39c9f2b802ad04d111f
Author: GuoPhilipse <46...@users.noreply.github.com>
AuthorDate: Thu Oct 1 08:15:53 2020 +0900
[SPARK-31753][SQL][DOCS][FOLLOW-UP] Add missing keywords in the SQL docs
### What changes were proposed in this pull request?
Update the SQL reference docs; the following keywords will be added in this PR:
CLUSTERED BY
SORTED BY
INTO num_buckets BUCKETS
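For illustration only (not part of the commit), a minimal sketch of how the three documented clauses combine in a data source `CREATE TABLE` statement; the table and column names are hypothetical:

```sql
-- Bucket rows by id into 8 buckets, with rows in each bucket sorted by id ascending
CREATE TABLE sales (id INT, amount DOUBLE)
    USING PARQUET
    CLUSTERED BY (id)
    SORTED BY (id ASC)
    INTO 8 BUCKETS;
```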
### Why are the changes needed?
To help more users understand the usage of these SQL keywords.
### Does this PR introduce _any_ user-facing change?
No
![image](https://user-images.githubusercontent.com/46367746/94428281-0a6b8080-01c3-11eb-9ff3-899f8da602ca.png)
![image](https://user-images.githubusercontent.com/46367746/94428285-0d667100-01c3-11eb-8a54-90e7641d917b.png)
![image](https://user-images.githubusercontent.com/46367746/94428288-0f303480-01c3-11eb-9e1d-023538aa6e2d.png)
### How was this patch tested?
Generated the HTML docs and checked the rendered output.
Closes #29883 from GuoPhilipse/add-sql-missing-keywords.
Lead-authored-by: GuoPhilipse <46...@users.noreply.github.com>
Co-authored-by: GuoPhilipse <gu...@126.com>
Signed-off-by: Takeshi Yamamuro <ya...@apache.org>
(cherry picked from commit 3bdbb5546d2517dda6f71613927cc1783c87f319)
Signed-off-by: Takeshi Yamamuro <ya...@apache.org>
---
docs/sql-ref-syntax-ddl-create-table-datasource.md | 7 ++++-
docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 32 ++++++++++++++++++++++
2 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md
index d334447..ba0516a 100644
--- a/docs/sql-ref-syntax-ddl-create-table-datasource.md
+++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md
@@ -67,7 +67,12 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
* **SORTED BY**
- Determines the order in which the data is stored in buckets. Default is Ascending order.
+ Specifies an ordering of bucket columns. Optionally, one can use ASC for an ascending order or DESC for a descending order after any column names in the SORTED BY clause.
+ If not specified, ASC is assumed by default.
+
+* **INTO num_buckets BUCKETS**
+
+ Specifies the number of buckets, which is used in the `CLUSTERED BY` clause.
* **LOCATION**
diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md
index 7bf847d..3a8c8d5 100644
--- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md
+++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md
@@ -31,6 +31,9 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
[ COMMENT table_comment ]
[ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
| ( col_name1, col_name2, ... ) ]
+ [ CLUSTERED BY ( col_name1, col_name2, ...)
+ [ SORTED BY ( col_name1 [ ASC | DESC ], col_name2 [ ASC | DESC ], ... ) ]
+ INTO num_buckets BUCKETS ]
[ ROW FORMAT row_format ]
[ STORED AS file_format ]
[ LOCATION path ]
@@ -65,6 +68,21 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
Partitions are created on the table, based on the columns specified.
+* **CLUSTERED BY**
+
+ Partitions created on the table will be bucketed into a fixed number of buckets, based on the columns specified for bucketing.
+
+ **NOTE:** Bucketing is an optimization technique that uses buckets (and bucketing columns) to determine data partitioning and avoid data shuffle.
+
+* **SORTED BY**
+
+ Specifies an ordering of bucket columns. Optionally, one can use ASC for an ascending order or DESC for a descending order after any column names in the SORTED BY clause.
+ If not specified, ASC is assumed by default.
+
+* **INTO num_buckets BUCKETS**
+
+ Specifies the number of buckets, which is used in the `CLUSTERED BY` clause.
+
* **row_format**
Use the `SERDE` clause to specify a custom SerDe for one table. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on.
@@ -203,6 +221,20 @@ CREATE EXTERNAL TABLE family (id INT, name STRING)
STORED AS INPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
LOCATION '/tmp/family/';
+
+--Use the `CLUSTERED BY` clause to create a bucketed table without `SORTED BY`
+CREATE TABLE clustered_by_test1 (ID INT, AGE STRING)
+ CLUSTERED BY (ID)
+ INTO 4 BUCKETS
+ STORED AS ORC;
+
+--Use the `CLUSTERED BY` clause to create a bucketed table with `SORTED BY`
+CREATE TABLE clustered_by_test2 (ID INT, NAME STRING)
+ PARTITIONED BY (YEAR STRING)
+ CLUSTERED BY (ID, NAME)
+ SORTED BY (ID ASC)
+ INTO 3 BUCKETS
+ STORED AS PARQUET;
```
### Related Statements
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org