Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/28 00:04:30 UTC
[GitHub] [spark] maropu commented on a change in pull request #29883: [SPARK-31753][SQL][DOCS][FOLLOW-UP] Add missing keywords in the SQL docs

maropu commented on a change in pull request #29883:
URL: https://github.com/apache/spark/pull/29883#discussion_r495635757



##########
File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md
##########
@@ -65,6 +68,18 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
 
     Partitions are created on the table, based on the columns specified.
     
+* **CLUSTERED BY**
+
+    Specifies bucket columns for bucketing table.
+    
+* **SORTED BY**
+
+    Used to sort bucket column, we can combine with `ASC` for ascending order, with `DESC` for descending order.
+    
+* **INTO num_buckets BUCKETS**
+
+    Specifies buckets numbers, which is used in  `CLUSTERED BY` clause.

Review comment:
       nit: redundant spaces found between `in`/`CLUSTERED BY`. And, `in the CLUSTERED BY clause`?

##########
File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md
##########
@@ -65,6 +68,18 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
 
     Partitions are created on the table, based on the columns specified.
     
+* **CLUSTERED BY**
+
+    Specifies bucket columns for bucketing table.

Review comment:
       `Specifies bucket column names for bucketing a table`?

##########
File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md
##########
@@ -203,6 +218,17 @@ CREATE EXTERNAL TABLE family (id INT, name STRING)
     STORED AS INPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
         OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
     LOCATION '/tmp/family/';
+
+--Use `CLUSTERED BY` clause to create bucket table without `SORTED BY`
+CREATE TABLE TEST1(ID INT, AGE STRING)
+    CLUSTERED BY (ID)
+    INTO 4 BUCKETS
+
+--Use `CLUSTERED BY` clause to create bucket table with `SORTED BY`
+CREATE TABLE TEST2(ID INT, NAME STRING)

Review comment:
       ditto

##########
File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md
##########
@@ -203,6 +218,17 @@ CREATE EXTERNAL TABLE family (id INT, name STRING)
     STORED AS INPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
         OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
     LOCATION '/tmp/family/';
+
+--Use `CLUSTERED BY` clause to create bucket table without `SORTED BY`
+CREATE TABLE TEST1(ID INT, AGE STRING)

Review comment:
       nit: To follow the format of the other examples, `CREATE TABLE clustered_by_test1 (ID ...`?

##########
File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md
##########
@@ -65,6 +68,18 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
 
     Partitions are created on the table, based on the columns specified.
     
+* **CLUSTERED BY**
+
+    Specifies bucket columns for bucketing table.
+    
+* **SORTED BY**
+
+    Used to sort bucket column, we can combine with `ASC` for ascending order, with `DESC` for descending order.

Review comment:
       How about rephrasing it like this? `Specifies an ordering of bucket columns. Optionally, one can use ASC for an ascending order or DESC for a descending order after any column names in the SORTED BY clause. If not specified, ASC is assumed by default.`
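
       For illustration only, a sketch of how the three clauses under review might combine in one statement. This is not taken from the PR: the table name `clustered_by_example` and its columns are hypothetical, and the statement has not been run against a Spark build. `ID` is given an explicit `ASC`, while `NAME` omits the direction and so would fall back to the default described in the proposed wording.

```sql
-- Hypothetical table: bucket rows by ID and NAME into 3 buckets,
-- ordering the rows inside each bucket by ID, then NAME.
CREATE TABLE clustered_by_example (ID INT, NAME STRING)
    CLUSTERED BY (ID, NAME)      -- bucket column names
    SORTED BY (ID ASC, NAME)     -- explicit ASC on ID; NAME uses the default
    INTO 3 BUCKETS;              -- number of buckets for the CLUSTERED BY clause
```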



