Posted to reviews@impala.apache.org by "Baike Xia (Code Review)" <ge...@cloudera.org> on 2022/09/30 03:01:06 UTC

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Baike Xia has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19055 )


Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntax support for creating bucketed tables.
The syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Instructions:
1. Hive's CLUSTERED BY syntax is not supported, because CLUSTERED is
  already used in plan hints;
2. The supported bucket partitioning algorithms are HASH, RANDOM and
  KUDU_HASH. The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets. The default value
  is 16;
4. CREATE TABLE statements with a bucket clause do not support Kudu and
  Iceberg tables. For a Kudu table, however, a hash partition is
  equivalent to a bucket, so the optimization rule still applies to
  join queries;
5. In the current version, alter operations (add/drop/change/replace
  columns) on bucketed tables are not supported.

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
16 files changed, 411 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/4
-- 
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 4
Gerrit-Owner: Baike Xia <xi...@163.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   :}
> Yes, i think so. But it was originally intended that later versions would a
We already support the SortBy clause. Currently it's independent of the ClusteredBy clause, e.g.

  CREATE TABLE tbl (i int, s string)
  CLUSTERED BY (i) INTO 24 BUCKETS SORT BY (i);

What I mean is changing it to

  CREATE TABLE tbl (i int, s string)
  CLUSTERED BY (i) SORT BY (i) INTO 24 BUCKETS;

The latter is the syntax used in Hive and SparkSQL. I don't think we need to add any new feature for this, and the grammar might become simpler, so that we can add the EMPTY production rule.
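
To make the proposed clause order concrete, here is a minimal sketch of a recognizer for the Hive/SparkSQL-style form, with the optional SORT BY nested inside the bucket clause. It is only an illustration: the class name BucketClauseSketch, the regular expression, and the "cols|sortCols|numBuckets" return format are all hypothetical and are not part of the patch or of Impala's CUP grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BucketClauseSketch {
  // Hypothetical pattern: CLUSTERED BY (cols) [SORT BY (cols)] INTO n BUCKETS
  private static final Pattern BUCKET_CLAUSE = Pattern.compile(
      "CLUSTERED\\s+BY\\s*\\(([^)]*)\\)"        // bucket columns
      + "(?:\\s*SORT\\s+BY\\s*\\(([^)]*)\\))?"  // optional sort columns, inside the clause
      + "\\s*INTO\\s+(\\d+)\\s+BUCKETS",        // bucket count closes the clause
      Pattern.CASE_INSENSITIVE);

  /** Returns "cols|sortCols|numBuckets", or null if the text does not match. */
  public static String parse(String clause) {
    Matcher m = BUCKET_CLAUSE.matcher(clause.trim());
    if (!m.matches()) return null;
    String sortCols = m.group(2) == null ? "" : m.group(2).trim();
    return m.group(1).trim() + "|" + sortCols + "|" + Integer.parseInt(m.group(3));
  }

  public static void main(String[] args) {
    // SORT BY inside the clause: accepted.
    System.out.println(parse("CLUSTERED BY (i) SORT BY (i) INTO 24 BUCKETS"));
    // SORT BY omitted: accepted, with empty sort columns.
    System.out.println(parse("CLUSTERED BY (i) INTO 24 BUCKETS"));
    // SORT BY trailing the clause: rejected by this sketch.
    System.out.println(parse("CLUSTERED BY (i) INTO 24 BUCKETS SORT BY (i)"));
  }
}
```

In this sketch INTO ... BUCKETS always terminates the clause, so the optional SORT BY part needs no separate top-level rule.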




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 27 Oct 2022 12:31:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Manish Maheshwari (Code Review)" <ge...@cloudera.org>.
Manish Maheshwari has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 18:

Can we use "CLUSTER BY" rather than "CLUSTERED BY"? I see that Spark uses CLUSTER BY, and so does Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-clusterby.html



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Manish Maheshwari <ma...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 07 Nov 2022 18:26:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 17: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 17
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 02 Nov 2022 16:47:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11496/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 7
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Sep 2022 07:54:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntax support for creating bucketed tables.
The syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Instructions:
1. Hive's CLUSTERED BY syntax is not supported, because CLUSTERED is
  already used in plan hints;
2. The supported bucket partitioning algorithms are HASH, RANDOM and
  KUDU_HASH. The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets. The default value
  is 16;
4. CREATE TABLE statements with a bucket clause do not support Kudu and
  Iceberg tables. For a Kudu table, however, a hash partition is
  equivalent to a bucket, so the optimization rule still applies to
  join queries;
5. In the current version, alter operations (add/drop/change/replace
  columns) on bucketed tables are not supported.

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 415 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 9
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11589/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 9
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 10 Oct 2022 12:03:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 10:

(10 comments)

Thanks for contributing this! I left some comments.

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@18
PS10, Line 18:  [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS
It'd be better if we can highlight this line since it's the only new part. We should also write this based on the cup file.


http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@29
PS10, Line 29: 1. CLUSTERED BY of Hive is not supported, because HINT has the keyword;
Do you mean "CLUSTERED" has been used as a hint so we can't use it as a keyword? I'm not sure what blocks this. Could you share the error you saw? It'd be nice to have syntax consistent with HQL.


http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@30
PS10, Line 30: 2. The bucket partitioning algorithm contains HASH, RANDOM, KUDU_HASH.
Are these recognized by Hive? i.e. if Hive inserts data into the table, is it using the hash algorithm we expected?
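
As background for this question, here is a minimal sketch of how a hash-bucketed writer typically maps a row to a bucket. It assumes the classic convention of clearing the sign bit of the hash and taking the remainder modulo the bucket count. The names BucketAssignSketch and bucketFor are hypothetical, and Integer.hashCode is only a stand-in for whatever hash function an engine actually applies to the bucket columns.

```java
public class BucketAssignSketch {
  // Maps a hash code to a bucket index in [0, numBuckets).
  // Masking with Integer.MAX_VALUE clears the sign bit, so the remainder is never negative.
  public static int bucketFor(int hashCode, int numBuckets) {
    if (numBuckets <= 0) {
      throw new IllegalArgumentException("numBuckets must be positive");
    }
    return (hashCode & Integer.MAX_VALUE) % numBuckets;
  }

  public static void main(String[] args) {
    int numBuckets = 24;
    for (int key : new int[] {0, 1, 25, -1}) {
      // Stand-in hash only; a real engine may hash the column value differently.
      int hash = Integer.hashCode(key);
      System.out.println(key + " -> bucket " + bucketFor(hash, numBuckets));
    }
  }
}
```

Two engines place a row in the same bucket only if they agree on both the hash function and this mapping, which is why the choice of algorithm matters when Hive and Impala write to the same table.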


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   | opt_bucket_desc:bucket
I don't think we need two alternatives here. As with other optional clauses, we can add an empty alternative to opt_bucket_desc, e.g.
https://github.com/apache/impala/blob/6a1a871fb7f014be0ab9dbc0ac450416b897a263/fe/src/main/cup/sql-parser.cup#L1650-L1653


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1705
PS10, Line 1705:   {: RESULT = new Pair<List<String>, TSortingOrder>(null, TSortingOrder.LEXICAL); :}
nit: Let's skip reformatting unrelated code.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@329
PS10, Line 329:     ensureTableNotBucketed(table);
This blocks us from dropping a bucketed table. But it's ok to support dropping bucketed tables in another JIRA.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/jflex/sql-scanner.flex
File fe/src/main/jflex/sql-scanner.flex:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/jflex/sql-scanner.flex@178
PS10, Line 178: kuduhash
The commit message mentions "kudu_hash".


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2844
PS10, Line 2844: HASH()
Can we add tests for "HASH" and "KUDUHASH" without the parentheses?

  BUCKETED BY HASH INTO 12 BUCKETS
  BUCKETED BY KUDUHASH INTO 12 BUCKETS


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@a4347
PS10, Line 4347: 
Can we keep this since this still doesn't work?


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@3074
PS10, Line 3074:     ParsesOk("CREATE TABLE bucketed_test (i int, s string) BUCKETED BY RANDOM");
Could you add some tests for KUDU_HASH?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 10
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 18 Oct 2022 12:04:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11633/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 10
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Oct 2022 03:55:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19055/15/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

http://gerrit.cloudera.org:8080/#/c/19055/15/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@512
PS15, Line 512:         sb.append(String.format("SORT BY %s (\n  %s\n)\n", 
line has trailing whitespace




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 15
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 28 Oct 2022 09:43:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19055/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/11//COMMIT_MSG@19
PS11, Line 19: RANDOM
Is RANDOM actually useful in practice? Could you share some use cases?


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   | opt_bucket_desc:bucket
> Yes, but adding empty to opt_bucket_desc causes a compilation error. So I t
I see. I checked the Hive parser and realized that in HiveQL the SortBy clause is part of the BucketClause:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableCreate/Drop/TruncateTable
https://github.com/apache/hive/blob/16ce75578c265d0aaba7eedafb65658fc569f75e/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g#L1916

  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]

SparkSQL also has the SortBy clause inside the BucketClause:
https://spark.apache.org/docs/3.3.1/sql-ref-syntax-ddl-create-table-hiveformat.html

    [ CLUSTERED BY ( col_name1, col_name2, ...) 
        [ SORTED BY ( col_name1 [ ASC | DESC ], col_name2 [ ASC | DESC ], ... ) ] 
        INTO num_buckets BUCKETS ]

I think syntax consistency across the ecosystem is important. Could you try the same syntax, moving the SortBy clause into the Bucket clause? The grammar will probably be simpler, and we can work around this EMPTY production issue.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 27 Oct 2022 06:08:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@29
PS10, Line 29:   the hash partition is equivalent to a bucket,
> If add supported "CLUSTERED" in cup file, execute SQL with CLUTERED in hint
I see. Previously, CLUSTERED was recognized as an IDENTIFIER. Now we define it as a keyword, so we have to add a new production rule in plan_hint for KW_CLUSTERED, just like what we have done for KW_STRAIGHT_JOIN, i.e.

 plan_hint ::=
  KW_STRAIGHT_JOIN
  {: RESULT = new PlanHint("straight_join"); :}
  | KW_CLUSTERED
  {: RESULT = new PlanHint("clustered"); :}
  | IDENT:name
  {: RESULT = new PlanHint(name); :}
  ...
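
The effect described above can be illustrated with a small sketch. It is not Impala's scanner or parser: HintKeywordSketch, classify, acceptsHint and the word sets are hypothetical. It only shows that once a word is reserved it stops matching the IDENT alternative, so a hint rule must list it explicitly.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class HintKeywordSketch {
  public enum TokenKind { KEYWORD, IDENT }

  // A scanner emits a keyword token for reserved words and IDENT otherwise.
  public static TokenKind classify(String word, Set<String> reserved) {
    return reserved.contains(word.toLowerCase(Locale.ROOT))
        ? TokenKind.KEYWORD : TokenKind.IDENT;
  }

  // A hint rule accepts any IDENT, plus only the keywords it lists explicitly.
  public static boolean acceptsHint(
      String word, Set<String> reserved, Set<String> hintKeywords) {
    if (classify(word, reserved) == TokenKind.IDENT) return true;
    return hintKeywords.contains(word.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    // "clustered" is now reserved, as in the patch under review.
    Set<String> reserved = new HashSet<>(Arrays.asList("straight_join", "clustered"));
    // Hint rule before and after adding an explicit alternative for the new keyword.
    Set<String> before = new HashSet<>(Arrays.asList("straight_join"));
    Set<String> after = new HashSet<>(Arrays.asList("straight_join", "clustered"));

    System.out.println(acceptsHint("clustered", reserved, before)); // false
    System.out.println(acceptsHint("clustered", reserved, after));  // true
    System.out.println(acceptsHint("shuffle", reserved, before));   // true
  }
}
```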


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1705
PS10, Line 1705:   {: RESULT = new Pair<List<String>, TSortingOrder>(null, TSortingOrder.LEXICAL); :}
> I Got.
This hasn't been addressed.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 27 Oct 2022 05:08:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11690/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 24 Oct 2022 12:37:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntax support for creating bucketed tables.
The syntax in the CREATE TABLE statement is as follows:
 [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])]
     INTO 24 BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS;

Instructions:
1. The bucket partitioning algorithm is Hive's hash function;
2. Create Bucketed Table statements do not support Kudu and
  Iceberg tables;
3. In the current version, alter operations (add/drop/change/replace
  columns) on bucketed tables are not supported.

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
14 files changed, 352 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/14

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntax support for creating bucketed tables.
The syntax in the CREATE TABLE statement is as follows:
 [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])]
     INTO 24 BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS;

Instructions:
1. The bucket partitioning algorithm is the hash function used
  in Hive's bucketed tables;
2. Create Bucketed Table statements currently don't support Kudu and
  Iceberg tables;
3. In the current version, alter operations (add/drop/change/replace
  columns) on bucketed tables are not supported;
4. Dropping bucketed tables is supported.

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/metadata/test_show_create_table.py
18 files changed, 380 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/18

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8768/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 17
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 02 Nov 2022 11:35:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8735/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 25 Oct 2022 08:34:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 18:

> Patch Set 18:
> 
> Can we use "CLUSTER BY" rather than "CLUSTERED BY"? I see Spark also using Cluster by and so does Hive - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy
> https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-clusterby.html

Hi Manish, glad to see your comment.
In Hive and Spark, "CLUSTERED BY" is used in CREATE TABLE to specify the bucketing columns and the number of buckets. In SELECT syntax, "CLUSTER BY" ensures that each of N reducers gets non-overlapping ranges, then sorts by those ranges at the reducers.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables
https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-hiveformat.html
https://stackoverflow.com/questions/34495981/difference-between-cluster-by-and-clustered-by-in-hive



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Manish Maheshwari <ma...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 08 Nov 2022 02:29:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11566/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 8
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sat, 08 Oct 2022 08:53:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Instructions:
1. Hive's CLUSTERED BY is not supported, because the keyword is already
  used in hints;
2. The bucket partitioning algorithm is one of HASH, RANDOM, or KUDU_HASH.
  The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets; the default is 16;
4. Statements that create bucketed tables do not support Kudu and
  Iceberg tables. For a Kudu table, however, a hash partition is
  equivalent to a bucket, and the optimization rule applies to join
  queries;
5. In the current version, ALTER operations (add/drop/change/replace
  columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 413 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/8

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 8
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8679/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 9
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 10 Oct 2022 16:47:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11732/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 16
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 28 Oct 2022 10:05:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax in the create table statement is as follows:
 [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])]
     INTO 24 BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS;

Instructions:
1. The bucket partitioning algorithm is he hash function used
  in Hive's bucketed tables;
2. Create Bucketed Table statements currently don't support Kudu and
  Iceberg tables;
3. In the current version, ALTER operations (add/drop/change/replace
  columns) on bucketed tables are not supported;
4. Support drop bucketed table;
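
As an illustration of item 1, the sketch below shows how a hash-based scheme can map a row to one of the 24 buckets. It is not code from this patch, which only adds DDL support. `BucketSketch` and `bucketFor` are hypothetical names, and treating an INT column's hash as its own value is an assumption made for the example; the hash Hive actually applies depends on the column type and the Hive version.

```java
public class BucketSketch {
  // Hypothetical helper, not part of the patch: maps a precomputed column hash
  // to a bucket index in [0, numBuckets).
  static int bucketFor(int hash, int numBuckets) {
    if (numBuckets <= 0) throw new IllegalArgumentException("numBuckets must be > 0");
    // Clearing the sign bit keeps the modulo result non-negative.
    return (hash & Integer.MAX_VALUE) % numBuckets;
  }

  public static void main(String[] args) {
    // Assumption for illustration: an INT column hashes to its own value.
    for (int i : new int[] {0, 23, 24, 25, -1}) {
      System.out.println(i + " -> bucket " + bucketFor(i, 24));
    }
  }
}
```

Masking the sign bit rather than calling `Math.abs` avoids the `Integer.MIN_VALUE` corner case, where `Math.abs` still returns a negative number.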

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/metadata/test_show_create_table.py
18 files changed, 380 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/17

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 17
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 18: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 03 Nov 2022 11:22:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   :}
> We already support the SortBy clause. Currently it's independent with the C
OK, I'm going to do that.
Previously, I thought that adding the syntax would be simple, but the logic needed to implement inserts and queries is more complex.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 28 Oct 2022 03:22:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8819/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 19
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Manish Maheshwari <ma...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 22 Nov 2022 04:22:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 19:

> Patch Set 19: Verified-1
> 
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8819/

I don't think the failure is related to this patch. Filed IMPALA-11747.
Merging this. Thanks, Baike!



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 19
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Manish Maheshwari <ma...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 25 Nov 2022 05:30:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 20:

> Change has been successfully rebased and submitted as 2733d039ad4a830a1ea34c1a75d2b666788e39a9 by Quanlong Huang

Thank you for your repeated guidance and code reviews, Quanlong.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 20
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Manish Maheshwari <ma...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 25 Nov 2022 06:24:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11493/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Sep 2022 04:35:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Instructions:
1. Hive's CLUSTERED BY is not supported, because the keyword is already
  used in hints;
2. The bucket partitioning algorithm is one of HASH, RANDOM, or KUDU_HASH.
  The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets; the default is 16;
4. Statements that create bucketed tables do not support Kudu and
  Iceberg tables. For a Kudu table, however, a hash partition is
  equivalent to a bucket, and the optimization rule applies to join
  queries;
5. In the current version, ALTER operations (add/drop/change/replace
  columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 411 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/7

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 7
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19055/14/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

http://gerrit.cloudera.org:8080/#/c/19055/14/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@512
PS14, Line 512:         sb.append(String.format("SORT BY %s (\n  %s\n)\n", sortProperties.second.toString(),
line too long (92 > 90)
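
One way to satisfy the 90-character limit flagged above is to break the call after the format string. The sketch below is not the code from the patch: `SortClauseSketch`, `formatSortBy`, and its parameters are hypothetical stand-ins for the `sortProperties` pair in `ToSqlUtils`, whose actual types are not shown in this thread, and `LEXICAL` is only an example value.

```java
import java.util.Arrays;
import java.util.List;

public class SortClauseSketch {
  // Hypothetical helper: 'sortOrder' stands in for sortProperties.second and
  // 'sortCols' for the sort column names; neither type is confirmed here.
  static String formatSortBy(String sortOrder, List<String> sortCols) {
    // Breaking after the format string keeps every line under 90 characters.
    return String.format("SORT BY %s (\n  %s\n)\n",
        sortOrder, String.join(",\n  ", sortCols));
  }

  public static void main(String[] args) {
    System.out.print(formatSortBy("LEXICAL", Arrays.asList("i", "s")));
  }
}
```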




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 28 Oct 2022 09:37:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11730/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 28 Oct 2022 09:56:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 18:

(2 comments)

Thanks very much.

http://gerrit.cloudera.org:8080/#/c/19055/17//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/17//COMMIT_MSG@21
PS17, Line 21: th
> nit: "the"
Done


http://gerrit.cloudera.org:8080/#/c/19055/17//COMMIT_MSG@27
PS17, Line 27: drop
> nit: dropping
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 02 Nov 2022 12:14:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 14:

(1 comment)

> Patch Set 13:
> 
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   ;
> Yeah, this is just for table creation. For adding write support, we can sup
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 28 Oct 2022 09:36:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Instructions:
1. Hive's CLUSTERED BY is not supported, because the keyword is already
  used in hints;
2. The bucket partitioning algorithm is one of HASH, RANDOM, or KUDU_HASH.
  The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets; the default is 16;
4. Statements that create bucketed tables do not support Kudu and
  Iceberg tables. For a Kudu table, however, a hash partition is
  equivalent to a bucket, and the optimization rule applies to join
  queries;
5. In the current version, ALTER operations (add/drop/change/replace
  columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 411 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/6
-- 
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 6
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8643/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Sep 2022 08:37:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax in the CREATE TABLE statement is as follows:
 [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
BUCKETED BY HASH(i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
BUCKETED BY KUDU_HASH(i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
BUCKETED BY RANDOM INTO 24 BUCKETS;

Instructions:
1. Hive's CLUSTERED BY syntax is not supported, because the keyword is
  already used by a hint;
2. The supported bucket partitioning algorithms are HASH, RANDOM, and
  KUDU_HASH. The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets; the default value
  is 16;
4. Bucketed-table CREATE statements do not support Kudu and Iceberg
  tables. For a Kudu table, however, a hash partition is equivalent to
  a bucket, and the optimization rule applies to join queries;
5. In the current version, alter operations (add/drop/change/replace
  columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 439 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/11

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8671/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 8
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sat, 08 Oct 2022 08:36:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8679/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 9
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 10 Oct 2022 11:42:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax in the CREATE TABLE statement is as follows:
 [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])]
     INTO 24 BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS;

Instructions:
1. The bucket partitioning algorithm is the hash function used
  in Hive's bucketed tables;
2. Create Bucketed Table statements currently don't support Kudu and
  Iceberg tables;
3. In the current version, alter operations (add/drop/change/replace
  columns) on bucketed tables are not supported;
4. Dropping bucketed tables is supported;
This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Reviewed-on: http://gerrit.cloudera.org:8080/19055
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/metadata/test_show_create_table.py
18 files changed, 380 insertions(+), 24 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 20
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Manish Maheshwari <ma...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 19: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8819/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 19
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Manish Maheshwari <ma...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 22 Nov 2022 09:28:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@29
PS10, Line 29: 
> I see. Previously, CLUSTERED is identified as an IDENTIFIER. Now we define 
Wow, I was puzzled by this for a long time. Thanks very much.


http://gerrit.cloudera.org:8080/#/c/19055/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/11//COMMIT_MSG@19
PS11, Line 19: :
> Is RANDOM actually useful in practise? Could you share some use cases?
No, it isn't. RANDOM ensures an even distribution of the data, but a bucket join cannot be applied to it.
Don't worry about that. As discussed, only one hash algorithm is supported.
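
The point about RANDOM can be illustrated with a small sketch. With hash bucketing, equal keys in two tables fall into the same bucket index, so a join can compare bucket b only with bucket b. With random placement that guarantee is lost. Everything below is hypothetical (class, methods, and the hash itself) and assumes both tables use the same bucket count; it is not how Impala implements bucket joins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names exist in Impala.
final class BucketJoinSketch {
  static List<List<Integer>> emptyBuckets(int numBuckets) {
    List<List<Integer>> buckets = new ArrayList<>();
    for (int i = 0; i < numBuckets; i++) {
      buckets.add(new ArrayList<>());
    }
    return buckets;
  }

  // Hash bucketing: the bucket index depends only on the key.
  static List<List<Integer>> bucketByHash(int[] keys, int numBuckets) {
    List<List<Integer>> buckets = emptyBuckets(numBuckets);
    for (int key : keys) {
      int index = (Integer.hashCode(key) & Integer.MAX_VALUE) % numBuckets;
      buckets.get(index).add(key);
    }
    return buckets;
  }

  // Random bucketing: even spread, but the index says nothing about the key.
  static List<List<Integer>> bucketRandomly(int[] keys, int numBuckets, Random rng) {
    List<List<Integer>> buckets = emptyBuckets(numBuckets);
    for (int key : keys) {
      buckets.get(rng.nextInt(numBuckets)).add(key);
    }
    return buckets;
  }

  // Compares bucket b of the left table only with bucket b of the right table.
  // Assumes both sides were built with the same number of buckets.
  static int bucketWiseJoinCount(List<List<Integer>> left, List<List<Integer>> right) {
    int matches = 0;
    for (int b = 0; b < left.size(); b++) {
      for (int l : left.get(b)) {
        for (int r : right.get(b)) {
          if (l == r) {
            matches++;
          }
        }
      }
    }
    return matches;
  }
}
```

With hash bucketing, the bucket-wise count equals the full join count. With random bucketing, the bucket-wise count can miss matches, which is why the optimization does not apply.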


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   :}
> I see. I checked the Hive parser and realized that in HiveQL the SortBy cla
Yes, I think so. The original intention was to add SORT BY in a later version, because it increases the complexity of the implementation. This should be done in the future.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1705
PS10, Line 1705:   {: RESULT = TableDataLayout.createKuduPartitionedLayout(partition_params); :}
> This hasn't been addressed.
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 27 Oct 2022 09:27:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax in the CREATE TABLE statement is as follows:
 [CLUSTERED BY ([column [, column ...]]) INTO 24 BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i);

Instructions:
1. The bucket partitioning algorithm is Hive's hash function;
2. INTO 24 BUCKETS specifies the number of buckets; the default value
  is 16;
3. Bucketed-table CREATE statements do not support Kudu and
  Iceberg tables;
4. In the current version, alter operations (add/drop/change/replace
  columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
14 files changed, 350 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/13

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11720/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 27 Oct 2022 09:44:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11731/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 15
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 28 Oct 2022 10:04:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax in the CREATE TABLE statement is as follows:
 [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])]
     INTO 24 BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS;

Instructions:
1. The bucket partitioning algorithm is Hive's hash function;
2. Bucketed-table CREATE statements do not support Kudu and
  Iceberg tables;
3. In the current version, alter operations (add/drop/change/replace
  columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
14 files changed, 353 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/15

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 15
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8643/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Sep 2022 04:23:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8702/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 10
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Oct 2022 03:35:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 10: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 10
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Oct 2022 08:41:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19055/5/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/19055/5/common/thrift/CatalogObjects.thrift@196
PS5, Line 196: // When create bucketd table, need to know about bucket's type, bucket's columns and number.
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/19055/5/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/19055/5/fe/src/main/java/org/apache/impala/analysis/TableDef.java@792
PS5, Line 792:     if (bucketDesc.getBucket_columns() == null || bucketDesc.getBucket_columns().size() == 0) {
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/19055/5/fe/src/main/java/org/apache/impala/util/BucketUtils.java
File fe/src/main/java/org/apache/impala/util/BucketUtils.java:

http://gerrit.cloudera.org:8080/#/c/19055/5/fe/src/main/java/org/apache/impala/util/BucketUtils.java@40
PS5, Line 40:     TBucketType bucketType = TBucketType.valueOf(params.get(Table.HIVE_IMPALA_BUCKET_TYPE));
line too long (92 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Sep 2022 04:29:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19055/6/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/19055/6/common/thrift/CatalogObjects.thrift@196
PS6, Line 196: // When create bucketd table, need to know 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19055/6/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/19055/6/fe/src/main/java/org/apache/impala/analysis/TableDef.java@792
PS6, Line 792:     if (bucketDesc.getBucket_columns() == null 
line has trailing whitespace




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 6
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Sep 2022 06:16:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 4:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/11492/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 4
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Sep 2022 03:30:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 19: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 19
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Manish Maheshwari <ma...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 22 Nov 2022 04:22:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8823/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 19
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Manish Maheshwari <ma...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 22 Nov 2022 11:45:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 17: Code-Review+1

(2 comments)

Thanks for the update! The patch LGTM.

http://gerrit.cloudera.org:8080/#/c/19055/17//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/17//COMMIT_MSG@21
PS17, Line 21: he
nit: "the"


http://gerrit.cloudera.org:8080/#/c/19055/17//COMMIT_MSG@27
PS17, Line 27: drop
nit: dropping




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 17
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 02 Nov 2022 12:04:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 17:

(12 comments)

Hi Quanlong, thanks for your review and comments.
I have addressed your comments. While testing 'show-create-table', I found and fixed a bug, and I also added support for dropping bucketed tables.

http://gerrit.cloudera.org:8080/#/c/19055/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/16//COMMIT_MSG@21
PS16, Line 21: he hash functi
> nit: "the hash function used in Hive's bucketed tables"
Done


http://gerrit.cloudera.org:8080/#/c/19055/16//COMMIT_MSG@22
PS16, Line 22: 
> nit: currently don't
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift@155
PS16, Line 155: tion 
> nit: "type" ?
Maybe this wording is easier to understand: 'Data distribution method of bucketed table.'


http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift@194
PS16, Line 194:   2: optional i64 total_file_bytes
              : }
> nit: The variable names are clear enough. We can simplify the comment to so
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift@497
PS16, Line 497:  optional TValidWriteIdList 
> nit: "Bucket information for HDFS tables"
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java@404
PS16, Line 404: isBucketableFormat() {
> nit: it'd be better to rename it to something like "isBucketableFormat"
Great.


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java@756
PS16, Line 756: yzeBucketColumns(options_.bucketInfo, getColumnNames(),
> nit: we can skip this check since it's done in the following method.
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java@778
PS16, Line 778:           "'%s'", options_.fileFormat));
              :     }
              :     if (bucketInfo.getNum_bucket() <= 0) {
              :      
> nit: kudu is checked in isSupportBucketedTable(). Do we still need this che
Done
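
The checks discussed in this comment can be sketched roughly as follows. This is a minimal illustration, not the code in TableDef.java: the class name, method signature, and error messages are all hypothetical. It only assumes what the thread states, namely that Kudu and Iceberg tables are rejected and that the bucket count must be positive; the column checks are an added assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the analysis step; names do not mirror TableDef.java.
final class BucketSpecCheckSketch {
  // Returns null when the spec is acceptable, otherwise an error message.
  static String validate(List<String> tableCols, List<String> bucketCols,
      int numBuckets, String fileFormat) {
    String fmt = fileFormat.toUpperCase(Locale.ROOT);
    // One format check covers Kudu, so no separate Kudu-only check is needed.
    if (fmt.equals("KUDU") || fmt.equals("ICEBERG")) {
      return "Bucketed tables are not supported for format " + fmt;
    }
    if (numBuckets <= 0) {
      return "Number of buckets must be positive: " + numBuckets;
    }
    if (bucketCols.isEmpty()) {
      return "At least one bucket column is required";
    }
    // Column names are compared case-insensitively (an assumption of this sketch).
    Set<String> known = new HashSet<>();
    for (String c : tableCols) known.add(c.toLowerCase(Locale.ROOT));
    Set<String> seen = new HashSet<>();
    for (String c : bucketCols) {
      String lc = c.toLowerCase(Locale.ROOT);
      if (!known.contains(lc)) return "Unknown bucket column: " + c;
      if (!seen.add(lc)) return "Duplicate bucket column: " + c;
    }
    return null;
  }
}
```

Folding the format test into a single place is the point of the review comment: once the format check rejects Kudu, a second Kudu-specific branch is redundant.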


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/util/BucketUtils.java
File fe/src/main/java/org/apache/impala/util/BucketUtils.java:

http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/util/BucketUtils.java@20
PS16, Line 20: import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
> nit: unused import
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/util/BucketUtils.java@31
PS16, Line 31: mStorageDescriptor(StorageDe
> nit: "StorageDescriptor of the HMS table"
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/create-table.test
File testdata/workloads/functional-query/queries/QueryTest/create-table.test:

http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/create-table.test@349
PS16, Line 349: ---- RESULTS: VERIFY_IS_SUBSET
> Can we add the rows of "Num Buckets" and "Bucket Columns" ?
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
File testdata/workloads/functional-query/queries/QueryTest/show-create-table.test:

http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test@1013
PS16, Line 1013:  'engine.hive.enabled'='true', 'table_type'='ICEBERG', 'write.merge.mode'='copy-on-write')
> Could you also add a test for bucket table in this file?
Done
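
For context, rendering the bucket clause back into SQL (what a show-create-table test would exercise) might look like the sketch below. It follows the clause order given in the patch set 16 commit message (CLUSTERED BY, optional SORT BY, INTO n BUCKETS). The class and method names are hypothetical, and the exact text Impala emits may differ.

```java
import java.util.List;

// Hypothetical helper; not the code in ToSqlUtils.java.
final class BucketClauseSketch {
  // Builds "CLUSTERED BY (...) [SORT BY (...)] INTO n BUCKETS".
  static String toSql(List<String> bucketCols, List<String> sortCols, int numBuckets) {
    StringBuilder sb = new StringBuilder("CLUSTERED BY (");
    sb.append(String.join(", ", bucketCols)).append(")");
    // The SORT BY part is optional and omitted when there are no sort columns.
    if (sortCols != null && !sortCols.isEmpty()) {
      sb.append(" SORT BY (").append(String.join(", ", sortCols)).append(")");
    }
    sb.append(" INTO ").append(numBuckets).append(" BUCKETS");
    return sb.toString();
  }
}
```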




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 17
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 02 Nov 2022 11:35:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 11:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@18
PS10, Line 18: CREATE TABLE tbl (i int COMMENT 'hello', s string)
> It'd be better if we can highlight this line since it's the only new part. 
Got it.


http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@29
PS10, Line 29:   the hash partition is equivalent to a bucket,
> Do you mean "CLUSTERED" has been used as a hint so we can't use it as a key
If "CLUSTERED" is added as a keyword in the cup file, executing SQL that uses CLUSTERED in a hint reports an error. For example, executing "create  /* +CLUSTERED */ test  as select * from tpcds.item;" produces this error message:
`
Query: create /* +CLUSTERED */ test
as select * from tpcds.item
Query submitted at: 2022-10-24 09:09:52 (Coordinator: http://d403ca04eda0:25000)
ERROR: ParseException: Syntax error in line 1:
create /* +CLUSTERED */ test
           ^
Encountered: CLUSTERED
Expected: STRAIGHT_JOIN, COMMA, IDENTIFIER

CAUSED BY: Exception: Syntax error
`
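
A toy model may help show why the error above appears. It assumes, based only on the "Expected: STRAIGHT_JOIN, COMMA, IDENTIFIER" line, that a hint must scan as an identifier (or the STRAIGHT_JOIN keyword); once a word becomes a reserved keyword, the scanner stops returning it as an identifier. The class below is purely illustrative and is not Impala's scanner or parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy scanner/parser pair; not sql-scanner.flex or sql-parser.cup.
final class HintKeywordSketch {
  // Reserved words scan as "KW_<WORD>"; everything else scans as IDENTIFIER.
  static String tokenType(String word, Set<String> keywords) {
    String upper = word.toUpperCase(Locale.ROOT);
    return keywords.contains(upper) ? "KW_" + upper : "IDENTIFIER";
  }

  // Parses a comma-separated hint list; each hint must be IDENTIFIER or KW_STRAIGHT_JOIN.
  static List<String> parseHints(String hintBody, Set<String> keywords) {
    List<String> hints = new ArrayList<>();
    for (String raw : hintBody.split(",")) {
      String word = raw.trim();
      String type = tokenType(word, keywords);
      if (!type.equals("IDENTIFIER") && !type.equals("KW_STRAIGHT_JOIN")) {
        throw new IllegalStateException("Syntax error. Encountered: "
            + word.toUpperCase(Locale.ROOT)
            + " Expected: STRAIGHT_JOIN, COMMA, IDENTIFIER");
      }
      hints.add(word);
    }
    return hints;
  }
}
```

With a keyword set that lacks CLUSTERED, parseHints("CLUSTERED", ...) succeeds; after adding CLUSTERED to the set, the same call throws, which mirrors the behavior reported here.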


http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@30
PS10, Line 30:   and the optimization rule applies to join query;
> Are these recognized by Hive? i.e. if Hive inserts data into the table, is 
If HASH is used, the behavior is the same as Hive's. Otherwise, the behavior is incompatible with Hive.
If Hive inserts data into the table, the table is treated as HASH, which is what we expect.

Multiple bucket hash functions are used because Hive's bucket hash algorithm differs from Kudu's. They are needed to stay compatible with the bucket join optimization on Kudu tables. In other words, Kudu tables are not supported in HASH mode. Using KUDU_HASH, however, means the resulting tables are not recognized by compute engines other than Impala.
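
To illustrate why the choice of hash function matters for compatibility, here is a small sketch. Neither function below is the real Hive or Kudu hash; they are stand-ins chosen only to show that two writers using different hashes place the same key in different buckets. The mask-and-modulo step is a common convention and is assumed here rather than taken from this change.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: hashA and hashB are stand-ins, not Hive's or Kudu's hash.
final class BucketIdSketch {
  // Maps a hash code to a bucket in [0, numBuckets); the mask clears the sign bit.
  static int bucketOf(int hash, int numBuckets) {
    if (numBuckets <= 0) {
      throw new IllegalArgumentException("numBuckets must be positive");
    }
    return (hash & Integer.MAX_VALUE) % numBuckets;
  }

  // Stand-in hash for "engine A": Java's String.hashCode().
  static int hashA(String key) {
    return key.hashCode();
  }

  // Stand-in hash for "engine B": 32-bit FNV-1a over the UTF-8 bytes.
  static int hashB(String key) {
    int h = 0x811C9DC5;
    for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
      h ^= (b & 0xFF);
      h *= 0x01000193;
    }
    return h;
  }
}
```

If one engine writes files using bucketOf(hashA(key), n) and another reads them assuming bucketOf(hashB(key), n), rows will generally not be where the reader expects, so bucket-based optimizations are only safe when both sides agree on the function.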


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   | opt_bucket_desc:bucket
> I think we don't need two switches here. Like other optional fields, we can
Yes, but adding an empty production to opt_bucket_desc causes a compilation error (output below), so I took this approach. Do you have any advice?
`
Warning : *** Reduce/Reduce conflict found in state #1587
  between opt_bucket_desc ::= (*)
  and     opt_sort_cols ::= (*)
  under symbols: {}
  Resolved in favor of the first production.

Warning : *** Shift/Reduce conflict found in state #1587
  between opt_bucket_desc ::= (*)
  and     opt_sort_cols ::= (*) KW_SORT KW_BY KW_ZORDER LPAREN opt_ident_list RPAREN
  and     opt_sort_cols ::= (*) KW_SORT KW_BY LPAREN opt_ident_list RPAREN
  and     opt_sort_cols ::= (*) KW_SORT KW_BY KW_LEXICAL LPAREN opt_ident_list RPAREN
  under symbol KW_SORT
  Resolved in favor of shifting.

Warning : *** Shift/Reduce conflict found in state #1587
  between opt_sort_cols ::= (*)
  and     opt_sort_cols ::= (*) KW_SORT KW_BY KW_ZORDER LPAREN opt_ident_list RPAREN
  and     opt_sort_cols ::= (*) KW_SORT KW_BY LPAREN opt_ident_list RPAREN
  and     opt_sort_cols ::= (*) KW_SORT KW_BY KW_LEXICAL LPAREN opt_ident_list RPAREN
  under symbol KW_SORT
  Resolved in favor of shifting.
`


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1705
PS10, Line 1705:   {: RESULT = new Pair<List<String>, TSortingOrder>(null, TSortingOrder.LEXICAL); :}
> nit: Let's skip reformatting unrelated codes.
Got it.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/jflex/sql-scanner.flex
File fe/src/main/jflex/sql-scanner.flex:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/jflex/sql-scanner.flex@178
PS10, Line 178: kudu_has
> The commit message mentions "kudu_hash".
Done


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2844
PS10, Line 2844: ber mu
> Can we add tests for "HASH" and "KUDUHASH" without the parentheses?
Yes, I can.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@a4347
PS10, Line 4347: 
> Can we keep this since this still doesn't work?
Yes, we can keep this. It was removed when I tried CLUSTERED.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@3074
PS10, Line 3074:     ParsesOk("CREATE TABLE bucketed_test (i int COMMENT 'hello', s string) " +
> Could you add some tests for KUDU_HASH?
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 24 Oct 2022 12:16:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   :}
> OK, I'm going to do that.
Yeah, this is just for table creation. When adding write support, we can support inserting into non-sorted bucketed tables first.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 28 Oct 2022 08:08:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed table.
The specific syntax in the create table statement is as follows:
 [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])]
     INTO 24 BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS;

Instructions:
1. The bucket partitioning algorithm is a hash of Hive;
2. Create Bucketed Table statements that do not support Kudu and
  Iceberg tables;
3. In the current version, alter operations(add/drop/change/replace
 columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
14 files changed, 353 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/16

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 16
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed table.
The specific syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Instructions:
1. CLUSTERED BY of Hive is not supported, because HINT has the keyword;
2. The bucket partitioning algorithm contains HASH, RANDOM, KUDU_HASH.
  The default value is HASH;
3. INTO 24 BUCKETS, specifies the number of buckets, the default value
  is 16;
4. Create Bucketed Table statements that do not support Kudu and
  Iceberg tables, but for a Kudu table,
  the hash partition is equivalent to a bucket,
  and the optimization rule applies to join query;
5. In the current version, alter operations(add/drop/change/replace
 columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 408 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/5

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 11: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 25 Oct 2022 13:54:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11717/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 12
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 27 Oct 2022 08:25:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed table.
The specific syntax in the create table statement is as follows:
 [BUCKETED BY ([column [, column ...]]) INTO 24 BUCKETS

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
BUCKETED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
BUCKETED BY (i);

Instructions:
1. CLUSTERED BY of Hive is not supported, because HINT has the keyword;
2. The bucket partitioning algorithm is a hash of Hive;
3. INTO 24 BUCKETS, specifies the number of buckets, the default value
  is 16;
4. Create Bucketed Table statements that do not support Kudu and
  Iceberg tables;
5. In the current version, alter operations(add/drop/change/replace
 columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
14 files changed, 349 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/12

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 12
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 17:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11768/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 17
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 02 Nov 2022 11:51:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 18:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11769/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 02 Nov 2022 12:32:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 18:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8772/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 03 Nov 2022 06:17:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 18: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 03 Nov 2022 00:51:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 16: Code-Review+1

(12 comments)

Thanks for updating the syntax! I only have some minor comments.

http://gerrit.cloudera.org:8080/#/c/19055/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/16//COMMIT_MSG@21
PS16, Line 21: a hash of Hive
nit: "the hash function used in Hive's bucketed tables"


http://gerrit.cloudera.org:8080/#/c/19055/16//COMMIT_MSG@22
PS16, Line 22: that do not
nit: currently don't


http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift@155
PS16, Line 155: table
nit: "type" ?


http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift@194
PS16, Line 194: // When create bucketd table, need to know
              : // about bucket's type, bucket's columns and number.
nit: The variable names are clear enough. We can simplify the comment to something like "Represents the bucket spec of a table".


http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift@497
PS16, Line 497: Bucket type, columns, number
nit: "Bucket information for HDFS tables"


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java@404
PS16, Line 404: isSupportBucketedTable
nit: it'd be better to rename it to something like "isBucketableFormat"


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java@756
PS16, Line 756: && options_.bucketInfo.getBucket_type() != TBucketType.NONE
nit: we can skip this check since it's done in the following method.


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java@778
PS16, Line 778:     if (isKuduTable()) {
              :       throw new AnalysisException(String.format("CLUSTERED BY not supported for Kudu " +
              :           "tables."));
              :     }
nit: kudu is checked in isSupportBucketedTable(). Do we still need this check?


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/util/BucketUtils.java
File fe/src/main/java/org/apache/impala/util/BucketUtils.java:

http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/util/BucketUtils.java@20
PS16, Line 20: import java.util.Map;
nit: unused import


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/util/BucketUtils.java@31
PS16, Line 31: hive table'StorageDescriptor
nit: "StorageDescriptor of the HMS table"


http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/create-table.test
File testdata/workloads/functional-query/queries/QueryTest/create-table.test:

http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/create-table.test@349
PS16, Line 349: ---- RESULTS: VERIFY_IS_SUBSET
Can we add the rows for "Num Buckets" and "Bucket Columns"?


http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
File testdata/workloads/functional-query/queries/QueryTest/show-create-table.test:

http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test@1013
PS16, Line 1013:  'engine.hive.enabled'='true', 'table_type'='ICEBERG', 'write.merge.mode'='copy-on-write')
Could you also add a test for a bucketed table in this file?



-- 
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 16
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 01 Nov 2022 13:33:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 18: Code-Review+2

Confirmed with Manish offline. The syntax is good for him. Merging this. Thanks for your contribution and your patience, Baike!


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Manish Maheshwari <ma...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 22 Nov 2022 04:21:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 19: Verified+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 19
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Manish Maheshwari <ma...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 22 Nov 2022 16:46:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11495/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 6
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Sep 2022 06:37:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8671/


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 8
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sat, 08 Oct 2022 12:49:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Posted by "Baike Xia (Code Review)" <ge...@cloudera.org>.
Baike Xia has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Notes:
1. Hive's CLUSTERED BY syntax is not supported, because the keyword is
  already used by a hint;
2. The supported bucket partitioning algorithms are HASH, RANDOM, and
  KUDU_HASH. The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets. The default is 16;
4. The bucketed-table CREATE statement does not support Kudu or
  Iceberg tables. For a Kudu table, however, a hash partition is
  equivalent to a bucket, so the optimization rule still applies to
  join queries;
5. In the current version, ALTER operations (add/drop/change/replace
  columns) on bucketed tables are not supported.
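
For illustration only, a statement that follows the grammar as written in
this patch set might look like the sketch below. The table and column names
are hypothetical, and the accepted syntax may differ in later patch sets.

```sql
-- Hypothetical table: rows are assigned to one of 24 buckets by hashing `id`.
-- Omitting the bucket count would fall back to the default of 16 described above.
CREATE TABLE IF NOT EXISTS sales_bucketed (
  id BIGINT,
  region STRING
)
BUCKETED BY HASH(id) INTO 24 BUCKETS
STORED AS PARQUET;
```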

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 420 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/10
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 10
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>