You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2021/06/10 15:34:33 UTC

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17575


Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................

IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Currently we have a DDL syntax for defining Iceberg partitions that
differs from SparkSQL:
https://iceberg.apache.org/spark-ddl/#partitioned-by

E.g. Impala is using the following syntax:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)
STORED AS ICEBERG;

The same in Spark is:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
PARTITIONED BY (bucket(5, i), months(ts), years(d))

HIVE-25179 added the following syntax for Hive:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), months(ts), years(d))
STORED BY ICEBERG;

I.e. the same syntax as Spark, but adding the keyword "SPEC".

This patch makes Impala to use Hive's syntax, i.e. we will also
use the PARTITIONED BY SPEC clause + the unified partition
transform syntax. We might later consider making 'SPEC' optional.

Testing:
 * existing tests has been rewritten with the new syntax

Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
M fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-overwrite.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-truncate.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
19 files changed, 171 insertions(+), 178 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/17575/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 14 Jul 2021 14:07:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17575

to look at the new patch set (#2).

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................

IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Currently we have a DDL syntax for defining Iceberg partitions that
differs from SparkSQL:
https://iceberg.apache.org/spark-ddl/#partitioned-by

E.g. Impala is using the following syntax:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)
STORED AS ICEBERG;

The same in Spark is:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
PARTITIONED BY (bucket(5, i), months(ts), years(d))

HIVE-25179 added the following syntax for Hive:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), months(ts), years(d))
STORED BY ICEBERG;

I.e. the same syntax as Spark, but adding the keyword "SPEC".

This patch makes Impala to use Hive's syntax, i.e. we will also
use the PARTITIONED BY SPEC clause + the unified partition
transform syntax. We might later consider making 'SPEC' optional.

Testing:
 * existing tests has been rewritten with the new syntax

Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
M fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-overwrite.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-truncate.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
19 files changed, 173 insertions(+), 179 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/17575/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7301/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 15 Jul 2021 08:59:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7216/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Jun 2021 15:35:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9096/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 15 Jul 2021 09:17:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 14 Jul 2021 14:10:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7299/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 14 Jul 2021 14:10:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 3:

(4 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/17575/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17575/3//COMMIT_MSG@35
PS3, Line 35: We might later consider making 'SPEC' optional.
> I think making 'SPEC' optional maybe confused users. It's better to removed
Hive uses PARTITIONED BY SPEC, Spark uses PARTITIONED BY only, so it would make sense to support both in the future.

Anyway, I removed this sentence because in the near future we'll only support the Hive-like PARTITIONED BY SPEC. Supporting PARTITIONED BY would require bigger changes in the SQL parser.


http://gerrit.cloudera.org:8080/#/c/17575/3/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java:

http://gerrit.cloudera.org:8080/#/c/17575/3/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java@97
PS3, Line 97: builder.append(transformType_.toString() + "(");
> Maybe this is better: 
Done


http://gerrit.cloudera.org:8080/#/c/17575/3/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java@99
PS3, Line 99: builder.append(transformParam_.toString() + ", ");
> Same as above
Done


http://gerrit.cloudera.org:8080/#/c/17575/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/17575/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@297
PS3, Line 297:       case "HOUR":  case "HOURS":  return TIcebergPartitionTransformType.HOUR;
             :       case "DAY":   case "DAYS":   return TIcebergPartitionTransformType.DAY;
             :       case "MONTH": case "MONTHS": return TIcebergPartitionTransformType.MONTH;
             :       case "YEAR":  case "YEARS":  return TIcebergPartitionTransformType.YEAR;
> I see that both Hive and Spark use HOURS/DAYS/MONTHS/YEARS in your comment 
I contacted the Hive team, and actually Hive supports both:
https://github.com/apache/hive/blob/8ef538c6d84d0c9d7b610ca446bdc1083d62fa1b/iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java#L182



-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 21 Jun 2021 12:39:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 4:

(4 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/17575/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17575/4//COMMIT_MSG@33
PS4, Line 33: makes Impala to use
> typo: makes Impala use
Done


http://gerrit.cloudera.org:8080/#/c/17575/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/17575/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@291
PS4, Line 291: transformType.startsWit
> Not your change, but why is startsWith() used instead of equals() for BUCKE
Because transformType might contain the parameters as well (e.g. "BUCKET[5]") when we process the partition spec loaded from iceberg.


http://gerrit.cloudera.org:8080/#/c/17575/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@302
PS4, Line 302: "Unsupported iceberg partition type: "
> Do we have a test that exercises this error message?
Added a test case to iceberg-negative.test


http://gerrit.cloudera.org:8080/#/c/17575/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@296
PS4, Line 296:   switch (transformType) {
             :       case "HOUR":  case "HOURS":  return TIcebergPartitionTransformType.HOUR;
             :       case "DAY":   case "DAYS":   return TIcebergPartitionTransformType.DAY;
             :       case "MONTH": case "MONTHS": return TIcebergPartitionTransformType.MONTH;
             :       case "YEAR":  case "YEARS":  return TIcebergPartitionTransformType.YEAR;
             :       default:
             :         throw new TableLoadingException("Unsupported iceberg partition type: " +
             :             transformType);
             :     }
> nit: Maybe adding these transform type strings and the ones above to a Stri
'transformType' for BUCKET and TRUNCATE might contain the parameters as well, see above.



-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 13 Jul 2021 08:44:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 15 Jul 2021 08:59:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 4: Code-Review+1

Ok, I understand, thanks for your explanation!


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 22 Jun 2021 01:50:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17575

to look at the new patch set (#4).

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................

IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Currently we have a DDL syntax for defining Iceberg partitions that
differs from SparkSQL:
https://iceberg.apache.org/spark-ddl/#partitioned-by

E.g. Impala is using the following syntax:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)
STORED AS ICEBERG;

The same in Spark is:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
PARTITIONED BY (bucket(5, i), months(ts), years(d))

HIVE-25179 added the following syntax for Hive:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), months(ts), years(d))
STORED BY ICEBERG;

I.e. the same syntax as Spark, but adding the keyword "SPEC".

This patch makes Impala to use Hive's syntax, i.e. we will also
use the PARTITIONED BY SPEC clause + the unified partition
transform syntax.

Testing:
 * existing tests has been rewritten with the new syntax

Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
M fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-overwrite.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-truncate.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/custom_cluster/test_event_processing.py
20 files changed, 174 insertions(+), 180 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/17575/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 8: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 15 Jul 2021 15:15:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17575

to look at the new patch set (#7).

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................

IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Currently we have a DDL syntax for defining Iceberg partitions that
differs from SparkSQL:
https://iceberg.apache.org/spark-ddl/#partitioned-by

E.g. Impala is using the following syntax:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)
STORED AS ICEBERG;

The same in Spark is:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
PARTITIONED BY (bucket(5, i), months(ts), years(d))

HIVE-25179 added the following syntax for Hive:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), months(ts), years(d))
STORED BY ICEBERG;

I.e. the same syntax as Spark, but adding the keyword "SPEC".

This patch makes Impala use Hive's syntax, i.e. we will also
use the PARTITIONED BY SPEC clause + the unified partition
transform syntax.

Testing:
 * existing tests has been rewritten with the new syntax

Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
M fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-overwrite.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-truncate.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/custom_cluster/test_event_processing.py
20 files changed, 181 insertions(+), 180 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/17575/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17575

to look at the new patch set (#5).

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................

IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Currently we have a DDL syntax for defining Iceberg partitions that
differs from SparkSQL:
https://iceberg.apache.org/spark-ddl/#partitioned-by

E.g. Impala is using the following syntax:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)
STORED AS ICEBERG;

The same in Spark is:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
PARTITIONED BY (bucket(5, i), months(ts), years(d))

HIVE-25179 added the following syntax for Hive:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), months(ts), years(d))
STORED BY ICEBERG;

I.e. the same syntax as Spark, but adding the keyword "SPEC".

This patch makes Impala use Hive's syntax, i.e. we will also
use the PARTITIONED BY SPEC clause + the unified partition
transform syntax.

Testing:
 * existing tests has been rewritten with the new syntax

Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
M fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-overwrite.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-truncate.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/custom_cluster/test_event_processing.py
20 files changed, 181 insertions(+), 180 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/17575/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 7: Code-Review+2

Carry +2


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 15 Jul 2021 08:58:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8957/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 21 Jun 2021 13:01:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................

IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Currently we have a DDL syntax for defining Iceberg partitions that
differs from SparkSQL:
https://iceberg.apache.org/spark-ddl/#partitioned-by

E.g. Impala is using the following syntax:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)
STORED AS ICEBERG;

The same in Spark is:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
PARTITIONED BY (bucket(5, i), months(ts), years(d))

HIVE-25179 added the following syntax for Hive:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), months(ts), years(d))
STORED BY ICEBERG;

I.e. the same syntax as Spark, but adding the keyword "SPEC".

This patch makes Impala use Hive's syntax, i.e. we will also
use the PARTITIONED BY SPEC clause + the unified partition
transform syntax.

Testing:
 * existing tests has been rewritten with the new syntax

Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Reviewed-on: http://gerrit.cloudera.org:8080/17575
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
M fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-overwrite.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-truncate.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/custom_cluster/test_event_processing.py
20 files changed, 181 insertions(+), 180 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7299/


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 14 Jul 2021 20:24:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8896/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 11 Jun 2021 12:56:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17575/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17575/4//COMMIT_MSG@33
PS4, Line 33: makes Impala to use
typo: makes Impala use


http://gerrit.cloudera.org:8080/#/c/17575/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/17575/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@291
PS4, Line 291: transformType.startsWit
Not your change, but why is startsWith() used instead of equals() for BUCKET and TRUNCATE transports?


http://gerrit.cloudera.org:8080/#/c/17575/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@302
PS4, Line 302: "Unsupported iceberg partition type: "
Do we have a test that exercises this error message?


http://gerrit.cloudera.org:8080/#/c/17575/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@296
PS4, Line 296:   switch (transformType) {
             :       case "HOUR":  case "HOURS":  return TIcebergPartitionTransformType.HOUR;
             :       case "DAY":   case "DAYS":   return TIcebergPartitionTransformType.DAY;
             :       case "MONTH": case "MONTHS": return TIcebergPartitionTransformType.MONTH;
             :       case "YEAR":  case "YEARS":  return TIcebergPartitionTransformType.YEAR;
             :       default:
             :         throw new TableLoadingException("Unsupported iceberg partition type: " +
             :             transformType);
             :     }
nit: Maybe adding these transform type strings and the ones above to a String -> TIcebergPartitionTransformType immutable map would make the code shorter and simpler.



-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 07 Jul 2021 14:04:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7216/


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 10 Jun 2021 21:32:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Attila Jeges, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17575

to look at the new patch set (#3).

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................

IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Currently we have a DDL syntax for defining Iceberg partitions that
differs from SparkSQL:
https://iceberg.apache.org/spark-ddl/#partitioned-by

E.g. Impala is using the following syntax:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)
STORED AS ICEBERG;

The same in Spark is:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
PARTITIONED BY (bucket(5, i), months(ts), years(d))

HIVE-25179 added the following syntax for Hive:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), months(ts), years(d))
STORED BY ICEBERG;

I.e. the same syntax as Spark, but adding the keyword "SPEC".

This patch makes Impala to use Hive's syntax, i.e. we will also
use the PARTITIONED BY SPEC clause + the unified partition
transform syntax. We might later consider making 'SPEC' optional.

Testing:
 * existing tests has been rewritten with the new syntax

Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
M fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-overwrite.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-truncate.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/custom_cluster/test_event_processing.py
20 files changed, 174 insertions(+), 180 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/17575/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 3:

(4 comments)

Sorry for my late reply, Zoltan. Thanks for this feature, it's quite important to keep consistent syntax with other engines. Here is some of my opinions.

http://gerrit.cloudera.org:8080/#/c/17575/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17575/3//COMMIT_MSG@35
PS3, Line 35: We might later consider making 'SPEC' optional.
I think making 'SPEC' optional maybe confused users. It's better to removed this key word or keeping it as necessary in create statement. How do you think?


http://gerrit.cloudera.org:8080/#/c/17575/3/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java
File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java:

http://gerrit.cloudera.org:8080/#/c/17575/3/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java@97
PS3, Line 97: builder.append(transformType_.toString() + "(");
Maybe this is better: 
builder.append(transformType_.toString()).append("(");


http://gerrit.cloudera.org:8080/#/c/17575/3/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java@99
PS3, Line 99: builder.append(transformParam_.toString() + ", ");
Same as above


http://gerrit.cloudera.org:8080/#/c/17575/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/17575/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@297
PS3, Line 297:       case "HOUR":  case "HOURS":  return TIcebergPartitionTransformType.HOUR;
             :       case "DAY":   case "DAYS":   return TIcebergPartitionTransformType.DAY;
             :       case "MONTH": case "MONTHS": return TIcebergPartitionTransformType.MONTH;
             :       case "YEAR":  case "YEARS":  return TIcebergPartitionTransformType.YEAR;
I see that both Hive and Spark use HOURS/DAYS/MONTHS/YEARS in your comment example, maybe we do not need to support HOUR/DAY/MONTH/YEAR.



-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 17 Jun 2021 08:16:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8882/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 10 Jun 2021 15:57:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8903/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 14 Jun 2021 13:15:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17575/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/17575/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@4861
PS1, Line 4861:         "PARTITIONED BY SPEC (BUCKET(10, p1), TRUNCATE(5, p1), DAY(p2)) STORED AS ICEBERG" +
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Jun 2021 15:35:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg partitions
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9080/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 13 Jul 2021 09:04:19 +0000
Gerrit-HasComments: No