You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2021/02/26 17:07:28 UTC

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17130


Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................

IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

This patch adds support for CREATE TABLE AS SELECT statements
for Iceberg tables.

CTAS statements work like the following in Impala:

1. Analysis of the whole CTAS statement
2. Divide CTAS to CREATE stmt and INSERT stmt
3. Create temporary in-memory target table from the CREATE stmt
4. Analyse the INSERT statement by using the temporary target table
5. If everything is OK so far, create the target table
6. Execute the INSERT query

For Iceberg tables the non-trivial thing was to create the temporary
target table without actually creating it via Iceberg API. I've created
a new class 'IcebergCtasTarget' that mimics an FeIceberg table. It can be
used with catalog V1 and V2 as well.

Testing
 * e2e CTAS tests in iceberg-ctas.test
 * SHOW CREATE TABLE stmts in show-create-table.test

Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
---
M be/src/exec/hdfs-table-sink.cc
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/catalog/CtasTargetTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/metadata/test_show_create_table.py
M tests/query_test/test_iceberg.py
18 files changed, 698 insertions(+), 44 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/17130/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17130

to look at the new patch set (#3).

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................

IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

This patch adds support for CREATE TABLE AS SELECT statements
for Iceberg tables.

CTAS statements work like the following in Impala:

1. Analysis of the whole CTAS statement
2. Divide CTAS to CREATE stmt and INSERT stmt
3. Create temporary in-memory target table from the CREATE stmt
4. Analyse the INSERT statement by using the temporary target table
5. If everything is OK so far, create the target table
6. Execute the INSERT query

For Iceberg tables the non-trivial thing was to create the temporary
target table without actually creating it via Iceberg API. I've created
a new class 'IcebergCtasTarget' that mimics an FeIceberg table. It can be
used with catalog V1 and V2 as well.

Testing
 * e2e CTAS tests in iceberg-ctas.test
 * SHOW CREATE TABLE stmts in show-create-table.test

Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/catalog/CtasTargetTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/metadata/test_show_create_table.py
M tests/query_test/test_iceberg.py
18 files changed, 686 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/17130/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 1:

(5 comments)

Nice work, Zoltan! I left some comments, nothing too serious.

http://gerrit.cloudera.org:8080/#/c/17130/1/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17130/1/be/src/exec/hdfs-table-sink.cc@545
PS1, Line 545:           new HdfsParquetTableWriter(
I'm a bit confused here. This is for Iceberg with Parquet as file format, but don't we support other formats such as ORC?


http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@514
PS1, Line 514: if (isStaticPartitioning) {
             :             throw new AnalysisException("Static partitioning is not supported for " +
             :                 "Iceberg tables.");
             :           }
You can move this check inside the for loop above.


http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/catalog/CtasTargetTable.java
File fe/src/main/java/org/apache/impala/catalog/CtasTargetTable.java:

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/catalog/CtasTargetTable.java@48
PS1, Line 48:     // colsByPos[i] refers to the ith column in the table. The first numClusteringCols are
nit: indentation


http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java:

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@74
PS1, Line 74: private final FeDb db_;
            :   private final org.apache.hadoop.hive.metastore.api.Table msTable_;
CtasTargetTble already has members with the same name, abs as I see they get the same value as well. Are these needed here?


http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@222
PS1, Line 222:   public void addColumn(Column col) {
Can't you use IcebergColumn type for the param? You could get rid of the DCHECK in L223 then.



-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 02 Mar 2021 15:19:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 1:

(7 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/17130/1/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17130/1/be/src/exec/hdfs-table-sink.cc@545
PS1, Line 545:           new HdfsParquetTableWriter(
> I'm a bit confused here. This is for Iceberg with Parquet as file format, b
This is only the write side. But now that I'm setting the file format for the prototype partition in IcebergCtasTarget this line is no longer needed.


http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@502
PS1, Line 502:           IcebergPartitionSpec partSpec = ((FeIcebergTable)table_).getDefaultPartitionSpec();
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@504
PS1, Line 504:             throw new AnalysisException("PARTITION clause is only valid for INSERT into " +
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@514
PS1, Line 514: if (isStaticPartitioning) {
             :             throw new AnalysisException("Static partitioning is not supported for " +
             :                 "Iceberg tables.");
             :           }
> You can move this check inside the for loop above.
Done


http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/catalog/CtasTargetTable.java
File fe/src/main/java/org/apache/impala/catalog/CtasTargetTable.java:

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/catalog/CtasTargetTable.java@48
PS1, Line 48:     // colsByPos[i] refers to the ith column in the table. The first numClusteringCols are
> nit: indentation
Done


http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java:

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@74
PS1, Line 74: private final FeDb db_;
            :   private final org.apache.hadoop.hive.metastore.api.Table msTable_;
> CtasTargetTble already has members with the same name, abs as I see they ge
Thanks, removed them from this class.


http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@222
PS1, Line 222:   public void addColumn(Column col) {
> Can't you use IcebergColumn type for the param? You could get rid of the DC
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 09 Mar 2021 18:27:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................

IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

This patch adds support for CREATE TABLE AS SELECT statements
for Iceberg tables.

CTAS statements work like the following in Impala:

1. Analysis of the whole CTAS statement
2. Divide CTAS to CREATE stmt and INSERT stmt
3. Create temporary in-memory target table from the CREATE stmt
4. Analyse the INSERT statement by using the temporary target table
5. If everything is OK so far, create the target table
6. Execute the INSERT query

For Iceberg tables the non-trivial thing was to create the temporary
target table without actually creating it via Iceberg API. I've created
a new class 'IcebergCtasTarget' that mimics an FeIceberg table. It can be
used with catalog V1 and V2 as well.

Testing
 * e2e CTAS tests in iceberg-ctas.test
 * SHOW CREATE TABLE stmts in show-create-table.test

Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Reviewed-on: http://gerrit.cloudera.org:8080/17130
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/catalog/CtasTargetTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/metadata/test_show_create_table.py
M tests/query_test/test_iceberg.py
18 files changed, 686 insertions(+), 46 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 3: Code-Review+1

Thanks for change, this patch LGTM!


-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 07:33:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8330/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 13:31:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8320/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 09 Mar 2021 18:45:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 4: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 19:28:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 10:14:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 2:

(3 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@233
PS1, Line 233: Preconditions.checkState(createStmt_.getIcebergPartitionSpecs().size() == 1);
> Maybe we can add a comment here for this check.
Done


http://gerrit.cloudera.org:8080/#/c/17130/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java:

http://gerrit.cloudera.org:8080/#/c/17130/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@243
PS2, Line 243: transfromToTHdfsTable
> typo: transform
Done


http://gerrit.cloudera.org:8080/#/c/17130/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@262
PS2, Line 262:     THdfsTable hdfsTable = new THdfsTable(localFsTable.getHdfsBaseDir(),
             :         getColumnNames(), localFsTable.getNullPartitionKeyValue(),
             :         FeFsTable.DEFAULT_NULL_COLUMN_VALUE, idToPartition, tPrototypePartition);
             :     return hdfsTable;
> Maybe we can return this new THdfsTable directly.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 13:11:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 13:45:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 2:

(3 comments)

Sorry for late reply. Thanks for new feature, only few nits.

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@233
PS1, Line 233: Preconditions.checkState(createStmt_.getIcebergPartitionSpecs().size() == 1);
Maybe we can add a comment here for this check.


http://gerrit.cloudera.org:8080/#/c/17130/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java:

http://gerrit.cloudera.org:8080/#/c/17130/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@243
PS2, Line 243: transfromToTHdfsTable
typo: transform


http://gerrit.cloudera.org:8080/#/c/17130/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@262
PS2, Line 262:     THdfsTable hdfsTable = new THdfsTable(localFsTable.getHdfsBaseDir(),
             :         getColumnNames(), localFsTable.getNullPartitionKeyValue(),
             :         FeFsTable.DEFAULT_NULL_COLUMN_VALUE, idToPartition, tPrototypePartition);
             :     return hdfsTable;
Maybe we can return this new THdfsTable directly.



-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 06:39:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17130

to look at the new patch set (#2).

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................

IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

This patch adds support for CREATE TABLE AS SELECT statements
for Iceberg tables.

CTAS statements work like the following in Impala:

1. Analysis of the whole CTAS statement
2. Divide CTAS to CREATE stmt and INSERT stmt
3. Create temporary in-memory target table from the CREATE stmt
4. Analyse the INSERT statement by using the temporary target table
5. If everything is OK so far, create the target table
6. Execute the INSERT query

For Iceberg tables the non-trivial thing was to create the temporary
target table without actually creating it via Iceberg API. I've created
a new class 'IcebergCtasTarget' that mimics an FeIceberg table. It can be
used with catalog V1 and V2 as well.

Testing
 * e2e CTAS tests in iceberg-ctas.test
 * SHOW CREATE TABLE stmts in show-create-table.test

Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/catalog/CtasTargetTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/metadata/test_show_create_table.py
M tests/query_test/test_iceberg.py
18 files changed, 685 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/17130/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6960/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 13:45:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8253/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 26 Feb 2021 17:26:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@502
PS1, Line 502:           IcebergPartitionSpec partSpec = ((FeIcebergTable)table_).getDefaultPartitionSpec();
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@504
PS1, Line 504:             throw new AnalysisException("PARTITION clause is only valid for INSERT into " +
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 26 Feb 2021 17:08:17 +0000
Gerrit-HasComments: Yes