You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2020/11/13 17:09:22 UTC

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16721


Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2d tests for creating, writing, and altering Iceberg
   tables

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
13 files changed, 408 insertions(+), 62 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16721

to look at the new patch set (#5).

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
16 files changed, 568 insertions(+), 92 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7688/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 19 Nov 2020 11:58:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16721

to look at the new patch set (#4).

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
14 files changed, 524 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7695/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 10:05:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 6:

(2 comments)

Thanks for a quick turnaround, just two nits.

http://gerrit.cloudera.org:8080/#/c/16721/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16721/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@84
PS6, Line 84:   // Internal Iceberg table property that specifies the absolute path of the current
            :   // table metadata.
We may add some explain here: this propery is only valid for 'hive.catalog' or 'HiveCatalog'


http://gerrit.cloudera.org:8080/#/c/16721/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/16721/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@127
PS6, Line 127: CREATE TABLE iceberg_hive_cat_with_metadata_locaction
Shall we add a test for HadoopCatalog here?



-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 11:22:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 7:

Thanks for the review! After +2 we can run the verify job with DRY_RUN=false, so on success the job submits the patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 15:18:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Reviewed-on: http://gerrit.cloudera.org:8080/16721
Reviewed-by: wangsheng <sk...@163.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
16 files changed, 577 insertions(+), 92 deletions(-)

Approvals:
  wangsheng: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7671/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 18 Nov 2020 10:40:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7648/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 13 Nov 2020 17:30:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2012
PS2, Line 2012: existingTbl instanceof IncompleteTable
This will "double drop" Kudu tables where  existingTbl instanceof IncompleteTable, but msTbl table could be retrieved and it indicates a synchronyzed Kudu table, as we dropped them in line 1998. My guess is that this will result in an exception from HMS dropTable, leading to keeping the table in catalogd.


http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2015
PS2, Line 2015: needsHmsAlterTable
it calls dropTable, so needsHmsDropTable would clearer



-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 18 Nov 2020 16:48:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16721

to look at the new patch set (#7).

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
16 files changed, 577 insertions(+), 92 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/7
-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 7:

(2 comments)

Thanks for the quick review!

http://gerrit.cloudera.org:8080/#/c/16721/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16721/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@84
PS6, Line 84:   // Internal Iceberg table property that specifies the absolute path of the current
            :   // table metadata.
> We may add some explain here: this propery is only valid for 'hive.catalog'
Done


http://gerrit.cloudera.org:8080/#/c/16721/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/16721/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@127
PS6, Line 127: CREATE TABLE iceberg_hadoop_cat_with_metadata_locacti
> Shall we add a test for HadoopCatalog here?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 13:18:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 7: Code-Review+2

Thanks for this new feature, Zoltan, LGTM!


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 15:10:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16721/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16721/1//COMMIT_MSG@29
PS1, Line 29: e2d
e2e


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@65
PS1, Line 65:     // We pass null as 'location' to let Iceberg decide the table location.
remove comment


http://gerrit.cloudera.org:8080/#/c/16721/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test:

http://gerrit.cloudera.org:8080/#/c/16721/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test@268
PS1, Line 268: ====
add test with custom table location



-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 13 Nov 2020 17:18:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6653/


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 13 Nov 2020 22:59:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 6:

(3 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@419
PS4, Line 419: cebergTable.METAD
> This table property is generated by HiveCatalog, I'm curious about:
Good point! 'metadata_location' is internal to Iceberg so we shouldn't allow users modifying it.

Updated the code accordingly.


http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@65
PS4, Line 65:     return hiveCatalog_.createTable(identifier, schema, spec, location, properties);
            :   }
> nits: One line is ok, unnecessary for two lines.
Done


http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@242
PS4, Line 242:    * Get TIcebergFileFormat from a string, usually from table properties.
             :    * 
> Maybe we need add a comment here, since we change the default value to 'PAR
Updated the code a bit. Now this method is returning PARQUET when 'format' is null. Format can be null when the table was created by other engines. And returning null when the format string is invalid.



-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 09:45:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 4:

PS4 is only a rebase.


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 19 Nov 2020 11:42:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@419
PS4, Line 419: metadata_location
This table property is generated by HiveCatalog, I'm curious about:
1. Can we set this table property when creating table? If not, I think we need to add some check in code;
2. Can we alter table to set this table property? If not, we also need to add some check in code;
3. Maybe we should define a static variable in IcebergTable.java, just like ICEBERG_CATALOG, and we can use a reference here.

As far as I know, 'metadata_location' is refer to a metadata file's absolute path, and maybe we cannot modify this property manually.


http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@65
PS4, Line 65:     return hiveCatalog_.createTable(identifier, schema, spec, location,
            :         properties);
nits: One line is ok, unnecessary for two lines.


http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@242
PS4, Line 242:    * Get TIcebergFileFormat from a string, usually from table properties
             :    */
Maybe we need add a comment here, since we change the default value to 'PARQUET' to replace 'null'



-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 07:59:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2012
PS2, Line 2012: ncompleteTable && isSynchronizedIceber
> We still need to invoke HMS dropTable for synchronized tables that don't ha
I am also unsure about this scenario, but I preferred not to change the original handling.



-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 19 Nov 2020 15:10:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6686/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 15:10:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16721/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16721/1//COMMIT_MSG@29
PS1, Line 29: e2e
> e2e
Done


http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2012
PS2, Line 2012: 
> This will "double drop" Kudu tables where  existingTbl instanceof Incomplet
We still need to invoke HMS dropTable for synchronized tables that don't have HMS integration enabled.

So the "double drop" can only happen when

 existingTbl instanceof IncompleteTable &&
 msTbl table could be retrieved &&
 isHmsIntegrationAutomatic(msTbl)

I'm not sure if we can hit such scenario with normal usage, but anyway I restricted this condition to Iceberg tables.


http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2015
PS2, Line 2015: !isHmsIntegrationA
> it calls dropTable, so needsHmsDropTable would clearer
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 19 Nov 2020 11:29:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16721

to look at the new patch set (#2).

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2d tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
14 files changed, 523 insertions(+), 91 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6665/


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 18 Nov 2020 15:57:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 7: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 20:39:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7698/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 13:40:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 2:

(16 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/16721/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16721/1//COMMIT_MSG@11
PS1, Line 11: when the table data is stored in object stores like S3.
> Just curios - is this related to eventual consistency? If yes, then I think
Iceberg requires that the underlying filesystem supports atomic renames. I'm not sure if S3Guard solves that.


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@606
PS1, Line 606:     TIcebergCatalog catalog;
> Can you add a comment about HIVE_CATALOG being the default here?
Done


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@65
PS1, Line 65:     return hiveCatalog_.createTable(identifier, schema, spec, location,
> remove comment
Done


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@73
PS1, Line 73:     TableIdentifier tableId = IcebergUtil.getIcebergTableIdentifier(feTable);
> nit: +2 indent
Done


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@81
PS1, Line 81:     try {
> Can we check for tableLocation==null too?
No, in Iceberg util we pass both tableId and tableLocation to make the code simpler.


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@83
PS1, Line 83:     } catch (Exception e) {
> I am not 100% sure, but I think it would be better to catch all exceptions 
I wrapped them into TableLoadingException.


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@93
PS1, Line 93:     TableIdentifier tableId = IcebergUtil.getIcebergTableIdentifier(feTable);
> nit: +2 indent
Done


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1870
PS1, Line 1870: Iceberg'
> Iceberg
Done


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1960
PS1, Line 1960: 
> now this is needed for Iceberg tables too
Done


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1976
PS1, Line 1976:   throw new CatalogException(errorMsg);
              :       }
              : 
              :       // Retrieve the HMS table to determine if this is a Kudu or Iceberg table.
              :       org.apache.hadoop.hive.metastore.api.Table msTbl = existingTbl.getMetaStoreTable();
              :       if (msTbl == null) {
              :         Preconditions.checkState(existingTbl instanceof IncompleteTable);
              :         Stopwatch hmsLoadSW = Stopwatch.createStarted();
              :         long hmsLoadTime;
> These codes seems similar, can we extract to a method?
I don't think I can do that without some additional refactorings. If I had moved isSynchronizedTable() from KuduTable and IcebergTable to Table, I would still need to branch based on 'isKuduTable()/isIcebergTable()' because KuduCatalogOpexecutor and IcebergCatalogOpExecutor doesn't have a common base class. I don't want to do too much refactorings in the context of this patch, so I might just leave it as it is.


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1985
PS1, Line 1985: y (MetaStoreClient msClient = catalog_.g
> This case (existingTbl instanceof IncompleteTable && isSynchronizedIcebergT
Synchronized table doesn't mean that the table is stored in HiveCatalog. It means that the 'external.table.purge' property is true. But the Iceberg table might be stored in HadoopTables or HadoopCatalog.

An Iceberg table is incomplete if we couldn't load it via the Iceberg API, therefore we cannot execute Iceberg DROP TABLE.

existingTbl instanceof IncompleteTable && isSynchronizedIcebergTable == true is quite of an edge case, but it can happen when the underlying directory is deleted outside of Impala. We have this test for this case: https://github.com/apache/impala/blob/master/tests/query_test/test_iceberg.py#L45

If the table is synchronized and is stored in HiveCatalog, and gets deleted outside of Impala, then I think Impala just needs to refresh its metadata cache. Anyway, now I'm also dropping incomplete tables via HMS.


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1985
PS1, Line 1985:   try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {
              :           msTbl = msClient.getHiveClie
> One line is ok, unnecessary to two lines.
Done


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1991
PS1, Line 1991:           hmsLoadTime = hmsLoadSW.elapsed(TimeUnit.NANOSECONDS);
              :         }
              :         existingTbl.updateHMSLoadTableSchemaTime(hmsLoadTime);
              :       }
              :       boolean isSynchronizedKuduTable = msTbl != null &&
              :               KuduTable.isKuduTable(msTbl) && KuduTable.isSynchronizedTable(msTbl);
              :       if (isSynchronizedKuduTable) {
              :         KuduCatalogOpExecutor.dropTable(msTbl, /* if exists */ true);
              :       }
              : 
              :       boolean isSynchronizedIcebergTable = msTbl != null &&
              :           IcebergTable.isIcebergTable(msTbl) &&
              :           IcebergTable.isSynchronizedTable(msTbl);
              :       if (!(existingTbl instanceof IncompleteTable) && isSynchronizedIcebergTable) {
              :         Preconditions.checkState(existingTbl instanceof IcebergTable);
              :        
> nice to have: IMO it would be better to move this before Kudu Iceberg handl
Done


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2011
PS1, Line 2011:       boolean isSynchronizedTable = isSynchronizedKuduTable || isSynchronizedIcebergTable;
> I think that something like the old implementation of alterTable would be c
I introduced a new variable to make the code more verbose.


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3343
PS1, Line 3343: e tableName = oldT
> I think the old bool is clearer (maybe a name like needsHmsAlterTable would
Introduced a new variable with that name.


http://gerrit.cloudera.org:8080/#/c/16721/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test:

http://gerrit.cloudera.org:8080/#/c/16721/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test@268
PS1, Line 268: ====
> add test with custom table location
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 18 Nov 2020 10:24:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16721

to look at the new patch set (#6).

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
16 files changed, 568 insertions(+), 92 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/6
-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6665/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 18 Nov 2020 10:32:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7687/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 19 Nov 2020 11:48:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16721

to look at the new patch set (#3).

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
14 files changed, 524 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 1:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/16721/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16721/1//COMMIT_MSG@11
PS1, Line 11: when the table data is stored in object stores like S3.
Just curios - is this related to eventual consistency? If yes, then I think that we should mention it, as S3Guard should make S3 usable even with other catalogs.


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@606
PS1, Line 606:     if (catalogStr == null || catalogStr.isEmpty()) {
Can you add a comment about HIVE_CATALOG being the default here?


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@73
PS1, Line 73:       feTable.getIcebergCatalog() == TIcebergCatalog.HIVE_CATALOG);
nit: +2 indent


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@81
PS1, Line 81:     Preconditions.checkState(tableId != null);
Can we check for tableLocation==null too?


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@83
PS1, Line 83:       return hiveCatalog_.loadTable(tableId);
I am not 100% sure, but I think it would be better to catch all exceptions from API calls and wrap them in ImpalaRuntimeException.


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@93
PS1, Line 93:       feTable.getIcebergCatalog() == TIcebergCatalog.HIVE_CATALOG);
nit: +2 indent


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1960
PS1, Line 1960: Kudu
now this is needed for Iceberg tables too


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1985
PS1, Line 1985: !(existingTbl instanceof IncompleteTable
This case (existingTbl instanceof IncompleteTable && isSynchronizedIcebergTable == true) is not clear to me - is it possible for an Iceberg table to be IncompleteTable at this point and what do we want to do in this case?

My understanding is that we allow IncompleteTables to be able to cleanup up "junk" tables that fail to load, e.g. Kudu tables that were deleted from Kudu but still exist in Impala and HMS (if it wouldn't exist in HMS, then line 1967 wouldn't be able to retrieve msTbl). This is the point where we explicitly allow this: https://github.com/apache/impala/blob/6360657cb4d3b7655d9ff80958b2694ae4609370/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java#L128

If I understand things correctly, your implementation will not drop the table via Iceberg API, and also not drop it via HMS API, but it will be removed from Impala catalog at line 2027, causing HMS and catalogd to be out of sync.

This is not a huge issue, as an Iceberg table failing to load means that something is already not ok, but its handling could be easily improved:
a. Throw an exception if isSynchronizedIcebergTable is true based on msTbl, but it is an IncompleteTable. This would notify the user that something is wrong and keep HMS and catalogd in sync.
b. Try to drop the table via Iceberg API even if it is an IncompleteTable - this would need changes in Iceberg dropTable functions to be able to work with a single HMS table argument.


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1991
PS1, Line 1991:       // Check to make sure we don't drop a view with "drop table" statement and
              :       // vice versa. is_table field is marked optional in TDropTableOrViewParams to
              :       // maintain catalog api compatibility.
              :       // TODO: Remove params.isSetIs_table() check once catalog api compatibility is
              :       // fixed.
              :       if (params.isSetIs_table() && ((params.is_table && existingTbl instanceof View)
              :           || (!params.is_table && !(existingTbl instanceof View)))) {
              :         String errorMsg = "DROP " + (params.is_table ? "TABLE " : "VIEW ") +
              :             "not allowed on a " + (params.is_table ? "view: " : "table: ") + tableName;
              :         if (params.if_exists) {
              :           addSummary(resp, "Drop " + (params.is_table ? "table " : "view ") +
              :               "is not allowed on a " + (params.is_table ? "view." : "table."));
              :           return;
              :         }
              :         throw new CatalogException(errorMsg);
              :       }
nice to have: IMO it would be better to move this before Kudu Iceberg handling to keep the API calls close to each other


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2011
PS1, Line 2011:       if (!isSynchronizedTable || !isHmsIntegrationAutomatic(msTbl)) {
I think that something like the old implementation of alterTable would be clearer, e.g. bool needsHmsAlterTable = true, which would be turned false if another API does the job.


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3343
PS1, Line 3343: integratedHmsTable
I think the old bool is clearer (maybe a name like needsHmsAlterTable would be even better)



-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 16 Nov 2020 14:51:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6653/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 13 Nov 2020 17:36:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7694/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 10:04:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
......................................................................


Patch Set 1:

(3 comments)

Thanks for new feature, Zoltan. Just some nits. I will take a deeper look tomorrow.

http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1870
PS1, Line 1870: Icebergs
Iceberg


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1976
PS1, Line 1976: boolean isSynchronizedKuduTable = msTbl != null &&
              :               KuduTable.isKuduTable(msTbl) && KuduTable.isSynchronizedTable(msTbl);
              :       if (isSynchronizedKuduTable) {
              :         KuduCatalogOpExecutor.dropTable(msTbl, /* if exists */ true);
              :       }
              : 
              :       boolean isSynchronizedIcebergTable = msTbl != null &&
              :           IcebergTable.isIcebergTable(msTbl) &&
              :           IcebergTable.isSynchronizedTable(msTbl);
These codes seems similar, can we extract to a method?


http://gerrit.cloudera.org:8080/#/c/16721/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1985
PS1, Line 1985: if (!(existingTbl instanceof IncompleteTable) &&
              :           isSynchronizedIcebergTable) 
One line is ok, unnecessary to two lines.



-- 
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 16 Nov 2020 09:36:08 +0000
Gerrit-HasComments: Yes