You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2021/03/19 17:59:29 UTC

[Impala-ASF-CR] IMPALA-10597: Enable setting Iceberg table properties

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17207


Change subject: IMPALA-10597: Enable setting Iceberg table properties
......................................................................

IMPALA-10597: Enable setting Iceberg table properties

Currently we prohibit setting the following properties:

* iceberg.catalog
* iceberg.catalog_location
* iceberg.file_format
* icceberg.table_identifier

Impala needs these properties to be able to correctly use the table.
However, if the table was created by an other engine, e.g. by Hive,
then the table won't have these properties.

Therefore to make such tables usable, this patch allows setting the
above properties.

Testing:
 * added e2e test

Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
3 files changed, 16 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/17207/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10597: Enable setting Iceberg table properties

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting Iceberg table properties
......................................................................


Patch Set 1:

Hi Zoltan, this patch is ok, but I have a question:
You mentioned that: other engines create Iceberg table may not contains these properties, so we need to allow setting these properties to make table usable.
I understand that if we need to set table property by impala, catalog need to load table first. If Iceberg table do not contain these properties, does catalogd can load this kind of table normally? Maybe throw TableLoadingException in some situation?


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 01:59:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10597: Enable setting Iceberg table properties

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17207

to look at the new patch set (#2).

Change subject: IMPALA-10597: Enable setting Iceberg table properties
......................................................................

IMPALA-10597: Enable setting Iceberg table properties

Currently we prohibit setting the following properties:

* iceberg.catalog
* iceberg.catalog_location
* iceberg.file_format
* iceberg.table_identifier

This patch enables setting 'iceberg.file_format', therefore if
a table was created by another engine, but using HiveCatalog,
we'll be able to set the data file format to the proper value
and make the table readable by Impala. Setting the other
properties are not needed for HiveCatalog tables.

If the table wasn't created by HiveCatalog, then we cannot load the
table, therefore we cannot invoke any ALTER TABLE statement at all.
In that case we need to create an external table.

Testing:
 * added e2e test

Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
3 files changed, 16 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/17207/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10597: Enable setting 'iceberg.file format'

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting 'iceberg.file_format'
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17207/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17207/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-10597: Enable setting Iceberg table properties
> I found the commit msg a bit misleading as I had the impression that more t
Done


http://gerrit.cloudera.org:8080/#/c/17207/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test:

http://gerrit.cloudera.org:8080/#/c/17207/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test@32
PS2, Line 32: ALTER TABLE iceberg_hadoop_tables set TBLPROPERTIES('iceberg.file_format'='orc');
> As I learnt, if there is already some data being written into the table e.g
Added additional checks to AlterTableSetTblProperties.



-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 25 Mar 2021 14:41:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10597: Enable setting 'iceberg.file format'

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting 'iceberg.file_format'
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17207/3/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

http://gerrit.cloudera.org:8080/#/c/17207/3/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@171
PS3, Line 171:     Preconditions.checkState(fileformat != null);
> Could you please add a test for this as well?
Done


http://gerrit.cloudera.org:8080/#/c/17207/3/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@182
PS3, Line 182:     "file %s with file format %s.";
             :       if (!firstFile.format().name().equalsIgnoreCase(fileformat)) {
             :         throw new AnalysisException(String.format(errorMsg, fileformat, firstFile.path(),
             :        
> Is this part needed? The for loop below would cover the same, right?
Here we check if the first file's format equals to the parameter 'fileformat'.

The below for loop checks if every file format in 'dataFiles' are the same. If not, it prints out the first file format that differs from the others, and therefore differs from the parameter 'fileformat' as well.

If we remove this 'if', then the user could set the file format to PARQUET even if all the data files are ORC.



-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 29 Mar 2021 11:30:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10597: Enable setting 'iceberg.file format'

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17207

to look at the new patch set (#3).

Change subject: IMPALA-10597: Enable setting 'iceberg.file_format'
......................................................................

IMPALA-10597: Enable setting 'iceberg.file_format'

Currently we prohibit setting the following properties:

* iceberg.catalog
* iceberg.catalog_location
* iceberg.file_format
* iceberg.table_identifier

This patch enables setting 'iceberg.file_format', therefore if
a table was created by another engine, but using HiveCatalog,
we'll be able to set the data file format to the proper value
and make the table readable by Impala. Setting the other
properties are not needed for HiveCatalog tables.

If the table wasn't created by HiveCatalog, then we cannot load the
table, therefore we cannot invoke any ALTER TABLE statement at all.
In that case we need to create an external table.

If the table already contains data files, then Impala checks if
all of them have the proper file format. If not, the ALTER TABLE
statement fails.

Before this patch a CREATE TABLE statement accepted any string
for 'iceberg.file_format', and in case of invalid file formats the
frontend silently used Parquet. This patch also adds a check to only
allow valid file formats.

Testing:
 * added e2e test

Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
5 files changed, 84 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/17207/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10597: Enable setting 'iceberg.file format'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting 'iceberg.file_format'
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8444/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 25 Mar 2021 14:59:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10597: Enable setting 'iceberg.file format'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting 'iceberg.file_format'
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7018/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 29 Mar 2021 12:36:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10597: Enable setting Iceberg table properties

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting Iceberg table properties
......................................................................


Patch Set 1:

Hi Zoltan, thanks for explain. But I still have some questions. You said that this patch is mainly for Iceberg tables with HiveCatalog which have proper Input/Output format, SerDe. And Impala can load these tables normally even without 'iceberg.catalog'(because HiveCatalog is default catalog type?). If so, I think we only need to enable setting 'iceberg.file_format', because 'iceberg.catalog_location' and 'iceberg.table_identifier' are related to HadoopCatalog instead of HiveCatalog, as for 'iceberg.catalog', we do not need to setting it at all. I not sure if my understanding correct?


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 23 Mar 2021 12:22:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10597: Enable setting Iceberg table properties

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting Iceberg table properties
......................................................................


Patch Set 2:

Yeah, I was allowing setting these properties just in case users might need it for some reason.

But I agree that the other properties are not needed for HiveCatalog tables. Also, if there's a table using another catalog type, then Impala is not able to load the table, there we can't run any ALTER TABLE statements.

So yeah, it's probably better to make them "constant" for the life cycle of the table.

In the long-term I think we'll probably want to standardise the table properties across the engines anyway.


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 24 Mar 2021 11:07:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10597: Enable setting Iceberg table properties

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting Iceberg table properties
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

I only had a commit msg related comment, otherwise the patch seems fine.

http://gerrit.cloudera.org:8080/#/c/17207/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17207/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-10597: Enable setting Iceberg table properties
I found the commit msg a bit misleading as I had the impression that more than a single table property is enableb with this commit. As long as it's only one I think you can instead name that particular one in the msg.



-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 25 Mar 2021 12:16:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10597: Enable setting 'iceberg.file format'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting 'iceberg.file_format'
......................................................................

IMPALA-10597: Enable setting 'iceberg.file_format'

Currently we prohibit setting the following properties:

* iceberg.catalog
* iceberg.catalog_location
* iceberg.file_format
* iceberg.table_identifier

This patch enables setting 'iceberg.file_format', therefore if
a table was created by another engine, but using HiveCatalog,
we'll be able to set the data file format to the proper value
and make the table readable by Impala. Setting the other
properties are not needed for HiveCatalog tables.

If the table wasn't created by HiveCatalog, then we cannot load the
table, therefore we cannot invoke any ALTER TABLE statement at all.
In that case we need to create an external table.

If the table already contains data files, then Impala checks if
all of them have the proper file format. If not, the ALTER TABLE
statement fails.

Before this patch a CREATE TABLE statement accepted any string
for 'iceberg.file_format', and in case of invalid file formats the
frontend silently used Parquet. This patch also adds a check to only
allow valid file formats.

Testing:
 * added e2e test

Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Reviewed-on: http://gerrit.cloudera.org:8080/17207
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
5 files changed, 92 insertions(+), 6 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10597: Enable setting 'iceberg.file format'

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, wangsheng, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17207

to look at the new patch set (#4).

Change subject: IMPALA-10597: Enable setting 'iceberg.file_format'
......................................................................

IMPALA-10597: Enable setting 'iceberg.file_format'

Currently we prohibit setting the following properties:

* iceberg.catalog
* iceberg.catalog_location
* iceberg.file_format
* iceberg.table_identifier

This patch enables setting 'iceberg.file_format', therefore if
a table was created by another engine, but using HiveCatalog,
we'll be able to set the data file format to the proper value
and make the table readable by Impala. Setting the other
properties are not needed for HiveCatalog tables.

If the table wasn't created by HiveCatalog, then we cannot load the
table, therefore we cannot invoke any ALTER TABLE statement at all.
In that case we need to create an external table.

If the table already contains data files, then Impala checks if
all of them have the proper file format. If not, the ALTER TABLE
statement fails.

Before this patch a CREATE TABLE statement accepted any string
for 'iceberg.file_format', and in case of invalid file formats the
frontend silently used Parquet. This patch also adds a check to only
allow valid file formats.

Testing:
 * added e2e test

Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
5 files changed, 92 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/17207/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-10597: Enable setting 'iceberg.file format'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting 'iceberg.file_format'
......................................................................


Patch Set 5: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 29 Mar 2021 18:32:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10597: Enable setting 'iceberg.file format'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting 'iceberg.file_format'
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 29 Mar 2021 12:36:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10597: Enable setting 'iceberg.file format'

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting 'iceberg.file_format'
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17207/3/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

http://gerrit.cloudera.org:8080/#/c/17207/3/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@171
PS3, Line 171:     if (fileformat == null || IcebergUtil.getIcebergFileFormat(fileformat) == null) {
Could you please add a test for this as well?


http://gerrit.cloudera.org:8080/#/c/17207/3/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@182
PS3, Line 182: if (!firstFile.format().name().equalsIgnoreCase(fileformat)) {
             :         throw new AnalysisException(String.format(errorMsg, fileformat, firstFile.path(),
             :         firstFile.format().name()));
             :       }
Is this part needed? The for loop below would cover the same, right?



-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 29 Mar 2021 09:02:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10597: Enable setting Iceberg table properties

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting Iceberg table properties
......................................................................


Patch Set 1:

Thanks WangSheng for your comment.

It's mainly for tables that are created via HiveCatalog. These tables will have the proper Input/Output format, SerDe, etc., but don't have 'iceberg.catalog'. But in this case Impala correctly assume that HiveCatalog was used, and will be able to load the table. After that the user can set e.g. 'iceberg.file_format'='orc'.

Tables created by HadoopTables or HadoopCatalog are not necesserily have an HMS table associated with, so in those cases Impala users can just create an external table.


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 10:24:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10597: Enable setting 'iceberg.file format'

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting 'iceberg.file_format'
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 29 Mar 2021 11:41:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10597: Enable setting Iceberg table properties

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting Iceberg table properties
......................................................................


Patch Set 2: -Code-Review

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17207/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test:

http://gerrit.cloudera.org:8080/#/c/17207/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test@32
PS2, Line 32: ALTER TABLE iceberg_hadoop_tables set TBLPROPERTIES('iceberg.file_format'='orc');
As I learnt, if there is already some data being written into the table e.g. in Parquet format, it won't be readable after changing the file format.
I think we should consider writing a warning to the user that in case they have data they won't be able to read it. More sophisticated way would be to write this warning only if the table contains data when changing the file format.



-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 25 Mar 2021 12:46:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10597: Enable setting Iceberg table properties

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting Iceberg table properties
......................................................................


Patch Set 2: Code-Review+1

Thanks for this new feature, LGTM!


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 25 Mar 2021 06:29:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10597: Enable setting 'iceberg.file format'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting 'iceberg.file_format'
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8456/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 29 Mar 2021 11:50:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10597: Enable setting Iceberg table properties

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting Iceberg table properties
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8430/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 24 Mar 2021 11:19:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10597: Enable setting Iceberg table properties

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17207 )

Change subject: IMPALA-10597: Enable setting Iceberg table properties
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8404/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
Gerrit-Change-Number: 17207
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 18:19:48 +0000
Gerrit-HasComments: No