You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2022/09/16 13:51:56 UTC

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18995


Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................

IMPALA-11583: Use Iceberg API to update stats

Before this patch we used HMS API alter_table() to update an Iceberg
table's statistics. 'alter_table()' API calls are unsafe for Iceberg
tables as they overwrite the whole HMS table, including the table
property 'metadata_location' which must always point to the latest
snapshot. Hence concurrent modification to the same table could be
reverted by COMPUTE STATS.

In this patch we are using Iceberg API to update Iceberg tables.
Also, table-level stats (e.g. numRows, totalSize, totalFiles) are not
set as Iceberg keeps them up-to-date.

Testing:
 * added e2e tests for COMPUTE STATS
 * manually tested concurrent Hive INSERT and Impala COMPUTE STATS
   using latest Hive
 * opened IMPALA-11590 to add automated interop tests with Hive

Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
M tests/query_test/test_iceberg.py
3 files changed, 101 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/18995/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 13:28:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 9: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 13:30:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8597/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 13:30:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gergely Fürnstáhl, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18995

to look at the new patch set (#2).

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................

IMPALA-11583: Use Iceberg API to update stats

Before this patch we used HMS API alter_table() to update an Iceberg
table's statistics. 'alter_table()' API calls are unsafe for Iceberg
tables as they overwrite the whole HMS table, including the table
property 'metadata_location' which must always point to the latest
snapshot. Hence concurrent modification to the same table could be
reverted by COMPUTE STATS.

In this patch we are using Iceberg API to update Iceberg tables.
Also, table-level stats (e.g. numRows, totalSize, totalFiles) are not
set as Iceberg keeps them up-to-date.

Testing:
 * added e2e tests for COMPUTE STATS
 * manually tested concurrent Hive INSERT and Impala COMPUTE STATS
   using latest Hive
 * opened IMPALA-11590 to add automated interop tests with Hive

Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
M tests/query_test/test_iceberg.py
3 files changed, 154 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/18995/2
-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8595/


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Sep 2022 14:19:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11390/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Sep 2022 15:20:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Sep 2022 09:17:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gergely Fürnstáhl, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18995

to look at the new patch set (#3).

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................

IMPALA-11583: Use Iceberg API to update stats

Before this patch we used HMS API alter_table() to update an Iceberg
table's statistics. 'alter_table()' API calls are unsafe for Iceberg
tables as they overwrite the whole HMS table, including the table
property 'metadata_location' which must always point to the latest
snapshot. Hence concurrent modification to the same table could be
reverted by COMPUTE STATS.

In this patch we are using Iceberg API to update Iceberg tables.
Also, table-level stats (e.g. numRows, totalSize, totalFiles) are not
set as Iceberg keeps them up-to-date.

Testing:
 * added e2e tests for COMPUTE STATS
 * manually tested concurrent Hive INSERT and Impala COMPUTE STATS
   using latest Hive
 * opened IMPALA-11590 to add automated interop tests with Hive

Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
M tests/query_test/test_iceberg.py
3 files changed, 153 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/18995/3
-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 6: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18995/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test:

http://gerrit.cloudera.org:8080/#/c/18995/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@86
PS4, Line 86: show column stats ice_alltypes;
            : ---- RESULTS
> Users can set numRows for non-HMS integrated Iceberg tables.
Probably this would also worth a follow up jira - so problematic properties could be collected for different kind of HMS tables. numRows is very important property, and probably many ETL workloads set it manually if they know the number of rows and want to avoid running compute stats.



-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Sep 2022 07:08:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11406/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 12:57:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gergely Fürnstáhl, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18995

to look at the new patch set (#8).

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................

IMPALA-11583: Use Iceberg API to update stats

Before this patch we used HMS API alter_table() to update an Iceberg
table's statistics. 'alter_table()' API calls are unsafe for Iceberg
tables as they overwrite the whole HMS table, including the table
property 'metadata_location' which must always point to the latest
snapshot. Hence concurrent modification to the same table could be
reverted by COMPUTE STATS.

In this patch we are using Iceberg API to update Iceberg tables.
Also, table-level stats (e.g. numRows, totalSize, totalFiles) are not
set as Iceberg keeps them up-to-date.

COMPUTE INCREMENTAL STATS without partition clause is the same as
plain COMPUTE STATS for Iceberg tables. This behavior is aligned
with current behavior on non-partitioned tables:
https://impala.apache.org/docs/build/html/topics/impala_compute_stats.html

COMPUTE INCREMENTAL STATS .. PARTITION raises an error.

DROP STATS has been also modified to not drop table-level stats for
HMS-integrated Iceberg tables.

Testing:
 * added e2e tests for COMPUTE STATS
 * added e2e tests for DROP STATS
 * manually tested concurrent Hive INSERT and Impala COMPUTE STATS
   using latest Hive
 * opened IMPALA-11590 to add automated interop tests with Hive

Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
M tests/query_test/test_iceberg.py
4 files changed, 448 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/18995/8
-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 8:

Fixed bug found by GVO: in old catalog mode COMPUTE INCREMENTAL STATS hit an IllegalArgumentException. Extended commit message about COMPUTE INCREMENTAL STATS as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 12:38:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11379/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Sep 2022 14:12:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................

IMPALA-11583: Use Iceberg API to update stats

Before this patch we used HMS API alter_table() to update an Iceberg
table's statistics. 'alter_table()' API calls are unsafe for Iceberg
tables as they overwrite the whole HMS table, including the table
property 'metadata_location' which must always point to the latest
snapshot. Hence concurrent modification to the same table could be
reverted by COMPUTE STATS.

In this patch we are using Iceberg API to update Iceberg tables.
Also, table-level stats (e.g. numRows, totalSize, totalFiles) are not
set as Iceberg keeps them up-to-date.

COMPUTE INCREMENTAL STATS without partition clause is the same as
plain COMPUTE STATS for Iceberg tables. This behavior is aligned
with current behavior on non-partitioned tables:
https://impala.apache.org/docs/build/html/topics/impala_compute_stats.html

COMPUTE INCREMENTAL STATS .. PARTITION raises an error.

DROP STATS has been also modified to not drop table-level stats for
HMS-integrated Iceberg tables.

Testing:
 * added e2e tests for COMPUTE STATS
 * added e2e tests for DROP STATS
 * manually tested concurrent Hive INSERT and Impala COMPUTE STATS
   using latest Hive
 * opened IMPALA-11590 to add automated interop tests with Hive

Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Reviewed-on: http://gerrit.cloudera.org:8080/18995
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
M tests/query_test/test_iceberg.py
4 files changed, 448 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11378/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Sep 2022 14:09:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11377/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Sep 2022 14:08:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18995/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/18995/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1698
PS1, Line 1698:     Preconditions.checkState(msTbl.getParameters().containsKey(COMPUTE_STATS_TIME));
Are you sure that this is always set? Is it possible for the condition at line 1658 to be false in case of Iceberg tables? I forgot how this worked exacly, but I think that it is false if you manually change columns stats.


http://gerrit.cloudera.org:8080/#/c/18995/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1707
PS1, Line 1707:     IcebergCatalogOpExecutor.unsetTblProperties(iceTxn, Lists.newArrayList(
              :         StatsSetupConst.COLUMN_STATS_ACCURATE));
Do you know whether COLUMN_STATS_ACCURATE is used with Iceberg tables? It may make sense to discuss this with Hive people - AFAIK the only stat where this matters in numRows, and numRows will stay accurate. So we do not change stats stored as table properties, while column stats are stored for Hive and Impala independently - this means that we cannot mess up the stats of Hive.


http://gerrit.cloudera.org:8080/#/c/18995/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test:

http://gerrit.cloudera.org:8080/#/c/18995/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@33
PS1, Line 33: compute stats ice_alltypes;
What happens if we run COMPUTE INCREMENTAL STATS for Iceberg? It shouldn't be supported, but I don't know whether we have tests for that.


http://gerrit.cloudera.org:8080/#/c/18995/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@57
PS1, Line 57: ====
Can you also add a test for setting stats manually? See https://docs.cloudera.com/best-practices/latest/impala-performance/topics/bp-impala-manually-setting-table-and-partition-statistics.html


http://gerrit.cloudera.org:8080/#/c/18995/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18995/1/tests/query_test/test_iceberg.py@795
PS1, Line 795:                        unique_database)
nit: would fit to one line



-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 16 Sep 2022 16:37:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 1:

(5 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/18995/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/18995/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1698
PS1, Line 1698:     Preconditions.checkState(msTbl.getParameters().containsKey(COMPUTE_STATS_TIME));
> Are you sure that this is always set? Is it possible for the condition at l
Looking at catalog-op-executor.cc I think this should be always set, but removed from the precondition check anyway.

Manually setting stats doesn't go through this code.


http://gerrit.cloudera.org:8080/#/c/18995/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1707
PS1, Line 1707:     IcebergCatalogOpExecutor.unsetTblProperties(iceTxn, Lists.newArrayList(
              :         StatsSetupConst.COLUMN_STATS_ACCURATE));
> Do you know whether COLUMN_STATS_ACCURATE is used with Iceberg tables? It m
I didn't want to change behavior in this CR. But I agree that we could remove this, as table-level stats are not modified and column stats are separate.


http://gerrit.cloudera.org:8080/#/c/18995/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test:

http://gerrit.cloudera.org:8080/#/c/18995/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@33
PS1, Line 33: compute stats ice_alltypes;
> What happens if we run COMPUTE INCREMENTAL STATS for Iceberg? It shouldn't 
It's supported, works similarly to non-partitioned tables.
Though the output can be confusing, extended IMPALA-11538 with COMPUTE STATS.


http://gerrit.cloudera.org:8080/#/c/18995/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@57
PS1, Line 57: ====
> Can you also add a test for setting stats manually? See https://docs.cloude
Sure, done.


http://gerrit.cloudera.org:8080/#/c/18995/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18995/1/tests/query_test/test_iceberg.py@795
PS1, Line 795:                        unique_database)
> nit: would fit to one line
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Sep 2022 13:51:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gergely Fürnstáhl, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18995

to look at the new patch set (#4).

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................

IMPALA-11583: Use Iceberg API to update stats

Before this patch we used HMS API alter_table() to update an Iceberg
table's statistics. 'alter_table()' API calls are unsafe for Iceberg
tables as they overwrite the whole HMS table, including the table
property 'metadata_location' which must always point to the latest
snapshot. Hence concurrent modification to the same table could be
reverted by COMPUTE STATS.

In this patch we are using Iceberg API to update Iceberg tables.
Also, table-level stats (e.g. numRows, totalSize, totalFiles) are not
set as Iceberg keeps them up-to-date.

Testing:
 * added e2e tests for COMPUTE STATS
 * manually tested concurrent Hive INSERT and Impala COMPUTE STATS
   using latest Hive
 * opened IMPALA-11590 to add automated interop tests with Hive

Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
M tests/query_test/test_iceberg.py
3 files changed, 152 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/18995/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 6:

(3 comments)

Thanks for the comments.

http://gerrit.cloudera.org:8080/#/c/18995/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test:

http://gerrit.cloudera.org:8080/#/c/18995/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@77
PS4, Line 77: '','numRows             ','2                   '
> I am not sure which operation sets this property - can you unset it before 
Added UNSET.
Also opened IMPALA-11597.


http://gerrit.cloudera.org:8080/#/c/18995/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@86
PS4, Line 86: show column stats ice_alltypes;
            : ---- RESULTS
> Shouldn't we deny this operation, or at least return a warning?
Users can set numRows for non-HMS integrated Iceberg tables.

I could also add this check probably to https://github.com/apache/impala/blob/cff286e7512e9d1e2ff2b4ea033d3e575f54b353/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java#L153

But there is no complete list of Iceberg-managed table properties, so we will run into this repeatedly.


http://gerrit.cloudera.org:8080/#/c/18995/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@111
PS4, Line 111: 'p_d
> Can you also add a drop stats statement?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Sep 2022 15:58:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8595/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Sep 2022 09:17:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 9: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 18:33:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18995/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/18995/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4498
PS7, Line 4498:     try {
              :       tryWriteLock(table, reason);
              :       long newCatalogVersion = catalog_.incrementAndGetCatalogVersion();
              :       catalog_.getLock().writeLock().unlock();
              :      
This doesn't look good, the code was recently added in https://gerrit.cloudera.org/#/c/19020/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java



-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 13:27:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gergely Fürnstáhl, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18995

to look at the new patch set (#6).

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................

IMPALA-11583: Use Iceberg API to update stats

Before this patch we used HMS API alter_table() to update an Iceberg
table's statistics. 'alter_table()' API calls are unsafe for Iceberg
tables as they overwrite the whole HMS table, including the table
property 'metadata_location' which must always point to the latest
snapshot. Hence concurrent modification to the same table could be
reverted by COMPUTE STATS.

In this patch we are using Iceberg API to update Iceberg tables.
Also, table-level stats (e.g. numRows, totalSize, totalFiles) are not
set as Iceberg keeps them up-to-date.

DROP STATS has been also modified to not drop table-level stats for
HMS-integrated Iceberg tables.

Testing:
 * added e2e tests for COMPUTE STATS
 * added e2e tests for DROP STATS
 * manually tested concurrent Hive INSERT and Impala COMPUTE STATS
   using latest Hive
 * opened IMPALA-11590 to add automated interop tests with Hive

Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
M tests/query_test/test_iceberg.py
3 files changed, 413 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/18995/6
-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11391/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Sep 2022 15:30:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gergely Fürnstáhl, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18995

to look at the new patch set (#5).

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................

IMPALA-11583: Use Iceberg API to update stats

Before this patch we used HMS API alter_table() to update an Iceberg
table's statistics. 'alter_table()' API calls are unsafe for Iceberg
tables as they overwrite the whole HMS table, including the table
property 'metadata_location' which must always point to the latest
snapshot. Hence concurrent modification to the same table could be
reverted by COMPUTE STATS.

In this patch we are using Iceberg API to update Iceberg tables.
Also, table-level stats (e.g. numRows, totalSize, totalFiles) are not
set as Iceberg keeps them up-to-date.

DROP STATS has been also modified to not drop table-level stats for
HMS-integrated Iceberg tables.

Testing:
 * added e2e tests for COMPUTE STATS
 * added e2e tests for DROP STATS
 * manually tested concurrent Hive INSERT and Impala COMPUTE STATS
   using latest Hive
 * opened IMPALA-11590 to add automated interop tests with Hive

Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
M tests/query_test/test_iceberg.py
3 files changed, 397 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/18995/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18995/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test:

http://gerrit.cloudera.org:8080/#/c/18995/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@77
PS4, Line 77: row_regex:'','impala.lastComputeStatsTime','\d+\s+'
I am not sure which operation sets this property - can you unset it before executing compute incremental stat?


http://gerrit.cloudera.org:8080/#/c/18995/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@86
PS4, Line 86: ALTER TABLE ice_alltypes
            : SET TBLPROPERTIES('numRows'='1000', 'STATS_GENERATED_VIA_STATS_TASK'='true');
Shouldn't we deny this operation, or at least return a warning?


http://gerrit.cloudera.org:8080/#/c/18995/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@111
PS4, Line 111: ====
Can you also add a drop stats statement?



-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Sep 2022 15:43:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11583: Use Iceberg API to update stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11369/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 16 Sep 2022 14:12:58 +0000
Gerrit-HasComments: No