You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2022/12/09 14:37:02 UTC

[Impala-ASF-CR] IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19334


Change subject: IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load
......................................................................

IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Iceberg's planFiles() API is very expensive because it involves reading
the Avro manifest files. It's especially expensive on object stores,
though manifest caching can help here.

Currently we invoke this API two times during table loading (via
IcebergUtil.getIcebergFiles()), once in loadAllPartition() and once in
loadPartitionStats().

With this patch we invoke IcebergUtil.getIcebergFiles() once, then pass
the result object to loadAllPartition() and loadPartitionStats().

Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
---
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
2 files changed, 13 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/19334/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Gerrit-Change-Number: 19334
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/19334 )

Change subject: IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load
......................................................................


Patch Set 1: Code-Review+2

Nice catch, LGTM!


-- 
To view, visit http://gerrit.cloudera.org:8080/19334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Gerrit-Change-Number: 19334
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Tue, 13 Dec 2022 09:39:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19334 )

Change subject: IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8893/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/19334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Gerrit-Change-Number: 19334
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 09 Dec 2022 14:38:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19334 )

Change subject: IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load
......................................................................


Patch Set 1:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8894/


-- 
To view, visit http://gerrit.cloudera.org:8080/19334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Gerrit-Change-Number: 19334
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Sat, 10 Dec 2022 02:23:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
lipenglin@sensorsdata.cn has posted comments on this change. ( http://gerrit.cloudera.org:8080/19334 )

Change subject: IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19334/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19334/1//COMMIT_MSG@9
PS1, Line 9: Iceberg's planFiles() API is very expensive
> I noticed that 'org.apache.impala.analysis.AlterTableSetTblProperties#icebe
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Gerrit-Change-Number: 19334
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 09 Dec 2022 16:09:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19334 )

Change subject: IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8895/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/19334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Gerrit-Change-Number: 19334
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Sat, 10 Dec 2022 10:39:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19334 )

Change subject: IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12012/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Gerrit-Change-Number: 19334
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 09 Dec 2022 14:56:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19334 )

Change subject: IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load
......................................................................


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8893/


-- 
To view, visit http://gerrit.cloudera.org:8080/19334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Gerrit-Change-Number: 19334
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 09 Dec 2022 19:47:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19334 )

Change subject: IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load
......................................................................

IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Iceberg's planFiles() API is very expensive because it involves reading
the Avro manifest files. It's especially expensive on object stores,
though manifest caching can help here.

Currently we invoke this API two times during table loading (via
IcebergUtil.getIcebergFiles()), once in loadAllPartition() and once in
loadPartitionStats().

With this patch we invoke IcebergUtil.getIcebergFiles() once, then pass
the result object to loadAllPartition() and loadPartitionStats().

Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Reviewed-on: http://gerrit.cloudera.org:8080/19334
Reviewed-by: <li...@sensorsdata.cn>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Tamas Mate <tm...@apache.org>
---
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
2 files changed, 13 insertions(+), 9 deletions(-)

Approvals:
  lipenglin@sensorsdata.cn: Looks good to me, but someone else must approve
  Impala Public Jenkins: Verified
  Tamas Mate: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/19334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Gerrit-Change-Number: 19334
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
lipenglin@sensorsdata.cn has posted comments on this change. ( http://gerrit.cloudera.org:8080/19334 )

Change subject: IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load
......................................................................


Patch Set 1: Code-Review+1

(1 comment)

Thanks for fixing this. LGTM!

http://gerrit.cloudera.org:8080/#/c/19334/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19334/1//COMMIT_MSG@9
PS1, Line 9: Iceberg's planFiles() API is very expensive
I noticed that 'org.apache.impala.analysis.AlterTableSetTblProperties#icebergTableFormatCheck' (https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java#L184) will also call 'planFiles()', maybe we can use 'org.apache.impala.catalog.FeIcebergTable#getContentFileStore' instead of this method to obtain 'DataFile' obj. So, can I create another Jira to solve it?



-- 
To view, visit http://gerrit.cloudera.org:8080/19334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Gerrit-Change-Number: 19334
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 09 Dec 2022 16:08:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19334 )

Change subject: IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8894/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/19334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Gerrit-Change-Number: 19334
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 09 Dec 2022 21:15:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19334 )

Change subject: IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load
......................................................................


Patch Set 1: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Gerrit-Change-Number: 19334
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Sat, 10 Dec 2022 15:40:43 +0000
Gerrit-HasComments: No