You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2022/08/01 09:45:13 UTC

[Impala-ASF-CR] IMPALA-10453: Support file pruning via runtime filters on Iceberg

Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18531 )

Change subject: IMPALA-10453: Support file pruning via runtime filters on Iceberg
......................................................................


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18531/10/be/src/exec/file-metadata-utils.h
File be/src/exec/file-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/18531/10/be/src/exec/file-metadata-utils.h@42
PS10, Line 42:   void Open(RuntimeState* state, const HdfsFileDesc* file_desc);
I think that the name "Open" is a bit misleading as this function can be called more than once in the lifetime of a  FileMetadataUtils (without calling reset or close).

I have 3 ideas to improve this:
1. add a comment about Open being called more than once
2. change the name to something like SetFile()
3. change the usage of this class to be be instantiated where we call Open() now, so that the constructor can get this params.


http://gerrit.cloudera.org:8080/#/c/18531/9/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18531/9/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@116
PS9, Line 116:     // Increase the per instance memory estimate with the tuples created for runtime file
             :     // filtering.
I am a bit confused about why do have to increase the mem estimate - do we need to hold the tuples for all files in memory at the same time?  Don't we have a tuple per-partition, not per-file? Also, this was added in a previous commit, right?


http://gerrit.cloudera.org:8080/#/c/18531/9/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@120
PS9, Line 120: avgRowSize_
Can you add a Precondition that this is not negative?

Note that we are overestimating the memory here, as avgRowSize also contains strings, and those won't be created for non-partition columns.


http://gerrit.cloudera.org:8080/#/c/18531/9/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@123
PS9, Line 123: setMinMemReservationBytes
Shouldn't we increase min reservation similarly to the estimate?



-- 
To view, visit http://gerrit.cloudera.org:8080/18531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7762e1238bdf236b85d2728881a402a2bb41f36a
Gerrit-Change-Number: 18531
Gerrit-PatchSet: 10
Gerrit-Owner: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Aug 2022 09:45:13 +0000
Gerrit-HasComments: Yes