You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tamas Mate (Code Review)" <ge...@cloudera.org> on 2022/01/19 13:53:46 UTC

[Impala-ASF-CR] IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Tamas Mate has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18160


Change subject: IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load
......................................................................

IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Impala uses the Iceberg API to load the DataFiles which is needed to
create the FileDescriptors. This operation is redundant, because the
IcebergTable's internal HdfsTable loads the FileDescriptors prior as
well.

This commit updates the loadAllPartition operation to re-use the
HdfsTable's FileDescriptors.

Testing:
 - Executed the Iceberg tests.

Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
---
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
3 files changed, 16 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/18160/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18160/2/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18160/2/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@529
PS2, Line 529: }
> We should raise an error/warning when we don't find a file. Or maybe we cou
Agree, this case should be handled, do you think a TableLoadingException would be reasonable as well? The query would be aborted and the table would be incomplete.



-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jan 2022 10:09:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
......................................................................

IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load

Impala used the FileSystem.getFileStatus() on every Iceberg DataFile to
create the FileDescriptors. This operation is redundant because there is
an internal HdfsTable inside the IcebergTable which loads the
FileDescriptors recursively as well earlier.

This commit updates the loadAllPartition operation to reuse the
HdfsTable's FileDescriptors.

Testing:
 - Executed the Iceberg tests.

Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Reviewed-on: http://gerrit.cloudera.org:8080/18160
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
1 file changed, 29 insertions(+), 10 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 5
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 2: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Jan 2022 23:33:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18160/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18160/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@472
PS1, Line 472:     /**
             :      * Get FileDescriptor by data file location
             :      */
             :     public static HdfsPartition.FileDescriptor getFileDescriptor(Path fileLoc,
             :         Path tableLoc, ListMap<TNetworkAddress> hostIndex) throws IOException {
             :       FileSystem fs = FileSystemUtil.getFileSystemForPath(tableLoc);
             :       FileStatus fileStatus = fs.getFileStatus(fileLoc);
             :       return getFileDescriptor(fs, tableLoc, fileStatus, hostIndex);
             :     }
             : 
             :     private static HdfsPartition.FileDescriptor getFileDescriptor(FileSystem fs,
             :         Path tableLoc, FileStatus fileStatus, ListMap<TNetworkAddress> hostIndex)
             :         throws IOException {
             :       Reference<Long> numUnknownDiskIds = new Reference<Long>(Long.valueOf(0));
             :       String relPath = FileSystemUtil.relativizePath(fileStatus.getPath(), tableLoc);
             : 
             :       if (!FileSystemUtil.supportsStorageIds(fs)) {
             :         return HdfsPartition.FileDescriptor.createWithNoBlocks(fileStatus, relPath);
             :       }
             : 
             :       BlockLocation[] locations;
             :       if (fileStatus instanceof LocatedFileStatus) {
             :         locations = ((LocatedFileStatus)fileStatus).getBlockLocations();
             :       } else {
             :         locations = fs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
             :       }
             : 
             :       return HdfsPartition.FileDescriptor.create(fileStatus, relPath, locations,
             :           hostIndex, HdfsShim.isErasureCoded(fileStatus), numUnknownDiskIds);
             :     }
Becomes unused? Or maybe we can keep it as a fallback? E.g. generate a warning when we cannot find the file descriptor in the HDFS table, but load it anyway with these methods?


http://gerrit.cloudera.org:8080/#/c/18160/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@516
PS1, Line 516: fileDescMap
We should only put entries to the fileDescMap if they are actually in the Iceberg data file list. E.g. if the table is compacted then the 'data' directory potentially contain much more files than the current snapshot.

Also, later we'll need to decorate the file descriptors with information coming from the Iceberg data files (e.g. which partition they belong, their file format, etc.).

So I think we should create a temporary map of the file descriptors in the HDFS table (only using paths under the 'data' directory) and use it as a lookup table to populate another map ('fileDescMap') that only contains the snapshot data files.



-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Jan 2022 19:29:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7770/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 4
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jan 2022 16:13:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18160/2/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18160/2/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@529
PS2, Line 529: }
> Agree, this case should be handled, do you think a TableLoadingException wo
At first we can try to load the file descriptor with getFileDescriptor(), like we did earlier. If that also fails we can throw an exception because the table cannot be loaded properly.



-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jan 2022 10:52:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 4: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 4
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jan 2022 22:46:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load
......................................................................

IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Impala uses the Iceberg API to load the DataFiles which is needed to
create the FileDescriptors. This operation is redundant, because the
IcebergTable's internal HdfsTable loads the FileDescriptors prior as
well.

This commit updates the loadAllPartition operation to re-use the
HdfsTable's FileDescriptors.

Testing:
 - Executed the Iceberg tests.

Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
---
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
1 file changed, 22 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/18160/2
-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10046/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Jan 2022 16:47:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 3: Code-Review+2

Thanks for fixing this!


-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 3
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jan 2022 16:10:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 4
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jan 2022 16:13:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18160/2/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18160/2/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@529
PS2, Line 529: }
> At first we can try to load the file descriptor with getFileDescriptor(), l
I see, I have updated the condition to handle missing files.

Checked the FileSystems.getFileStatus(), it throws FileNotFoundException in case the file is missing.



-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 3
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jan 2022 15:20:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10017/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 Jan 2022 14:17:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7764/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Jan 2022 17:03:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 4:

Thank you for the reviews!


-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 4
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jan 2022 16:13:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
......................................................................

IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load

Impala used the FileSystem.getFileStatus() on every Iceberg DataFile to
create the FileDescriptors. This operation is redundant because there is
an internal HdfsTable inside the IcebergTable which loads the
FileDescriptors recursively as well earlier.

This commit updates the loadAllPartition operation to reuse the
HdfsTable's FileDescriptors.

Testing:
 - Executed the Iceberg tests.

Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
---
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
1 file changed, 29 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/18160/3
-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 3
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10060/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 3
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jan 2022 15:45:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 2: Code-Review+1

(1 comment)

The change LGTM, only had a minor comment.

http://gerrit.cloudera.org:8080/#/c/18160/2/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18160/2/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@529
PS2, Line 529: }
We should raise an error/warning when we don't find a file. Or maybe we could load the file descriptor with getFileStatus() in this case.



-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Jan 2022 14:26:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18160 )

Change subject: IMPALA-11076: Re-use FDs loaded by HdfsTable during IcebergTable load
......................................................................


Patch Set 1:

(2 comments)

Hi Zoltank, thank you for the review, the CR has been updated.

http://gerrit.cloudera.org:8080/#/c/18160/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18160/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@472
PS1, Line 472:     /**
             :      * Get FileDescriptor by data file location
             :      */
             :     public static HdfsPartition.FileDescriptor getFileDescriptor(Path fileLoc,
             :         Path tableLoc, ListMap<TNetworkAddress> hostIndex) throws IOException {
             :       FileSystem fs = FileSystemUtil.getFileSystemForPath(tableLoc);
             :       FileStatus fileStatus = fs.getFileStatus(fileLoc);
             :       return getFileDescriptor(fs, tableLoc, fileStatus, hostIndex);
             :     }
             : 
             :     private static HdfsPartition.FileDescriptor getFileDescriptor(FileSystem fs,
             :         Path tableLoc, FileStatus fileStatus, ListMap<TNetworkAddress> hostIndex)
             :         throws IOException {
             :       Reference<Long> numUnknownDiskIds = new Reference<Long>(Long.valueOf(0));
             :       String relPath = FileSystemUtil.relativizePath(fileStatus.getPath(), tableLoc);
             : 
             :       if (!FileSystemUtil.supportsStorageIds(fs)) {
             :         return HdfsPartition.FileDescriptor.createWithNoBlocks(fileStatus, relPath);
             :       }
             : 
             :       BlockLocation[] locations;
             :       if (fileStatus instanceof LocatedFileStatus) {
             :         locations = ((LocatedFileStatus)fileStatus).getBlockLocations();
             :       } else {
             :         locations = fs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
             :       }
             : 
             :       return HdfsPartition.FileDescriptor.create(fileStatus, relPath, locations,
             :           hostIndex, HdfsShim.isErasureCoded(fileStatus), numUnknownDiskIds);
             :     }
> Becomes unused? Or maybe we can keep it as a fallback? E.g. generate a warn
These are used by IcebergScanNode.getFileDescriptorByIcebergPredicates() as well. Maybe I need to increase the scope.


http://gerrit.cloudera.org:8080/#/c/18160/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@516
PS1, Line 516: fileDescMap
> We should only put entries to the fileDescMap if they are actually in the I
Thanks for the pointers, this part has been updated.



-- 
To view, visit http://gerrit.cloudera.org:8080/18160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Gerrit-Change-Number: 18160
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Jan 2022 16:31:49 +0000
Gerrit-HasComments: Yes