You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Joe McDonnell (Code Review)" <ge...@cloudera.org> on 2020/12/10 23:29:34 UTC

[Impala-ASF-CR] IMPALA-10375: Lock down which filesystems use the file handle cache

Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16856


Change subject: IMPALA-10375: Lock down which filesystems use the file handle cache
......................................................................

IMPALA-10375: Lock down which filesystems use the file handle cache

The logic for determining whether to use the file handle cache
currently use the file handle cache for anything that is
expected to be local.

This adds defensive code to limit the file handle cache
to filesystems that are known to support it (currently
HDFS, S3A, ABFS). This prevents any weird behavior for
untested configurations (Alluxio, Ozone) that might
be considered local.

Testing:
 - Ran core job

Change-Id: I136c3da9d19590cdbe8623d22480b8dd07192ce3
---
M be/src/runtime/io/disk-io-mgr-internal.h
M be/src/runtime/io/scan-range.cc
2 files changed, 15 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/16856/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I136c3da9d19590cdbe8623d22480b8dd07192ce3
Gerrit-Change-Number: 16856
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10375: Lock down which filesystems use the file handle cache

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has abandoned this change. ( http://gerrit.cloudera.org:8080/16856 )

Change subject: IMPALA-10375: Lock down which filesystems use the file handle cache
......................................................................


Abandoned
-- 
To view, visit http://gerrit.cloudera.org:8080/16856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I136c3da9d19590cdbe8623d22480b8dd07192ce3
Gerrit-Change-Number: 16856
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-10375: Lock down which filesystems use the file handle cache

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16856 )

Change subject: IMPALA-10375: Lock down which filesystems use the file handle cache
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16856/1/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/16856/1/be/src/runtime/io/scan-range.cc@200
PS1, Line 200:   if (is_file_handle_caching_enabled() && filesystem_supports_handle_caching(file()) &&
> Yeah, it seems like the bare minimum would be to detect the filesystem type
We might need to be a bit careful about passing from frontend when it comes to HDFS-15289 since it's *possible* there that the hdfs mount configurations could be different between coordinator and executor and therefore the fs implementation could be different.



-- 
To view, visit http://gerrit.cloudera.org:8080/16856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136c3da9d19590cdbe8623d22480b8dd07192ce3
Gerrit-Change-Number: 16856
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Dec 2020 19:04:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10375: Lock down which filesystems use the file handle cache

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16856 )

Change subject: IMPALA-10375: Lock down which filesystems use the file handle cache
......................................................................


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6762/


-- 
To view, visit http://gerrit.cloudera.org:8080/16856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136c3da9d19590cdbe8623d22480b8dd07192ce3
Gerrit-Change-Number: 16856
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Dec 2020 02:38:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10375: Lock down which filesystems use the file handle cache

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16856 )

Change subject: IMPALA-10375: Lock down which filesystems use the file handle cache
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7836/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136c3da9d19590cdbe8623d22480b8dd07192ce3
Gerrit-Change-Number: 16856
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Dec 2020 23:51:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10375: Lock down which filesystems use the file handle cache

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16856 )

Change subject: IMPALA-10375: Lock down which filesystems use the file handle cache
......................................................................


Patch Set 1:

(1 comment)

I suggested a refactoring, it would make the patch a bit bigger but might pay off. LMK what you think.

http://gerrit.cloudera.org:8080/#/c/16856/1/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/16856/1/be/src/runtime/io/scan-range.cc@200
PS1, Line 200:   if (is_file_handle_caching_enabled() && filesystem_supports_handle_caching(file()) &&
I was originally thinking about this from the point of view of performance overhead and mostly convinced myself that the extra string parsing wouldn't add much.

But it occurs to me that the code would probably be simplified overall if we figured out the filesystem implementation once per scan range, i.e. at the same time as AssignDiskQueue() is called. We could convert it to an enum like FsType in the frontend these disk_id/filename checks in ScanRange would be a bit less ad-hoc.


I think that would make handling https://issues.apache.org/jira/browse/HDFS-15289 simpler too once we get to that.



-- 
To view, visit http://gerrit.cloudera.org:8080/16856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136c3da9d19590cdbe8623d22480b8dd07192ce3
Gerrit-Change-Number: 16856
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Dec 2020 05:16:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10375: Lock down which filesystems use the file handle cache

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16856 )

Change subject: IMPALA-10375: Lock down which filesystems use the file handle cache
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16856/1/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/16856/1/be/src/runtime/io/scan-range.cc@200
PS1, Line 200:   if (is_file_handle_caching_enabled() && filesystem_supports_handle_caching(file()) &&
> I was originally thinking about this from the point of view of performance 
Yeah, it seems like the bare minimum would be to detect the filesystem type prior to AssignQueue() and pass it into that as an enum. Then AssignQueue() wouldn't need the filename, and the same enum can be passed into ScanRange::Reset() which can be used for this.

I will look into passing it from the Frontend, and at the least do the above.



-- 
To view, visit http://gerrit.cloudera.org:8080/16856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136c3da9d19590cdbe8623d22480b8dd07192ce3
Gerrit-Change-Number: 16856
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Dec 2020 18:32:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10375: Lock down which filesystems use the file handle cache

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16856 )

Change subject: IMPALA-10375: Lock down which filesystems use the file handle cache
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16856/1/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/16856/1/be/src/runtime/io/scan-range.cc@200
PS1, Line 200:   if (is_file_handle_caching_enabled() && filesystem_supports_handle_caching(file()) &&
> We might need to be a bit careful about passing from frontend when it comes
Now more places (e.g. spilling to remote fs like s3) depend on the path prefix matching. So after we get HDFS-15289, we should change these as well:
https://github.com/apache/impala/blob/1dcd59638c340e2e1d167f7e8b21019e897893a9/be/src/util/hdfs-util.cc#L95-L118


http://gerrit.cloudera.org:8080/#/c/16856/1/be/src/runtime/io/scan-range.cc@201
PS1, Line 201: expected_local_
Can we simply change this to this?

 (expected_local_ && disk_id_ < io_mgr_->num_local_disks())



-- 
To view, visit http://gerrit.cloudera.org:8080/16856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136c3da9d19590cdbe8623d22480b8dd07192ce3
Gerrit-Change-Number: 16856
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Apr 2021 01:43:54 +0000
Gerrit-HasComments: Yes