Posted to reviews@impala.apache.org by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org> on 2022/03/01 10:32:55 UTC

[Impala-ASF-CR] IMPALA-9433: Improved caching of HdfsFileHandles

Gergely Fürnstáhl has uploaded a new patch set (#27). ( http://gerrit.cloudera.org:8080/18191 )

Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
......................................................................

IMPALA-9433: Improved caching of HdfsFileHandles

Separated the LRU caching functionality into a templated LruMultiCache class.

Replaced std::multimap with a std::unordered_map of std::list values
for O(1) lookups and lower memory overhead, since each key is stored
only once. Added boost::intrusive::list to track LRU order with less
overhead. Added an O(1) release method, replacing the previous O(n)
one, at minimal memory cost. Implemented an RAII Accessor so that
users no longer have to release objects explicitly.
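A minimal sketch of the structure described above, assuming nothing about Impala's actual code beyond this commit message: one std::unordered_map entry per key holding a std::list of values, a global LRU list of idle values, and a move-only RAII Accessor that hands its value back when it goes out of scope. All names (LruMultiCacheSketch, Get, Emplace, Release, NumIdle) are hypothetical. To stay dependency-free it uses a std::list<Entry*> with stored iterators in place of boost::intrusive::list, and owns each entry through a std::unique_ptr so that an Entry can record its own list positions; both lists can therefore be unlinked in O(1).

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical sketch, NOT Impala's LruMultiCache; std::list<Entry*> with
// stored iterators stands in for boost::intrusive::list.
template <typename Key, typename Value>
class LruMultiCacheSketch {
 private:
  struct Entry;
  using EntryList = std::list<std::unique_ptr<Entry>>;  // one list per key
  using LruList = std::list<Entry*>;                     // idle entries only
  struct Entry {
    Key key;
    Value value;
    bool in_use = false;
    typename EntryList::iterator self;   // position in its key's list
    typename LruList::iterator lru_pos;  // position in lru_ while idle
  };

 public:
  // Returns its entry to the cache on destruction; must not outlive the cache.
  class Accessor {
   public:
    Accessor() = default;
    Accessor(LruMultiCacheSketch* cache, Entry* entry) : cache_(cache), entry_(entry) {}
    Accessor(Accessor&& other) noexcept
        : cache_(std::exchange(other.cache_, nullptr)),
          entry_(std::exchange(other.entry_, nullptr)) {}
    Accessor(const Accessor&) = delete;
    Accessor& operator=(const Accessor&) = delete;
    ~Accessor() { Release(); }
    Value* value() const { return entry_ != nullptr ? &entry_->value : nullptr; }
    void Release() {  // optional early release; a no-op once released
      if (entry_ != nullptr) cache_->ReleaseEntry(entry_);
      entry_ = nullptr;
    }

   private:
    LruMultiCacheSketch* cache_ = nullptr;
    Entry* entry_ = nullptr;
  };

  explicit LruMultiCacheSketch(std::size_t capacity) : capacity_(capacity) {}
  // Hands out an idle value stored under `key`, or an empty Accessor.
  Accessor Get(const Key& key) {
    auto it = map_.find(key);
    if (it == map_.end()) return Accessor();
    EntryList& list = it->second;
    Entry* e = list.front().get();     // idle entries are kept at the front,
    if (e->in_use) return Accessor();  // so every entry for `key` is busy
    e->in_use = true;
    lru_.erase(e->lru_pos);            // O(1): no longer evictable
    list.splice(list.end(), list, e->self);
    return Accessor(this, e);
  }
  // Stores a new value under `key` and hands it out immediately.
  Accessor Emplace(const Key& key, Value value) {
    EntryList& list = map_[key];
    auto pos = list.insert(list.end(),
                           std::unique_ptr<Entry>(new Entry{key, std::move(value)}));
    Entry* e = pos->get();
    e->in_use = true;
    e->self = pos;
    ++size_;
    EvictIdleOverCapacity();
    return Accessor(this, e);
  }
  std::size_t Size() const { return size_; }
  std::size_t NumIdle() const { return lru_.size(); }

 private:
  void ReleaseEntry(Entry* e) {  // O(1) on average: one hash lookup
    assert(e->in_use);
    e->in_use = false;
    EntryList& list = map_.find(e->key)->second;
    list.splice(list.begin(), list, e->self);  // move to the idle front
    e->lru_pos = lru_.insert(lru_.end(), e);   // most recently used at the back
    EvictIdleOverCapacity();
  }
  void EvictIdleOverCapacity() {  // in-use entries are never evicted
    while (size_ > capacity_ && !lru_.empty()) {
      Entry* victim = lru_.front();
      lru_.pop_front();
      auto it = map_.find(victim->key);
      it->second.erase(victim->self);  // O(1); destroys the victim
      if (it->second.empty()) map_.erase(it);
      --size_;
    }
  }
  std::size_t capacity_;
  std::size_t size_ = 0;
  std::unordered_map<Key, EntryList> map_;
  LruList lru_;
};
```

Keeping idle entries at the front of each key's list lets both Get and release run in O(1) time plus one average-case hash lookup, and because a checked-out value is absent from the LRU list it can never be evicted while it is held. The trade-off is that the cache can temporarily hold more than `capacity` values while many of them are checked out.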

Wrapped the cache accessor and the related DiskIOManager metrics into
a FileHandleCache::Accessor. Removed the Release*() call trees from
FileHandleCache and DiskIOManager, and the scoped exit from
HdfsFileReader, as releasing is now handled automatically.
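A hypothetical sketch of the idea behind bundling the cache accessor with the metrics: the counters are updated when a handle is handed out and again when it is given back, and both steps live in one RAII object, so there is no Release*() call tree to follow and no scoped-exit cleanup to write. HandleMetrics, MeteredHandleAccessor and the release_to_cache callback are illustrative stand-ins, not Impala's FileHandleCache::Accessor or its DiskIOManager metric names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical counters standing in for the DiskIOManager file handle metrics.
struct HandleMetrics {
  std::atomic<std::int64_t> hits{0};
  std::atomic<std::int64_t> misses{0};
  std::atomic<std::int64_t> outstanding{0};  // handles currently checked out
};

// Hypothetical wrapper: ties the hit/miss and outstanding-handle metrics to the
// lifetime of one checked-out handle, so no caller has to call Release*().
class MeteredHandleAccessor {
 public:
  MeteredHandleAccessor(bool cache_hit, HandleMetrics* metrics,
                        std::function<void()> release_to_cache)
      : metrics_(metrics), release_to_cache_(std::move(release_to_cache)) {
    assert(metrics_ != nullptr);
    if (cache_hit) {
      ++metrics_->hits;
    } else {
      ++metrics_->misses;
    }
    ++metrics_->outstanding;
  }
  MeteredHandleAccessor(MeteredHandleAccessor&& other)
      : metrics_(std::exchange(other.metrics_, nullptr)),
        release_to_cache_(std::move(other.release_to_cache_)) {}
  MeteredHandleAccessor(const MeteredHandleAccessor&) = delete;
  MeteredHandleAccessor& operator=(const MeteredHandleAccessor&) = delete;
  ~MeteredHandleAccessor() { Release(); }

  // Idempotent: an explicit early release, or the automatic one at scope exit.
  void Release() {
    if (metrics_ == nullptr) return;  // already released or moved from
    --metrics_->outstanding;
    if (release_to_cache_) release_to_cache_();  // e.g. put the handle back
    metrics_ = nullptr;
  }

 private:
  HandleMetrics* metrics_;
  std::function<void()> release_to_cache_;
};
```

Because the decrement and the hand-back run in the destructor, an early return or an error path in a reader releases the handle exactly as the normal path does, which is what makes a separate scoped-exit guard unnecessary.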

Testing:

Added extensive unit tests for the class, covering forced rehashes,
collisions, capacity overshoot, explicit/automatic release and
destroy.
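The forced collisions and rehashes mentioned above can be exercised on a plain std::unordered_map; one hypothetical way is a degenerate hash functor that sends every key to the same bucket, followed by an explicit rehash() call. The check also confirms a standard-library guarantee that a map-of-lists layout can lean on: rehashing invalidates iterators, but not pointers or references to the mapped values. CollidingHash, CollidingMap and SurvivesForcedRehash are names made up for this sketch, not taken from lru-multi-cache-test.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical degenerate hash: every key lands in the same bucket, so every
// lookup has to walk a collision chain and compare keys.
struct CollidingHash {
  std::size_t operator()(const std::string&) const { return 42; }
};

using CollidingMap = std::unordered_map<std::string, std::list<int>, CollidingHash>;

// Forces a rehash, then checks that every key is still found and that each
// mapped list kept its address (rehashing invalidates iterators, but not
// pointers or references to elements).
bool SurvivesForcedRehash(CollidingMap& m) {
  std::unordered_map<std::string, const std::list<int>*> before;
  for (const auto& kv : m) before[kv.first] = &kv.second;
  const std::size_t old_count = m.bucket_count();
  m.rehash(old_count * 8 + 1);  // ask for more buckets than the map has
  assert(m.bucket_count() > old_count);
  for (const auto& kv : before) {
    auto it = m.find(kv.first);
    if (it == m.end() || &it->second != kv.second) return false;
  }
  return true;
}
```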

Ran tests/custom_cluster/test_hdfs_fd_caching.py to verify
FileHandleCache::Accessor behaviour through metrics.

Ran bin/single_node_perf_run.py with TPCH and TPC-DS on parquet
tables; no visible change in performance:
TPCH   scale=10 iterations=100: Delta(Avg)=-0.67% Delta(GeoMean)=-0.49%
TPC-DS scale=10 iterations= 50: Delta(Avg)=-0.02% Delta(GeoMean)= 0.00%

Manually ran some queries on functional_parquet.widetable_1000_cols
with 64 threads, but did not notice significant changes in scan times.

Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/handle-cache.h
M be/src/runtime/io/handle-cache.inline.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/CMakeLists.txt
A be/src/util/lru-multi-cache-test.cc
A be/src/util/lru-multi-cache.h
A be/src/util/lru-multi-cache.inline.h
9 files changed, 1,175 insertions(+), 274 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/18191/27
-- 
To view, visit http://gerrit.cloudera.org:8080/18191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 27
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>