Posted to reviews@impala.apache.org by "Zihao Ye (Code Review)" <ge...@cloudera.org> on 2023/05/06 08:28:07 UTC

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Hello David Rorke, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#9).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

The data cache consists mainly of cache metadata and cache files. The
cache files are located on disk and store the cached data content, while
the cache metadata is located in memory and indexes into the cache files
by cache key. Before this patch, if the impalad process exits, the cache
metadata is lost. After the impalad process restarts, the cache files
cannot be reused even though they are still on disk, because there is no
corresponding cache metadata to index them.

This patch implements dump and load functions for the data cache. When
dumping is enabled by setting 'data_cache_enable_dumping' and the impalad
process is stopped by graceful shutdown (kill -SIGRTMIN $pid), the data
cache collects the cache metadata and dumps it to the location of the
cache directory. When loading is enabled by setting
'data_cache_enable_loading', the impalad process tries on startup to load
the dumped files from disk and restore the original cache metadata, so
that the existing cache files can be reused without refilling the cache.
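
As a rough illustration of the dump/load idea only, and not the code in
this patch, the sketch below writes an in-memory index to a text file and
reads it back. The CacheEntry layout, the one-line-per-entry text format,
and the names DumpIndex and LoadIndex are all assumptions made for the
example; the real dump format may differ entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical metadata entry: where a cached block lives in a cache file.
struct CacheEntry {
  int64_t offset;
  int64_t length;
};

// Hypothetical in-memory index from cache key to entry.
using CacheIndex = std::unordered_map<std::string, CacheEntry>;

// Writes one "key offset length" line per entry.
// Assumption: keys contain no whitespace.
bool DumpIndex(const CacheIndex& index, const std::string& path) {
  std::ofstream out(path, std::ios::trunc);
  if (!out) return false;
  for (const auto& kv : index) {
    out << kv.first << ' ' << kv.second.offset << ' ' << kv.second.length
        << '\n';
  }
  out.flush();
  return out.good();
}

// Rebuilds the index from a dump file. On any missing file or malformed
// line it returns false and leaves *index untouched.
bool LoadIndex(const std::string& path, CacheIndex* index) {
  std::ifstream in(path);
  if (!in) return false;
  CacheIndex loaded;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string key;
    CacheEntry entry;
    if (!(fields >> key >> entry.offset >> entry.length)) return false;
    loaded[key] = entry;
  }
  index->swap(loaded);
  return true;
}
```

In this sketch a failed load simply leaves the index empty, which
corresponds to starting with a cold cache.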

The cache can be safely dumped during query execution: before the dump
starts, the data cache is set to read-only, which prevents inconsistency
between the metadata dump and the cache files. Note that the dump files
also use disk space. In testing, the size of the dump files was generally
no more than 0.5% of the total size of the cache files.
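
The read-only step can be pictured with the minimal sketch below. It is
an assumption-laden illustration, not the DataCache implementation: the
class ReadOnlyGuardedIndex and its methods are hypothetical, and a single
mutex stands in for whatever synchronization the patch actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical index that rejects writes while it is read-only.
class ReadOnlyGuardedIndex {
 public:
  // Returns false and changes nothing when the index is read-only.
  bool Store(const std::string& key, int64_t offset) {
    std::lock_guard<std::mutex> l(lock_);
    if (read_only_) return false;
    index_[key] = offset;
    return true;
  }

  // Reads are still served while the index is read-only.
  bool Lookup(const std::string& key, int64_t* offset) const {
    std::lock_guard<std::mutex> l(lock_);
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    *offset = it->second;
    return true;
  }

  // Taking the lock means any Store() that was already running has
  // finished before the flag changes.
  void SetReadOnly() {
    std::lock_guard<std::mutex> l(lock_);
    read_only_ = true;
  }

  void RevokeReadOnly() {
    std::lock_guard<std::mutex> l(lock_);
    read_only_ = false;
  }

  // Copies the metadata. The caller is expected to call SetReadOnly()
  // first so the copy stays consistent with the cache files.
  std::map<std::string, int64_t> Snapshot() const {
    std::lock_guard<std::mutex> l(lock_);
    return index_;
  }

 private:
  mutable std::mutex lock_;
  bool read_only_ = false;
  std::map<std::string, int64_t> index_;
};
```

A dump in this sketch would call SetReadOnly(), write out the result of
Snapshot(), and call RevokeReadOnly() only if the process keeps running.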

Testing:
- Add DataCacheTest,#SetReadOnly
  Tests whether setting and revoking read-only takes effect, even when
  there are writes in progress.
- Add DataCacheTest,#DumpAndLoad
  Tests whether the original cache contents can be read after a data
  cache dump and reload.
- Add DataCacheTest,#ChangeConfBeforeLoad
  Tests whether the original cache contents can be read when the data
  cache is dumped, the configuration is changed, and the cache is then
  reloaded.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
12 files changed, 703 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/9
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 9
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>