You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Anonymous Coward (Code Review)" <ge...@cloudera.org> on 2023/02/24 08:08:35 UTC

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

18770832838@163.com has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19532


Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump function with setting
'data_cache_enable_dumping', when the Impalad process is closed by
graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the
cache metadata and dump them to the location where the cache directory
is located. After enabling the load function with setting
'data_cache_enable_loading', when the Impalad process starts, it will
try to load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
12 files changed, 651 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <18...@163.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12431/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Feb 2023 08:53:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 6:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12741/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Apr 2023 09:53:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
18770832838@163.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 4:

(4 comments)

Thanks for your comment! Actually I have considered and tested what if the parameter changes before loading, and here are some explanations:

1. If data_cache changes, that may include add/remove partitions and increase/reduce  partition capacity, since dump & load are granular to partitions level and each partition has an independent dump file located in the partition directory, so add/remove partitions has little effect, just adds new empty partitions or remove some partition with data. Increase partition capacity is no problem, that allows fully load all the dumped data and cache more later. If reduce partition capacity, then it will trigger cache entries evictions during loaded entries insertion process so that after loading the cache size has be reduced to less than the new capacity,and extra cache entries evicted. 

2. If data_cache_checksum changes, there will be no effect from enabling to disabling, but from disabling to enabling will gradually cause all the data that has been loaded from dump to become invalid and evicted, because they cannot pass the VerifyChecksum.

3. If data_cache_file_max_size_bytes changes, it does not affect the existing cache files, but it will affect the files created later (and may also affect the newest existing cache file).

4. Other parameter changes, no matter.

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/runtime/io/data-cache.cc@993
PS4, Line 993:   std::ofstream file;
             :   file.open(JoinPathSegments(path_, DUMP_FILE_NAME));
> What happens if the file already exists? For example, if we dump to
 > the file, then start back up, load it, and then we are shutting
 > down again. Should we be truncating any existing file? (e.g.
 > std::ofstream::trunc)
 > 
 > Also, do we need to open the file in binary mode? (std::ofstream::binary)

It doesn't matter if the file exists.  The default open mode for ofstream is std::ostream::out which also implicitly includes std::ofstream::trunc, so if the file already exists it will be opened and truncated as if it were a new empty file.

I have not found any difference yet between adding std::ofstream::binary or not, they both work, but I will add it later anyway just to be on the safe side.


http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/runtime/io/data-cache.cc@1010
PS4, Line 1010:   std::ifstream file;
              :   file.open(JoinPathSegments(path_, DUMP_FILE_NAME));
> Do we need to open in binary mode? (std::ofstream::binary)

Be consistent with above, open in binary mode too.


http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/cache-internal.h
File be/src/util/cache/cache-internal.h:

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/cache-internal.h@316
PS4, Line 316:         handles.emplace_back(
             :             reinterpret_cast<Cache::Handle*>(handle), Cache::HandleDeleter(this));
> I haven't tried this, but I think we should be able to move the
 > UniqueHandle into the vector. i.e.
 > 
 > handles.push_back(std::move(handle));
 > 
 > This should also bring along the custom deleter.

I tried but couldn't compile. Am I missing something?


http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/lirs-cache.cc
File be/src/util/cache/lirs-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/lirs-cache.cc@1088
PS4, Line 1088:  The UNPROTECTED
              :   // entry can be in a recency list and a unprotected list at the same time, so a
              :   // temporary list is required to remove duplicating unprotected entries when collecting.
              :   std::list<LIRSHandle*> unprotecteds;
> There are two ways to avoid constructing this list:
 > 
 > 1) When iterating the recency_list_, don't do anything for
 > UNPROTECTED entries. Then, all the UNPROTECTED will be handled by
 > walking the unprotected list.
 > 
 > 2) When iterating the recency_list_, put both PROTECTED and
 > UNPROTECTED into the handles vector. Then, when walking the
 > unprotected list, check whether it is linked on the recency_list_
 > (i.e. e->recency_list_hook_.is_linked()) and only add it to the
 > handles vector if it is not.

Thanks for the suggestion, I will fix it later.



-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 Mar 2023 10:37:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello David Rorke, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#8).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump function with setting
'data_cache_enable_dumping', when the Impalad process is closed by
graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the
cache metadata and dump them to the location where the cache directory
is located. After enabling the load function with setting
'data_cache_enable_loading', when the Impalad process starts, it will
try to load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.
- Add DataCacheTest,#ChangeConfBeforeLoad
Used to test whether the original cache contents can be read after the
data cache is dumped and the configuration is changed and then reloaded.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
12 files changed, 703 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/8
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 11:

> (4 comments)
 > 
 > Hi Joe, thanks for the code review. I've fixed some issues, and I
 > still work about the custom cluster test.
 > In the meantime, it would be better if we could solve IMPALA-10971
 > first, as this issue affects the correctness of some metrics (cache
 > size and number of entries) and could cause tests to fail. If you
 > don't have time to deal with it, you could assign it to me, and
 > I'll fix it as soon as possible, thanks.

I put together a change to fix IMPALA-10971 here: https://gerrit.cloudera.org/#/c/19930/
I'm in the process of merging it.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 11
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Tue, 30 May 2023 17:36:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9373/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 14
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Mon, 05 Jun 2023 18:51:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello David Rorke, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#6).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump function with setting
'data_cache_enable_dumping', when the Impalad process is closed by
graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the
cache metadata and dump them to the location where the cache directory
is located. After enabling the load function with setting
'data_cache_enable_loading', when the Impalad process starts, it will
try to load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
12 files changed, 641 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/6
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12870/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Apr 2023 04:09:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#3).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump function with setting
'data_cache_enable_dumping', when the Impalad process is closed by
graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the
cache metadata and dump them to the location where the cache directory
is located. After enabling the load function with setting
'data_cache_enable_loading', when the Impalad process starts, it will
try to load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
12 files changed, 651 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/3
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Zihao Ye (Code Review)" <ge...@cloudera.org>.
Zihao Ye has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19532/11/tests/custom_cluster/test_data_cache.py
File tests/custom_cluster/test_data_cache.py:

http://gerrit.cloudera.org:8080/#/c/19532/11/tests/custom_cluster/test_data_cache.py@59
PS11, Line 59: mertic
> Nit: Spelling 'metric'
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 11
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Wed, 31 May 2023 03:06:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Zihao Ye (Code Review)" <ge...@cloudera.org>.
Hello David Rorke, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#9).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump function with setting
'data_cache_enable_dumping', when the Impalad process is closed by
graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the
cache metadata and dump them to the location where the cache directory
is located. After enabling the load function with setting
'data_cache_enable_loading', when the Impalad process starts, it will
try to load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.
- Add DataCacheTest,#ChangeConfBeforeLoad
Used to test whether the original cache contents can be read after the
data cache is dumped and the configuration is changed and then reloaded.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
12 files changed, 703 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/9
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 9
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has uploaded a new patch set (#14) to the change originally created by Zihao Ye. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump&load function with setting
'data_cache_keep_across_restarts=true', when the Impalad process is
closed by graceful shutdown (kill -SIGRTMIN $pid), the data cache will
collect the cache metadata and dump them to the location where the cache
directory is located. When the Impalad process restarts, it will try to
load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.
- Add DataCacheTest,#ChangeConfBeforeLoad
Used to test whether the original cache contents can be read after the
data cache is dumped and the configuration is changed and then reloaded.
- Add end-to-end test in test_data_cache.py
Perform end-to-end testing in a custom cluster, including executing
queries, gracefully restarting, verifying metrics, re-executing the same
query and verifying hits/misses. This also includes testing the
modification of cache capacity and restart, as well as testing restarts
while querie is in progress.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
M tests/common/impala_cluster.py
M tests/custom_cluster/test_data_cache.py
14 files changed, 883 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/14
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 14
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 1:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.h@496
PS1, Line 496:     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc@204
PS1, Line 204:         kudu::Env::Default()->NewRWFile(opts, path, &cache_file->file_), 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc@484
PS1, Line 484:     string key; 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc@487
PS1, Line 487:     /// cache file. When we dump the cache metadata, the ‘file_’ is meaningless because
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc@683
PS1, Line 683:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc@1037
PS1, Line 1037:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc@1284
PS1, Line 1284:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/disk-io-mgr.h
File be/src/runtime/io/disk-io-mgr.h:

http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/disk-io-mgr.h@417
PS1, Line 417:   /// Try to dump the data of remote data cache to disk, so it could be loaded when 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/disk-io-mgr.cc@78
PS1, Line 78: DEFINE_bool(data_cache_enable_dumping, false, 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/util/cache/lirs-cache.cc
File be/src/util/cache/lirs-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/util/cache/lirs-cache.cc@1086
PS1, Line 1086:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/util/cache/lirs-cache.cc@1105
PS1, Line 1105:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/util/cache/lirs-cache.cc@1125
PS1, Line 1125:   
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Feb 2023 08:09:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 2:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12429/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Feb 2023 08:37:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12430/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Feb 2023 08:41:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13147/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 12
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Wed, 31 May 2023 03:26:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 14: Code-Review+2

The test runs passed, so this looks good to me. Thanks for the improvement!


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 14
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Tue, 06 Jun 2023 19:40:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12985/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 10
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Wed, 10 May 2023 08:15:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12530/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Mar 2023 10:34:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
18770832838@163.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19532/5/be/src/scheduling/executor-group.cc
File be/src/scheduling/executor-group.cc:

http://gerrit.cloudera.org:8080/#/c/19532/5/be/src/scheduling/executor-group.cc@108
PS5, Line 108: auto cmp = [](const BackendDescriptorPB& a, const BackendDescriptorPB& b) {
             :     return a.address().port() < b.address().port();
             :   };
             :   std::sort(be_descs.begin(), be_descs.end(), cmp);
> This seems fine for providing consistent scheduling across multiple backend
Thank you for the reminder!

- Regarding the changes in the number of executors in the group, based on the characteristics of consistent hashing, the addition or removal of nodes will only affect neighboring nodes, meaning that the addition or removal of individual nodes will not have a significant impact on the scheduling results. I have already conducted relevant tests before, and the test results have also confirmed this. Shutting down a node will only cause the scan range that should have been assigned to it to be assigned to another node. After restarting the node, we can obtain the same scheduling results as before shutdown. Therefore, this does not require additional processing, and normal dumping and reloading is ok.

- If the IP of the executor changes, it will indeed have a relatively large impact, as consistent hashing is based on node IP addresses. However, if only a small number of node IPs change, the situation is similar to changes in the number of executors, with limited impact. But if the IPs of a large number of nodes change, it will lead to completely different scheduling results, and the cache hit rate in the period after restart will indeed be greatly reduced. If IMPALA-11979 is resolved, it will be very helpful in such cases.



-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Mar 2023 03:27:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13015/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 11
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Fri, 12 May 2023 11:04:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Zihao Ye (Code Review)" <ge...@cloudera.org>.
Hello David Rorke, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#10).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump function with setting
'data_cache_enable_dumping', when the Impalad process is closed by
graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the
cache metadata and dump them to the location where the cache directory
is located. After enabling the load function with setting
'data_cache_enable_loading', when the Impalad process starts, it will
try to load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.
- Add DataCacheTest,#ChangeConfBeforeLoad
Used to test whether the original cache contents can be read after the
data cache is dumped and the configuration is changed and then reloaded.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
12 files changed, 712 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/10
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 10
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Zihao Ye (Code Review)" <ge...@cloudera.org>.
Hello David Rorke, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#12).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump function with setting
'data_cache_enable_dumping', when the Impalad process is closed by
graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the
cache metadata and dump them to the location where the cache directory
is located. After enabling the load function with setting
'data_cache_enable_loading', when the Impalad process starts, it will
try to load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.
- Add DataCacheTest,#ChangeConfBeforeLoad
Used to test whether the original cache contents can be read after the
data cache is dumped and the configuration is changed and then reloaded.
- Add end-to-end test in test_data_cache.py
Perform end-to-end testing in a custom cluster, including executing
queries, gracefully restarting, verifying metrics, re-executing the same
query and verifying hits/misses. This also includes testing the
modification of cache capacity and restart, as well as testing restarts
while querie is in progress.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
M tests/common/impala_cluster.py
M tests/custom_cluster/test_data_cache.py
14 files changed, 872 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/12
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 12
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12742/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Apr 2023 10:56:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 13:

Thank you for the changes, I will be rebasing this change and starting some test jobs (ASAN, etc). Once those succeed, this will be good to go.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 13
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Mon, 05 Jun 2023 17:00:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Zihao Ye (Code Review)" <ge...@cloudera.org>.
Hello David Rorke, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#13).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump&load function with setting
'data_cache_keep_across_restarts=true', when the Impalad process is
closed by graceful shutdown (kill -SIGRTMIN $pid), the data cache will
collect the cache metadata and dump them to the location where the cache
directory is located. When the Impalad process restarts, it will try to
load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.
- Add DataCacheTest,#ChangeConfBeforeLoad
Used to test whether the original cache contents can be read after the
data cache is dumped and the configuration is changed and then reloaded.
- Add end-to-end test in test_data_cache.py
Perform end-to-end testing in a custom cluster, including executing
queries, gracefully restarting, verifying metrics, re-executing the same
query and verifying hits/misses. This also includes testing the
modification of cache capacity and restart, as well as testing restarts
while querie is in progress.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
M tests/common/impala_cluster.py
M tests/custom_cluster/test_data_cache.py
14 files changed, 872 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/13
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 13
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#2).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump function with setting
'data_cache_enable_dumping', when the Impalad process is closed by
graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the
cache metadata and dump them to the location where the cache directory
is located. After enabling the load function with setting
'data_cache_enable_loading', when the Impalad process starts, it will
try to load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
12 files changed, 651 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/2
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#4).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump function with setting
'data_cache_enable_dumping', when the Impalad process is closed by
graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the
cache metadata and dump them to the location where the cache directory
is located. After enabling the load function with setting
'data_cache_enable_loading', when the Impalad process starts, it will
try to load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
12 files changed, 653 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/4
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 14: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 14
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Tue, 06 Jun 2023 00:16:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13200/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 14
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Mon, 05 Jun 2023 19:10:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#5).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump function with setting
'data_cache_enable_dumping', when the Impalad process is closed by
graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the
cache metadata and dump them to the location where the cache directory
is located. After enabling the load function with setting
'data_cache_enable_loading', when the Impalad process starts, it will
try to load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
12 files changed, 640 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/5
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12428/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Feb 2023 08:27:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Zihao Ye (Code Review)" <ge...@cloudera.org>.
Zihao Ye has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 9:

(4 comments)

Hi Joe, thanks for the code review. I've fixed some issues, and I still work about the custom cluster test.
In the meantime, it would be better if we could solve IMPALA-10971 first, as this issue affects the correctness of some metrics (cache size and number of entries) and could cause tests to fail. If you don't have time to deal with it, you could assign it to me, and I'll fix it as soon as possible, thanks.

http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/runtime/io/data-cache.cc@131
PS9, Line 131: DEFINE_bool(data_cache_enable_loading, false,
             :     "(Experimental) With loading enabled, data cache will try to load the previously dump"
             :     " file (created by the dumping feature enabled by the 'data_cache_enable_dumping') "
             :     "before create a empty cache. if loading fails, it will proceed with regular "
             :     "initialization.");
> I'm thinking that we can combine data_cache_enable_dumping and data_cache_e
Done


http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/runtime/io/data-cache.cc@684
PS9, Line 684:   // Check if there is enough space available at this point in time.
             :   uint64_t available_bytes;
             :   RETURN_IF_ERROR(FileSystemUtil::GetSpaceAvailable(path_, &available_bytes));
             :   if (available_bytes < capacity_) {
             :     const string& err = Substitute("Insufficient space for $0. Required $1. Only $2 is "
             :         "available", path_, PrettyPrinter::PrintBytes(capacity_),
             :         PrettyPrinter::PrintBytes(available_bytes));
             :     LOG(ERROR) << err;
             :     return Status(err);
             :   }
> I think this check can fire if we have the cache on its own device with the
Done. The current check now takes into account the size of existing cached data, but it relies on TOTAL_BYTES, as mentioned earlier. It is better to first solve IMPALA-10971 to prevent any potential issues.


http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/runtime/io/data-cache.cc@1056
PS9, Line 1056: Status DataCache::Partition::Load() {
> I think we want to delete files that are not listed in the metadata. It sho
Done


http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/util/cache/cache.h
File be/src/util/cache/cache.h:

http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/util/cache/cache.h@288
PS9, Line 288:  Traverse
> Nit: Since we are exporting all entries as a single vector, let's call this
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 9
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Wed, 10 May 2023 07:59:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12959/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 9
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Sat, 06 May 2023 08:48:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Zihao Ye (Code Review)" <ge...@cloudera.org>.
Hello David Rorke, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#11).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump function with setting
'data_cache_enable_dumping', when the Impalad process is closed by
graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the
cache metadata and dump them to the location where the cache directory
is located. After enabling the load function with setting
'data_cache_enable_loading', when the Impalad process starts, it will
try to load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.
- Add DataCacheTest,#ChangeConfBeforeLoad
Used to test whether the original cache contents can be read after the
data cache is dumped and the configuration is changed and then reloaded.
- Add end-to-end test in test_data_cache.py
Perform end-to-end testing in a custom cluster, including executing
queries, gracefully restarting, verifying metrics, re-executing the same
query and verifying hits/misses. This also includes testing the
modification of cache capacity and restart, as well as testing restarts
while querie is in progress.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
M tests/common/impala_cluster.py
M tests/custom_cluster/test_data_cache.py
14 files changed, 872 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/11
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 11
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump&load function with setting
'data_cache_keep_across_restarts=true', when the Impalad process is
closed by graceful shutdown (kill -SIGRTMIN $pid), the data cache will
collect the cache metadata and dump them to the location where the cache
directory is located. When the Impalad process restarts, it will try to
load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.
- Add DataCacheTest,#ChangeConfBeforeLoad
Used to test whether the original cache contents can be read after the
data cache is dumped and the configuration is changed and then reloaded.
- Add end-to-end test in test_data_cache.py
Perform end-to-end testing in a custom cluster, including executing
queries, gracefully restarting, verifying metrics, re-executing the same
query and verifying hits/misses. This also includes testing the
modification of cache capacity and restart, as well as testing restarts
while querie is in progress.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Reviewed-on: http://gerrit.cloudera.org:8080/19532
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
M tests/common/impala_cluster.py
M tests/custom_cluster/test_data_cache.py
14 files changed, 883 insertions(+), 82 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Joe McDonnell: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 15
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 4:

(4 comments)

Thanks for putting this up, we have been thinking about implementing this for a while now!

I'm still reading through this, but here are some initial comments.

From a testing point of view, I'm interested about what happens if a user changes the data_cache parameter value. For example, they could change the size of the cache when they restart.

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/runtime/io/data-cache.cc@993
PS4, Line 993:   std::ofstream file;
             :   file.open(JoinPathSegments(path_, DUMP_FILE_NAME));
What happens if the file already exists? For example, if we dump to the file, then start back up, load it, and then we are shutting down again. Should we be truncating any existing file? (e.g. std::ofstream::trunc)

Also, do we need to open the file in binary mode? (std::ofstream::binary)


http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/runtime/io/data-cache.cc@1010
PS4, Line 1010:   std::ifstream file;
              :   file.open(JoinPathSegments(path_, DUMP_FILE_NAME));
Do we need to open in binary mode? (std::ofstream::binary)


http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/cache-internal.h
File be/src/util/cache/cache-internal.h:

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/cache-internal.h@316
PS4, Line 316:         handles.emplace_back(
             :             reinterpret_cast<Cache::Handle*>(handle), Cache::HandleDeleter(this));
I haven't tried this, but I think we should be able to move the UniqueHandle into the vector. i.e.

handles.push_back(std::move(handle));

This should also bring along the custom deleter.


http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/lirs-cache.cc
File be/src/util/cache/lirs-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/lirs-cache.cc@1088
PS4, Line 1088:  The UNPROTECTED
              :   // entry can be in a recency list and a unprotected list at the same time, so a
              :   // temporary list is required to remove duplicating unprotected entries when collecting.
              :   std::list<LIRSHandle*> unprotecteds;
There are two ways to avoid constructing this list:

1) When iterating the recency_list_, don't do anything for UNPROTECTED entries. Then, all the UNPROTECTED will be handled by walking the unprotected list.

2) When iterating the recency_list_, put both PROTECTED and UNPROTECTED into the handles vector. Then, when walking the unprotected list, check whether it is linked on the recency_list_ (i.e. e->recency_list_hook_.is_linked()) and only add it to the handles vector if it is not.



-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Feb 2023 03:22:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19532/2/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/19532/2/be/src/runtime/io/data-cache.h@496
PS2, Line 496:   
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Feb 2023 08:19:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19532/5/be/src/util/cache/lirs-cache.cc
File be/src/util/cache/lirs-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19532/5/be/src/util/cache/lirs-cache.cc@1088
PS5, Line 1088:   // and ignore entries that are not resident (i.e. TOMBSTONE entries). 
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Mar 2023 10:15:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "David Rorke (Code Review)" <ge...@cloudera.org>.
David Rorke has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19532/5/be/src/scheduling/executor-group.cc
File be/src/scheduling/executor-group.cc:

http://gerrit.cloudera.org:8080/#/c/19532/5/be/src/scheduling/executor-group.cc@108
PS5, Line 108: auto cmp = [](const BackendDescriptorPB& a, const BackendDescriptorPB& b) {
             :     return a.address().port() < b.address().port();
             :   };
             :   std::sort(be_descs.begin(), be_descs.end(), cmp);
This seems fine for providing consistent scheduling across multiple backends within a host, but I think there are a couple other cases that may need to be handled:

* Number of executors in the group changes after restart leading to different scan range scheduling (maybe we should just not support reload and clear any dumped cache state in this case?).

* IP addresses of executors in the group change after restart leading to different hashing and different scan range scheduling (IMPALA-11979). Note IP addresses will change on restart in a kubernetes environment.



-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Mar 2023 16:37:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello David Rorke, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#7).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

Data cache mainly includes cache metadata and cache files. The cache
files are located on the disk and is responsible for storing cached data
content, while the cache metadata is located in the memory and is
responsible for indexing to the cache file according to the cache key.
Before this patch, if the impalad process exits, the cache metadata will
be lost. After the Impalad process restarts, we cannot reuse the cache
file even though it is still on the disk, because there is no
corresponding cache metadata for index.

This patch implements the dump and load functions of the data cache.
After enabling the dump function with setting
'data_cache_enable_dumping', when the Impalad process is closed by
graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the
cache metadata and dump them to the location where the cache directory
is located. After enabling the load function with setting
'data_cache_enable_loading', when the Impalad process starts, it will
try to load the dumped files on the disk to restore the original cache
metadata, so that the existing cache files can be reused without
refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache will be set to read-only to prevent the
inconsistency between the metadata dump and the cache file. Note that
the dump files will also use disk space. After testing, the size of the
dump file is generally not more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
Used to test whether set/revoke read-only takes effect, even when there
are writes in progress.
- Add DataCacheTest,#DumpAndLoad
Used to test whether the original cache contents can be read after a
data cache dump and reload.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
12 files changed, 650 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/7
-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: Anonymous Coward <18...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 9:

(5 comments)

Sorry for the delay, I'm back to reviewing this.

From a testing point of view, it would be nice to have a custom cluster test that starts up Impala with data cache enabled (along with this functionality), does some queries, does a graceful shutdown, then starts back up, verifies metrics, and then does some queries and verifies hits/misses/etc.

It would basically do a mixture of tests/custom_cluster/test_data_cache.py and tests/custom_cluster/test_restart_services.py (which does some cases of shutting down gracefully and then starting back up).

Here are some things it would be useful to verify:
1. The metrics for the size of the data cache and number of entries are the same before graceful shutdown (for an idle system) and after startup.
2. If we know the data cache has entries for a particular query, then after startup we should be able to run the same query and see it get cache hits like usual.
3. It would be nice to do a sanity check on the read-only aspect. One way to do that would be to start a query that will try to write to the cache in one thread, have another thread immediately do shutdown, they race and we want to make sure nothing crashes and the cache is usable after startup. In some custom cluster tests, we process the impalad logs to look for a particular message, so it might be useful to look for messages and indicate errors in loading the cache files. We use that technique in custom_cluster/test_thrift_socket.py https://github.com/apache/impala/blob/master/tests/custom_cluster/test_thrift_socket.py#L93-L95
4. Maybe fill up a cache of one size and then restart with a smaller size and verify that things got evicted.

In my hand tests, things work as expected, and it is nice to see this functionality.

http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/runtime/io/data-cache.cc@131
PS9, Line 131: DEFINE_bool(data_cache_enable_loading, false,
             :     "(Experimental) With loading enabled, data cache will try to load the previously dump"
             :     " file (created by the dumping feature enabled by the 'data_cache_enable_dumping') "
             :     "before create a empty cache. if loading fails, it will proceed with regular "
             :     "initialization.");
I'm thinking that we can combine data_cache_enable_dumping and data_cache_enable_loading into a single parameter. I don't think you would set one without the other, but if there is a use case that I'm not thinking about, let me know.

Maybe something like "data_cache_keep_across_restarts"?


http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/runtime/io/data-cache.cc@684
PS9, Line 684:   // Check if there is enough space available at this point in time.
             :   uint64_t available_bytes;
             :   RETURN_IF_ERROR(FileSystemUtil::GetSpaceAvailable(path_, &available_bytes));
             :   if (available_bytes < capacity_) {
             :     const string& err = Substitute("Insufficient space for $0. Required $1. Only $2 is "
             :         "available", path_, PrettyPrinter::PrintBytes(capacity_),
             :         PrettyPrinter::PrintBytes(available_bytes));
             :     LOG(ERROR) << err;
             :     return Status(err);
             :   }
I think this check can fire if we have the cache on its own device with the cache capacity close to the size of the device. The cache files are still there consuming space, and this check doesn't understand that the files are part of the cache. i.e. If the capacity is 200GB and the device has 250GB but the existing files use 200GB, it will look at the remaining 50GB and say it isn't enough.


http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/runtime/io/data-cache.cc@1056
PS9, Line 1056: Status DataCache::Partition::Load() {
I think we want to delete files that are not listed in the metadata. It shouldn't happen, but if it does, we want to have the cache directory to only have files that we know about. Anything left over that we don't know about is taking up space on the device and could cause problems for our calculations of space usage.

Another question is whether we should delete the metadata file after we have loaded it. Once we start writing to the data cache, the metadata file can't be used any more, so it might be better to remove it to free up space.


http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/cache-internal.h
File be/src/util/cache/cache-internal.h:

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/cache-internal.h@316
PS4, Line 316:         handles.emplace_back(
             :             reinterpret_cast<Cache::Handle*>(handle), Cache::HandleDeleter(this));
> > I haven't tried this, but I think we should be able to move the
Nevermind, I thought handle was a unique_ptr, and it isn't. This is fine as-is.


http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/util/cache/cache.h
File be/src/util/cache/cache.h:

http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/util/cache/cache.h@288
PS9, Line 288:  Traverse
Nit: Since we are exporting all entries as a single vector, let's call this Dump rather than Traverse (and apply the name change to the other pieces of the cache implementation).



-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 9
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Sat, 06 May 2023 18:44:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 11:

(1 comment)

Thanks for adding the tests, this is looking close to ready.

http://gerrit.cloudera.org:8080/#/c/19532/11/tests/custom_cluster/test_data_cache.py
File tests/custom_cluster/test_data_cache.py:

http://gerrit.cloudera.org:8080/#/c/19532/11/tests/custom_cluster/test_data_cache.py@59
PS11, Line 59: mertic
Nit: Spelling 'metric'



-- 
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 11
Gerrit-Owner: Zihao Ye <ey...@163.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <ey...@163.com>
Gerrit-Comment-Date: Tue, 30 May 2023 19:15:08 +0000
Gerrit-HasComments: Yes