You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Yingchun Lai (Code Review)" <ge...@cloudera.org> on 2022/05/26 15:48:25 UTC

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Yingchun Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18569


Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
36 files changed, 2,093 insertions(+), 499 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 1
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#12).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
38 files changed, 2,140 insertions(+), 451 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/12
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 12
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#16).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
44 files changed, 2,156 insertions(+), 579 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/16
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 16
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#35).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 2,089 insertions(+), 572 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/35
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 35
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 73:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/CMakeLists.txt
File src/kudu/fs/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/CMakeLists.txt@44
PS71, Line 44: kudu_util
> Well, NO_TESTS is rather something about the way the binaries are built, no
Done


http://gerrit.cloudera.org:8080/#/c/18569/72/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/72/src/kudu/fs/dir_manager.cc@220
PS72, Line 220:     // In non-test en
> Please use IsGTest() instead.  You could find its usages in data_dirs.cc an
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 73
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 30 Jan 2024 05:54:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 73
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#64).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are constructed the same to the above.
f. Remove blocks from a container
   Construct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,660 insertions(+), 168 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/64
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 64
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#18).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
44 files changed, 2,167 insertions(+), 589 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/18
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 18
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#43).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
28 files changed, 1,813 insertions(+), 371 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/43
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 43
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#29).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 1,820 insertions(+), 445 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/29
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 29
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#30).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 1,820 insertions(+), 445 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/30
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 30
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#28).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
1. Make LogBlockManager as a super class
2. The former LBM that stores metadata in a append only
   file, is separeted from LBM and specified a new name
   LogfBlockManager. Its behavior has no change.
3. Introduce a new class LogrBlockManager that stores
   metadata in RocksDB, the main idea:
   a. Create container
      Data file is created as before, metadata is stored
      in keys prefixed by the container's id, append the
      block id, e.g. <container_id>.<block_id>. Make sure
      there is no such keys in RocksDB before this
      container created.
   b. Open container
      Make sure the data file is healthy.
   c. Deconstruct container
      If the container is dead (full and no live blocks),
      remove the data file, and clean up keys prefixed by
      the container's id.
   d. Load container (by ProcessRecords())
      Iterate the RocksDB in the key range
      [<container_id>, <next_container_id>), only live
      block records will be populated, we can use them
      as before.
   e. Create blocks in a container
      Put() serialized BlockRecordPB records into RocksDB
      in batch, keys are in form of
      '<container_id>.<block_id>' as mentioned above.
   f. Remove blocks from a container
      Contruct the keys by container's id and block's
      id, Delete() them from RocksDB in batch.

4. Some refactors, such as create and delete blocks in
   batch to reduce lock consult times.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
33 files changed, 2,016 insertions(+), 586 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/28
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 28
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Wang Xixu (Code Review)" <ge...@cloudera.org>.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 60: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 60
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Fri, 17 Nov 2023 04:09:24 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 61:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18569/60/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/60/src/kudu/fs/log_block_manager.cc@2020
PS60, Line 2020:  CONTAINER_DISK_FAILURE(block_manager_->env_->DeleteFile(data_file_name),
> Should also delete the metadata stored RocksDB?
Yes, it is done by L2014, all block ids belong to this container will be deleted.



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 61
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Fri, 17 Nov 2023 07:26:53 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#13).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
38 files changed, 2,142 insertions(+), 451 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/13
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 13
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#19).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
44 files changed, 2,168 insertions(+), 589 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/19
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 19
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#36).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 2,085 insertions(+), 576 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/36
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 36
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#46).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
32 files changed, 1,846 insertions(+), 383 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/46
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 46
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#50).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
32 files changed, 1,478 insertions(+), 134 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/50
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 50
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#49).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
32 files changed, 1,578 insertions(+), 212 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/49
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 49
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Wang Xixu (Code Review)" <ge...@cloudera.org>.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 59:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/dir_manager.h
File src/kudu/fs/dir_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/dir_manager.h@46
PS1, Line 46: // Convert a rocksdb::Status to a kudu::Status.
RdbStatusToKuduStatus


http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/dir_manager.cc@131
PS1, Line 131:   Shutdown();
             : }
             : 
             : Status Dir::OpenRocksDB() {
             :   CHECK_STREQ(FLAGS_block_manager.c_str(), "logr");
             :   if (db_ != nullptr) {
             :     // Check 'db_' is only possible to be non-nullptr in test environments.
             :     // Some unit tests (e.g. BlockManagerTest.PersistenceTest) will reopen the block manager,
             :     // 'db_' is non-nullptr in this case.
             :     CHECK(!GetTestDataDirectory().empty()) <<
             :         Substitute("It's not allowed to reopen the RocksDB $0 except in tests", dir_);
             :     return Status::OK();
             :   }
It is better to read configure from Kudu gflagfile


http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/dir_manager.cc@172
PS1, Line 172: .prefix_
better to rename it is_rdb_init


http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/dir_manager.cc@146
PS58, Line 146:   opts.create_if_missing = true;
Could you define a flag for this parameter, and give some comments about the meaning?


http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/dir_manager.cc@197
PS58, Line 197: dir_
JoinPathSegments(dir_, "rdb")?
Maybe define a parameter rdb_path.


http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/dir_manager.cc@198
PS58, Line 198: delete db_;
              :     db_ = nullptr;
How about using unique_ptr for db_?


http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/fs_manager.cc@340
PS1, Line 340:   } else {
Use Enum type to represent file, log, logr


http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/log_block_manager.h@423
PS58, Line 423: EstimateContainerCount
Please give some comments.


http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/log_block_manager.h@562
PS58, Line 562:     return children_count / 2;
Why can not get the correct container number? 
medata + data file?


http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/log_block_manager.cc@1269
PS1, Line 1269:                                                         const st
The parameter 'id' is not used in this function. The design of this interface is a little strange



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 59
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 08 Nov 2023 08:32:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#62).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are constructed the same to the above.
f. Remove blocks from a container
   Construct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
33 files changed, 1,678 insertions(+), 205 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/62
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 62
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long bootstrap times.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main difference with
LogBlockContainerNativeMeta is that LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than in a
native file. The main implementation of interfaces from
the base class including:
a. Create a container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open a container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   loading containers during the bootstrap phase.
c. Destroy a container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load a container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are constructed the same to the above.
f. Remove blocks from a container
   Construct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it uses RocksDB
  to store LBM metadata. The new block manager is enabled by setting
  --block_manager=logr.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it shows that
the time spent to re-open tablet server's metadata when 99.99% of all
the records removed reduced about 9.5 times when using
LogBlockContainerRdbMeta instead of LogBlockContainerNativeMeta.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Reviewed-on: http://gerrit.cloudera.org:8080/18569
Tested-by: Alexey Serbin <al...@apache.org>
Reviewed-by: Alexey Serbin <al...@apache.org>
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
32 files changed, 1,782 insertions(+), 185 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 74
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#38).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 2,075 insertions(+), 578 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/38
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 38
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has restored this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Restored
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 28
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#31).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 1,762 insertions(+), 451 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/31
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 31
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#59).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are contructed the same to the above.
f. Remove blocks from a container
   Contruct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,599 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/59
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 59
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Wang Xixu (Code Review)" <ge...@cloudera.org>.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 59:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18569/59/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/59/src/kudu/fs/log_block_manager.cc@1899
PS59, Line 1899: WARN_NOT_OK
Should it return when delete data file failed? Would it affects line 1902?


http://gerrit.cloudera.org:8080/#/c/18569/59/src/kudu/fs/log_block_manager.cc@3966
PS59, Line 3966: Data directory failed
nit: Open rocksDB failed



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 59
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 08 Nov 2023 09:28:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#60).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are contructed the same to the above.
f. Remove blocks from a container
   Contruct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,610 insertions(+), 166 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/60
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 60
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#68).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are constructed the same to the above.
f. Remove blocks from a container
   Construct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,748 insertions(+), 194 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/68
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 68
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#66).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are constructed the same to the above.
f. Remove blocks from a container
   Construct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,663 insertions(+), 168 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/66
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 66
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 71:

(71 comments)

http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@12
PS71, Line 12: long time bootstrap consumption
> nit: long bootstrap times
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@24
PS71, Line 24: different
> nit: difference
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@25
PS71, Line 25: is
> nit: is that
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@26
PS71, Line 26:  a
             : native
> nit: in a native
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@27
PS71, Line 27: ,
> nit: end the sentence here, it's long enough already
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@27
PS71, Line 27: the
> nit: The
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@28
PS71, Line 28: clase
> class
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@29
PS71, Line 29:  container
> nit: a container
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@35
PS71, Line 35:  container
> nit: a container
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@38
PS71, Line 38: load containers when bootstrap
> nit: loading containers during the bootstrap phase
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@39
PS71, Line 39:  container
> nit: a container
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@43
PS71, Line 43:  container
> nit: a container
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@56
PS71, Line 56: use
> nit: uses
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@57
PS71, Line 57: , it is specified by flag
             :   '--block_manager'.
> The new block manager is enabled by setting --block_manager=logr
Done


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@66
PS71, Line 66: it
             : shows that reopen staged reduced upto 90% time cost.
> nit: it shows that the time spent to re-open tablet server's metadata when 
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/client/CMakeLists.txt
File src/kudu/client/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/client/CMakeLists.txt@284
PS71, Line 284: rocksdb
> This looks a bit odd: how does it happen that Kudu C++ client tests need ro
I guess it's because the tests depends mini_cluster, and mini_cluster depends rocksdb.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/CMakeLists.txt
File src/kudu/fs/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/CMakeLists.txt@44
PS71, Line 44: kudu_test_util
> This looks wrong: a non-test libkudu_fs library is now dependent on libkudu
In https://gerrit.cloudera.org/c/18569/71/src/kudu/fs/dir_manager.cc#221, I use the function GetTestDataDirectory() in kudu_test_util (from src/kudu/util/test_util.h) to judge whether the proccess is in running in test environments.

I updated it to use NO_TESTS macro instead.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/block_manager-test.cc@1205
PS71, Line 1205: larger
> nit: greater
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/block_manager-test.cc@1243
PS71, Line 1243: to
> into
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/block_manager-test.cc@1243
PS71, Line 1243: is not take effect on
> does not affect
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/block_manager-test.cc@1243
PS71, Line 1243: so the metadata updating of "logr"
               :   // block_manager works well
> updating the metadata stored in the logr-based block manager succeeds witho
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.h
File src/kudu/fs/dir_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.h@121
PS71, Line 121: perpare
> preparatory
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.h@202
PS71, Line 202: public
> style nit: add a space before the 'public' label, similar to the other visi
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.h@220
PS71, Line 220: private
> ditto for the 'private' label
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@207
PS71, Line 207: shared_ptr<rocksdb::Cache> RdbDir::s_block_cache_;
> Why do you need it here?  Isn't it enough to have it initialized using std:
Because it's a static member, it causes a link error if not define it here.

https://stackoverflow.com/questions/272900/undefined-reference-to-static-class-member


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@229
PS71, Line 229: will be
> is
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@229
PS71, Line 229: The
> A
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@229
PS71, Line 229: is not
> does not
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@233
PS71, Line 233: opts.error_if_exists
> Should at least this option be enabled to avoid overriding existing metadat
It's a good point.
We should distinguish creating new data directory and opening existing data directory, and set proper options to avoid mishaps.
I added a TODO comment to complete this work because it's a bit of complex, and may introduce much changes.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@281
PS71, Line 281: wait flush jobs to finish is enough
> it's enough to wait for the flush jobs to finish
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@282
PS71, Line 282: may consume longer time which cause longer time to shut down server.
> nit: ... may take more time, which results in longer times to shut down a s
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@295
PS71, Line 295: const 
> Either remove 'const' or add 'const' to the result pointer.  Otherwise, it'
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_util.cc
File src/kudu/fs/dir_util.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_util.cc@169
PS71, Line 169: dir_type_ == "log" || dir_type_ == "logr"
> This pattern is seen in multiple places.  Does it make sense to introduce a
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/file_block_manager.h
File src/kudu/fs/file_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/file_block_manager.h@74
PS71, Line 74: kName
> This should have been 'name'; 'kSomething' is for variables, not method/fun
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/fs_manager-test.cc
File src/kudu/fs/fs_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/fs_manager-test.cc@224
PS71, Line 224: done
> nit: addressed
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/fs_report.h
File src/kudu/fs/fs_report.h:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/fs_report.h@189
PS71, Line 189:     std::string container;
              :     std::string rocksdb_key;;
> Could these be 'const'?
No, because std::move is used in constructor to init these members.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/fs_report.h@190
PS71, Line 190: ;
> nit: remove the extra semi-colon
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc
File src/kudu/fs/log_block_manager-test-util.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@671
PS71, Line 671: slice_key
> nit: use rocksdb::Slice(key) in-place here as well?
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@678
PS71, Line 678:       DCHECK(dir);
> I'm not sure this makes sense -- it 'dir' was nullptr, it would crash one l
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@697
PS71, Line 697: Kinds
> nit: Types
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@700
PS71, Line 700:   auto r = rand_.Uniform(1);
              :   DCHECK_EQ(r, 0);
> What is this for?
Useless, remvoed.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@734
PS71, Line 734: rocksdb::WriteOptions()
> nit: could this be replaced with {} ?
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@734
PS71, Line 734: slice_key
> nit: use rocksdb::Slice(key) here as well as for the value?
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@741
PS71, Line 741: slice_key
> nit: use rocksdb::Slice(key) in-place here?
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test.cc@1629
PS71, Line 1629:     CHECK_EQ(FLAGS_block_manager, "logr");
> nit: is this necessary given of the condition of the enclosing if() clause?
Not necessary, I removed it.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test.cc@2612
PS71, Line 2612: CHECK_OK
> Is it possible to use ASSERT_OK() here as well?
No, because this lambda needs a return value. An error returns if using ASSERT*

 /.../src/kudu/fs/log_block_manager-test.cc: In lambda function:
 /.../src/kudu/util/test_macros.h:38:45: error: void value not ignored as it ought to be
     FAIL() << "Bad status: " << _s.ToString();  \
                                             ^
 /.../src/kudu/fs/log_block_manager-test.cc:2612:5: note: in expansion of macro ‘ASSERT_OK’
     ASSERT_OK(s);
     ^~~~~~~~~


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test.cc@2773
PS71, Line 2773: // TODO(yingchun): need to clear up dead metadata
> Is this still relevant?  If yes, please add corresponding comment into the 
Yes, it's relevant. I added comment in the latest PS in log_block_manager.cc.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@546
PS71, Line 546: kName
> style nit: this should be 'name'; 'kSomething' is for variables
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@546
PS71, Line 546:   static const char* kName() { return "log"; }
> Could this be constexpr?
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@592
PS71, Line 592: to an RocksDB
> into a RocksDB
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@592
PS71, Line 592: The metadata part of a container
> All the container's metadata
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@594
PS71, Line 594: will be
> are
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@594
PS71, Line 594: in
> of
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@595
PS71, Line 595: are removed
> are being removed
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@596
PS71, Line 596: in RocksDB will be compacted
> ... in the RocksDB instance is compacted ...
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@595
PS71, Line 595: block
              : // manager
> the block manager
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@599
PS71, Line 599: Compare
> Comparing
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@601
PS71, Line 601: to scan all records
> to scan through all the records
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@601
PS71, Line 601: may
              : // cause lower performance
> may adversely affect the performance
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@603
PS71, Line 603: a lot of options
> many configuration options
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@604
PS71, Line 604:  flexibly.
> nit: drop this part
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@666
PS71, Line 666:   static std::string ConstructRocksDBKey(const std::string& container_id,
> Could this be 'constexpr' as well?
It seens it couldn't, because the parameters container_id and block_id are variables, don't match The requirement of "each of its parameters must be of a LiteralType" [1].

1. https://en.cppreference.com/w/cpp/language/constexpr


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.cc@123
PS71, Line 123: useful
> effective
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.cc@4080
PS71, Line 4080:   // The 'keys' is used to keep the lifetime of the data referenced by Slices in 'batch'.
               :   vector<string> keys;
> Why to have this 'keys' array when in the while() cycle below only the curr
It's not enough, we have to keep their lifetime before rdb->Write() the batch.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/tools/kudu-tool-test.cc@2117
PS71, Line 2117: only logr block manager is not supported
> logr block manager is not yet supported
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/tools/kudu-tool-test.cc@4574
PS71, Line 4574:   if (FLAGS_block_manager == "logr") {
               :     // Exclude the RocksDB data size.
               :     uint64_t size_of_rdb;
               :     ASSERT_OK(env_->GetFileSizeOnDiskRecursively(JoinPathSegments(data_dir, "rdb"), &size_of_rdb));
               :     size_before_delete -= size_of_rdb;
               :   }
> nit: this block and another one at lines 4615 look very similar.  Does it m
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/tserver/tablet_server-test.cc@761
PS71, Line 761: to reset 'dd_manager' to release
              :   // the last reference of dd_manager() to release the RocksDB LOCK file, otherwise
> ... to reset 'dd_manager', releasing the last 'dd_manager' reference.  That
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/tserver/tablet_server-test.cc@763
PS71, Line 763: require
> acquire
Done


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/util/CMakeLists.txt@a453
PS71, Line 453: 
> Why to remove this?  This doesn't look to be a rocksdb-related update.
Because I used a function in [1], now it has been updated, and this change is not needed now.


1. https://gerrit.cloudera.org/c/18569/71/src/kudu/fs/dir_manager.cc#221


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/util/CMakeLists.txt@658
PS71, Line 658:     rocksdb)
> Just curious: why lz4 isn't needed here, but is needed for jwt-util-test be
I'm not sure, maybe the order of some dependent libraries take effect?


http://gerrit.cloudera.org:8080/#/c/18569/71/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

http://gerrit.cloudera.org:8080/#/c/18569/71/thirdparty/build-definitions.sh@1192
PS71, Line 1192:     -DWITH_LZ4=ON
> With current settings, does this use LZ4 from Kudu's thirdparty or it someh
I added a new option "-Dlz4_ROOT_DIR=$PREFIX" for RocksDB to find lz4 according to [1].
I tried again to build Kudu on a clean node (without LZ4 library installed in system), everything built well, and I can see the LZ4 is found in Kudu's thirdparty path when build RocksDB.

 -- Found lz4: /root/kudu/thirdparty/installed/uninstrumented/lib/liblz4.so

or

 -- Found lz4: /root/kudu/thirdparty/installed/tsan/lib/liblz4.so

1. https://github.com/facebook/rocksdb/blob/v7.7.3/cmake/modules/Findlz4.cmake#L10



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 71
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 23 Jan 2024 17:05:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#72).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long bootstrap times.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main difference with
LogBlockContainerNativeMeta is that LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than in a
native file. The main implementation of interfaces from
the base class including:
a. Create a container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open a container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   loading containers during the bootstrap phase.
c. Destroy a container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load a container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are constructed the same to the above.
f. Remove blocks from a container
   Construct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it uses RocksDB
  to store LBM metadata. The new block manager is enabled by setting
  --block_manager=logr.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it shows that
the time spent to re-open tablet server's metadata when 99.99% of all
the records removed reduced about 9.5 times when using
LogBlockContainerRdbMeta instead of LogBlockContainerNativeMeta.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
32 files changed, 1,782 insertions(+), 185 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/72
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 72
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#40).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 1,576 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/40
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 40
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#20).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
44 files changed, 2,159 insertions(+), 588 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/20
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 20
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#5).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
36 files changed, 2,143 insertions(+), 499 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 5
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#23).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch introduce RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
1. Make LogBlockManager as a super class
2. The former LBM that stores metadata in a append only
   file, is separeted from LBM and specified a new name
   LogfBlockManager. Its behavior has no change.
3. Introduce a new class LogrBlockManager that stores
   metadata in RocksDB, the main idea:
   a. Create container
      Data file is created as before, metadata is stored
      in keys prefixed by the container's id, append the
      block id, e.g. <container_id>.<block_id>. Make sure
      there is no such keys in RocksDB before this
      container created.
   b. Open container
      Make sure the data file is healthy.
   c. Deconstruct container
      If the container is dead (full and no live blocks),
      remove the data file, and clean up keys prefixed by
      the container's id.
   d. Load container (by ProcessRecords())
      Iterate the RocksDB in the key range
      [<container_id>, <next_container_id>), only live
      block records will be populated, we can use them
      as before.
   e. Create blocks in a container
      Put() serialized BlockRecordPB records into RocksDB
      in batch, keys are in form of
      '<container_id>.<block_id>' as mentioned above.
   f. Remove blocks from a container
      Contruct the keys by container's id and block's
      id, Delete() them from RocksDB in batch.

4. Some refactors, such as create and delete blocks in
   batch to reduce lock consult times.

This patch contains the following changes:
- Adds RocksDB as a thirdparty lib
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
42 files changed, 2,142 insertions(+), 585 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/23
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 23
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#42).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
28 files changed, 1,787 insertions(+), 362 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/42
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 42
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 49:

> Patch Set 47:
> 
> (1 comment)

Done, see https://github.com/apache/kudu/commit/8217fd87a6bb1401cb23acb85e1d89345483e4a6


-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 49
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Comment-Date: Sun, 22 Oct 2023 08:06:00 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 72: Code-Review+2

(4 comments)

Please address one nit about using IsGTest() instead using NO_TESTS macro.

Otherwise, this looks good to me!

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/CMakeLists.txt
File src/kudu/fs/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/CMakeLists.txt@44
PS71, Line 44: kudu_util
> In https://gerrit.cloudera.org/c/18569/71/src/kudu/fs/dir_manager.cc#221, I
Well, NO_TESTS is rather something about the way the binaries are built, not whether some code is running in a test context.

Please consider updating this and using IsGTest() utility function that's declared in test_util_prod.h


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@207
PS71, Line 207: RdbDir::RdbDir(Env* env, DirMetrics* metrics,
> Because it's a static member, it causes a link error if not define it here.
Ah, indeed -- I missed this.  Thank you for the clarification.


http://gerrit.cloudera.org:8080/#/c/18569/72/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/72/src/kudu/fs/dir_manager.cc@220
PS72, Line 220: #if defined(NO_TESTS)
Please use IsGTest() instead.  You could find its usages in data_dirs.cc and a few other places.


http://gerrit.cloudera.org:8080/#/c/18569/71/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

http://gerrit.cloudera.org:8080/#/c/18569/71/thirdparty/build-definitions.sh@1192
PS71, Line 1192:     -DWITH_LZ4=ON
> I added a new option "-Dlz4_ROOT_DIR=$PREFIX" for RocksDB to find lz4 accor
That's perfect, thank you!



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 72
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 30 Jan 2024 01:56:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 62:

(16 comments)

Thank you for revving the patch!

Overall looks better than a few patchsets ago, but I'm too many down_casts bother me.  I'll take a closer look, trying to suggest a way to get rid of those.  Meanwhile, maybe you could also take a look at possible options to minimize using down_cast?  Using down_cast in many places is a sign that the class hierarchy needs to be revisited.

http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG
Commit Message:

PS61: 
> Yep, you ever asked about it [1].
Right, there are a few main themes behind:
  1. Protect against possible licensing issues (if they change the license of a whim)
  2. Reduce the size of RELEASE Kudu binaries.

If taking a brief view on that, it's not clear how to achieve that sort of behavior for RocksDB library.  Since RocksDB's API/ABI is C++, not C, dlopen() isn't going to work as easy as it would do for C API/ABI.  I was just curious whether you've done any further research in that direction.


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@208
PS66, Line 208: RdbDir::RdbDir(Env* env, DirMetrics* metrics, FsType fs_type, std::string dir,
              :                std::unique_ptr<DirInstanceMetadataFile> metadata_file,
              :                std::unique_ptr<ThreadPool> pool)
nit for here and elsewhere: in case of a method's signature with many parameters spread across multiple lines, consider placing each parameter in its own line -- that improves readability and makes it easier to maintain and track changes in the long run


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@216
PS66, Line 216: Check 'db_' is only possible to be valid in test environments when OpenRocksDB().
nit: try to rephrase this a bit for a better readility


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@217
PS66, Line 217: will reopen
nit: reopen

It's better to use the present tense when describing facts that are true in the current version of the code.


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@219
PS66, Line 219: CHECK
nit: does it makes sense to change this to DCHECK, similar to DCHECK_STREQ above (since the code paths that could call this method are static, a misuse/mistake would be caught in DEBUG builds anyway)?


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@242
PS66, Line 242:   rocksdb::BlockBasedTableOptions tbl_opts;
nit: move this closer to the point of usage


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@268
PS66, Line 268:     return;
nit: is it expected to call Shutdown() multiple times, or that would be a programming mistake?  If the latter, consider adding corresponding DCHECK() here.


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@271
PS66, Line 271: ,
nit: drop the comma


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@271
PS66, Line 271: there is no
there aren't any


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@271
PS66, Line 271: in flight
in-flight


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@278
PS66, Line 278: ich cause longer time to shut down server.
reduces bootstrapping time upon next start-up.


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@294
PS66, Line 294: }
Is it OK to call this even when 'db_' is null?  If not, consider adding corresponding DCHECK() to catch programming mistakes.


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/log_block_manager.h@671
PS66, Line 671: };
Please add a small comment describing the essence of this method's functionality, and document the parameters as well (similar to the doc for the DeleteContainerMetadata method above).


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@599
PS61, Line 599: The data in RocksDB will be compacted
> When the tserver shut down normally, the RocksDB object will be destoryed i
Thank you for the explanation!


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@657
PS61, Line 657: 
> In fact, I copied this comment from the super class, I will leave the "fata
Ah, so a 'fatal inconsistency' is when there isn't a way to safely recover/ignore the error and continue with the rest of server's activity without a risk of data corruption.  Thank you for the clarification.


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/log_block_manager.cc@3979
PS66, Line 3979: lockContainerRdbMeta::Open(this, dir, 
Maybe, it's possible to introduce a new virtual method into the base class called something like Prepare(), and implement it as no-op for Dir::Prepare(), while do all the necessary operations there for RdbDir::Prepare()?



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 62
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Sat, 06 Jan 2024 04:41:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#69).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are constructed the same to the above.
f. Remove blocks from a container
   Construct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,788 insertions(+), 194 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/69
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 69
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 60:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/dir_manager.h
File src/kudu/fs/dir_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/dir_manager.h@46
PS1, Line 46: 
> RdbStatusToKuduStatus
The return type indicate the target type, and the kudu::Status is the deault "Status" we use in Kudu, so I think it would be convenient and obvious to omit the KuduStatus.


http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/dir_manager.cc@131
PS1, Line 131: }
             : 
             : Status Dir::OpenRocksDB() {
             :   CHECK_STREQ(FLAGS_block_manager.c_str(), "logr");
             :   if (db_) {
             :     // Check 'db_' is only possible to be valid in test environments when OpenRocksDB().
             :     // Some unit tests (e.g. BlockManagerTest.PersistenceTest) will reopen the block manager,
             :     // 'db_' is valid in these cases.
             :     CHECK(!GetTestDataDirectory().empty()) <<
             :         Substitute("It's not allowed to reopen the RocksDB $0 except in tests", dir_);
             :     return Status::OK();
             :   }
             : 
> It is better to read configure from Kudu gflagfile
Of course, we can improve this in following patches, as the comments in the latest patches mentioned.


http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/dir_manager.cc@172
PS1, Line 172: .table_f
> better to rename it is_rdb_init
It has been updated, see the latest patch please.


http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/dir_manager.cc@146
PS58, Line 146:   rocksdb::Options opts;
> Could you define a flag for this parameter, and give some comments about th
Comments have been added, but gflag is not necessary currently, it is always true.


http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/dir_manager.cc@197
PS58, Line 197: 
> JoinPathSegments(dir_, "rdb")?
Done


http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/dir_manager.cc@198
PS58, Line 198: 
              :   if (db_) {
> How about using unique_ptr for db_?
Done


http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/fs_manager.cc@340
PS1, Line 340:   } else {
> Use Enum type to represent file, log, logr
There are much code assign the string flag FLAGS_block_manager to this option, it will introduce many changes. I think it's not necessary to this patch, we can do this refactor in following patch if you think it's meaningful.


http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/log_block_manager.h@423
PS58, Line 423:  count of containers a
> Please give some comments.
Done


http://gerrit.cloudera.org:8080/#/c/18569/58/src/kudu/fs/log_block_manager.h@562
PS58, Line 562:   size_t EstimateContainerCoun
> Why can not get the correct container number? 
The data direcory is a common dirctory on the filesystem, Kudu users are possible to place any kind of files there. For example, a backup file will be generated if using 'kudu pbc edit' against a metadata file.


http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/1/src/kudu/fs/log_block_manager.cc@1269
PS1, Line 1269:                                                         const st
> The parameter 'id' is not used in this function. The design of this interfa
Yes, it has been removed already in the latest patch set.


http://gerrit.cloudera.org:8080/#/c/18569/59/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/59/src/kudu/fs/log_block_manager.cc@1899
PS59, Line 1899: WARN_NOT_OK
> Should it return when delete data file failed? Would it affects line 1902?
Similar to line 1165, it doesn't matter, it is an operation in a do..while loop, another container name is generated in the next iteration.


http://gerrit.cloudera.org:8080/#/c/18569/59/src/kudu/fs/log_block_manager.cc@3966
PS59, Line 3966: Initialize data direc
> nit: Open rocksDB failed
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 60
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 13 Nov 2023 00:39:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#7).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
37 files changed, 2,158 insertions(+), 510 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/7
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 7
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#4).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
36 files changed, 2,130 insertions(+), 499 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 4
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#15).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
42 files changed, 2,114 insertions(+), 528 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/15
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 15
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#14).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
42 files changed, 2,105 insertions(+), 524 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/14
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 14
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#45).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 1,845 insertions(+), 383 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/45
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 45
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#32).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 2,038 insertions(+), 574 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/32
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 32
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#52).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
30 files changed, 1,512 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/52
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 52
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#58).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are contructed the same to the above.
f. Remove blocks from a container
   Contruct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,593 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/58
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 58
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#53).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
30 files changed, 1,590 insertions(+), 163 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/53
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 53
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 73:

> Patch Set 73:
> 
> Thank you very much for the contribution!

Thanks for reviewing this big patch!


-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 73
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 30 Jan 2024 05:55:24 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 73: Verified+1

unrelated test failures:
  * CatalogManagerConfigurations/MasterStressTest.Test/2 (ASAN): KUDU-3481
  * MultiThreadedTabletTest/3.DeleteAndReinsert (RELEASE): KUDU-2667
  * SecurityITest.TestNonDefaultPrincipalMultipleMaster (RELEASE)
  * ToolTest.TestHmsList (RELEASE)
  * AutoRebalancerTest.NextLeaderResumesAutoRebalancing (TSAN)


-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 73
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 30 Jan 2024 05:27:52 +0000
Gerrit-HasComments: No

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#8).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/postflight.py
M thirdparty/vars.sh
38 files changed, 2,175 insertions(+), 514 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/8
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 8
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 27:

(3 comments)

This patch is just for overview, I will separete it to some smaller patches.

http://gerrit.cloudera.org:8080/#/c/18569/25//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18569/25//COMMIT_MSG@58
PS25, Line 58: - Adds RocksDB as a thirdparty lib
> I'd suggest separating this into its own patch.
OK
And I'm planning to separate this large patch into some smaller patches to make it convenient to review (as mentioned above).
1. Make LogBlockManager as a super class
2. The former LBM that stores metadata in a append only file, is separeted from LBM and specified a new name LogfBlockManager. Its behavior has no change.
3. Some refactors, such as create and delete blocks in batch to reduce lock consult times.
4. Introduce RocksDB to thirdparty
5. Introduce a new class LogrBlockManager that stores metadata in RocksDB, and related unit tests.
6. More CLI tools.


http://gerrit.cloudera.org:8080/#/c/18569/25/src/kudu/util/oid_generator.h
File src/kudu/util/oid_generator.h:

http://gerrit.cloudera.org:8080/#/c/18569/25/src/kudu/util/oid_generator.h@49
PS25, Line 49: comparation
> What's "comparation"?
The usage case is to generate a log block container ID which is 1 larger than the parameter 'id', then rocksdb use it to delete a whole container's keys by using DeleteRange [id, new_id).
new_id maybe an invalid uuid, for example, when id is 'xxxxf', new_id is 'xxxxg', it will not be used to represent a container's id , but just for comparing keys in rocksdb, it's enough.


http://gerrit.cloudera.org:8080/#/c/18569/25/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

http://gerrit.cloudera.org:8080/#/c/18569/25/thirdparty/build-definitions.sh@1177
PS25, Line 1177:     -DCMAKE_BUILD
> Just curious: is it possible to get away without using RTTI for the embedde
It can be removed.



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 27
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Comment-Date: Sun, 20 Nov 2022 10:38:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#22).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch introduce RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
1. Make LogBlockManager as a super class
2. The former LBM that stores metadata in a append only
   file, is separeted from LBM and specified a new name
   LogfBlockManager. Its behavior has no change.
3. Introduce a new class LogrBlockManager that stores
   metadata in RocksDB, the main idea:
   a. Create container
      Data file is created as before, metadata is stored
      in keys prefixed by the container's id, append the
      block id, e.g. <container_id>.<block_id>. Make sure
      there is no such keys in RocksDB before this
      container created.
   b. Open container
      Make sure the data file is healthy.
   c. Deconstruct container
      If the container is dead (full and no live blocks),
      remove the data file, and clean up keys prefixed by
      the container's id.
   d. Load container (by ProcessRecords())
      Iterate the RocksDB in the key range
      [<container_id>, <next_container_id>), only live
      block records will be populated, we can use them
      as before.
   e. Create blocks in a container
      Put() serialized BlockRecordPB records into RocksDB
      in batch, keys are in form of
      '<container_id>.<block_id>' as mentioned above.
   f. Remove blocks from a container
      Contruct the keys by container's id and block's
      id, Delete() them from RocksDB in batch.

4. Some refactors, such as create and delete blocks in
   batch to reduce lock consult times.

This patch contains the following changes:
- Adds RocksDB as a thirdparty lib
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
43 files changed, 2,143 insertions(+), 586 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/22
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 22
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 28:

This CR is not needed to review, I'll separete it into some small CRs.


-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 28
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Comment-Date: Wed, 07 Dec 2022 17:05:38 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#24).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch introduce RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
1. Make LogBlockManager as a super class
2. The former LBM that stores metadata in a append only
   file, is separeted from LBM and specified a new name
   LogfBlockManager. Its behavior has no change.
3. Introduce a new class LogrBlockManager that stores
   metadata in RocksDB, the main idea:
   a. Create container
      Data file is created as before, metadata is stored
      in keys prefixed by the container's id, append the
      block id, e.g. <container_id>.<block_id>. Make sure
      there is no such keys in RocksDB before this
      container created.
   b. Open container
      Make sure the data file is healthy.
   c. Deconstruct container
      If the container is dead (full and no live blocks),
      remove the data file, and clean up keys prefixed by
      the container's id.
   d. Load container (by ProcessRecords())
      Iterate the RocksDB in the key range
      [<container_id>, <next_container_id>), only live
      block records will be populated, we can use them
      as before.
   e. Create blocks in a container
      Put() serialized BlockRecordPB records into RocksDB
      in batch, keys are in form of
      '<container_id>.<block_id>' as mentioned above.
   f. Remove blocks from a container
      Contruct the keys by container's id and block's
      id, Delete() them from RocksDB in batch.

4. Some refactors, such as create and delete blocks in
   batch to reduce lock consult times.

This patch contains the following changes:
- Adds RocksDB as a thirdparty lib
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
42 files changed, 2,145 insertions(+), 586 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/24
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 24
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#34).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 2,117 insertions(+), 668 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/34
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 34
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#44).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
29 files changed, 1,831 insertions(+), 375 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/44
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 44
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 61:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG@50
PS61, Line 50: contructed 
nit: constructed


http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG@52
PS61, Line 52: Contruct 
nit: Construct


http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG@56
PS61, Line 56: logr
"logr" a little short  and not clear.  use 'rocksdb' directly?


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/fs_report.h
File src/kudu/fs/fs_report.h:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/fs_report.h@291
PS61, Line 291: corrupted_rdb_record_check
This field seems no used. So add a TODO?


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@2093
PS61, Line 2093:     tmp_key = Substitute("$0.$1", id_, lb->block_id().ToString());
this statement happens several times in this file, may be a simple function to construct this key is better.


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@4035
PS61, Line 4035: key(e.block_id);
Should this key be      id + "." + block_id ?



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 61
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Sat, 23 Dec 2023 15:59:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 62:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG
Commit Message:

PS61: 
> I don't remember whether I asked this question before, but have you conside
Yep, you ever asked about it [1].

I guess the main motivation of using dlopen() is to avoid lisense risk, since we declared using the APLv2 license [2], so I think it's not necessary.

Is reducing the size of master/tserver binaries is another motivation?

1. https://gerrit.cloudera.org/c/18569/47//COMMIT_MSG
2. https://github.com/apache/kudu/blob/master/thirdparty/LICENSE.txt#L600



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 62
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 26 Dec 2023 03:18:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 71:

(16 comments)

I've reduced some down_cast by introducing a "rocksdb::DB* const rdb_;" member to class LogBlockContainerRdbMeta.

http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG
Commit Message:

PS61: 
> Right, there are a few main themes behind:
Thank you for pointing out the issues.

1. Even it change license in new version, we can still use the old version under APLv2. We should keep in mind to check the license when upgrade thirdparty libraries.
2. It is an issue. I will explore in depth to reduce the binaries size in another patches. Maybe we can use its C APIs (https://github.com/facebook/rocksdb/blob/main/include/rocksdb/c.h)


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@208
PS66, Line 208: RdbDir::RdbDir(Env* env, DirMetrics* metrics,
              :                FsType fs_type,
              :                string dir,
> nit for here and elsewhere: in case of a method's signature with many param
Done


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@216
PS66, Line 216: K_STREQ(FLAGS_block_manager.c_str(), "logr");
> nit: try to rephrase this a bit for a better readility
Done


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@217
PS66, Line 217: 
> nit: reopen
Done


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@219
PS66, Line 219: // 'd
> nit: does it makes sense to change this to DCHECK, similar to DCHECK_STREQ 
Done


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@242
PS66, Line 242:   //  opts.max_write_buffer_number
> nit: move this closer to the point of usage
Done


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@271
PS66, Line 271: 
> in-flight
Done


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@271
PS66, Line 271: 
> nit: drop the comma
Done


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@271
PS66, Line 271: 
> there aren't any
Done


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@278
PS66, Line 278: .
> reduces bootstrapping time upon next start-up.
Done


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/dir_manager.cc@294
PS66, Line 294: 
> Is it OK to call this even when 'db_' is null?  If not, consider adding cor
Done


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/log_block_manager.h@671
PS66, Line 671: } // namespace kudu
> Please add a small comment describing the essence of this method's function
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@599
PS61, Line 599: ith the LogBlockManagerNativeMeta, th
> Thank you for the explanation!
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@657
PS61, Line 657: 
> Ah, so a 'fatal inconsistency' is when there isn't a way to safely recover/
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@1987
PS61, Line 1987: .ok() && !s_data.IsNo
> It's copied from LogBlockContainerNativeMeta::CheckContainerFiles().
'static' can not be used here, because there are some parameterized tests (enable encryption or not) re-run these code, and the static variable 'kEncryptionHeaderSize' is immutable, which cause some tests failed.


http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/66/src/kudu/fs/log_block_manager.cc@3979
PS66, Line 3979: _t* limit = FindFloorOrNull(kPerFsBloc
> Maybe, it's possible to introduce a new virtual method into the base class 
Good idea! Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 71
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Jan 2024 15:18:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#2).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
36 files changed, 2,129 insertions(+), 499 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/2
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 2
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#3).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
36 files changed, 2,129 insertions(+), 499 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/3
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 3
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#9).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
37 files changed, 2,158 insertions(+), 510 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/9
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 9
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#10).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
37 files changed, 2,159 insertions(+), 511 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/10
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 10
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#37).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 2,085 insertions(+), 576 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/37
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 37
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#47).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
32 files changed, 1,846 insertions(+), 383 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/47
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 47
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 71:

(74 comments)

http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@12
PS71, Line 12: long time bootstrap consumption
nit: long bootstrap times


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@24
PS71, Line 24: different
nit: difference


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@25
PS71, Line 25: is
nit: is that


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@27
PS71, Line 27: the
nit: The


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@26
PS71, Line 26:  a
             : native
nit: in a native


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@27
PS71, Line 27: ,
nit: end the sentence here, it's long enough already


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@28
PS71, Line 28: clase
class


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@29
PS71, Line 29:  container
nit: a container


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@35
PS71, Line 35:  container
nit: a container


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@38
PS71, Line 38: load containers when bootstrap
nit: loading containers during the bootstrap phase


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@39
PS71, Line 39:  container
nit: a container


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@43
PS71, Line 43:  container
nit: a container


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@56
PS71, Line 56: use
nit: uses


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@57
PS71, Line 57: , it is specified by flag
             :   '--block_manager'.
The new block manager is enabled by setting --block_manager=logr


http://gerrit.cloudera.org:8080/#/c/18569/71//COMMIT_MSG@66
PS71, Line 66: it
             : shows that reopen staged reduced upto 90% time cost.
nit: it shows that the time spent to re-open tablet server's metadata when 99.99% of all the records removed reduced about 9.5 times when using LogBlockContainerRdbMeta instead of LogBlockContainerNativeMeta.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/client/CMakeLists.txt
File src/kudu/client/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/client/CMakeLists.txt@284
PS71, Line 284: rocksdb
This looks a bit odd: how does it happen that Kudu C++ client tests need rocksdb as well?  Is rocksdb library really needed here?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/CMakeLists.txt
File src/kudu/fs/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/CMakeLists.txt@44
PS71, Line 44: kudu_test_util
This looks wrong: a non-test libkudu_fs library is now dependent on libkudu_test_util test-only library.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/block_manager-test.cc@1205
PS71, Line 1205: larger
nit: greater


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/block_manager-test.cc@1243
PS71, Line 1243: to
into


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/block_manager-test.cc@1243
PS71, Line 1243: is not take effect on
does not affect


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/block_manager-test.cc@1243
PS71, Line 1243: so the metadata updating of "logr"
               :   // block_manager works well
updating the metadata stored in the logr-based block manager succeeds without any errors


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.h
File src/kudu/fs/dir_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.h@121
PS71, Line 121: perpare
preparatory


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.h@202
PS71, Line 202: public
style nit: add a space before the 'public' label, similar to the other visibility labels in this file


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.h@220
PS71, Line 220: private
ditto for the 'private' label


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@207
PS71, Line 207: shared_ptr<rocksdb::Cache> RdbDir::s_block_cache_;
Why do you need it here?  Isn't it enough to have it initialized using std::once() in the code below?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@229
PS71, Line 229: The
A


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@229
PS71, Line 229: will be
is


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@229
PS71, Line 229: is not
does not


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@233
PS71, Line 233: opts.error_if_exists
Should at least this option be enabled to avoid overriding existing metadata due to a mishap?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@281
PS71, Line 281: wait flush jobs to finish is enough
it's enough to wait for the flush jobs to finish


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@282
PS71, Line 282: may consume longer time which cause longer time to shut down server.
nit: ... may take more time, which results in longer times to shut down a server.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@295
PS71, Line 295: const 
Either remove 'const' or add 'const' to the result pointer.  Otherwise, it's not a const-correct construct.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_util.cc
File src/kudu/fs/dir_util.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_util.cc@169
PS71, Line 169: dir_type_ == "log" || dir_type_ == "logr"
This pattern is seen in multiple places.  Does it make sense to introduce an utility function IsLogTypeDir() to check for this condition?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/file_block_manager.h
File src/kudu/fs/file_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/file_block_manager.h@74
PS71, Line 74: kName
This should have been 'name'; 'kSomething' is for variables, not method/function names.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/file_block_manager.h@74
PS71, Line 74: static
Could this be 'constexpr' as well?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/fs_manager-test.cc
File src/kudu/fs/fs_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/fs_manager-test.cc@224
PS71, Line 224: done
nit: addressed


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/fs_report.h
File src/kudu/fs/fs_report.h:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/fs_report.h@189
PS71, Line 189:     std::string container;
              :     std::string rocksdb_key;;
Could these be 'const'?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/fs_report.h@190
PS71, Line 190: ;
nit: remove the extra semi-colon


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc
File src/kudu/fs/log_block_manager-test-util.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@671
PS71, Line 671: slice_key
nit: use rocksdb::Slice(key) in-place here as well?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@678
PS71, Line 678:       DCHECK(dir);
I'm not sure this makes sense -- it 'dir' was nullptr, it would crash one line above


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@697
PS71, Line 697: Kinds
nit: Types


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@700
PS71, Line 700:   auto r = rand_.Uniform(1);
              :   DCHECK_EQ(r, 0);
What is this for?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@734
PS71, Line 734: rocksdb::WriteOptions()
nit: could this be replaced with {} ?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@734
PS71, Line 734: slice_key
nit: use rocksdb::Slice(key) here as well as for the value?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test-util.cc@741
PS71, Line 741: slice_key
nit: use rocksdb::Slice(key) in-place here?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test.cc@1629
PS71, Line 1629:     CHECK_EQ(FLAGS_block_manager, "logr");
nit: is this necessary given of the condition of the enclosing if() clause?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test.cc@2612
PS71, Line 2612: CHECK_OK
Is it possible to use ASSERT_OK() here as well?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager-test.cc@2773
PS71, Line 2773: // TODO(yingchun): need to clear up dead metadata
Is this still relevant?  If yes, please add corresponding comment into the corresponding method of the LogBlockManagerRdbMeta implementation as well.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@546
PS71, Line 546: kName
style nit: this should be 'name'; 'kSomething' is for variables


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@546
PS71, Line 546:   static const char* kName() { return "log"; }
Could this be constexpr?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@592
PS71, Line 592: The metadata part of a container
All the container's metadata


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@592
PS71, Line 592: to an RocksDB
into a RocksDB


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@594
PS71, Line 594: will be
are


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@594
PS71, Line 594: in
of


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@595
PS71, Line 595: are removed
are being removed


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@595
PS71, Line 595: block
              : // manager
the block manager


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@596
PS71, Line 596: in RocksDB will be compacted
... in the RocksDB instance is compacted ...


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@599
PS71, Line 599: Compare
Comparing


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@601
PS71, Line 601: to scan all records
to scan through all the records


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@601
PS71, Line 601: may
              : // cause lower performance
may adversely affect the performance


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@603
PS71, Line 603: a lot of options
many configuration options


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@604
PS71, Line 604:  flexibly.
nit: drop this part


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.h@666
PS71, Line 666:   static std::string ConstructRocksDBKey(const std::string& container_id,
Could this be 'constexpr' as well?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.cc@123
PS71, Line 123: useful
effective


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.cc@4033
PS71, Line 4033: string cid
nit: use 'const auto&' to avoid copying?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.cc@4050
PS71, Line 4050:  rocksdb::WriteOptions del_opt;
Remove this variable and use {} as the first parameter for the Delete() call below?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/log_block_manager.cc@4080
PS71, Line 4080:   // The 'keys' is used to keep the lifetime of the data referenced by Slices in 'batch'.
               :   vector<string> keys;
Why to have this 'keys' array when in the while() cycle below only the current key is used further in the code?  Isn't Delete() working in a synchronous way, i.e. once it returns, it doesn't refer anymore to the parameter it was given?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/tools/kudu-tool-test.cc@2117
PS71, Line 2117: only logr block manager is not supported
logr block manager is not yet supported


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/tools/kudu-tool-test.cc@4574
PS71, Line 4574:   if (FLAGS_block_manager == "logr") {
               :     // Exclude the RocksDB data size.
               :     uint64_t size_of_rdb;
               :     ASSERT_OK(env_->GetFileSizeOnDiskRecursively(JoinPathSegments(data_dir, "rdb"), &size_of_rdb));
               :     size_before_delete -= size_of_rdb;
               :   }
nit: this block and another one at lines 4615 look very similar.  Does it make sense to separate the code into a function and use it both places (and maybe in other future tests)?


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/tserver/tablet_server-test.cc@761
PS71, Line 761: to reset 'dd_manager' to release
              :   // the last reference of dd_manager() to release the RocksDB LOCK file, otherwise
... to reset 'dd_manager', releasing the last 'dd_manager' reference.  That's to allow for releasing the RocksDB LOCK file.  Otherwise ...


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/tserver/tablet_server-test.cc@763
PS71, Line 763: require
acquire


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/util/CMakeLists.txt@a453
PS71, Line 453: 
Why to remove this?  This doesn't look to be a rocksdb-related update.


http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/util/CMakeLists.txt@658
PS71, Line 658:     rocksdb)
Just curious: why lz4 isn't needed here, but is needed for jwt-util-test below?


http://gerrit.cloudera.org:8080/#/c/18569/71/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

http://gerrit.cloudera.org:8080/#/c/18569/71/thirdparty/build-definitions.sh@1192
PS71, Line 1192:     -DWITH_LZ4=ON
With current settings, does this use LZ4 from Kudu's thirdparty or it somehow searches and picks up whatever is available on a build node?  Did you try to build this on a clean node where LZ4 library isn't available from anywhere but from the Kudu's thirdparty 'installed' directory?



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 71
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Thu, 18 Jan 2024 03:52:46 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 61:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG
Commit Message:

PS61: 
I don't remember whether I asked this question before, but have you considered making the rocksdb library dependency not a compile-time, but rather run-time?  For example, for C libraries that's usually done using dlopen() and friends.

Is that feasible at all?


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h
File src/kudu/fs/dir_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@49
PS61, Line 49: is only use internally
nit: is used only internally


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@109
PS61, Line 109: class Dir
Does it make sense to use the sub-classing here, creating a class that inherits from Dir, so only the derived class has the extra member fields and extra methods that are RocksDB-only?


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@121
PS61, Line 121: return
nit: returns


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@199
PS61, Line 199: variables
nit: fields


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@201
PS61, Line 201: ,
nit: .


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@308
PS61, Line 308:   // Finds a directory by full path name, returning null if it can't be found.
              :   //
              :   // NOTE: Only for test purpose.
              :   Dir* FindDirByFullPathForTests(const std::string& full_path) const;
If this is test-only, is it feasible to move this into the protected/private section, and introduce corresponding friendship for particular tests?


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.cc@134
PS61, Line 134: CHECK_STREQ
It seems DCHECK_STREQ() is used for the same invariant check on line 187.  Does it make sense to switch to DCHECK here as well?

From the other side, maybe these checks will not be necessary at all of using using sub-classing for the Dir class to have a class that has RocksDB specifics?


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager-test-util.cc
File src/kudu/fs/log_block_manager-test-util.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager-test-util.cc@626
PS61, Line 626: CHECK
Why CHECK() here and DCHECK() elsewhere?  Is it possible to switch to DCHECK() everywhere?


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@420
PS61, Line 420: Some initialize work
Could you be more specific what exactly is supposed to be initialized by this method?


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@599
PS61, Line 599: The data in RocksDB will be compacted
BTW, what happens if a tablet server shuts down while the RocksDB background compaction is running?  Could it bring any inconsistency for the RocksDB's data?


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@599
PS61, Line 599: background
in background


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@657
PS61, Line 657: repairing a fatal inconsistency failed
failing to repair an inconsistency

BTW, what's difference between 'inconsistency' and 'fatal inconsistency'?


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@1987
PS61, Line 1987: kEncryptionHeaderSize
nit: why is this name for the variable that's not a static constant?


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc@763
PS61, Line 763: will start fail
will fail starting


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc@763
PS61, Line 763: failed
of the failure


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc@764
PS61, Line 764: cases
case



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 61
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Fri, 22 Dec 2023 23:36:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#54).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space amplification and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block is created
in LBM, and the item will be Delete() from RocksDB when
the block is removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
30 files changed, 1,590 insertions(+), 163 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/54
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 54
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#56).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space amplification and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block is created
in LBM, and the item will be Delete() from RocksDB when
the block is removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,591 insertions(+), 166 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/56
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 56
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#57).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space amplification and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block is created
in LBM, and the item will be Delete() from RocksDB when
the block is removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,591 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/57
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 57
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Wang Xixu (Code Review)" <ge...@cloudera.org>.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 60:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18569/60/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/60/src/kudu/fs/dir_manager.cc@200
PS60, Line 200: Flush
nit: Flushing


http://gerrit.cloudera.org:8080/#/c/18569/60/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/60/src/kudu/fs/log_block_manager.cc@2020
PS60, Line 2020:  CONTAINER_DISK_FAILURE(block_manager_->env_->DeleteFile(data_file_name),
Should also delete the metadata stored RocksDB?



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 60
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Fri, 17 Nov 2023 04:09:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#17).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
44 files changed, 2,166 insertions(+), 589 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/17
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 17
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#21).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
43 files changed, 2,147 insertions(+), 588 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/21
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 21
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#33).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 2,117 insertions(+), 668 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/33
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 33
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#39).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 2,080 insertions(+), 578 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/39
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 39
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has abandoned this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Abandoned

will separeted into small patches
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 28
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 25:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18569/25//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18569/25//COMMIT_MSG@58
PS25, Line 58: - Adds RocksDB as a thirdparty lib
I'd suggest separating this into its own patch.


http://gerrit.cloudera.org:8080/#/c/18569/25/src/kudu/util/oid_generator.h
File src/kudu/util/oid_generator.h:

http://gerrit.cloudera.org:8080/#/c/18569/25/src/kudu/util/oid_generator.h@49
PS25, Line 49: comparation
What's "comparation"?


http://gerrit.cloudera.org:8080/#/c/18569/25/src/kudu/util/oid_generator.cc
File src/kudu/util/oid_generator.cc:

http://gerrit.cloudera.org:8080/#/c/18569/25/src/kudu/util/oid_generator.cc@66
PS25, Line 66: string ObjectIdGenerator::NextOf(const string& id) {
             :   DCHECK(!id.empty());
             :   string next = id;
             :   next[next.size() - 1] += 1;
             :   return next;
             : }
Separate this into its own patch and add tests for this.


http://gerrit.cloudera.org:8080/#/c/18569/25/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

http://gerrit.cloudera.org:8080/#/c/18569/25/thirdparty/build-definitions.sh@1177
PS25, Line 1177:     -DUSE_RTTI=ON
Just curious: is it possible to get away without using RTTI for the embedded RocksDB?  If not, it would be great to add a comment to explain why this is needed.



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 25
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 17 Nov 2022 04:50:52 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#48).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
32 files changed, 1,582 insertions(+), 251 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/48
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 48
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#41).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
28 files changed, 1,819 insertions(+), 362 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/41
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 41
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 62:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG@50
PS61, Line 50: constructed
> nit: constructed
Done


http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG@52
PS61, Line 52: Construct
> nit: Construct
Done


http://gerrit.cloudera.org:8080/#/c/18569/61//COMMIT_MSG@56
PS61, Line 56: logr
> "logr" a little short  and not clear.  use 'rocksdb' directly?
Now there are block manage types named "file" and "log", where "file" type indicate the FileBlockManager which maps each block to its own file on disk, and "log" type indicate the LogBlockManagerNativeMeta which combine tens of thousands of blocks to a shared data file, the data file is a log-backed file (i.e. sequentially allocated file).

The newly introduced type (class LogBlockManagerRdbMeta) is the same to LogBlockManagerNativeMeta when write the data file, the difference between them are the metadata part, while the former store metadata in RocksDB and the latter store metadata in a native file, both of them inherit from class LogBlockManager.

So I keep the "log" part and append "r" to indicate rocksdb. If using "rocksdb", users may misunderstand this is to store all data (both data and metadata) in rocksdb.


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h
File src/kudu/fs/dir_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@49
PS61, Line 49: tus.
> nit: is used only internally
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@109
PS61, Line 109: 
> Does it make sense to use the sub-classing here, creating a class that inhe
I've updated the psatch to do this.

Because the inheritance hierarchy is a bit of complex, if just adding a sub-class, say RdbDir, it would be more complex, the graph show the inheritance hierarchy:

 Dir <-------- DataDir
  ^               ^
  |               |
 RdbDir <---  RdbDataDir

And this will make the RdbDataDir constructor hard to implement [1].
> Since B, C inherit A virtually, A must be constructed in each child

Additionally, there are unique pointers in the constructor's parameter list.
So I made a small refactor before this (merge that if necessary), remove the meaningless DataDir, see [2].

After [1], the inheritance hierarchy would be:

 Dir <--- RdbDir

1. https://en.wikipedia.org/wiki/Virtual_inheritance
2. https://gerrit.cloudera.org/#/c/20833/


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@121
PS61, Line 121: ures s
> nit: returns
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@199
PS61, Line 199: 
> nit: fields
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@201
PS61, Line 201: 
> nit: .
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.h@308
PS61, Line 308:     return failed_dirs_.size() == dirs_.size();
              :   }
              : 
              :   // Return a list of the canonicalized root directory names.
> If this is test-only, is it feasible to move this into the protected/privat
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.cc
File src/kudu/fs/dir_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/dir_manager.cc@134
PS61, Line 134:   return;
> It seems DCHECK_STREQ() is used for the same invariant check on line 187.  
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/fs_report.h
File src/kudu/fs/fs_report.h:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/fs_report.h@291
PS61, Line 291: corrupted_rdb_record_check
> This field seems no used. So add a TODO?
It is used actually, garthered in [1] and repaired in [2].

1. https://gerrit.cloudera.org/c/18569/61/src/kudu/fs/log_block_manager.cc#2054
2. https://gerrit.cloudera.org/c/18569/61/src/kudu/fs/log_block_manager.cc#4031


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager-test-util.cc
File src/kudu/fs/log_block_manager-test-util.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager-test-util.cc@626
PS61, Line 626: auto*
> Why CHECK() here and DCHECK() elsewhere?  Is it possible to switch to DCHEC
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@420
PS61, Line 420: Some initialize work
> Could you be more specific what exactly is supposed to be initialized by th
It depends on the sub-class implementation, LogBlockManagerNativeMeta::InitDataDir() is noop, LogBlockManagerRdbMeta::InitDataDir() needs to initialize something, I've added some comments to the latter.


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@599
PS61, Line 599: in backgro
> in background
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@599
PS61, Line 599: The data in RocksDB will be compacted
> BTW, what happens if a tablet server shuts down while the RocksDB backgroun
When the tserver shut down normally, the RocksDB object will be destoryed in gracefully [1][2] (I've updated the code to call db_->Close() to expose the error messages if any). If the tserver crashed unexpectedly, the RocksDB internal threads (includes the compaction/flush threads) will also exit.
If the server crashed unexpectedly, it will not cause inconsistency. Simply speaking, RocksDB has a mechanism of "VersionSet" (Similar to Kudu's tablet-meta to keep rowsets and block-ids consistency), which maintains a set of data in different "versions", to ensure that different versions of data are correctly linked and referenced, even during compaction and deletion processes. Data files before and after comptation are in different versions, so as long as the old version is valid, the data files are exist and the data is consistency.

1. https://gerrit.cloudera.org/c/18569/61/src/kudu/fs/dir_manager.cc#205
2. https://github.com/facebook/rocksdb/wiki/Basic-Operations#closing-a-database


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.h@657
PS61, Line 657: 
> failing to repair an inconsistency
In fact, I copied this comment from the super class, I will leave the "fatal" word here for clarification.

There are 4 type of inconsistencies in Kudu fs layer, see [1]. The difference is whether they are "fatal"
- For the non-fatal inconsistencies, the repair procedure may fail, but just log it and will not return non-OK to the caller, the server will still run happily
- If the fatal inconsistencies repair failed, non-OK will be returned from DoRepair() to the caller, the Kudu process will exit in this case.

1. https://github.com/apache/kudu/blob/master/src/kudu/fs/fs_report.h#L187


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@1987
PS61, Line 1987: t auto kEncryptionHea
> nit: why is this name for the variable that's not a static constant?
It's copied from LogBlockContainerNativeMeta::CheckContainerFiles().

I've updated to use static constants.


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@2093
PS61, Line 2093:     // Construct key.
> this statement happens several times in this file, may be a simple function
Good idea!

I've updated all related code, including tests.


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/fs/log_block_manager.cc@4035
PS61, Line 4035: ast<RdbDir*>(dir
> Should this key be      id + "." + block_id ?
Sorry it's the bad naming make you misunderstand, the "block_id" should be "rocksdb_key" which is pushed in line 2054. It is the key in rocksdb which is in the form of "<container_id>.<block_id>" already.

I've updated the name.


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc@763
PS61, Line 763: will fail start
> will fail starting
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc@763
PS61, Line 763: se of 
> of the failure
Done


http://gerrit.cloudera.org:8080/#/c/18569/61/src/kudu/tserver/tablet_server-test.cc@764
PS61, Line 764: t it 
> case
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 62
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 26 Dec 2023 03:03:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#61).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are contructed the same to the above.
f. Remove blocks from a container
   Contruct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,610 insertions(+), 166 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/61
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 61
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#71).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are constructed the same to the above.
f. Remove blocks from a container
   Construct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,783 insertions(+), 196 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/71
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 71
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#73).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long bootstrap times.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main difference with
LogBlockContainerNativeMeta is that LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than in a
native file. The main implementation of interfaces from
the base class including:
a. Create a container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open a container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   loading containers during the bootstrap phase.
c. Destroy a container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load a container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are constructed the same to the above.
f. Remove blocks from a container
   Construct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it uses RocksDB
  to store LBM metadata. The new block manager is enabled by setting
  --block_manager=logr.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it shows that
the time spent to re-open tablet server's metadata when 99.99% of all
the records removed reduced about 9.5 times when using
LogBlockContainerRdbMeta instead of LogBlockContainerNativeMeta.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
32 files changed, 1,782 insertions(+), 185 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/73
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 73
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#25).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch introduce RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
1. Make LogBlockManager as a super class
2. The former LBM that stores metadata in a append only
   file, is separeted from LBM and specified a new name
   LogfBlockManager. Its behavior has no change.
3. Introduce a new class LogrBlockManager that stores
   metadata in RocksDB, the main idea:
   a. Create container
      Data file is created as before, metadata is stored
      in keys prefixed by the container's id, append the
      block id, e.g. <container_id>.<block_id>. Make sure
      there is no such keys in RocksDB before this
      container created.
   b. Open container
      Make sure the data file is healthy.
   c. Deconstruct container
      If the container is dead (full and no live blocks),
      remove the data file, and clean up keys prefixed by
      the container's id.
   d. Load container (by ProcessRecords())
      Iterate the RocksDB in the key range
      [<container_id>, <next_container_id>), only live
      block records will be populated, we can use them
      as before.
   e. Create blocks in a container
      Put() serialized BlockRecordPB records into RocksDB
      in batch, keys are in form of
      '<container_id>.<block_id>' as mentioned above.
   f. Remove blocks from a container
      Contruct the keys by container's id and block's
      id, Delete() them from RocksDB in batch.

4. Some refactors, such as create and delete blocks in
   batch to reduce lock consult times.

This patch contains the following changes:
- Adds RocksDB as a thirdparty lib
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
42 files changed, 2,145 insertions(+), 586 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/25
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 25
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#11).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/file_cache.cc
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
40 files changed, 2,159 insertions(+), 454 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/11
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 11
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#26).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch introduce RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
1. Make LogBlockManager as a super class
2. The former LBM that stores metadata in a append only
   file, is separeted from LBM and specified a new name
   LogfBlockManager. Its behavior has no change.
3. Introduce a new class LogrBlockManager that stores
   metadata in RocksDB, the main idea:
   a. Create container
      Data file is created as before, metadata is stored
      in keys prefixed by the container's id, append the
      block id, e.g. <container_id>.<block_id>. Make sure
      there is no such keys in RocksDB before this
      container created.
   b. Open container
      Make sure the data file is healthy.
   c. Deconstruct container
      If the container is dead (full and no live blocks),
      remove the data file, and clean up keys prefixed by
      the container's id.
   d. Load container (by ProcessRecords())
      Iterate the RocksDB in the key range
      [<container_id>, <next_container_id>), only live
      block records will be populated, we can use them
      as before.
   e. Create blocks in a container
      Put() serialized BlockRecordPB records into RocksDB
      in batch, keys are in form of
      '<container_id>.<block_id>' as mentioned above.
   f. Remove blocks from a container
      Contruct the keys by container's id and block's
      id, Delete() them from RocksDB in batch.

4. Some refactors, such as create and delete blocks in
   batch to reduce lock consult times.

This patch contains the following changes:
- Adds RocksDB as a thirdparty lib
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
42 files changed, 2,144 insertions(+), 586 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/26
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 26
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#6).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M CMakeLists.txt
A cmake_modules/FindRocksdb.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/server/CMakeLists.txt
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/oid_generator-test.cc
M src/kudu/util/oid_generator.cc
M src/kudu/util/oid_generator.h
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
36 files changed, 2,143 insertions(+), 499 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/6
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 47:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18569/47//COMMIT_MSG
Commit Message:

PS47: 
Before posting many comments on the code of this patch itself, I think it's important to rule out one major point first.

I looked at this patch briefly in the past, posting some feedback, but it seems I was missing one major issue with RocksDB: its dual GPLv2 and Apache 2.0 license that the RocksDB comes with: https://github.com/facebook/rocksdb/#license.

Even if the project's license states that one _may_ select one of the two when using RocksDB, I think it's better to be safe than sorry [1] and assume the worst, i.e. assume that RocksDB is actually comes just under GPLv2.  Even if we decide to distribute the RocksDB as a part of Kudu (librocksdb) under Apache 2.0, I don't have enough expertise if licensing and patent issues to add my binding +2 for having Apache Kudu coming with librocksdb linked into kudu-master and kudu-tserver binaries because I don't understand potential issues that might arise out of such a dual-licensing approach.

I think that the safest way is to go down the route similar to what has been done in the scope of https://issues.apache.org/jira/browse/KUDU-2990 (memkind/libnuma) and don't link rocksdb library into Kudu code, but rather use dlopen to open the library instead.

Yingchun, what do you think of doing something similar to what's been done w.r.t. using dlopen() instead of direct linkage?  You might take a look at https://github.com/apache/kudu/commit/ba908efa15774bb24b5bfa4ea88915161d1100d1 as a reference.

Thanks a lot!

[1] https://www.apache.org/legal/resolved.html#category-x



-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 47
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Comment-Date: Wed, 23 Aug 2023 19:26:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Wang Xixu (Code Review)" <ge...@cloudera.org>.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 61: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 61
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Fri, 17 Nov 2023 07:40:28 +0000
Gerrit-HasComments: No

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#51).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space wasting and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block created
in LBM, and the item will be Delete()d from RocksDB when
the block removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_test_util.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
31 files changed, 1,531 insertions(+), 153 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/51
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 51
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#55).

Change subject: WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

WIP KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since LogBlockContainer store block records sequentially in
metadata file, the live blocks maybe in a very low ratio,
and it cause disk space amplification and long time bootstrap.

This patch use RocksDB to store LBM metadata, a new
item will be Put() into RocksDB when a new block is created
in LBM, and the item will be Delete() from RocksDB when
the block is removed from LBM. Data in RocksDB can be
maintained in RocksDB itself, i.e. deleted items will be
GCed so doesn't need rewriting as how we do it in current
LBM.

The implemention also reuse most logic of LBM, the main
difference is store Block records metadata in RocksDB.
Introduce a new class LogrBlockManager that stores
metadata in RocksDB, the main idea:
a. Create container
   Data file is created as before, metadata is stored
   in keys prefixed by the container's id, append the
   block id, e.g. <container_id>.<block_id>. Make sure
   there is no such keys in RocksDB before this
   container created.
b. Open container
   Make sure the data file is healthy.
c. Deconstruct container
   If the container is dead (full and no live blocks),
   remove the data file, and clean up keys prefixed by
   the container's id.
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), only live
   block records will be populated, we can use them
   as before.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB
   in batch, keys are in form of
   '<container_id>.<block_id>' as mentioned above.
f. Remove blocks from a container
   Contruct the keys by container's id and block's
   id, Delete() them from RocksDB in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is also specified by flag
  '--block_manager'.
- block_manager-test supports to test LogrBlockManager
- block_manager-stress-test supports to test LogrBlockManager
- log_block_manager-test supports to test LogrBlockManager
- tablet_server-test supports to test LogrBlockManager
- dense_node-itest supports to test LogrBlockManager
- kudu-tool-test supports to test LogrBlockManager

It's optional to use RocksDB, we can use the former LBM as
before, we can introduce more tools to convert data between
the two implemention in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,591 insertions(+), 163 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/55
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 55
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#65).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are constructed the same to the above.
f. Remove blocks from a container
   Construct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,661 insertions(+), 168 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/65
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 65
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#67).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are constructed the same to the above.
f. Remove blocks from a container
   Construct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,682 insertions(+), 169 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/67
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 67
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 73:

Thank you very much for the contribution!


-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 73
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 30 Jan 2024 05:29:21 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................


Patch Set 73: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 73
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 30 Jan 2024 05:28:07 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18569

to look at the new patch set (#70).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata
......................................................................

KUDU-3371 [fs] Use RocksDB to store LBM metadata

Since the LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the live blocks maybe
in a very low ratio, so it may cause serious disk space
amplification and long time bootstrap consumption.

This patch introduces a new class LogBlockContainerRdbMeta
which uses RocksDB to store LBM metadata, a new item will
be Put() into RocksDB when a new block is created in LBM,
and the item will be Delete() from RocksDB when the block
is removed from LBM. Data in RocksDB can be maintained by
RocksDB itself, i.e. deleted items will be GCed so it's not
needed to rewrite the metadata as how we do in
LogBlockContainerNativeMeta.

The implementation also reuses most logic of the base class
LogBlockContainer, the main different with
LogBlockContainerNativeMeta is LogBlockContainerRdbMeta
stores block records metadata in RocksDB rather than a
native file, the main implementation of interfaces from
the base clase including:
a. Create container
   Data file is created similar to LogBlockContainerNativeMeta,
   but the metadata part is stored in RocksDB with keys
   constructed as "<container_id>.<block_id>", and values are
   the same to the records stored in metadata file of
   LogBlockContainerNativeMeta.
b. Open container
   Similar to LogBlockContainerNativeMeta, and it's not needed
   to check the metadata part, because it has been checked when
   load containers when bootstrap.
c. Destroy container
   If the container is dead (full and no live blocks), remove
   the data file, and clean up metadata part, by deleting all
   the keys prefixed by "<container_id>".
d. Load container (by ProcessRecords())
   Iterate the RocksDB in the key range
   [<container_id>, <next_container_id>), because dead blocks
   have been deleted directly, thus only live block records
   will be populated, we can use them as LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB, keys
   are constructed the same to the above.
f. Remove blocks from a container
   Construct the keys same to the above, Delete() them from RocksDB
   in batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', it use RocksDB
  to store LBM metadata, it is specified by flag
  '--block_manager'.
- Related tests add new parameterized value to test the case
  of "--block_manager=logr".

It's optional to use RocksDB, we can use the former LBM as
before, we will introduce more tools to convert data between
the two implementations in the future.

The optimization is obvious as shown in JIRA KUDU-3371, it
shows that reopen staged reduced upto 90% time cost.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
---
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
31 files changed, 1,789 insertions(+), 193 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/70
-- 
To view, visit http://gerrit.cloudera.org:8080/18569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f
Gerrit-Change-Number: 18569
Gerrit-PatchSet: 70
Gerrit-Owner: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>