You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/03/24 19:48:24 UTC

[kudu-CR] log block manager: use sparse hash map for block map

Hello Adar Dembo,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/6471

to review the following change.

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................

log_block_manager: use sparse_hash_map for block map

Based on my testing, this reduces the memory usage of 1M blocks from
~24M to ~9MB. Lookup performance should be similar. Write performance
may be slightly slower but this is relatively infrequent and not on a
hot path.

I also did an allocation trace of a simple program putting 1M <int,int>
pairs into both types of maps. With unordered_map, this resulted in 1M
16-byte allocations and several larger allocations as the hashtable
bucket array grew (largest being 8MB). With the sparse_hash_map, the
allocation sizes were evenly distributed with ~55K allocations at a
bunch of sizes between 16 and 384 bytes, with no large allocations. This
should be more efficient in terms of avoiding memory fragmentation or
longer resizing pauses as blocks are added.

Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
---
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
2 files changed, 4 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/6471/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>

[kudu-CR] log block manager: use sparse hash map for block map

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6471/1/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

Line 31: #include <sparsehash/sparse_hash_map>
Nit: should come after gtest.


-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

[kudu-CR] log block manager: use sparse hash map for block map

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................


Patch Set 4: Verified+1

Test failure was unrelated (put up a fix here: http://gerrit.cloudera.org:8080/6494

-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] log block manager: use sparse hash map for block map

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6471

to look at the new patch set (#3).

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................

log_block_manager: use sparse_hash_map for block map

Based on my testing, this reduces the memory usage of 1M blocks from
~24M to ~9MB. Lookup performance should be similar. Write performance
may be slightly slower but this is relatively infrequent and not on a
hot path.

I also did an allocation trace of a simple program putting 1M <int,int>
pairs into both types of maps. With unordered_map, this resulted in 1M
16-byte allocations and several larger allocations as the hashtable
bucket array grew (largest being 8MB). With the sparse_hash_map, the
allocation sizes were evenly distributed with ~55K allocations at a
bunch of sizes between 16 and 384 bytes, with no large allocations. This
should be more efficient in terms of avoiding memory fragmentation or
longer resizing pauses as blocks are added.

In order for this patch to work on gcc 4.9, I had to patch sparsehash.
gcc 4 doesn't support std::is_trivially_copy_constructible. This adds a
workaround to the sparsehash source to use boost's implementation of the
same when the std:: one is not available.

Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
---
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M thirdparty/download-thirdparty.sh
A thirdparty/patches/sparsehash-0001-Add-compatibily-for-gcc-4.x-in-traits.patch
4 files changed, 68 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/6471/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] log block manager: use sparse hash map for block map

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................


Patch Set 1:

Looks like sparsehash has some issues on gcc 4.8. I put up a pull request against their github to fix.

-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] log block manager: use sparse hash map for block map

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] log block manager: use sparse hash map for block map

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................


log_block_manager: use sparse_hash_map for block map

Based on my testing, this reduces the memory usage of 1M blocks from
~24M to ~9MB. Lookup performance should be similar. Write performance
may be slightly slower but this is relatively infrequent and not on a
hot path.

I also did an allocation trace of a simple program putting 1M <int,int>
pairs into both types of maps. With unordered_map, this resulted in 1M
16-byte allocations and several larger allocations as the hashtable
bucket array grew (largest being 8MB). With the sparse_hash_map, the
allocation sizes were evenly distributed with ~55K allocations at a
bunch of sizes between 16 and 384 bytes, with no large allocations. This
should be more efficient in terms of avoiding memory fragmentation or
longer resizing pauses as blocks are added.

In order for this patch to work on gcc 4.9, I had to patch sparsehash.
gcc 4 doesn't support std::is_trivially_copy_constructible. This adds a
workaround to the sparsehash source to use boost's implementation of the
same.

Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Reviewed-on: http://gerrit.cloudera.org:8080/6471
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M thirdparty/download-thirdparty.sh
A thirdparty/patches/sparsehash-0001-Add-compatibily-for-gcc-4.x-in-traits.patch
4 files changed, 65 insertions(+), 2 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Todd Lipcon: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] log block manager: use sparse hash map for block map

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................


Patch Set 4:

(1 comment)

looks like using the GNUC macro wasn't useful because clang can still be using libstdcxx... just switching to boost entirely, since boost should have the right fallback depending on the compiler version

http://gerrit.cloudera.org:8080/#/c/6471/3/thirdparty/patches/sparsehash-0001-Add-compatibily-for-gcc-4.x-in-traits.patch
File thirdparty/patches/sparsehash-0001-Add-compatibily-for-gcc-4.x-in-traits.patch:

PS3, Line 47:  struct is_relocatable<const T> : is_relocatable<T> {};
            : -}
            : \ No newline at end of file
            : +}
            : -- 
            : 2.7.4
            : 
> Couldn't help yourself? :)
my emacs couldn't help itself


-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] log block manager: use sparse hash map for block map

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6471/3/thirdparty/patches/sparsehash-0001-Add-compatibily-for-gcc-4.x-in-traits.patch
File thirdparty/patches/sparsehash-0001-Add-compatibily-for-gcc-4.x-in-traits.patch:

PS3, Line 47: @@ -52,4 +62,4 @@ struct is_relocatable<std::pair<T, U>>
            :  
            :  template <class T>
            :  struct is_relocatable<const T> : is_relocatable<T> {};
            : -}
            : \ No newline at end of file
            : +}
Couldn't help yourself? :)


-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] log block manager: use sparse hash map for block map

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6471

to look at the new patch set (#2).

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................

log_block_manager: use sparse_hash_map for block map

Based on my testing, this reduces the memory usage of 1M blocks from
~24M to ~9MB. Lookup performance should be similar. Write performance
may be slightly slower but this is relatively infrequent and not on a
hot path.

I also did an allocation trace of a simple program putting 1M <int,int>
pairs into both types of maps. With unordered_map, this resulted in 1M
16-byte allocations and several larger allocations as the hashtable
bucket array grew (largest being 8MB). With the sparse_hash_map, the
allocation sizes were evenly distributed with ~55K allocations at a
bunch of sizes between 16 and 384 bytes, with no large allocations. This
should be more efficient in terms of avoiding memory fragmentation or
longer resizing pauses as blocks are added.

Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
---
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
2 files changed, 6 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/6471/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] log block manager: use sparse hash map for block map

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] log block manager: use sparse hash map for block map

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6471

to look at the new patch set (#4).

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................

log_block_manager: use sparse_hash_map for block map

Based on my testing, this reduces the memory usage of 1M blocks from
~24M to ~9MB. Lookup performance should be similar. Write performance
may be slightly slower but this is relatively infrequent and not on a
hot path.

I also did an allocation trace of a simple program putting 1M <int,int>
pairs into both types of maps. With unordered_map, this resulted in 1M
16-byte allocations and several larger allocations as the hashtable
bucket array grew (largest being 8MB). With the sparse_hash_map, the
allocation sizes were evenly distributed with ~55K allocations at a
bunch of sizes between 16 and 384 bytes, with no large allocations. This
should be more efficient in terms of avoiding memory fragmentation or
longer resizing pauses as blocks are added.

In order for this patch to work on gcc 4.9, I had to patch sparsehash.
gcc 4 doesn't support std::is_trivially_copy_constructible. This adds a
workaround to the sparsehash source to use boost's implementation of the
same.

Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
---
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M thirdparty/download-thirdparty.sh
A thirdparty/patches/sparsehash-0001-Add-compatibily-for-gcc-4.x-in-traits.patch
4 files changed, 65 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/6471/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] log block manager: use sparse hash map for block map

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: log_block_manager: use sparse_hash_map for block map
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6471/1/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

Line 31: #include <sparsehash/sparse_hash_map>
> Nit: should come after gtest.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6471
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2e75deba7f7fb8d4f800695f195304df603e4ce9
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes