You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/03/24 01:50:18 UTC

[kudu-CR] cfile set: reduce memory usage of reader map

Hello Adar Dembo,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/6466

to review the following change.

Change subject: cfile_set: reduce memory usage of reader map
......................................................................

cfile_set: reduce memory usage of reader map

We previously used an unordered_map<int, shared_ptr<...>>. This patch
changes to using boost's flat_map container, which is based on sorted
entries in a vector. This means we get one contiguous memory location vs
a bunch of smaller per-item allocations in the separate-chained
hashtable. Additionally switches to unique_ptr which avoids an extra
16-byte control structure per CFileReader.

It's hard to measure the exact win here, but I wrote a little test
program which compared inserting 40 entries into the two types of maps,
leaking the maps, and looking at the LeakSanitizer reports. With the old
map and shared_ptr, it cost 2832 bytes, and with the flat_map and
unique_ptr, it only took 824 bytes. So, we saved about 50 bytes per
entry.

Extrapolating this out to a server with 600k cfiles open, we'll save
around 28MB. Not huge, but worth doing.

Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
---
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
2 files changed, 7 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6466/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>

[kudu-CR] cfile set: reduce memory usage of reader map

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: cfile_set: reduce memory usage of reader map
......................................................................


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] cfile set: reduce memory usage of reader map

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6466

to look at the new patch set (#2).

Change subject: cfile_set: reduce memory usage of reader map
......................................................................

cfile_set: reduce memory usage of reader map

We previously used an unordered_map<int, shared_ptr<...>>. This patch
changes to using boost's flat_map container, which is based on sorted
entries in a vector. This means we get one contiguous memory location vs
a bunch of smaller per-item allocations in the separate-chained
hashtable. Additionally switches to unique_ptr which avoids an extra
16-byte control structure per CFileReader.

It's hard to measure the exact win here, but I wrote a little test
program which compared inserting 40 entries into the two types of maps,
leaking the maps, and looking at the LeakSanitizer reports. With the old
map and shared_ptr, it cost 2832 bytes, and with the flat_map and
unique_ptr, it only took 824 bytes. So, we saved about 50 bytes per
entry.

Extrapolating this out to a server with 600k cfiles open, we'll save
around 28MB. Not huge, but worth doing.

Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
---
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
2 files changed, 9 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6466/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] cfile set: reduce memory usage of reader map

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: cfile_set: reduce memory usage of reader map
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6466/1/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

Line 103:     gscoped_ptr<CFileReader> reader;
Can you convert to unique_ptr in the affected call sites? Or does that spill out all over the place?


http://gerrit.cloudera.org:8080/#/c/6466/1/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

Line 121:   // Map of column ID to reader. These are lazily initialized as needed.
May want to leave a comment justifying flat_map, since it's somewhat unorthodox for us.


PS1, Line 122:  >
Nit: eliminate empty space.


-- 
To view, visit http://gerrit.cloudera.org:8080/6466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

[kudu-CR] cfile set: reduce memory usage of reader map

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: cfile_set: reduce memory usage of reader map
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6466/2/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

PS2, Line 122: flat_hash_map
Not "flat_map"?


-- 
To view, visit http://gerrit.cloudera.org:8080/6466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] cfile set: reduce memory usage of reader map

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6466

to look at the new patch set (#3).

Change subject: cfile_set: reduce memory usage of reader map
......................................................................

cfile_set: reduce memory usage of reader map

We previously used an unordered_map<int, shared_ptr<...>>. This patch
changes to using boost's flat_map container, which is based on sorted
entries in a vector. This means we get one contiguous memory location vs
a bunch of smaller per-item allocations in the separate-chained
hashtable. Additionally switches to unique_ptr which avoids an extra
16-byte control structure per CFileReader.

It's hard to measure the exact win here, but I wrote a little test
program which compared inserting 40 entries into the two types of maps,
leaking the maps, and looking at the LeakSanitizer reports. With the old
map and shared_ptr, it cost 2832 bytes, and with the flat_map and
unique_ptr, it only took 824 bytes. So, we saved about 50 bytes per
entry.

Extrapolating this out to a server with 600k cfiles open, we'll save
around 28MB. Not huge, but worth doing.

Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
---
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
2 files changed, 9 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6466/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] cfile set: reduce memory usage of reader map

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: cfile_set: reduce memory usage of reader map
......................................................................


cfile_set: reduce memory usage of reader map

We previously used an unordered_map<int, shared_ptr<...>>. This patch
changes to using boost's flat_map container, which is based on sorted
entries in a vector. This means we get one contiguous memory location vs
a bunch of smaller per-item allocations in the separate-chained
hashtable. Additionally switches to unique_ptr which avoids an extra
16-byte control structure per CFileReader.

It's hard to measure the exact win here, but I wrote a little test
program which compared inserting 40 entries into the two types of maps,
leaking the maps, and looking at the LeakSanitizer reports. With the old
map and shared_ptr, it cost 2832 bytes, and with the flat_map and
unique_ptr, it only took 824 bytes. So, we saved about 50 bytes per
entry.

Extrapolating this out to a server with 600k cfiles open, we'll save
around 28MB. Not huge, but worth doing.

Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
Reviewed-on: http://gerrit.cloudera.org:8080/6466
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
2 files changed, 9 insertions(+), 6 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] cfile set: reduce memory usage of reader map

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: cfile_set: reduce memory usage of reader map
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6466/1/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

Line 103:     gscoped_ptr<CFileReader> reader;
> Can you convert to unique_ptr in the affected call sites? Or does that spil
probably the latter, but I'll give it a go in a second patch on top of this one to keep the patches focused


http://gerrit.cloudera.org:8080/#/c/6466/1/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

Line 121:   // Map of column ID to reader. These are lazily initialized as needed.
> May want to leave a comment justifying flat_map, since it's somewhat unorth
Done


PS1, Line 122:  >
> Nit: eliminate empty space.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3812861a11b00e149ca47b090447bc918c465ad4
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes