You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/03/23 04:30:06 UTC

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Hello David Ribeiro Alves, Andrew Wong,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/6460

to review the following change.

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................

generic_iterators: avoid holding completed iters in merge/union

Previously, the MergeIterator and UnionIterator would hold on to all of
the iterators they consume until the entire process completed. Each such
iterator may hold on to blocks in the block cache (eg dictionary blocks)
and consume a fair amount of memory.

Without this patch, I noticed that issuing a scan caused the block cache
consumed memory to slowly increase untl the scan finished, at which
point it would quickly drop. With the patch, I no longer see this
behavior.

The one feature lost here is the stringification of MergeIterator which
previously included all of the sub-iterators. We only used this in one
place in a VLOG(1) message anyway, and for typical merges across
thousands of DRS iterators, this wasn't likely to be particularly useful
anyway.

Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
---
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
2 files changed, 48 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6460/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6460/1//COMMIT_MSG
Commit Message:

PS1, Line 16: With the patch, I no longer see this
            : behavior.
> what is the behavior? memory consumption is constant?
think you forgot this comment. What is the current behavior? memory consumption drops slowly as the scan progresses?


-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6460/1/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

Line 325:   iters_.assign(iters.begin(), iters.end());
> iters_ here is a std::deque whereas 'iters' is a vector, so we have to use 
I see, so we're avoiding extra shared_ptr copies by not tracking all_iters, and this gives us the result we want.


http://gerrit.cloudera.org:8080/#/c/6460/1/src/kudu/common/generic_iterators.h
File src/kudu/common/generic_iterators.h:

Line 43:   // a subset of the columns in 'iters'.
> well, since C++11 we've actually moved more towards this style for argument
So we're copying the input iterators because each subiterator is a shared_ptr, and this allows each one to be cleaned up once used.

I'm not seeing where orig_iters_ stops holding the subiterators. Even if it's pushing onto iters_, don't the shared_ptrs still exist in orig_ptrs_?


-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6460/1//COMMIT_MSG
Commit Message:

PS1, Line 15: ntl
nit: until


http://gerrit.cloudera.org:8080/#/c/6460/1/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

Line 325:   iters_.assign(iters.begin(), iters.end());
Would it make sense to std::move(iters) here as well? Do we not do this to preserve ToString, or is memory consumption not as prominent here?


http://gerrit.cloudera.org:8080/#/c/6460/1/src/kudu/common/generic_iterators.h
File src/kudu/common/generic_iterators.h:

Line 43:   // a subset of the columns in 'iters'.
Not sure if this is the right place, but it might be nice to note the reason for not treating these like "normal"  (const reference) inputs. Might not be immediately obvious why we're breaking convention.

Something like:
"Initialization of this class transfers ownership of the schema and iterators so memory allocated for individual subiterators can be freed once used."


-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6460/1//COMMIT_MSG
Commit Message:

PS1, Line 16: With the patch, I no longer see this
            : behavior.
what is the behavior? memory consumption is constant?


http://gerrit.cloudera.org:8080/#/c/6460/1/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

Line 54: 
extra line


Line 65: 
nit: mind making the spacing above/below the anonymous namespace consistent?


http://gerrit.cloudera.org:8080/#/c/6460/1/src/kudu/common/generic_iterators.h
File src/kudu/common/generic_iterators.h:

PS1, Line 78: vector<IteratorStats> finished_iter_stats_by_col_;
docs


PS1, Line 124: void PopFront();
docs


PS1, Line 131: vector<IteratorStats> finished_iter_stats_by_col_;
docs


-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6460/1/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

Line 54: 
> extra line
Done


Line 65: 
> nit: mind making the spacing above/below the anonymous namespace consistent
Done


Line 176:     : schema_(std::move(schema)),
> warning: passing result of std::move() as a const reference argument; no mo
tried to get too fancy here. went back to old code


Line 325:   iters_.assign(iters.begin(), iters.end());
> I see, so we're avoiding extra shared_ptr copies by not tracking all_iters,
yea, the idea is that as we pop off elements from iters_, that is hopefully the last remaining reference to the iter, so it gets to destruct as it goes.


http://gerrit.cloudera.org:8080/#/c/6460/1/src/kudu/common/generic_iterators.h
File src/kudu/common/generic_iterators.h:

Line 43:   // a subset of the columns in 'iters'.
> So we're copying the input iterators because each subiterator is a shared_p
Good catch, I'd missed the critical part of the patch for MergeIterator which actually clears orig_iters_! (I did testing with UnionIterator). Fixed in next rev.


PS1, Line 78: vector<IteratorStats> finished_iter_stats_by_col_;
> docs
Done


PS1, Line 124: void PopFront();
> docs
Done


PS1, Line 131: vector<IteratorStats> finished_iter_stats_by_col_;
> docs
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6460

to look at the new patch set (#3).

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................

generic_iterators: avoid holding completed iters in merge/union

Previously, the MergeIterator and UnionIterator would hold on to all of
the iterators they consume until the entire process completed. Each such
iterator may hold on to blocks in the block cache (eg dictionary blocks)
and consume a fair amount of memory.

Without this patch, I noticed that issuing a scan caused the block cache
consumed memory to slowly increase untl the scan finished, at which
point it would quickly drop. With the patch, I no longer see this
behavior.

The one feature lost here is the stringification of MergeIterator which
previously included all of the sub-iterators. We only used this in one
place in a VLOG(1) message anyway, and for typical merges across
thousands of DRS iterators, this wasn't likely to be particularly useful
anyway.

Along the way, I did a bit of cleanup in cfile_set.cc to use unique_ptr
instead of pre-C++11 raw pointers.

Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
---
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/integration-tests/update_scan_delta_compact-test.cc
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/tablet.cc
7 files changed, 106 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6460/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6460/1//COMMIT_MSG
Commit Message:

PS1, Line 16: With the patch, I no longer see this
            : behavior.
> think you forgot this comment. What is the current behavior? memory consump
oops, yea, missed the commit message comments.

Memory is now constant for UnionIterator (because we lazy-init each iter when we get to it)


-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6460

to look at the new patch set (#4).

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................

generic_iterators: avoid holding completed iters in merge/union

Previously, the MergeIterator and UnionIterator would hold on to all of
the iterators they consume until the entire process completed. Each such
iterator may hold on to blocks in the block cache (eg dictionary blocks)
and consume a fair amount of memory.

Without this patch, I noticed that issuing a scan caused the block cache
consumed memory to slowly increase until the scan finished, at which
point it would quickly drop. With the patch, I no longer see this
behavior (memory usage is constant).

The one feature lost here is the stringification of MergeIterator which
previously included all of the sub-iterators. We only used this in one
place in a VLOG(1) message anyway, and for typical merges across
thousands of DRS iterators, this wasn't likely to be particularly useful
anyway.

Along the way, I did a bit of cleanup in cfile_set.cc to use unique_ptr
instead of pre-C++11 raw pointers.

Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
---
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/integration-tests/update_scan_delta_compact-test.cc
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/tablet.cc
7 files changed, 106 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6460/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6460/1/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

Line 325:   iters_.assign(iters.begin(), iters.end());
> Would it make sense to std::move(iters) here as well? Do we not do this to 
iters_ here is a std::deque whereas 'iters' is a vector, so we have to use the assign() to copy the elements rather than just move them


http://gerrit.cloudera.org:8080/#/c/6460/1/src/kudu/common/generic_iterators.h
File src/kudu/common/generic_iterators.h:

Line 43:   // a subset of the columns in 'iters'.
> Not sure if this is the right place, but it might be nice to note the reaso
well, since C++11 we've actually moved more towards this style for arguments which end up copied inside the implementation. This style is pass-by-copy, but then the copy itself is moved (note it's not an rvalue-reference here). So, the caller isn't actually transferring ownership unless they do std::move() themselves.


-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................


Patch Set 2:

ruh roh, i introduced a bug. will upload a new rev in a bit

-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6460

to look at the new patch set (#2).

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................

generic_iterators: avoid holding completed iters in merge/union

Previously, the MergeIterator and UnionIterator would hold on to all of
the iterators they consume until the entire process completed. Each such
iterator may hold on to blocks in the block cache (eg dictionary blocks)
and consume a fair amount of memory.

Without this patch, I noticed that issuing a scan caused the block cache
consumed memory to slowly increase untl the scan finished, at which
point it would quickly drop. With the patch, I no longer see this
behavior.

The one feature lost here is the stringification of MergeIterator which
previously included all of the sub-iterators. We only used this in one
place in a VLOG(1) message anyway, and for typical merges across
thousands of DRS iterators, this wasn't likely to be particularly useful
anyway.

Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
---
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/tablet.cc
4 files changed, 73 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6460/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] generic iterators: avoid holding completed iters in merge/union

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: generic_iterators: avoid holding completed iters in merge/union
......................................................................


generic_iterators: avoid holding completed iters in merge/union

Previously, the MergeIterator and UnionIterator would hold on to all of
the iterators they consume until the entire process completed. Each such
iterator may hold on to blocks in the block cache (eg dictionary blocks)
and consume a fair amount of memory.

Without this patch, I noticed that issuing a scan caused the block cache
consumed memory to slowly increase until the scan finished, at which
point it would quickly drop. With the patch, I no longer see this
behavior (memory usage is constant).

The one feature lost here is the stringification of MergeIterator which
previously included all of the sub-iterators. We only used this in one
place in a VLOG(1) message anyway, and for typical merges across
thousands of DRS iterators, this wasn't likely to be particularly useful
anyway.

Along the way, I did a bit of cleanup in cfile_set.cc to use unique_ptr
instead of pre-C++11 raw pointers.

Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Reviewed-on: http://gerrit.cloudera.org:8080/6460
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
---
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/integration-tests/update_scan_delta_compact-test.cc
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/tablet.cc
7 files changed, 106 insertions(+), 82 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8e068c256c5855efeff5c6325dbb5fbf770eb900
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>