You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Yingchun Lai (Code Review)" <ge...@cloudera.org> on 2019/10/15 12:20:07 UTC

[kudu-CR] [tablet] TabletMetadata::CollectBlockIds use list instead of vector

Yingchun Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14445


Change subject: [tablet] TabletMetadata::CollectBlockIds use list instead of vector
......................................................................

[tablet] TabletMetadata::CollectBlockIds use list instead of vector

Container size may reallocate and elements will be copied several times
if use std::vector in TabletMetadata::CollectBlockIds, std::list is more
efficient in this case.

Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
9 files changed, 57 insertions(+), 40 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14445/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 1
Gerrit-Owner: Yingchun Lai <40...@qq.com>

[kudu-CR] [tablet] Use std::deque instead of std::vector to collect blocks

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] Use std::deque instead of std::vector to collect blocks
......................................................................


Patch Set 2:

> Patch Set 1:
> 
> (1 comment)

Thanks adar, your suggestions are very helpful!
I did some simple benchmarks, compare std::deque with std::vector and 
std::list, which shows that std::vector provides a linear time cost
when block count per RowSet increasing, and a exponential time cost
when RowSet count increasing. And std::list provides a linear time
cost when RowSet count and block count per RowSet increasing, the same with std::deque, and deque cost less time than list, about half time saved.
Details as follow:
(The first column is test_row_set_count/test_block_count_per_rs,the other columns show 10 times total tablet_meta_->CollectBlockIds() time cost, in millisecond)
           vector        list   deque                                                                                                                                                                       
1000/1000  3464          687    334
2000/1000  15238         1390   647
4000/1000  75139         3057   1392
8000/1000  328808        6593   2805
4000/2000  159517        6488   2965
4000/4000  348471        11967  5513
4000/8000  -(too long)   23706  11704


-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 2
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Comment-Date: Thu, 14 Nov 2019 03:35:13 +0000
Gerrit-HasComments: No

[kudu-CR] [tablet] Use std::deque instead of std::vector to collect blocks

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] Use std::deque instead of std::vector to collect blocks
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14445/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14445/4//COMMIT_MSG@14
PS4, Line 14: linear time
            : cost
Linear time cost for what operations?

I think it's worth having the benchmark stats table in the commit message with the timings and the description of the operations that you posted in one of the response comments.


http://gerrit.cloudera.org:8080/#/c/14445/4/src/kudu/tablet/tablet_metadata-test.cc
File src/kudu/tablet/tablet_metadata-test.cc:

http://gerrit.cloudera.org:8080/#/c/14445/4/src/kudu/tablet/tablet_metadata-test.cc@219
PS4, Line 219:       LOG(INFO) << "block_ids size: " << block_ids.size();
It would be nice to move this out of the timing scope.


http://gerrit.cloudera.org:8080/#/c/14445/4/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/14445/4/src/kudu/tablet/tablet_metadata.cc@195
PS4, Line 195: deque<BlockId> block_ids;
Did you try to see what if instead of using deques a vector of pre-allocated size is used instead here (i.e. std::vector::reserve(), where the size to reserve is reported by every element from rowsets_)?


http://gerrit.cloudera.org:8080/#/c/14445/4/src/kudu/tablet/tablet_metadata.cc@198
PS4, Line 198: block_ids.begin()
I think this was one of the culprits of the bad performance with std::vector: it it's not fit for inserting into the beginning, because with every insert it moves the elements around.

What are the requirements here?  Why to insert into the beginning of the container at all?  Would it work if using block_ids.end() here instead?



-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 4
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Comment-Date: Thu, 14 Nov 2019 17:43:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Use std::deque instead of std::vector to collect blocks

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] Use std::deque instead of std::vector to collect blocks
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14445/3//COMMIT_MSG
Commit Message:

PS3: 
> Could you summarize your perf results into the commit message?
Done


http://gerrit.cloudera.org:8080/#/c/14445/3/src/kudu/tablet/tablet_metadata-test.cc
File src/kudu/tablet/tablet_metadata-test.cc:

http://gerrit.cloudera.org:8080/#/c/14445/3/src/kudu/tablet/tablet_metadata-test.cc@87
PS3, Line 87: 
            : // Test that loading & storing the superblock results in an equivalent file.
            : TEST_F(TestTabletMetadata, TestLoadFromS
> Couldn't you just reference tablet()->tablet_metadata() to get the already-
Done


http://gerrit.cloudera.org:8080/#/c/14445/3/src/kudu/tablet/tablet_metadata-test.cc@100
PS3, Line 100:   ASSERT_OK(writer_->Insert(*row));
> Nit: ASSERT_OK.
Done


http://gerrit.cloudera.org:8080/#/c/14445/3/src/kudu/tablet/tablet_metadata-test.cc@102
PS3, Line 102: 
> Add a using for this.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 4
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Comment-Date: Thu, 14 Nov 2019 15:18:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Optimize TabletMetadata::CollectBlockIds efficiency

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] Optimize TabletMetadata::CollectBlockIds efficiency
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14445/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14445/6//COMMIT_MSG@23
PS6, Line 23: the other columns show 10 times total TabletMetadata::CollectBlockIds()
> For future readers, perhaps you could also include the list and deque resul
Done


http://gerrit.cloudera.org:8080/#/c/14445/6/src/kudu/fs/block_id.h
File src/kudu/fs/block_id.h:

http://gerrit.cloudera.org:8080/#/c/14445/6/src/kudu/fs/block_id.h@23
PS6, Line 23: #include <iosfwd>
> Probably don't need this anymore.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 7
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Comment-Date: Fri, 15 Nov 2019 06:12:32 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Optimize TabletMetadata::CollectBlockIds efficiency

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] Optimize TabletMetadata::CollectBlockIds efficiency
......................................................................

[tablet] Optimize TabletMetadata::CollectBlockIds efficiency

Container size may reallocate and elements will be moved or copied
several times if using std::vector and inserting elements at the
beginning in TabletMetadata::CollectBlockIds, we'd better to insert
elements at the end.

We did some simple benchmarks for TabletMetadata::CollectBlockIds()
operation in TestTabletMetadata.BenchmarkCollectBlockIds, compared
with old version, result shows that inserting at the end provides a
linear time cost when block count per RowSet or RowSet count
increasing.
And also, we ran benchmarks to compare std::vector with std::list
and std::deque, both of them have worse efficiency.
Result details as follow:
(The first column is FLAGS_test_row_set_count/FLAGS_test_block_count_per_rs,
the other columns show 10 times total TabletMetadata::CollectBlockIds()
time cost, in millisecond.)
           vector          list    deque   vector
           (head insert)                   (rear insert)
1000/1000  3464            687     334     321
2000/1000  15238           1390    647     613
4000/1000  75139           3057    1392    1212
8000/1000  328808          6593    2805    2736
4000/2000  159517          6488    2965    2462
4000/4000  348471          11967   5513    5141
4000/8000  -(too long)     23706   11704   10806

Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Reviewed-on: http://gerrit.cloudera.org:8080/14445
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/fs/block_id.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet_metadata-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
11 files changed, 80 insertions(+), 38 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 8
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>

[kudu-CR] [tablet] Optimize TabletMetadata::CollectBlockIds efficiency

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] Optimize TabletMetadata::CollectBlockIds efficiency
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14445/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14445/4//COMMIT_MSG@14
PS4, Line 14: ectBlockIds()
            : oper
> Linear time cost for what operations?
Done


http://gerrit.cloudera.org:8080/#/c/14445/4/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/14445/4/src/kudu/tablet/tablet_metadata.cc@195
PS4, Line 195:   BlockIdContainer rowset
> Did you try to see what if instead of using deques a vector of pre-allocate
We have to cache all rowsets_'s GetAllBlocks() return values to calculate total block id count to reserve 'block_ids', after some tests, it's worse than not reserve.


http://gerrit.cloudera.org:8080/#/c/14445/4/src/kudu/tablet/tablet_metadata.cc@198
PS4, Line 198: rowset_block_ids.
> I think this was one of the culprits of the bad performance with std::vecto
I searched all of CollectBlockIds()'s caller, didn't find any requirement of inserting at the beginning.
Now I try to modify to insert at the ending, and benchmark show that it's as good as deque, or even better!
I think it's because BlockId just has an uint64_t member, it's easy and fast to allocate/reallocate a large continuous memory space to hold BlockId items.
Now I will revert to use vector and just insert at the ending!



-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Comment-Date: Fri, 15 Nov 2019 04:12:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Optimize TabletMetadata::CollectBlockIds efficiency

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14445

to look at the new patch set (#7).

Change subject: [tablet] Optimize TabletMetadata::CollectBlockIds efficiency
......................................................................

[tablet] Optimize TabletMetadata::CollectBlockIds efficiency

Container size may reallocate and elements will be moved or copied
several times if using std::vector and inserting elements at the
beginning in TabletMetadata::CollectBlockIds, we'd better to insert
elements at the end.

We did some simple benchmarks for TabletMetadata::CollectBlockIds()
operation in TestTabletMetadata.BenchmarkCollectBlockIds, compared
with old version, result shows that inserting at the end provides a
linear time cost when block count per RowSet or RowSet count
increasing.
And also, we ran benchmarks to compare std::vector with std::list
and std::deque, both of them have worse efficiency.
Result details as follow:
(The first column is FLAGS_test_row_set_count/FLAGS_test_block_count_per_rs,
the other columns show 10 times total TabletMetadata::CollectBlockIds()
time cost, in millisecond.)
           vector          list    deque   vector
           (head insert)                   (rear insert)
1000/1000  3464            687     334     321
2000/1000  15238           1390    647     613
4000/1000  75139           3057    1392    1212
8000/1000  328808          6593    2805    2736
4000/2000  159517          6488    2965    2462
4000/4000  348471          11967   5513    5141
4000/8000  -(too long)     23706   11704   10806

Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
---
M src/kudu/fs/block_id.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet_metadata-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
11 files changed, 80 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14445/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 7
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>

[kudu-CR] [tablet] Optimize TabletMetadata::CollectBlockIds efficiency

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14445

to look at the new patch set (#6).

Change subject: [tablet] Optimize TabletMetadata::CollectBlockIds efficiency
......................................................................

[tablet] Optimize TabletMetadata::CollectBlockIds efficiency

Container size may reallocate and elements will be moved or copied
several times if using std::vector and inserting elements at the
beginning in TabletMetadata::CollectBlockIds, we'd better to insert
elements at the end.

We did some simple benchmarks for TabletMetadata::CollectBlockIds()
operation in TestTabletMetadata.BenchmarkCollectBlockIds, compared
with old version, result shows that inserting at the end provides a
linear time cost when block count per RowSet or RowSet count
increasing.
Result details as follow:
(The first column is FLAGS_test_row_set_count/FLAGS_test_block_count_per_rs,
the other columns show 10 times total TabletMetadata::CollectBlockIds()
time cost, in millisecond.)
           old           new
1000/1000  3464          321
2000/1000  15238         613
4000/1000  75139         1212
8000/1000  328808        2736
4000/2000  159517        2462
4000/4000  348471        5141
4000/8000  -(too long)   10806

Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
---
M src/kudu/fs/block_id.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet_metadata-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
11 files changed, 81 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14445/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>

[kudu-CR] [tablet] Use std::deque instead of std::vector to collect blocks

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] Use std::deque instead of std::vector to collect blocks
......................................................................


Patch Set 4:

> Might be nice to typedef a BlockIdContainer (as std::deque<BlockId>) in block_id.h and use it universally. Then if we ever change our minds we only have to update one place.

I think you missed this suggestion.


-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 4
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Comment-Date: Thu, 14 Nov 2019 19:10:26 +0000
Gerrit-HasComments: No

[kudu-CR] [tablet] Use std::deque instead of std::vector to collect blocks

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] Use std::deque instead of std::vector to collect blocks
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14445/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14445/1//COMMIT_MSG@9
PS1, Line 9: Container size may reallocate and elements will be copied several
           : times if use std::vector in TabletMetadata::CollectBlockIds and
           : associate functions, st
> Yeah, but there are downsides to using std::list too:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 2
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Comment-Date: Thu, 14 Nov 2019 03:40:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Optimize TabletMetadata::CollectBlockIds efficiency

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14445

to look at the new patch set (#5).

Change subject: [tablet] Optimize TabletMetadata::CollectBlockIds efficiency
......................................................................

[tablet] Optimize TabletMetadata::CollectBlockIds efficiency

Container size may reallocate and elements will be moved or copied
several times if using std::vector and inserting elements at the
beginning in TabletMetadata::CollectBlockIds, we'd better to insert
elements at the end.

We did some simple benchmarks for TabletMetadata::CollectBlockIds()
operation in TestTabletMetadata.BenchmarkCollectBlockIds, compared
with old version, result shows that inserting at the end provides a
linear time cost when block count per RowSet or RowSet count
increasing.
Result details as follow:
(The first column is FLAGS_test_row_set_count/FLAGS_test_block_count_per_rs,
the other columns show 10 times total TabletMetadata::CollectBlockIds()
time cost, in millisecond.)
           old           new
1000/1000  3464          321
2000/1000  15238         613
4000/1000  75139         1212
8000/1000  328808        2736
4000/2000  159517        2462
4000/4000  348471        5141
4000/8000  -(too long)   10806

Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
---
M src/kudu/fs/block_id.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet_metadata-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
11 files changed, 87 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14445/5
-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 5
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>

[kudu-CR] [tablet] Use std::deque instead of std::vector to collect blocks

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] Use std::deque instead of std::vector to collect blocks
......................................................................


Patch Set 3:

(4 comments)

Might be nice to typedef a BlockIdContainer (as std::deque<BlockId>) in block_id.h and use it universally. Then if we ever change our minds we only have to update one place.

http://gerrit.cloudera.org:8080/#/c/14445/3//COMMIT_MSG
Commit Message:

PS3: 
Could you summarize your perf results into the commit message?


http://gerrit.cloudera.org:8080/#/c/14445/3/src/kudu/tablet/tablet_metadata-test.cc
File src/kudu/tablet/tablet_metadata-test.cc:

http://gerrit.cloudera.org:8080/#/c/14445/3/src/kudu/tablet/tablet_metadata-test.cc@87
PS3, Line 87:     TabletMetadata::Load(harness_->fs_manager(),
            :                          harness_->tablet()->tablet_id(),
            :                          &tablet_meta_);
Couldn't you just reference tablet()->tablet_metadata() to get the already-loaded TabletMetadata?


http://gerrit.cloudera.org:8080/#/c/14445/3/src/kudu/tablet/tablet_metadata-test.cc@100
PS3, Line 100:     CHECK_OK(RowSetMetadata::CreateNew(tablet_meta_.get(), i, &meta));
Nit: ASSERT_OK.


http://gerrit.cloudera.org:8080/#/c/14445/3/src/kudu/tablet/tablet_metadata-test.cc@102
PS3, Line 102: std::
Add a using for this.



-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 3
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Comment-Date: Thu, 14 Nov 2019 06:27:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] TabletMetadata::CollectBlockIds use list instead of vector

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] TabletMetadata::CollectBlockIds use list instead of vector
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14445/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14445/1//COMMIT_MSG@9
PS1, Line 9: Container size may reallocate and elements will be copied several times
           : if use std::vector in TabletMetadata::CollectBlockIds, std::list is more
           : efficient in this case.
Yeah, but there are downsides to using std::list too:
- More internal heap fragmentation due to per-block allocations.
- Allocation/deallocation overhead for the addition/removal of one entry.
- Iterating the list means traversing a bunch of different heap allocations, which is bad for CPU cache locality.

I'd expect a std::deque to be a better choice. https://stackoverflow.com/questions/18449038/is-there-ever-a-reason-to-use-stdlist (and the linked questions) has some more information.

Either way, you need to demonstrate that this change yields an improvement. The trade-offs are subtle and may fluctuate depending on number of blocks in question. Is there a benchmark you could run?



-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 1
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 15 Oct 2019 20:44:57 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Optimize TabletMetadata::CollectBlockIds efficiency

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] Optimize TabletMetadata::CollectBlockIds efficiency
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 7
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Comment-Date: Fri, 15 Nov 2019 20:33:58 +0000
Gerrit-HasComments: No

[kudu-CR] [tablet] Use std::deque instead of std::vector to collect blocks

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] Use std::deque instead of std::vector to collect blocks
......................................................................


Patch Set 4:

(1 comment)

> Patch Set 4:
> 
> > Might be nice to typedef a BlockIdContainer (as std::deque<BlockId>) in block_id.h and use it universally. Then if we ever change our minds we only have to update one place.
> 
> I think you missed this suggestion.

Missed it indeed :(  Done.

http://gerrit.cloudera.org:8080/#/c/14445/4/src/kudu/tablet/tablet_metadata-test.cc
File src/kudu/tablet/tablet_metadata-test.cc:

http://gerrit.cloudera.org:8080/#/c/14445/4/src/kudu/tablet/tablet_metadata-test.cc@219
PS4, Line 219:       LOG(INFO) << "block_ids size: " << block_ids.size();
> It would be nice to move this out of the timing scope.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 4
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Comment-Date: Fri, 15 Nov 2019 02:10:21 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Use std::deque instead of std::vector to collect blocks

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14445

to look at the new patch set (#3).

Change subject: [tablet] Use std::deque instead of std::vector to collect blocks
......................................................................

[tablet] Use std::deque instead of std::vector to collect blocks

Container size may reallocate and elements will be copied several
times if use std::vector in TabletMetadata::CollectBlockIds and
associate functions, std::deque is more efficient in this case.

Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet_metadata-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
10 files changed, 109 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14445/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 3
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>

[kudu-CR] [tablet] Optimize TabletMetadata::CollectBlockIds efficiency

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14445 )

Change subject: [tablet] Optimize TabletMetadata::CollectBlockIds efficiency
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14445/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14445/6//COMMIT_MSG@23
PS6, Line 23:            old           new
For future readers, perhaps you could also include the list and deque results? That way the table is comprehensive and we can see exactly how each data structure fared.


http://gerrit.cloudera.org:8080/#/c/14445/6/src/kudu/fs/block_id.h
File src/kudu/fs/block_id.h:

http://gerrit.cloudera.org:8080/#/c/14445/6/src/kudu/fs/block_id.h@23
PS6, Line 23: #include <deque>
Probably don't need this anymore.



-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>
Gerrit-Comment-Date: Fri, 15 Nov 2019 04:26:42 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Use std::deque instead of std::vector to collect blocks

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14445

to look at the new patch set (#2).

Change subject: [tablet] Use std::deque instead of std::vector to collect blocks
......................................................................

[tablet] Use std::deque instead of std::vector to collect blocks

Container size may reallocate and elements will be copied several
times if use std::vector in TabletMetadata::CollectBlockIds and
associate functions, std::deque is more efficient in this case.

Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet_metadata-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
10 files changed, 109 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14445/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 2
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [tablet] Use std::deque instead of std::vector to collect blocks

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14445

to look at the new patch set (#4).

Change subject: [tablet] Use std::deque instead of std::vector to collect blocks
......................................................................

[tablet] Use std::deque instead of std::vector to collect blocks

Container size may reallocate and elements will be copied several
times if use std::vector in TabletMetadata::CollectBlockIds and
associate functions, std::deque is more efficient in this case.

We did some simple benchmarks, compared std::deque with std::vector
and std::list, which show that std::vector provides a linear time
cost when block count per RowSet increasing, and an exponential time
cost when RowSet count increasing. However, std::deque and std::list
provide a linear time cost when RowSet count or block count per RowSet
increasing, further more, std::deque use about half time of std::list.

Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet_metadata-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
10 files changed, 94 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14445/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ce853e35eb7dfa9f9a099e465ea8edfaa7c4aa9
Gerrit-Change-Number: 14445
Gerrit-PatchSet: 4
Gerrit-Owner: Yingchun Lai <40...@qq.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>