You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2019/12/12 17:51:16 UTC

[kudu-CR] WIP [master] KUDU-3016 don't lump together updates on all tablet reports

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14897


Change subject: WIP [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................

WIP [master] KUDU-3016 don't lump together updates on all tablet reports

WIP:
  * collect feedback on the approach used
  * add a test

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
---
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
6 files changed, 315 insertions(+), 146 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/14897/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] WIP [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: WIP [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14897/1/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/1/src/kudu/master/catalog_manager.cc@4282
PS1, Line 4282:         std::move(actions), FLAGS_consensus_max_batch_size_bytes);
> Yes, sxactly -- the default value for --consensus_max_batch_size_bytes is j
Yeah I don't know why consensus_max_batch_size_bytes exists, or why it is as low as it is. I couldn't find anything useful about it in git history or in the original code review. Maybe ask Todd?



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 13 Dec 2019 00:29:35 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 flag for chunking tablet report updates

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14897

to look at the new patch set (#9).

Change subject: [master] KUDU-3016 flag for chunking tablet report updates
......................................................................

[master] KUDU-3016 flag for chunking tablet report updates

This patch introduces a flag to chunk updates on the system tablet
generated by master while processing tablet reports.  When the flag
is set to 'false' (that's the default setting), masters reject tablet
reports which would lead to oversized write requests to the system
tablet.  When the flag is set to 'true', masters chunk the updates
on the system tablets which otherwise would be oversized.  With either
setting, masters avoid hitting the maximum RPC size limit while pushing
corresponding Raft updates on the system tablet to follower masters.

A test is added to reproduce the scenario described in KUDU-3016.
In the test scenario, the average size of incoming TSHeartbeat RPC
is about 28 KB in size, while the corresponding WriteRequestPB is
over 70 KB in size.

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
---
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/integration-tests/master_replication-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
7 files changed, 497 insertions(+), 169 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/14897/9
-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [master] KUDU-3016 flag for chunking tablet report updates

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 flag for chunking tablet report updates
......................................................................


Patch Set 11: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 11
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 15 Jan 2020 00:08:51 +0000
Gerrit-HasComments: No

[kudu-CR] [master] KUDU-3016 flag for chunking tablet report updates

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14897

to look at the new patch set (#11).

Change subject: [master] KUDU-3016 flag for chunking tablet report updates
......................................................................

[master] KUDU-3016 flag for chunking tablet report updates

This patch introduces a flag to chunk updates on the system tablet
generated by master while processing tablet reports.  When the flag
is set to 'true' (that's the default setting), masters chunk the updates
on the system tablets which otherwise would be oversized.  When the flag
is set to 'false', masters reject tablet reports which would lead
to the oversized write requests on the system tablet.  With either
setting, masters avoid hitting the maximum RPC size limit while pushing
corresponding Raft updates on the system tablet to follower masters.

A test is added to reproduce the scenario described in KUDU-3016.
In the test scenario, the average size of incoming TSHeartbeat RPC
is about 28 KB in size, while the corresponding WriteRequestPB is
over 70 KB in size.

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
---
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/integration-tests/master_replication-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
7 files changed, 501 insertions(+), 169 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/14897/11
-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 11
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@838
PS6, Line 838:   size_t req_size = req->ByteSizeLong();
> Yup, it seems calling ByteSizeLong() every iteration is expensive since it'
Oh I missed that; yeah something like Pop() seems like it'd be OK.



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 10 Jan 2020 22:24:50 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 flag for chunking tablet report updates

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 flag for chunking tablet report updates
......................................................................

[master] KUDU-3016 flag for chunking tablet report updates

This patch introduces a flag to chunk updates on the system tablet
generated by master while processing tablet reports.  When the flag
is set to 'true' (that's the default setting), masters chunk the updates
on the system tablets which otherwise would be oversized.  When the flag
is set to 'false', masters reject tablet reports which would lead
to the oversized write requests on the system tablet.  With either
setting, masters avoid hitting the maximum RPC size limit while pushing
corresponding Raft updates on the system tablet to follower masters.

A test is added to reproduce the scenario described in KUDU-3016.
In the test scenario, the average size of incoming TSHeartbeat RPC
is about 28 KB in size, while the corresponding WriteRequestPB is
over 70 KB in size.

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Reviewed-on: http://gerrit.cloudera.org:8080/14897
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/integration-tests/master_replication-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
7 files changed, 501 insertions(+), 169 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 12
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14897

to look at the new patch set (#3).

Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................

[master] KUDU-3016 don't lump together updates on all tablet reports

This patch introduces chunked write of tablet reports into the
system table to avoid hitting the maximum RPC size limit while pushing
corresponding Raft updates on the system tablet to follower masters.

A test is added to reproduce the scenario described in KUDU-3016.
In the test scenario, the average size of incoming TSHeartbeat RPC
is about 28 KB in size, while the corresponding WriteRequestPB
gets over 70 KB in size.

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
---
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/rpc/serialization.cc
M src/kudu/rpc/transfer.cc
9 files changed, 609 insertions(+), 184 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/14897/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14897

to look at the new patch set (#6).

Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................

[master] KUDU-3016 don't lump together updates on all tablet reports

This patch introduces chunked write of tablet reports into the
system table to avoid hitting the maximum RPC size limit while pushing
corresponding Raft updates on the system tablet to follower masters.

A test is added to reproduce the scenario described in KUDU-3016.
In the test scenario, the average size of incoming TSHeartbeat RPC
is about 28 KB in size, while the corresponding WriteRequestPB
gets over 70 KB in size.

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
---
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/rpc/serialization.cc
M src/kudu/rpc/transfer.cc
9 files changed, 630 insertions(+), 186 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/14897/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] WIP [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: WIP [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................


Patch Set 1:

(2 comments)

Seems like a reasonable approach to me. I'd like to understand the perf impact though; could you deploy it to a cluster with a large number of replicas and gauge the impact on a beefy TSHeartbeat?

http://gerrit.cloudera.org:8080/#/c/14897/1/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/1/src/kudu/master/catalog_manager.cc@4282
PS1, Line 4282:         std::move(actions), FLAGS_consensus_max_batch_size_bytes);
This seems rather low (1 MB). How does the consensus queue behave if this is value exceeds --consensus_max_batch_size_bytes? Seems like it'll read just one message from the log cache as that message reflects this Write, right?

Is the concern with using --rpc_max_message_size that there's enough bookkeeping overhead that we could very well exceed the maximum transfer size even if the underlying chunking respects it?


http://gerrit.cloudera.org:8080/#/c/14897/1/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/14897/1/src/kudu/master/sys_catalog.cc@508
PS1, Line 508:       generator(*this, max_chunk_size, std::move(input), &left, req);
The multi-level lambda stuff makes this tricky to read. Could you make chunked_writer a private SysCatalogTable function instead of a lambda?



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 12 Dec 2019 18:05:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 flag for chunking tablet report updates

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 flag for chunking tablet report updates
......................................................................


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc@4305
PS6, Line 4305: 
              :     const SysCatalogTable::WriteMode write_mode =
> Right; I was asking from a performance standpoint though: if chunk N fails 
It seems that according to this code

https://github.com/apache/kudu/blob/0d7ce6906d42a17a7cfabc958e672ddc39e9ea7b/src/kudu/master/sys_catalog.cc#L830-L835

tablets which were updated by the visitor after first chunk had come through are not generating any updates on already updated rows upon processing the same tablet report again.


http://gerrit.cloudera.org:8080/#/c/14897/9/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/9/src/kudu/master/catalog_manager.cc@282
PS9, Line 282: DEFINE_bool(catalog_manager_enable_chunked_tablet_reports, false,
> Why shouldn't this default to true from day one?
During offline discussion with the team I heard a concern w.r.t. the code paths when a chunked update fails in the middle.  Also, simply rejecting whole tablet report seems to be in-line with rejecting DDL requests under similar conditions.

However, after giving it another thought, I think that 100% rejecting a whole tablet report is even worse because there is no way the information in the system tablet can become consistent without updating the rpc_max_message_size, and it's hard to see the rejections are happening unless actively monitoring the newly introduced 'sys_catalog_oversized_write_requests' metric.

Let's make it enabled by default.

Thanks!


http://gerrit.cloudera.org:8080/#/c/14897/9/src/kudu/master/catalog_manager.cc@4307
PS9, Line 4307:         PREDICT_FALSE(FLAGS_catalog_manager_enable_chunked_tablet_reports)
> PREDICT_FALSE seems like it'd become stale if/when the gflag's default valu
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 14 Jan 2020 06:43:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14897

to look at the new patch set (#7).

Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................

[master] KUDU-3016 don't lump together updates on all tablet reports

This patch introduces chunked write of tablet reports into the
system table to avoid hitting the maximum RPC size limit while pushing
corresponding Raft updates on the system tablet to follower masters.

A test is added to reproduce the scenario described in KUDU-3016.
In the test scenario, the average size of incoming TSHeartbeat RPC
is about 28 KB in size, while the corresponding WriteRequestPB
gets over 70 KB in size.

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
---
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/integration-tests/master_replication-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/rpc/serialization.cc
M src/kudu/rpc/transfer.cc
9 files changed, 583 insertions(+), 168 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/14897/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [master] KUDU-3016 flag for chunking tablet report updates

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14897

to look at the new patch set (#10).

Change subject: [master] KUDU-3016 flag for chunking tablet report updates
......................................................................

[master] KUDU-3016 flag for chunking tablet report updates

This patch introduces a flag to chunk updates on the system tablet
generated by master while processing tablet reports.  When the flag
is set to 'true' (that's the default setting), masters chunk the updates
on the system tablets which otherwise would be oversized.  When the flag
is set to 'false', masters reject tablet reports which would lead
to the oversized write requests on the system tablet.  With either
setting, masters avoid hitting the maximum RPC size limit while pushing
corresponding Raft updates on the system tablet to follower masters.

A test is added to reproduce the scenario described in KUDU-3016.
In the test scenario, the average size of incoming TSHeartbeat RPC
is about 28 KB in size, while the corresponding WriteRequestPB is
over 70 KB in size.

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
---
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/integration-tests/master_replication-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
7 files changed, 501 insertions(+), 169 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/14897/10
-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 10
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14897

to look at the new patch set (#5).

Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................

[master] KUDU-3016 don't lump together updates on all tablet reports

This patch introduces chunked write of tablet reports into the
system table to avoid hitting the maximum RPC size limit while pushing
corresponding Raft updates on the system tablet to follower masters.

A test is added to reproduce the scenario described in KUDU-3016.
In the test scenario, the average size of incoming TSHeartbeat RPC
is about 28 KB in size, while the corresponding WriteRequestPB
gets over 70 KB in size.

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
---
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/rpc/serialization.cc
M src/kudu/rpc/transfer.cc
9 files changed, 635 insertions(+), 186 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/14897/5
-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [master] KUDU-3016 flag for chunking tablet report updates

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 flag for chunking tablet report updates
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14897/8/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/14897/8/src/kudu/master/sys_catalog.cc@533
PS8, Line 533: 
> Not used?
Done


http://gerrit.cloudera.org:8080/#/c/14897/8/src/kudu/master/sys_catalog.cc@933
PS8, Line 933: 
> generated
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 14 Jan 2020 05:36:01 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: WIP [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14897/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14897/2//COMMIT_MSG@9
PS2, Line 9: WIP: I haven't gotten a good repro scenario for conditions described
           :      in KUDU-3016.  It seems the assumption of receiving a huge
           :      full tablet report was not the case or there should be many more
           :      replicas per tablet server to trigger that.  The latter seems
           :      to be unlikely, so probably the huge replicated Raft message
           :      from leader master to followers was triggered by something else,
           :      like an DDL operation altering too many tables at once..
Or maybe making an enormous range partition alteration to one table? Like, adding a thousand partitions in one DDL or something astronomical like that?



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 30 Dec 2019 18:48:09 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 flag for chunking tablet report updates

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 flag for chunking tablet report updates
......................................................................


Patch Set 9:

(2 comments)

There's still an outstanding question from PS6.

http://gerrit.cloudera.org:8080/#/c/14897/9/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/9/src/kudu/master/catalog_manager.cc@282
PS9, Line 282: DEFINE_bool(catalog_manager_enable_chunked_tablet_reports, false,
Why shouldn't this default to true from day one?


http://gerrit.cloudera.org:8080/#/c/14897/9/src/kudu/master/catalog_manager.cc@4307
PS9, Line 4307:         PREDICT_FALSE(FLAGS_catalog_manager_enable_chunked_tablet_reports)
PREDICT_FALSE seems like it'd become stale if/when the gflag's default value changes.



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 14 Jan 2020 05:47:15 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 flag for chunking tablet report updates

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 flag for chunking tablet report updates
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14897/9/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/9/src/kudu/master/catalog_manager.cc@282
PS9, Line 282: DEFINE_bool(catalog_manager_enable_chunked_tablet_reports, false,
> Looks like it's still false by default as of PS10.
Whoops, good catch!

Fixed.



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 14 Jan 2020 20:02:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: WIP [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14897/1/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/1/src/kudu/master/catalog_manager.cc@4282
PS1, Line 4282:         std::move(actions), FLAGS_consensus_max_batch_size_bytes);
> Yeah I don't know why consensus_max_batch_size_bytes exists, or why it is a
Asked Todd for more context on this, and that's the summary:

* The --rpc_max_message_size helps at different level when some bit is flipped or payload for RPC has become big for some other reason

* Sending too big Raft batches doesn't make much sense when memory pressure hits -- to much of wasted network bandwidth in that case

* Larger size of Raft batches consume more buffer/memory to prepare and process transactions on both server and client sides

* In general, a RPC might include some other extra cruft, so a separate control over the size of Raft transactions is needed anyways



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 16 Dec 2019 19:10:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@838
PS6, Line 838:   size_t req_size = req->ByteSizeLong();
> Could we just recompute this after adding each tablet instead of all that e
Yup, it seems calling ByteSizeLong() every iteration is expensive since it's O(n^2) computational complexity w.r.t. size of the packet.

This approach already computes deltas of the size change, and that's fast and the sizes are computed anyways while performing actual memcopy() in Add() implementation.

I think I'll switch to an approach I outlined in https://gerrit.cloudera.org/#/c/14897/4/src/kudu/common/row_operations.h@56



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 10 Jan 2020 20:16:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc@4305
PS6, Line 4305:  release the locks.
              :   tablets_lock.Commit();
> These are UPDATE statements, so the chunk 1 of 2 in the retry report will r
Right; I was asking from a performance standpoint though: if chunk N fails and all chunks are retried in the subsequent report, does the writing of chunks 0..N-1 no-op?

I know there's code in SysCatalog to try and recognize that case and short-circuit; I just wasn't sure whether it applied here too.


http://gerrit.cloudera.org:8080/#/c/14897/8/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/14897/8/src/kudu/master/sys_catalog.cc@533
PS8, Line 533:   WriteResponsePB resp;
Not used?


http://gerrit.cloudera.org:8080/#/c/14897/8/src/kudu/master/sys_catalog.cc@933
PS8, Line 933: genereated
generated



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 14 Jan 2020 01:02:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................


Patch Set 6:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc
File src/kudu/common/row_operations.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@105
PS6, Line 105: bool RowOperationsPBEncoder::empty() const {
             :   return pb_->mutable_rows()->empty();
             : }
> Can be inlined in the header.
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@192
PS6, Line 192: rows_size_delta
> Nit: 'row_size_delta'?
The field in PB is called 'rows', so I think this name is more appropriate.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@214
PS6, Line 214:   if (isset_bitmap_size) {
             :     *isset_bitmap_size = isset_size;
             :   }
             :   if (isnull_bitmap_size) {
             :     *isnull_bitmap_size = isnull_size;
             :   }
> Do we need these to be nullable or can we assume (and DCHECK) that they're 
Indeed.  Done!


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc
File src/kudu/integration-tests/master-stress-test.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@596
PS6, Line 596: Kudu master
> Nit: can elide this.
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@599
PS6, Line 599: result
> resulting
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@604
PS6, Line 604: class MasterRpcSizeLimitStressTest : public KuduTest {
> Would the ExternalMiniClusterITestBase fixture allow you to shorten Prepare
I thought about that, but it unfortunately it doesn't make much sense since ExternalMiniClusterITestBase assumes there is only a single master in the cluster, otherwise methods such as StartClusterWithOpts fail.  Also, I don't need cluster inspector.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@611
PS6, Line 611:   void TearDown() override {
             :     if (cluster_) {
             :       cluster_->Shutdown();
             :     }
             :     client_.reset();
             :     KuduTest::TearDown();
             :   }
> Doesn't all this happen naturally?
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@624
PS6, Line 624: buidls
> builds
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@635
PS6, Line 635:       // Set custom
> Finish writing this?
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@682
PS6, Line 682:         .wait(true)
> This is the default and can be omitted.
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@697
PS6, Line 697: TEST_F(MasterRpcSizeLimitStressTest, TabletReports) {
> I don't think this test makes sense here. Stress tests (i.e. *-stress-test)
The reason it was here because there were many iterations for this test.  I started with a pretty exhaustive, long running test that created huge number of replicas to verify this condition happens in larger clusters (i.e. I kept the default limit on the RPC size).

With current distilled scenario, we should move it elsewhere, indeed.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@709
PS6, Line 709: re-electing 
> re-elect
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@712
PS6, Line 712:   for (auto idx = 0; idx < kNumTabletServers; ++idx) {
> Could we do one loop to pause all tservers, then sleep, then another loop t
That wouldn't be effective in terms of achieving the desired result.  Current approach (1) tends to distribute all leaders replicas between two, not three servers (2) all tablets have their Raft config updated because of new term.  With (1) and (2), the current approach seems to produce more updates per tablet server than the proposed alternative.

I added a comment about that.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc@4305
PS6, Line 4305: transitional
> transient
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc@4305
PS6, Line 4305: next successfully processed
              :     // tablet report from the tablet server will fix the partial update.
> This raises an interesting question: if chunk 2 of 2 fails and the full tab
These are UPDATE statements, so the chunk 1 of 2 in the retry report will rewrite already written data.  The crucial point is that eventually there will be a consistent snapshot of tablet replicas states in the system catalog tablet.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@838
PS6, Line 838:   size_t req_size = req->ByteSizeLong();
> Oh I missed that; yeah something like Pop() seems like it'd be OK.
Thank you for the feedback!


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@850
PS6, Line 850:         do_encode = false;
> Could we update excess_tablets and break here? Then maybe avoid needing do_
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@937
PS6, Line 937:     Generator generator,
> warning: the parameter 'generator' is copied for each invocation but only u
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@942
PS6, Line 942: decltype(tablets_info)
> This seems unnecessarily magical; why not just write out 'vector<scoped_ref
To me decltype() is simpler to comprehend (there's just input and excess) in the code below and it's shorter to write.

If you feel strongly against decltype() here, please let me know: I'll update this.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@949
PS6, Line 949: should
> Nit: will ?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 11 Jan 2020 03:38:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14897

to look at the new patch set (#8).

Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................

[master] KUDU-3016 don't lump together updates on all tablet reports

This patch introduces chunked write of tablet reports into the
system table to avoid hitting the maximum RPC size limit while pushing
corresponding Raft updates on the system tablet to follower masters.

A test is added to reproduce the scenario described in KUDU-3016.
In the test scenario, the average size of incoming TSHeartbeat RPC
is about 28 KB in size, while the corresponding WriteRequestPB
gets over 70 KB in size.

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
---
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/integration-tests/master_replication-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/rpc/serialization.cc
M src/kudu/rpc/transfer.cc
9 files changed, 583 insertions(+), 168 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/14897/8
-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] WIP [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: WIP [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14897/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14897/2//COMMIT_MSG@9
PS2, Line 9: WIP: I haven't gotten a good repro scenario for conditions described
           :      in KUDU-3016.  It seems the assumption of receiving a huge
           :      full tablet report was not the case or there should be many more
           :      replicas per tablet server to trigger that.  The latter seems
           :      to be unlikely, so probably the huge replicated Raft message
           :      from leader master to followers was triggered by something else,
           :      like an DDL operation altering too many tables at once..
> Or maybe making an enormous range partition alteration to one table? Like, 
Yup, it's a good idea.  Another thing I'm looking at is receiving the same request multiple times (e.g., in case of client retrying a DDL request that contains multiple operations or alike).



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 07 Jan 2020 01:53:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: WIP [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................


Patch Set 1:

(1 comment)

> (2 comments)
 > 
 > Seems like a reasonable approach to me. I'd like to understand the
 > perf impact though; could you deploy it to a cluster with a large
 > number of replicas and gauge the impact on a beefy TSHeartbeat?

Thank you for the feedback!  I'll work on a couple of tests to cover this functionality and to gauge the impact of this change.

http://gerrit.cloudera.org:8080/#/c/14897/1/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/1/src/kudu/master/catalog_manager.cc@4282
PS1, Line 4282:         std::move(actions), FLAGS_consensus_max_batch_size_bytes);
> This seems rather low (1 MB). How does the consensus queue behave if this i
Yes, sxactly -- the default value for --consensus_max_batch_size_bytes is just 1MB and it seems be a bit low.  However this is what used in the path of ConsensusUpdate(), so I tried to stay close to that here as well.  Yep, and a single ConsensusUpdate message might be arbitrarily large if that's the only message pushed to followers.

And yes: all these estimates are fuzzy, so suing --rpc_max_message_size doesn't seem save enough.  Maybe, using (rpc_max_message_size / 2) or (3 * rpc_max_message_size / 4) would be a safe threshold?  From the other side, why do we use  --consensus_max_batch_size_bytes at all then?

Is it a good time to revise the --consensus_max_batch_size_bytes vs --rpc_max_message_size approach even for the consensus messages itself?



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 13 Dec 2019 00:24:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14897/4/src/kudu/common/row_operations.h
File src/kudu/common/row_operations.h:

http://gerrit.cloudera.org:8080/#/c/14897/4/src/kudu/common/row_operations.h@56
PS4, Line 56: Estimate
An alternative would be having sort of Pop() method that removes the last actually added row.

Probably, if targeting for success most of the times (i.e. not going over a limit on the size of the result protobuf), the alternative approach is better if thinking about the performance.  From the other side, adding a big row might involve costly memory re-allocations and memcpy() which will be wasted if removing the last added element.

What do you think?



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 09 Jan 2020 21:04:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 flag for chunking tablet report updates

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 flag for chunking tablet report updates
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14897/9/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/9/src/kudu/master/catalog_manager.cc@282
PS9, Line 282: DEFINE_bool(catalog_manager_enable_chunked_tablet_reports, false,
> During offline discussion with the team I heard a concern w.r.t. the code p
Looks like it's still false by default as of PS10.



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 10
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 14 Jan 2020 18:40:23 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14897

to look at the new patch set (#2).

Change subject: WIP [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................

WIP [master] KUDU-3016 don't lump together updates on all tablet reports

WIP: I haven't gotten a good repro scenario for conditions described
     in KUDU-3016.  It seems the assumption of receiving a huge
     full tablet report was not the case or there should be many more
     replicas per tablet server to trigger that.  The latter seems
     to be unlikely, so probably the huge replicated Raft message
     from leader master to followers was triggered by something else,
     like an DDL operation altering too many tables at once..

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
---
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/rpc/serialization.cc
M src/kudu/rpc/transfer.cc
9 files changed, 479 insertions(+), 147 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/14897/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................


Patch Set 6:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc
File src/kudu/common/row_operations.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@105
PS6, Line 105: bool RowOperationsPBEncoder::empty() const {
             :   return pb_->mutable_rows()->empty();
             : }
Can be inlined in the header.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@192
PS6, Line 192: rows_size_delta
Nit: 'row_size_delta'?


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@214
PS6, Line 214:   if (isset_bitmap_size) {
             :     *isset_bitmap_size = isset_size;
             :   }
             :   if (isnull_bitmap_size) {
             :     *isnull_bitmap_size = isnull_size;
             :   }
Do we need these to be nullable or can we assume (and DCHECK) that they're always provided?


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc
File src/kudu/integration-tests/master-stress-test.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@596
PS6, Line 596: Kudu master
Nit: can elide this.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@599
PS6, Line 599: result
resulting


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@604
PS6, Line 604: class MasterRpcSizeLimitStressTest : public KuduTest {
Would the ExternalMiniClusterITestBase fixture allow you to shorten Prepare()?


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@611
PS6, Line 611:   void TearDown() override {
             :     if (cluster_) {
             :       cluster_->Shutdown();
             :     }
             :     client_.reset();
             :     KuduTest::TearDown();
             :   }
Doesn't all this happen naturally?


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@624
PS6, Line 624: buidls
builds


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@635
PS6, Line 635:       // Set custom
Finish writing this?


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@682
PS6, Line 682:         .wait(true)
This is the default and can be omitted.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@697
PS6, Line 697: TEST_F(MasterRpcSizeLimitStressTest, TabletReports) {
I don't think this test makes sense here. Stress tests (i.e. *-stress-test) are intended to be black box tests that run for a fixed amount of time (usually pretty long) and place stress on various components; here you're testing correctness and a particular bug fix.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@709
PS6, Line 709: re-electing 
re-elect


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@712
PS6, Line 712:   for (auto idx = 0; idx < kNumTabletServers; ++idx) {
Could we do one loop to pause all tservers, then sleep, then another loop to resume all of them? That'd be net less sleeping so the test would run faster.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc@4305
PS6, Line 4305: transitional
transient


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc@4305
PS6, Line 4305: next successfully processed
              :     // tablet report from the tablet server will fix the partial update.
This raises an interesting question: if chunk 2 of 2 fails and the full tablet report is retried, will chunk 1 of 2 no-op the second time around? Or will it be rewritten?


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@838
PS6, Line 838:   size_t req_size = req->ByteSizeLong();
Could we just recompute this after adding each tablet instead of all that estimation logic? It's probably more expensive but it's a lot simpler, and DDLs aren't really a hot path. Unless you think N calls to ByteSizeLong() on a huge tablet report is too expensive...

I also wonder whether we can use the previous row's size as a proxy for the next row's size, thus avoiding the recomputation of both Estimate() and Add(). How different can they be?


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@850
PS6, Line 850:         do_encode = false;
Could we update excess_tablets and break here? Then maybe avoid needing do_encode altogether?


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@942
PS6, Line 942: decltype(tablets_info)
This seems unnecessarily magical; why not just write out 'vector<scoped_refptr<TabletInfo>>'?


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@949
PS6, Line 949: should
Nit: will ?



-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 10 Jan 2020 10:29:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] KUDU-3016 don't lump together updates on all tablet reports

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14897

to look at the new patch set (#4).

Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports
......................................................................

[master] KUDU-3016 don't lump together updates on all tablet reports

This patch introduces chunked write of tablet reports into the
system table to avoid hitting the maximum RPC size limit while pushing
corresponding Raft updates on the system tablet to follower masters.

A test is added to reproduce the scenario described in KUDU-3016.
In the test scenario, the average size of incoming TSHeartbeat RPC
is about 28 KB in size, while the corresponding WriteRequestPB
gets over 70 KB in size.

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
---
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/rpc/serialization.cc
M src/kudu/rpc/transfer.cc
9 files changed, 628 insertions(+), 185 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/14897/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)