You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2018/06/07 06:13:59 UTC

[kudu-CR] KUDU-1861 add range-partitioning of loadgen tables

Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10633


Change subject: KUDU-1861 add range-partitioning of loadgen tables
......................................................................

KUDU-1861 add range-partitioning of loadgen tables

This patch adds the ability to generate a range-partitioned table to the
laodgen tool. The range partitioning schema is designed such that the
generated insert workload will be monotonically increasing if the number
of threads is equal to the number of tablets.

Below are illustrations of some tablet partitioning and thread
non-random insert workloads. The y-axis for both the threads and the
tablets is the keyspace, increasing going downwards.

--num_threads=2 --table_num_buckets=2

  Threads sequentially
  insert to their keyspaces
  in non-random insert mode.
     +  +---------+         ^
     |  | thread1 | tabletA |  Tablets' range partitions are
     |  |         |         |  set to match the desired total
     v  +---------+---------+  number of inserted rows for the
     |  | thread2 | tabletB |  entire workload, but leaving the
     |  |         |         |  outermost tablets unbounded.
     v  +---------+         v

If the number of tablets is not a multiple of the number of threads when
using an auto-generated range-partitioned table, we lose the guarantee
that we always write to a monotonically increasing range on each tablet.

--num_threads=2 --table_num_buckets=3
     +  +---------+         ^
     |  | thread1 | tabletA |
     |  |         +---------+
     v  +---------| tabletB |
     |  | thread2 +---------+
     |  |         | tabletC |
     v  +---------+         v

I tested this out on a singler-tserver cluster and verified via the
metrics logs that the number of bloom lookups for a non-random workload
where the number of insert threads and the number of tablets were equal
stayed at zero. When the number of threads was not a factor of the
number of buckets, the number of bloom lookups was non-zero.

Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
---
M src/kudu/tools/tool_action_perf.cc
1 file changed, 36 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/10633/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10633

to look at the new patch set (#4).

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................

KUDU-1861: add range-partitions to loadgen tables

This patch adds the ability to generate a range-partitioned table with
the loadgen tool. The range partitioning schema is designed such that
the non-random write workload will insert sequentially on the primary
key, provided the number of threads is equal to the number of tablets.
This sequential workload per tablet both reduces the number of
compactions and avoids bloom filter lookups.

This patch also renames --table_num_buckets to
--table_num_hash_partitions, which can be combined with
--table_num_range_partitions if desired.

I tested this out on a singler-tserver cluster and verified via the
metrics logs that the number of bloom lookups for a non-random workload
where the number of insert threads and the number of tablets were equal
stayed at zero. When the number of threads was not a factor of the
number of buckets, the number of bloom lookups was non-zero. See
TestNonRandomWorkloadLoadgen in kudu-tool-test.cc for more details.

Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
---
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_perf.cc
2 files changed, 229 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/10633/4
-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1487
PS5, Line 1487: 
              :     // Keep the tab
> Actually I did mean to use `use_random` to make sure nothing break _even if
Thank you for the explanation.  Could you add those details into the comment itself?  I think it would be clearer for the reader than '... nothing funny happens ...'.


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1512
PS5, Line 1512:   args.emplace_back("--table_num_range_partitions=2");
              :   args.emplace_back("--table_num_hash_partitio
> We're not deleting the tables as we go so we can do this check. That also m
I'm not sure I understand.  My comment was about reducing

ASSERT_OK(WaitForNumTabletsOnTS(ts, expected_tablets, kTimeout, &tablets));
ASSERT_EQ(expected_tablets, tablets.size());

to

SSERT_OK(WaitForNumTabletsOnTS(ts, expected_tablets, kTimeout));

The idea is that you don't need the second assert because of the way how WaitForNumTabletsOnTS() works.


http://gerrit.cloudera.org:8080/#/c/10633/6/src/kudu/tools/tool_action_perf.cc
File src/kudu/tools/tool_action_perf.cc:

http://gerrit.cloudera.org:8080/#/c/10633/6/src/kudu/tools/tool_action_perf.cc@325
PS6, Line 325: static 
nit: why do you need the 'static' specifier for the function within an anonymous namespace?



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 23 Jun 2018 04:14:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10633

to look at the new patch set (#5).

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................

KUDU-1861: add range-partitions to loadgen tables

This patch adds the ability to generate a range-partitioned table with
the loadgen tool. The range partitioning schema is designed such that
the non-random write workload will insert sequentially on the primary
key, provided the number of threads is equal to the number of tablets.
This sequential workload per tablet both reduces the number of
compactions and avoids bloom filter lookups.

This patch also renames --table_num_buckets to
--table_num_hash_partitions, which can be combined with
--table_num_range_partitions if desired.

I tested this out on a singler-tserver cluster and verified via the
metrics logs that the number of bloom lookups for a non-random workload
where the number of insert threads and the number of tablets were equal
stayed at zero. When the number of threads was not a factor of the
number of buckets, the number of bloom lookups was non-zero. See
TestNonRandomWorkloadLoadgen in kudu-tool-test.cc for more details.

Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
---
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_perf.cc
2 files changed, 234 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/10633/5
-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................

KUDU-1861: add range-partitions to loadgen tables

This patch adds the ability to generate a range-partitioned table with
the loadgen tool. The range partitioning schema is designed such that
the non-random write workload will insert sequentially on the primary
key, provided the number of threads is equal to the number of tablets.
This sequential workload per tablet both reduces the number of
compactions and avoids bloom filter lookups.

This patch also renames --table_num_buckets to
--table_num_hash_partitions, which can be combined with
--table_num_range_partitions if desired.

I tested this out on a singler-tserver cluster and verified via the
metrics logs that the number of bloom lookups for a non-random workload
where the number of insert threads and the number of tablets were equal
stayed at zero. When the number of threads was not a factor of the
number of buckets, the number of bloom lookups was non-zero. See
TestNonRandomWorkloadLoadgen in kudu-tool-test.cc for more details.

Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Reviewed-on: http://gerrit.cloudera.org:8080/10633
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_perf.cc
2 files changed, 244 insertions(+), 16 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1487
PS5, Line 1487: 
              :     // Keep the tab
> Thank you for the explanation.  Could you add those details into the commen
Done


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1512
PS5, Line 1512:   args = base_args;
              :   args.emplace_back("--table_num_range_partiti
> I'm not sure I understand.  My comment was about reducing
Ah my bad, I misunderstood. Done


http://gerrit.cloudera.org:8080/#/c/10633/6/src/kudu/tools/tool_action_perf.cc
File src/kudu/tools/tool_action_perf.cc:

http://gerrit.cloudera.org:8080/#/c/10633/6/src/kudu/tools/tool_action_perf.cc@197
PS6, Line 197: #include "kudu/util/flag_validators.h"
> warning: #includes are not sorted properly [llvm-include-order]
Done


http://gerrit.cloudera.org:8080/#/c/10633/6/src/kudu/tools/tool_action_perf.cc@325
PS6, Line 325: 
> nit: why do you need the 'static' specifier for the function within an anon
Done


http://gerrit.cloudera.org:8080/#/c/10633/6/src/kudu/tools/tool_action_perf.cc@325
PS6, Line 325: 
> warning: 'ValidatePartitionFlags' is a static definition in anonymous names
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 23 Jun 2018 05:29:01 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10633

to look at the new patch set (#8).

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................

KUDU-1861: add range-partitions to loadgen tables

This patch adds the ability to generate a range-partitioned table with
the loadgen tool. The range partitioning schema is designed such that
the non-random write workload will insert sequentially on the primary
key, provided the number of threads is equal to the number of tablets.
This sequential workload per tablet both reduces the number of
compactions and avoids bloom filter lookups.

This patch also renames --table_num_buckets to
--table_num_hash_partitions, which can be combined with
--table_num_range_partitions if desired.

I tested this out on a singler-tserver cluster and verified via the
metrics logs that the number of bloom lookups for a non-random workload
where the number of insert threads and the number of tablets were equal
stayed at zero. When the number of threads was not a factor of the
number of buckets, the number of bloom lookups was non-zero. See
TestNonRandomWorkloadLoadgen in kudu-tool-test.cc for more details.

Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
---
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_perf.cc
2 files changed, 244 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/10633/8
-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1861 add range-partitioning of loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10633

to look at the new patch set (#2).

Change subject: KUDU-1861 add range-partitioning of loadgen tables
......................................................................

KUDU-1861 add range-partitioning of loadgen tables

This patch adds the ability to generate a range-partitioned table to the
laodgen tool. The range partitioning schema is designed such that the
generated insert workload will be monotonically increasing if the number
of threads is equal to the number of tablets.

Below are illustrations of some tablet partitioning and thread
non-random insert workloads. The y-axis for both the threads and the
tablets is the keyspace, increasing going downwards.

--num_threads=2 --table_num_buckets=2

  Threads sequentially
  insert to their keyspaces
  in non-random insert mode.
     +  +---------+         ^
     |  | thread1 | tabletA |  Tablets' range partitions are
     |  |         |         |  set to match the desired total
     v  +---------+---------+  number of inserted rows for the
     |  | thread2 | tabletB |  entire workload, but leaving the
     |  |         |         |  outermost tablets unbounded.
     v  +---------+         v

If the number of tablets is not a multiple of the number of threads when
using an auto-generated range-partitioned table, we lose the guarantee
that we always write to a monotonically increasing range on each tablet.

--num_threads=2 --table_num_buckets=3
     +  +---------+         ^
     |  | thread1 | tabletA |
     |  |         +---------+
     v  +---------| tabletB |
     |  | thread2 +---------+
     |  |         | tabletC |
     v  +---------+         v

I tested this out on a singler-tserver cluster and verified via the
metrics logs that the number of bloom lookups for a non-random workload
where the number of insert threads and the number of tablets were equal
stayed at zero. When the number of threads was not a factor of the
number of buckets, the number of bloom lookups was non-zero.

Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
---
M src/kudu/tools/tool_action_perf.cc
1 file changed, 37 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/10633/2
-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

I'd like Alexey to take another look.

http://gerrit.cloudera.org:8080/#/c/10633/4/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/10633/4/src/kudu/tools/kudu-tool-test.cc@1584
PS4, Line 1584:   ASSERT_LT(0, bloom_lookups);
How strong of a guarantee is this? Since it's a random workload, what are the odds that we end up with something sequential by chance?

If that's at all likely, might want to surround this block in an ASSERT_EVENTUALLY. But it may not be (looping the test may help here).



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 08 Jun 2018 19:39:16 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10633/3/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/10633/3/src/kudu/tools/kudu-tool-test.cc@1460
PS3, Line 1460: TEST_F(ToolTest, TestLoadgenAutoGenTablePartitioning) {
Can we test that the number of resulting tablets is what we'd expect each time?

Also, could you add a test for a purely sequential insertion sequence that also looks at the bloom lookup metric value and tests that it's 0?


http://gerrit.cloudera.org:8080/#/c/10633/3/src/kudu/tools/tool_action_perf.cc
File src/kudu/tools/tool_action_perf.cc:

http://gerrit.cloudera.org:8080/#/c/10633/3/src/kudu/tools/tool_action_perf.cc@25
PS3, Line 25: // See below for examples of usage.
Might want to update this with an example or two of partitioning.



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 08 Jun 2018 03:32:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10633/4/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/10633/4/src/kudu/tools/kudu-tool-test.cc@1584
PS4, Line 1584:   // Since this indicator for non-sequential inserts is based on the random
> How strong of a guarantee is this? Since it's a random workload, what are t
Done, the new revision passed 1000/1000 times



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 12 Jun 2018 06:12:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1861 add range-partitioning of loadgen tables

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861 add range-partitioning of loadgen tables
......................................................................


Patch Set 2:

(6 comments)

Could you add a test for this in kudu-tool-test?

http://gerrit.cloudera.org:8080/#/c/10633/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10633/2//COMMIT_MSG@10
PS2, Line 10: laodgen
loadgen


http://gerrit.cloudera.org:8080/#/c/10633/2//COMMIT_MSG@11
PS2, Line 11: generated insert workload will be monotonically increasing if the number
I think you're trying to see that the value of the _primary key_ will be monotonically increasing, so as to avoid compactions, right? Could you clarify?


http://gerrit.cloudera.org:8080/#/c/10633/2//COMMIT_MSG@48
PS2, Line 48: number of buckets, the number of bloom lookups was non-zero.
Can you clarify why non-zero bloom lookups mean we're not inserting perfectly sequentially?


http://gerrit.cloudera.org:8080/#/c/10633/2/src/kudu/tools/tool_action_perf.cc
File src/kudu/tools/tool_action_perf.cc:

http://gerrit.cloudera.org:8080/#/c/10633/2/src/kudu/tools/tool_action_perf.cc@253
PS2, Line 253: table_use_range_partition
Nit: maybe "table_use_range_partitions"?


http://gerrit.cloudera.org:8080/#/c/10633/2/src/kudu/tools/tool_action_perf.cc@405
PS2, Line 405: size_t gen_num
I don't think this is used anymore.


http://gerrit.cloudera.org:8080/#/c/10633/2/src/kudu/tools/tool_action_perf.cc@606
PS2, Line 606:       for (int i = 1; i < FLAGS_table_num_buckets; i++) {
So we're not going to use hash partitioning anymore if range partitions were requested? Why not both? I think it'd be useful to be able to apply one, the other, or both if requested.



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 07 Jun 2018 17:24:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 25 Jun 2018 13:53:04 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1861 add range-partitioning of loadgen tables

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861 add range-partitioning of loadgen tables
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10633/1/src/kudu/tools/tool_action_perf.cc
File src/kudu/tools/tool_action_perf.cc:

http://gerrit.cloudera.org:8080/#/c/10633/1/src/kudu/tools/tool_action_perf.cc@605
PS1, Line 605:       table_creator->set_range_partition_columns({ kKeyColumnName });
             :       for (int i = 1; i < FLAGS_table_num_buckets; i++) {
             :         unique_ptr<KuduPartialRow> split(schema.NewRow());
             :         int64_t split_val = FLAGS_seq_start + i * span_per_tablet;
             :         RETURN_NOT_OK(split->SetInt64(kKeyColumnName, split_val));
             :        
Does it work as intended in case of --table_num_buckets=1 ?



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 07 Jun 2018 06:54:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10633

to look at the new patch set (#3).

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................

KUDU-1861: add range-partitions to loadgen tables

This patch adds the ability to generate a range-partitioned table with
the loadgen tool. The range partitioning schema is designed such that
the non-random write workload will insert sequentially on the primary
key, provided the number of threads is equal to the number of tablets.
This sequential workload per tablet both reduces the number of
compactions and avoids bloom filter lookups.

Below are illustrations of some tablet partitioning and non-random
write workloads. The y-axis for both the threads and the tablets is the
keyspace, increasing going downwards.

--num_threads=2 --table_num_range_partitions=2

  Threads sequentially
  insert to their keyspaces
  in non-random insert mode.
     +  +---------+         ^
     |  | thread1 | tabletA |  Tablets' range partitions are
     |  |         |         |  set to match the desired total
     v  +---------+---------+  number of inserted rows for the
     |  | thread2 | tabletB |  entire workload, but leaving the
     |  |         |         |  outermost tablets unbounded.
     v  +---------+         v

If the number of tablets is not a multiple of the number of threads when
using an auto-generated range-partitioned table, we lose the guarantee
that we always write to a monotonically increasing range on each tablet.

--num_threads=2 --table_num_range_partitions=3
     +  +---------+         ^
     |  | thread1 | tabletA |
     |  |         +---------+
     v  +---------| tabletB |
     |  | thread2 +---------+
     |  |         | tabletC |
     v  +---------+         v

This patch also renames --table_num_buckets to
--table_num_hash_partitions, which can be combined with
--table_num_range_partitions if desired.

I tested this out on a singler-tserver cluster and verified via the
metrics logs that the number of bloom lookups for a non-random workload
where the number of insert threads and the number of tablets were equal
stayed at zero. When the number of threads was not a factor of the
number of buckets, the number of bloom lookups was non-zero.

Note: I use the number of bloom lookups as a loose indicator of whether
writes are sequential or not. If row A is being inserted to a range of
the keyspace that has already been inserted to, the interval tree that
backs the Kudu tablet will be unable to say with certainty that row A
does or doesn't already exist, necessitating a bloom lookup. As such, if
there are bloom lookups for a tablet for a given workload, we can say
that that workload is not sequential.

Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
---
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_perf.cc
2 files changed, 105 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/10633/3
-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 12 Jun 2018 15:15:49 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................


Patch Set 3:

(6 comments)

Added a small test and verified the bloom lookups on a single tserver cluster again.

http://gerrit.cloudera.org:8080/#/c/10633/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10633/2//COMMIT_MSG@10
PS2, Line 10: the loa
> loadgen
Done


http://gerrit.cloudera.org:8080/#/c/10633/2//COMMIT_MSG@11
PS2, Line 11: the non-random write workload will insert sequentially on the primary
> I think you're trying to see that the value of the _primary key_ will be mo
Done


http://gerrit.cloudera.org:8080/#/c/10633/2//COMMIT_MSG@48
PS2, Line 48: --table_num_range_partitions if desired.
> Can you clarify why non-zero bloom lookups mean we're not inserting perfect
Done


http://gerrit.cloudera.org:8080/#/c/10633/2/src/kudu/tools/tool_action_perf.cc
File src/kudu/tools/tool_action_perf.cc:

http://gerrit.cloudera.org:8080/#/c/10633/2/src/kudu/tools/tool_action_perf.cc@253
PS2, Line 253:  "The number of range par
> Nit: maybe "table_use_range_partitions"?
Done


http://gerrit.cloudera.org:8080/#/c/10633/2/src/kudu/tools/tool_action_perf.cc@405
PS2, Line 405: 
> I don't think this is used anymore.
Done


http://gerrit.cloudera.org:8080/#/c/10633/2/src/kudu/tools/tool_action_perf.cc@606
PS2, Line 606:       const int64_t span_per_range = total_inserted_span / FLAGS_table_num_range_partitions;
> So we're not going to use hash partitioning anymore if range partitions wer
Done in the form of --table_num_range_partitions and --table_num_hash_partitions



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Fri, 08 Jun 2018 01:35:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10633

to look at the new patch set (#6).

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................

KUDU-1861: add range-partitions to loadgen tables

This patch adds the ability to generate a range-partitioned table with
the loadgen tool. The range partitioning schema is designed such that
the non-random write workload will insert sequentially on the primary
key, provided the number of threads is equal to the number of tablets.
This sequential workload per tablet both reduces the number of
compactions and avoids bloom filter lookups.

This patch also renames --table_num_buckets to
--table_num_hash_partitions, which can be combined with
--table_num_range_partitions if desired.

I tested this out on a singler-tserver cluster and verified via the
metrics logs that the number of bloom lookups for a non-random workload
where the number of insert threads and the number of tablets were equal
stayed at zero. When the number of threads was not a factor of the
number of buckets, the number of bloom lookups was non-zero. See
TestNonRandomWorkloadLoadgen in kudu-tool-test.cc for more details.

Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
---
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_perf.cc
2 files changed, 247 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/10633/6
-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................


Patch Set 5:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1473
PS5, Line 1473:   ExternalMiniClusterOptions opts;
              :   opts.num_tablet_servers = 1;
              :   NO_FATALS(StartExternalMiniCluster(std::move(opts)));
nit: since std::move() is used for 'opts' which invalidates 'opts' for the rest of the scope, consider placing this code into a separate scope.


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1476
PS5, Line 1476:   vector<string> base_args = {
nit: const ?


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1479
PS5, Line 1479: "--num_rows_per_thread=2048",
Why default 1000 is not good enough here?


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1480
PS5, Line 1480:     "--flush_per_n_rows=1",
Could you add a comment explaining why manual flushing per row is essential in this case?


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1481
PS5, Line 1481: "--table_num_replicas=1"
It's 1 by default, so this option can be omitted.


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1482
PS5, Line 1482:     "--num_threads=3",
Why default is not good enough?  If it's not indeed, I think it's worth adding a comment about that.


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1487
PS5, Line 1487:     // Let's ensure nothing funny happens inserting across the entire keyspace.
              :     "--use_random",
By my experience, funny things happen when using random :)  Maybe, you meant to not using random here?


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1512
PS5, Line 1512: kTimeout
Is it stable enough for TSAN/ASAN builds with 10ms timeout?


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1512
PS5, Line 1512:   ASSERT_OK(WaitForNumTabletsOnTS(ts, expected_tablets, kTimeout, &tablets));
              :   ASSERT_EQ(expected_tablets, tablets.size());
Here and below: it seems these two lines of code can be reduced to

ASSERT_OK(WaitForNumTabletsOnTS(ts, expected_tablets, kTimeout));

So, the 'tablets' variable can be dropped.


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1534
PS5, Line 1534: workload result
nit: 'workloads result' or 'workload results'


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1556
PS5, Line 1556:     "--table_num_replicas=1",
              :     "--keep_auto_table=false",
              :     "--run_scan=false",
              :     "--use_random=false",
I think that can be dropped because that are the defaults values for those parameters.


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1568
PS5, Line 1568:     "--num_rows_per_thread=10000",
I think you could use '--string_len=32768' in addition if you want to generate 'heavier' rows to reach the flush threshold sooner.


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/tool_action_perf.cc
File src/kudu/tools/tool_action_perf.cc:

http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/tool_action_perf.cc@303
PS5, Line 303: DEFINE_int32(table_num_hash_partitions, 8,
             :              "The number of hash partitions to create when this tool creates "
             :              "a new table. Note: The total number of partitions must be "
             :              "greater than 1.");
             : DEFINE_int32(table_num_range_partitions, 1,
             :              "The number of range partitions to create when this tool creates "
             :              "a new table. A range partitioning schema will be determined to "
             :              "evenly split a sequential workload across ranges, leaving "
             :              "the outermost ranges unbounded to ensure coverage of the entire "
             :              "keyspace. Note: The total number of partitions must be greater "
             :              "than 1.");
Maybe, it's worth adding a group validator to verify that the total number of partitions is greater than 1?



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 22 Jun 2018 21:56:12 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10633/3/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/10633/3/src/kudu/tools/kudu-tool-test.cc@1460
PS3, Line 1460:       "bench_manual_flush"));
> Can we test that the number of resulting tablets is what we'd expect each t
Done


http://gerrit.cloudera.org:8080/#/c/10633/3/src/kudu/tools/tool_action_perf.cc
File src/kudu/tools/tool_action_perf.cc:

http://gerrit.cloudera.org:8080/#/c/10633/3/src/kudu/tools/tool_action_perf.cc@25
PS3, Line 25: // See below for examples of usage.
> Might want to update this with an example or two of partitioning.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 08 Jun 2018 19:22:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1861 add range-partitioning of loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861 add range-partitioning of loadgen tables
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10633/1/src/kudu/tools/tool_action_perf.cc
File src/kudu/tools/tool_action_perf.cc:

http://gerrit.cloudera.org:8080/#/c/10633/1/src/kudu/tools/tool_action_perf.cc@605
PS1, Line 605:       table_creator->set_range_partition_columns({ kKeyColumnName });
             :       for (int i = 1; i < FLAGS_table_num_buckets; i++) {
             :         unique_ptr<KuduPartialRow> split(schema.NewRow());
             :         int64_t split_val = FLAGS_seq_start + i * span_per_tablet;
             :         RETURN_NOT_OK(split->SetInt64(kKeyColumnName, split_val));
             :        
> Does it work as intended in case of --table_num_buckets=1 ?
It doesn't, but I don't think hash partitioning does either. I'll doc that somewhere.



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 07 Jun 2018 14:54:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1861 add range-partitioning of loadgen tables

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861 add range-partitioning of loadgen tables
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10633/1/src/kudu/tools/tool_action_perf.cc
File src/kudu/tools/tool_action_perf.cc:

http://gerrit.cloudera.org:8080/#/c/10633/1/src/kudu/tools/tool_action_perf.cc@605
PS1, Line 605:       table_creator->set_range_partition_columns({ kKeyColumnName });
             :       for (int i = 1; i < FLAGS_table_num_buckets; i++) {
             :         unique_ptr<KuduPartialRow> split(schema.NewRow());
             :         int64_t split_val = FLAGS_seq_start + i * span_per_tablet;
             :         RETURN_NOT_OK(split->SetInt64(kKeyColumnName, split_val));
             :        
> It doesn't, but I don't think hash partitioning does either. I'll doc that 
Yep, indeed: hash bucketing also requires 2 buckets at least: 


Invalid argument: Error creating table loadgen_auto_5f74ece81ad0475089fb4c993f6ce8fe on the master: must have at least two hash buckets


OK, sgtm -- just wanted to know what is the desired behaviour.



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 07 Jun 2018 16:32:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10633

to look at the new patch set (#7).

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................

KUDU-1861: add range-partitions to loadgen tables

This patch adds the ability to generate a range-partitioned table with
the loadgen tool. The range partitioning schema is designed such that
the non-random write workload will insert sequentially on the primary
key, provided the number of threads is equal to the number of tablets.
This sequential workload per tablet both reduces the number of
compactions and avoids bloom filter lookups.

This patch also renames --table_num_buckets to
--table_num_hash_partitions, which can be combined with
--table_num_range_partitions if desired.

I tested this out on a singler-tserver cluster and verified via the
metrics logs that the number of bloom lookups for a non-random workload
where the number of insert threads and the number of tablets were equal
stayed at zero. When the number of threads was not a factor of the
number of buckets, the number of bloom lookups was non-zero. See
TestNonRandomWorkloadLoadgen in kudu-tool-test.cc for more details.

Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
---
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_perf.cc
2 files changed, 242 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/10633/7
-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 )

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................


Patch Set 6:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/10633/4/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/10633/4/src/kudu/tools/kudu-tool-test.cc@1584
PS4, Line 1584:   args.emplace_back("--table_num_hash_partitions=1");
> Done, the new revision passed 1000/1000 times
I only tested in DEBUG mode, and it's surprisingly very flaky in TSAN mode. I don't think this is quite as important to test, so I'm going to remove this hashing.


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1473
PS5, Line 1473:   {
              :     ExternalMiniClusterOptions opts;
              :     opts.num_tablet_servers = 1;
> nit: since std::move() is used for 'opts' which invalidates 'opts' for the 
Done


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1476
PS5, Line 1476:     NO_FATALS(StartExternalMiniCluster(std::move(opts)));
> nit: const ?
Done


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1479
PS5, Line 1479: "perf", "loadgen",
> Why default 1000 is not good enough here?
Ah, good points, neither the size of the workload nor the flushing behavior in this case are quite as important as in the other test. I'll switch to use the defaults.


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1480
PS5, Line 1480:     cluster_->master()->bou
> Could you add a comment explaining why manual flushing per row is essential
See other comment


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1481
PS5, Line 1481: // Use a number of threa
> It's 1 by default, so this option can be omitted.
Done


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1482
PS5, Line 1482:     // so the insertio
> Why default is not good enough?  If it's not indeed, I think it's worth add
Done


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1487
PS5, Line 1487: 
              :     // Keep the tab
> By my experience, funny things happen when using random :)  Maybe, you mean
Actually I did mean to use `use_random` to make sure nothing break _even if_ using random inserts. If we didn't use `use_random`, it wouldn't insert across the entire keyspace, and instead be limited by the number of rows.


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1512
PS5, Line 1512: 
> Is it stable enough for TSAN/ASAN builds with 10ms timeout?
Yep, ran with TSAN and it passed 500/500 times, ASAN 300/300


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1512
PS5, Line 1512:   args.emplace_back("--table_num_range_partitions=2");
              :   args.emplace_back("--table_num_hash_partitio
> Here and below: it seems these two lines of code can be reduced to
We're not deleting the tables as we go so we can do this check. That also means that we have to take into account the previous tables created during this test. I'll add a comment to that effect.


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1534
PS5, Line 1534: able_num_range_
> nit: 'workloads result' or 'workload results'
Done


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1556
PS5, Line 1556:     opts.extra_tserver_flags = {
              :       "--flush_threshold_mb=1",
              :       "--flush_threshold_secs=1",
              :     };
> I think that can be dropped because that are the defaults values for those 
Done


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/kudu-tool-test.cc@1568
PS5, Line 1568:     // each thread will be writing
> I think you could use '--string_len=32768' in addition if you want to gener
Done


http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/tool_action_perf.cc
File src/kudu/tools/tool_action_perf.cc:

http://gerrit.cloudera.org:8080/#/c/10633/5/src/kudu/tools/tool_action_perf.cc@303
PS5, Line 303:               "the existing table nor its data is never dropped/deleted.");
             : DEFINE_int32(table_num_hash_partitions, 8,
             :              "The number of hash partitions to create when this tool creates "
             :              "a new table. Note: The total number of partitions must be "
             :              "greater than 1.");
             : DEFINE_int32(table_num_range_partitions, 1,
             :              "The number of range partitions to create when this tool creates "
             :              "a new table. A range partitioning schema will be determined to "
             :              "evenly split a sequential workload across ranges, leaving "
             :              "the outermost ranges unbounded to ensure coverage of the entire "
             :              "keyspace. 
> Maybe, it's worth adding a group validator to verify that the total number 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 23 Jun 2018 02:44:03 +0000
Gerrit-HasComments: Yes