Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2018/06/08 01:33:49 UTC
[kudu-CR] KUDU-1861: add range-partitions to loadgen tables

Hello Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10633

to look at the new patch set (#3).

Change subject: KUDU-1861: add range-partitions to loadgen tables
......................................................................

KUDU-1861: add range-partitions to loadgen tables

This patch adds the ability to generate a range-partitioned table with
the loadgen tool. The range partitioning schema is designed such that
the non-random write workload will insert sequentially on the primary
key, provided the number of threads is equal to the number of tablets.
This sequential workload per tablet both reduces the number of
compactions and avoids bloom filter lookups.

Below are illustrations of some tablet partitioning and non-random
write workloads. In each, the vertical axis is the keyspace for both the
threads and the tablets, with keys increasing downwards.

--num_threads=2 --table_num_range_partitions=2

  Threads sequentially
  insert to their keyspaces
  in non-random insert mode.
     +  +---------+         ^
     |  | thread1 | tabletA |  Tablets' range partitions are
     |  |         |         |  set to match the desired total
     v  +---------+---------+  number of inserted rows for the
     |  | thread2 | tabletB |  entire workload, but leaving the
     |  |         |         |  outermost tablets unbounded.
     v  +---------+         v
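
As a rough illustration of the layout above, here is a minimal sketch of
how evenly spaced split keys and per-thread start keys could be derived.
It assumes an integer keyspace of [0, total_rows); RangeSplits and
ThreadStartKey are hypothetical names and are not taken from
tool_action_perf.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (assumption, not the patch's code).
// Returns the num_partitions - 1 split keys that divide [0, total_rows)
// into equal ranges. No split is emitted at 0 or at total_rows, so the
// first and last tablets remain unbounded on their outer sides.
std::vector<int64_t> RangeSplits(int64_t total_rows, int num_partitions) {
  assert(total_rows > 0 && num_partitions > 0);
  std::vector<int64_t> splits;
  for (int i = 1; i < num_partitions; ++i) {
    splits.push_back(total_rows * i / num_partitions);
  }
  return splits;
}

// Hypothetical helper (assumption): the first key that thread `idx`
// (0-based) inserts when the keyspace is divided evenly among
// num_threads threads, each inserting in increasing key order.
int64_t ThreadStartKey(int64_t total_rows, int num_threads, int idx) {
  assert(num_threads > 0 && idx >= 0 && idx < num_threads);
  return total_rows * idx / num_threads;
}
```

With total_rows = 1000 and two partitions, the only split key is 500,
and with two threads the second thread also starts at 500, so each
thread's span lines up with exactly one tablet.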

If the number of tablets is not a multiple of the number of threads when
using an auto-generated range-partitioned table, we lose the guarantee
that we always write to a monotonically increasing range on each tablet.

--num_threads=2 --table_num_range_partitions=3
     +  +---------+         ^
     |  | thread1 | tabletA |
     |  |         +---------+
     v  +---------| tabletB |
     |  | thread2 +---------+
     |  |         | tabletC |
     v  +---------+         v
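
The loss of that guarantee can be checked with a small model. The
sketch below assumes both the threads and the tablets split an integer
keyspace [0, total_rows) evenly, and counts how many threads' spans
overlap each tablet. WritersPerTablet and EachTabletHasOneWriter are
hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model (assumption, not the patch's code).
// Tablet t covers [total_rows*t/P, total_rows*(t+1)/P) and thread w
// covers [total_rows*w/T, total_rows*(w+1)/T). Returns, for each
// tablet, the number of threads whose span overlaps it.
std::vector<int> WritersPerTablet(int64_t total_rows, int num_threads,
                                  int num_partitions) {
  assert(total_rows > 0 && num_threads > 0 && num_partitions > 0);
  std::vector<int> writers(num_partitions, 0);
  for (int t = 0; t < num_partitions; ++t) {
    const int64_t tablet_lo = total_rows * t / num_partitions;
    const int64_t tablet_hi = total_rows * (t + 1) / num_partitions;
    for (int w = 0; w < num_threads; ++w) {
      const int64_t thread_lo = total_rows * w / num_threads;
      const int64_t thread_hi = total_rows * (w + 1) / num_threads;
      // Half-open intervals overlap iff each starts before the other ends.
      if (thread_lo < tablet_hi && tablet_lo < thread_hi) {
        ++writers[t];
      }
    }
  }
  return writers;
}

// A tablet written by a single thread sees increasing keys, since each
// thread inserts in increasing key order. Two or more concurrent
// writers interleave keys from different parts of the tablet's range.
bool EachTabletHasOneWriter(int64_t total_rows, int num_threads,
                            int num_partitions) {
  for (int n : WritersPerTablet(total_rows, num_threads, num_partitions)) {
    if (n > 1) return false;
  }
  return true;
}
```

For 2 threads and 3 tablets over 600 rows this yields {1, 2, 1}: tabletB
is written by both threads, matching the diagram. For 2 threads and 4
tablets every tablet has a single writer.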

This patch also renames --table_num_buckets to
--table_num_hash_partitions, which can be combined with
--table_num_range_partitions if desired.

I tested this out on a single-tserver cluster and verified via the
metrics logs that the number of bloom lookups for a non-random workload
where the number of insert threads and the number of tablets were equal
stayed at zero. When the number of threads was not a factor of the
number of tablets, the number of bloom lookups was non-zero.

Note: I use the number of bloom lookups as a loose indicator of whether
writes are sequential or not. If row A is being inserted to a range of
the keyspace that has already been inserted to, the interval tree that
backs the Kudu tablet will be unable to say with certainty that row A
does or doesn't already exist, necessitating a bloom lookup. As such, if
there are bloom lookups for a tablet for a given workload, we can say
that that workload is not sequential.
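
To make that reasoning concrete, here is a deliberately simplified
model. It assumes each previously flushed rowset is summarized by an
inclusive [min_key, max_key] range, and that an insert needs one bloom
lookup per rowset whose range contains the key. KeyRange and
BloomLookupsForInsert are hypothetical names, not Kudu classes, and a
linear scan stands in for the interval tree.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in (assumption) for the key bounds of a flushed
// rowset. Both bounds are inclusive.
struct KeyRange {
  int64_t min_key;
  int64_t max_key;
};

// Returns how many existing rowsets might already contain `key`. In
// this model each such rowset costs one bloom lookup, because its key
// range alone cannot rule out a duplicate. A key outside every range
// needs no lookup at all.
int BloomLookupsForInsert(const std::vector<KeyRange>& rowsets,
                          int64_t key) {
  int lookups = 0;
  for (const KeyRange& r : rowsets) {
    assert(r.min_key <= r.max_key);
    if (r.min_key <= key && key <= r.max_key) {
      ++lookups;
    }
  }
  return lookups;
}
```

In this model a strictly increasing insert stream always lands above
every existing max_key and so never triggers a lookup, whereas a key
inserted into an already-covered range triggers at least one.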

Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
---
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_perf.cc
2 files changed, 105 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/10633/3
-- 
To view, visit http://gerrit.cloudera.org:8080/10633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013
Gerrit-Change-Number: 10633
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins