Posted to reviews@kudu.apache.org by "Grant Henke (Code Review)" <ge...@cloudera.org> on 2019/03/22 15:23:56 UTC

[kudu-CR] [spark-tools] DistributedDataGenerator repartition support

Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12411

to look at the new patch set (#4).

Change subject: [spark-tools] DistributedDataGenerator repartition support
......................................................................

[spark-tools] DistributedDataGenerator repartition support

This patch adds support to the DistributedDataGenerator
for repartitioning the generated data to match the
Kudu table's partitioning.
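A minimal sketch of the idea, assuming rows are keyed by a string and the table uses a single hash-partitioning level. `hashBucket` and `repartitionByBucket` are hypothetical names, and the hash is only a stand-in: Kudu's real partitioner hashes the encoded key columns, which this sketch does not reproduce. Grouping by bucket means each group could be written by one task, so each task targets one tablet.

```scala
// Hypothetical stand-in for Kudu hash partitioning. Kudu hashes the encoded
// key bytes; String.hashCode is used here purely for illustration.
def hashBucket(key: String, numBuckets: Int): Int =
  Math.floorMod(key.hashCode, numBuckets)

// Group generated (key, value) rows by the bucket their key maps to, so that
// each group could be handed to a single writer task (one task per tablet).
def repartitionByBucket(
    rows: Seq[(String, Int)],
    numBuckets: Int): Map[Int, Seq[(String, Int)]] =
  rows.groupBy { case (key, _) => hashBucket(key, numBuckets) }
```

In Spark, the same effect would come from a custom `org.apache.spark.Partitioner` passed to a repartition call; the grouping above only illustrates the mapping from key to partition.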

Because data generation is now decoupled from data
loading, this patch changes the collision-handling
behavior. Instead of generating new data when a
collision occurs, the collision is now only tracked
in the metrics.
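A minimal sketch of the new behavior, assuming the table can be modeled as a set of primary keys. `LoadResult` and `loadTrackingCollisions` are hypothetical names, not the tool's API. A duplicate key is counted as a collision and skipped; no replacement row is generated.

```scala
// Hypothetical metrics for a load: rows actually written vs. key collisions.
final case class LoadResult(rowsWritten: Long, collisions: Long)

// Load pre-generated keys into a simulated table (a Set of primary keys).
// A key that already exists is only counted as a collision; unlike the old
// behavior, no new row is generated to replace it.
def loadTrackingCollisions(keys: Iterator[Long]): (Set[Long], LoadResult) =
  keys.foldLeft((Set.empty[Long], LoadResult(0L, 0L))) {
    case ((table, m), k) =>
      if (table.contains(k)) (table, m.copy(collisions = m.collisions + 1))
      else (table + k, m.copy(rowsWritten = m.rowsWritten + 1))
  }
```

The trade-off is that the final row count can be lower than requested when collisions occur, but generation no longer has to know what has already been loaded.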

Additionally, this patch changes the default generation
type from random to sequential, given that sequential
generation has been shown to be the more common option
and the type of workload Kudu is better suited for.
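To illustrate the difference between the two generation types, here is a sketch under assumed semantics: each task owns a contiguous key range in sequential mode, and draws seeded random keys in random mode. `GenerationType`, `Sequential`, `RandomGen`, and `taskKeys` are hypothetical names; the patch's actual key layout may differ. With disjoint ranges, sequential keys never collide across tasks, while random keys can.

```scala
// Hypothetical generation modes mirroring "sequential" vs. "random".
sealed trait GenerationType
case object Sequential extends GenerationType
case object RandomGen extends GenerationType

// Produce the keys for one task. Sequential: task i owns the contiguous range
// [i * rowsPerTask, (i + 1) * rowsPerTask). Random: seeded per task so a run
// is reproducible, but keys may collide within or across tasks.
def taskKeys(taskId: Int, rowsPerTask: Long, genType: GenerationType, seed: Long): Iterator[Long] =
  genType match {
    case Sequential =>
      val start = taskId * rowsPerTask
      (start until start + rowsPerTask).iterator
    case RandomGen =>
      val rng = new scala.util.Random(seed + taskId)
      Iterator.continually(rng.nextLong()).take(rowsPerTask.toInt)
  }
```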

Change-Id: I57bcc68d645c52b429ac6cf8bcdf0551a8244995
---
M java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala
M java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
3 files changed, 206 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/12411/4
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57bcc68d645c52b429ac6cf8bcdf0551a8244995
Gerrit-Change-Number: 12411
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>