Posted to reviews@kudu.apache.org by "Anonymous Coward (Code Review)" <ge...@cloudera.org> on 2022/02/01 04:46:58 UTC

[kudu-CR] Adding repartitioning logic along with coalesce logic to backup output

Hello Attila Bukor, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18174

to look at the new patch set (#5).

Change subject: Adding repartitioning logic along with coalesce logic to backup output
......................................................................

Adding repartitioning logic along with coalesce logic to backup output

Jira: https://issues.apache.org/jira/browse/KUDU-3309

We optionally use the coalesce and repartition options in the BackupKuduTable Spark command.
At present we have to re-apply this commit to our internal build for every release.
We request that it be merged into apache/kudu so that we no longer need to re-apply it for each new Kudu release.

This change adds repartition logic alongside the coalesce logic for the output files.
Both of the above parameters are optional.
Coalesce takes precedence over repartition if both are defined.
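
The precedence rule above can be sketched in plain Scala as follows. This is only an illustration of the decision, not the code in this patch: OutputPartitionOptions, its field names, and OutputPartitioning.choose are hypothetical, and the sketch deliberately avoids a Spark dependency. In a Spark job, the chosen plan would presumably map onto Dataset.coalesce(n) or Dataset.repartition(n) before the write.

```scala
// Hypothetical holder for the two optional settings (names are assumptions,
// not taken from Options.scala). None means the option was not supplied.
final case class OutputPartitionOptions(
    coalescePartitions: Option[Int] = None,
    repartitionPartitions: Option[Int] = None)

// The three possible outcomes for the output DataFrame.
sealed trait PartitionPlan
case object KeepAsIs extends PartitionPlan
final case class Coalesce(numPartitions: Int) extends PartitionPlan
final case class Repartition(numPartitions: Int) extends PartitionPlan

object OutputPartitioning {
  // Coalesce wins whenever it is defined; repartition is used only when
  // coalesce is absent; with neither set, the partitioning is left untouched.
  def choose(opts: OutputPartitionOptions): PartitionPlan =
    (opts.coalescePartitions, opts.repartitionPartitions) match {
      case (Some(n), _)    => Coalesce(n)
      case (None, Some(n)) => Repartition(n)
      case (None, None)    => KeepAsIs
    }
}
```

With both options set, for example coalesce 32 and repartition 64, the sketch picks Coalesce(32), which matches the rule stated above.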

Testing

sudo /mnt/services/spark/bin/run-transform-cluster-mode-on report-center-batch-driver --stack rcspark_envoy --executor-cores 8 --total-executor-cores 32 --executor-memory 55g --driver-memory 55g --conf spark.log4j.logger.org.apache.spark=WARN --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 --conf spark.speculation=false --class com.twilio.backup.BackupKuduTable /mnt/services/report-center-batch-indexer/appJar/spark-report-center-batch-indexer-shaded.jar --kuduMasterAddresses report-center-leader-5.us1.twilio.com,report-center-leader-4.us1.twilio.com,report-center-leader-3.us1.twilio.com,report-center-leader-2.us1.twilio.com,report-center-leader-1.us1.twilio.com --splitSizeBytes 1000000000 --scanRequestTimeoutMs 60000000 --coalesceOutputPartitions 32 --rootPath s3a://com.twilio.prod.warehouse/data/report-center/kudu-table-backup/ BillableItemUsageCategories
2022-01-27 08:07:01,015 - root - INFO:  TIMEOUT is None, status check interval 60, job file None and connection retry to spark REST API 5 and arguments to job ['--executor-cores', '8', '--total-executor-cores', '32', '--executor-memory', '55g', '--driver-memory', '55g', '--conf', 'spark.log4j.logger.org.apache.spark=WARN', '--conf', 'spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2', '--conf', 'spark.speculation=false', '--class', 'com.twilio.backup.BackupKuduTable', '/mnt/services/report-center-batch-indexer/appJar/spark-report-center-batch-indexer-shaded.jar', '--kuduMasterAddresses', 'report-center-leader-5.us1.twilio.com,report-center-leader-4.us1.twilio.com,report-center-leader-3.us1.twilio.com,report-center-leader-2.us1.twilio.com,report-center-leader-1.us1.twilio.com', '--splitSizeBytes', '1000000000', '--scanRequestTimeoutMs', '60000000', '--coalesceOutputPartitions', '32', '--rootPath', 's3a://com.twilio.prod.warehouse/data/report-center/kudu-table-backup/', 'BillableItemUsageCategories']
2022-01-27 08:07:01,772 - root - INFO:  Job submitted as driver-20220127080701-17528
2022-01-27 08:08:02,680 - root - INFO:  Job submission [driver-20220127080701-17528] alive with state RUNNING on worker-20220127062817-172.25.72.200-7078
2022-01-27 08:09:03,707 - root - INFO:  Job submission [driver-20220127080701-17528] completed (state: FINISHED)
Finished: SUCCESS


Change-Id: I328cb7e41bca14b7b6d73eb7721a86fb86203201
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestOptions.scala
3 files changed, 35 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/18174/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I328cb7e41bca14b7b6d73eb7721a86fb86203201
Gerrit-Change-Number: 18174
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <mk...@twilio.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)