Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2022/09/06 05:43:00 UTC
[jira] [Commented] (HUDI-4656) Test COW: Deltastreamer metadata-only and full-record bootstrap operation
[ https://issues.apache.org/jira/browse/HUDI-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600579#comment-17600579 ]
Ethan Guo commented on HUDI-4656:
---------------------------------
Found the same issues as with the Spark datasource testing in HUDI-4655.
> Test COW: Deltastreamer metadata-only and full-record bootstrap operation
> -------------------------------------------------------------------------
>
> Key: HUDI-4656
> URL: https://issues.apache.org/jira/browse/HUDI-4656
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Fix For: 0.13.0
>
>
>
> {code:java}
> export TEST_HUDI_UT_JAR=/repo/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar
> export TEST_HUDI_SPARK_JAR=/repo/hudi/packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.12-0.13.0-SNAPSHOT.jar
> export TEST_BASE_DIR=<>/bootstrap-testing/ds-1
> export SPARK_HOME=/Users/ethan/Work/lib/spark-3.2.1-bin-hadoop3.2
> /Users/ethan/Work/lib/spark-3.2.1-bin-hadoop3.2/bin/spark-submit \
> --master local[6] \
> --driver-memory 6g --executor-memory 2g --num-executors 6 --executor-cores 1 \
> --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
> --conf spark.sql.catalogImplementation=hive \
> --conf spark.driver.maxResultSize=1g \
> --conf spark.speculation=true \
> --conf spark.speculation.multiplier=1.0 \
> --conf spark.speculation.quantile=0.5 \
> --conf spark.ui.port=6679 \
> --conf spark.eventLog.enabled=true \
> --conf spark.eventLog.dir=/Users/ethan/Work/data/hudi/spark-logs \
> --jars $TEST_HUDI_SPARK_JAR \
> --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
> $TEST_HUDI_UT_JAR \
> --run-bootstrap \
> --props $TEST_BASE_DIR/ds_cow.properties \
> --target-base-path file:$TEST_BASE_DIR/test_table \
> --target-table test_table \
> --table-type COPY_ON_WRITE \
> --op INSERT \
> --hoodie-conf hoodie.bootstrap.base.path=/Users/ethan/Work/scripts/bootstrap-testing/partitioned-parquet-table-date \
> --hoodie-conf hoodie.datasource.write.recordkey.field=key \
> --hoodie-conf hoodie.datasource.write.partitionpath.field=partition \
> --hoodie-conf hoodie.bootstrap.keygen.class=org.apache.hudi.keygen.SimpleKeyGenerator \
> --hoodie-conf hoodie.bootstrap.full.input.provider=org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider \
> --hoodie-conf hoodie.bootstrap.mode.selector=org.apache.hudi.client.bootstrap.selector.BootstrapRegexModeSelector \
> --hoodie-conf hoodie.bootstrap.mode.selector.regex="2022/1/2[4-8]" \
> --hoodie-conf hoodie.bootstrap.mode.selector.regex.mode=METADATA_ONLY >> ds.log 2>&1 {code}
>
>
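The command above relies on BootstrapRegexModeSelector to route each source partition by regex: partitions whose relative path matches hoodie.bootstrap.mode.selector.regex get the configured mode (METADATA_ONLY here), and the remaining partitions fall back to the alternate mode (FULL_RECORD). As a rough local sanity check of the regex before launching the job, the same matching can be dry-run in the shell (a sketch under that assumption; the partition paths below are illustrative, not from the test dataset):

```shell
# Emulate the selector's per-partition regex match locally (sketch, not Hudi code).
# Matching partitions -> METADATA_ONLY; non-matching -> FULL_RECORD (assumed fallback).
regex='2022/1/2[4-8]'
for partition in 2022/1/23 2022/1/24 2022/1/28 2022/1/29; do
  if echo "$partition" | grep -Eq "^${regex}$"; then
    echo "$partition -> METADATA_ONLY"
  else
    echo "$partition -> FULL_RECORD"
  fi
done
```

Here 2022/1/24 through 2022/1/28 would be bootstrapped metadata-only, while 2022/1/23 and 2022/1/29 would get full-record bootstrap.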
--
This message was sent by Atlassian Jira
(v8.20.10#820010)