Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2022/09/06 05:43:00 UTC

[jira] [Commented] (HUDI-4656) Test COW: Deltastreamer metadata-only and full-record bootstrap operation

    [ https://issues.apache.org/jira/browse/HUDI-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600579#comment-17600579 ] 

Ethan Guo commented on HUDI-4656:
---------------------------------

Found the same issues as with the Spark datasource testing in HUDI-4655.

> Test COW: Deltastreamer metadata-only and full-record bootstrap operation
> -------------------------------------------------------------------------
>
>                 Key: HUDI-4656
>                 URL: https://issues.apache.org/jira/browse/HUDI-4656
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.13.0
>
>
>  
> {code:bash}
> export TEST_HUDI_UT_JAR=/repo/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar
> export TEST_HUDI_SPARK_JAR=/repo/hudi/packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.12-0.13.0-SNAPSHOT.jar
> export TEST_BASE_DIR=<>/bootstrap-testing/ds-1
> export SPARK_HOME=/Users/ethan/Work/lib/spark-3.2.1-bin-hadoop3.2
> $SPARK_HOME/bin/spark-submit \
>         --master local[6] \
>         --driver-memory 6g --executor-memory 2g --num-executors 6 --executor-cores 1 \
>         --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
>         --conf spark.sql.catalogImplementation=hive \
>         --conf spark.driver.maxResultSize=1g \
>         --conf spark.speculation=true \
>         --conf spark.speculation.multiplier=1.0 \
>         --conf spark.speculation.quantile=0.5 \
>         --conf spark.ui.port=6679 \
>         --conf spark.eventLog.enabled=true \
>         --conf spark.eventLog.dir=/Users/ethan/Work/data/hudi/spark-logs \
>         --jars $TEST_HUDI_SPARK_JAR \
>         --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
>         $TEST_HUDI_UT_JAR \
>         --run-bootstrap \
>         --props $TEST_BASE_DIR/ds_cow.properties \
>         --target-base-path file:$TEST_BASE_DIR/test_table \
>         --target-table test_table \
>         --table-type COPY_ON_WRITE \
>         --op INSERT \
>         --hoodie-conf hoodie.bootstrap.base.path=/Users/ethan/Work/scripts/bootstrap-testing/partitioned-parquet-table-date \
>         --hoodie-conf hoodie.datasource.write.recordkey.field=key \
>         --hoodie-conf hoodie.datasource.write.partitionpath.field=partition \
>         --hoodie-conf hoodie.bootstrap.keygen.class=org.apache.hudi.keygen.SimpleKeyGenerator \
>         --hoodie-conf hoodie.bootstrap.full.input.provider=org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider \
>         --hoodie-conf hoodie.bootstrap.mode.selector=org.apache.hudi.client.bootstrap.selector.BootstrapRegexModeSelector \
>         --hoodie-conf hoodie.bootstrap.mode.selector.regex="2022/1/2[4-8]" \
>         --hoodie-conf hoodie.bootstrap.mode.selector.regex.mode=METADATA_ONLY >> ds.log 2>&1 {code}
>  
>  
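> A quick way to sanity-check the bootstrapped table is to read it back through the Hudi Spark datasource and confirm the row count matches the source parquet table. The snippet below is only a sketch reusing the environment variables from the run above; the spark-shell invocation itself is an assumption and was not part of the original test run.
> {code:bash}
> # Hedged sketch (not from the original run): count records in the bootstrapped
> # table via spark-shell, reusing TEST_BASE_DIR and TEST_HUDI_SPARK_JAR from above.
> echo 'println(spark.read.format("hudi").load("file:" + sys.env("TEST_BASE_DIR") + "/test_table").count())' | \
>         $SPARK_HOME/bin/spark-shell \
>         --jars $TEST_HUDI_SPARK_JAR \
>         --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
> {code}
> The printed count should equal the record count of the source table at hoodie.bootstrap.base.path; for METADATA_ONLY partitions the data still resides in the original parquet files, so the read path also exercises the bootstrap file-slice merging.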



--
This message was sent by Atlassian Jira
(v8.20.10#820010)