Posted to commits@hudi.apache.org by "Luning Wang (Jira)" <ji...@apache.org> on 2023/01/11 06:39:00 UTC

[jira] [Updated] (HUDI-5527) Can't set keygen class in bootstrap

     [ https://issues.apache.org/jira/browse/HUDI-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luning Wang updated HUDI-5527:
------------------------------
    Description: 
When I run the following bootstrap command, it fails with an error about SimpleKeyGenerator. I set the keygen class to `NonpartitionedKeyGenerator` via `hoodie.bootstrap.keygen.class`, but the setting does not take effect.

 
{code:java}
bin/spark-submit --master yarn \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer /opt/hudi-utilities-bundle_2.12-0.12.2.jar \
--run-bootstrap \
--target-base-path /tpcds_hudi_3.db/call_center \
--target-table call_center \
--table-type COPY_ON_WRITE \
--hoodie-conf hoodie.bootstrap.base.path=h/tpcds_bin_partitioned_parquet_3.db/call_center \
--hoodie-conf hoodie.bootstrap.keygen.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator \
--hoodie-conf hoodie.datasource.write.recordkey.field=cc_call_center_sk \
--hoodie-conf hoodie.bootstrap.full.input.provider=org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider \
--hoodie-conf hoodie.bootstrap.mode.selector=org.apache.hudi.client.bootstrap.selector.BootstrapRegexModeSelector \
--hoodie-conf hoodie.bootstrap.mode.selector.regex.mode=FULL_RECORD \
--hoodie-conf hoodie.datasource.write.hive_style_partitioning=true
{code}
Error message:
{code:java}
Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.keygen.SimpleKeyGenerator
Caused by: java.lang.IllegalArgumentException: Property hoodie.datasource.write.partitionpath.field not found
{code}
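
The two "Caused by" lines suggest that the bootstrap keygen setting was not consulted and that a default of SimpleKeyGenerator was used instead, which then requires a partition path field. The following is only a minimal sketch of that suspected failure mode, not Hudi's actual code: the class `KeygenFallbackSketch`, its methods, and the assumption that the class name is looked up under `hoodie.datasource.write.keygenerator.class` are all hypothetical and used purely for illustration.

```java
import java.util.Properties;

// Hypothetical stand-in for the suspected lookup; none of this is Hudi source code.
public class KeygenFallbackSketch {
    static final String BOOTSTRAP_KEYGEN_PROP = "hoodie.bootstrap.keygen.class";
    // Assumption: the key generator is resolved from this property instead.
    static final String WRITE_KEYGEN_PROP = "hoodie.datasource.write.keygenerator.class";
    static final String PARTITION_FIELD_PROP = "hoodie.datasource.write.partitionpath.field";
    static final String SIMPLE = "org.apache.hudi.keygen.SimpleKeyGenerator";
    static final String NON_PARTITIONED = "org.apache.hudi.keygen.NonpartitionedKeyGenerator";

    // Reads the class name from a property the command never sets, so the default wins.
    static String resolveKeygenClass(Properties props) {
        return props.getProperty(WRITE_KEYGEN_PROP, SIMPLE);
    }

    // Mimics a simple key generator that insists on a partition path field.
    static void validate(String keygenClass, Properties props) {
        if (SIMPLE.equals(keygenClass) && !props.containsKey(PARTITION_FIELD_PROP)) {
            throw new IllegalArgumentException("Property " + PARTITION_FIELD_PROP + " not found");
        }
    }

    public static void main(String[] args) {
        // Same keygen-related settings as the bootstrap command in this report.
        Properties props = new Properties();
        props.setProperty(BOOTSTRAP_KEYGEN_PROP, NON_PARTITIONED);
        props.setProperty("hoodie.datasource.write.recordkey.field", "cc_call_center_sk");

        String resolved = resolveKeygenClass(props);
        System.out.println("Resolved key generator: " + resolved);
        try {
            validate(resolved, props);
        } catch (IllegalArgumentException e) {
            System.out.println("Caused by: " + e);
        }
    }
}
```

If this guess is right, the bootstrap path ignores `hoodie.bootstrap.keygen.class` when choosing the key generator. Whether that matches the real code path in 0.12.2 still needs to be confirmed against the source.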

> Can't set keygen class in bootstrap
> -----------------------------------
>
>                 Key: HUDI-5527
>                 URL: https://issues.apache.org/jira/browse/HUDI-5527
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: bootstrap, hudi-utilities
>    Affects Versions: 0.12.2
>         Environment: Spark 3.3.1
>            Reporter: Luning Wang
>            Priority: Major


