Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/02/06 14:45:21 UTC

[GitHub] [hudi] SureshK-T2S edited a comment on issue #2406: [SUPPORT] Deltastreamer - Property hoodie.datasource.write.partitionpath.field not found

SureshK-T2S edited a comment on issue #2406:
URL: https://github.com/apache/hudi/issues/2406#issuecomment-774483216


   Hello, thank you guys for giving me time with this. I have since hit an issue with HoodieMultiTableDeltaStreamer, in particular getting it to work with the ParquetDFS source. The error appears to be caused by the SchemaProvider, or the lack of one (see my guess at a fix sketched after the command below).
   
   Command:
   ```
   spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer \
     --packages org.apache.hudi:hudi-spark-bundle_2.11:0.6.0,org.apache.spark:spark-avro_2.11:2.4.4 \
     --master yarn --deploy-mode client \
     /usr/lib/hudi/hudi-utilities-bundle.jar \
     --table-type COPY_ON_WRITE \
     --props s3:///temp/config/s3-source.properties \
     --config-folder s3:///temp/hudi-ingestion-config/ \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --continuous --source-ordering-field updated_at \
     --base-path-prefix s3://hudi-data-lake \
     --target-table dummy_table --op UPSERT
   ```
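   
   In case it helps frame the question: if a schema provider really is mandatory here, my (unverified) guess is that I would add the file-based provider from hudi-utilities to the spark-submit above, something like:
   ```
   # Unverified guess on my part, not something the MultiTableDeltaStreamer docs told me to do:
   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
   ```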
   
   S3 properties:
   ```
   hoodie.deltastreamer.ingestion.tablesToBeIngested=db.table1,db.table2
   hoodie.deltastreamer.ingestion.db.table1.configFile=s3://hudi-data-lake/configs/db/table1.properties
   hoodie.deltastreamer.ingestion.db.table2.configFile=s3://hudi-data-lake/configs/db/table2.properties
   ```
   
   Table1 properties:
   ```
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.deltastreamer.source.dfs.root=s3://root_folder_1
   hoodie.datasource.write.recordkey.field=id
   hoodie.datasource.write.partitionpath.field=year,month,day
   ```
   
   Table2 properties:
   ```
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.deltastreamer.source.dfs.root=s3://root_folder_2
   hoodie.datasource.write.recordkey.field=id
   hoodie.datasource.write.partitionpath.field=year,month,day
   ```
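   
   And if that file-based provider is the right direction, I assume each table's properties file would also need to point at an Avro schema, along these lines (the .avsc paths are made-up placeholders):
   ```
   # Hypothetical schema file entries for FilebasedSchemaProvider:
   hoodie.deltastreamer.schemaprovider.source.schema.file=s3://hudi-data-lake/schemas/table1.avsc
   hoodie.deltastreamer.schemaprovider.target.schema.file=s3://hudi-data-lake/schemas/table1.avsc
   ```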
   
   Error:
   ```
   Exception in thread "main" java.lang.NullPointerException
   	at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.populateSchemaProviderProps(HoodieMultiTableDeltaStreamer.java:148)
   	at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.populateTableExecutionContextList(HoodieMultiTableDeltaStreamer.java:128)
   	at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.<init>(HoodieMultiTableDeltaStreamer.java:78)
   	at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.main(HoodieMultiTableDeltaStreamer.java:201)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
   	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
   	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
   	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
   	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
   	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
   	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
   	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   ```
   
   Judging by the trace, the NullPointerException comes from populateSchemaProviderProps, which seems consistent with my suspicion that the missing SchemaProvider is the problem. I'm at the final steps of my setup and really hoping to get this resolved so I can go live soon!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org