Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/16 19:54:58 UTC

[GitHub] [hudi] nandurj edited a comment on issue #1586: [SUPPORT] DMS with 2 key example

nandurj edited a comment on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-659633790


   I am working with Hudi 0.5.2 on EMR 5.30 and running the job with DeltaStreamer (HoodieDeltaStreamer). This is the spark-submit command I use:
   
   spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer  \
     --jars /usr/lib/spark/external/lib/spark-avro_2.11-2.4.5-amzn-0.jar \
     --master yarn --deploy-mode client \
     --executor-memory 10G --executor-cores 4 \
     file:///usr/lib/hudi/hudi-utilities-bundle_2.11-0.5.2-incubating.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field TIMESTAMP \
     --continuous \
     --enable-hive-sync \
     --min-sync-interval-seconds 60 \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --target-base-path s3://mybucket/CoWex --target-table table_test \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --hoodie-conf hoodie.datasource.write.recordkey.field="Field1, Field2, Field3" \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
     --hoodie-conf hoodie.datasource.write.partitionpath.field="Field1" \
     --hoodie-conf hoodie.datasource.hive_sync.database=testdb \
     --hoodie-conf hoodie.datasource.hive_sync.table=test_table \
     --hoodie-conf hoodie.datasource.hive_sync.partition_fields="datefield" \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://mybucket/input
   
   
   Spark-shell output:
   scala> spark.sql("""select _hoodie_record_key from testdb.test_table""").show(false)
   +--------------------------------------------------------------------+          
   |_hoodie_record_key                                                  |
   +--------------------------------------------------------------------+
   |Field1:[0, 0]                                                       |
   +--------------------------------------------------------------------+
   
   In the output above, the record key contains only Field1; Field2 and Field3 are missing.
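
   For comparison, here is a minimal shell sketch of the shape a three-field key would be expected to have, assuming ComplexKeyGenerator joins each key field as `field:value` with commas between them. The helper name `expected_record_key` and the sample values are hypothetical and only illustrate the layout; they are not taken from the job or from Hudi itself.

```shell
# Hypothetical helper: joins name=value pairs into "name:value,name:value".
# This only illustrates the assumed key layout; it does not call Hudi.
expected_record_key() {
  local key="" pair
  for pair in "$@"; do
    # Split "Field1=a" into name and value, then append "Field1:a",
    # adding a comma separator once the key is non-empty.
    key="${key:+$key,}${pair%%=*}:${pair#*=}"
  done
  printf '%s\n' "$key"
}

# Sample values are made up for illustration.
expected_record_key Field1=a Field2=b Field3=c
```

   Under that assumption, every `_hoodie_record_key` value would name all three fields, whereas the value shown above names only Field1.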

