Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/02 21:12:45 UTC

[GitHub] [hudi] tooptoop4 opened a new issue #1586: [SUPPORT] DMS with 2 key example

tooptoop4 opened a new issue #1586:
URL: https://github.com/apache/hudi/issues/1586


   Would you be able to add an example to https://cwiki.apache.org/confluence/display/HUDI/2020/01/20/Change+Capture+Using+AWS+Database+Migration+Service+and+Hudi that uses a 2-column key?
   
   Can it be done just by calling the pre-built 'spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer', without writing any class?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] pratyakshsharma edited a comment on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
pratyakshsharma edited a comment on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-656573278


   Hmm, strange, @nandurj. After setting the above 2 properties, things should work as expected. Can you tell me which version of Hudi you are using and how you are trying to run the job? Can you add logs in the DeltaSync constructor to see which keyGenerator class is actually getting initialised? This looks more like a misconfiguration issue than a code issue. Can you confirm you are adding the properties to the config file that the job actually uses?





[GitHub] [hudi] noobarcitect edited a comment on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
noobarcitect edited a comment on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-683853156


   I have been facing 1 more problem. The Hudi dataset generated by the Spark job contains a .hoodie directory and a default directory. Now I want to view it in the AWS data catalog by crawling this data with a Glue crawler.
   I have hit a roadblock: my crawled table has those extra metadata columns such as _hoodie_commit_time, _hoodie_commit_seqno, _hoodie_record_key etc.
   
   Is there a way to get just the data columns, without these metadata columns? I could read the dataset as a DataFrame in another Spark job and drop these extra columns, but I want to avoid that.
   





[GitHub] [hudi] noobarcitect edited a comment on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
noobarcitect edited a comment on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-683269914


   @bvaradar 
   Is this issue resolved in any Hudi release?
   I am currently using 0.6.0 with Scala 2.11, and previously tried 0.5.2-incubating/0.5.3 + Scala 2.11 as well; the result is the same. Below is the description of the Spark job that I wrote.
   
   My use case is much the same: I am using 2 columns of a dataset as the RecordKey to make an upsert, but the comma-separated values don't work. Below is the code I am using to make an upsert in a dataset where albumId and trackId are the 2 keys that identify a unique record:
   
   albumDf.write
         .format("hudi")
         .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
         .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "albumId, trackId")
         .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, combineKey)
         .option(HoodieWriteConfig.TABLE_NAME, tableName)
         .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)      
         .option("hoodie.upsert.shuffle.parallelism", "2")
         .mode(SaveMode.Append)
         .save(s"$basePath/$tableName/")
   
   And below is the error that I get : 
   	... 5 more
   Caused by: org.apache.hudi.exception.HoodieKeyException: recordKey value: "null" for field: "albumId, trackId" cannot be null or empty.
   	at org.apache.hudi.keygen.SimpleKeyGenerator.getKe
   
   
   
   Can you please help ? 
   
   
   





[GitHub] [hudi] nandurj commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
nandurj commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-653527159


   I am using multiple keys to create CoW tables with the properties below:
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.datasource.write.recordkey.field=customer_id,product_id
   
   But the delta streamer is not picking up the second key; it only picks up the first key, customer_id. I verified this from the _hoodie_record_key value on the table, which only shows customer_id.





[GitHub] [hudi] noobarcitect commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
noobarcitect commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-683849662


   Thanks for replying, @bvaradar and @tooptoop4.
   Though this solves the problem, the exception trace is quite confusing. Also, can't keys such as these be handled through a single class?





[GitHub] [hudi] bvaradar commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-683821525


   @noobarcitect : This is a different issue. You need to use ComplexKeyGenerator instead of SimpleKeyGenerator if you are using more than one column as the record key.
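   
   For the DataSource write shown earlier in this thread, that advice can be sketched as an options map. This is a minimal sketch, not a verified fix: the two `hoodie.datasource.write.*` keys are the ones quoted elsewhere in this thread, while the object name `CompositeKeyOptions` and the unspaced key list `albumId,trackId` are assumptions made for illustration.
   
   ```scala
   // Hypothetical helper object; only the two property keys come from this thread.
   object CompositeKeyOptions {
     // Record key columns, joined without spaces
     // (assumption: avoids relying on any whitespace trimming).
     val recordKeyColumns: Seq[String] = Seq("albumId", "trackId")

     val hudiOptions: Map[String, String] = Map(
       "hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
       "hoodie.datasource.write.recordkey.field" -> recordKeyColumns.mkString(",")
     )

     // Print the options in key=value form.
     def main(args: Array[String]): Unit =
       hudiOptions.foreach { case (k, v) => println(s"$k=$v") }
   }
   ```
   
   The map would be passed alongside the existing options, for example `albumDf.write.format("hudi").options(CompositeKeyOptions.hudiOptions)`, with the rest of the writer chain unchanged.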





[GitHub] [hudi] pratyakshsharma commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-665107888


   Will take a look at it. 





[GitHub] [hudi] bvaradar commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-683862001


   @noobarcitect : We would be happy to help you in answering your queries. Kindly open a new ticket as this is a closed ticket.





[GitHub] [hudi] vinothchandar commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-653222555


   Reopened.. and let me understand what's going on.. 





[GitHub] [hudi] bvaradar commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-667037723


   Opened https://issues.apache.org/jira/browse/HUDI-1140 





[GitHub] [hudi] bvaradar commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-683860484


   @noobarcitect : SimpleKeyGenerator is intended to handle a single key column, as the name suggests. To avoid confusion, it is best to retain the current behavior :)
   
   Regarding the data catalog, the Hudi columns are needed for your incremental queries. If you want to omit them, I am not sure how the query integration would work. If this is simply needed for tracking columns, the way you suggested is the one that works.
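   
   The approach mentioned above (reading the dataset and dropping the metadata columns) might be sketched as follows. The snippet only computes which columns to keep, in plain Scala; the object `HudiMetaColumns`, the sample column list and the `_hoodie_` prefix check are assumptions for illustration, not a Hudi API.
   
   ```scala
   // Hypothetical helper; not part of Hudi.
   object HudiMetaColumns {
     // Assumption: Hudi metadata columns share the "_hoodie_" prefix,
     // as with the names quoted in this thread.
     def isMetaColumn(name: String): Boolean = name.startsWith("_hoodie_")

     // Hypothetical column list, as a crawler might report it.
     val allColumns: Seq[String] = Seq(
       "_hoodie_commit_time", "_hoodie_record_key", "albumId", "trackId", "title"
     )

     // Keep only the columns that are not Hudi metadata.
     val dataColumns: Seq[String] = allColumns.filterNot(isMetaColumn)

     def main(args: Array[String]): Unit = println(dataColumns.mkString(", "))
   }
   ```
   
   With a Spark DataFrame `df`, the same filter could drive a projection such as `df.select(df.columns.filterNot(HudiMetaColumns.isMetaColumn).map(df.col): _*)`.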
   





[GitHub] [hudi] tooptoop4 commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-653270837


   @nandurj 





[GitHub] [hudi] bvaradar commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-668657095


   https://github.com/apache/hudi/pull/1898





[GitHub] [hudi] bvaradar commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-668656807


   Closing this as we have a PR.





[GitHub] [hudi] pratyakshsharma commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-667504257


   https://www.baeldung.com/jcommander-parsing-command-line-parameters#separated-lists might be useful. 





[GitHub] [hudi] nandurj commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
nandurj commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-659633790


   I am working with HUDI 0.5.2 on EMR 5.30. I am running the job using the Delta streamer. Below is how I am running the spark job.
   
   spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer  \
     --jars /usr/lib/spark/external/lib/spark-avro_2.11-2.4.5-amzn-0.jar \
     --master yarn --deploy-mode client \
     --executor-memory 10G --executor-cores 4 \
     file:///usr/lib/hudi/hudi-utilities-bundle_2.11-0.5.2-incubating.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field TIMESTAMP \
     --continuous \
     --enable-hive-sync \
     --min-sync-interval-seconds 60 \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --target-base-path s3://mybucket/CoWex --target-table table_test \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
     --hoodie-conf hoodie.datasource.write.recordkey.field="Field1, Field2, Field3" \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
     --hoodie-conf hoodie.datasource.write.partitionpath.field="Field1" \
     --hoodie-conf hoodie.datasource.hive_sync.database=testdb \
     --hoodie-conf hoodie.datasource.hive_sync.table=test_table\
     --hoodie-conf hoodie.datasource.hive_sync.partition_fields="datefield" \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://mybucket/input





[GitHub] [hudi] bvaradar commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-660424332


   @pratyakshsharma : Would you be able to debug  this issue ?





[GitHub] [hudi] noobarcitect commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
noobarcitect commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-683870453


   @bvaradar .. That was really helpful.
   Will surely open a ticket if I need to. I have got all the answers for now.





[GitHub] [hudi] pratyakshsharma commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-666419674


   @zhedoubushishi Interesting! Were you able to deep dive into how can we ensure JCommander does not split by comma or any work around we could do? 





[GitHub] [hudi] nandurj commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
nandurj commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-650238181


   We are still facing this issue after setting:
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.datasource.write.recordkey.field=comma_seperated_list_of_primary_keys
   
   Only the first key is considered as the record key.





[GitHub] [hudi] nandurj edited a comment on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
nandurj edited a comment on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-659633790


   I am working with HUDI 0.5.2 on EMR 5.30. I am running the job using the Delta streamer. Below is how I am running the spark job.
   
   spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer  \
     --jars /usr/lib/spark/external/lib/spark-avro_2.11-2.4.5-amzn-0.jar \
     --master yarn --deploy-mode client \
     --executor-memory 10G --executor-cores 4 \
     file:///usr/lib/hudi/hudi-utilities-bundle_2.11-0.5.2-incubating.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field TIMESTAMP \
     --continuous \
     --enable-hive-sync \
     --min-sync-interval-seconds 60 \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --target-base-path s3://mybucket/CoWex --target-table table_test \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --hoodie-conf hoodie.datasource.write.recordkey.field="Field1, Field2, Field3" \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
     --hoodie-conf hoodie.datasource.write.partitionpath.field="Field1" \
     --hoodie-conf hoodie.datasource.hive_sync.database=testdb \
     --hoodie-conf hoodie.datasource.hive_sync.table=test_table\
     --hoodie-conf hoodie.datasource.hive_sync.partition_fields="datefield" \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://mybucket/input
   
   
   Spark-shell output:
   scala> spark.sql("""select _hoodie_record_key from testdb.test_table""").show(false)
   +--------------------------------------------------------------------+          
   |_hoodie_record_key                                                  |
   +--------------------------------------------------------------------+
   |Field1:[0, 0]|
   +--------------------------------------------------------------------+
   
   From the above output Field2, Field3 are missing.





[GitHub] [hudi] tooptoop4 commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-683846614


   @bvaradar why not point the SimpleKeyGenerator class to ComplexKeyGenerator? There seems to be no need to maintain a limited 1-key class if ComplexKeyGenerator can handle 1 or multiple keys without the null bug.





[GitHub] [hudi] bvaradar closed issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #1586:
URL: https://github.com/apache/hudi/issues/1586


   





[GitHub] [hudi] zhedoubushishi commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-665856107


   I added a log print to check the config read by JCommander:
   
   ```
   20/07/25 00:50:06 INFO HoodieDeltaStreamer: debug => config is hoodie.datasource.write.recordkey.field=f1
   20/07/25 00:50:06 INFO HoodieDeltaStreamer: debug => config is f2,,f3
   20/07/25 00:50:06 INFO HoodieDeltaStreamer: debug => config is hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   20/07/25 00:50:06 INFO HoodieDeltaStreamer: debug => config is hoodie.datasource.write.partitionpath.field=dt
   20/07/25 00:50:06 INFO HoodieDeltaStreamer: debug => config is hoodie.datasource.hive_sync.database=default
   20/07/25 00:50:06 INFO HoodieDeltaStreamer: debug => config is hoodie.datasource.hive_sync.table=hudi_table
   20/07/25 00:50:06 INFO HoodieDeltaStreamer: debug => config is hoodie.datasource.hive_sync.partition_fields=dt
   20/07/25 00:50:06 INFO HoodieDeltaStreamer: debug => config is hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
   20/07/25 00:50:06 INFO HoodieDeltaStreamer: debug => config is hoodie.deltastreamer.source.dfs.root=s3://hudi/data
   ```
   
   It shows that JCommander will automatically split the string by ",":
   ```
   20/07/25 00:50:06 INFO HoodieDeltaStreamer: debug => config is hoodie.datasource.write.recordkey.field=f1
   20/07/25 00:50:06 INFO HoodieDeltaStreamer: debug => config is f2,,f3
   ```
   That's why it is unable to read multiple fields as the record key.
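   
   The effect described above can be reproduced with a small model in plain Scala. This is only an illustration of comma splitting, not JCommander's actual code; the object `HoodieConfSplit` and its member names are hypothetical, and the input value is a simplified stand-in for the `--hoodie-conf` argument.
   
   ```scala
   // Simplified model: NOT JCommander's implementation, just plain comma splitting.
   object HoodieConfSplit {
     // One --hoodie-conf argument as the user intended it.
     val arg: String = "hoodie.datasource.write.recordkey.field=f1,f2,f3"

     // Splitting on "," breaks the single property into three separate entries.
     val pieces: List[String] = arg.split(",").toList

     // Only entries of the form key=value can be interpreted as properties.
     val parsed: Map[String, String] = pieces.collect {
       case entry if entry.contains("=") =>
         val Array(key, value) = entry.split("=", 2)
         key -> value
     }.toMap

     def main(args: Array[String]): Unit = {
       pieces.foreach(p => println(s"config is $p"))
       // Only the first field is left as the record key value.
       println(parsed("hoodie.datasource.write.recordkey.field"))
     }
   }
   ```
   
   In this model the record key property ends up as `f1` alone, which matches the symptom reported in this thread where only the first field appears in `_hoodie_record_key`.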





[GitHub] [hudi] vinothchandar commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-653255995


   Okay.. DMS or not, this just seems like a DeltaStreamer config, that should work.. what error are you facing? can you give me more details?





[GitHub] [hudi] zhedoubushishi commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-667423657


   > @zhedoubushishi Interesting! Were you able to deep dive into how can we ensure JCommander does not split by comma or any work around we could do?
   
   Sure. Will take a look later.





[GitHub] [hudi] vinothchandar commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-655205874


   cc @pratyakshsharma  any ideas? (since you are actively looking at this code) 





[GitHub] [hudi] tooptoop4 commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-650338788


   @bvaradar pls reopen





[GitHub] [incubator-hudi] bvaradar commented on issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1586:
URL: https://github.com/apache/incubator-hudi/issues/1586#issuecomment-623983824


   You would need to set the below configs for composite keys.
   
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.datasource.write.recordkey.field=comma_seperated_list_of_primary_keys
   





[GitHub] [incubator-hudi] bvaradar closed issue #1586: [SUPPORT] DMS with 2 key example

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #1586:
URL: https://github.com/apache/incubator-hudi/issues/1586


   

