Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/13 10:53:39 UTC

[GitHub] [hudi] LeoHsu0802 commented on issue #933: Support for multiple level partitioning in Hudi

LeoHsu0802 commented on issue #933:
URL: https://github.com/apache/hudi/issues/933#issuecomment-707660374


   > I found a way to do this. For anyone's reference, this can be achieved by:
   > 
   > 1. Use org.apache.hudi.ComplexKeyGenerator as the key generator class instead of SimpleKeyGenerator.
   > 2. Provide the fields that you want to partition on as a comma-separated string in PARTITIONPATH_FIELD_OPT_KEY.
   > 
   > Reference:
   > https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/ComplexKeyGenerator.java#L42
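The two steps in the quoted answer can be sketched in PySpark terms as a small options builder. This is a minimal sketch, not Hudi's own API: `build_hudi_options`, `my_table` and `ts` are hypothetical names, and it assumes the `hoodie.datasource.write.keygenerator.class` and `hoodie.datasource.write.partitionpath.field` options. The class path `org.apache.hudi.ComplexKeyGenerator` is the one given in the quoted answer; later Hudi releases place the class under `org.apache.hudi.keygen`, so check which path your version ships.

```python
# Sketch of the quoted two-step recipe. build_hudi_options is a hypothetical
# helper for illustration; it is not part of Hudi or PySpark.

# Class path as given in the quoted answer. Later Hudi releases place the
# class under org.apache.hudi.keygen, so match the path to your version.
COMPLEX_KEY_GENERATOR = "org.apache.hudi.ComplexKeyGenerator"


def build_hudi_options(table_name, record_key, partition_fields,
                       precombine_field, key_generator=COMPLEX_KEY_GENERATOR):
    """Build a write-options dict for a table partitioned on several columns."""
    # Require an ordered sequence: the order of the fields decides the
    # directory nesting, and a set has no guaranteed order.
    if not isinstance(partition_fields, (list, tuple)):
        raise TypeError("partition_fields must be a list or tuple of column names")
    if not all(isinstance(f, str) and f for f in partition_fields):
        raise TypeError("every partition field must be a non-empty string")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        # Step 1: ComplexKeyGenerator instead of SimpleKeyGenerator.
        "hoodie.datasource.write.keygenerator.class": key_generator,
        # Step 2: all partition fields as ONE comma-separated string.
        "hoodie.datasource.write.partitionpath.field": ",".join(partition_fields),
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


opts = build_hudi_options("my_table", "id", ["year", "month", "day"], "ts")
print(opts["hoodie.datasource.write.partitionpath.field"])  # year,month,day
```

The dict would then be passed to the writer the same way as any other options dict, for example `df.write.format("hudi").options(**opts).mode("append").save(base_path)`, where `df` and `base_path` stand for your own DataFrame and table path. Taking a list rather than a set is deliberate: the field order is what turns into the year/month/day directory nesting.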
   
   Hi @afeldman1, I have a question about point 2. I tried to partition by year/month/day in PySpark, but it didn't work. Below are my settings.
   
   hudi_options = {
     'hoodie.table.name': tableName,
     'hoodie.datasource.write.recordkey.field': 'id',
     'hoodie.datasource.write.partitionpath.field': {"year","month","day"},
     'hoodie.datasource.write.table.name': tableName,
     'hoodie.datasource.write.operation': 'insert',
     'hoodie.datasource.write.precombine.field': 'country',
     'hoodie.upsert.shuffle.parallelism': 2, 
     'hoodie.insert.shuffle.parallelism': 2
   }
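Comparing these settings with point 2 of the quoted answer, the likely problem is the partition path value: `{"year","month","day"}` is a Python set literal, not a comma-separated string. As far as I understand PySpark's `DataFrameWriter.options`, non-string values are converted with `str()`, so Hudi would receive text such as `{'month', 'year', 'day'}` (in no guaranteed order) rather than `year,month,day`. The settings also never set the key generator class, so the default (SimpleKeyGenerator, to my knowledge) would still be used. A minimal sketch of the diagnosis and a corrected dict follows; `tableName = "my_table"` is a hypothetical placeholder for the variable defined elsewhere in the original script, and the key generator package should be adjusted to your Hudi version.

```python
# Diagnosis sketch for the settings above. tableName is a hypothetical
# placeholder; in the original script it is defined elsewhere.
tableName = "my_table"

# {"year","month","day"} is a Python *set* literal, not a string.
user_value = {"year", "month", "day"}
print(type(user_value).__name__)  # set

# str() of a set keeps the braces and quotes, and set iteration order is not
# guaranteed, so this text is not "year,month,day".
print(str(user_value))

# What the quoted answer asks for: one comma-separated string, in nesting order.
partition_fields = ["year", "month", "day"]
fixed_value = ",".join(partition_fields)
print(fixed_value)  # year,month,day

hudi_options = {
    'hoodie.table.name': tableName,
    'hoodie.datasource.write.recordkey.field': 'id',
    # Step 1 of the quoted answer; adjust the package to your Hudi version.
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.ComplexKeyGenerator',
    # Step 2: a plain string, not a set.
    'hoodie.datasource.write.partitionpath.field': fixed_value,
    'hoodie.datasource.write.table.name': tableName,
    'hoodie.datasource.write.operation': 'insert',
    'hoodie.datasource.write.precombine.field': 'country',
    'hoodie.upsert.shuffle.parallelism': 2,
    'hoodie.insert.shuffle.parallelism': 2
}
```

With these options, and assuming Hive-style partitioning is left off, the expected layout is one nested directory per field value, along the lines of `<year>/<month>/<day>` under the table's base path.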
   
   May I ask why?
   Thanks

