Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/05/20 17:54:28 UTC

[GitHub] [incubator-hudi] bvaradar commented on pull request #1643: [HUDI-110] Spark Datasource Auto Partition Extractor

bvaradar commented on pull request #1643:
URL: https://github.com/apache/incubator-hudi/pull/1643#issuecomment-631630091


   @garyli1019 : There are two parts to it. The ticket was originally created to track making the hive-style partitioning scheme the default in Hudi. Spark supports this same style. Given the current adoption, changing the default partition style has implications for backwards compatibility and needs a discussion.
   
   The other part is about how to take the partition configuration that Spark captures in partitionBy(..) and use it directly to configure the KeyGenerator. Let me know if this makes sense.
   
   
   SlashEncodedDayPartitionValueExtractor is the default being used. This is not a common format outside Uber.
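To make the difference between the two layouts concrete, here is a minimal sketch. Both helper functions are hypothetical and written only for illustration; they are not Hudi APIs. The sketch assumes the slash-encoded day layout is a `yyyy/MM/dd` path of bare values, while the hive-style layout uses one `column=value` directory per partition column.

```python
from datetime import date


def slash_encoded_day_path(d: date) -> str:
    """Hypothetical helper: slash-encoded day layout, values only (yyyy/MM/dd)."""
    return f"{d.year:04d}/{d.month:02d}/{d.day:02d}"


def hive_style_path(partition_values: dict) -> str:
    """Hypothetical helper: hive-style layout, one column=value directory per column."""
    return "/".join(f"{col}={val}" for col, val in partition_values.items())


d = date(2020, 5, 20)
print(slash_encoded_day_path(d))
print(hive_style_path({"year": "2020", "month": "05", "day": "20"}))
```

With the hive-style layout the column names are recoverable from the path itself, which is what lets Spark and Hive discover partitions without a custom extractor.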
   
   
   Also, the Spark DataSource provides a partitionBy clause, which has not been integrated into the Hudi DataSource. We need to investigate how we can leverage the partitionBy clause for partitioning.
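One possible shape for that integration, sketched under assumptions: the helper below is hypothetical and only shows how the columns passed to partitionBy(..) might be translated into Hudi write options. The two option keys are existing Hudi datasource write options; the mapping itself is an assumption and not what the PR implements. A multi-column partition path would also need a key generator that accepts several fields, and that choice is left out here.

```python
def hudi_options_from_partition_by(partition_cols, hive_style=True):
    """Hypothetical helper: derive Hudi write options from partitionBy(...) columns."""
    if not partition_cols:
        raise ValueError("expected at least one partition column")
    return {
        # Field(s) the key generator reads the partition path from.
        "hoodie.datasource.write.partitionpath.field": ",".join(partition_cols),
        # When "true", partition directories are written as column=value.
        "hoodie.datasource.write.hive_style_partitioning": str(hive_style).lower(),
    }


opts = hudi_options_from_partition_by(["year", "month", "day"])
print(opts)
```

The resulting dictionary could then be passed to the writer through `.options(**opts)`, so the user would state the partition columns only once.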


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org