
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/08 15:43:52 UTC

[GitHub] [incubator-hudi] Antauri edited a comment on issue #1394: [HUDI-656][Performance] Return a dummy Spark relation after writing the DataFrame

Antauri edited a comment on issue #1394: [HUDI-656][Performance] Return a dummy Spark relation after writing the DataFrame
URL: https://github.com/apache/incubator-hudi/pull/1394#issuecomment-611032718
 
 
   This is still present in 0.5.2-incubating on EMR 6.x, which we're using. We're developing a framework that does S3-to-S3 ingestion with Hudi through the Spark SQL writers (not RDDs). Our data is partitioned as year=x/month=y/day=z/bin=q. For 3 days with 575 paths each, 3+ minutes pass between the repeated "listing leaf files and directories" steps. In total that is some 9 minutes for just 3 days.
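
As a rough back-of-the-envelope check of those numbers, here is a small sketch. It assumes the figures read as 575 partition paths per day and about 3 minutes of listing per day; the variable names are only illustrative and nothing here comes from Hudi or Spark itself.

```python
# Back-of-the-envelope estimate of the listing overhead described above.
# Assumed reading: 575 partition paths per day, ~3 minutes of listing per day.
days = 3
paths_per_day = 575
listing_minutes_per_day = 3  # reported as "3+", so treat this as a lower bound

total_paths = days * paths_per_day
total_seconds = days * listing_minutes_per_day * 60
seconds_per_path = total_seconds / total_paths

print(f"{total_paths} paths, {total_seconds} s total, {seconds_per_path:.2f} s per path")
```

Under those assumptions the listing costs roughly a third of a second per partition path, so the overhead would be expected to grow with the number of paths rather than with the data volume.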
   
   Any idea when 0.6.0 will be released? And does adding "Hive" as the metastore help reduce this listing, or does it not matter?
   
   Thank you kindly!
