Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/15 17:32:33 UTC

[GitHub] [hudi] bvaradar commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

bvaradar commented on issue #1829:
URL: https://github.com/apache/hudi/issues/1829#issuecomment-658902047


   @zuyanton : HoodieParquetInputFormat relies on the hadoop-mapreduce FileInputFormat implementation to perform file listing. The base FileInputFormat has a knob for tuning listing parallelism:
   
   "mapreduce.input.fileinputformat.list-status.num-threads"
   
   This config defaults to 1, so listing happens on a single thread. Could you try increasing it and see whether listing speeds up?
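A minimal Python sketch of why the knob helps. It is not Hadoop's implementation. It fans the per-partition listing calls out over a thread pool, so with a value of 1 the directories are listed one after another, and with a higher value the slow per-call round trips (as on S3) overlap. The helper names `spark_conf_arg`, `list_files` and `list_partitions` are hypothetical. It also assumes Spark is the query engine, where properties prefixed with `spark.hadoop.` are copied into the Hadoop Configuration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hadoop property from the comment above.
LIST_STATUS_THREADS = "mapreduce.input.fileinputformat.list-status.num-threads"


def spark_conf_arg(num_threads: int) -> str:
    """Build a hypothetical `spark-submit --conf` value for the thread count.

    Spark forwards any "spark.hadoop.*" property to the Hadoop Configuration.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    return f"spark.hadoop.{LIST_STATUS_THREADS}={num_threads}"


def list_files(directory: str) -> list[str]:
    """List the regular files directly under one partition directory."""
    return [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    ]


def list_partitions(partitions: list[str], num_threads: int = 1) -> list[str]:
    """List every partition, spreading the per-directory calls over a pool.

    num_threads=1 lists directories serially (the default described above).
    Larger values let several slow listing calls be in flight at once.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        per_dir = pool.map(list_files, partitions)
        return sorted(path for files in per_dir for path in files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        partitions = []
        for day in ("2020-07-13", "2020-07-14", "2020-07-15"):
            part = os.path.join(root, f"date={day}")
            os.makedirs(part)
            for i in range(2):
                open(os.path.join(part, f"file-{i}.parquet"), "w").close()
            partitions.append(part)

        serial = list_partitions(partitions, num_threads=1)
        parallel = list_partitions(partitions, num_threads=8)
        print(len(parallel), serial == parallel)
        print(spark_conf_arg(8))
```

The result is the same for any thread count. Only the wall-clock time changes, which is why raising the value is a low-risk experiment.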
   
   @zuyanton : We are also working on RFC-15 https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+and+Query+Planning+Improvements to holistically eliminate file listing and improve query performance. 
   
   cc @umehrot2 for any other suggestions.
   
   
   

