Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/17 19:59:05 UTC

[GitHub] [hudi] umehrot2 commented on issue #3581: [SUPPORT] Slow snapshot query performance

umehrot2 commented on issue #3581:
URL: https://github.com/apache/hudi/issues/3581#issuecomment-922046864


   @codejoyan since you are observing `Listing leaf files...`, your code is using `InMemoryFileIndex` instead of `HoodieFileIndex`. I think you are testing with an older version of Hudi, not Hudi 0.9.0. In Hudi 0.9.0, to enable metadata listing you can just run `SET hoodie.metadata.enable=true` in Spark SQL.
   
   If you are using an earlier version of Hudi, i.e. 0.8.0 or 0.7.0, it does not have `HoodieFileIndex`. To get the best listing performance there, you should use the Hoodie RO Path Filter (if using a COW table): https://hudi.apache.org/docs/querying_data/#spark-sql. To additionally enable metadata listing in release 0.8.0 or 0.7.0 (either COW or MOR), you need to pass it as a Hadoop conf: `spark.hadoop.hoodie.metadata.enable`. But you would observe the main benefits of metadata listing only from Hudi 0.9.0 onward, with the introduction of `HoodieFileIndex`.
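
To make the version split above concrete, here is a rough sketch in plain Python (no Spark needed) that returns the setting to use for a given Hudi version. The helper name `metadata_listing_setting` is hypothetical. The value `true` for the Hadoop conf is an assumption, since only the key is given above. Passing that key through `spark-submit --conf` is one possible way to set it, not necessarily the only one.

```python
def metadata_listing_setting(hudi_version: str) -> str:
    """Return the setting that enables Hudi metadata listing for a version.

    Hypothetical helper, for illustration only.
    """
    # Compare on (major, minor) so that e.g. "0.10.0" sorts after "0.9.0".
    major, minor = (int(part) for part in hudi_version.split(".")[:2])

    if (major, minor) >= (0, 9):
        # 0.9.0 and later: a plain Spark SQL statement is enough.
        return "SET hoodie.metadata.enable=true"

    if (major, minor) >= (0, 7):
        # 0.7.0 / 0.8.0: pass the flag as a Hadoop conf.
        # The value "true" is assumed; only the key is stated above.
        return "--conf spark.hadoop.hoodie.metadata.enable=true"

    # Versions older than 0.7.0 are not covered by this comment.
    raise ValueError(f"no guidance for Hudi {hudi_version}")


if __name__ == "__main__":
    for version in ("0.7.0", "0.8.0", "0.9.0"):
        print(version, "->", metadata_listing_setting(version))
```

For 0.7.0 and 0.8.0 the returned string is meant to be appended to a `spark-submit` or `spark-shell` command line. For 0.9.0 it is a statement to run in the Spark SQL session.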
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org