
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/01 15:18:58 UTC

[GitHub] [hudi] vinothchandar commented on issue #3581: [SUPPORT] Slow snapshot query performance

vinothchandar commented on issue #3581:
URL: https://github.com/apache/hudi/issues/3581#issuecomment-910387627


   @codejoyan Compression is a key thing to align to ensure an apples-to-apples comparison. Glad that got the storage issue under control.
   
   So the time for approach 2 seems more like 2 mins? (I'm assuming the first column is submission time.)
   
   To reduce the listing cost, Hudi does have a [metadata table](http://hudi.apache.org/docs/configurations#hoodiemetadataenable) that can serve file listings without listing cloud storage directly. We can try this out. I think even with the file index, the listing is fetched/refreshed at least once.
   
   When writing and querying the dataset, use `hoodie.metadata.enable=true`.
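A minimal PySpark sketch of passing the flag on both the write and the read path, assuming the Hudi Spark bundle is on the Spark classpath. The helper names, table name, and record-key/precombine/partition fields are hypothetical placeholders; only the `hoodie.*` keys are Hudi configs.

```python
# Sketch only: assumes a SparkSession with the Hudi Spark bundle available.
# Helper names and field names here are hypothetical placeholders.


def hudi_write_options(table_name, record_key, precombine_field, partition_field):
    """Build Hudi write options with the metadata table enabled."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Maintain the metadata table so file listings can be served from it
        # instead of listing cloud storage directly.
        "hoodie.metadata.enable": "true",
    }


def hudi_read_options():
    """Build Hudi read options that use the metadata table for listings."""
    return {"hoodie.metadata.enable": "true"}


def write_with_metadata(df, base_path, options):
    # df is assumed to be a pyspark.sql.DataFrame.
    df.write.format("hudi").options(**options).mode("append").save(base_path)


def read_with_metadata(spark, base_path):
    # spark is assumed to be a pyspark.sql.SparkSession.
    return spark.read.format("hudi").options(**hudi_read_options()).load(base_path)
```

For example, `write_with_metadata(df, "s3://bucket/path", hudi_write_options("my_table", "id", "ts", "dt"))` followed by `read_with_metadata(spark, "s3://bucket/path")`. The flag is set on both sides because the writer has to keep the metadata table up to date before readers can use it for listings.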
   
   But the bigger cost is the 1 minute between 0 and 1. That is puzzling.
   
   cc @umehrot2, I know you tested all this out; wondering if you have any insights.
   cc @nsivabalan as an FYI, given you are looking into all things metadata table.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
