Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/12 22:55:55 UTC

[GitHub] [hudi] umehrot2 commented on pull request #2893: [HUDI-1371] Support metadata based listing for Spark DataSource and Spark SQL

umehrot2 commented on pull request #2893:
URL: https://github.com/apache/hudi/pull/2893#issuecomment-840145405


   @pengzhiwei2018 @vinothchandar I have further refactored `HoodieFileIndex` to integrate more efficiently with MOR real-time queries. Earlier, we listed only base files using the file index, and `MergeOnReadSnapshotRelation` would later perform another listing for log files using `groupLogsByBaseFile`. Now, for real-time queries, I store and fetch both base and log files through the file index.
   
   This ensures that the filesystem is listed just once when filesystem listing is used. With metadata-based listing, it ensures that the metadata is read just once and that no additional listing or reading is done to fetch log files.
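
   As a rough illustration of the single-pass idea (this is not the actual `HoodieFileIndex` code), the sketch below groups one listing result into base and log files per file ID, so that no second listing is needed for log files. The `FileSlice` class, the helper names, and the simplified file naming are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class FileSlice:
    """Hypothetical stand-in: one base file plus the log files that go with it."""
    base_file: Optional[str] = None
    log_files: List[str] = field(default_factory=list)


def file_id_of(name: str) -> str:
    # Assumed, simplified naming for this sketch only:
    #   base file: "<fileId>_<rest>.parquet"
    #   log file:  ".<fileId>_<rest>.log.<version>"
    return name.lstrip(".").split("_", 1)[0]


def group_single_listing(names: Iterable[str]) -> Dict[str, FileSlice]:
    """Group the result of ONE listing into base and log files per file ID."""
    slices: Dict[str, FileSlice] = defaultdict(FileSlice)
    for name in names:
        file_slice = slices[file_id_of(name)]
        if ".log." in name:
            file_slice.log_files.append(name)
        else:
            file_slice.base_file = name
    return dict(slices)


# A single listing result (made-up file names) containing both file kinds.
listing = [
    "f1_0-1-0_001.parquet",
    ".f1_001.log.1",
    ".f1_001.log.2",
    "f2_0-1-0_001.parquet",
]
slices = group_single_listing(listing)
```

   A real-time (snapshot) reader can then take `slices["f1"].base_file` and `slices["f1"].log_files` from the same result, while a file group with no log files, such as `f2` here, simply has an empty log list.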


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org