Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/09 19:54:32 UTC

[GitHub] [hudi] umehrot2 commented on issue #1798: Question reading partition path with less level is more faster than what document mentioned

umehrot2 commented on issue #1798:
URL: https://github.com/apache/hudi/issues/1798#issuecomment-656320283


   @zherenyu831 yes, I am also confused by the difference in the number of files between the two experiments you provided. Are both queries run on the same dataset, with the same number of files underneath?
   
   Regardless, the listing happens internally through Spark's `parquet` data source. The only difference is that Hudi passes `HoodieROTablePathFilter` to Spark's implementation so that only the latest files are listed. At this point I don't understand why that would cause a difference between the two queries you mentioned, but we would be happy to look into it.
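
To illustrate what "list only the latest files" means, here is a rough, self-contained sketch of the idea. It is not Hudi's actual `HoodieROTablePathFilter` logic: the `latest_files` helper and the `<file_id>_<commit_time>.parquet` naming scheme are simplified assumptions made up for this example.

```python
import posixpath


def latest_files(paths):
    """Keep only the newest data file per (directory, file id).

    Assumption: files are named '<file_id>_<commit_time>.parquet' and
    commit times are fixed-width strings, so string comparison orders them.
    This is a hypothetical simplification, not Hudi's real file layout.
    """
    newest = {}
    for path in paths:
        directory, name = posixpath.split(path)
        if not name.endswith(".parquet"):
            continue  # skip metadata and other non-data files
        stem = name[: -len(".parquet")]
        file_id, _, commit_time = stem.rpartition("_")
        if not file_id:
            continue  # name does not follow the assumed pattern
        key = (directory, file_id)
        # Replace the stored entry only if this file has a later commit time.
        if key not in newest or commit_time > newest[key][0]:
            newest[key] = (commit_time, path)
    return sorted(path for _, path in newest.values())


listed = latest_files([
    "2020/07/f1_20200701.parquet",
    "2020/07/f1_20200709.parquet",
    "2020/07/f2_20200705.parquet",
    "2020/07/.hoodie_partition_metadata",
])
print(listed)
```

In this sketch the older version of `f1` and the metadata file are dropped, so only one file per file id survives the listing. A filter of this kind only prunes what is returned and still has to see every file in the listed directories.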
   
   Can you provide a snapshot of your Spark history server showing the difference in Spark's listing time for these two queries on the same table?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org