Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/20 06:42:40 UTC

[GitHub] [hudi] bvaradar commented on issue #1847: [SUPPORT] querying MoR tables on S3 becomes slow with number of files growing

bvaradar commented on issue #1847:
URL: https://github.com/apache/hudi/issues/1847#issuecomment-660836081


   @zuyanton : Thanks for the detailed write-up. This is very interesting. If you look at the base implementation of the FileStatus getLen() method, it returns a cached copy of the length, so I wouldn't expect it to be the cause of such high variance. Also, the 100 milliseconds you observed would almost certainly involve a blocking operation such as an RPC call. Does the EMR/S3 filesystem implementation override these classes?
   
   ```
   
     /**
      * Get the length of this file, in bytes.
      * @return the length of this file, in bytes.
      */
     public long getLen() {
       return length;
     }
   ```
   
   @zuyanton : Can you check the runtime class of the incoming FileStatus object?
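   
   One way to check this is sketched below. It is only an illustration: `BaseStatus` and `OverridingStatus` are hypothetical stand-ins (not Hadoop or EMR classes) so that the snippet runs without Hadoop on the classpath, and `getLenDeclaringClass` is a helper name made up for this example. On a real cluster you would pass the incoming FileStatus object instead.
   
   ```java
   import java.lang.reflect.Method;

   public class FileStatusProbe {

       // Hypothetical stand-in for a base class whose getLen() returns a cached field.
       public static class BaseStatus {
           private final long length;
           public BaseStatus(long length) { this.length = length; }
           public long getLen() { return length; }
       }

       // Hypothetical subclass that overrides getLen(); a real override could do I/O here.
       public static class OverridingStatus extends BaseStatus {
           public OverridingStatus(long length) { super(length); }
           @Override
           public long getLen() { return super.getLen(); }
       }

       // Returns the class that actually declares the public getLen() seen by this object.
       // If it differs from the base class, getLen() has been overridden.
       public static Class<?> getLenDeclaringClass(Object status) {
           try {
               Method getLen = status.getClass().getMethod("getLen");
               return getLen.getDeclaringClass();
           } catch (NoSuchMethodException e) {
               throw new IllegalArgumentException("object has no public getLen()", e);
           }
       }

       public static void main(String[] args) {
           Object[] samples = { new BaseStatus(10L), new OverridingStatus(10L) };
           for (Object s : samples) {
               // Log both the runtime class and where getLen() is implemented.
               System.out.println(s.getClass().getName()
                       + " -> getLen() declared in " + getLenDeclaringClass(s).getName());
           }
       }
   }
   ```
   
   Logging `status.getClass().getName()` alone already answers the question above; the reflection step additionally shows whether getLen() itself is overridden by that class.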
   
   cc @umehrot2 

