Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/03 15:32:31 UTC

[GitHub] [arrow] rdettai commented on pull request #8525: ARROW-10387: [Rust][Parquet] Avoid call for file size metadata to read footer

rdettai commented on pull request #8525:
URL: https://github.com/apache/arrow/pull/8525#issuecomment-721201569


   @sunchao with most object storage services, you select the bytes you want to read with the HTTP [Range](https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests) header, which can also address bytes relative to the end of the object (a suffix range such as `bytes=-N`). You can use this to implement `ChunkMode::FromEnd` without knowing the length of the file.
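   As a rough sketch of what this enables (not the code in this PR), a reader could issue a single suffix request such as `Range: bytes=-N` and then inspect the trailer of the Parquet file. A Parquet file ends with a 4-byte little-endian metadata length followed by the `PAR1` magic. The names `suffix_range_header`, `inspect_tail` and `FooterRead` are hypothetical, and the choice of `N` is left to the caller. If the metadata turns out to be larger than the fetched tail, a second suffix request for the reported size is needed.

   ```rust
   // Hypothetical helpers, not part of the parquet crate.
   // A Parquet file ends with: <metadata> <4-byte LE metadata length> "PAR1".
   const TRAILER_LEN: usize = 8;
   const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

   /// Value for an HTTP `Range` header asking for the last `n` bytes of an
   /// object (a suffix range, RFC 7233); no file length is required.
   fn suffix_range_header(n: u64) -> String {
       format!("bytes=-{}", n)
   }

   #[derive(Debug, PartialEq)]
   enum FooterRead {
       /// The fetched tail already holds the whole metadata block.
       Complete { metadata_len: usize },
       /// The tail is too short; fetch this many bytes from the end instead.
       NeedMore { suffix_len: usize },
   }

   /// Inspect the bytes returned by a suffix request.
   fn inspect_tail(tail: &[u8]) -> Result<FooterRead, String> {
       if tail.len() < TRAILER_LEN {
           return Err("tail is shorter than the Parquet trailer".to_string());
       }
       let trailer = &tail[tail.len() - TRAILER_LEN..];
       if trailer[4..] != PARQUET_MAGIC[..] {
           return Err("missing PAR1 magic".to_string());
       }
       // Decode the little-endian metadata length stored before the magic.
       let mut len_bytes = [0u8; 4];
       len_bytes.copy_from_slice(&trailer[..4]);
       let metadata_len = u32::from_le_bytes(len_bytes) as usize;
       let needed = metadata_len + TRAILER_LEN;
       if tail.len() >= needed {
           Ok(FooterRead::Complete { metadata_len })
       } else {
           Ok(FooterRead::NeedMore { suffix_len: needed })
       }
   }

   fn main() {
       // Simulated response to `Range: bytes=-16`: 8 bytes of fake metadata,
       // then the metadata length (8, little-endian), then the magic.
       let mut tail = vec![0xAB_u8; 8];
       tail.extend_from_slice(&8u32.to_le_bytes());
       tail.extend_from_slice(PARQUET_MAGIC);
       println!("{}", suffix_range_header(16)); // bytes=-16
       println!("{:?}", inspect_tail(&tail)); // Ok(Complete { metadata_len: 8 })
   }
   ```

   In the common case a generous suffix fetches the metadata in one round trip, and the length prefix tells the reader exactly how much more to ask for otherwise.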
   
   Getting the length is expensive because it adds an extra GET request, and I guess that with HDFS it also implies a network round trip, which is not free. But as @alamb mentioned earlier, you will often already have the length, because you get it at the same time as you list your files/objects.
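   To illustrate why a length obtained from a listing avoids the extra request: the same tail read can then be expressed as an absolute range, and when the length is unknown a suffix range works instead. This is a hypothetical helper (`tail_range_header` is not an existing API), sketched under the assumption that reads go through HTTP `Range` headers, whose end offset is inclusive.

   ```rust
   /// Hypothetical helper: build an HTTP `Range` header value covering the last
   /// `n` bytes of an object. `known_len` is the object size if it is already
   /// available (for example from a listing); otherwise a suffix range is used.
   /// Returns `None` when there is nothing to request.
   fn tail_range_header(known_len: Option<u64>, n: u64) -> Option<String> {
       if n == 0 {
           return None; // a zero-length range cannot be satisfied
       }
       match known_len {
           Some(0) => None, // empty object: nothing to read
           Some(len) => {
               let n = n.min(len); // never start before offset 0
               // Absolute range; the end offset is inclusive in HTTP.
               Some(format!("bytes={}-{}", len - n, len - 1))
           }
           // Size unknown: let the server resolve the offset from the end.
           None => Some(format!("bytes=-{}", n)),
       }
   }

   fn main() {
       println!("{:?}", tail_range_header(Some(1000), 64)); // Some("bytes=936-999")
       println!("{:?}", tail_range_header(None, 64)); // Some("bytes=-64")
   }
   ```

   Both forms cost a single request; the only difference is whether the client or the server computes the start offset.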
   
   **I would be in favor of stalling this PR until at least one other person expresses interest in being able to read a file without knowing its length.**
   
   Meanwhile, one of your [comments](https://github.com/apache/arrow/pull/8525#discussion_r515420243) on the PR made me realize that some of the buffering logic currently managed in the footer parser should be left to the `ChunkReader` implementation. I'll try to find an interface that better separates these concerns.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org