Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/05 14:40:29 UTC

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1905: Avoid repeated `open` for one single file and simplify object reader API on the `sync` part

alamb commented on a change in pull request #1905:
URL: https://github.com/apache/arrow-datafusion/pull/1905#discussion_r820111433



##########
File path: datafusion/src/datasource/object_store/local.rs
##########
@@ -82,23 +112,12 @@ impl ObjectReader for LocalFileReader {
         )
     }
 
-    fn sync_chunk_reader(
-        &self,
-        start: u64,
-        length: usize,
-    ) -> Result<Box<dyn Read + Send + Sync>> {
-        // A new file descriptor is opened for each chunk reader.
-        // This okay because chunks are usually fairly large.
-        let mut file = File::open(&self.file.path)?;

Review comment:
       I probably misunderstand something here and I am sorry I don't quite follow all the comments on this PR. 
   
   If the issue you are trying to solve is that `File::open` is called too often, would it be possible to "memoize" the open here with a mutex inside of the FileReader?
   
   Something like
   
   ```rust
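   struct LocalFileReader {
   ...
       /// Keep the open file handle to avoid reopening it
       cache: Mutex<Option<File>>,
   }
   
   impl LocalFileReader {
   ...
       fn sync_chunk_reader(
           &self,
           start: u64,
           length: usize,
       ) -> Result<Box<dyn Read + Send + Sync>> {
           // `lock()` only fails if another thread panicked while holding the lock
           let mut cache = self.cache.lock().unwrap();
           if cache.is_none() {
               *cache = Some(File::open(...)?);
           }
           // `Box<dyn Read>` cannot be `Clone`, so duplicate the handle instead.
           // Caveat: handles from `try_clone` share one cursor, so concurrent
           // chunk readers would move each other's read position.
           let file = cache.as_ref().unwrap().try_clone()?;
           Ok(Box::new(file))
       }
   }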
   ```
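
A more complete, self-contained sketch of the same idea follows. It is only an illustration under stated assumptions, not the PR's implementation: `CachedFileReader` and `ChunkReader` are hypothetical names, the error type is plain `std::io::Result` rather than DataFusion's `Result`, and the file is shared behind an `Arc<Mutex<File>>` so that each chunk reader can seek and read under one lock instead of depending on a shared cursor.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `LocalFileReader`; only the path is modeled.
struct CachedFileReader {
    path: PathBuf,
    /// Opened lazily on first use, then shared by every chunk reader.
    cache: Mutex<Option<Arc<Mutex<File>>>>,
}

/// Reads at most `remaining` bytes starting at `pos` from a shared file.
struct ChunkReader {
    file: Arc<Mutex<File>>,
    pos: u64,
    remaining: usize,
}

impl Read for ChunkReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let want = buf.len().min(self.remaining);
        if want == 0 {
            return Ok(0);
        }
        // Seek and read under one lock so that other chunk readers cannot
        // move the shared cursor between the two calls.
        let mut file = self.file.lock().unwrap();
        file.seek(SeekFrom::Start(self.pos))?;
        let n = file.read(&mut buf[..want])?;
        self.pos += n as u64;
        self.remaining -= n;
        Ok(n)
    }
}

impl CachedFileReader {
    fn new(path: PathBuf) -> Self {
        Self { path, cache: Mutex::new(None) }
    }

    fn sync_chunk_reader(
        &self,
        start: u64,
        length: usize,
    ) -> io::Result<Box<dyn Read + Send + Sync>> {
        let mut cache = self.cache.lock().unwrap();
        if cache.is_none() {
            // First call: open the file once and remember the handle.
            *cache = Some(Arc::new(Mutex::new(File::open(&self.path)?)));
        }
        let file = Arc::clone(cache.as_ref().unwrap());
        Ok(Box::new(ChunkReader { file, pos: start, remaining: length }))
    }
}
```

With this shape, `File::open` runs at most once per `CachedFileReader`, and chunk readers can be created and consumed in any order. The trade-off is that all reads on the same file are serialized through the inner mutex.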
         



