Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/04/28 20:45:40 UTC

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4156: Cleanup ChunkReader (#4118)

tustvold commented on code in PR #4156:
URL: https://github.com/apache/arrow-rs/pull/4156#discussion_r1180807171


##########
parquet/src/file/reader.rs:
##########
@@ -44,19 +46,47 @@ pub trait Length {
 }
 
 /// The ChunkReader trait generates readers of chunks of a source.
-/// For a file system reader, each chunk might contain a clone of File bounded on a given range.
-/// For an object store reader, each read can be mapped to a range request.
+///
+/// For more information see [`File::try_clone`]
 pub trait ChunkReader: Length + Send + Sync {
-    type T: Read + Send;
-    /// Get a serially readable slice of the current reader
-    /// This should fail if the slice exceeds the current bounds
-    fn get_read(&self, start: u64, length: usize) -> Result<Self::T>;
+    type T: Read;
+
+    /// Get a [`Read`] starting at the provided file offset
+    ///
+    /// Subsequent or concurrent calls to [`Self::get_read`] or [`Self::get_bytes`] may

Review Comment:
   `FileSource` protected against subsequent calls to `get_read` by seeking before every read, but it provided no protection against concurrent access. I think it is less risky to clearly not support non-serial usage at all than to break only under concurrent usage.
   
   One option would be to add a `Mutex` to synchronise access, but I think if this is fine for the standard library it is fine for us. There are no safety implications of not synchronising this access; you just might read gibberish, which you may do anyway :sweat_smile:
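   
   For reference, a rough sketch of what the `Mutex` option could look like. `LockedFile` and `read_at` are hypothetical names for illustration, not anything in the parquet crate. The sketch returns the bytes directly instead of a `Read`, because a reader handed back to the caller would outlive the lock:

```rust
use std::fs::File;
use std::io::{Read, Result, Seek, SeekFrom};
use std::sync::Mutex;

/// Hypothetical wrapper (not part of the parquet crate) that serialises
/// access to a single file handle.
struct LockedFile {
    inner: Mutex<File>,
}

impl LockedFile {
    fn new(file: File) -> Self {
        Self { inner: Mutex::new(file) }
    }

    /// Seek and read while holding the lock, so no other caller going
    /// through this wrapper can move the cursor between the two steps.
    fn read_at(&self, start: u64, length: usize) -> Result<Vec<u8>> {
        let mut file = self.inner.lock().expect("lock poisoned");
        file.seek(SeekFrom::Start(start))?;
        let mut buf = vec![0u8; length];
        file.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```

   This only guards callers that go through the wrapper. A handle cloned elsewhere can still move the shared cursor.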
   
   Similarly, a user could just call `File::try_clone` themselves and feed both handles into separate readers. There is no real way to prevent this; file IO is just exciting :sweat_smile:
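   
   A minimal sketch of why cloned handles interfere with each other, using only the standard library. `read_via_clones` and the scratch file name are made up for this illustration. The standard library documents that a handle from `File::try_clone` shares the underlying file handle, so reads and seeks on one affect the other:

```rust
use std::fs::File;
use std::io::{Read, Result, Seek, SeekFrom, Write};

/// Hypothetical helper: writes `data` to a scratch file, then reads it
/// through two handles obtained via `File::try_clone`.
fn read_via_clones(data: &[u8]) -> Result<(Vec<u8>, Vec<u8>)> {
    let path = std::env::temp_dir()
        .join(format!("try_clone_demo_{}", std::process::id()));
    let mut file = File::options()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(&path)?;
    file.write_all(data)?;
    file.seek(SeekFrom::Start(0))?;

    // The clone shares the underlying handle, and with it the cursor.
    let mut clone = file.try_clone()?;

    // Read the first half through the original handle...
    let mut first = vec![0u8; data.len() / 2];
    file.read_exact(&mut first)?;

    // ...then read through the clone. It continues from where the
    // original stopped instead of starting at offset 0.
    let mut second = Vec::new();
    clone.read_to_end(&mut second)?;

    drop(file);
    drop(clone);
    std::fs::remove_file(&path)?;
    Ok((first, second))
}
```

   Calling `read_via_clones(b"0123456789")` should return `01234` from the original handle and `56789` from the clone. With interleaved or concurrent readers, each one sees whatever the other left behind.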


