Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/06/26 11:42:30 UTC

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4434: fix: Allow reading of arrow files with more than one million columns

tustvold commented on code in PR #4434:
URL: https://github.com/apache/arrow-rs/pull/4434#discussion_r1242061745


##########
arrow-ipc/src/reader.rs:
##########
@@ -647,9 +649,18 @@ impl<R: Read + Seek> FileReader<R> {
         reader.seek(SeekFrom::End(-10 - footer_len as i64))?;
         reader.read_exact(&mut footer_data)?;
 
-        let footer = crate::root_as_footer(&footer_data[..]).map_err(|err| {
-            ArrowError::IoError(format!("Unable to get root as footer: {err:?}"))
-        })?;
+        // construct verifier options that reflect actual number of columns
+        // in file and avoid an error if the file contains more than 1M rows
+        let verifier_options = VerifierOptions {

Review Comment:
   I wonder if we should create a `FileReaderOptions` that contains both the projection and the `VerifierOptions`, so that people can set these as they see fit
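
   To make the suggestion concrete, here is a rough sketch of what such an options type could look like. Only the name `FileReaderOptions` and the two things it carries come from the comment above; the field names, builder methods, and accessors are hypothetical. `VerifierOptions` is declared locally as a stand-in so the sketch compiles on its own, and its fields and default values are assumed to mirror those of `flatbuffers::VerifierOptions`.

```rust
/// Local stand-in for `flatbuffers::VerifierOptions`, declared here only so
/// the sketch compiles without external crates.
#[derive(Debug, Clone, PartialEq)]
pub struct VerifierOptions {
    pub max_depth: usize,
    pub max_tables: usize,
    pub max_apparent_size: usize,
    pub ignore_missing_null_terminator: bool,
}

impl Default for VerifierOptions {
    fn default() -> Self {
        // Assumed to match the defaults of the flatbuffers crate.
        Self {
            max_depth: 64,
            max_tables: 1_000_000,
            max_apparent_size: 1 << 31,
            ignore_missing_null_terminator: false,
        }
    }
}

/// Hypothetical options bundle for `FileReader`; not an existing arrow-ipc API.
#[derive(Debug, Clone, Default)]
pub struct FileReaderOptions {
    /// Column indices to read; `None` means read every column.
    projection: Option<Vec<usize>>,
    /// Limits applied when verifying the flatbuffer-encoded footer.
    verifier_options: VerifierOptions,
}

impl FileReaderOptions {
    /// Restrict reading to the given column indices.
    pub fn with_projection(mut self, projection: Vec<usize>) -> Self {
        self.projection = Some(projection);
        self
    }

    /// Override the flatbuffer verification limits.
    pub fn with_verifier_options(mut self, verifier_options: VerifierOptions) -> Self {
        self.verifier_options = verifier_options;
        self
    }

    /// The configured projection, if any.
    pub fn projection(&self) -> Option<&[usize]> {
        self.projection.as_deref()
    }

    /// The configured verification limits.
    pub fn verifier_options(&self) -> &VerifierOptions {
        &self.verifier_options
    }
}
```

   A caller with an unusually wide file could then raise only the limit they care about, for example `FileReaderOptions::default().with_verifier_options(VerifierOptions { max_tables: 4_000_000, ..Default::default() })`, and leave everything else at its default. How the reader would accept these options (a new constructor, a builder, and so on) is left open here.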



##########
arrow-ipc/src/reader.rs:
##########
@@ -647,9 +649,18 @@ impl<R: Read + Seek> FileReader<R> {
         reader.seek(SeekFrom::End(-10 - footer_len as i64))?;
         reader.read_exact(&mut footer_data)?;
 
-        let footer = crate::root_as_footer(&footer_data[..]).map_err(|err| {
-            ArrowError::IoError(format!("Unable to get root as footer: {err:?}"))
-        })?;
+        // construct verifier options that reflect actual number of columns
+        // in file and avoid an error if the file contains more than 1M rows
+        let verifier_options = VerifierOptions {
+            max_depth: 128,
+            max_tables: footer_len as usize * 8,

Review Comment:
   I think some explanation of where this number comes from would not go amiss


