Posted to github@arrow.apache.org by "ghuls (via GitHub)" <gi...@apache.org> on 2023/06/30 13:04:18 UTC

[GitHub] [arrow-rs] ghuls commented on pull request #4434: fix: Allow reading of arrow files with more than one million columns

ghuls commented on PR #4434:
URL: https://github.com/apache/arrow-rs/pull/4434#issuecomment-1614623064

   > I am a little bit concerned that the flatbuffer table limit exists for a reason, e.g. to prevent a DOS vector. I don't feel confident that we should change the default settings, as I don't feel I have a good enough grasp as to why it is present. I therefore wonder if we can instead allow users to opt-in to looser validation behaviour?
   > 
   > As an aside I would not expect million column schemas to be a good idea in general, in the absence of extremely aggressive projection pushdown the performance will likely be poor. I would definitely encourage people with such schema to perhaps reconsider their schema design...
   
   As far as I understand, the pyarrow fix for this issue checks the size of the footer to calculate the maximum number of flatbuffer tables that could possibly be encoded in it (https://issues.apache.org/jira/browse/ARROW-11559), which is the approach the arrow2 implementation is based on.
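   The idea above can be sketched roughly as follows. This is a hypothetical illustration, not the actual arrow-rs or pyarrow code: the function name `max_tables_for_footer` and the per-table minimum size are assumptions. The point is that the footer's byte length itself bounds how many flatbuffer tables it can contain, so the verifier limit can be derived from the input rather than fixed at one million.

   ```rust
   /// Hypothetical sketch: derive a flatbuffer verifier table limit from
   /// the size of the footer, instead of a fixed default (e.g. 1_000_000).
   ///
   /// Assumption: each encoded table occupies at least one byte, so the
   /// footer length is a safe upper bound on the table count. A real
   /// implementation might use a tighter per-table minimum size.
   fn max_tables_for_footer(footer_len: usize) -> usize {
       footer_len.max(1)
   }

   fn main() {
       // A footer large enough to describe millions of columns yields a
       // correspondingly larger verification limit, while a small footer
       // keeps the limit (and thus any DoS exposure) small.
       let limit = max_tables_for_footer(40_000_000);
       assert!(limit > 1_000_000);
       println!("verifier table limit: {}", limit);
   }
   ```

   Deriving the limit this way keeps the DoS protection proportional to the attacker-controlled input size instead of rejecting legitimate wide schemas outright.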
   
   I have quite a few (real) files that reach this limit of 1 million columns (max < 2.5 million).

