Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/04/26 21:00:42 UTC

[GitHub] [arrow] westonpace commented on issue #35300: Arrow IPC format: question on length of record batches

westonpace commented on issue #35300:
URL: https://github.com/apache/arrow/issues/35300#issuecomment-1524039687

   Yes.  I think the most common case is the end of a file.  For example, if a file has 1 million rows and, for whatever reason, the data producer decided to batch in groups of 300k, then the last batch would have only 100k rows in it.
   
   However, I'm pretty sure producers are allowed to vary batch sizes however they like.  For example, if some kind of filtering is applied during a read, then you might get a stream of different-sized batches depending on how many rows happened to match the particular filter.
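   
   A minimal sketch of the end-of-file case above, using pyarrow (the 300k batch size is just the example figure from this comment, not anything the format requires):
   
   ```python
   import pyarrow as pa

   # 1 million rows, batched in groups of 300k -- the last batch
   # comes up short with only the 100k leftover rows.
   table = pa.table({"x": pa.array(range(1_000_000), type=pa.int64())})

   sink = pa.BufferOutputStream()
   with pa.ipc.new_stream(sink, table.schema) as writer:
       for batch in table.to_batches(max_chunksize=300_000):
           writer.write_batch(batch)

   # Read the IPC stream back and inspect each batch's row count.
   reader = pa.ipc.open_stream(sink.getvalue())
   lengths = [batch.num_rows for batch in reader]
   print(lengths)
   ```
   
   Consumers therefore shouldn't assume a fixed batch length; each record batch in the stream carries its own row count.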

