Posted to issues@arrow.apache.org by "amcnulty-fermat (via GitHub)" <gi...@apache.org> on 2023/04/24 10:17:16 UTC

[GitHub] [arrow] amcnulty-fermat opened a new issue, #35300: Arrow IPC format: question on length of record batches

amcnulty-fermat opened a new issue, #35300:
URL: https://github.com/apache/arrow/issues/35300

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   Hi Arrow community,
   
   I have a question about RecordBatch messages in the Arrow IPC format (both the streaming and random access/file variants). In a given stream of Arrow IPC messages, will the length (which is to say the number of records, not the number of bytes) ever vary?
   
   For example, is it possible in a given stream of Arrow IPC messages to receive a RecordBatch message containing 10 records, and then subsequently in the same stream, receive a RecordBatch message containing 20 records?
   
   Thanks in advance.
   
   ### Component(s)
   
   Format


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #35300: Arrow IPC format: question on length of record batches

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35300:
URL: https://github.com/apache/arrow/issues/35300#issuecomment-1524039687

   Yes.  I think the most common case is the end of a file.  For example, if a file has 1 million rows and, for whatever reason, the data producer decided to batch in groups of 300k then the last batch would have only 100k items in it.
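
The arithmetic in that example can be checked with a small sketch. This is plain Python rather than any Arrow API, and `batch_lengths` is a hypothetical helper that only models a producer cutting rows into fixed-size batches:

```python
def batch_lengths(total_rows, batch_size):
    """Row counts of the batches a fixed-size producer would emit.

    Hypothetical helper for illustration; not part of any Arrow library.
    """
    if total_rows < 0 or batch_size <= 0:
        raise ValueError("total_rows must be >= 0 and batch_size must be > 0")
    full_batches, remainder = divmod(total_rows, batch_size)
    lengths = [batch_size] * full_batches
    if remainder:
        # The final batch holds whatever rows are left over.
        lengths.append(remainder)
    return lengths


lengths = batch_lengths(1_000_000, 300_000)
print(lengths)  # [300000, 300000, 300000, 100000]
```

The last batch is shorter whenever the row count is not an exact multiple of the batch size.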
   
   However, I'm pretty sure producers are allowed to vary this however they feel like.  For example, if some kind of filtering is applied to a read then you might get a stream of different sized batches based on how many rows happened to meet the particular filter.
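
The filtering case can be modelled the same way. The sketch below is plain Python with made-up values, not Arrow API; `filtered_batch_lengths` is a hypothetical helper that assumes the filter is applied one batch at a time and the surviving rows are emitted as one output batch per input batch:

```python
def filtered_batch_lengths(batches, predicate):
    """Row count of each output batch after filtering batch by batch.

    Hypothetical helper for illustration; not part of any Arrow library.
    """
    return [sum(1 for row in batch if predicate(row)) for batch in batches]


# Three input batches of equal length (4 rows each); the values are invented.
batches = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

# Keep only multiples of 3; each batch keeps a different number of rows.
lengths = filtered_batch_lengths(batches, lambda value: value % 3 == 0)
print(lengths)  # [1, 1, 2]
```

Even with equal-sized inputs the output lengths differ, so a consumer should read each batch's own row count rather than assume a fixed one.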




[GitHub] [arrow] amcnulty-fermat commented on issue #35300: Arrow IPC format: question on length of record batches

Posted by "amcnulty-fermat (via GitHub)" <gi...@apache.org>.
amcnulty-fermat commented on issue #35300:
URL: https://github.com/apache/arrow/issues/35300#issuecomment-1525004322

   Thanks for following up. I was subsequently able to reproduce the behaviour that I was asking about quite easily using pretty basic inputs.




[GitHub] [arrow] amcnulty-fermat closed issue #35300: Arrow IPC format: question on length of record batches

Posted by "amcnulty-fermat (via GitHub)" <gi...@apache.org>.
amcnulty-fermat closed issue #35300: Arrow IPC format: question on length of record batches
URL: https://github.com/apache/arrow/issues/35300

