Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/21 10:34:43 UTC

[GitHub] [arrow-rs] tustvold commented on issue #3142: AppendableRecordBatch

tustvold commented on issue #3142:
URL: https://github.com/apache/arrow-rs/issues/3142#issuecomment-1321842532

   Like the general idea, just a couple of comments/questions:
   
   * If the extend payload is a RecordBatch, what do you gain by concatenating them together? Why not just store them separately and periodically compact them? What do you gain from a single RecordBatch over, say, `Vec<RecordBatch>`?
   * Similar to the above, why is this an AppendableRecordBatch and not, say, AppendablePrimitiveArray, etc.? That would be more flexible and would avoid creating array temporaries
   * I'm not sure how support for booleans would work, unless you can only append multiples of 8
   * You need to know the maximum buffer lengths up front, as you can't realloc the buffers
   * Your comment suggests arrow2 supports this, but I can't see how. Could you point me to it?
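
   To illustrate the boolean concern above, here is a minimal sketch using only the Rust standard library. `PackedBools` and its `append` method are hypothetical names for a toy model, not arrow-rs types; the sketch only assumes Arrow-style bit packing (eight values per byte, least-significant bit first). It shows that an append starting at a length that is not a multiple of 8 has to rewrite a byte that earlier readers may already be looking at.

```rust
/// Toy model (not an arrow-rs type) of a bit-packed boolean buffer:
/// eight values per byte, least-significant bit first.
struct PackedBools {
    bytes: Vec<u8>,
    len: usize, // number of valid bits
}

impl PackedBools {
    fn new() -> Self {
        Self { bytes: Vec::new(), len: 0 }
    }

    /// Appends `values` and returns how many bytes that already held
    /// valid bits had to be rewritten (0 or 1).
    fn append(&mut self, values: &[bool]) -> usize {
        // If the current length is not a multiple of 8, the last byte is
        // only partially filled and the next value lands inside it.
        let touched_existing = if self.len % 8 != 0 && !values.is_empty() { 1 } else { 0 };
        for &v in values {
            let (byte, bit) = (self.len / 8, self.len % 8);
            if byte == self.bytes.len() {
                self.bytes.push(0); // start a fresh, zeroed byte
            }
            if v {
                self.bytes[byte] |= 1u8 << bit;
            }
            self.len += 1;
        }
        touched_existing
    }
}
```

   Appending in multiples of 8 always starts on a byte boundary, so previously published bytes stay untouched; any other length forces a write into a shared byte.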
   
   
   One potentially simpler way to implement something similar would be to add a non-consuming finish method to the builders. This would entail copying the buffers, but in my experience of implementing the write path for IOx, this copy is insignificant in the grand scheme of query execution. Even ignoring the heavy hitters like sorts and groups, all kernels involve copying values to an output array. This is especially true if, after a certain number of rows, you rotate the builders and just keep the immutable RecordBatch, thereby bounding the copy. What do you think?
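
   A rough sketch of the non-consuming finish plus rotation idea, again using only the standard library rather than the real builders. `RotatingBuilder`, `snapshot`, and `max_rows` are hypothetical names, and `Arc<[i64]>` stands in for an immutable array; this is one possible shape under those assumptions, not the arrow-rs API.

```rust
use std::sync::Arc;

/// Toy stand-in for an immutable array: a shared, read-only slice.
type Snapshot = Arc<[i64]>;

/// Toy stand-in for a primitive builder that rotates after `max_rows` rows.
struct RotatingBuilder {
    sealed: Vec<Snapshot>, // immutable chunks, shared by reference from now on
    active: Vec<i64>,      // rows still being appended
    max_rows: usize,       // rotation threshold (hypothetical parameter)
}

impl RotatingBuilder {
    fn new(max_rows: usize) -> Self {
        assert!(max_rows > 0);
        Self { sealed: Vec::new(), active: Vec::with_capacity(max_rows), max_rows }
    }

    fn append(&mut self, v: i64) {
        self.active.push(v);
        if self.active.len() == self.max_rows {
            // Rotate: move the full buffer into a shared allocation once
            // and start a fresh active buffer.
            let full = std::mem::replace(&mut self.active, Vec::with_capacity(self.max_rows));
            self.sealed.push(Arc::from(full));
        }
    }

    /// Non-consuming finish: copies only the active rows (fewer than
    /// `max_rows`) and shares the sealed chunks without copying their data.
    fn snapshot(&self) -> Vec<Snapshot> {
        let mut out = self.sealed.clone(); // clones Arc pointers, not values
        if !self.active.is_empty() {
            out.push(Arc::from(self.active.as_slice())); // the bounded copy
        }
        out
    }
}
```

   Each call to `snapshot` copies fewer than `max_rows` values regardless of how many rows have been appended in total, and earlier snapshots remain valid because the builder never mutates a sealed chunk.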
   
   
   

