Posted to jira@arrow.apache.org by "Micah Kornfield (Jira)" <ji...@apache.org> on 2021/09/02 05:03:00 UTC

[jira] [Comment Edited] (ARROW-13690) [Python] Use IPC writing code for pickling RecordBatches

    [ https://issues.apache.org/jira/browse/ARROW-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404169#comment-17404169 ] 

Micah Kornfield edited comment on ARROW-13690 at 9/2/21, 5:02 AM:
------------------------------------------------------------------

This Jira was intended solely for RecordBatch objects. I think we might want to revisit this for Arrays/ChunkedArrays, but the tradeoffs, as noted, are less obvious and the implementation is more complicated.  RecordBatches shouldn't really need any transformation.


was (Author: emkornfield):
This Jira was intended solely for RecordBatch objects. I think we might want to revisit this for Arrays/ChunkedArrays, but the tradeoffs, as noted, are less obvious and the implementation is more complicated.  RecordBatches should really need any transformation.

> [Python] Use IPC writing code for pickling RecordBatches
> --------------------------------------------------------
>
>                 Key: ARROW-13690
>                 URL: https://issues.apache.org/jira/browse/ARROW-13690
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Micah Kornfield
>            Priority: Major
>
> For wide schemas in particular, the recursive nature of the current pickling algorithm for record batches makes it less efficient than using the IPC format (which can be done entirely in C++).
>  
> Consider switching the mechanism to use the IPC format.  I think this can be a backwards-compatible change, if we care about that, by leaving the current _reconstruct_record_batch in place.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)