Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2019/09/18 20:41:00 UTC

[jira] [Comment Edited] (ARROW-5377) [C++] Develop interface for writing a RecordBatch IPC stream into pre-allocated space (e.g. memory map) that avoids unnecessary serialization

    [ https://issues.apache.org/jira/browse/ARROW-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932814#comment-16932814 ] 

Wes McKinney edited comment on ARROW-5377 at 9/18/19 8:40 PM:
--------------------------------------------------------------

This is still incomplete -- having {{IpcPayload}} has gotten us most of the way there. I think we only need to implement a function to return the exact encapsulated message size given an {{IpcPayload}}, so that an appropriate piece of memory can be allocated. 


was (Author: wesmckinn):
This is still incomplete -- having {{IpcPayload} has gotten us most of the way there. I think we only need to implement a function to return the exact encapsulated message size given an {{IpcPayload}}, so that an appropriate piece of memory can be allocated. 

> [C++] Develop interface for writing a RecordBatch IPC stream into pre-allocated space (e.g. memory map) that avoids unnecessary serialization
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-5377
>                 URL: https://issues.apache.org/jira/browse/ARROW-5377
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>
> As discussed in a recent mailing list thread:
> https://lists.apache.org/thread.html/b756209052fecb8c28a5eb37db7aecb82a5f5351fa79a9d86f0dba3e@%3Cuser.arrow.apache.org%3E
> The only viable process at the moment for getting an accurate report of stream size is to write a simulated stream using {{MockOutputStream}}. This is suboptimal for a couple of reasons:
> * Flatbuffers metadata must be created twice
> * Record batch disassembly into {{IpcPayload}} must be performed twice
> It seems like an interface with a very constrained public API could be provided to deconstruct a sequence of RecordBatches into IPC payloads and report the size of the resulting IPC stream (based on metadata sizes and padding). The same deconstructed set of IPC payloads could then be written out to a stream (e.g. using {{FixedSizeBufferWriter}}) without repeating either step.
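
For illustration only, here is a minimal sketch of the measure-then-write idea, using just the C++ standard library. {{PayloadSketch}}, {{EncapsulatedSize}}, {{StreamSize}} and {{WritePayload}} are hypothetical names and are not part of the Arrow API. The layout is an assumption as well: a 4-byte length prefix, metadata padded to 8 bytes, and each body buffer padded to 8 bytes. The real encapsulated message format should be taken from the Arrow IPC specification.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for arrow::ipc::IpcPayload: the serialized
// Flatbuffers metadata plus the raw body buffers of one record batch.
struct PayloadSketch {
  std::vector<uint8_t> metadata;
  std::vector<std::vector<uint8_t>> body_buffers;
};

constexpr int64_t kAlignment = 8;   // assumed padding granularity
constexpr int64_t kPrefixSize = 4;  // assumed int32 metadata-length prefix

inline int64_t PadTo(int64_t n, int64_t alignment) {
  return (n + alignment - 1) / alignment * alignment;
}

// Length prefix plus metadata, rounded up to the alignment.
inline int64_t PaddedMetadataSize(const PayloadSketch& p) {
  return PadTo(kPrefixSize + static_cast<int64_t>(p.metadata.size()), kAlignment);
}

// Exact number of bytes one payload occupies, computed from sizes alone,
// so nothing has to be serialized a second time.
int64_t EncapsulatedSize(const PayloadSketch& p) {
  int64_t size = PaddedMetadataSize(p);
  for (const auto& buf : p.body_buffers) {
    size += PadTo(static_cast<int64_t>(buf.size()), kAlignment);
  }
  return size;
}

// Total size of a sequence of payloads written back to back.
int64_t StreamSize(const std::vector<PayloadSketch>& payloads) {
  int64_t total = 0;
  for (const auto& p : payloads) total += EncapsulatedSize(p);
  return total;
}

// Copies one payload into pre-allocated space (e.g. a memory map) and
// returns the number of bytes written.
int64_t WritePayload(const PayloadSketch& p, uint8_t* dst, int64_t capacity) {
  const int64_t total = EncapsulatedSize(p);
  if (total > capacity) throw std::length_error("destination too small");
  std::memset(dst, 0, static_cast<size_t>(total));  // padding bytes become zero

  const int64_t padded_meta = PaddedMetadataSize(p);
  // The prefix records the metadata length including its padding
  // (written in host byte order in this sketch).
  const int32_t meta_len = static_cast<int32_t>(padded_meta - kPrefixSize);
  std::memcpy(dst, &meta_len, sizeof(meta_len));
  if (!p.metadata.empty()) {
    std::memcpy(dst + kPrefixSize, p.metadata.data(), p.metadata.size());
  }

  int64_t offset = padded_meta;
  for (const auto& buf : p.body_buffers) {
    if (!buf.empty()) std::memcpy(dst + offset, buf.data(), buf.size());
    offset += PadTo(static_cast<int64_t>(buf.size()), kAlignment);
  }
  assert(offset == total);  // the size report and the writer must agree
  return offset;
}
```

A caller would build the payloads once, allocate {{StreamSize(payloads)}} bytes, and then call {{WritePayload}} for each payload at increasing offsets. Because the size report and the writer share the same padding arithmetic, the metadata and the record batch disassembly are each produced only once.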


