Posted to issues@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2020/04/16 18:09:00 UTC

[jira] [Commented] (ARROW-5377) [C++] Develop interface for writing a RecordBatch IPC stream into pre-allocated space (e.g. memory map) that avoids unnecessary serialization

    [ https://issues.apache.org/jira/browse/ARROW-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085144#comment-17085144 ] 

David Li commented on ARROW-5377:
---------------------------------

That would require exposing IpcPayload in the public API, right? Otherwise, we would need some set of writer APIs such as PrepareWrite/ConfirmWrite that give the application a chance to allocate memory, slice the record batch, and so on.
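
To make the second option concrete, here is a minimal sketch of what a two-phase writer might look like. Everything in it is hypothetical: the class name TwoPhaseWriter, the signatures of PrepareWrite/ConfirmWrite, and the use of a plain byte vector in place of a real IPC payload are assumptions for illustration, not an existing Arrow API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical two-phase writer. The method names mirror the comment above;
// none of this is an existing Arrow interface.
class TwoPhaseWriter {
 public:
  // Phase 1: take bytes that were serialized once, keep them, and report how
  // much space the application must provide.
  std::size_t PrepareWrite(std::vector<uint8_t> serialized) {
    assert(!has_pending_);  // a prepared write must be confirmed first
    pending_ = std::move(serialized);
    has_pending_ = true;
    return pending_.size();
  }

  // Phase 2: copy the pending bytes into space the application allocated.
  // Returns false if nothing is pending or the destination is too small;
  // in the latter case the pending bytes are kept so the caller can retry.
  bool ConfirmWrite(uint8_t* dst, std::size_t capacity) {
    if (!has_pending_ || capacity < pending_.size()) return false;
    if (!pending_.empty()) std::memcpy(dst, pending_.data(), pending_.size());
    bytes_written_ += pending_.size();
    pending_.clear();
    has_pending_ = false;
    return true;
  }

  std::size_t bytes_written() const { return bytes_written_; }

 private:
  std::vector<uint8_t> pending_;
  bool has_pending_ = false;
  std::size_t bytes_written_ = 0;
};
```

The point of the split is that serialization happens once, in PrepareWrite, while the application decides where the bytes go (for example, a region of a memory map) before calling ConfirmWrite.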

> [C++] Develop interface for writing a RecordBatch IPC stream into pre-allocated space (e.g. memory map) that avoids unnecessary serialization
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-5377
>                 URL: https://issues.apache.org/jira/browse/ARROW-5377
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>
> As discussed in a recent mailing list thread:
> https://lists.apache.org/thread.html/b756209052fecb8c28a5eb37db7aecb82a5f5351fa79a9d86f0dba3e@%3Cuser.arrow.apache.org%3E
> The only viable process at the moment for getting an accurate report of stream size is to write a simulated stream using {{MockOutputStream}}. This is suboptimal for a couple of reasons:
> * Flatbuffers metadata must be created twice
> * Record batch disassembly into IpcPayload must be performed twice
> It seems like an interface with a very constrained public API could be provided to deconstruct a sequence of RecordBatches and report the size of the produced IPC stream (based on metadata sizes and padding); this deconstructed set of IPC payloads could then be written out to a stream (e.g. using {{FixedSizeBufferWriter}}).
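
The flow described in the issue (deconstruct once, report the size, then write into pre-allocated space) can be sketched without any Arrow dependency. The Payload struct, the function names, and the framing used here (an 8-byte prefix followed by metadata and buffers, each padded to 8 bytes) are assumptions made for illustration. They are loosely modeled on the IPC message layout but are not Arrow's IpcPayload or its exact wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a deconstructed record batch (not Arrow's IpcPayload).
struct Payload {
  std::vector<uint8_t> metadata;              // already-serialized metadata bytes
  std::vector<std::vector<uint8_t>> buffers;  // body buffers
};

// Round n up to the next multiple of 8 (assumed alignment).
inline std::size_t PaddedSize(std::size_t n) {
  return (n + 7) & ~static_cast<std::size_t>(7);
}

// Size of the whole stream: per payload, an 8-byte prefix plus the padded
// metadata and padded body buffers. No bytes are written to compute this.
inline std::size_t StreamSize(const std::vector<Payload>& payloads) {
  std::size_t total = 0;
  for (const auto& p : payloads) {
    total += 8 + PaddedSize(p.metadata.size());
    for (const auto& b : p.buffers) total += PaddedSize(b.size());
  }
  return total;
}

// Copy the same payloads into caller-provided space (e.g. a memory map).
// Returns the number of bytes written, which equals StreamSize(payloads).
inline std::size_t WritePayloads(const std::vector<Payload>& payloads,
                                 uint8_t* dst, std::size_t capacity) {
  assert(StreamSize(payloads) <= capacity);
  (void)capacity;  // only checked in debug builds
  std::size_t pos = 0;
  // Copy n bytes, then zero-fill up to the next 8-byte boundary.
  auto put = [&](const uint8_t* src, std::size_t n) {
    const std::size_t padded = PaddedSize(n);
    if (n > 0) std::memcpy(dst + pos, src, n);
    if (padded > n) std::memset(dst + pos + n, 0, padded - n);
    pos += padded;
  };
  for (const auto& p : payloads) {
    const uint32_t marker = 0xFFFFFFFFu;  // assumed prefix marker
    const uint32_t meta_len =
        static_cast<uint32_t>(PaddedSize(p.metadata.size()));
    std::memcpy(dst + pos, &marker, 4);
    std::memcpy(dst + pos + 4, &meta_len, 4);  // native byte order in this sketch
    pos += 8;
    put(p.metadata.data(), p.metadata.size());
    for (const auto& b : p.buffers) put(b.data(), b.size());
  }
  return pos;
}
```

A caller would build the payloads once, call StreamSize to allocate or map exactly that many bytes, and then pass the same payloads to WritePayloads, so neither the metadata nor the disassembly is produced twice.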


