Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/05/17 19:19:05 UTC
[jira] [Created] (ARROW-1047) [Java] Add generalized stream writer
and reader interfaces that are decoupled from IO / message framing
Wes McKinney created ARROW-1047:
-----------------------------------
Summary: [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
Key: ARROW-1047
URL: https://issues.apache.org/jira/browse/ARROW-1047
Project: Apache Arrow
Issue Type: New Feature
Components: Java - Vectors
Reporter: Wes McKinney
cc [~julienledem] [~elahrvivaz] [~nongli]
The ArrowWriter (https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java) accepts a WritableByteChannel, to which the stream is written.
It would be useful to support other kinds of message framing and transport, like gRPC or HTTP. Rather than writing a complete Arrow stream as a single contiguous byte stream, the component messages (schema, dictionaries, and record batches) would be framed as separate messages in the underlying protocol.
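To make the idea concrete, here is a minimal Java sketch of what decoupling the writer from IO and framing could look like. All names in it (`MessageSink`, `ChannelSink`, `CollectingSink`, `writeStream`) are hypothetical and not part of the Arrow API, and the 4-byte length prefix is only an illustrative framing, not the actual Arrow stream format. The producer emits the same sequence of serialized messages either way; only the sink decides whether they become one contiguous byte stream or stay separate units.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MessageFramingSketch {
    /** The three kinds of component message named in this issue. */
    enum Kind { SCHEMA, DICTIONARY, RECORD_BATCH }

    /** Hypothetical interface: receives already-serialized messages, knows nothing about IO. */
    interface MessageSink {
        void accept(Kind kind, ByteBuffer body);
    }

    /** Framing 1: one contiguous byte stream, each message preceded by a 4-byte length (illustrative only). */
    static final class ChannelSink implements MessageSink {
        private final WritableByteChannel out;

        ChannelSink(WritableByteChannel out) { this.out = out; }

        @Override
        public void accept(Kind kind, ByteBuffer body) {
            ByteBuffer prefix = ByteBuffer.allocate(4);
            prefix.putInt(body.remaining());
            prefix.flip();
            writeFully(prefix);
            writeFully(body.duplicate()); // duplicate so the caller's position is untouched
        }

        private void writeFully(ByteBuffer buf) {
            try {
                while (buf.hasRemaining()) {
                    out.write(buf);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Framing 2: every message stays a separate unit, as a gRPC or HTTP transport would send it. */
    static final class CollectingSink implements MessageSink {
        final List<Kind> kinds = new ArrayList<>();
        final List<ByteBuffer> bodies = new ArrayList<>();

        @Override
        public void accept(Kind kind, ByteBuffer body) {
            kinds.add(kind);
            bodies.add(body.asReadOnlyBuffer());
        }
    }

    /** Producer side: emits the same message sequence regardless of the sink plugged in. */
    static void writeStream(MessageSink sink, byte[] schema, List<byte[]> batches) {
        sink.accept(Kind.SCHEMA, ByteBuffer.wrap(schema));
        for (byte[] batch : batches) {
            sink.accept(Kind.RECORD_BATCH, ByteBuffer.wrap(batch));
        }
    }

    public static void main(String[] args) {
        byte[] schema = {1, 2, 3};                       // placeholder bytes, not a real schema
        List<byte[]> batches = Arrays.asList(new byte[] {4, 5}, new byte[] {6});

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeStream(new ChannelSink(Channels.newChannel(bytes)), schema, batches);
        System.out.println("contiguous stream bytes: " + bytes.size());

        CollectingSink framed = new CollectingSink();
        writeStream(framed, schema, batches);
        System.out.println("separately framed messages: " + framed.bodies.size());
    }
}
```

The design choice in this sketch is that `writeStream` never sees a channel: a transport-specific sink could be added without touching the code that produces the messages.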
So if we were using Protocol Buffers and gRPC as the underlying transport for the stream, we could encapsulate components of an Arrow stream in objects like:
{code:language=protobuf}
message ArrowMessagePB {
  required bytes serialized_data = 1;
}
{code}
If the transport supports zero-copy, passing the message buffers through directly is preferable to serializing them into a protocol buffer and then parsing them back out.
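The difference between the two approaches can be illustrated in plain Java, without any protobuf or gRPC dependency. In the sketch below, `CopyingEnvelope` and `ViewEnvelope` are hypothetical stand-ins: the first models a message whose bytes field owns a private copy of the payload, the second models a zero-copy transport that hands over a read-only view of the same memory. Neither is an actual Arrow or protobuf class.

```java
import java.nio.ByteBuffer;

public class ZeroCopySketch {
    /** Hypothetical stand-in for a bytes-field message: owns a private copy of the payload. */
    static final class CopyingEnvelope {
        final byte[] serializedData;

        CopyingEnvelope(ByteBuffer src) {
            serializedData = new byte[src.remaining()];
            src.duplicate().get(serializedData); // copies every payload byte
        }
    }

    /** Hypothetical zero-copy envelope: a read-only view over the same memory. */
    static final class ViewEnvelope {
        final ByteBuffer serializedData;

        ViewEnvelope(ByteBuffer src) {
            serializedData = src.asReadOnlyBuffer(); // shares content, copies nothing
        }
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a serialized record batch.
        ByteBuffer batch = ByteBuffer.allocateDirect(4);
        batch.put(new byte[] {10, 20, 30, 40});
        batch.flip();

        CopyingEnvelope copied = new CopyingEnvelope(batch);
        ViewEnvelope viewed = new ViewEnvelope(batch);

        // Mutating the source shows which envelope shares memory with it.
        batch.put(0, (byte) 99);
        System.out.println("copy sees: " + copied.serializedData[0]);
        System.out.println("view sees: " + viewed.serializedData.get(0));
    }
}
```

The copy still holds the original first byte, while the view reflects the change, which is what sharing the underlying memory means. A generalized writer interface that passes buffers rather than an output channel leaves room for the second style where the transport allows it.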
We should do this work in C++ as well to support more flexible stream transport.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)