You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Nate Bauernfeind <na...@deephaven.io> on 2021/07/24 00:28:05 UTC

WireFormat of Flight SchemaResult and Flight FlightInfo's Schema

In flight.proto [1] it states that the encoded bytes are as described in
the flatbuffer schema.

```
/*
 * Wrap the result of a getSchema call
 */
message SchemaResult {
  // schema of the dataset as described in Schema.fbs::Schema.
  bytes schema = 1;
}
```

However, both this schema and the schema on the flight info are actually
encoded like an IPC file stream message.

They start with a 4-byte IPC_CONTINUATION_TOKEN, followed by a 4-byte
message size, followed by a Message wrapping a Schema. See [2] and [3] for
evidence in the Java client.

Is this an accidental bug that has propagated to all client implementations?
Or was it intentional and I should submit a PR to update the comments?

Thanks,
Nate
[1] https://github.com/apache/arrow/blob/master/format/Flight.proto#L195
[2]
https://github.com/apache/arrow/blob/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightInfo.java#L141
[3]
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L161
--

Re: WireFormat of Flight SchemaResult and Flight FlightInfo's Schema

Posted by David Li <li...@apache.org>.
Sure, I've added you to the JIRA contributors role (so you can self-assign in the future) and assigned the JIRA to you. Thanks for taking this on.

-David

On Mon, Jul 26, 2021, at 11:06, Nate Bauernfeind wrote:
> David, do you mind assigning to me? I'll get a PR up within the next two
> days. The JIRA username is 'nbauernfeind'.
> 
> On Mon, Jul 26, 2021 at 8:41 AM David Li <li...@apache.org> wrote:
> 
> > I filed ARROW-13449 for this, thanks for the clarification.
> >
> > -David
> >
> > On Mon, Jul 26, 2021, at 10:25, Wes McKinney wrote:
> > > IIRC, we did intend to encapsulate the Schema in an IPC message.
> > > Updating the docs / comments in the Flight.proto is the right thing to
> > > do.
> > >
> > > On Sun, Jul 25, 2021 at 12:47 PM Micah Kornfield <em...@gmail.com>
> > wrote:
> > > >
> > > > I think if all reference implementations are doing this the same way we
> > > > should update the docs and I don't think a vote is necessary.
> > > >
> > > > On Sun, Jul 25, 2021 at 10:27 AM David Li <li...@apache.org> wrote:
> > > >
> > > > > Hey Nate,
> > > > >
> > > > > Good catch. I would say it was intentional to use an IPC message
> > (while
> > > > > the length is redundant, it also contains the metadata version and
> > custom
> > > > > metadata), and the comment is a little vague. I'm not sure if we
> > need a
> > > > > vote to update this since it is changing files in the format dir.
> > > > >
> > > > > I did check the Java implementation quickly and even in the initial
> > > > > version, the schema is IPC-encapsulated[1].
> > > > >
> > > > > -David
> > > > >
> > > > > [1]:
> > > > >
> > https://github.com/apache/arrow/commit/b3cd616f6ce39e9a820b658ba2f2e2fd3b153555#diff-80e8aef2d6b475949641689fe8d686db2cc1d6703cafe8f6f0dc120b04f964b7R106
> > > > >
> > > > > On Fri, Jul 23, 2021, at 20:28, Nate Bauernfeind wrote:
> > > > > > In flight.proto [1] it states that the encoded bytes are as
> > described in
> > > > > > the flatbuffer schema.
> > > > > >
> > > > > > ```
> > > > > > /*
> > > > > > * Wrap the result of a getSchema call
> > > > > > */
> > > > > > message SchemaResult {
> > > > > >   // schema of the dataset as described in Schema.fbs::Schema.
> > > > > >   bytes schema = 1;
> > > > > > }
> > > > > > ```
> > > > > >
> > > > > > However, both this schema and the schema on the flight info are
> > actually
> > > > > > encoded like an IPC file stream message.
> > > > > >
> > > > > > They start with a 4-byte IPC_CONTINUATION_TOKEN, followed by a
> > 4-byte
> > > > > > message size, followed by a Message wrapping a Schema. See [2] and
> > [3]
> > > > > for
> > > > > > evidence in the Java client.
> > > > > >
> > > > > > Is this an accidental bug that has propagated to all client
> > > > > implementations?
> > > > > > Or was it intentional and I should submit a PR to update the
> > comments?
> > > > > >
> > > > > > Thanks,
> > > > > > Nate
> > > > > > [1]
> > https://github.com/apache/arrow/blob/master/format/Flight.proto#L195
> > > > > > [2]
> > > > > >
> > > > >
> > https://github.com/apache/arrow/blob/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightInfo.java#L141
> > > > > > [3]
> > > > > >
> > > > >
> > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L161
> > > > > > --
> > > > > >
> > > > >
> > >
> >
> 
> 
> --
> 

Re: WireFormat of Flight SchemaResult and Flight FlightInfo's Schema

Posted by Nate Bauernfeind <na...@deephaven.io>.
David, do you mind assigning to me? I'll get a PR up within the next two
days. The JIRA username is 'nbauernfeind'.

On Mon, Jul 26, 2021 at 8:41 AM David Li <li...@apache.org> wrote:

> I filed ARROW-13449 for this, thanks for the clarification.
>
> -David
>
> On Mon, Jul 26, 2021, at 10:25, Wes McKinney wrote:
> > IIRC, we did intend to encapsulate the Schema in an IPC message.
> > Updating the docs / comments in the Flight.proto is the right thing to
> > do.
> >
> > On Sun, Jul 25, 2021 at 12:47 PM Micah Kornfield <em...@gmail.com>
> wrote:
> > >
> > > I think if all reference implementations are doing this the same way we
> > > should update the docs and I don't think a vote is necessary.
> > >
> > > On Sun, Jul 25, 2021 at 10:27 AM David Li <li...@apache.org> wrote:
> > >
> > > > Hey Nate,
> > > >
> > > > Good catch. I would say it was intentional to use an IPC message
> (while
> > > > the length is redundant, it also contains the metadata version and
> custom
> > > > metadata), and the comment is a little vague. I'm not sure if we
> need a
> > > > vote to update this since it is changing files in the format dir.
> > > >
> > > > I did check the Java implementation quickly and even in the initial
> > > > version, the schema is IPC-encapsulated[1].
> > > >
> > > > -David
> > > >
> > > > [1]:
> > > >
> https://github.com/apache/arrow/commit/b3cd616f6ce39e9a820b658ba2f2e2fd3b153555#diff-80e8aef2d6b475949641689fe8d686db2cc1d6703cafe8f6f0dc120b04f964b7R106
> > > >
> > > > On Fri, Jul 23, 2021, at 20:28, Nate Bauernfeind wrote:
> > > > > In flight.proto [1] it states that the encoded bytes are as
> described in
> > > > > the flatbuffer schema.
> > > > >
> > > > > ```
> > > > > /*
> > > > > * Wrap the result of a getSchema call
> > > > > */
> > > > > message SchemaResult {
> > > > >   // schema of the dataset as described in Schema.fbs::Schema.
> > > > >   bytes schema = 1;
> > > > > }
> > > > > ```
> > > > >
> > > > > However, both this schema and the schema on the flight info are
> actually
> > > > > encoded like an IPC file stream message.
> > > > >
> > > > > They start with a 4-byte IPC_CONTINUATION_TOKEN, followed by a
> 4-byte
> > > > > message size, followed by a Message wrapping a Schema. See [2] and
> [3]
> > > > for
> > > > > evidence in the Java client.
> > > > >
> > > > > Is this an accidental bug that has propagated to all client
> > > > implementations?
> > > > > Or was it intentional and I should submit a PR to update the
> comments?
> > > > >
> > > > > Thanks,
> > > > > Nate
> > > > > [1]
> https://github.com/apache/arrow/blob/master/format/Flight.proto#L195
> > > > > [2]
> > > > >
> > > >
> https://github.com/apache/arrow/blob/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightInfo.java#L141
> > > > > [3]
> > > > >
> > > >
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L161
> > > > > --
> > > > >
> > > >
> >
>


--

Re: WireFormat of Flight SchemaResult and Flight FlightInfo's Schema

Posted by David Li <li...@apache.org>.
I filed ARROW-13449 for this, thanks for the clarification.

-David

On Mon, Jul 26, 2021, at 10:25, Wes McKinney wrote:
> IIRC, we did intend to encapsulate the Schema in an IPC message.
> Updating the docs / comments in the Flight.proto is the right thing to
> do.
> 
> On Sun, Jul 25, 2021 at 12:47 PM Micah Kornfield <em...@gmail.com> wrote:
> >
> > I think if all reference implementations are doing this the same way we
> > should update the docs and I don't think a vote is necessary.
> >
> > On Sun, Jul 25, 2021 at 10:27 AM David Li <li...@apache.org> wrote:
> >
> > > Hey Nate,
> > >
> > > Good catch. I would say it was intentional to use an IPC message (while
> > > the length is redundant, it also contains the metadata version and custom
> > > metadata), and the comment is a little vague. I'm not sure if we need a
> > > vote to update this since it is changing files in the format dir.
> > >
> > > I did check the Java implementation quickly and even in the initial
> > > version, the schema is IPC-encapsulated[1].
> > >
> > > -David
> > >
> > > [1]:
> > > https://github.com/apache/arrow/commit/b3cd616f6ce39e9a820b658ba2f2e2fd3b153555#diff-80e8aef2d6b475949641689fe8d686db2cc1d6703cafe8f6f0dc120b04f964b7R106
> > >
> > > On Fri, Jul 23, 2021, at 20:28, Nate Bauernfeind wrote:
> > > > In flight.proto [1] it states that the encoded bytes are as described in
> > > > the flatbuffer schema.
> > > >
> > > > ```
> > > > /*
> > > > * Wrap the result of a getSchema call
> > > > */
> > > > message SchemaResult {
> > > >   // schema of the dataset as described in Schema.fbs::Schema.
> > > >   bytes schema = 1;
> > > > }
> > > > ```
> > > >
> > > > However, both this schema and the schema on the flight info are actually
> > > > encoded like an IPC file stream message.
> > > >
> > > > They start with a 4-byte IPC_CONTINUATION_TOKEN, followed by a 4-byte
> > > > message size, followed by a Message wrapping a Schema. See [2] and [3]
> > > for
> > > > evidence in the Java client.
> > > >
> > > > Is this an accidental bug that has propagated to all client
> > > implementations?
> > > > Or was it intentional and I should submit a PR to update the comments?
> > > >
> > > > Thanks,
> > > > Nate
> > > > [1] https://github.com/apache/arrow/blob/master/format/Flight.proto#L195
> > > > [2]
> > > >
> > > https://github.com/apache/arrow/blob/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightInfo.java#L141
> > > > [3]
> > > >
> > > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L161
> > > > --
> > > >
> > >
> 

Re: WireFormat of Flight SchemaResult and Flight FlightInfo's Schema

Posted by Wes McKinney <we...@gmail.com>.
IIRC, we did intend to encapsulate the Schema in an IPC message.
Updating the docs / comments in the Flight.proto is the right thing to
do.

On Sun, Jul 25, 2021 at 12:47 PM Micah Kornfield <em...@gmail.com> wrote:
>
> I think if all reference implementations are doing this the same way we
> should update the docs and I don't think a vote is necessary.
>
> On Sun, Jul 25, 2021 at 10:27 AM David Li <li...@apache.org> wrote:
>
> > Hey Nate,
> >
> > Good catch. I would say it was intentional to use an IPC message (while
> > the length is redundant, it also contains the metadata version and custom
> > metadata), and the comment is a little vague. I'm not sure if we need a
> > vote to update this since it is changing files in the format dir.
> >
> > I did check the Java implementation quickly and even in the initial
> > version, the schema is IPC-encapsulated[1].
> >
> > -David
> >
> > [1]:
> > https://github.com/apache/arrow/commit/b3cd616f6ce39e9a820b658ba2f2e2fd3b153555#diff-80e8aef2d6b475949641689fe8d686db2cc1d6703cafe8f6f0dc120b04f964b7R106
> >
> > On Fri, Jul 23, 2021, at 20:28, Nate Bauernfeind wrote:
> > > In flight.proto [1] it states that the encoded bytes are as described in
> > > the flatbuffer schema.
> > >
> > > ```
> > > /*
> > > * Wrap the result of a getSchema call
> > > */
> > > message SchemaResult {
> > >   // schema of the dataset as described in Schema.fbs::Schema.
> > >   bytes schema = 1;
> > > }
> > > ```
> > >
> > > However, both this schema and the schema on the flight info are actually
> > > encoded like an IPC file stream message.
> > >
> > > They start with a 4-byte IPC_CONTINUATION_TOKEN, followed by a 4-byte
> > > message size, followed by a Message wrapping a Schema. See [2] and [3]
> > for
> > > evidence in the Java client.
> > >
> > > Is this an accidental bug that has propagated to all client
> > implementations?
> > > Or was it intentional and I should submit a PR to update the comments?
> > >
> > > Thanks,
> > > Nate
> > > [1] https://github.com/apache/arrow/blob/master/format/Flight.proto#L195
> > > [2]
> > >
> > https://github.com/apache/arrow/blob/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightInfo.java#L141
> > > [3]
> > >
> > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L161
> > > --
> > >
> >

Re: WireFormat of Flight SchemaResult and Flight FlightInfo's Schema

Posted by Micah Kornfield <em...@gmail.com>.
I think if all reference implementations are doing this the same way we
should update the docs and I don't think a vote is necessary.

On Sun, Jul 25, 2021 at 10:27 AM David Li <li...@apache.org> wrote:

> Hey Nate,
>
> Good catch. I would say it was intentional to use an IPC message (while
> the length is redundant, it also contains the metadata version and custom
> metadata), and the comment is a little vague. I'm not sure if we need a
> vote to update this since it is changing files in the format dir.
>
> I did check the Java implementation quickly and even in the initial
> version, the schema is IPC-encapsulated[1].
>
> -David
>
> [1]:
> https://github.com/apache/arrow/commit/b3cd616f6ce39e9a820b658ba2f2e2fd3b153555#diff-80e8aef2d6b475949641689fe8d686db2cc1d6703cafe8f6f0dc120b04f964b7R106
>
> On Fri, Jul 23, 2021, at 20:28, Nate Bauernfeind wrote:
> > In flight.proto [1] it states that the encoded bytes are as described in
> > the flatbuffer schema.
> >
> > ```
> > /*
> > * Wrap the result of a getSchema call
> > */
> > message SchemaResult {
> >   // schema of the dataset as described in Schema.fbs::Schema.
> >   bytes schema = 1;
> > }
> > ```
> >
> > However, both this schema and the schema on the flight info are actually
> > encoded like an IPC file stream message.
> >
> > They start with a 4-byte IPC_CONTINUATION_TOKEN, followed by a 4-byte
> > message size, followed by a Message wrapping a Schema. See [2] and [3]
> for
> > evidence in the Java client.
> >
> > Is this an accidental bug that has propagated to all client
> implementations?
> > Or was it intentional and I should submit a PR to update the comments?
> >
> > Thanks,
> > Nate
> > [1] https://github.com/apache/arrow/blob/master/format/Flight.proto#L195
> > [2]
> >
> https://github.com/apache/arrow/blob/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightInfo.java#L141
> > [3]
> >
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L161
> > --
> >
>

Re: WireFormat of Flight SchemaResult and Flight FlightInfo's Schema

Posted by David Li <li...@apache.org>.
Hey Nate,

Good catch. I would say it was intentional to use an IPC message (while the length is redundant, it also contains the metadata version and custom metadata), and the comment is a little vague. I'm not sure if we need a vote to update this since it is changing files in the format dir.

I did check the Java implementation quickly and even in the initial version, the schema is IPC-encapsulated[1].

-David

[1]: https://github.com/apache/arrow/commit/b3cd616f6ce39e9a820b658ba2f2e2fd3b153555#diff-80e8aef2d6b475949641689fe8d686db2cc1d6703cafe8f6f0dc120b04f964b7R106

On Fri, Jul 23, 2021, at 20:28, Nate Bauernfeind wrote:
> In flight.proto [1] it states that the encoded bytes are as described in
> the flatbuffer schema.
> 
> ```
> /*
> * Wrap the result of a getSchema call
> */
> message SchemaResult {
>   // schema of the dataset as described in Schema.fbs::Schema.
>   bytes schema = 1;
> }
> ```
> 
> However, both this schema and the schema on the flight info are actually
> encoded like an IPC file stream message.
> 
> They start with a 4-byte IPC_CONTINUATION_TOKEN, followed by a 4-byte
> message size, followed by a Message wrapping a Schema. See [2] and [3] for
> evidence in the Java client.
> 
> Is this an accidental bug that has propagated to all client implementations?
> Or was it intentional and I should submit a PR to update the comments?
> 
> Thanks,
> Nate
> [1] https://github.com/apache/arrow/blob/master/format/Flight.proto#L195
> [2]
> https://github.com/apache/arrow/blob/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightInfo.java#L141
> [3]
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L161
> --
>