Posted to dev@arrow.apache.org by Nate Jones <Na...@twosigma.com> on 2023/02/03 21:50:42 UTC

[FLIGHT] Question about Flight Protocol Usage

Hello,

We've been using the Flight protocol in much the way the read path is described in the documentation <https://arrow.apache.org/docs/format/Flight.html#downloading-data>. That is, services have a separate metadata server (at least logically separated, such that a network round trip occurs for GetFlightInfo), which returns FlightInfo to be used to access data server(s). We follow a similar pattern for writes and exchanges as well.

While the separate metadata concept is crucial for certain applications, we think other use cases could be made much simpler by skipping the metadata step altogether. In that case, clients would craft their own Tickets and talk directly to data servers for reads, writes, and exchanges. This would be nice when we're just looking for a "normal gRPC streaming call but with the benefits of Flight." For example, some services have a metadata server that returns FlightInfo that simply points clients back to itself, resulting in an unnecessary round trip, since the GetFlightInfo call is essentially a no-op here.

I notice in the docs the statement "Of course, applications may ignore compatibility and simply treat the Flight RPC methods as low-level building blocks for their own purposes." Despite this, I wanted to reach out to see if there are any reference use cases that use Flight in this way. Are there any concerns that come to mind when adapting the Flight pattern like this?

Thanks,
Nate

Re: [FLIGHT] Question about Flight Protocol Usage

Posted by Paul Whalen <pg...@gmail.com>.
My team uses Flight this way as well.  We just use DoGet, DoPut, or
DoExchange, depending on whether we're reading a stream of record batches,
writing a stream, or need some sort of bidirectional communication.  We
also have full control over clients and servers, and simply use JSON in the
Flight metadata to define our protocol.
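
As a rough sketch of what "JSON in the Flight metadata" could look like, the helpers below encode and validate a request as JSON bytes using only the standard library. The field names (dataset, columns, limit) and both function names are hypothetical and not taken from this thread; the resulting bytes could be carried in a Ticket, a command-type FlightDescriptor, or per-message application metadata, depending on the call.

```python
import json


def encode_request(dataset, columns=None, limit=None):
    """Serialize a request as compact, key-sorted JSON bytes."""
    request = {"dataset": dataset}
    if columns is not None:
        request["columns"] = list(columns)
    if limit is not None:
        request["limit"] = int(limit)
    text = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def decode_request(payload):
    """Parse and minimally validate request bytes on the server side."""
    request = json.loads(payload.decode("utf-8"))
    if not isinstance(request, dict) or "dataset" not in request:
        raise ValueError("request must be a JSON object with a 'dataset' field")
    return request
```

Sorting the keys keeps the encoded bytes deterministic, which can help if tickets are ever compared or cached; that is a design choice of this sketch rather than anything Flight requires.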

It took some discussion for everyone to get comfortable using Flight, since
at first look it attempts to solve for much more than we need it to.  I
almost wish Flight were less ambitious, and instead a thin wrapper on top of
the core RPC patterns that people know from gRPC (unary, server streaming,
client streaming, and bidi streaming), as that might make it seem more
accessible.

That said, we are believers in the concepts Flight encourages, where a
client asks one server where data is, and retrieves it from another server
- we just don't see value in using Flight to implement these, instead
preferring "vanilla" gRPC.

Paul



Re: [FLIGHT] Question about Flight Protocol Usage

Posted by David Li <li...@apache.org>.
Yes, there's really no problem if you just want to use Flight as a pipe for Arrow data.

(There's also a proposal to inline data directly into FlightInfo, if you want to keep the separate metadata.)


Re: [FLIGHT] Question about Flight Protocol Usage

Posted by Aldrin <ak...@ucsc.edu.INVALID>.
I am planning on doing something similar, but I don't have a concrete
reference to point you to.

In my case, I am treating the metadata as reusable (i.e., the FlightInfo
does not expire after the first use). This makes the most sense when the
information in FlightInfo directly names the data objects you need to
retrieve. I only expect Flight communications to come from a library that
I have written for use by others, so I will have full control of the
interactions between a FlightClient and a FlightServer.

As you point out, your main concern should probably be protocol
compatibility. If you will have control of the client side of
communications, then I think there are minimal concerns other than how you
design what a Ticket or FlightInfo contains.

Aldrin Montana
Computer Science PhD Student
UC Santa Cruz

