Posted to dev@arrow.apache.org by Jacques Nadeau <ja...@apache.org> on 2019/04/03 15:01:25 UTC

[DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

Right now, the ListFlights method returns a stream of FlightGetInfo (to be
renamed FlightInfo). This actually turns out to be quite expensive in many
cases since splits have to be generated. I'd like to propose changing this
method to return a stream of FlightDescriptors instead. What do people
think?

rpc ListFlights(Criteria) returns (stream FlightGetInfo) {}

to

rpc ListFlights(Criteria) returns (stream FlightDescriptor) {}
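The cost described above can be made concrete with a minimal Python sketch. It does not use any real Flight library: `FlightDescriptor`, `FlightEndpoint` and `FlightInfo` are simplified stand-ins for the Flight.proto messages, and `ToyServer.plan_endpoints` is a hypothetical placeholder for whatever expensive split generation a server performs. With the current signature every listed flight gets planned; with the proposed one, planning is deferred to GetFlightInfo for only the flights a client asks about.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class FlightDescriptor:
    # Simplified PATH-style descriptor; a stand-in, not a real Flight class.
    path: Tuple[str, ...]


@dataclass
class FlightEndpoint:
    ticket: bytes         # opaque token a client would pass to DoGet
    locations: List[str]  # host(s) able to serve this split


@dataclass
class FlightInfo:
    descriptor: FlightDescriptor
    endpoints: List[FlightEndpoint]


class ToyServer:
    """Hypothetical server whose datasets are split across hosts."""

    def __init__(self, datasets: Dict[Tuple[str, ...], List[str]]):
        self.datasets = datasets  # dataset path -> one host per split
        self.planning_calls = 0   # how often the expensive step ran

    def plan_endpoints(self, descriptor: FlightDescriptor) -> List[FlightEndpoint]:
        # Placeholder for the costly split generation (query planning, etc.).
        self.planning_calls += 1
        name = "/".join(descriptor.path)
        return [FlightEndpoint(f"{name}#{i}".encode(), [host])
                for i, host in enumerate(self.datasets[descriptor.path])]

    def list_flights_current(self) -> Iterator[FlightInfo]:
        # Current signature: every listed flight must be planned.
        for path in self.datasets:
            descriptor = FlightDescriptor(path)
            yield FlightInfo(descriptor, self.plan_endpoints(descriptor))

    def list_flights_proposed(self) -> Iterator[FlightDescriptor]:
        # Proposed signature: names only, no planning.
        for path in self.datasets:
            yield FlightDescriptor(path)

    def get_flight_info(self, descriptor: FlightDescriptor) -> FlightInfo:
        # Planning happens only for flights a client explicitly asks about.
        return FlightInfo(descriptor, self.plan_endpoints(descriptor))


layout = {("sales",): ["host-a", "host-b"], ("logs",): ["host-c"]}

current = ToyServer(layout)
list(current.list_flights_current())
print(current.planning_calls)   # 2: both datasets planned just to list them

proposed = ToyServer(layout)
names = list(proposed.list_flights_proposed())
info = proposed.get_flight_info(names[0])
print(proposed.planning_calls)  # 1: only ("sales",) was planned
```

The saving grows with the number of flights a server exposes, since a client typically calls GetFlightInfo for only a few of them.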

Re: [DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

Posted by Wes McKinney <we...@gmail.com>.
I won't be around much over the next couple of weeks to help, but this
change should be voted on soon so that the implementations can get
refactored sooner rather than later.

On Thu, Apr 4, 2019 at 2:59 PM Wes McKinney <we...@gmail.com> wrote:
> Shall we put this one to a vote?

Re: [DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

Posted by Wes McKinney <we...@gmail.com>.
hi Bryan,

I would guess that ListFlights would mainly return named datasets
having a particular PATH per

https://github.com/apache/arrow/blob/master/format/Flight.proto#L176

Presumably, if a server supports the CMD-type FlightDescriptor, the
server and its clients would have agreed on a .proto or other
serialization format for their commands.
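Bryan's concern and this answer can be illustrated with a minimal Python sketch of the client side, assuming the server and client have agreed on a command format. Here a UTF-8 JSON object with a "query" key stands in for that agreed format purely for illustration (a real deployment might use a .proto message instead), and `decode_command` and `readable_entries` are hypothetical helpers. PATH descriptors need no parsing, and CMD descriptors in an unrecognized format are skipped cheaply rather than treated as errors.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List, Optional


class DescriptorType(Enum):
    # Same values as the DescriptorType enum in Flight.proto.
    UNKNOWN = 0
    PATH = 1
    CMD = 2


@dataclass
class FlightDescriptor:
    type: DescriptorType
    cmd: bytes = b""
    path: List[str] = field(default_factory=list)


def decode_command(cmd: bytes) -> Optional[dict]:
    """Hypothetical agreed-upon format: a UTF-8 JSON object with a "query" key.
    Returns None for any command this client does not understand."""
    try:
        obj = json.loads(cmd.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    return obj if isinstance(obj, dict) and "query" in obj else None


def readable_entries(listing: Iterable[FlightDescriptor]) -> List[str]:
    entries = []
    for d in listing:
        if d.type is DescriptorType.PATH:
            # Named datasets need no parsing at all.
            entries.append("path:" + "/".join(d.path))
        elif d.type is DescriptorType.CMD:
            decoded = decode_command(d.cmd)
            if decoded is not None:
                entries.append("query:" + str(decoded["query"]))
            # Commands in an unknown format are skipped, not treated as errors.
    return entries


listing = [
    FlightDescriptor(DescriptorType.PATH, path=["warehouse", "sales"]),
    FlightDescriptor(DescriptorType.CMD, cmd=b'{"query": "SELECT 1"}'),
    FlightDescriptor(DescriptorType.CMD, cmd=b"\x08\x96\x01"),  # some other format
]
print(readable_entries(listing))  # ['path:warehouse/sales', 'query:SELECT 1']
```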

Shall we put this one to a vote?

Thanks

On Thu, Apr 4, 2019 at 11:29 AM Bryan Cutler <cu...@gmail.com> wrote:
>
> This sounds good to me, but if the FlightDescriptors are opaque commands,
> e.g. protobufs, will the client have to parse a bunch of unknown commands
> in order to find the right one?

Re: [DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

Posted by Bryan Cutler <cu...@gmail.com>.
This sounds good to me, but if the FlightDescriptors are opaque commands,
e.g. protobufs, will the client have to parse a bunch of unknown commands
in order to find the right one?

On Wed, Apr 3, 2019 at 9:20 AM Jacques Nadeau <ja...@apache.org> wrote:

> Criteria and the stream interface should be sufficient. We need to work on
> a formal definition of what criteria means generically.

Re: [DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

Posted by Jacques Nadeau <ja...@apache.org>.
>
> Can you explain what you call "splits"?
>

Per Wes's comments, FlightItineraries inside FlightGetInfo.

> Also, is it possible that a service has tons of flights?

Yes

> If so, does some kind of
> pagination need to be done here?

Criteria and the stream interface should be sufficient. We need to work on
a formal definition of what criteria means generically.
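The claim that Criteria plus a server stream can stand in for explicit pagination can be sketched in Python. The server filters lazily and yields entries one at a time; a client that only wants the first few simply stops reading, so the server never materializes the full listing. The `Catalog` class is hypothetical, and reading `Criteria.expression` as a name prefix is an assumption made only for this example, since a generic meaning for criteria has yet to be defined.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator


@dataclass
class Criteria:
    # Flight.proto carries criteria as an opaque bytes expression; reading it
    # as a name prefix is an assumption made only for this sketch.
    expression: bytes = b""


class Catalog:
    """Hypothetical server-side catalog of flight names."""

    def __init__(self, names: Iterable[str]):
        self.names = list(names)
        self.scanned = 0  # catalog entries the server has examined so far

    def list_flights(self, criteria: Criteria) -> Iterator[str]:
        prefix = criteria.expression.decode("utf-8")
        for name in self.names:
            self.scanned += 1
            if name.startswith(prefix):
                yield name  # produced lazily, only as the client reads


catalog = Catalog(f"logs/day-{i:03d}" for i in range(1000))

# The client filters with criteria and stops after five results.
first_five = list(islice(catalog.list_flights(Criteria(b"logs/day-01")), 5))
print(first_five[0], first_five[-1])  # logs/day-010 logs/day-014
print(catalog.scanned)                # 15
```

Only 15 of the 1,000 entries are examined. In a real gRPC client, cancelling a server-streaming call would presumably play the role that stopping iteration plays here.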

Re: [DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

Posted by ming zhang <mi...@gmail.com>.
IMHO, the list should just show what a service has; it does not need to
provide detailed information about each one. We already have a separate
method to fetch details about a flight. Optionally, we could
change GetFlightInfo(FlightDescriptor) returns (FlightGetInfo) {} to accept
a stream as input so that a client could fetch info for a batch of flights.
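The batching idea can be sketched in Python as a function that consumes an iterable of descriptors and yields one FlightInfo per descriptor. In protobuf terms it would presumably correspond to something like `rpc GetFlightInfo(stream FlightDescriptor) returns (stream FlightInfo) {}`; that signature is a hypothetical illustration, not part of Flight.proto. `get_flight_infos`, `CATALOG` and the trimmed-down `FlightInfo` (only a record count) are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class FlightDescriptor:
    path: Tuple[str, ...]  # simplified, PATH-only descriptor


@dataclass
class FlightInfo:
    # Trimmed-down stand-in: only the descriptor and a record count.
    descriptor: FlightDescriptor
    total_records: int


# Hypothetical server-side metadata keyed by dataset path.
CATALOG: Dict[Tuple[str, ...], int] = {("sales",): 1200, ("logs",): 50}


def get_flight_info(descriptor: FlightDescriptor) -> FlightInfo:
    # Today's unary shape: one descriptor in, one info out.
    # Unknown paths raise KeyError in this sketch.
    return FlightInfo(descriptor, CATALOG[descriptor.path])


def get_flight_infos(descriptors: Iterable[FlightDescriptor]) -> Iterator[FlightInfo]:
    # Batched shape: a stream of descriptors in, one FlightInfo
    # streamed back per descriptor, in request order.
    for descriptor in descriptors:
        yield get_flight_info(descriptor)


wanted = [FlightDescriptor(("logs",)), FlightDescriptor(("sales",))]
print([info.total_records for info in get_flight_infos(wanted)])  # [50, 1200]
```

Answering in request order keeps client bookkeeping trivial; a server could instead return results as they become ready, at the cost of the client matching responses to descriptors.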

Also, is it possible that a service has tons of flights? If so, does some
kind of pagination need to be done here?

ming

On Wed, Apr 3, 2019 at 11:41 AM Wes McKinney <we...@gmail.com> wrote:

> hi Jacques,
>
> I agree with you -- I had this concern also during implementation that
> the query plans have to be generated both in ListFlights and
> GetFlightInfo.

Re: [DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

Posted by Wes McKinney <we...@gmail.com>.
hi Jacques,

I agree with you -- I had this concern also during implementation that
the query plans have to be generated both in ListFlights and
GetFlightInfo.

Antoine, I think "splits" here means the pieces of a distributed
dataset. If a Flight is spread across multiple hosts, then whenever you
call ListFlights, the server currently has to compute, for each flight,
the endpoints and location(s) of every piece.
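These splits can be pictured with a short Python sketch: each piece of a distributed dataset becomes one endpoint carrying an opaque ticket plus the host(s) that can serve it, which is the per-flight work ListFlights currently has to do for every flight it returns. The ticket-plus-locations shape follows the FlightEndpoint message in Flight.proto; `compute_splits`, `group_by_host`, and the piece and host names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlightEndpoint:
    ticket: bytes         # opaque token identifying one piece of the dataset
    locations: List[str]  # hosts that hold a copy of that piece


def compute_splits(pieces: Dict[str, List[str]]) -> List[FlightEndpoint]:
    """Hypothetical: one endpoint per piece of a distributed dataset.
    `pieces` maps a piece id to the hosts holding a replica of it."""
    return [FlightEndpoint(piece_id.encode(), sorted(hosts))
            for piece_id, hosts in sorted(pieces.items())]


def group_by_host(endpoints: List[FlightEndpoint]) -> Dict[str, List[bytes]]:
    """Read each piece from its first listed location, grouped per host."""
    plan: Dict[str, List[bytes]] = {}
    for endpoint in endpoints:
        plan.setdefault(endpoint.locations[0], []).append(endpoint.ticket)
    return plan


pieces = {"part-0": ["host-b", "host-a"], "part-1": ["host-c"], "part-2": ["host-a"]}
endpoints = compute_splits(pieces)
print(group_by_host(endpoints))
# {'host-a': [b'part-0', b'part-2'], 'host-c': [b'part-1']}
```

Picking the first listed location is the simplest possible policy; a real client might instead balance reads across replicas.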

- Wes

On Wed, Apr 3, 2019 at 10:05 AM Antoine Pitrou <so...@pitrou.net> wrote:
> Can you explain what you call "splits"?

Re: [DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

Posted by Antoine Pitrou <so...@pitrou.net>.
On Wed, 3 Apr 2019 08:01:25 -0700
Jacques Nadeau <ja...@apache.org> wrote:
> This actually turns out to be quite expensive in many
> cases since splits have to be generated.

Can you explain what you call "splits"?

Regards

Antoine.