Posted to dev@arrow.apache.org by David Li <li...@gmail.com> on 2019/10/13 20:24:28 UTC

[Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Hi all,

We've been using Flight quite successfully so far, but we have
identified a new use case on the horizon: being able to both send and
retrieve Arrow data within a single RPC call. To that end, I've
written up a proposal for a new RPC method:
https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing

Please let me know if you can't view or comment on the document. I'd
appreciate any feedback; I think this is a relatively straightforward
addition - it is essentially "DoPutThenGet".

This is a format change and would require a vote. I've decided to
table the other format change I had proposed (on DoPut), as it doesn't
functionally change Flight, just the interpretation of the semantics.

Thanks,
David
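
David describes the new method as essentially "DoPutThenGet": one call in which the client uploads Arrow data and reads results back. The following is a minimal in-process sketch of that shape. It assumes nothing about the real Flight API: `ToyExchange`, `write`, `done_writing`, and `read_all` are hypothetical names, plain lists stand in for record batches, and two unbounded queues stand in for the upload and download halves of a single gRPC stream.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List

_END = object()  # sentinel marking the end of one direction of the stream

Batch = List[int]  # hypothetical: a plain list stands in for an Arrow record batch


class ToyExchange:
    """Hypothetical stand-in for one bidirectional call (not the Flight API)."""

    def __init__(self, handler: Callable[[Iterator[Batch]], Iterable[Batch]]):
        self._up = queue.Queue()    # client -> server half of the call
        self._down = queue.Queue()  # server -> client half of the call
        self._server = threading.Thread(target=self._serve, args=(handler,))
        self._server.start()

    def _serve(self, handler) -> None:
        def incoming() -> Iterator[Batch]:
            while (item := self._up.get()) is not _END:
                yield item
        for result in handler(incoming()):
            self._down.put(result)
        self._down.put(_END)  # server finished its half of the stream

    def write(self, batch: Batch) -> None:
        self._up.put(batch)

    def done_writing(self) -> None:
        self._up.put(_END)  # client finished uploading

    def read_all(self) -> List[Batch]:
        results = []
        while (item := self._down.get()) is not _END:
            results.append(item)
        self._server.join()
        return results


def double_each(batches: Iterator[Batch]) -> Iterator[Batch]:
    """Server-side logic: transform each uploaded batch as it arrives."""
    for batch in batches:
        yield [2 * x for x in batch]


exchange = ToyExchange(double_each)
exchange.write([1, 2, 3])
exchange.write([4])
exchange.done_writing()
print(exchange.read_all())  # [[2, 4, 6], [8]]
```

Here the server transforms each batch as it arrives, but nothing in this shape requires that; a handler could equally consume every upload before yielding anything.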

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Ryan Murray <ry...@dremio.com>.
Cool, makes a ton of sense now. Thanks!
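
David's reply quoted below explains why a DoPut followed by a DoGet is not equivalent to a single call: depending on the deployment, each call may be routed independently. The toy model below assumes a hypothetical balancer that sends each new call to the next instance in round-robin order, with uploaded state held only in per-instance memory. `Instance`, `RoundRobinBalancer`, `do_put`, `do_get`, and `do_exchange` are illustrative names, not Flight APIs.

```python
import itertools
from typing import Dict, List, Optional, Tuple

Batch = List[int]


class Instance:
    def __init__(self, name: str):
        self.name = name
        self.uploads: Dict[str, List[Batch]] = {}  # state lives only in this process


class RoundRobinBalancer:
    """Hypothetical balancer that picks a backend per call, not per client."""

    def __init__(self, instances: List[Instance]):
        self._cycle = itertools.cycle(instances)

    def route(self) -> Instance:
        return next(self._cycle)


def do_put(lb: RoundRobinBalancer, key: str, batches: List[Batch]) -> str:
    inst = lb.route()
    inst.uploads[key] = batches
    return inst.name


def do_get(lb: RoundRobinBalancer, key: str) -> Tuple[str, Optional[List[Batch]]]:
    inst = lb.route()
    return inst.name, inst.uploads.get(key)  # None if the upload landed elsewhere


def do_exchange(lb: RoundRobinBalancer, batches: List[Batch]) -> Tuple[str, List[int]]:
    inst = lb.route()  # one call: upload and download share this instance
    return inst.name, [sum(b) for b in batches]


lb = RoundRobinBalancer([Instance("a"), Instance("b")])
print(do_put(lb, "job-1", [[1, 2], [3]]))  # a
print(do_get(lb, "job-1"))                 # ('b', None)
print(do_exchange(lb, [[1, 2], [3]]))      # ('a', [3, 3])
```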

On Tue, Oct 15, 2019 at 3:11 PM David Li <li...@gmail.com> wrote:

> Hey Ryan,
>
> Thanks for the comments.
>
> Concrete example: I've edited the doc to provide a Python strawman.
>
> Sync vs async: while I don't touch on it, you could interleave uploads
> and downloads if you were so inclined. Right now, synchronous APIs
> make this error-prone, e.g. if both client and server wait for each
> other due to an application logic bug. (gRPC doesn't give us the
> ability to have per-read timeouts, only an overall timeout.) As an
> example of this happening with DoPut, see ARROW-6063:
> https://issues.apache.org/jira/browse/ARROW-6063
>
> This is mostly tangential, though; eventually we will want to design
> asynchronous APIs for Flight as a whole. A bidirectional stream like
> this (and like DoPut) just makes these pitfalls easier to run into.
>
> Using DoPut+DoGet: I discussed this in the proposal, but the main
> concern is that depending on how you deploy, two separate calls could
> get routed to different instances. Additionally, gRPC has some
> reconnection behaviors; if the server goes away in between the two
> calls, but it then restarts or there is another instance available,
> the client will happily reconnect to the new server without warning.
>
> Thanks,
> David
>
> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> > Hey David,
> >
> > I think this proposal makes a lot of sense. I like it and the possibility
> > of remote compute via Arrow buffers. One thing that would help me would be
> > a concrete example of the API in a real-life use case. Also, what would the
> > client experience be in terms of sync vs async? Would the client block
> > until the bidirectional call returns, i.e. c = flight.vector_mult(a, b), or
> > would the client wait to be signaled that computation was done? If the
> > latter, how is that different from a DoPut then DoGet? I suppose that this
> > could be implemented without extending the RPC interface but rather by a
> > function/util?
> >
> >
> > Best,
> >
> > Ryan
> >
>


-- 

Ryan Murray  | Principal Consulting Engineer

+447540852009 | rymurr@dremio.com

<https://www.dremio.com/>
Check out our GitHub <https://www.github.com/dremio>, join our community
site <https://community.dremio.com/> & Download Dremio
<https://www.dremio.com/download>
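
David's reply quoted above notes that synchronous APIs make interleaving uploads and downloads error-prone: client and server can each end up waiting on the other, and gRPC offers only an overall timeout rather than per-read timeouts. The sketch below reproduces that failure without gRPC, under stated assumptions: bounded `queue.Queue` objects stand in for flow-control windows (real gRPC windows are sized in bytes, not messages), a per-put timeout stands in for the per-read timeout gRPC lacks, and `echo_server` and `run_client` are hypothetical names. A client that uploads everything before reading stalls; draining responses on a separate thread lets both directions make progress.

```python
import queue
import threading
from typing import List, Union

END = object()  # sentinel marking the end of a stream


def echo_server(up: queue.Queue, down: queue.Queue) -> None:
    """Streams each uploaded batch straight back, like a per-batch transform."""
    while (batch := up.get()) is not END:
        down.put(batch)  # blocks while nobody is reading on the client side
    down.put(END)


def run_client(batches: List[int], reader_thread: bool,
               window: int = 1, timeout: float = 0.5) -> Union[str, List[int]]:
    up: queue.Queue = queue.Queue(maxsize=window)    # bounded, like flow control
    down: queue.Queue = queue.Queue(maxsize=window)
    threading.Thread(target=echo_server, args=(up, down), daemon=True).start()
    received: List[int] = []

    def drain() -> None:
        while (batch := down.get()) is not END:
            received.append(batch)

    reader = threading.Thread(target=drain) if reader_thread else None
    if reader:
        reader.start()
    try:
        # Naive client logic: upload everything first, read afterwards.
        for batch in batches:
            up.put(batch, timeout=timeout)
        up.put(END, timeout=timeout)
    except queue.Full:
        return "stalled"  # client and server are each waiting on the other
    if reader:
        reader.join()
    else:
        drain()
    return received


print(run_client([1, 2, 3, 4], reader_thread=False))  # stalled
print(run_client([1, 2, 3, 4], reader_thread=True))   # [1, 2, 3, 4]
```

With a window of one message, at most three batches can be absorbed (one buffered each way plus one held by the blocked server), so the fourth upload always times out in the naive client. Spawning the reader thread is the workaround David mentions elsewhere in the thread for the synchronous C++ and Python gRPC APIs.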

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
I'm supportive of having a bidirectional API for the reasons stated in
the document.

There seem to be some details to work out, but probably nothing
insurmountable. It seems that how errors and metadata are handled is
one of the open questions.
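
The two ways of carrying control messages debated in the quoted messages (a FlightData with only `app_metadata` set, or a wrapper with a `oneof` of data and control) can be compared with a small sketch. It uses Python dataclasses instead of generated Protobuf classes: `FlightData` keeps only `data_body` and `app_metadata` (the real message also has `flight_descriptor` and `data_header`), and `ControlMessage`, `FlightMessage`, `classify_a`, and `classify_b` are hypothetical names, not Flight APIs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FlightData:
    # Simplified: the real FlightData also carries flight_descriptor and data_header.
    data_body: bytes = b""
    app_metadata: bytes = b""


# Option A: reuse FlightData; an empty body plus metadata means "control message".
def classify_a(msg: FlightData) -> str:
    if msg.data_body:
        return "data"
    return "control" if msg.app_metadata else "empty"


# Option B: a hypothetical wrapper whose body is exactly one of the two kinds,
# mirroring `oneof { FlightData data = 1; ... metadata = 2; }`.
@dataclass
class ControlMessage:
    payload: bytes


@dataclass
class FlightMessage:
    body: Union[FlightData, ControlMessage]


def classify_b(msg: FlightMessage) -> str:
    # The message kind is explicit, so no emptiness convention is needed.
    return "control" if isinstance(msg.body, ControlMessage) else "data"


print(classify_a(FlightData(app_metadata=b"flush")))            # control
print(classify_b(FlightMessage(ControlMessage(b"flush"))))      # control
print(classify_b(FlightMessage(FlightData(data_body=b"\x00")))) # data
```

Option A needs an agreed convention for what an empty body means; Option B makes the distinction part of the schema at the cost of a new message type.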

On Wed, Oct 16, 2019 at 7:13 PM David Li <li...@gmail.com> wrote:
>
> I was definitely considering having control messages without data, and
> I thought that could be encoded by a FlightData with only app_metadata
> set. I think I understand your position now: FlightData should always
> carry (some) data (with optional metadata)?
>
> That makes sense to me, and is consistent with the documentation on
> FlightData in the Protobuf file. I was worried about having a
> redundant metadata field, but oneof prevents that from happening, and
> overall having a clear separation between data and control messages is
> cleaner.
>
> As for using Protobuf's Any: so far, we've avoided exposing Protobuf
> types by using bytes fields. Would we want to change that now?
>
> Best,
> David
>
> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> > Hey David,
> >
> > RE: Async: I was trying to match the pattern we use for DoGet/DoPut for
> > async. Yes, I was thinking more of Java, given Java gRPC's async-always
> > pattern.
> >
> > On the comment around the FlightData, I think using metadata for this
> > overloads the message. If I want to send a control message independently
> > of the data message, I would have to define something like an empty
> > FlightData message that has custom metadata. Why not support a container
> > object with a oneof{FlightData, Any} in it instead, so users can add more
> > data as desired? The default impl could be a no-op for the Any messages.
> >
> > On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
> >
> >> Hi Jacques,
> >>
> >> Thanks for the comments.
> >>
> >> - I do agree DoExchange is a better name!
> >> - FlightData already has metadata fields as a result of prior
> >> proposals, so I don't think we need a new message to carry that kind
> >> of information.
> >> - I like the suggestion of an async handler to handle incoming
> >> messages as the fundamental API; it would actually be quite natural to
> >> implement in Flight/Java. I will note that it's not possible in
> >> C++/Python without spawning a thread, though. (In essence, gRPC-Java
> >> is async-always and gRPC-C++ is sync-always.) There are experimental
> >> C++ APIs that would let us do something similar to Java, but those are
> >> only in relatively recent gRPC versions and are still under
> >> development (unlike the interceptor APIs, which have been around
> >> for quite a while).
> >>
> >> Thanks,
> >> David
> >>
> >> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > I like it. Added some comments to the doc. Might be worth discussing
> >> > here, depending on your thoughts.
> >> >
> >> >
> >>
> >
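
Keeping application-defined fields as plain bytes, as David argues above, leaves any structure to an application-level convention. A minimal sketch, assuming a hypothetical convention of a UTF-8 JSON object tagged with `kind` (nothing in Flight prescribes this), of how errors or progress signals might travel in `app_metadata`, with unrecognized metadata treated as a no-op, echoing Jacques's suggested default. `pack_metadata`, `unpack_metadata`, and `handle_metadata` are hypothetical helpers.

```python
import json
from typing import Any, Dict


def pack_metadata(kind: str, **fields: Any) -> bytes:
    """Hypothetical convention: a JSON object tagged with 'kind', UTF-8 encoded."""
    return json.dumps({"kind": kind, **fields}, sort_keys=True).encode("utf-8")


def unpack_metadata(raw: bytes) -> Dict[str, Any]:
    """Decodes metadata written by pack_metadata; empty bytes means no metadata."""
    if not raw:
        return {}
    return json.loads(raw.decode("utf-8"))


def handle_metadata(raw: bytes) -> str:
    meta = unpack_metadata(raw)
    kind = meta.get("kind")
    if kind == "error":
        return f"error {meta['code']}: {meta['message']}"
    if kind == "progress":
        return f"{meta['rows']} rows processed"
    return "ignored"  # unknown or absent metadata is a no-op by default


raw = pack_metadata("error", code=3, message="schema mismatch")
print(raw)                   # b'{"code": 3, "kind": "error", "message": "schema mismatch"}'
print(handle_metadata(raw))  # error 3: schema mismatch
print(handle_metadata(b""))  # ignored
```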

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
I agree we should handle the issue of potentially multiple streams
separately from the bidirectional RPC design/implementation.
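
Wes suggests, in the Nov 29 message quoted below, that a union of structs can carry record batches with different schemas in one stream. A sketch of the dense union idea follows, with plain Python lists standing in for Arrow buffers and dicts standing in for struct values. `to_dense_union` and `value_at` are hypothetical helpers, not pyarrow's API; a real dense union stores `type_ids` as an int8 buffer and `offsets` as an int32 buffer.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def to_dense_union(batches: List[Tuple[str, List[Row]]]):
    """Interleaves rows from differently shaped batches into one union column.

    Returns (child_names, type_ids, offsets, children) where, as in a dense
    union, type_ids[i] picks the child holding slot i and offsets[i] is the
    position of that value inside the chosen child.
    """
    child_names: List[str] = []
    children: List[List[Row]] = []
    type_ids: List[int] = []
    offsets: List[int] = []
    for schema_name, rows in batches:
        if schema_name not in child_names:
            child_names.append(schema_name)
            children.append([])
        tid = child_names.index(schema_name)
        for row in rows:
            type_ids.append(tid)
            offsets.append(len(children[tid]))
            children[tid].append(row)
    return child_names, type_ids, offsets, children


def value_at(i: int, type_ids: List[int], offsets: List[int],
             children: List[List[Row]]) -> Row:
    return children[type_ids[i]][offsets[i]]


stream = [
    ("trades", [{"price": 1.5}]),
    ("quotes", [{"bid": 1.0, "ask": 2.0}]),
    ("trades", [{"price": 1.6}]),
]
names, type_ids, offsets, children = to_dense_union(stream)
print(names, type_ids, offsets)                  # ['trades', 'quotes'] [0, 1, 0] [0, 0, 1]
print(value_at(2, type_ids, offsets, children))  # {'price': 1.6}
```

The "added complexity on each side" is visible here: the writer must assign type ids and offsets, and every reader must dispatch on them to recover per-schema rows.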

On Thu, Dec 12, 2019 at 2:20 PM David Li <li...@gmail.com> wrote:
>
> Just following up here again, any other thoughts?
>
> I think we do have justifications for potentially separate streams in
> a call, but that's more of an orthogonal question - it doesn't need to
> be addressed here. I do agree that it very much complicates things.
>
> Thanks,
> David
>
> On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> > I would generally agree with this. Note that you have the possibility
> > to use unions-of-structs to send record batches with different schemas
> > in the same stream, though with some added complexity on each side.
> >
> > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
> >>
> >> I'd vote for explicitly not supporting this. We should keep our
> >> primitives narrow.
> >>
> >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
> >>
> >> > Thanks for the feedback.
> >> >
> >> > I do think if we had explicitly embraced gRPC from the beginning,
> >> > there are a lot of places where things could be made more ergonomic,
> >> > including with the metadata fields. But it would also have locked us
> >> > out of potential future transports.
> >> >
> >> > On another note: I hesitate to put too much into this method, but we
> >> > are looking at use cases where potentially, a client may want to
> >> > upload multiple distinct datasets (with differing schemas). (This is a
> >> > little tentative, and I can get more details...) Right now, each
> >> > logical stream in Flight must have a single, consistent schema; would
> >> > it make sense to look at ways to relax this, or declare this
> >> > explicitly out of scope (and require multiple calls and coordination
> >> > with the deployment topology) in order to accomplish this?
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > > Fair enough. I'm okay with the bytes approach and the proposal looks
> >> > > good to me.
> >> > >
> >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
> >> > >
> >> > >> I've updated the proposal.
> >> > >>
> >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> >> > >> errors/metadata, I still think using bytes is preferable:
> >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf types,
> >> > >> - We wouldn't be able to practically expose the Protobuf field to C++
> >> > >> users without causing build pains,
> >> > >> - We can't let Python users take advantage of the Protobuf field
> >> > >> without somehow being compatible with the Protobuf wheels (by linking
> >> > >> to the same version, and doing magic to turn the C++ Protobufs into
> >> > >> the Python ones),
> >> > >> - All our other application-defined fields are already bytes.
> >> > >>
> >> > >> Applications that want structure can encode JSON or Protobuf Any into
> >> > >> the bytes field themselves, much as you can already do for Ticket,
> >> > >> commands in FlightDescriptors, and application metadata in
> >> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
> >> > >> Any directly, since Any itself is just a bytes field paired with a
> >> > >> type tag, and reading the actual message still means invoking the
> >> > >> Protobuf deserializer again.
> >> > >>
> >> > >> If we decide on using bytes, then I don't think it makes sense to
> >> > >> define a new message with a oneof either, since it would be
> >> > >> redundant.
> >> > >>
> >> > >> Thanks,
> >> > >> David
> >> > >>
> >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> >> > >> > I've been extremely backlogged; I will update the proposal when I
> >> > >> > get a chance and reply here when done.
> >> > >> >
> >> > >> > Best,
> >> > >> > David
> >> > >> >
> >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >> > >> >> Bumping this discussion since a couple of weeks have passed. It
> >> > >> >> seems there are still some questions here. Could we summarize the
> >> > >> >> alternatives along with any public API implications so we can try
> >> > >> >> to render a decision?
> >> > >> >>
> >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com> wrote:
> >> > >> >>>
> >> > >> >>> Hi Wes,
> >> > >> >>>
> >> > >> >>> Responses inline:
> >> > >> >>>
> >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
> >> > >> >>>
> >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
> >> > >> >>> > >
> >> > >> >>> > > The question is whether to repurpose the existing FlightData
> >> > >> >>> > > structure, and allow for the metadata field to be filled in and
> >> > >> >>> > > data fields to be blank (as a control message), or to wrap the
> >> > >> >>> > > FlightData structure in another structure that explicitly
> >> > >> >>> > > distinguishes between control and data messages.
> >> > >> >>> >
> >> > >> >>> > I'm not super against having metadata-only FlightData with empty
> >> > >> >>> > body. One question to consider is what changes (if any) would
> >> > >> >>> > need to be made to public APIs in either scenario.
> >> > >> >>> >
> >> > >> >>>
> >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
> >> > >> >>> messages in the future. This would be a breaking change, but
> >> > >> >>> wouldn't change the wire format. I think the APIs could be changed
> >> > >> >>> backwards compatibly, though.
> >> > >> >>>
> >> > >> >>> > > The other question is how to handle the metadata fields. So
> >> > >> >>> > > far, we've used bytestring fields for application-defined
> >> > >> >>> > > data. This is workable if you want to use Protobuf to define
> >> > >> >>> > > the contents of those fields, but requires you to pack/unpack
> >> > >> >>> > > your Protobuf into/from the bytestring field. If we instead
> >> > >> >>> > > used the Protobuf Any field, a dynamically typed field, this
> >> > >> >>> > > would be more convenient, but then we'd be exposing Protobuf
> >> > >> >>> > > types. We could alternatively use a combination of a type
> >> > >> >>> > > field and a bytestring field, mimicking what the Protobuf Any
> >> > >> >>> > > type looks like on the wire. I'm not sure this is actually
> >> > >> >>> > > cleaner in any of the language APIs, though.
> >> > >> >>> >
> >> > >> >>> > Leaving the deserialization of the app metadata to the particular
> >> > >> >>> > Flight implementation seems on first principles like the most
> >> > >> >>> > flexible thing. If Any is used, does that mean the metadata _must_
> >> > >> >>> > be a protobuf?
> >> > >> >>> >
> >> > >> >>>
> >> > >> >>> If Any is used, we could still expose a bytes-based API, but it
> >> > >> >>> would have some more wrapping. (We could put a ByteString in Any.)
> >> > >> >>> Then the question would just be how to expose this (easier in
> >> > >> >>> Java, harder in C++).
> >> > >> >>>
> >> > >> >>> > > David
> >> > >> >>> > >
> >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> >> > >> >>> > > >
> >> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf
> >> > >> >>> > > > terms? Knowledge of protobuf shouldn't be required to use
> >> > >> >>> > > > Flight.
> >> > >> >>> > > >
> >> > >> >>> > > > Regards
> >> > >> >>> > > >
> >> > >> >>> > > > Antoine.
> >> > >> >>> > > >
> >> > >> >>> > > >
> >> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
> >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just
> >> > >> >>> > > >> be application-level logic. (The official guide doesn't even
> >> > >> >>> > > >> mention it in the encoding docs; I found
> >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >> > >> >>> > > >> as well.)
> >> > >> >>> > > >>
> >> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially
> >> > >> >>> > > >> inlining the definition of Any, e.g.
> >> > >> >>> > > >>
> >> > >> >>> > > >> message FlightMessage {
> >> > >> >>> > > >>   oneof message {
> >> > >> >>> > > >>     FlightData data = 1;
> >> > >> >>> > > >>     FlightAny metadata = 2;
> >> > >> >>> > > >>   }
> >> > >> >>> > > >> }
> >> > >> >>> > > >>
> >> > >> >>> > > >> message FlightAny {
> >> > >> >>> > > >>   string type = 1;
> >> > >> >>> > > >>   bytes data = 2;
> >> > >> >>> > > >> }
> >> > >> >>> > > >>
> >> > >> >>> > > >> Is this correct?
> >> > >> >>> > > >>
> >> > >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
> >> > >> >>> > > >> as well, but at that point, I'd rather we be consistent with all
> >> > >> >>> > > >> of them, rather than have one of the three methods do its own
> >> > >> >>> > > >> thing.
> >> > >> >>> > > >>
> >> > >> >>> > > >> Thanks,
> >> > >> >>> > > >> David
> >> > >> >>> > > >>
> >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > >> >>> > > >>> I think we could probably expose the oneof behavior without
> >> > >> >>> > > >>> exposing the protobuf functions. On the Any... hmm. I guess
> >> > >> >>> > > >>> we could expose it as two fields: type and data. Then users
> >> > >> >>> > > >>> could use it for whatever, but if people wanted to treat it
> >> > >> >>> > > >>> as Any, it would work. (Basically a user could use Any with
> >> > >> >>> > > >>> it easily, but they could also use any other mechanism.) At
> >> > >> >>> > > >>> least in Java, the Any concepts are pretty simple/DIY. Are
> >> > >> >>> > > >>> other language bindings less DIY?
> >> > >> >>> > > >>>
> >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata,
> >> > >> >>> > > >>> but it just seemed a bit janky.
> >> > >> >>> > > >>>
> >> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I
> >> > >> >>> > > >>> wonder if we should redefine DoPut and DoGet to have the same
> >> > >> >>> > > >>> property, if we think it is a good idea...
> >> > >> >>> > > >>>
> >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
> >> > >> li.davidm96@gmail.com>
> >> > >> >>> > wrote:
> >> > >> >>> > > >>>
> >> > >> >>> > > >>>> I was definitely considering having control messages
> >> > without
> >> > >> >>> > > >>>> data,
> >> > >> >>> > and
> >> > >> >>> > > >>>> I thought that could be encoded by a FlightData with
> >> > >> >>> > > >>>> only
> >> > >> >>> > app_metadata
> >> > >> >>> > > >>>> set. I think I understand your position now: FlightData
> >> > >> >>> > > >>>> should
> >> > >> >>> > always
> >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> That makes sense to me, and is consistent with the
> >> > >> >>> > > >>>> documentation
> >> > >> >>> > > >>>> on
> >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried about
> >> > >> >>> > > >>>> having
> >> > >> >>> > > >>>> a
> >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from
> >> > >> >>> > > >>>> happening,
> >> > >> >>> > and
> >> > >> >>> > > >>>> overall having a clear separation between data and
> >> > >> >>> > > >>>> control
> >> > >> >>> > > >>>> messages
> >> > >> >>> > is
> >> > >> >>> > > >>>> cleaner.
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained
> >> > >> >>> > > >>>> from
> >> > >> >>> > > >>>> exposing
> >> > >> >>> > > >>>> Protobuf by using bytes, would we want to change that
> >> > >> >>> > > >>>> now?
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> Best,
> >> > >> >>> > > >>>> David
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > >> >>> > > >>>>> Hey David,
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use
> >> > >> >>> > > >>>>> for
> >> > >> >>> > > >>>>> doget/doput
> >> > >> >>> > > >>>>> for
> >> > >> >>> > > >>>>> async. Yes, more thinking java given java grpc's async
> >> > >> >>> > > >>>>> always
> >> > >> >>> > pattern.
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>> On the comment around the FlightData, I think it is
> >> > >> >>> > > >>>>> overloading
> >> > >> >>> > > >>>>> the
> >> > >> >>> > > >>>> message
> >> > >> >>> > > >>>>> to use metadata for this. If I want to send a control
> >> > >> >>> > > >>>>> message
> >> > >> >>> > > >>>> independently
> >> > >> >>> > > >>>>> of the data message, I would have to define something
> >> > >> >>> > > >>>>> like
> >> > >> >>> > > >>>>> an
> >> > >> >>> > > >>>>> empty
> >> > >> >>> > > >>>> flight
> >> > >> >>> > > >>>>> data message that has custom metadata. Why not support
> >> > >> >>> > > >>>>> a
> >> > >> >>> > > >>>>> container
> >> > >> >>> > > >>>>> object
> >> > >> >>> > > >>>>> with a oneof{FlightData, Any} in it instead so users
> >> > >> >>> > > >>>>> can
> >> > >> >>> > > >>>>> add
> >> > >> >>> > > >>>>> more
> >> > >> >>> > data
> >> > >> >>> > > >>>>> as
> >> > >> >>> > > >>>>> desired. The default impl could be a noop for the Any
> >> > >> >>> > > >>>>> messages.
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
> >> > >> >>> > > >>>>> <li...@gmail.com>
> >> > >> >>> > > >>>>> wrote:
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>>> Hi Jacques,
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> Thanks for the comments.
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result
> >> > >> >>> > > >>>>>> of
> >> > >> prior
> >> > >> >>> > > >>>>>> proposals, so I don't think we need a new message to
> >> > carry
> >> > >> >>> > > >>>>>> that
> >> > >> >>> > kind
> >> > >> >>> > > >>>>>> of information.
> >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle
> >> > >> >>> > > >>>>>> incoming
> >> > >> >>> > > >>>>>> messages as the fundamental API; it would actually be
> >> > >> >>> > > >>>>>> quite
> >> > >> >>> > natural
> >> > >> >>> > > >>>>>> to
> >> > >> >>> > > >>>>>> implement in Flight/Java. I will note that it's not
> >> > >> >>> > > >>>>>> possible
> >> > >> >>> > > >>>>>> in
> >> > >> >>> > > >>>>>> C++/Python without spawning a thread, though. (In
> >> > essence,
> >> > >> >>> > gRPC-Java
> >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There
> >> > >> >>> > > >>>>>> are
> >> > >> >>> > experimental
> >> > >> >>> > > >>>>>> C++ APIs that would let us do something similar to
> >> > >> >>> > > >>>>>> Java,
> >> > >> >>> > > >>>>>> but
> >> > >> >>> > > >>>>>> those
> >> > >> >>> > > >>>>>> are
> >> > >> >>> > > >>>>>> only in relatively recent gRPC versions and are still
> >> > >> >>> > > >>>>>> under
> >> > >> >>> > > >>>>>> development (contrary to the interceptor APIs which
> >> > >> >>> > > >>>>>> have
> >> > >> been
> >> > >> >>> > around
> >> > >> >>> > > >>>>>> for quite a while).
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> Thanks,
> >> > >> >>> > > >>>>>> David
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org>
> >> > >> >>> > > >>>>>> wrote:
> >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might
> >> > >> >>> > > >>>>>>> worth
> >> > >> >>> > > >>>>>>> discussion
> >> > >> >>> > > >>>>>>> here
> >> > >> >>> > > >>>>>>> depending on your thoughts.
> >> > >> >>> > > >>>>>>>
> >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
> >> > >> >>> > > >>>>>>> <li...@gmail.com>
> >> > >> >>> > > >>>> wrote:
> >> > >> >>> > > >>>>>>>
> >> > >> >>> > > >>>>>>>> Hey Ryan,
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Thanks for the comments.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a
> >> > >> >>> > > >>>>>>>> Python
> >> > >> >>> > strawman.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
> >> > >> >>> > > >>>>>>>> interleave
> >> > >> >>> > > >>>> uploads
> >> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right now,
> >> > >> >>> > > >>>>>>>> synchronous
> >> > >> >>> > APIs
> >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client and
> >> > >> >>> > > >>>>>>>> server
> >> > >> >>> > > >>>>>>>> wait
> >> > >> >>> > > >>>>>>>> for
> >> > >> >>> > each
> >> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC
> >> > >> >>> > > >>>>>>>> doesn't
> >> > >> >>> > > >>>>>>>> give
> >> > >> >>> > > >>>>>>>> us
> >> > >> >>> > > >>>>>>>> the
> >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an overall
> >> > >> >>> > > >>>>>>>> timeout.)
> >> > >> >>> > > >>>>>>>> As
> >> > >> >>> > an
> >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see
> >> > >> >>> > > >>>>>>>> ARROW-6063:
> >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> This is mostly tangential though, eventually we
> >> > >> >>> > > >>>>>>>> will
> >> > >> >>> > > >>>>>>>> want
> >> > >> >>> > > >>>>>>>> to
> >> > >> >>> > design
> >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A
> >> > bidirectional
> >> > >> >>> > > >>>>>>>> stream
> >> > >> >>> > > >>>>>>>> like
> >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls
> >> > >> >>> > > >>>>>>>> easier
> >> > >> >>> > > >>>>>>>> to
> >> > >> >>> > > >>>>>>>> run
> >> > >> >>> > into.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the
> >> > >> >>> > > >>>>>>>> proposal,
> >> > but
> >> > >> >>> > > >>>>>>>> the
> >> > >> >>> > main
> >> > >> >>> > > >>>>>>>> concern is that depending on how you deploy, two
> >> > >> >>> > > >>>>>>>> separate
> >> > >> >>> > > >>>>>>>> calls
> >> > >> >>> > > >>>>>>>> could
> >> > >> >>> > > >>>>>>>> get routed to different instances. Additionally,
> >> > >> >>> > > >>>>>>>> gRPC
> >> > >> >>> > > >>>>>>>> has
> >> > >> >>> > > >>>>>>>> some
> >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in
> >> > >> >>> > > >>>>>>>> between
> >> > >> >>> > > >>>>>>>> the
> >> > >> >>> > two
> >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is another
> >> > instance
> >> > >> >>> > available,
> >> > >> >>> > > >>>>>>>> the client will happily reconnect to the new server
> >> > >> without
> >> > >> >>> > > >>>>>>>> warning.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Thanks,
> >> > >> >>> > > >>>>>>>> David
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> >> > >> >>> > > >>>>>>>>> Hey David,
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
> >> > >> >>> > > >>>>>>>>> possibility of remote compute via arrow buffers. One thing that
> >> > >> >>> > > >>>>>>>>> would help me would be a concrete example of the API in a real
> >> > >> >>> > > >>>>>>>>> life use case. Also, what would the client experience be in terms
> >> > >> >>> > > >>>>>>>>> of sync vs async? Would the client block till the bidirectional
> >> > >> >>> > > >>>>>>>>> call returns, i.e. c = flight.vector_mult(a, b), or would the
> >> > >> >>> > > >>>>>>>>> client wait to be signaled that computation was done? If the
> >> > >> >>> > > >>>>>>>>> latter, how is that different from a DoPut then DoGet? I suppose
> >> > >> >>> > > >>>>>>>>> that this could be implemented without extending the RPC
> >> > >> >>> > > >>>>>>>>> interface but rather by a function/util?
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> Best,
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> Ryan
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> Hi all,
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
> >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to both send
> >> > >> >>> > > >>>>>>>>>> and retrieve Arrow data within a single RPC call. To that end,
> >> > >> >>> > > >>>>>>>>>> I've written up a proposal for a new RPC method:
> >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document.
> >> > >> >>> > > >>>>>>>>>> I'd appreciate any feedback; I think this is a relatively
> >> > >> >>> > > >>>>>>>>>> straightforward addition - it is essentially "DoPutThenGet".
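A rough Python sketch of the "DoPutThenGet" shape: a server-side handler consumes the client's uploaded stream and produces a response stream within the same call. Lists of floats stand in for Arrow record batches, and `doubling_handler` and `exchange` are hypothetical names rather than the Flight API. Because the handler is a generator, it can emit results as inputs arrive instead of waiting for the whole upload to finish.

```python
from typing import Callable, Iterable, Iterator, List

Batch = List[float]  # stand-in for an Arrow record batch


def doubling_handler(incoming: Iterable[Batch]) -> Iterator[Batch]:
    """Hypothetical handler: transform each uploaded batch and send it back."""
    for batch in incoming:
        yield [2 * x for x in batch]


def exchange(handler: Callable[[Iterable[Batch]], Iterator[Batch]],
             uploads: Iterable[Batch]) -> List[Batch]:
    # One call: the client's uploads feed the handler, and the handler's
    # output is streamed back to the same client.
    return list(handler(iter(uploads)))


print(exchange(doubling_handler, [[1.0, 2.0], [3.0]]))  # [[2.0, 4.0], [6.0]]
```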
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided
> >> > >> >>> > > >>>>>>>>>> to table the other format change I had proposed (on DoPut), as
> >> > >> >>> > > >>>>>>>>>> it doesn't functionally change Flight, just the interpretation
> >> > >> >>> > > >>>>>>>>>> of the semantics.
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> Thanks,
> >> > >> >>> > > >>>>>>>>>> David
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> --
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/>
> >> > >> >>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join our
> >> > >> >>> > > >>>>>>>>> community site <https://community.dremio.com/> & Download Dremio
> >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/download>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
Looks like there is consensus about this. I'll start a vote about the
format change soon if no further comments.

On Mon, Mar 23, 2020 at 7:41 AM David Li <li...@gmail.com> wrote:
>
> Hey Wes,
>
> Thanks for the review. I've broken out the format change into this PR:
> https://github.com/apache/arrow/pull/6686
>
> Best,
> David
>
> On 3/22/20, Wes McKinney <we...@gmail.com> wrote:
> > hi David,
> >
> > I did a preliminary view and things look to be on the right track
> > there. What do you think about breaking out the protocol changes (and
> > adding appropriate comments) so we can have a vote on that in
> > relatively short order?
> >
> > - Wes
> >
> > On Wed, Mar 18, 2020 at 9:06 AM David Li <li...@gmail.com> wrote:
> >>
> >> Following up here, I've submitted a draft implementation for C++:
> >> https://github.com/apache/arrow/pull/6656
> >>
> >> The core functionality is there, but there are still holes that I need
> >> to implement. Compared to the draft spec, the client also sends a
> >> FlightDescriptor to begin with, though it's currently not exposed.
> >> This provides consistency with DoGet/DoPut which also send a message
> >> to begin with to describe the stream to the server.
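The message ordering described above can be sketched as follows, under simplifying assumptions: each message is a Python dict, the first one carries a descriptor (standing in for the FlightDescriptor), and later ones carry data. The keys `descriptor` and `data` and the helper `split_exchange_stream` are hypothetical; real Flight messages are Protobuf FlightData, and this sketch ignores anything else the first message might carry.

```python
from typing import Any, Dict, Iterable, List, Tuple

Message = Dict[str, Any]


def split_exchange_stream(messages: Iterable[Message]) -> Tuple[Any, List[Any]]:
    """Return (descriptor, payloads); the descriptor must come first, and only once."""
    it = iter(messages)
    first = next(it, None)
    if first is None or "descriptor" not in first:
        raise ValueError("stream must begin with a descriptor message")
    payloads: List[Any] = []
    for msg in it:
        if "descriptor" in msg:
            raise ValueError("descriptor may only appear in the first message")
        payloads.append(msg["data"])
    return first["descriptor"], payloads


stream = [{"descriptor": "example-command"}, {"data": [1, 2]}, {"data": [3]}]
print(split_exchange_stream(stream))  # ('example-command', [[1, 2], [3]])
```

Putting the descriptor first lets the server decide how to handle the call before any data arrives, which is the same role it plays in DoGet/DoPut.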
> >>
> >> Andy, I hope this helps clarify whether it meets your needs.
> >>
> >> Best,
> >> David
> >>
> >> On 2/25/20, David Li <li...@gmail.com> wrote:
> >> > Hey Andy,
> >> >
> >> > I've been rather busy unfortunately. I had started on an
> >> > implementation in C++ to provide as part of this discussion, but it's
> >> > not complete. I'm hoping to have more done in March.
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On 2/25/20, Andy Grove <an...@gmail.com> wrote:
> >> >> I was wondering if there had been any momentum on this (the
> >> >> BiDirectional RPC design)?
> >> >>
> >> >> I'm interested in this for the use case of Apache Spark sending a
> >> >> stream of data to another process to invoke custom code and then
> >> >> receive a stream back with the transformed data.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Andy.
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org> wrote:
> >> >>
> >> >>> I support moving forward with the current proposal.
> >> >>>
> >> >>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:
> >> >>>
> >> >>> > Just following up here again, any other thoughts?
> >> >>> >
> >> >>> > I think we do have justifications for potentially separate
> >> >>> > streams in a call, but that's more of an orthogonal question - it
> >> >>> > doesn't need to be addressed here. I do agree that it very much
> >> >>> > complicates things.
> >> >>> >
> >> >>> > Thanks,
> >> >>> > David
> >> >>> >
> >> >>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> >> >>> > > I would generally agree with this. Note that you have the
> >> >>> > > possibility to use unions-of-structs to send record batches with
> >> >>> > > different schemas in the same stream, though with some added
> >> >>> > > complexity on each side.
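To show what a union-of-structs stream involves, here is a Python sketch that builds the buffers of an Arrow dense union by hand: `type_ids` records which child (that is, which schema) each slot belongs to, `offsets` records where the value sits inside that child, and each child holds only its own records. The record kinds `trade` and `quote` and the helper names are invented for illustration; a real implementation would build these arrays with an Arrow library rather than Python lists.

```python
from typing import Dict, List, Tuple


def build_dense_union(records: List[Tuple[str, dict]], type_codes: Dict[str, int]):
    """Encode (kind, record) pairs as dense-union style buffers.

    Returns (type_ids, offsets, children), where children maps each type
    code to the records of that kind, in arrival order.
    """
    type_ids: List[int] = []
    offsets: List[int] = []
    children: Dict[int, List[dict]] = {code: [] for code in type_codes.values()}
    for kind, record in records:
        code = type_codes[kind]
        type_ids.append(code)
        offsets.append(len(children[code]))  # position within that child
        children[code].append(record)
    return type_ids, offsets, children


def read_slot(i: int, type_ids, offsets, children) -> dict:
    # Decode slot i by following its type id, then its offset into the child.
    return children[type_ids[i]][offsets[i]]


rows = [
    ("trade", {"price": 10.5, "size": 100}),
    ("quote", {"bid": 10.4, "ask": 10.6}),
    ("trade", {"price": 10.6, "size": 50}),
]
type_ids, offsets, children = build_dense_union(rows, {"trade": 0, "quote": 1})
print(type_ids)  # [0, 1, 0]
print(offsets)   # [0, 0, 1]
print(read_slot(2, type_ids, offsets, children))  # {'price': 10.6, 'size': 50}
```

The added complexity Wes mentions is visible here: both sides must agree on the type codes and dispatch on them for every row.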
> >> >>> > >
> >> >>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > >>
> >> >>> > >> I'd vote for explicitly not supported. We should keep our
> >> >>> > >> primitives narrow.
> >> >>> > >>
> >> >>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
> >> >>> > >>
> >> >>> > >> > Thanks for the feedback.
> >> >>> > >> >
> >> >>> > >> > I do think if we had explicitly embraced gRPC from the beginning,
> >> >>> > >> > there are a lot of places where things could be made more
> >> >>> > >> > ergonomic, including with the metadata fields. But it would also
> >> >>> > >> > have locked us out of potential future transports.
> >> >>> > >> >
> >> >>> > >> > On another note: I hesitate to put too much into this method, but
> >> >>> > >> > we are looking at use cases where potentially, a client may want
> >> >>> > >> > to upload multiple distinct datasets (with differing schemas).
> >> >>> > >> > (This is a little tentative, and I can get more details...) Right
> >> >>> > >> > now, each logical stream in Flight must have a single, consistent
> >> >>> > >> > schema; would it make sense to look at ways to relax this, or
> >> >>> > >> > declare this explicitly out of scope (and require multiple calls
> >> >>> > >> > and coordination with the deployment topology) in order to
> >> >>> > >> > accomplish this?
> >> >>> > >> >
> >> >>> > >> > Best,
> >> >>> > >> > David
> >> >>> > >> >
> >> >>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > >> > > Fair enough. I'm okay with the bytes approach and the proposal
> >> >>> > >> > > looks good to me.
> >> >>> > >> > >
> >> >>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
> >> >>> > >> > >
> >> >>> > >> > >> I've updated the proposal.
> >> >>> > >> > >>
> >> >>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> >> >>> > >> > >> errors/metadata, I still think using bytes is preferable:
> >> >>> > >> > >> - It doesn't require (conditionally) exposing or wrapping
> >> >>> > >> > >>   Protobuf types,
> >> >>> > >> > >> - We wouldn't be able to practically expose the Protobuf field to
> >> >>> > >> > >>   C++ users without causing build pains,
> >> >>> > >> > >> - We can't let Python users take advantage of the Protobuf field
> >> >>> > >> > >>   without somehow being compatible with the Protobuf wheels (by
> >> >>> > >> > >>   linking to the same version, and doing magic to turn the C++
> >> >>> > >> > >>   Protobufs into the Python ones),
> >> >>> > >> > >> - All our other application-defined fields are already bytes.
> >> >>> > >> > >>
> >> >>> > >> > >> Applications that want structure can encode JSON or Protobuf Any
> >> >>> > >> > >> into the bytes field themselves, much as you can already do for
> >> >>> > >> > >> Ticket, commands in FlightDescriptors, and application metadata
> >> >>> > >> > >> in DoGet/DoPut. I don't think this is (much) less efficient than
> >> >>> > >> > >> using Any directly, since Any itself is a bytes field with a tag,
> >> >>> > >> > >> and must invoke the Protobuf deserializer again to read the
> >> >>> > >> > >> actual message.
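David's point that Any is itself a bytes field with a tag can be checked by hand-encoding the Protobuf wire format. The sketch below writes an application payload into a length-delimited field directly, and then wrapped in `google.protobuf.Any` (whose body is `type_url` as field 1 and `value` as field 2). The outer field number, the type URL and the JSON payload are assumptions chosen for the example.

```python
def encode_varint(value: int) -> bytes:
    """Protobuf base-128 varint encoding of a non-negative integer."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field (wire type 2): key, length, payload."""
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


def encode_any(type_url: str, value: bytes) -> bytes:
    """Body of google.protobuf.Any: type_url is field 1, value is field 2."""
    return length_delimited(1, type_url.encode("utf-8")) + length_delimited(2, value)


app_payload = b'{"op": "checkpoint"}'  # e.g. JSON chosen by the application
FIELD = 1  # hypothetical field number of the metadata field in the outer message

as_bytes = length_delimited(FIELD, app_payload)
as_any = length_delimited(FIELD, encode_any("type.googleapis.com/example.Control", app_payload))
print(len(app_payload), len(as_bytes), len(as_any))
```

Here the direct encoding is 22 bytes and the Any-wrapped one is 61; nearly all of the difference is the type URL. Either way the payload bytes are the same, so a reader of the Any form still has to decode the inner value in a second step.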
> >> >>> > >> > >>
> >> >>> > >> > >> If we decide on using bytes, then I don't think it makes sense to
> >> >>> > >> > >> define a new message with a oneof either, since it would be
> >> >>> > >> > >> redundant.
> >> >>> > >> > >>
> >> >>> > >> > >> Thanks,
> >> >>> > >> > >> David
> >> >>> > >> > >>
> >> >>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> >> >>> > >> > >> > I've been extremely backlogged, I will update the proposal when I
> >> >>> > >> > >> > get a chance and reply here when done.
> >> >>> > >> > >> >
> >> >>> > >> > >> > Best,
> >> >>> > >> > >> > David
> >> >>> > >> > >> >
> >> >>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >> >>> > >> > >> >> Bumping this discussion since a couple of weeks have passed. It
> >> >>> > >> > >> >> seems there are still some questions here, could we summarize
> >> >>> > >> > >> >> what are the alternatives along with any public API implications
> >> >>> > >> > >> >> so we can try to render a decision?
> >> >>> > >> > >> >>
> >> >>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.davidm96@gmail.com> wrote:
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> Hi Wes,
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> Responses inline:
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmckinn@gmail.com> wrote:
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
> >> >>> > >> > >> >>> > >
> >> >>> > >> > >> >>> > > The question is whether to repurpose the existing FlightData
> >> >>> > >> > >> >>> > > structure, and allow for the metadata field to be filled in and
> >> >>> > >> > >> >>> > > data fields to be blank (as a control message), or to wrap the
> >> >>> > >> > >> >>> > > FlightData structure in another structure that explicitly
> >> >>> > >> > >> >>> > > distinguishes between control and data messages.
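The two options can be contrasted with a short Python sketch. The field names `data_header`, `data_body` and `app_metadata` follow Flight's FlightData message, but the dataclass is a simplified stand-in, and `ControlMessage`, `classify_reused` and `classify_wrapped` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FlightData:
    """Simplified stand-in for FlightData: header, body and app metadata only."""
    data_header: bytes = b""
    data_body: bytes = b""
    app_metadata: bytes = b""


def classify_reused(msg: FlightData) -> str:
    # Option 1: reuse FlightData; by convention, no data header means the
    # message is a control message carried entirely in app_metadata.
    if msg.data_header:
        return "data"
    return "control" if msg.app_metadata else "empty"


@dataclass
class ControlMessage:
    payload: bytes


# Option 2: an explicit wrapper; in Protobuf this would be a oneof whose
# members are FlightData and a dedicated control message.
FlightMessage = Union[FlightData, ControlMessage]


def classify_wrapped(msg: FlightMessage) -> str:
    return "control" if isinstance(msg, ControlMessage) else "data"


print(classify_reused(FlightData(app_metadata=b"flush")))  # control
print(classify_wrapped(ControlMessage(payload=b"flush")))  # control
```

With the first option every implementation must agree on the convention (and handle the leftover "empty" case); with the second, the kind of message is explicit in the type.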
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>> > I'm not super against having metadata-only FlightData with empty
> >> >>> > >> > >> >>> > body. One question to consider is what changes (if any) would
> >> >>> > >> > >> >>> > need to be made to public APIs in either scenario.
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
> >> >>> > >> > >> >>> messages in the future. This would be a breaking change, but
> >> >>> > >> > >> >>> wouldn't change the wire format. I think the APIs could be
> >> >>> > >> > >> >>> changed backwards compatibly, though.
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> > > The other question is how to handle the metadata fields. So far,
> >> >>> > >> > >> >>> > > we've used bytestring fields for application-defined data. This
> >> >>> > >> > >> >>> > > is workable if you want to use Protobuf to define the contents of
> >> >>> > >> > >> >>> > > those fields, but requires you to pack/unpack your Protobuf
> >> >>> > >> > >> >>> > > into/from the bytestring field. If we instead used the Protobuf
> >> >>> > >> > >> >>> > > Any field, a dynamically typed field, this would be more
> >> >>> > >> > >> >>> > > convenient, but then we'd be exposing Protobuf types. We could
> >> >>> > >> > >> >>> > > alternatively use a combination of a type field and a bytestring
> >> >>> > >> > >> >>> > > field, mimicking what the Protobuf Any type looks like on the
> >> >>> > >> > >> >>> > > wire. I'm not sure this is actually cleaner in any of the
> >> >>> > >> > >> >>> > > language APIs, though.
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>> > Leaving the deserialization of the app metadata to the particular
> >> >>> > >> > >> >>> > Flight implementation seems on first principles like the most
> >> >>> > >> > >> >>> > flexible thing, if Any is used, does that mean the metadata _must_
> >> >>> > >> > >> >>> > be a protobuf?
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> If Any is used, we could still expose a bytes-based API, but it
> >> >>> > >> > >> >>> would have some more wrapping. (We could put a ByteString in
> >> >>> > >> > >> >>> Any.) Then the question would just be how to expose this (would
> >> >>> > >> > >> >>> be easier in Java, harder in C++).
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> > > David
> >> >>> > >> > >> >>> > >
> >> >>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf
> >> >>> > >> > >> >>> > > > terms? Knowledge of protobuf shouldn't be required to use Flight.
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > > Regards
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > > Antoine.
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
> >> >>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
> >> >>> > >> > >> >>> > > >> application-level logic. (The official guide doesn't even mention
> >> >>> > >> > >> >>> > > >> it in the encoding docs; I found
> >> >>> > >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >> >>> > >> > >> >>> > > >> as well.)
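The claim that oneof is application-level can be illustrated with a hand-written Protobuf encoder and decoder in Python. The encoder never consults any oneof declaration: a field's bytes depend only on its number and payload. Tracking which member is set happens while decoding, where Protobuf's rule is that the last member seen on the wire wins. The field numbers 1 and 2 mirror the FlightMessage example in this thread; the helpers are illustrative and only handle length-delimited fields.

```python
from typing import Dict, Optional, Tuple


def encode_varint(value: int) -> bytes:
    """Protobuf base-128 varint for a non-negative integer."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(field_number: int, payload: bytes) -> bytes:
    # Wire type 2 (length-delimited). Nothing here knows about oneof:
    # the bytes depend only on the field number and the payload.
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def which_oneof(buf: bytes, members: Dict[int, str]) -> Optional[Tuple[str, bytes]]:
    """Application-level oneof tracking: the last member seen on the wire wins."""
    pos = 0
    chosen = None
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = decode_varint(buf, pos)
        value = buf[pos:pos + length]
        pos += length
        if field_number in members:
            chosen = (members[field_number], value)
    return chosen


MEMBERS = {1: "data", 2: "metadata"}
print(which_oneof(encode_field(2, b"ctl"), MEMBERS))  # ('metadata', b'ctl')
```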
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially
> >> >>> > >> > >> >>> > > >> inlining the definition of Any, e.g.
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> message FlightMessage {
> >> >>> > >> > >> >>> > > >>   oneof message {
> >> >>> > >> > >> >>> > > >>     FlightData data = 1;
> >> >>> > >> > >> >>> > > >>     FlightAny metadata = 2;
> >> >>> > >> > >> >>> > > >>   }
> >> >>> > >> > >> >>> > > >> }
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> message FlightAny {
> >> >>> > >> > >> >>> > > >>   string type = 1;
> >> >>> > >> > >> >>> > > >>   bytes data = 2;
> >> >>> > >> > >> >>> > > >> }
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> Is this correct?
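On the application side, a type-plus-bytes message like FlightAny could be consumed with a small registry keyed on the type string. This is a Python sketch under assumptions: `FlightAny` mirrors the proposed message as a dataclass, `MetadataDispatcher` and the type strings are made up, and unknown types are ignored, similar to the no-op default Jacques suggests for Any messages. A Protobuf user could register one decoder per type URL, while others could use JSON or any other encoding.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class FlightAny:
    """Mirror of the proposed message: a type tag plus opaque bytes."""
    type: str
    data: bytes


Decoder = Callable[[bytes], Any]


class MetadataDispatcher:
    """Hypothetical application-side registry keyed on FlightAny.type."""

    def __init__(self) -> None:
        self._decoders: Dict[str, Decoder] = {}

    def register(self, type_name: str, decoder: Decoder) -> None:
        self._decoders[type_name] = decoder

    def handle(self, msg: FlightAny) -> Optional[Any]:
        decoder = self._decoders.get(msg.type)
        if decoder is None:
            return None  # unknown types are ignored (a no-op default)
        return decoder(msg.data)


dispatcher = MetadataDispatcher()
dispatcher.register("example/json", lambda raw: json.loads(raw.decode("utf-8")))
print(dispatcher.handle(FlightAny("example/json", b'{"rows": 3}')))  # {'rows': 3}
print(dispatcher.handle(FlightAny("example/unknown", b"\x00")))      # None
```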
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
> >> >>> > >> > >> >>> > > >> as well, but at that point, I'd rather we be consistent with all
> >> >>> > >> > >> >>> > > >> of them, rather than have one of the three methods do its own
> >> >>> > >> > >> >>> > > >> thing.
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> Thanks,
> >> >>> > >> > >> >>> > > >> David
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > >> > >> >>> > > >>> I think we could probably expose the oneof behavior without
> >> >>> > >> > >> >>> > > >>> exposing the protobuf functions. On the any... hmm. I guess we
> >> >>> > >> > >> >>> > > >>> could expose as two fields: type and data. Then users could use
> >> >>> > >> > >> >>> > > >>> it for whatever but if people wanted to treat it as any, it would
> >> >>> > >> > >> >>> > > >>> work. (Basically a user could use any with it easily but they
> >> >>> > >> > >> >>> > > >>> could also use any other mechanism). At least in java, the any
> >> >>> > >> > >> >>> > > >>> concepts are pretty simple/diy. Are other language bindings less
> >> >>> > >> > >> >>> > > >>> diy?
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but it
> >> >>> > >> > >> >>> > > >>> just seemed a bit janky.
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I wonder
> >> >>> > >> > >> >>> > > >>> if we should redefine DoPut and DoGet to have the same property if
> >> >>> > >> > >> >>> > > >>> we think it is a good idea...
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com> wrote:
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >>>> I was definitely considering having control messages without
> >> >>> > >> > >> >>> > > >>>> data, and I thought that could be encoded by a FlightData with
> >> >>> > >> > >> >>> > > >>>> only app_metadata set. I think I understand your position now:
> >> >>> > >> > >> >>> > > >>>> FlightData should always carry (some) data (with optional
> >> >>> > >> > >> >>> > > >>>> metadata)?
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with the documentation
> >> >>> > >> > >> >>> > > >>>> on FlightData in the Protobuf file. I was worried about having a
> >> >>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from happening,
> >> >>> > >> > >> >>> > > >>>> and overall having a clear separation between data and control
> >> >>> > >> > >> >>> > > >>>> messages is cleaner.
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
> >> >>> > >> > >> >>> > > >>>> exposing Protobuf by using bytes, would we want to change that
> >> >>> > >> > >> >>> > > >>>> now?
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>> Best,
> >> >>> > >> > >> >>> > > >>>> David
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > >> > >> >>> > > >>>>> Hey David,
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
> >> >>> > >> > >> >>> > > >>>>> doget/doput for async. Yes, more thinking java given java grpc's
> >> >>> > >> > >> >>> > > >>>>> async always pattern.
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think it is overloading
> >> >>> > >> > >> >>> > > >>>>> the message to use metadata for this. If I want to send a control
> >> >>> > >> > >> >>> > > >>>>> message independently of the data message, I would have to define
> >> >>> > >> > >> >>> > > >>>>> something like an empty flight data message that has custom
> >> >>> > >> > >> >>> > > >>>>> metadata. Why not support a container object with a
> >> >>> > >> > >> >>> > > >>>>> oneof{FlightData, Any} in it instead so users can add more data
> >> >>> > >> > >> >>> > > >>>>> as desired. The default impl could be a noop for the Any
> >> >>> > >> > >> >>> > > >>>>> messages.
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>>> Hi Jacques,
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>> Thanks for the comments.
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >> >>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
> >> >>> > >> > >> >>> > > >>>>>>   proposals, so I don't think we need a new message to carry that
> >> >>> > >> > >> >>> > > >>>>>>   kind of information.
> >> >>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
> >> >>> > >> > >> >>> > > >>>>>>   messages as the fundamental API; it would actually be quite
> >> >>> > >> > >> >>> > > >>>>>>   natural to implement in Flight/Java. I will note that it's not
> >> >>> > >> > >> >>> > > >>>>>>   possible in C++/Python without spawning a thread, though. (In
> >> >>> > >> > >> >>> > > >>>>>>   essence, gRPC-Java is async-always and gRPC-C++ is sync-always.)
> >> >>> > >> > >> >>> > > >>>>>>   There are experimental C++ APIs that would let us do something
> >> >>> > >> > >> >>> > > >>>>>>   similar to Java, but those are only in relatively recent gRPC
> >> >>> > >> > >> >>> > > >>>>>>   versions and are still under development (contrary to the
> >> >>> > >> > >> >>> > > >>>>>>   interceptor APIs which have been around for quite a while).
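Spawning a thread to get a push-style handler from a pull-style (blocking) reader might look like the Python sketch below. The blocking reader is modeled as any Python iterable, and `start_handler_thread` is a hypothetical helper, not a Flight or gRPC API; the cost is one extra thread per stream.

```python
import threading
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def start_handler_thread(reader: Iterable[T],
                         on_message: Callable[[T], None],
                         on_done: Optional[Callable[[], None]] = None) -> threading.Thread:
    """Drain a blocking reader on a background thread, invoking callbacks.

    This mimics a push-style (gRPC-Java-like) handler on top of a
    pull-style blocking reader.
    """
    def pump() -> None:
        for message in reader:  # each iteration may block waiting for the peer
            on_message(message)
        if on_done is not None:
            on_done()

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return thread


received = []
finished = threading.Event()
t = start_handler_thread(iter(["batch-1", "batch-2"]), received.append, finished.set)
t.join()
print(received, finished.is_set())  # ['batch-1', 'batch-2'] True
```

The callbacks run on the background thread, so any state they share with the caller needs the usual synchronization.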
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>> Thanks,
> >> >>> > >> > >> >>> > > >>>>>> David
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
> >> >>> > >> > >> >>> > > >>>>>>> discussing here depending on your thoughts.
> >> >>> > >> > >> >>> > > >>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
> >> >>> > >> > >> >>> > > >>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave uploads
> >> >>> > >> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right now, synchronous APIs
> >> >>> > >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait for each
> >> >>> > >> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us the
> >> >>> > >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) As an
> >> >>> > >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> >> >>> > >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> This is mostly tangential, though; eventually we will want to design
> >> >>> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream like
> >> >>> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run into.
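A minimal sketch of the failure mode described above, using only the Python standard library rather than gRPC or Flight: `FakeExchange`, `buggy_client` and `buggy_server` are hypothetical names. Both sides wait to read before writing, and because the only bound on a read is the deadline for the whole call (mirroring the lack of per-read timeouts), the hang surfaces only when that deadline expires.

```python
import queue
import threading
import time


class FakeExchange:
    """Hypothetical stand-in for one bidirectional call: two one-way channels
    and a single deadline for the whole call, with no per-read timeout."""

    def __init__(self, deadline_s: float) -> None:
        self.to_server: "queue.Queue[str]" = queue.Queue()
        self.to_client: "queue.Queue[str]" = queue.Queue()
        self.deadline = time.monotonic() + deadline_s

    def read(self, channel: "queue.Queue[str]") -> str:
        # The only bound on a blocking read is what is left of the call deadline.
        remaining = self.deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("call deadline exceeded")
        try:
            return channel.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError("call deadline exceeded") from None


def buggy_server(ex: FakeExchange, errors: list) -> None:
    try:
        ex.read(ex.to_server)          # the server waits for a request...
        ex.to_client.put("result")
    except TimeoutError as e:
        errors.append(("server", str(e)))


def buggy_client(ex: FakeExchange, errors: list) -> None:
    try:
        ex.read(ex.to_client)          # ...while the client waits for a response
        ex.to_server.put("request")
    except TimeoutError as e:
        errors.append(("client", str(e)))


def run(deadline_s: float = 0.2):
    ex = FakeExchange(deadline_s)
    errors: list = []
    threads = [threading.Thread(target=f, args=(ex, errors))
               for f in (buggy_server, buggy_client)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Both sides fail together, and only once the whole-call deadline passes.
    return sorted(side for side, _ in errors), time.monotonic() - start


if __name__ == "__main__":
    print(run())
```

With per-read timeouts, each side could notice the stall independently; with only a call-wide deadline, a short deadline risks cutting off legitimately long exchanges, and a long one delays detection of the bug.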
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
> >> >>> > >> > >> >>> > > >>>>>>>> concern is that depending on how you deploy, two separate calls could
> >> >>> > >> > >> >>> > > >>>>>>>> get routed to different instances. Additionally, gRPC has some
> >> >>> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in between the two
> >> >>> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is another instance available,
> >> >>> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the new server without warning.
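The routing concern can be illustrated with a toy simulation (plain Python, not Flight; `RoundRobinBalancer`, `Instance`, `put_then_get` and `exchange` are hypothetical). Assuming a balancer that routes every new call independently, an upload followed by a separate download can land on an instance that never saw the upload, while a single bidirectional call keeps both directions on one instance.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class Instance:
    """Hypothetical server instance; uploaded data lives only in its memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.uploads: Dict[str, List[int]] = {}


class RoundRobinBalancer:
    """Hypothetical balancer that routes each new call independently."""

    def __init__(self, instances: List[Instance]) -> None:
        self._cycle = itertools.cycle(instances)

    def route(self) -> Instance:
        return next(self._cycle)


def put_then_get(lb: RoundRobinBalancer, key: str,
                 batches: List[int]) -> Tuple[str, Optional[List[int]]]:
    # Two separate calls: the upload and the download are routed separately.
    lb.route().uploads[key] = list(batches)
    target = lb.route()
    return target.name, target.uploads.get(key)


def exchange(lb: RoundRobinBalancer, key: str,
             batches: List[int]) -> Tuple[str, Optional[List[int]]]:
    # One bidirectional call: both directions are served by the same instance.
    target = lb.route()
    target.uploads[key] = list(batches)
    return target.name, target.uploads.get(key)


if __name__ == "__main__":
    print(put_then_get(RoundRobinBalancer([Instance("a"), Instance("b")]), "job", [1, 2]))
    # ('b', None): the download hit an instance that never saw the upload
    print(exchange(RoundRobinBalancer([Instance("a"), Instance("b")]), "job", [1, 2]))
    # ('a', [1, 2])
```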
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Thanks,
> >> >>> > >> > >> >>> > > >>>>>>>> David
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> >> >>> > >> > >> >>> > > >>>>>>>>> Hey David,
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
> >> >>> > >> > >> >>> > > >>>>>>>>> possibility of remote compute via Arrow buffers. One thing that would
> >> >>> > >> > >> >>> > > >>>>>>>>> help me would be a concrete example of the API in a real-life use case.
> >> >>> > >> > >> >>> > > >>>>>>>>> Also, what would the client experience be in terms of sync vs. async?
> >> >>> > >> > >> >>> > > >>>>>>>>> Would the client block until the bidirectional call returns, i.e.
> >> >>> > >> > >> >>> > > >>>>>>>>> c = flight.vector_mult(a, b), or would the client wait to be signaled
> >> >>> > >> > >> >>> > > >>>>>>>>> that computation was done? If the latter, how is that different from a
> >> >>> > >> > >> >>> > > >>>>>>>>> DoPut then DoGet? I suppose that this could be implemented without
> >> >>> > >> > >> >>> > > >>>>>>>>> extending the RPC interface, but rather by a function/util?
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> Best,
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> Ryan
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> Hi all,
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
> >> >>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to both send and
> >> >>> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end, I've
> >> >>> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
> >> >>> > >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document. I'd
> >> >>> > >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively straightforward
> >> >>> > >> > >> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
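A rough, in-process sketch of the "DoPutThenGet" usage pattern: the client writes every batch, closes its write side, and then reads the results, all through one call object. This is not the Flight API; `InProcessExchange`, `double_handler` and the list-of-ints "batches" are hypothetical placeholders chosen to keep the example self-contained. Wrapping the sequence in a helper such as `put_then_get` gives a blocking, `c = f(a, b)` style of client call.

```python
from typing import Callable, Iterable, Iterator, List

Batch = List[int]  # hypothetical placeholder for a record batch


def double_handler(uploaded: Iterator[Batch]) -> Iterator[Batch]:
    """Hypothetical server logic: consume the uploaded stream, yield results."""
    for batch in uploaded:
        yield [2 * x for x in batch]


class InProcessExchange:
    """Hypothetical stand-in for a single bidirectional call used in the
    'DoPutThenGet' style: write everything, close the write side, then read."""

    def __init__(self, handler: Callable[[Iterator[Batch]], Iterator[Batch]]) -> None:
        self._handler = handler
        self._sent: List[Batch] = []
        self._done_writing = False

    def write(self, batch: Batch) -> None:
        if self._done_writing:
            raise RuntimeError("write after done_writing()")
        self._sent.append(batch)

    def done_writing(self) -> None:
        self._done_writing = True

    def read_all(self) -> List[Batch]:
        if not self._done_writing:
            raise RuntimeError("call done_writing() before reading in this style")
        return list(self._handler(iter(self._sent)))


def put_then_get(batches: Iterable[Batch]) -> List[Batch]:
    # Blocking helper: returns only once every result has been read.
    ex = InProcessExchange(double_handler)
    for batch in batches:
        ex.write(batch)
    ex.done_writing()
    return ex.read_all()


if __name__ == "__main__":
    print(put_then_get([[1, 2], [3]]))  # [[2, 4], [6]]
```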
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
> >> >>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it doesn't
> >> >>> > >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the interpretation of the semantics.
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> Thanks,
> >> >>> > >> > >> >>> > > >>>>>>>>>> David
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> --
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>
> >> >>> > >> > >> >
> >> >>> > >> > >>
> >> >>> > >> > >
> >> >>> > >> >
> >> >>> > >
> >> >>> >
> >> >>>
> >> >>
> >> >
> >
>


Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Hey Wes,

Thanks for the review. I've broken out the format change into this PR:
https://github.com/apache/arrow/pull/6686

Best,
David

On 3/22/20, Wes McKinney <we...@gmail.com> wrote:
> hi David,
>
> I did a preliminary review and things look to be on the right track
> there. What do you think about breaking out the protocol changes (and
> adding appropriate comments) so we can have a vote on that in
> relatively short order?
>
> - Wes
>
> On Wed, Mar 18, 2020 at 9:06 AM David Li <li...@gmail.com> wrote:
>>
>> Following up here, I've submitted a draft implementation for C++:
>> https://github.com/apache/arrow/pull/6656
>>
>> The core functionality is there, but there are still holes that I need
>> to implement. Compared to the draft spec, the client also sends a
>> FlightDescriptor to begin with, though it's currently not exposed.
>> This provides consistency with DoGet/DoPut which also send a message
>> to begin with to describe the stream to the server.
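The message ordering described here (a descriptor first, then data) could look roughly like the following. The dataclasses are simplified, hypothetical stand-ins, not the generated Protobuf classes or any Flight API; the point is only that the server can read the first message to learn which stream is being described before consuming the rest.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class FlightDescriptor:
    """Hypothetical, simplified stand-in (a command-style descriptor)."""
    command: bytes


@dataclass
class FlightData:
    """Hypothetical, simplified stand-in for one stream message."""
    body: bytes = b""
    descriptor: Optional[FlightDescriptor] = None


def client_messages(command: bytes, bodies: Iterable[bytes]) -> Iterator[FlightData]:
    # First message: describes the stream (and carries no data in this sketch).
    yield FlightData(descriptor=FlightDescriptor(command))
    for body in bodies:
        yield FlightData(body=body)


def server_handle(messages: Iterable[FlightData]) -> Tuple[bytes, List[bytes]]:
    it = iter(messages)
    first = next(it, None)
    if first is None or first.descriptor is None:
        raise ValueError("stream must start with a FlightDescriptor")
    # The server now knows what is being sent before reading any data.
    return first.descriptor.command, [m.body for m in it]


if __name__ == "__main__":
    print(server_handle(client_messages(b"transform:v1", [b"batch-1", b"batch-2"])))
```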
>>
>> Andy, I hope this helps clarify whether it meets your needs.
>>
>> Best,
>> David
>>
>> On 2/25/20, David Li <li...@gmail.com> wrote:
>> > Hey Andy,
>> >
>> > I've been rather busy unfortunately. I had started on an
>> > implementation in C++ to provide as part of this discussion, but it's
>> > not complete. I'm hoping to have more done in March.
>> >
>> > Best,
>> > David
>> >
>> > On 2/25/20, Andy Grove <an...@gmail.com> wrote:
>> >> I was wondering if there had been any momentum on this (the
>> >> BiDirectional RPC design)?
>> >>
>> >> I'm interested in this for the use case of Apache Spark sending a
>> >> stream of data to another process to invoke custom code and then
>> >> receive a stream back with the transformed data.
>> >>
>> >> Thanks,
>> >>
>> >> Andy.
>> >>
>> >>
>> >>
>> >> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org>
>> >> wrote:
>> >>
>> >>> I support moving forward with the current proposal.
>> >>>
>> >>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Just following up here again, any other thoughts?
>> >>> >
>> >>> > I think we do have justifications for potentially separate streams
>> >>> > in a call, but that's more of an orthogonal question - it doesn't
>> >>> > need to be addressed here. I do agree that it very much complicates
>> >>> > things.
>> >>> >
>> >>> > Thanks,
>> >>> > David
>> >>> >
>> >>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
>> >>> > > I would generally agree with this. Note that you have the
>> >>> > > possibility to use unions-of-structs to send record batches with
>> >>> > > different schemas in the same stream, though with some added
>> >>> > > complexity on each side.
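To make the unions-of-structs idea concrete, here is a plain-Python sketch of the dense union layout: one struct child per distinct schema, an int8-style type id per row saying which child holds it, and an int32-style offset into that child. The function names are hypothetical, lists stand in for Arrow buffers, and schemas are identified by field names only for brevity; in pyarrow, the real array would be built with `pyarrow.UnionArray.from_dense`.

```python
from typing import Any, Dict, List, Tuple

Batch = Dict[str, List[Any]]  # column name -> values; key order = schema order


def to_dense_union(batches: List[Batch]) -> Tuple[List[int], List[int], List[Batch]]:
    """Pack batches that may have different schemas into a dense union of
    structs: one struct child per distinct schema (hypothetical helper)."""
    schemas: List[Tuple[str, ...]] = []   # child index -> field names
    children: List[Batch] = []            # child index -> struct columns
    type_ids: List[int] = []              # per row: which child holds it
    offsets: List[int] = []               # per row: position inside that child
    for batch in batches:
        fields = tuple(batch)             # identified by field names only
        if fields not in schemas:
            schemas.append(fields)
            children.append({name: [] for name in fields})
        tid = schemas.index(fields)
        child = children[tid]
        nrows = len(batch[fields[0]]) if fields else 0
        for i in range(nrows):
            type_ids.append(tid)
            offsets.append(len(child[fields[0]]))
            for name in fields:
                child[name].append(batch[name][i])
    return type_ids, offsets, children


def row(type_ids: List[int], offsets: List[int], children: List[Batch], i: int) -> Dict[str, Any]:
    """Read logical row i back out of the union."""
    child = children[type_ids[i]]
    return {name: values[offsets[i]] for name, values in child.items()}


if __name__ == "__main__":
    t, o, c = to_dense_union([{"x": [1, 2]}, {"name": ["a"], "ok": [True]}, {"x": [3]}])
    print(t, o)             # [0, 0, 1, 0] [0, 1, 0, 2]
    print(row(t, o, c, 2))  # {'name': 'a', 'ok': True}
```

The added complexity on each side is visible here: the sender must agree on the child-per-schema mapping, and the receiver must dispatch on type ids to recover each original batch.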
>> >>> > >
>> >>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org>
>> >>> > > wrote:
>> >>> > >>
>> >>> > >> I'd vote for explicitly not supported. We should keep our
>> >>> > >> primitives narrow.
>> >>> > >>
>> >>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com>
>> >>> > >> wrote:
>> >>> > >>
>> >>> > >> > Thanks for the feedback.
>> >>> > >> >
>> >>> > >> > I do think if we had explicitly embraced gRPC from the beginning,
>> >>> > >> > there are a lot of places where things could be made more
>> >>> > >> > ergonomic, including with the metadata fields. But it would also
>> >>> > >> > have locked us out of potential future transports.
>> >>> > >> >
>> >>> > >> > On another note: I hesitate to put too much into this method, but
>> >>> > >> > we are looking at use cases where potentially, a client may want
>> >>> > >> > to upload multiple distinct datasets (with differing schemas).
>> >>> > >> > (This is a little tentative, and I can get more details...) Right
>> >>> > >> > now, each logical stream in Flight must have a single, consistent
>> >>> > >> > schema; would it make sense to look at ways to relax this, or
>> >>> > >> > declare this explicitly out of scope (and require multiple calls
>> >>> > >> > and coordination with the deployment topology) in order to
>> >>> > >> > accomplish this?
>> >>> > >> >
>> >>> > >> > Best,
>> >>> > >> > David
>> >>> > >> >
>> >>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
>> >>> > >> > > Fair enough. I'm okay with the bytes approach and the proposal
>> >>> > >> > > looks good to me.
>> >>> > >> > >
>> >>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com>
>> >>> > >> > > wrote:
>> >>> > >> > >
>> >>> > >> > >> I've updated the proposal.
>> >>> > >> > >>
>> >>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
>> >>> > >> > >> errors/metadata, I still think using bytes is preferable:
>> >>> > >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
>> >>> > >> > >>   types,
>> >>> > >> > >> - We wouldn't be able to practically expose the Protobuf field to
>> >>> > >> > >>   C++ users without causing build pains,
>> >>> > >> > >> - We can't let Python users take advantage of the Protobuf field
>> >>> > >> > >>   without somehow being compatible with the Protobuf wheels (by
>> >>> > >> > >>   linking to the same version, and doing magic to turn the C++
>> >>> > >> > >>   Protobufs into the Python ones),
>> >>> > >> > >> - All our other application-defined fields are already bytes.
>> >>> > >> > >>
>> >>> > >> > >> Applications that want structure can encode JSON or Protobuf Any
>> >>> > >> > >> into the bytes field themselves, much as you can already do for
>> >>> > >> > >> Ticket, commands in FlightDescriptors, and application metadata in
>> >>> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient than
>> >>> > >> > >> using Any directly, since Any itself is a bytes field with a tag,
>> >>> > >> > >> and must invoke the Protobuf deserializer again to read the actual
>> >>> > >> > >> message.
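A minimal sketch of the "encode it yourself" approach, assuming an application-level convention (hypothetical, not part of Flight) that `app_metadata` holds UTF-8 JSON. Flight would only ever see opaque bytes; both peers must agree on the encoding out of band.

```python
import json
from typing import Any, Dict


def encode_metadata(obj: Dict[str, Any]) -> bytes:
    # Compact, key-sorted JSON so equal objects produce identical bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode_metadata(raw: bytes) -> Dict[str, Any]:
    # An empty app_metadata field is treated as "no metadata".
    if not raw:
        return {}
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    raw = encode_metadata({"rows": 1024, "kind": "progress"})
    print(raw)                   # b'{"kind":"progress","rows":1024}'
    print(decode_metadata(raw))  # {'kind': 'progress', 'rows': 1024}
```

The same pattern works with a serialized Protobuf message in place of JSON; only the two helper functions change.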
>> >>> > >> > >>
>> >>> > >> > >> If we decide on using bytes, then I don't think it makes sense to
>> >>> > >> > >> define a new message with a oneof either, since it would be
>> >>> > >> > >> redundant.
>> >>> > >> > >>
>> >>> > >> > >> Thanks,
>> >>> > >> > >> David
>> >>> > >> > >>
>> >>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
>> >>> > >> > >> > I've been extremely backlogged; I will update the proposal when I
>> >>> > >> > >> > get a chance and reply here when done.
>> >>> > >> > >> >
>> >>> > >> > >> > Best,
>> >>> > >> > >> > David
>> >>> > >> > >> >
>> >>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
>> >>> > >> > >> >> Bumping this discussion since a couple of weeks have passed. It
>> >>> > >> > >> >> seems there are still some questions here; could we summarize
>> >>> > >> > >> >> what the alternatives are, along with any public API
>> >>> > >> > >> >> implications, so we can try to render a decision?
>> >>> > >> > >> >>
>> >>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.davidm96@gmail.com>
>> >>> > >> > >> >> wrote:
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> Hi Wes,
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> Responses inline:
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmckinn@gmail.com> wrote:
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com>
>> >>> > >> > >> >>> > wrote:
>> >>> > >> > >> >>> > >
>> >>> > >> > >> >>> > > The question is whether to repurpose the existing FlightData
>> >>> > >> > >> >>> > > structure, and allow for the metadata field to be filled in and
>> >>> > >> > >> >>> > > data fields to be blank (as a control message), or to wrap the
>> >>> > >> > >> >>> > > FlightData structure in another structure that explicitly
>> >>> > >> > >> >>> > > distinguishes between control and data messages.
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>> > I'm not super against having metadata-only FlightData with empty
>> >>> > >> > >> >>> > body. One question to consider is what changes (if any) would need
>> >>> > >> > >> >>> > to be made to public APIs in either scenario.
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
>> >>> > >> > >> >>> messages in the future. This would be a breaking change, but
>> >>> > >> > >> >>> wouldn't change the wire format. I think the APIs could be changed
>> >>> > >> > >> >>> backwards compatibly, though.
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> > > The other question is how to handle the metadata fields. So far,
>> >>> > >> > >> >>> > > we've used bytestring fields for application-defined data. This
>> >>> > >> > >> >>> > > is workable if you want to use Protobuf to define the contents of
>> >>> > >> > >> >>> > > those fields, but requires you to pack/unpack your Protobuf
>> >>> > >> > >> >>> > > into/from the bytestring field. If we instead used the Protobuf
>> >>> > >> > >> >>> > > Any field, a dynamically typed field, this would be more
>> >>> > >> > >> >>> > > convenient, but then we'd be exposing Protobuf types. We could
>> >>> > >> > >> >>> > > alternatively use a combination of a type field and a bytestring
>> >>> > >> > >> >>> > > field, mimicking what the Protobuf Any type looks like on the
>> >>> > >> > >> >>> > > wire. I'm not sure this is actually cleaner in any of the
>> >>> > >> > >> >>> > > language APIs, though.
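For reference, the "type field plus bytestring field" alternative encodes exactly like `google.protobuf.Any` (field 1: `type_url` string, field 2: `value` bytes). The hand-rolled encoder below is an illustration of Protobuf wire format written with the standard library, not a replacement for the Protobuf runtime; `encode_type_and_data` and the other helper names are hypothetical. Like proto3 serialization, it omits fields whose value is empty.

```python
def varint(n: int) -> bytes:
    """Protobuf base-128 varint for a non-negative integer."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)   # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Key (field_number << 3 | wire type 2), then the length, then the payload."""
    if not payload:
        return b""  # proto3 does not serialize empty string/bytes fields
    return varint((field_number << 3) | 2) + varint(len(payload)) + payload


def encode_type_and_data(type_name: str, data: bytes) -> bytes:
    """Same layout as google.protobuf.Any: 1 = type_url (string), 2 = value (bytes)."""
    return length_delimited(1, type_name.encode("utf-8")) + length_delimited(2, data)


if __name__ == "__main__":
    print(encode_type_and_data("t", b"\x01").hex())  # 0a0174120101
```

Because the bytes are identical, a peer that does use Protobuf could parse such a pair directly as an `Any`, while other peers treat it as a plain (string, bytes) pair.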
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>> > Leaving the deserialization of the app metadata to the particular
>> >>> > >> > >> >>> > Flight implementation seems on first principles like the most
>> >>> > >> > >> >>> > flexible thing. If Any is used, does that mean the metadata _must_
>> >>> > >> > >> >>> > be a protobuf?
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> If Any is used, we could still expose a bytes-based API, but it
>> >>> > >> > >> >>> would have some more wrapping. (We could put a ByteString in Any.)
>> >>> > >> > >> >>> Then the question would just be how to expose this (would be easier
>> >>> > >> > >> >>> in Java, harder in C++).
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> > > David
>> >>> > >> > >> >>> > >
>> >>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf
>> >>> > >> > >> >>> > > > terms? Knowledge of protobuf shouldn't be required to use Flight.
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Regards
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Antoine.
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
>> >>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
>> >>> > >> > >> >>> > > >> application-level logic. (The official guide doesn't even mention
>> >>> > >> > >> >>> > > >> it in the encoding docs; I found
>> >>> > >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> >>> > >> > >> >>> > > >> as well.)
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially
>> >>> > >> > >> >>> > > >> inlining the definition of Any, e.g.
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> message FlightMessage {
>> >>> > >> > >> >>> > > >>   oneof message {
>> >>> > >> > >> >>> > > >>     FlightData data = 1;
>> >>> > >> > >> >>> > > >>     FlightAny metadata = 2;
>> >>> > >> > >> >>> > > >>   }
>> >>> > >> > >> >>> > > >> }
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> message FlightAny {
>> >>> > >> > >> >>> > > >>   string type = 1;
>> >>> > >> > >> >>> > > >>   bytes data = 2;
>> >>> > >> > >> >>> > > >> }
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> Is this correct?
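The point that oneof is application-level logic can be seen from the decoding side. This stdlib sketch (hypothetical helper names, no Protobuf runtime) parses the `FlightMessage` shape above: the wire carries only field numbers 1 and 2, and the oneof rule is applied while parsing. When more than one member appears, the last one wins, which is also how Protobuf parsers treat oneof members. It assumes well-formed input containing only length-delimited fields.

```python
from typing import Iterator, Optional, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def length_delimited_fields(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (field_number, payload) pairs; only wire type 2 is handled here."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type != 2:
            raise ValueError("sketch only handles length-delimited fields")
        length, pos = read_varint(buf, pos)
        yield field_number, buf[pos:pos + length]
        pos += length


def decode_flight_message(buf: bytes) -> Optional[Tuple[str, bytes]]:
    """Apply the oneof rule for FlightMessage {data = 1; metadata = 2}.

    Nothing on the wire marks these fields as a oneof; the parser simply keeps
    the last member it sees. (Real parsers merge repeated occurrences of the
    same message field; this sketch just keeps the last raw payload.)
    """
    chosen: Optional[Tuple[str, bytes]] = None
    for field_number, payload in length_delimited_fields(buf):
        if field_number == 1:
            chosen = ("data", payload)
        elif field_number == 2:
            chosen = ("metadata", payload)
        # any other field number is ignored, like an unknown field
    return chosen


if __name__ == "__main__":
    print(decode_flight_message(bytes.fromhex("0a0161120162")))  # ('metadata', b'b')
```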
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
>> >>> > >> > >> >>> > > >> as well, but at that point, I'd rather we be consistent with all
>> >>> > >> > >> >>> > > >> of them, rather than have one of the three methods do its own
>> >>> > >> > >> >>> > > >> thing.
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> Thanks,
>> >>> > >> > >> >>> > > >> David
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
>> >>> > >> > >> >>> > > >>> I think we could probably expose the oneof behavior without
>> >>> > >> > >> >>> > > >>> exposing the protobuf functions. On the Any... hmm. I guess we
>> >>> > >> > >> >>> > > >>> could expose it as two fields: type and data. Then users could use
>> >>> > >> > >> >>> > > >>> it for whatever, but if people wanted to treat it as Any, it would
>> >>> > >> > >> >>> > > >>> work. (Basically a user could use Any with it easily, but they
>> >>> > >> > >> >>> > > >>> could also use any other mechanism.) At least in Java, the Any
>> >>> > >> > >> >>> > > >>> concepts are pretty simple/DIY. Are other language bindings less
>> >>> > >> > >> >>> > > >>> DIY?
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata, but it
>> >>> > >> > >> >>> > > >>> just seemed a bit janky.
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I wonder
>> >>> > >> > >> >>> > > >>> if we should redefine DoPut and DoGet to have the same property if
>> >>> > >> > >> >>> > > >>> we think it is a good idea...
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com>
>> >>> > >> > >> >>> > > >>> wrote:
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>>> I was definitely considering having control messages without
>> >>> > >> > >> >>> > > >>>> data, and I thought that could be encoded by a FlightData with
>> >>> > >> > >> >>> > > >>>> only app_metadata set. I think I understand your position now:
>> >>> > >> > >> >>> > > >>>> FlightData should always carry (some) data (with optional
>> >>> > >> > >> >>> > > >>>> metadata)?
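Under the metadata-only convention discussed here, a receiver would classify messages by whether they carry any data. A minimal sketch with a hypothetical, simplified `FlightData` dataclass (the field names mirror the Protobuf message, but this is not the generated class):

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FlightData:
    """Hypothetical, simplified stand-in for the FlightData message."""
    data_header: bytes = b""
    data_body: bytes = b""
    app_metadata: bytes = b""


def split_stream(messages: Iterable[FlightData]) -> Tuple[List[bytes], List[FlightData]]:
    """Metadata-only messages become control messages; the rest are data."""
    control: List[bytes] = []
    data: List[FlightData] = []
    for msg in messages:
        if msg.data_header or msg.data_body:
            data.append(msg)            # data, possibly with attached metadata
        elif msg.app_metadata:
            control.append(msg.app_metadata)
        # a completely empty message is dropped in this sketch
    return control, data
```

Note that the message kind is inferred from which fields happen to be empty, which is the overloading objected to elsewhere in this thread.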
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with the documentation
>> >>> > >> > >> >>> > > >>>> on FlightData in the Protobuf file. I was worried about having a
>> >>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from happening,
>> >>> > >> > >> >>> > > >>>> and overall having a clear separation between data and control
>> >>> > >> > >> >>> > > >>>> messages is cleaner.
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
>> >>> > >> > >> >>> > > >>>> exposing Protobuf by using bytes; would we want to change that now?
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> Best,
>> >>> > >> > >> >>> > > >>>> David
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
>> >>> > >> > >> >>> > > >>>>> Hey David,
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
>> >>> > >> > >> >>> > > >>>>> DoGet/DoPut for async. Yes, more thinking Java, given Java gRPC's
>> >>> > >> > >> >>> > > >>>>> async-always pattern.
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think using metadata for this
>> >>> > >> > >> >>> > > >>>>> overloads the message. If I want to send a control message
>> >>> > >> > >> >>> > > >>>>> independently of the data message, I would have to define something
>> >>> > >> > >> >>> > > >>>>> like an empty FlightData message that has custom metadata. Why not
>> >>> > >> > >> >>> > > >>>>> support a container object with a oneof{FlightData, Any} in it
>> >>> > >> > >> >>> > > >>>>> instead, so users can add more data as desired? The default
>> >>> > >> > >> >>> > > >>>>> implementation could be a no-op for the Any messages.
>> >>> > >> > >> >>> > > >>>>>
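Here is a minimal Python sketch of Jacques's container-object idea. It uses simplified stand-ins for the messages. `FlightMessage`, `AnyMessage`, `ignore`, and `dispatch` are hypothetical names, not part of Flight or pyarrow, and `FlightData` keeps only a few of the Flight.proto field names. The sketch lets a stream interleave record batches with application-defined messages, and the default handler for those messages is a no-op.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class FlightData:
    # Simplified stand-in for Flight.proto's FlightData (field names follow the .proto).
    data_header: bytes = b""
    data_body: bytes = b""
    app_metadata: bytes = b""


@dataclass
class AnyMessage:
    # Shaped like google.protobuf.Any: a type identifier plus opaque bytes.
    type_url: str
    value: bytes


@dataclass
class FlightMessage:
    # Hypothetical container emulating `oneof { FlightData data; Any extra; }`.
    data: Optional[FlightData] = None
    extra: Optional[AnyMessage] = None

    def __post_init__(self) -> None:
        # A oneof allows at most one member; this sketch also requires exactly one.
        if (self.data is None) == (self.extra is None):
            raise ValueError("exactly one of data/extra must be set")


def ignore(msg: AnyMessage) -> None:
    """Default handler for application-defined messages: a no-op."""


def dispatch(messages: Iterable[FlightMessage],
             on_data: Callable[[FlightData], None],
             on_extra: Callable[[AnyMessage], None] = ignore) -> None:
    """Route each message to the data handler or the extension handler."""
    for msg in messages:
        if msg.data is not None:
            on_data(msg.data)
        else:
            on_extra(msg.extra)


# Usage: a stream interleaving record batches with a progress message.
received = []
stream = [
    FlightMessage(data=FlightData(data_body=b"batch-1")),
    FlightMessage(extra=AnyMessage("type.example.com/Progress", b"50%")),
    FlightMessage(data=FlightData(data_body=b"batch-2")),
]
dispatch(stream, on_data=lambda d: received.append(d.data_body))
print(received)  # [b'batch-1', b'batch-2']
```

The receiver switches on which member is set, so a control message never has to masquerade as an empty data message.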
>> >>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>>> Hi Jacques,
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> Thanks for the comments.
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
>> >>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
>> >>> > >> > >> >>> > > >>>>>>   proposals, so I don't think we need a new message to carry that
>> >>> > >> > >> >>> > > >>>>>>   kind of information.
>> >>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
>> >>> > >> > >> >>> > > >>>>>>   messages as the fundamental API; it would actually be quite natural
>> >>> > >> > >> >>> > > >>>>>>   to implement in Flight/Java. I will note that it's not possible in
>> >>> > >> > >> >>> > > >>>>>>   C++/Python without spawning a thread, though. (In essence, gRPC-Java
>> >>> > >> > >> >>> > > >>>>>>   is async-always and gRPC-C++ is sync-always.) There are
>> >>> > >> > >> >>> > > >>>>>>   experimental C++ APIs that would let us do something similar to
>> >>> > >> > >> >>> > > >>>>>>   Java, but those are only in relatively recent gRPC versions and are
>> >>> > >> > >> >>> > > >>>>>>   still under development (unlike the interceptor APIs, which have
>> >>> > >> > >> >>> > > >>>>>>   been around for quite a while).
>> >>> > >> > >> >>> > > >>>>>>
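An async handler in C++/Python needs a helper thread, as the message above notes. The sketch below illustrates that pattern with the standard library only. `SimulatedStream`, `start_reader`, and `doubling_server` are hypothetical names, and `queue.Queue` stands in for the two halves of a gRPC stream. This is not the gRPC or Flight API.

```python
import queue
import threading

_END = object()  # sentinel marking the end of one direction of the stream


class SimulatedStream:
    """One direction of a bidirectional call, simulated with an in-memory queue."""

    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue()

    def write(self, msg) -> None:
        self._q.put(msg)

    def done_writing(self) -> None:
        self._q.put(_END)

    def read(self):
        return self._q.get()  # blocks until a message (or _END) arrives


def start_reader(stream: SimulatedStream, handler) -> threading.Thread:
    """Emulate an async-style handler on a blocking stream by reading on a thread."""
    def run() -> None:
        for msg in iter(stream.read, _END):
            handler(msg)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def doubling_server(inbound: SimulatedStream, outbound: SimulatedStream) -> None:
    # Hypothetical server logic: reply to each message as soon as it arrives.
    for msg in iter(inbound.read, _END):
        outbound.write(msg * 2)
    outbound.done_writing()


to_server, to_client = SimulatedStream(), SimulatedStream()
server = threading.Thread(target=doubling_server, args=(to_server, to_client))
server.start()

results = []
reader = start_reader(to_client, results.append)  # downloads handled in background
for value in [1, 2, 3]:
    to_server.write(value)  # uploads continue while replies stream back
to_server.done_writing()
reader.join()
server.join()
print(results)  # [2, 4, 6]
```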
>> >>> > >> > >> >>> > > >>>>>> Thanks,
>> >>> > >> > >> >>> > > >>>>>> David
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
>> >>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth discussing
>> >>> > >> > >> >>> > > >>>>>>> here depending on your thoughts.
>> >>> > >> > >> >>> > > >>>>>>>
>> >>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
>> >>> > >> > >> >>> > > >>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave uploads
>> >>> > >> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right now, synchronous APIs
>> >>> > >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait for each
>> >>> > >> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us the
>> >>> > >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) As an
>> >>> > >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
>> >>> > >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>> >>> > >> > >> >>> > > >>>>>>>>
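The failure mode described here can be reproduced in a small simulation. The sketch rests on stated assumptions. `TimedStream` and its per-read `read(timeout=...)` are hypothetical, since per the thread gRPC only offers an overall deadline, and `queue.Queue` stands in for the network. With a per-read timeout, a mutual wait caused by a protocol bug surfaces as an error instead of a hang.

```python
import queue
import threading


class TimedStream:
    """In-memory stream whose read() accepts a per-read timeout (hypothetical API)."""

    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue()

    def write(self, msg) -> None:
        self._q.put(msg)

    def read(self, timeout: float):
        try:
            return self._q.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError(f"no message within {timeout}s") from None


def buggy_server(inbound: TimedStream, outbound: TimedStream) -> None:
    # Only replies after the client says "done" with its upload.
    while True:
        try:
            msg = inbound.read(timeout=0.2)
        except TimeoutError:
            return  # give up instead of waiting forever
        if msg == "done":
            outbound.write("result")
            return


to_server, to_client = TimedStream(), TimedStream()
server = threading.Thread(target=buggy_server, args=(to_server, to_client))
server.start()

to_server.write("batch-1")
try:
    # Buggy client: waits for a reply *before* sending "done", so both sides
    # wait on each other. Without a per-read timeout this would hang.
    reply = to_client.read(timeout=0.1)
except TimeoutError as exc:
    reply = None
    print("client gave up:", exc)
server.join()
```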
>> >>> > >> > >> >>> > > >>>>>>>> This is mostly tangential, though; eventually we will want to design
>> >>> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream like
>> >>> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run into.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
>> >>> > >> > >> >>> > > >>>>>>>> concern is that depending on how you deploy, two separate calls could
>> >>> > >> > >> >>> > > >>>>>>>> get routed to different instances. Additionally, gRPC has some
>> >>> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in between the two
>> >>> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is another instance available,
>> >>> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the new server without warning.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Thanks,
>> >>> > >> > >> >>> > > >>>>>>>> David
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
>> >>> > >> > >> >>> > > >>>>>>>>> Hey David,
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
>> >>> > >> > >> >>> > > >>>>>>>>> possibility of remote compute via Arrow buffers. One thing that would
>> >>> > >> > >> >>> > > >>>>>>>>> help me would be a concrete example of the API in a real-life use case.
>> >>> > >> > >> >>> > > >>>>>>>>> Also, what would the client experience be in terms of sync vs async?
>> >>> > >> > >> >>> > > >>>>>>>>> Would the client block until the bidirectional call returns, i.e.
>> >>> > >> > >> >>> > > >>>>>>>>> c = flight.vector_mult(a, b), or would the client wait to be signaled
>> >>> > >> > >> >>> > > >>>>>>>>> that computation was done? If the latter, how is that different from a
>> >>> > >> > >> >>> > > >>>>>>>>> DoPut then DoGet? I suppose that this could be implemented without
>> >>> > >> > >> >>> > > >>>>>>>>> extending the RPC interface but rather by a function/util?
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> Best,
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> Ryan
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> Hi all,
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
>> >>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to both send and
>> >>> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end, I've written
>> >>> > >> > >> >>> > > >>>>>>>>>> up a proposal for a new RPC method:
>> >>> > >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document. I'd
>> >>> > >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively straightforward
>> >>> > >> > >> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
>> >>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it doesn't
>> >>> > >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the interpretation of the semantics.
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> Thanks,
>> >>> > >> > >> >>> > > >>>>>>>>>> David
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> --
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> Ryan Murray | Principal Consulting Engineer
>> >>> > >> > >> >>> > > >>>>>>>>>


Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
hi David,

I did a preliminary review and things look to be on the right track
there. What do you think about breaking out the protocol changes (and
adding appropriate comments) so we can have a vote on that in
relatively short order?

- Wes

On Wed, Mar 18, 2020 at 9:06 AM David Li <li...@gmail.com> wrote:
>
> Following up here, I've submitted a draft implementation for C++:
> https://github.com/apache/arrow/pull/6656
>
> The core functionality is there, but there are still gaps that I need
> to fill in. Compared to the draft spec, the client also sends a
> FlightDescriptor to begin with, though it's currently not exposed.
> This provides consistency with DoGet/DoPut, which also send an initial
> message describing the stream to the server.
>
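The following sketches the descriptor-first convention described above. It assumes simplified dataclasses whose field names (`flight_descriptor`, `data_body`, `cmd`) follow Flight.proto. `handle_exchange` and the `b"upper"` command are hypothetical illustrations, not the C++ implementation in the pull request.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class FlightDescriptor:
    # Simplified: only the opaque command bytes.
    cmd: bytes = b""


@dataclass
class FlightData:
    # Simplified stand-in; field names follow Flight.proto.
    flight_descriptor: Optional[FlightDescriptor] = None
    data_body: bytes = b""


def handle_exchange(incoming: Iterable[FlightData]) -> Iterator[FlightData]:
    """Hypothetical handler: the first message names the stream, the rest carry data."""
    it = iter(incoming)
    first = next(it, None)
    if first is None or first.flight_descriptor is None:
        raise ValueError("first message must carry a FlightDescriptor")
    command = first.flight_descriptor.cmd
    # The first message may also carry data, so process it along with the rest.
    for msg in [first, *it]:
        if not msg.data_body:
            continue
        body = msg.data_body.upper() if command == b"upper" else msg.data_body
        yield FlightData(data_body=body)


requests = [
    FlightData(flight_descriptor=FlightDescriptor(cmd=b"upper")),
    FlightData(data_body=b"a"),
    FlightData(data_body=b"b"),
]
print([m.data_body for m in handle_exchange(requests)])  # [b'A', b'B']
```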
> Andy, I hope this helps clarify whether it meets your needs.
>
> Best,
> David
>
> On 2/25/20, David Li <li...@gmail.com> wrote:
> > Hey Andy,
> >
> > I've been rather busy unfortunately. I had started on an
> > implementation in C++ to provide as part of this discussion, but it's
> > not complete. I'm hoping to have more done in March.
> >
> > Best,
> > David
> >
> > On 2/25/20, Andy Grove <an...@gmail.com> wrote:
> >> I was wondering if there had been any momentum on this (the BiDirectional
> >> RPC design)?
> >>
> >> I'm interested in this for the use case of Apache Spark sending a stream
> >> of data to another process to invoke custom code and then receive a
> >> stream back with the transformed data.
> >>
> >> Thanks,
> >>
> >> Andy.
> >>
> >>
> >>
> >> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org> wrote:
> >>
> >>> I support moving forward with the current proposal.
> >>>
> >>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:
> >>>
> >>> > Just following up here again, any other thoughts?
> >>> >
> >>> > I think we do have justifications for potentially separate streams in
> >>> > a call, but that's more of an orthogonal question - it doesn't need to
> >>> > be addressed here. I do agree that it very much complicates things.
> >>> >
> >>> > Thanks,
> >>> > David
> >>> >
> >>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> >>> > > I would generally agree with this. Note that you have the possibility
> >>> > > to use unions-of-structs to send record batches with different schemas
> >>> > > in the same stream, though with some added complexity on each side.
> >>> > >
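Here is a pure-Python sketch of the unions-of-structs idea, using a dense union layout rather than the pyarrow API. Each row gets a type id naming its child struct plus an offset into that child, so rows with different schemas can share one stream. In real Arrow arrays the type ids and offsets are typed buffers (int8 and int32). The schema names and helper functions here are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def pack_dense_union(rows: List[Tuple[str, Row]], schemas: Dict[str, List[str]]):
    """Pack (schema_name, row) pairs into dense-union-style buffers.

    Returns (type_ids, offsets, children): type_ids[i] says which child holds
    row i, offsets[i] is the row's position inside that child, and each child
    is a column-oriented "struct" (a dict of column name -> list of values).
    """
    names = list(schemas)
    children = [{col: [] for col in schemas[name]} for name in names]
    lengths = [0] * len(names)
    type_ids, offsets = [], []
    for schema_name, row in rows:
        tid = names.index(schema_name)
        if set(row) != set(children[tid]):
            raise ValueError(f"row does not match schema {schema_name!r}")
        type_ids.append(tid)
        offsets.append(lengths[tid])
        for col, value in row.items():
            children[tid][col].append(value)
        lengths[tid] += 1
    return type_ids, offsets, children


def unpack_dense_union(type_ids, offsets, children, names):
    """Recover the original (schema_name, row) sequence."""
    return [
        (names[tid], {col: values[off] for col, values in children[tid].items()})
        for tid, off in zip(type_ids, offsets)
    ]


schemas = {"trades": ["symbol", "price"], "quotes": ["symbol", "bid", "ask"]}
rows = [
    ("trades", {"symbol": "X", "price": 10.0}),
    ("quotes", {"symbol": "X", "bid": 9.9, "ask": 10.1}),
    ("trades", {"symbol": "Y", "price": 5.0}),
]
type_ids, offsets, children = pack_dense_union(rows, schemas)
print(type_ids, offsets)  # [0, 1, 0] [0, 0, 1]
```

The added complexity Wes mentions shows up on both ends: the sender has to route rows into children, and the receiver has to follow type ids and offsets back out.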
> >>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
> >>> > >>
> >>> > >> I'd vote for declaring it explicitly unsupported. We should keep our
> >>> > >> primitives narrow.
> >>> > >>
> >>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
> >>> > >>
> >>> > >> > Thanks for the feedback.
> >>> > >> >
> >>> > >> > I do think if we had explicitly embraced gRPC from the beginning,
> >>> > >> > there are a lot of places where things could be made more ergonomic,
> >>> > >> > including with the metadata fields. But it would also have locked us
> >>> > >> > out of potential future transports.
> >>> > >> >
> >>> > >> > On another note: I hesitate to put too much into this method, but we
> >>> > >> > are looking at use cases where potentially, a client may want to
> >>> > >> > upload multiple distinct datasets (with differing schemas). (This is a
> >>> > >> > little tentative, and I can get more details...) Right now, each
> >>> > >> > logical stream in Flight must have a single, consistent schema; would
> >>> > >> > it make sense to look at ways to relax this, or declare this
> >>> > >> > explicitly out of scope (and require multiple calls and coordination
> >>> > >> > with the deployment topology) in order to accomplish this?
> >>> > >> >
> >>> > >> > Best,
> >>> > >> > David
> >>> > >> >
> >>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > >> > > Fair enough. I'm okay with the bytes approach and the proposal looks
> >>> > >> > > good to me.
> >>> > >> > >
> >>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
> >>> > >> > >
> >>> > >> > >> I've updated the proposal.
> >>> > >> > >>
> >>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> >>> > >> > >> errors/metadata, I still think using bytes is preferable:
> >>> > >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
> >>> > >> > >>   types,
> >>> > >> > >> - We wouldn't be able to practically expose the Protobuf field to C++
> >>> > >> > >>   users without causing build pains,
> >>> > >> > >> - We can't let Python users take advantage of the Protobuf field
> >>> > >> > >>   without somehow being compatible with the Protobuf wheels (by
> >>> > >> > >>   linking to the same version, and doing magic to turn the C++
> >>> > >> > >>   Protobufs into the Python ones),
> >>> > >> > >> - All our other application-defined fields are already bytes.
> >>> > >> > >>
> >>> > >> > >> Applications that want structure can encode JSON or Protobuf Any into
> >>> > >> > >> the bytes field themselves, much as you can already do for Ticket,
> >>> > >> > >> commands in FlightDescriptors, and application metadata in
> >>> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
> >>> > >> > >> Any directly, since Any itself is just a bytes field with a type tag,
> >>> > >> > >> and reading the actual message still requires invoking the Protobuf
> >>> > >> > >> deserializer again.
> >>> > >> > >>
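Here is one way to encode structure into a bytes field such as FlightData's `app_metadata`: the sketch below packs JSON into bytes. The `kind`/`payload` envelope and the function names are a hypothetical application convention, not something Flight defines.

```python
import json
from typing import Any, Dict, Optional, Tuple


def encode_metadata(kind: str, payload: Dict[str, Any]) -> bytes:
    """Serialize structured metadata to UTF-8 JSON bytes.

    The "kind" key plays the role that the type URL plays in Protobuf's Any;
    this envelope is a hypothetical application convention, not part of Flight.
    """
    return json.dumps({"kind": kind, "payload": payload}, sort_keys=True).encode("utf-8")


def decode_metadata(raw: bytes) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Inverse of encode_metadata; empty bytes means no metadata was attached."""
    if not raw:
        return None
    doc = json.loads(raw.decode("utf-8"))
    return doc["kind"], doc["payload"]


raw = encode_metadata("progress", {"rows_done": 1024})
print(raw)  # b'{"kind": "progress", "payload": {"rows_done": 1024}}'
print(decode_metadata(raw))  # ('progress', {'rows_done': 1024})
```

Swapping JSON for a serialized Protobuf message would only change the two function bodies; Flight itself would still see opaque bytes.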
> >>> > >> > >> If we decide on using bytes, then I don't think it makes sense to
> >>> > >> > >> define a new message with a oneof either, since it would be redundant.
> >>> > >> > >>
> >>> > >> > >> Thanks,
> >>> > >> > >> David
> >>> > >> > >>
> >>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> >>> > >> > >> > I've been extremely backlogged; I will update the proposal when I get
> >>> > >> > >> > a chance and reply here when done.
> >>> > >> > >> >
> >>> > >> > >> > Best,
> >>> > >> > >> > David
> >>> > >> > >> >
> >>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >>> > >> > >> >> Bumping this discussion since a couple of weeks have passed. It seems
> >>> > >> > >> >> there are still some open questions here; could we summarize the
> >>> > >> > >> >> alternatives, along with any public API implications, so we can try
> >>> > >> > >> >> to render a decision?
> >>> > >> > >> >>
> >>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.davidm96@gmail.com> wrote:
> >>> > >> > >> >>>
> >>> > >> > >> >>> Hi Wes,
> >>> > >> > >> >>>
> >>> > >> > >> >>> Responses inline:
> >>> > >> > >> >>>
> >>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmckinn@gmail.com> wrote:
> >>> > >> > >> >>>
> >>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
> >>> > >> > >> >>> > >
> >>> > >> > >> >>> > > The question is whether to repurpose the existing FlightData
> >>> > >> > >> >>> > > structure, and allow for the metadata field to be filled in and
> >>> > >> > >> >>> > > data fields to be blank (as a control message), or to wrap the
> >>> > >> > >> >>> > > FlightData structure in another structure that explicitly
> >>> > >> > >> >>> > > distinguishes between control and data messages.
> >>> > >> > >> >>> >
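The first option (metadata-only FlightData as a control message) can be sketched as follows. The sketch assumes a simplified `FlightData` with Flight.proto's field names, and `classify` is a hypothetical helper. The receiver has to infer the message kind from which fields happen to be empty.

```python
from dataclasses import dataclass


@dataclass
class FlightData:
    # Simplified stand-in; field names follow Flight.proto.
    data_header: bytes = b""
    data_body: bytes = b""
    app_metadata: bytes = b""


def classify(msg: FlightData) -> str:
    """Infer the message kind from which fields are populated."""
    if msg.data_header or msg.data_body:
        return "data"  # a record batch, possibly with app_metadata attached
    if msg.app_metadata:
        return "control"  # metadata-only message with an empty body
    return "empty"


stream = [
    FlightData(data_header=b"<schema>", data_body=b"<batch>"),
    FlightData(app_metadata=b"flush"),
    FlightData(data_header=b"<schema>", data_body=b"<batch>", app_metadata=b"seq=2"),
]
print([classify(m) for m in stream])  # ['data', 'control', 'data']
```

Nothing in the message type itself marks a control message, so every reader must agree on this convention. That is the trade-off against an explicit wrapper.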
> >>> > >> > >> >>> > I'm not super against having metadata-only FlightData with empty
> >>> > >> > >> >>> > body. One question to consider is what changes (if any) would need
> >>> > >> > >> >>> > to be made to public APIs in either scenario.
> >>> > >> > >> >>> >
> >>> > >> > >> >>>
> >>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
> >>> > >> > >> >>> messages in the future. This would be a breaking change, but wouldn't
> >>> > >> > >> >>> change the wire format. I think the APIs could be changed backwards
> >>> > >> > >> >>> compatibly, though.
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>> > > The other question is how to handle the metadata fields. So far,
> >>> > >> > >> >>> > > we've used bytestring fields for application-defined data. This is
> >>> > >> > >> >>> > > workable if you want to use Protobuf to define the contents of those
> >>> > >> > >> >>> > > fields, but requires you to pack/unpack your Protobuf into/from the
> >>> > >> > >> >>> > > bytestring field. If we instead used the Protobuf Any field, a
> >>> > >> > >> >>> > > dynamically typed field, this would be more convenient, but then
> >>> > >> > >> >>> > > we'd be exposing Protobuf types. We could alternatively use a
> >>> > >> > >> >>> > > combination of a type field and a bytestring field, mimicking what
> >>> > >> > >> >>> > > the Protobuf Any type looks like on the wire. I'm not sure this is
> >>> > >> > >> >>> > > actually cleaner in any of the language APIs, though.
> >>> > >> > >> >>> >
> >>> > >> > >> >>> > Leaving the deserialization of the app metadata to the particular
> >>> > >> > >> >>> > Flight implementation seems on first principles like the most
> >>> > >> > >> >>> > flexible thing. If Any is used, does that mean the metadata _must_
> >>> > >> > >> >>> > be a protobuf?
> >>> > >> > >> >>> >
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>> If Any is used, we could still expose a bytes-based API, but it would
> >>> > >> > >> >>> have some more wrapping. (We could put a ByteString in Any.) Then the
> >>> > >> > >> >>> question would just be how to expose this (would be easier in Java,
> >>> > >> > >> >>> harder in C++).
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>> > > David
> >>> > >> > >> >>> > >
> >>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf
> >>> > >> > >> >>> > > > terms? Knowledge of protobuf shouldn't be required to use Flight.
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > > Regards
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > > Antoine.
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
> >>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
> >>> > >> > >> >>> > > >> application-level logic. (The official guide doesn't even mention
> >>> > >> > >> >>> > > >> it in the encoding docs; I found
> >>> > >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >>> > >> > >> >>> > > >> as well.)
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially inlining
> >>> > >> > >> >>> > > >> the definition of Any, e.g.
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> message FlightMessage {
> >>> > >> > >> >>> > > >>   oneof message {
> >>> > >> > >> >>> > > >>     FlightData data = 1;
> >>> > >> > >> >>> > > >>     FlightAny metadata = 2;
> >>> > >> > >> >>> > > >>   }
> >>> > >> > >> >>> > > >> }
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> message FlightAny {
> >>> > >> > >> >>> > > >>   string type = 1;
> >>> > >> > >> >>> > > >>   bytes data = 2;
> >>> > >> > >> >>> > > >> }
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> Is this correct?
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut as
> >>> > >> > >> >>> > > >> well, but at that point, I'd rather we be consistent across all of
> >>> > >> > >> >>> > > >> them, rather than have one of the three methods do its own thing.
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> Thanks,
> >>> > >> > >> >>> > > >> David
> >>> > >> > >> >>> > > >>
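To see why oneof leaves the wire format unchanged, here is a hand-rolled encoder for just the length-delimited fields used by the `FlightMessage`/`FlightAny` messages sketched above. It is an illustration only: real code would use the generated Protobuf classes, and these messages are the thread's proposal rather than part of Flight.proto.

```python
from typing import Optional


def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 bits per byte, least-significant group first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """One wire-type-2 field: key (field_number << 3 | 2), length, raw bytes."""
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


def encode_flight_any(type_: str, data: bytes) -> bytes:
    # message FlightAny { string type = 1; bytes data = 2; }
    return length_delimited(1, type_.encode("utf-8")) + length_delimited(2, data)


def encode_flight_message(*, data: Optional[bytes] = None,
                          metadata: Optional[bytes] = None) -> bytes:
    # message FlightMessage { oneof message { FlightData data = 1; FlightAny metadata = 2; } }
    # The oneof leaves no trace on the wire: the set member is encoded exactly
    # as an ordinary field with the same number would be.
    if (data is None) == (metadata is None):
        raise ValueError("set exactly one member of the oneof")
    if data is not None:
        return length_delimited(1, data)  # data: an already-serialized FlightData
    return length_delimited(2, metadata)


any_bytes = encode_flight_any("x", b"\x01")
msg = encode_flight_message(metadata=any_bytes)
print(any_bytes.hex())  # 0a0178120101
print(msg.hex())        # 12060a0178120101
```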
> >>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > >> > >> >>> > > >>> I think we could probably expose the oneof behavior without exposing
> >>> > >> > >> >>> > > >>> the protobuf functions. On the Any... hmm. I guess we could expose it
> >>> > >> > >> >>> > > >>> as two fields: type and data. Then users could use it for whatever
> >>> > >> > >> >>> > > >>> they like, but if people wanted to treat it as Any, it would work.
> >>> > >> > >> >>> > > >>> (Basically, a user could use Any with it easily, but they could also
> >>> > >> > >> >>> > > >>> use any other mechanism.) At least in Java, the Any concepts are
> >>> > >> > >> >>> > > >>> pretty simple/DIY. Are other language bindings less DIY?
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata, but it
> >>> > >> > >> >>> > > >>> just seemed a bit janky.
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I wonder if we
> >>> > >> > >> >>> > > >>> should redefine DoPut and DoGet to have the same property if we think it
> >>> > >> > >> >>> > > >>> is a good idea...
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com> wrote:
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >>>> I was definitely considering having control messages without data, and
> >>> > >> > >> >>> > > >>>> I thought that could be encoded by a FlightData with only app_metadata
> >>> > >> > >> >>> > > >>>> set. I think I understand your position now: FlightData should always
> >>> > >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with the documentation on
> >>> > >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried about having a
> >>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from happening, and
> >>> > >> > >> >>> > > >>>> overall having a clear separation between data and control messages is
> >>> > >> > >> >>> > > >>>> cleaner.
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
> >>> > >> > >> >>> > > >>>> Protobuf by using bytes, would we want to change that now?
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>> Best,
> >>> > >> > >> >>> > > >>>> David
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > >> > >> >>> > > >>>>> Hey David,
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for doget/doput
> >>> > >> > >> >>> > > >>>>> for async. Yes, more thinking java given java grpc's async always
> >>> > >> > >> >>> > > >>>>> pattern.
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think it is overloading the
> >>> > >> > >> >>> > > >>>>> message to use metadata for this. If I want to send a control message
> >>> > >> > >> >>> > > >>>>> independently of the data message, I would have to define something
> >>> > >> > >> >>> > > >>>>> like an empty flight data message that has custom metadata. Why not
> >>> > >> > >> >>> > > >>>>> support a container object with a oneof{FlightData, Any} in it instead
> >>> > >> > >> >>> > > >>>>> so users can add more data as desired. The default impl could be a
> >>> > >> > >> >>> > > >>>>> noop for the Any messages.
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>>> Hi Jacques,
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>> Thanks for the comments.
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
> >>> > >> > >> >>> > > >>>>>> proposals, so I don't think we need a new message to carry that kind
> >>> > >> > >> >>> > > >>>>>> of information.
> >>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
> >>> > >> > >> >>> > > >>>>>> messages as the fundamental API; it would actually be quite natural to
> >>> > >> > >> >>> > > >>>>>> implement in Flight/Java. I will note that it's not possible in
> >>> > >> > >> >>> > > >>>>>> C++/Python without spawning a thread, though. (In essence, gRPC-Java
> >>> > >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are experimental
> >>> > >> > >> >>> > > >>>>>> C++ APIs that would let us do something similar to Java, but those are
> >>> > >> > >> >>> > > >>>>>> only in relatively recent gRPC versions and are still under
> >>> > >> > >> >>> > > >>>>>> development (contrary to the interceptor APIs which have been around
> >>> > >> > >> >>> > > >>>>>> for quite a while).
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>> Thanks,
> >>> > >> > >> >>> > > >>>>>> David
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
> >>> > >> > >> >>> > > >>>>>>> discussing here depending on your thoughts.
> >>> > >> > >> >>> > > >>>>>>>
> >>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
> >>> > >> > >> >>> > > >>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave uploads
> >>> > >> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right now, synchronous APIs
> >>> > >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait for each
> >>> > >> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us the
> >>> > >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) As an
> >>> > >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> >>> > >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> This is mostly tangential though, eventually we will want to design
> >>> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream like
> >>> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run into.
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
> >>> > >> > >> >>> > > >>>>>>>> concern is that depending on how you deploy, two separate calls could
> >>> > >> > >> >>> > > >>>>>>>> get routed to different instances. Additionally, gRPC has some
> >>> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in between the two
> >>> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is another instance available,
> >>> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the new server without warning.
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Thanks,
> >>> > >> > >> >>> > > >>>>>>>> David
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> >>> > >> > >> >>> > > >>>>>>>>> Hey David,
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
> >>> > >> > >> >>> > > >>>>>>>>> possibility of remote compute via arrow buffers. One thing that would
> >>> > >> > >> >>> > > >>>>>>>>> help me would be a concrete example of the API in a real life use
> >>> > >> > >> >>> > > >>>>>>>>> case. Also, what would the client experience be in terms of sync vs
> >>> > >> > >> >>> > > >>>>>>>>> async? Would the client block till the bidirectional call returns,
> >>> > >> > >> >>> > > >>>>>>>>> i.e. c = flight.vector_mult(a, b), or would the client wait to be
> >>> > >> > >> >>> > > >>>>>>>>> signaled that computation was done? If the latter, how is that
> >>> > >> > >> >>> > > >>>>>>>>> different from a DoPut then DoGet? I suppose that this could be
> >>> > >> > >> >>> > > >>>>>>>>> implemented without extending the RPC interface but rather by a
> >>> > >> > >> >>> > > >>>>>>>>> function/util?
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> Best,
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> Ryan
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> Hi all,
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
> >>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to both send and
> >>> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end, I've
> >>> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
> >>> > >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document. I'd
> >>> > >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively straightforward
> >>> > >> > >> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
> >>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it doesn't
> >>> > >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the interpretation of the semantics.
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> Thanks,
> >>> > >> > >> >>> > > >>>>>>>>>> David
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> --
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> >
> >>> > >> > >> >>
> >>> > >> > >> >
> >>> > >> > >>
> >>> > >> > >
> >>> > >> >
> >>> > >
> >>> >
> >>>
> >>
> >
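The sync-vs-async concern in the quoted thread (client and server each blocking on a read from the other because of an application logic bug, with gRPC offering only an overall timeout rather than per-read timeouts) can be modelled without Flight at all. The sketch below is a hypothetical, pure-Python stand-in, not Flight or gRPC code: each end of a bidirectional stream is a thread, each direction is a `queue.Queue`, and the per-read timeout is something this toy model has but the thread notes gRPC does not. The function names are illustrative only.

```python
import queue
import threading
from typing import Dict


def run_side(name: str, inbox: queue.Queue, outbox: queue.Queue,
             sends_first: bool, timeout: float, results: Dict[str, str]) -> None:
    """One end of a toy bidirectional stream (hypothetical, not Flight)."""
    greeting = f"hello from {name}"
    try:
        if sends_first:
            outbox.put(greeting)
        # Blocking read with a per-read timeout. Without it, a logic bug where
        # both ends read before writing would hang forever.
        results[name] = inbox.get(timeout=timeout)
        if not sends_first:
            outbox.put(greeting)
    except queue.Empty:
        results[name] = "timed out"


def exchange(client_sends_first: bool, server_sends_first: bool,
             timeout: float = 0.5) -> Dict[str, str]:
    """Run both ends concurrently and report what each one received."""
    client_to_server: queue.Queue = queue.Queue()
    server_to_client: queue.Queue = queue.Queue()
    results: Dict[str, str] = {}
    threads = [
        threading.Thread(target=run_side,
                         args=("client", server_to_client, client_to_server,
                               client_sends_first, timeout, results)),
        threading.Thread(target=run_side,
                         args=("server", client_to_server, server_to_client,
                               server_sends_first, timeout, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With `exchange(True, False)` the client writes first and both ends receive the other's greeting; with `exchange(False, False)` both ends read first, and the per-read timeout turns what would otherwise be a hang into a reported error. An asynchronous, handler-based API sidesteps the ordering question entirely, which is the direction the thread leans toward.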

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Following up here, I've submitted a draft implementation for C++:
https://github.com/apache/arrow/pull/6656

The core functionality is there, but there are still gaps that I need
to fill in. Compared to the draft spec, the client also sends a
FlightDescriptor at the start of the call, though it's currently not
exposed. This is consistent with DoGet/DoPut, which also begin by
sending a message that describes the stream to the server.
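A hypothetical, pure-Python model of the call shape described above: the client's first message is a FlightDescriptor, and every later message in either direction is a data message. The class names mirror Flight's, but these are toy dataclasses (a "batch" is just a list of ints, and the descriptor's command is assumed to carry a multiplier); this is neither the pyarrow.flight API nor the code in the pull request.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Union


@dataclass
class FlightDescriptor:
    """Toy stand-in: opaque command bytes describing the stream."""
    command: bytes


@dataclass
class FlightData:
    """Toy stand-in: one 'record batch' (here just ints) plus app metadata."""
    batch: List[int]
    app_metadata: bytes = b""


Message = Union[FlightDescriptor, FlightData]


def exchange_handler(incoming: Iterator[Message]) -> Iterator[FlightData]:
    """Server side of the hypothetical exchange: descriptor first, then data."""
    first = next(incoming, None)
    if not isinstance(first, FlightDescriptor):
        raise ValueError("the first message must be a FlightDescriptor")
    factor = int(first.command.decode("ascii"))  # assumed command format
    for msg in incoming:
        if not isinstance(msg, FlightData):
            raise ValueError("only FlightData may follow the descriptor")
        # Results stream back over the same call, one batch per input batch.
        yield FlightData([v * factor for v in msg.batch], msg.app_metadata)


def client_messages(factor: int, batches: Iterable[List[int]]) -> Iterator[Message]:
    """Client side: describe the stream, then upload the batches."""
    yield FlightDescriptor(str(factor).encode("ascii"))
    for batch in batches:
        yield FlightData(batch)
```

For example, `list(exchange_handler(client_messages(3, [[1, 2], [3]])))` yields batches `[3, 6]` and `[9]`. Because the descriptor, the upload and the download all travel on one call, there is no second call for a load balancer or a gRPC reconnect to send to a different instance, which was the objection to emulating this with DoPut followed by DoGet.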

Andy, I hope this helps clarify whether it meets your needs.

Best,
David

On 2/25/20, David Li <li...@gmail.com> wrote:
> Hey Andy,
>
> I've been rather busy unfortunately. I had started on an
> implementation in C++ to provide as part of this discussion, but it's
> not complete. I'm hoping to have more done in March.
>
> Best,
> David
>
> On 2/25/20, Andy Grove <an...@gmail.com> wrote:
>> I was wondering if there had been any momentum on this (the BiDirectional
>> RPC design)?
>>
>> I'm interested in this for the use case of Apache Spark sending a stream
>> of data to another process to invoke custom code and then receive a
>> stream back with the transformed data.
>>
>> Thanks,
>>
>> Andy.
>>
>>
>>
>> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org> wrote:
>>
>>> I support moving forward with the current proposal.
>>>
>>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:
>>>
>>> > Just following up here again, any other thoughts?
>>> >
>>> > I think we do have justifications for potentially separate streams in
>>> > a call, but that's more of an orthogonal question - it doesn't need to
>>> > be addressed here. I do agree that it very much complicates things.
>>> >
>>> > Thanks,
>>> > David
>>> >
>>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
>>> > > I would generally agree with this. Note that you have the possibility
>>> > > to use unions-of-structs to send record batches with different schemas
>>> > > in the same stream, though with some added complexity on each side.
>>> > >
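The unions-of-structs idea in the paragraph above can be pictured without any Arrow code. The sketch below is a hypothetical, pure-Python model of one variant, the dense union layout: a type id per row, an offset per row into that type's child, and one struct-like child per schema. It uses plain lists instead of Arrow buffers, and the `trades`/`quotes` schemas are made up for illustration. In Arrow itself this would be a single union column whose children are struct arrays, so the stream as a whole still has one schema.

```python
from typing import Any, Dict, List, Tuple

Schema = Dict[str, List[str]]  # variant name -> field names (hypothetical)


def to_dense_union(records: List[Tuple[str, Dict[str, Any]]], schemas: Schema):
    """Pack (variant, row) pairs into a dense-union-style layout.

    Returns (type_ids, offsets, children): type_ids[i] says which variant
    row i belongs to, offsets[i] is its position inside that variant's
    struct child, and children holds one column dict per variant.
    """
    codes = {name: code for code, name in enumerate(schemas)}
    children = {name: {field: [] for field in fields} for name, fields in schemas.items()}
    lengths = {name: 0 for name in schemas}
    type_ids: List[int] = []
    offsets: List[int] = []
    for name, row in records:
        type_ids.append(codes[name])
        offsets.append(lengths[name])
        for field in schemas[name]:
            children[name][field].append(row[field])
        lengths[name] += 1
    return type_ids, offsets, children


def row_at(i: int, type_ids: List[int], offsets: List[int],
           children: Dict[str, Dict[str, List[Any]]],
           schemas: Schema) -> Tuple[str, Dict[str, Any]]:
    """Recover the i-th logical row from the union layout."""
    name = list(schemas)[type_ids[i]]
    j = offsets[i]
    return name, {field: children[name][field][j] for field in schemas[name]}
```

With two trades rows and one quotes row interleaved, `type_ids` is `[0, 1, 0]` and `offsets` is `[0, 0, 1]`. The "added complexity on each side" is visible here: both producer and consumer must agree on the variant codes and walk the type ids to reassemble per-schema batches.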
>>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
>>> > >>
>>> > >> I'd vote for explicitly not supported. We should keep our primitives
>>> > >> narrow.
>>> > >>
>>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
>>> > >>
>>> > >> > Thanks for the feedback.
>>> > >> >
>>> > >> > I do think if we had explicitly embraced gRPC from the beginning,
>>> > >> > there are a lot of places where things could be made more ergonomic,
>>> > >> > including with the metadata fields. But it would also have locked us
>>> > >> > out of potential future transports.
>>> > >> >
>>> > >> > On another note: I hesitate to put too much into this method, but we
>>> > >> > are looking at use cases where potentially, a client may want to
>>> > >> > upload multiple distinct datasets (with differing schemas). (This is a
>>> > >> > little tentative, and I can get more details...) Right now, each
>>> > >> > logical stream in Flight must have a single, consistent schema; would
>>> > >> > it make sense to look at ways to relax this, or declare this
>>> > >> > explicitly out of scope (and require multiple calls and coordination
>>> > >> > with the deployment topology) in order to accomplish this?
>>> > >> >
>>> > >> > Best,
>>> > >> > David
>>> > >> >
>>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
>>> > >> > > Fair enough. I'm okay with the bytes approach and the proposal looks
>>> > >> > > good to me.
>>> > >> > >
>>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
>>> > >> > >
>>> > >> > >> I've updated the proposal.
>>> > >> > >>
>>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
>>> > >> > >> errors/metadata, I still think using bytes is preferable:
>>> > >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf types,
>>> > >> > >> - We wouldn't be able to practically expose the Protobuf field to C++
>>> > >> > >> users without causing build pains,
>>> > >> > >> - We can't let Python users take advantage of the Protobuf field
>>> > >> > >> without somehow being compatible with the Protobuf wheels (by linking
>>> > >> > >> to the same version, and doing magic to turn the C++ Protobufs into
>>> > >> > >> the Python ones),
>>> > >> > >> - All our other application-defined fields are already bytes.
>>> > >> > >>
>>> > >> > >> Applications that want structure can encode JSON or Protobuf Any into
>>> > >> > >> the bytes field themselves, much as you can already do for Ticket,
>>> > >> > >> commands in FlightDescriptors, and application metadata in
>>> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
>>> > >> > >> Any directly, since Any itself is a bytes field with a tag, and must
>>> > >> > >> invoke the Protobuf deserializer again to read the actual message.
>>> > >> > >>
>>> > >> > >> If we decide on using bytes, then I don't think it makes sense to
>>> > >> > >> define a new message with a oneof either, since it would be redundant.
>>> > >> > >>
>>> > >> > >> Thanks,
>>> > >> > >> David
>>> > >> > >>
>>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
>>> > >> > >> > I've been extremely backlogged, I will update the proposal when I get
>>> > >> > >> > a chance and reply here when done.
>>> > >> > >> >
>>> > >> > >> > Best,
>>> > >> > >> > David
>>> > >> > >> >
>>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
>>> > >> > >> >> Bumping this discussion since a couple of weeks have passed. It seems
>>> > >> > >> >> there are still some questions here, could we summarize what are the
>>> > >> > >> >> alternatives along with any public API implications so we can try to
>>> > >> > >> >> render a decision?
>>> > >> > >> >>
>>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.davidm96@gmail.com> wrote:
>>> > >> > >> >>>
>>> > >> > >> >>> Hi Wes,
>>> > >> > >> >>>
>>> > >> > >> >>> Responses inline:
>>> > >> > >> >>>
>>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmckinn@gmail.com> wrote:
>>> > >> > >> >>>
>>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
>>> > >> > >> >>> > >
>>> > >> > >> >>> > > The question is whether to repurpose the existing FlightData
>>> > >> > >> >>> > > structure, and allow for the metadata field to be filled in and data
>>> > >> > >> >>> > > fields to be blank (as a control message), or to wrap the FlightData
>>> > >> > >> >>> > > structure in another structure that explicitly distinguishes between
>>> > >> > >> >>> > > control and data messages.
>>> > >> > >> >>> >
>>> > >> > >> >>> > I'm not super against having metadata-only FlightData with empty
>>> > >> > >> >>> > body. One question to consider is what changes (if any) would need to
>>> > >> > >> >>> > be made to public APIs in either scenario.
>>> > >> > >> >>> >
>>> > >> > >> >>>
>>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data messages
>>> > >> > >> >>> in the future. This would be a breaking change, but wouldn't change the
>>> > >> > >> >>> wire format. I think the APIs could be changed backwards compatibly,
>>> > >> > >> >>> though.
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>> > > The other question is how to handle the metadata fields. So far, we've
>>> > >> > >> >>> > > used bytestring fields for application-defined data. This is workable
>>> > >> > >> >>> > > if you want to use Protobuf to define the contents of those fields, but
>>> > >> > >> >>> > > requires you to pack/unpack your Protobuf into/from the bytestring
>>> > >> > >> >>> > > field. If we instead used the Protobuf Any field, a dynamically typed
>>> > >> > >> >>> > > field, this would be more convenient, but then we'd be exposing
>>> > >> > >> >>> > > Protobuf types. We could alternatively use a combination of a type
>>> > >> > >> >>> > > field and a bytestring field, mimicking what the Protobuf Any type
>>> > >> > >> >>> > > looks like on the wire. I'm not sure this is actually cleaner in any of
>>> > >> > >> >>> > > the language APIs, though.
>>> > >> > >> >>> >
>>> > >> > >> >>> > Leaving the deserialization of the app metadata to the particular
>>> > >> > >> >>> > Flight implementation seems on first principles like the most flexible
>>> > >> > >> >>> > thing. If Any is used, does that mean the metadata _must_ be a
>>> > >> > >> >>> > protobuf?
>>> > >> > >> >>> >
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>> If Any is used, we could still expose a bytes-based API, but it would
>>> > >> > >> >>> have some more wrapping. (We could put a ByteString in Any.) Then the
>>> > >> > >> >>> question would just be how to expose this (would be easier in Java,
>>> > >> > >> >>> harder in C++).
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>> > > David
>>> > >> > >> >>> > >
>>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf terms?
>>> > >> > >> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > > Regards
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > > Antoine.
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
>>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
>>> > >> > >> >>> > > >> application-level logic. (The official guide doesn't even mention it
>>> > >> > >> >>> > > >> in the encoding docs; I found
>>> > >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>>> > >> > >> >>> > > >> as well.)
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially inlining
>>> > >> > >> >>> > > >> the definition of Any, e.g.
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> message FlightMessage {
>>> > >> > >> >>> > > >>   oneof message {
>>> > >> > >> >>> > > >>     FlightData data = 1;
>>> > >> > >> >>> > > >>     FlightAny metadata = 2;
>>> > >> > >> >>> > > >>   }
>>> > >> > >> >>> > > >> }
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> message FlightAny {
>>> > >> > >> >>> > > >>   string type = 1;
>>> > >> > >> >>> > > >>   bytes data = 2;
>>> > >> > >> >>> > > >> }
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> Is this correct?
>>> > >> > >> >>> > > >>
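To make concrete the claim above that oneof does not change the wire encoding, here is a hand-rolled, pure-Python encoder/decoder for just the length-delimited fields of the FlightMessage/FlightAny messages quoted above (field numbers are taken from that snippet; FlightData is treated as opaque bytes). Encoding the `metadata` member produces exactly the bytes a plain `FlightAny metadata = 2;` field would; the oneof shows up only in the decoder, which keeps whichever member appears last on the wire, matching Protobuf's documented rule for multiple oneof members. This is an illustration, not generated Protobuf code, and the `app.Ack` type name is made up.

```python
from typing import List, Optional, Tuple


def encode_varint(n: int) -> bytes:
    """Base-128 varint, least significant group first (non-negative n only)."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def length_delimited(field_number: int, payload: bytes) -> bytes:
    # Tag = (field_number << 3) | wire_type; wire type 2 is length-delimited.
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def parse_fields(buf: bytes) -> List[Tuple[int, bytes]]:
    fields, pos = [], 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        if tag & 0x7 != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = decode_varint(buf, pos)
        fields.append((tag >> 3, buf[pos:pos + length]))
        pos += length
    return fields


def encode_flight_any(type_name: str, data: bytes) -> bytes:
    # message FlightAny { string type = 1; bytes data = 2; }
    return length_delimited(1, type_name.encode("utf-8")) + length_delimited(2, data)


def decode_flight_message(buf: bytes) -> Tuple[str, bytes]:
    # oneof message { FlightData data = 1; FlightAny metadata = 2; }
    which: Optional[str] = None
    value = b""
    for number, payload in parse_fields(buf):
        if number == 1:
            which, value = "data", payload
        elif number == 2:
            which, value = "metadata", payload  # last member seen wins
    if which is None:
        raise ValueError("no member of the oneof is set")
    return which, value
```

For instance, `length_delimited(2, encode_flight_any("app.Ack", b"\x01"))` starts with `b"\x12\x0c"` (tag 18 for field 2 with wire type 2, then a 12-byte payload), with nothing on the wire marking it as a oneof member.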
>>> > >> > >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut as
>>> > >> > >> >>> > > >> well, but at that point, I'd rather we be consistent with all of them,
>>> > >> > >> >>> > > >> rather than have one of the three methods do its own thing.
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> Thanks,
>>> > >> > >> >>> > > >> David
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
>>> > >> > >> >>> > > >>> I think we could probably expose the oneof behavior without exposing
>>> > >> > >> >>> > > >>> the protobuf functions. On the any... hmm. I guess we could expose as
>>> > >> > >> >>> > > >>> two fields: type and data. Then users could use it for whatever but if
>>> > >> > >> >>> > > >>> people wanted to treat it as any, it would work. (Basically a user
>>> > >> > >> >>> > > >>> could use any with it easily but they could also use any other
>>> > >> > >> >>> > > >>> mechanism). At least in java, the any concepts are pretty simple/diy.
>>> > >> > >> >>> > > >>> Are other language bindings less diy?
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but it just
>>> > >> > >> >>> > > >>> seemed a bit janky.
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >>> Thinking about the control message/wrapper object
>>> > thing,
>>> > >> > >> >>> > > >>> I
>>> > >> > >> >>> > > >>> wonder
>>> > >> > >> >>> > > >>> if
>>> > >> > >> >>> > we
>>> > >> > >> >>> > > >>> should redefine DoPut and DoGet to have the same
>>> > property
>>> > >> > >> >>> > > >>> if
>>> > >> > >> >>> > > >>> we
>>> > >> > >> >>> > think it
>>> > >> > >> >>> > > >>> is
>>> > >> > >> >>> > > >>> a good idea...
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com> wrote:
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >>>> I was definitely considering having control messages without
>>> > >> > >> >>> > > >>>> data, and I thought that could be encoded by a FlightData with
>>> > >> > >> >>> > > >>>> only app_metadata set. I think I understand your position now:
>>> > >> > >> >>> > > >>>> FlightData should always carry (some) data (with optional
>>> > >> > >> >>> > > >>>> metadata)?
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with the documentation
>>> > >> > >> >>> > > >>>> on FlightData in the Protobuf file. I was worried about having a
>>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from happening,
>>> > >> > >> >>> > > >>>> and overall having a clear separation between data and control
>>> > >> > >> >>> > > >>>> messages is cleaner.
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
>>> > >> > >> >>> > > >>>> exposing Protobuf by using bytes; would we want to change that
>>> > >> > >> >>> > > >>>> now?
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>> Best,
>>> > >> > >> >>> > > >>>> David
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
>>> > >> > >> >>> > > >>>>> Hey David,
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
>>> > >> > >> >>> > > >>>>> DoGet/DoPut for async. Yes, I was thinking more of Java, given
>>> > >> > >> >>> > > >>>>> Java gRPC's async-always pattern.
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think it is overloading
>>> > >> > >> >>> > > >>>>> the message to use metadata for this. If I want to send a control
>>> > >> > >> >>> > > >>>>> message independently of the data message, I would have to define
>>> > >> > >> >>> > > >>>>> something like an empty FlightData message that has custom
>>> > >> > >> >>> > > >>>>> metadata. Why not support a container object with a
>>> > >> > >> >>> > > >>>>> oneof{FlightData, Any} in it instead, so users can add more data
>>> > >> > >> >>> > > >>>>> as desired. The default impl could be a no-op for the Any
>>> > >> > >> >>> > > >>>>> messages.
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>>> Hi Jacques,
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>> Thanks for the comments.
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
>>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
>>> > >> > >> >>> > > >>>>>>   proposals, so I don't think we need a new message to carry
>>> > >> > >> >>> > > >>>>>>   that kind of information.
>>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
>>> > >> > >> >>> > > >>>>>>   messages as the fundamental API; it would actually be quite
>>> > >> > >> >>> > > >>>>>>   natural to implement in Flight/Java. I will note that it's not
>>> > >> > >> >>> > > >>>>>>   possible in C++/Python without spawning a thread, though. (In
>>> > >> > >> >>> > > >>>>>>   essence, gRPC-Java is async-always and gRPC-C++ is
>>> > >> > >> >>> > > >>>>>>   sync-always.) There are experimental C++ APIs that would let
>>> > >> > >> >>> > > >>>>>>   us do something similar to Java, but those are only in
>>> > >> > >> >>> > > >>>>>>   relatively recent gRPC versions and are still under
>>> > >> > >> >>> > > >>>>>>   development (unlike the interceptor APIs, which have been
>>> > >> > >> >>> > > >>>>>>   around for quite a while).
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>> Thanks,
>>> > >> > >> >>> > > >>>>>> David
>>> > >> > >> >>> > > >>>>>>
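David's point that an async "handler for incoming messages" API needs a helper thread in C++/Python can be illustrated with a small plain-Python sketch. It assumes a hypothetical blocking `read()` callable that returns the next message as bytes, or `None` at end of stream; `start_reader` and its callbacks are illustrative names only, not part of pyarrow.flight or gRPC.

```python
import threading
from typing import Callable, Optional


def start_reader(read: Callable[[], Optional[bytes]],
                 on_message: Callable[[bytes], None],
                 on_done: Callable[[], None]) -> threading.Thread:
    """Push each message from a blocking reader to a callback.

    Hypothetical helper: `read` stands in for a blocking stream read that
    returns the next message, or None once the peer has finished writing.
    """
    def loop() -> None:
        while True:
            msg = read()
            if msg is None:      # end of stream
                break
            on_message(msg)      # push-style delivery, as in gRPC-Java
        on_done()

    # Daemon thread, so a stalled reader cannot keep the process alive.
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

The extra thread is what turns a pull-style (sync-always) reader into push-style callbacks; that per-call thread is the cost David mentions for C++/Python.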
>>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
>>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
>>> > >> > >> >>> > > >>>>>>> discussing here depending on your thoughts.
>>> > >> > >> >>> > > >>>>>>>
>>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
>>> > >> > >> >>> > > >>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
>>> > >> > >> >>> > > >>>>>>>> strawman.
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave
>>> > >> > >> >>> > > >>>>>>>> uploads and downloads if you were so inclined. Right now,
>>> > >> > >> >>> > > >>>>>>>> synchronous APIs make this error-prone, e.g. if both client and
>>> > >> > >> >>> > > >>>>>>>> server wait for each other due to an application logic bug.
>>> > >> > >> >>> > > >>>>>>>> (gRPC doesn't give us the ability to have per-read timeouts,
>>> > >> > >> >>> > > >>>>>>>> only an overall timeout.) As an example of this happening with
>>> > >> > >> >>> > > >>>>>>>> DoPut, see ARROW-6063:
>>> > >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> This is mostly tangential, though; eventually we will want to
>>> > >> > >> >>> > > >>>>>>>> design asynchronous APIs for Flight as a whole. A bidirectional
>>> > >> > >> >>> > > >>>>>>>> stream like this (and like DoPut) just makes these pitfalls
>>> > >> > >> >>> > > >>>>>>>> easier to run into.
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the
>>> > >> > >> >>> > > >>>>>>>> main concern is that, depending on how you deploy, two separate
>>> > >> > >> >>> > > >>>>>>>> calls could get routed to different instances. Additionally,
>>> > >> > >> >>> > > >>>>>>>> gRPC has some reconnection behaviors; if the server goes away
>>> > >> > >> >>> > > >>>>>>>> in between the two calls, but it then restarts or there is
>>> > >> > >> >>> > > >>>>>>>> another instance available, the client will happily reconnect
>>> > >> > >> >>> > > >>>>>>>> to the new server without warning.
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Thanks,
>>> > >> > >> >>> > > >>>>>>>> David
>>> > >> > >> >>> > > >>>>>>>>
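Because gRPC offers only an overall deadline rather than per-read timeouts, a client and server that end up waiting on each other (the DoPut situation referenced via ARROW-6063) can block indefinitely. One possible application-level workaround, sketched here under the assumption of a hypothetical blocking `read()` callable that returns `None` at end of stream, is to perform the reads on a helper thread and wait on a queue with a timeout. `with_read_timeout` and `ReadTimeout` are illustrative names, not Flight or gRPC APIs.

```python
import queue
import threading
from typing import Callable, Iterator, Optional

_END = object()  # sentinel marking end of stream


class ReadTimeout(Exception):
    """Raised when no message arrives within the per-read timeout."""


def with_read_timeout(read: Callable[[], Optional[bytes]],
                      timeout: float) -> Iterator[bytes]:
    """Yield messages from a blocking `read`, raising if the peer stalls.

    Hypothetical helper: `read` returns None at end of stream.
    """
    q: queue.Queue = queue.Queue()

    def pump() -> None:
        # The blocking reads happen here, off the caller's thread.
        while True:
            msg = read()
            if msg is None:
                q.put(_END)
                return
            q.put(msg)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = q.get(timeout=timeout)  # per-read timeout
        except queue.Empty:
            raise ReadTimeout(f"no message within {timeout}s") from None
        if item is _END:
            return
        yield item
```

This only detects the stall: the helper thread stays blocked inside the stalled `read()`, so a real client would still need to cancel the underlying call.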
>>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
>>> > >> > >> >>> > > >>>>>>>>> Hey David,
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
>>> > >> > >> >>> > > >>>>>>>>> possibility of remote compute via Arrow buffers. One thing that
>>> > >> > >> >>> > > >>>>>>>>> would help me would be a concrete example of the API in a
>>> > >> > >> >>> > > >>>>>>>>> real-life use case. Also, what would the client experience be in
>>> > >> > >> >>> > > >>>>>>>>> terms of sync vs async? Would the client block until the
>>> > >> > >> >>> > > >>>>>>>>> bidirectional call returns, i.e. c = flight.vector_mult(a, b), or
>>> > >> > >> >>> > > >>>>>>>>> would the client wait to be signaled that the computation was
>>> > >> > >> >>> > > >>>>>>>>> done? If the latter, how is that different from a DoPut then
>>> > >> > >> >>> > > >>>>>>>>> DoGet? I suppose that this could be implemented without
>>> > >> > >> >>> > > >>>>>>>>> extending the RPC interface, but rather by a function/util?
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> Best,
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> Ryan
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> Hi all,
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
>>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to both
>>> > >> > >> >>> > > >>>>>>>>>> send and retrieve Arrow data within a single RPC call. To that
>>> > >> > >> >>> > > >>>>>>>>>> end, I've written up a proposal for a new RPC method:
>>> > >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document.
>>> > >> > >> >>> > > >>>>>>>>>> I'd appreciate any feedback; I think this is a relatively
>>> > >> > >> >>> > > >>>>>>>>>> straightforward addition - it is essentially "DoPutThenGet".
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided
>>> > >> > >> >>> > > >>>>>>>>>> to table the other format change I had proposed (on DoPut), as
>>> > >> > >> >>> > > >>>>>>>>>> it doesn't functionally change Flight, just the interpretation
>>> > >> > >> >>> > > >>>>>>>>>> of the semantics.
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> Thanks,
>>> > >> > >> >>> > > >>>>>>>>>> David
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> --
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>

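The proposal quoted above describes the new method as essentially "DoPutThenGet": within one call, the client uploads a stream and receives a stream back. Below is a single-process simulation of that shape, offered as a sketch only. Python lists of floats stand in for record batches, and two queues stand in for the two directions of the stream. `exchange` and the handler are hypothetical and model neither Arrow, gRPC, nor any eventual Flight API. The blocking style matches Ryan's `c = flight.vector_mult(a, b)` example, where the client waits for all results.

```python
import queue
import threading
from typing import Callable, Iterable, List

Batch = List[float]  # stand-in for an Arrow record batch (assumption)
_DONE = None         # end-of-stream marker in either direction


def exchange(handler: Callable[[Batch], Batch],
             batches: Iterable[Batch]) -> List[Batch]:
    """Simulate one bidirectional call inside a single process.

    A "server" thread reads client batches and writes transformed
    batches back on a second channel, all within the same call.
    """
    to_server: queue.Queue = queue.Queue()
    to_client: queue.Queue = queue.Queue()

    def server() -> None:
        while (batch := to_server.get()) is not _DONE:
            to_client.put(handler(batch))
        to_client.put(_DONE)

    threading.Thread(target=server, daemon=True).start()
    for batch in batches:        # upload phase ("put")
        to_server.put(batch)
    to_server.put(_DONE)
    results = []                 # download phase ("get")
    while (out := to_client.get()) is not _DONE:
        results.append(out)
    return results
```

For example, `exchange(lambda b: [2 * x for x in b], [[1.0, 2.0], [3.0]])` returns `[[2.0, 4.0], [6.0]]`. The queues here are unbounded, so writing everything before reading cannot stall; with bounded buffers (as under real flow control), a client that writes all input before reading can deadlock against a server blocked on writing results, which is the kind of pitfall discussed in the thread.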
Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Hey Andy,

I've been rather busy unfortunately. I had started on an
implementation in C++ to provide as part of this discussion, but it's
not complete. I'm hoping to have more done in March.

Best,
David

On 2/25/20, Andy Grove <an...@gmail.com> wrote:
> I was wondering if there had been any momentum on this (the BiDirectional
> RPC design)?
>
> I'm interested in this for the use case of Apache Spark sending a stream of
> data to another process to invoke custom code and then receive a stream
> back with the transformed data.
>
> Thanks,
>
> Andy.
>
>
>
> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org> wrote:
>
>> I support moving forward with the current proposal.
>>
>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:
>>
>> > Just following up here again, any other thoughts?
>> >
>> > I think we do have justifications for potentially separate streams in
>> > a call, but that's more of an orthogonal question - it doesn't need to
>> > be addressed here. I do agree that it very much complicates things.
>> >
>> > Thanks,
>> > David
>> >
>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
>> > > I would generally agree with this. Note that you have the possibility
>> > > to use unions-of-structs to send record batches with different
>> > > schemas in the same stream, though with some added complexity on
>> > > each side.
>> > >
>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
>> > >>
>> > >> I'd vote for explicitly not supporting this. We should keep our
>> > >> primitives narrow.
>> > >>
>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
>> > >>
>> > >> > Thanks for the feedback.
>> > >> >
>> > >> > I do think if we had explicitly embraced gRPC from the beginning,
>> > >> > there are a lot of places where things could be made more
>> > >> > ergonomic, including with the metadata fields. But it would also
>> > >> > have locked us out of potential future transports.
>> > >> >
>> > >> > On another note: I hesitate to put too much into this method, but
>> > >> > we are looking at use cases where, potentially, a client may want
>> > >> > to upload multiple distinct datasets (with differing schemas).
>> > >> > (This is a little tentative, and I can get more details...) Right
>> > >> > now, each logical stream in Flight must have a single, consistent
>> > >> > schema; would it make sense to look at ways to relax this, or
>> > >> > declare this explicitly out of scope (and require multiple calls
>> > >> > and coordination with the deployment topology) in order to
>> > >> > accomplish this?
>> > >> >
>> > >> > Best,
>> > >> > David
>> > >> >
>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > >> > > Fair enough. I'm okay with the bytes approach and the proposal
>> > >> > > looks good to me.
>> > >> > >
>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
>> > >> > >
>> > >> > >> I've updated the proposal.
>> > >> > >>
>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
>> > >> > >> errors/metadata, I still think using bytes is preferable:
>> > >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
>> > >> > >>   types,
>> > >> > >> - We wouldn't be able to practically expose the Protobuf field to
>> > >> > >>   C++ users without causing build pains,
>> > >> > >> - We can't let Python users take advantage of the Protobuf field
>> > >> > >>   without somehow being compatible with the Protobuf wheels (by
>> > >> > >>   linking to the same version, and doing magic to turn the C++
>> > >> > >>   Protobufs into the Python ones),
>> > >> > >> - All our other application-defined fields are already bytes.
>> > >> > >>
>> > >> > >> Applications that want structure can encode JSON or Protobuf Any
>> > >> > >> into the bytes field themselves, much as you can already do for
>> > >> > >> Ticket, commands in FlightDescriptors, and application metadata in
>> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient than
>> > >> > >> using Any directly, since Any itself is a bytes field with a tag,
>> > >> > >> and must invoke the Protobuf deserializer again to read the actual
>> > >> > >> message.
>> > >> > >>
>> > >> > >> If we decide on using bytes, then I don't think it makes sense to
>> > >> > >> define a new message with a oneof either, since it would be
>> > >> > >> redundant.
>> > >> > >>
>> > >> > >> Thanks,
>> > >> > >> David
>> > >> > >>
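David's argument above is that applications wanting structured metadata can encode it into the existing bytes fields themselves. As one possible convention, assumed here purely for illustration and not defined by Flight, an application could prefix its payload with a length-delimited type name, loosely mimicking the "type field plus bytestring field" shape of Protobuf's Any. `pack_metadata` and `unpack_metadata` are hypothetical helpers.

```python
import struct
from typing import Tuple

# Hypothetical app-level convention (not part of Flight): a 4-byte
# big-endian length, the UTF-8 type name, then the raw payload bytes.
_LEN = struct.Struct(">I")


def pack_metadata(type_name: str, payload: bytes) -> bytes:
    """Encode (type name, payload) into a single bytes value."""
    name = type_name.encode("utf-8")
    return _LEN.pack(len(name)) + name + payload


def unpack_metadata(buf: bytes) -> Tuple[str, bytes]:
    """Decode bytes produced by pack_metadata back into (type, payload)."""
    if len(buf) < _LEN.size:
        raise ValueError("metadata too short for type header")
    (n,) = _LEN.unpack_from(buf, 0)
    end = _LEN.size + n
    if len(buf) < end:
        raise ValueError("truncated type name")
    return buf[_LEN.size:end].decode("utf-8"), buf[end:]
```

The payload can be JSON, a serialized Protobuf message, or anything else; for example, `pack_metadata("example.Progress", b'{"rows": 1024}')` round-trips through `unpack_metadata` to `("example.Progress", b'{"rows": 1024}')`. Flight itself only sees opaque bytes, which is the flexibility argued for in the thread.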
>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
>> > >> > >> > I've been extremely backlogged; I will update the proposal when I
>> > >> > >> > get a chance and reply here when done.
>> > >> > >> >
>> > >> > >> > Best,
>> > >> > >> > David
>> > >> > >> >
>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
>> > >> > >> >> Bumping this discussion since a couple of weeks have passed. It
>> > >> > >> >> seems there are still some questions here; could we summarize
>> > >> > >> >> what the alternatives are, along with any public API
>> > >> > >> >> implications, so we can try to render a decision?
>> > >> > >> >>
>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.davidm96@gmail.com> wrote:
>> > >> > >> >>>
>> > >> > >> >>> Hi Wes,
>> > >> > >> >>>
>> > >> > >> >>> Responses inline:
>> > >> > >> >>>
>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmckinn@gmail.com> wrote:
>> > >> > >> >>>
>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
>> > >> > >> >>> > >
>> > >> > >> >>> > > The question is whether to repurpose the existing FlightData
>> > >> > >> >>> > > structure, and allow for the metadata field to be filled in and
>> > >> > >> >>> > > data fields to be blank (as a control message), or to wrap the
>> > >> > >> >>> > > FlightData structure in another structure that explicitly
>> > >> > >> >>> > > distinguishes between control and data messages.
>> > >> > >> >>> >
>> > >> > >> >>> > I'm not super against having metadata-only FlightData with empty
>> > >> > >> >>> > body. One question to consider is what changes (if any) would need
>> > >> > >> >>> > to be made to public APIs in either scenario.
>> > >> > >> >>> >
>> > >> > >> >>>
>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
>> > >> > >> >>> messages in the future. This would be a breaking change, but
>> > >> > >> >>> wouldn't change the wire format. I think the APIs could be changed
>> > >> > >> >>> backwards compatibly, though.
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>> > > The other question is how to handle the metadata fields. So far,
>> > >> > >> >>> > > we've used bytestring fields for application-defined data. This
>> > >> > >> >>> > > is workable if you want to use Protobuf to define the contents of
>> > >> > >> >>> > > those fields, but requires you to pack/unpack your Protobuf
>> > >> > >> >>> > > into/from the bytestring field. If we instead used the Protobuf
>> > >> > >> >>> > > Any field, a dynamically typed field, this would be more
>> > >> > >> >>> > > convenient, but then we'd be exposing Protobuf types. We could
>> > >> > >> >>> > > alternatively use a combination of a type field and a bytestring
>> > >> > >> >>> > > field, mimicking what the Protobuf Any type looks like on the
>> > >> > >> >>> > > wire. I'm not sure this is actually cleaner in any of the
>> > >> > >> >>> > > language APIs, though.
>> > >> > >> >>> >
>> > >> > >> >>> > Leaving the deserialization of the app metadata to the particular
>> > >> > >> >>> > Flight implementation seems on first principles like the most
>> > >> > >> >>> > flexible thing. If Any is used, does that mean the metadata _must_
>> > >> > >> >>> > be a protobuf?
>> > >> > >> >>> >
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>> If Any is used, we could still expose a bytes-based API, but it
>> > >> > >> >>> would have some more wrapping. (We could put a ByteString in Any.)
>> > >> > >> >>> Then the question would just be how to expose this (it would be
>> > >> > >> >>> easier in Java, harder in C++).
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>> > > David
>> > >> > >> >>> > >
>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
>> > >> > >> >>> > > >
>> > >> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf
>> > >> > >> >>> > > > terms? Knowledge of protobuf shouldn't be required to use Flight.
>> > >> > >> >>> > > >
>> > >> > >> >>> > > > Regards
>> > >> > >> >>> > > >
>> > >> > >> >>> > > > Antoine.
>> > >> > >> >>> > > >
>> > >> > >> >>> > > >
>> > >> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
>> > >> > >> >>> > > >> application-level logic. (The official guide doesn't even mention
>> > >> > >> >>> > > >> it in the encoding docs; I found
>> > >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> > >> > >> >>> > > >> as well.)
>> > >> > >> >>> > > >>
>> > >> > >> >>> > > >> If I follow you, Jacques, then you are essentially proposing
>> > >> > >> >>> > > >> inlining the definition of Any, e.g.
>> > >> > >> >>> > > >>
>> > >> > >> >>> > > >> message FlightMessage {
>> > >> > >> >>> > > >>   oneof message {
>> > >> > >> >>> > > >>     FlightData data = 1;
>> > >> > >> >>> > > >>     FlightAny metadata = 2;
>> > >> > >> >>> > > >>   }
>> > >> > >> >>> > > >> }
>> > >> > >> >>> > > >>
>> > >> > >> >>> > > >> message FlightAny {
>> > >> > >> >>> > > >>   string type = 1;
>> > >> > >> >>> > > >>   bytes data = 2;
>> > >> > >> >>> > > >> }
>> > >> > >> >>> > > >>
>> > >> > >> >>> > > >> Is this correct?
>> > >> > >> >>> > > >>
>> > >> > >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
>> > >> > >> >>> > > >> as well, but at that point, I'd rather we be consistent with all
>> > >> > >> >>> > > >> of them, rather than have one of the three methods do its own
>> > >> > >> >>> > > >> thing.
>> > >> > >> >>> > > >>
>> > >> > >> >>> > > >> Thanks,
>> > >> > >> >>> > > >> David
>> > >> > >> >>> > > >>
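To make the FlightMessage/FlightAny strawman above concrete without requiring Protobuf knowledge (Antoine's request), here is a plain-Python model of it. It is a sketch assuming simplified dataclass stand-ins: the real FlightData message has more fields, and none of these classes belong to any Flight library. The `dispatch` helper is hypothetical and follows Jacques's suggestion that unrecognized Any-style messages default to a no-op.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FlightData:      # simplified stand-in; the real message has more fields
    data_body: bytes
    app_metadata: bytes = b""


@dataclass
class FlightAny:       # mirrors the strawman: string type, bytes data
    type: str
    data: bytes


@dataclass
class FlightMessage:
    """Exactly one of `data` / `metadata` is set, like a Protobuf oneof."""
    data: Optional[FlightData] = None
    metadata: Optional[FlightAny] = None

    def __post_init__(self) -> None:
        if (self.data is None) == (self.metadata is None):
            raise ValueError("set exactly one of data or metadata")


def dispatch(msg: FlightMessage,
             on_data: Callable[[FlightData], None],
             control_handlers: Dict[str, Callable[[bytes], None]]) -> None:
    """Route data messages and typed control messages to their handlers."""
    if msg.data is not None:
        on_data(msg.data)
    else:
        # Control message: unknown types are silently ignored (no-op).
        handler = control_handlers.get(msg.metadata.type)
        if handler is not None:
            handler(msg.metadata.data)
```

One difference from Protobuf: assigning one member of a real oneof silently clears the other, whereas this sketch rejects an ambiguous message at construction time.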
>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > >> > >> >>> > > >>> I think we could probably expose the oneof behavior without
>> > >> > >> >>> > > >>> exposing the protobuf functions. On the Any... hmm. I guess we
>> > >> > >> >>> > > >>> could expose it as two fields: type and data. Then users could
>> > >> > >> >>> > > >>> use it for whatever, but if people wanted to treat it as Any, it
>> > >> > >> >>> > > >>> would work. (Basically, a user could easily use Any with it, but
>> > >> > >> >>> > > >>> they could also use any other mechanism.) At least in Java, the
>> > >> > >> >>> > > >>> Any concepts are pretty simple/DIY. Are other language bindings
>> > >> > >> >>> > > >>> less DIY?
>> > >> > >> >>> > > >>>
>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata, but
>> > >> > >> >>> > > >>> it just seemed a bit janky.
>> > >> > >> >>> > > >>>
>> > >> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I
>> > >> > >> >>> > > >>> wonder if we should redefine DoPut and DoGet to have the same
>> > >> > >> >>> > > >>> property if we think it is a good idea...
>> > >> > >> >>> > > >>>
>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com> wrote:
>> > >> > >> >>> > > >>>
>> > >> > >> >>> > > >>>> I was definitely considering having control messages without
>> > >> > >> >>> > > >>>> data, and I thought that could be encoded by a FlightData with
>> > >> > >> >>> > > >>>> only app_metadata set. I think I understand your position now:
>> > >> > >> >>> > > >>>> FlightData should always carry (some) data (with optional
>> > >> > >> >>> > > >>>> metadata)?
>> > >> > >> >>> > > >>>>
>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with the documentation
>> > >> > >> >>> > > >>>> on FlightData in the Protobuf file. I was worried about having a
>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from happening,
>> > >> > >> >>> > > >>>> and overall having a clear separation between data and control
>> > >> > >> >>> > > >>>> messages is cleaner.
>> > >> > >> >>> > > >>>>
>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
>> > >> > >> >>> > > >>>> exposing Protobuf by using bytes; would we want to change that
>> > >> > >> >>> > > >>>> now?
>> > >> > >> >>> > > >>>>
>> > >> > >> >>> > > >>>> Best,
>> > >> > >> >>> > > >>>> David
>> > >> > >> >>> > > >>>>
>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > >> > >> >>> > > >>>>> Hey David,
>> > >> > >> >>> > > >>>>>
>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for DoGet/DoPut
>> > >> > >> >>> > > >>>>> for async. Yes, I was thinking more of Java, given gRPC-Java's
>> > >> > >> >>> > > >>>>> async-always pattern.
>> > >> > >> >>> > > >>>>>
>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think using metadata for
>> > >> > >> >>> > > >>>>> this overloads the message. If I want to send a control message
>> > >> > >> >>> > > >>>>> independently of the data message, I would have to define something
>> > >> > >> >>> > > >>>>> like an empty FlightData message that has custom metadata. Why not
>> > >> > >> >>> > > >>>>> support a container object with a oneof{FlightData, Any} in it
>> > >> > >> >>> > > >>>>> instead, so users can add more data as desired? The default impl
>> > >> > >> >>> > > >>>>> could be a no-op for the Any messages.
>> > >> > >> >>> > > >>>>>
>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com>
>> > >> > >> >>> > > >>>>> wrote:
>> > >> > >> >>> > > >>>>>
>> > >> > >> >>> > > >>>>>> Hi Jacques,
>> > >> > >> >>> > > >>>>>>
>> > >> > >> >>> > > >>>>>> Thanks for the comments.
>> > >> > >> >>> > > >>>>>>
>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
>> > >> > >> >>> > > >>>>>> proposals, so I don't think we need a new message to carry that
>> > >> > >> >>> > > >>>>>> kind of information.
>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
>> > >> > >> >>> > > >>>>>> messages as the fundamental API; it would actually be quite
>> > >> > >> >>> > > >>>>>> natural to implement in Flight/Java. I will note that it's not
>> > >> > >> >>> > > >>>>>> possible in C++/Python without spawning a thread, though. (In
>> > >> > >> >>> > > >>>>>> essence, gRPC-Java is async-always and gRPC-C++ is sync-always.)
>> > >> > >> >>> > > >>>>>> There are experimental C++ APIs that would let us do something
>> > >> > >> >>> > > >>>>>> similar to Java, but those are only in relatively recent gRPC
>> > >> > >> >>> > > >>>>>> versions and are still under development (unlike the interceptor
>> > >> > >> >>> > > >>>>>> APIs, which have been around for quite a while).
>> > >> > >> >>> > > >>>>>>
>> > >> > >> >>> > > >>>>>> Thanks,
>> > >> > >> >>> > > >>>>>> David
>> > >> > >> >>> > > >>>>>>
>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
>> > >> > >> >>> > > >>>>>>> discussion here depending on your thoughts.
>> > >> > >> >>> > > >>>>>>>
>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com>
>> > >> > >> >>> > > >>>>>>> wrote:
>> > >> > >> >>> > > >>>>>>>
>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave
>> > >> > >> >>> > > >>>>>>>> uploads and downloads if you were so inclined. Right now,
>> > >> > >> >>> > > >>>>>>>> synchronous APIs make this error-prone, e.g. if both client and
>> > >> > >> >>> > > >>>>>>>> server wait for each other due to an application logic bug. (gRPC
>> > >> > >> >>> > > >>>>>>>> doesn't give us the ability to have per-read timeouts, only an
>> > >> > >> >>> > > >>>>>>>> overall timeout.) As an example of this happening with DoPut, see
>> > >> > >> >>> > > >>>>>>>> ARROW-6063:
>> > >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> This is mostly tangential, though; eventually we will want to
>> > >> > >> >>> > > >>>>>>>> design asynchronous APIs for Flight as a whole. A bidirectional
>> > >> > >> >>> > > >>>>>>>> stream like this (and like DoPut) just makes these pitfalls easier
>> > >> > >> >>> > > >>>>>>>> to run into.
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
>> > >> > >> >>> > > >>>>>>>> concern is that depending on how you deploy, two separate calls
>> > >> > >> >>> > > >>>>>>>> could get routed to different instances. Additionally, gRPC has
>> > >> > >> >>> > > >>>>>>>> some reconnection behaviors; if the server goes away in between the
>> > >> > >> >>> > > >>>>>>>> two calls, but it then restarts or there is another instance
>> > >> > >> >>> > > >>>>>>>> available, the client will happily reconnect to the new server
>> > >> > >> >>> > > >>>>>>>> without warning.
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> Thanks,
>> > >> > >> >>> > > >>>>>>>> David
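A toy simulation of the routing concern raised above, in plain Python with no Flight or gRPC involved. `Instance`, `RoundRobinBalancer`, `put_then_get` and `exchange` are hypothetical names, and per-call round-robin balancing is an assumed deployment rather than something the thread specifies. The sketch only shows that two independent calls may land on different servers, while a single bidirectional call keeps both directions on the same server.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stateful server instance (a stand-in, not a Flight API)."""
    name: str
    store: dict = field(default_factory=dict)

    def do_put(self, key, rows):
        self.store[key] = list(rows)

    def do_get(self, key):
        if key not in self.store:
            raise KeyError(f"instance {self.name} has no data for {key}")
        # Stand-in "computation": double every uploaded value.
        return [2 * r for r in self.store[key]]


class RoundRobinBalancer:
    """Assumed deployment: every new call goes to the next instance in turn."""

    def __init__(self, instances):
        self._next = itertools.cycle(instances)

    def route_call(self):
        return next(self._next)


def put_then_get(balancer, key, rows):
    # Two separate calls, each routed independently by the balancer.
    balancer.route_call().do_put(key, rows)
    return balancer.route_call().do_get(key)


def exchange(balancer, key, rows):
    # One call: upload and download are served by the same instance.
    instance = balancer.route_call()
    instance.do_put(key, rows)
    return instance.do_get(key)


if __name__ == "__main__":
    lb = RoundRobinBalancer([Instance("a"), Instance("b")])
    try:
        put_then_get(lb, "job-1", [1, 2, 3])
    except KeyError as err:
        print("DoPut+DoGet style failed:", err)
    print(exchange(lb, "job-2", [1, 2, 3]))  # [2, 4, 6]
```

Sticky sessions or client-side pinning can mitigate this for DoPut+DoGet, but a single call avoids the coordination entirely.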
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
>> > >> > >> >>> > > >>>>>>>>> Hey David,
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
>> > >> > >> >>> > > >>>>>>>>> possibility of remote compute via Arrow buffers. One thing that
>> > >> > >> >>> > > >>>>>>>>> would help me would be a concrete example of the API in a real-life
>> > >> > >> >>> > > >>>>>>>>> use case. Also, what would the client experience be in terms of
>> > >> > >> >>> > > >>>>>>>>> sync vs async? Would the client block until the bidirectional call
>> > >> > >> >>> > > >>>>>>>>> returns, i.e. c = flight.vector_mult(a, b), or would the client
>> > >> > >> >>> > > >>>>>>>>> wait to be signaled that computation was done? If the latter, how
>> > >> > >> >>> > > >>>>>>>>> is that different from a DoPut then DoGet? I suppose that this
>> > >> > >> >>> > > >>>>>>>>> could be implemented without extending the RPC interface but rather
>> > >> > >> >>> > > >>>>>>>>> by a function/util?
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> Best,
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> Ryan
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com>
>> > >> > >> >>> > > >>>>>>>>> wrote:
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>> Hi all,
>> > >> > >> >>> > > >>>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to both send
>> > >> > >> >>> > > >>>>>>>>>> and retrieve Arrow data within a single RPC call. To that end, I've
>> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
>> > >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>> > >> > >> >>> > > >>>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document.
>> > >> > >> >>> > > >>>>>>>>>> I'd appreciate any feedback; I think this is a relatively
>> > >> > >> >>> > > >>>>>>>>>> straightforward addition - it is essentially "DoPutThenGet".
>> > >> > >> >>> > > >>>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it
>> > >> > >> >>> > > >>>>>>>>>> doesn't functionally change Flight, just the interpretation of the
>> > >> > >> >>> > > >>>>>>>>>> semantics.
>> > >> > >> >>> > > >>>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>> Thanks,
>> > >> > >> >>> > > >>>>>>>>>> David
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> --
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/>
>> > >> > >> >>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join our
>> > >> > >> >>> > > >>>>>>>>> community site <https://community.dremio.com/> & Download Dremio
>> > >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/download>
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>
>> > >> > >> >>> > > >>>>>>
>> > >> > >> >>> > > >>>>>
>> > >> > >> >>> > > >>>>
>> > >> > >> >>> > > >>>
>> > >> > >> >>> > > >
>> > >> > >> >>> >
>> > >> > >> >>
>> > >> > >> >
>> > >> > >>
>> > >> > >
>> > >> >
>> > >
>> >
>>
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Andy Grove <an...@gmail.com>.
I was wondering if there had been any momentum on this (the BiDirectional
RPC design)?

I'm interested in this for the use case of Apache Spark sending a stream of
data to another process to invoke custom code and then receive a stream
back with the transformed data.

Thanks,

Andy.
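Andy's use case (stream data to another process, get a transformed stream back) is the round trip the proposal targets. Below is a stdlib-only simulation of that shape, not the Flight API: two bounded queues stand in for the two directions of one call, and `client_exchange`/`server_exchange` are hypothetical names. It also illustrates the sync-vs-async pitfall discussed in the thread: the client uploads on a separate thread so it can drain results concurrently. Error propagation is deliberately omitted to keep the sketch short.

```python
import queue
import threading

_END = object()  # sentinel marking the end of one direction of the stream


def server_exchange(inbox, outbox, transform):
    """Hypothetical server side: read each incoming batch, stream back its transform."""
    while True:
        batch = inbox.get()
        if batch is _END:
            outbox.put(_END)
            return
        outbox.put(transform(batch))


def client_exchange(batches, transform, buffer_size=2):
    """Hypothetical client side of one simulated bidirectional call.

    Bounded queues stand in for transport flow control. The upload runs on its
    own thread so results are drained while batches are still being written; a
    single thread that wrote everything before reading anything could block
    forever once both buffers filled up.
    """
    to_server = queue.Queue(maxsize=buffer_size)
    from_server = queue.Queue(maxsize=buffer_size)
    server = threading.Thread(target=server_exchange,
                              args=(to_server, from_server, transform))
    server.start()

    def upload():
        for batch in batches:
            to_server.put(batch)
        to_server.put(_END)

    uploader = threading.Thread(target=upload)
    uploader.start()

    results = []
    while True:
        item = from_server.get()
        if item is _END:
            break
        results.append(item)
    uploader.join()
    server.join()
    return results


if __name__ == "__main__":
    out = client_exchange([[1, 2], [3], [4, 5, 6]], lambda b: [x * 10 for x in b])
    print(out)  # [[10, 20], [30], [40, 50, 60]]
```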



On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org> wrote:

> I support moving forward with the current proposal.
>
> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:
>
> > Just following up here again, any other thoughts?
> >
> > I think we do have justifications for potentially separate streams in
> > a call, but that's more of an orthogonal question - it doesn't need to
> > be addressed here. I do agree that it very much complicates things.
> >
> > Thanks,
> > David
> >
> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> > > I would generally agree with this. Note that you have the possibility
> > > to use unions-of-structs to send record batches with different schemas
> > > in the same stream, though with some added complexity on each side.
> > >
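Wes's unions-of-structs suggestion can be pictured with a pure-Python model of Arrow's dense union layout: one type id per logical row plus an offset into the chosen child. The class and field names are invented for illustration; in pyarrow the real array would be built with `pyarrow.UnionArray.from_dense`, with each child being a struct array carrying its own schema.

```python
from dataclasses import dataclass, field


@dataclass
class DenseUnionOfStructs:
    """Pure-Python model of an Arrow dense union whose children are struct columns.

    Each child holds rows of one schema. For logical row i, type_ids[i] picks the
    child and offsets[i] is the row's position inside that child.
    """
    child_names: list
    children: list = field(default_factory=list)
    type_ids: list = field(default_factory=list)
    offsets: list = field(default_factory=list)

    def __post_init__(self):
        if not self.children:
            self.children = [[] for _ in self.child_names]

    def append(self, child_name, row):
        tid = self.child_names.index(child_name)
        self.type_ids.append(tid)
        self.offsets.append(len(self.children[tid]))
        self.children[tid].append(row)

    def row(self, i):
        tid = self.type_ids[i]
        return self.child_names[tid], self.children[tid][self.offsets[i]]


if __name__ == "__main__":
    u = DenseUnionOfStructs(["trades", "quotes"])
    u.append("trades", {"price": 10.5, "qty": 100})
    u.append("quotes", {"bid": 10.4, "ask": 10.6})
    u.append("trades", {"price": 10.6, "qty": 50})
    print(u.type_ids, u.offsets)  # [0, 1, 0] [0, 0, 1]
    print(u.row(2))  # ('trades', {'price': 10.6, 'qty': 50})
```

The "added complexity on each side" is visible here: both ends must agree on the child order and interpret type ids before they can read a row.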
> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
> > >>
> > >> I'd vote for explicitly not supported. We should keep our primitives
> > >> narrow.
> > >>
> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
> > >>
> > >> > Thanks for the feedback.
> > >> >
> > >> > I do think if we had explicitly embraced gRPC from the beginning,
> > >> > there are a lot of places where things could be made more ergonomic,
> > >> > including with the metadata fields. But it would also have locked us
> > >> > out of potential future transports.
> > >> >
> > >> > On another note: I hesitate to put too much into this method, but we
> > >> > are looking at use cases where potentially, a client may want to
> > >> > upload multiple distinct datasets (with differing schemas). (This is a
> > >> > little tentative, and I can get more details...) Right now, each
> > >> > logical stream in Flight must have a single, consistent schema; would
> > >> > it make sense to look at ways to relax this, or declare this
> > >> > explicitly out of scope (and require multiple calls and coordination
> > >> > with the deployment topology) in order to accomplish this?
> > >> >
> > >> > Best,
> > >> > David
> > >> >
> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >> > > Fair enough. I'm okay with the bytes approach, and the proposal looks
> > >> > > good to me.
> > >> > >
> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
> > >> > >
> > >> > >> I've updated the proposal.
> > >> > >>
> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> > >> > >> errors/metadata, I still think using bytes is preferable:
> > >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
> > >> > >>   types,
> > >> > >> - We wouldn't be able to practically expose the Protobuf field to C++
> > >> > >>   users without causing build pains,
> > >> > >> - We can't let Python users take advantage of the Protobuf field
> > >> > >>   without somehow being compatible with the Protobuf wheels (by
> > >> > >>   linking to the same version, and doing magic to turn the C++
> > >> > >>   Protobufs into the Python ones),
> > >> > >> - All our other application-defined fields are already bytes.
> > >> > >>
> > >> > >> Applications that want structure can encode JSON or Protobuf Any into
> > >> > >> the bytes field themselves, much as you can already do for Ticket,
> > >> > >> commands in FlightDescriptors, and application metadata in
> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
> > >> > >> Any directly, since Any itself is a bytes field with a type tag, and
> > >> > >> you must invoke the Protobuf deserializer again to read the actual
> > >> > >> message.
> > >> > >>
> > >> > >> If we decide on using bytes, then I don't think it makes sense to
> > >> > >> define a new message with a oneof either, since it would be
> > >> > >> redundant.
> > >> > >>
> > >> > >> Thanks,
> > >> > >> David
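David's point that applications can put their own structure into the bytes field can be sketched with JSON. The `kind` envelope and both function names are assumptions for illustration, not part of Flight; the same pattern works with a serialized Protobuf message in place of JSON. Note the two-step decode (raw bytes first, then the typed payload), which mirrors what reading an Any involves.

```python
import json


def encode_app_metadata(kind, **fields):
    """Pack structured metadata as UTF-8 JSON bytes for an opaque bytes field.

    The {"kind": ..., ...} envelope is an application convention (an assumption),
    playing the role that Any's type URL plays in Protobuf.
    """
    body = {"kind": kind, **fields}
    return json.dumps(body, separators=(",", ":"), sort_keys=True).encode("utf-8")


def decode_app_metadata(raw, handlers):
    """Decode the bytes, then dispatch on "kind"; unknown kinds are ignored."""
    if not raw:
        return None  # no metadata attached
    body = json.loads(raw.decode("utf-8"))
    handler = handlers.get(body.pop("kind", None))
    return handler(**body) if handler else None


if __name__ == "__main__":
    raw = encode_app_metadata("progress", rows_done=1024)
    print(raw)  # b'{"kind":"progress","rows_done":1024}'
    print(decode_app_metadata(raw, {"progress": lambda rows_done: rows_done}))  # 1024
```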
> > >> > >>
> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> > >> > >> > I've been extremely backlogged; I will update the proposal when I
> > >> > >> > get a chance and reply here when done.
> > >> > >> >
> > >> > >> > Best,
> > >> > >> > David
> > >> > >> >
> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> > >> > >> >> Bumping this discussion since a couple of weeks have passed. It
> > >> > >> >> seems there are still some questions here; could we summarize what
> > >> > >> >> the alternatives are, along with any public API implications, so we
> > >> > >> >> can try to render a decision?
> > >> > >> >>
> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.davidm96@gmail.com>
> > >> > >> >> wrote:
> > >> > >> >>>
> > >> > >> >>> Hi Wes,
> > >> > >> >>>
> > >> > >> >>> Responses inline:
> > >> > >> >>>
> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmckinn@gmail.com> wrote:
> > >> > >> >>>
> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
> > >> > >> >>> > >
> > >> > >> >>> > > The question is whether to repurpose the existing FlightData
> > >> > >> >>> > > structure, and allow for the metadata field to be filled in and
> > >> > >> >>> > > data fields to be blank (as a control message), or to wrap the
> > >> > >> >>> > > FlightData structure in another structure that explicitly
> > >> > >> >>> > > distinguishes between control and data messages.
> > >> > >> >>> >
> > >> > >> >>> > I'm not super against having metadata-only FlightData with an
> > >> > >> >>> > empty body. One question to consider is what changes (if any)
> > >> > >> >>> > would need to be made to public APIs in either scenario.
> > >> > >> >>> >
> > >> > >> >>>
> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
> > >> > >> >>> messages in the future. This would be a breaking change, but
> > >> > >> >>> wouldn't change the wire format. I think the APIs could be changed
> > >> > >> >>> backwards compatibly, though.
> > >> > >> >>>
> > >> > >> >>>
> > >> > >> >>>
> > >> > >> >>> > > The other question is how to handle the metadata fields. So far,
> > >> > >> >>> > > we've used bytestring fields for application-defined data. This is
> > >> > >> >>> > > workable if you want to use Protobuf to define the contents of
> > >> > >> >>> > > those fields, but requires you to pack/unpack your Protobuf
> > >> > >> >>> > > into/from the bytestring field. If we instead used the Protobuf
> > >> > >> >>> > > Any field, a dynamically typed field, this would be more
> > >> > >> >>> > > convenient, but then we'd be exposing Protobuf types. We could
> > >> > >> >>> > > alternatively use a combination of a type field and a bytestring
> > >> > >> >>> > > field, mimicking what the Protobuf Any type looks like on the wire.
> > >> > >> >>> > > I'm not sure this is actually cleaner in any of the language APIs,
> > >> > >> >>> > > though.
> > >> > >> >>> >
> > >> > >> >>> > Leaving the deserialization of the app metadata to the particular
> > >> > >> >>> > Flight implementation seems on first principles like the most
> > >> > >> >>> > flexible thing. If Any is used, does that mean the metadata _must_
> > >> > >> >>> > be a protobuf?
> > >> > >> >>> >
> > >> > >> >>>
> > >> > >> >>>
> > >> > >> >>> If Any is used, we could still expose a bytes-based API, but it
> > >> > >> >>> would have some more wrapping. (We could put a ByteString in Any.)
> > >> > >> >>> Then the question would just be how to expose this (it would be
> > >> > >> >>> easier in Java, harder in C++).
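The "type field plus bytestring field" alternative mentioned above really is indistinguishable from Any on the wire when the field numbers match: a hand-written `FlightAny { string type = 1; bytes data = 2; }` (the hypothetical message proposed in this thread) encodes exactly like `google.protobuf.Any { string type_url = 1; bytes value = 2; }`, because field names never appear in the encoding. The hand-rolled encoder below is a minimal sketch that only handles length-delimited fields and, unlike a real proto3 encoder, does not omit empty fields.

```python
def encode_varint(n):
    """Protobuf base-128 varint: 7 bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_len_field(field_number, payload):
    """One length-delimited field (wire type 2): tag varint, length varint, bytes."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_protobuf_any(type_url, value):
    # google.protobuf.Any: string type_url = 1; bytes value = 2;
    return encode_len_field(1, type_url.encode("utf-8")) + encode_len_field(2, value)


def encode_flight_any(type_name, data):
    # FlightAny as sketched in the thread: string type = 1; bytes data = 2;
    return encode_len_field(1, type_name.encode("utf-8")) + encode_len_field(2, data)


if __name__ == "__main__":
    a = encode_protobuf_any("t", b"xy")
    b = encode_flight_any("t", b"xy")
    print(a == b, a)  # True b'\n\x01t\x12\x02xy'
```

So the choice between Any and two plain fields is about what each language binding exposes, not about compatibility on the wire.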
> > >> > >> >>>
> > >> > >> >>>
> > >> > >> >>>
> > >> > >> >>> > > David
> > >> > >> >>> > >
> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> > >> > >> >>> > > >
> > >> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf
> > >> > >> >>> > > > terms? Knowledge of protobuf shouldn't be required to use Flight.
> > >> > >> >>> > > >
> > >> > >> >>> > > > Regards
> > >> > >> >>> > > >
> > >> > >> >>> > > > Antoine.
> > >> > >> >>> > > >
> > >> > >> >>> > > >
> > >> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
> > >> > >> >>> > > >> application-level logic. (The official guide doesn't even mention
> > >> > >> >>> > > >> it in the encoding docs; I found
> > >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> > >> > >> >>> > > >> as well.)
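That oneof is only application-level logic can be seen with a hand-rolled decoder. This is a sketch, not a real Protobuf library: field numbers follow the FlightMessage example in this message, and nested payloads are treated as opaque bytes. The rule it applies is the one documented in the Protobuf language guide: if several members of the same oneof appear on the wire, the parser keeps the last one.

```python
def encode_varint(n):
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Return (value, next_position) for the varint starting at buf[pos]."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_len_field(field_number, payload):
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def iter_len_fields(buf):
    """Yield (field_number, payload) for a message made only of wire-type-2 fields."""
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 2:
            raise ValueError(f"sketch only handles wire type 2, got {wire_type}")
        length, pos = decode_varint(buf, pos)
        yield field_number, buf[pos:pos + length]
        pos += length


def decode_flight_message(buf):
    """Apply oneof semantics for FlightMessage { oneof { data = 1; metadata = 2; } }.

    On the wire these are two ordinary fields; "at most one is set" is enforced
    by the reader, which keeps only the last oneof member it sees.
    """
    which = None
    for field_number, payload in iter_len_fields(buf):
        if field_number == 1:
            which = ("data", payload)
        elif field_number == 2:
            which = ("metadata", payload)
        # Other field numbers are ignored in this sketch.
    return which


if __name__ == "__main__":
    msg = encode_len_field(1, b"<FlightData bytes>")
    print(decode_flight_message(msg))  # ('data', b'<FlightData bytes>')
    both = msg + encode_len_field(2, b"<FlightAny bytes>")
    print(decode_flight_message(both))  # ('metadata', b'<FlightAny bytes>')
```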
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially
> > >> > >> >>> > > >> inlining the definition of Any, e.g.
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> message FlightMessage {
> > >> > >> >>> > > >>   oneof message {
> > >> > >> >>> > > >>     FlightData data = 1;
> > >> > >> >>> > > >>     FlightAny metadata = 2;
> > >> > >> >>> > > >>   }
> > >> > >> >>> > > >> }
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> message FlightAny {
> > >> > >> >>> > > >>   string type = 1;
> > >> > >> >>> > > >>   bytes data = 2;
> > >> > >> >>> > > >> }
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> Is this correct?
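On the receiving side, the wrapper above turns into a simple dispatch: data messages go to the batch consumer, typed control messages go to registered callbacks, and unregistered types are a no-op (as Jacques suggested for the default implementation). All classes here are simplified, hypothetical stand-ins; the real FlightData message has more fields than shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class FlightData:
    """Simplified stand-in for Flight's FlightData (real message has more fields)."""
    data_body: bytes
    app_metadata: bytes = b""


@dataclass(frozen=True)
class FlightAny:
    """Stand-in for the proposed FlightAny: a type name plus opaque bytes."""
    type: str
    data: bytes


# Mirrors `oneof message { FlightData data = 1; FlightAny metadata = 2; }`.
FlightMessage = Union[FlightData, FlightAny]


class ExchangeReceiver:
    """Hypothetical receive-side dispatcher for a DoExchange-style stream."""

    def __init__(self) -> None:
        self.batches: List[bytes] = []
        self._control: Dict[str, Callable[[bytes], None]] = {}

    def on_control(self, type_name: str, callback: Callable[[bytes], None]) -> None:
        self._control[type_name] = callback

    def handle(self, msg: FlightMessage) -> None:
        if isinstance(msg, FlightData):
            self.batches.append(msg.data_body)
        elif isinstance(msg, FlightAny):
            callback = self._control.get(msg.type)
            if callback is not None:  # unregistered types fall through as a no-op
                callback(msg.data)
        else:
            raise TypeError(f"not a FlightMessage: {msg!r}")


if __name__ == "__main__":
    seen = []
    rx = ExchangeReceiver()
    rx.on_control("example.cancel", seen.append)
    for m in [FlightData(b"batch-0"), FlightAny("example.cancel", b"now"),
              FlightAny("example.other", b"?")]:
        rx.handle(m)
    print(rx.batches, seen)  # [b'batch-0'] [b'now']
```

Without a wrapper (the empty-FlightData option), the receiver would instead have to treat a FlightData whose body is empty and whose app_metadata is set as a control message.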
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
> > >> > >> >>> > > >> as well, but at that point, I'd rather we be consistent with all
> > >> > >> >>> > > >> of them, rather than have one of the three methods do its own
> > >> > >> >>> > > >> thing.
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> Thanks,
> > >> > >> >>> > > >> David
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >> > >> >>> > > >>> I think we could probably expose the oneof behavior without
> > >> > >> >>> > > >>> exposing the protobuf functions. On the Any... hmm. I guess we
> > >> > >> >>> > > >>> could expose it as two fields: type and data. Then users could use
> > >> > >> >>> > > >>> it for whatever, but if people wanted to treat it as Any, it would
> > >> > >> >>> > > >>> work. (Basically a user could use Any with it easily, but they
> > >> > >> >>> > > >>> could also use any other mechanism.) At least in Java, the Any
> > >> > >> >>> > > >>> concepts are pretty simple/DIY. Are other language bindings less
> > >> > >> >>> > > >>> DIY?
> > >> > >> >>> > > >>>
> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata, but it
> > >> > >> >>> > > >>> just seemed a bit janky.
> > >> > >> >>> > > >>>
> > >> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I wonder if
> > >> > >> >>> > > >>> we should redefine DoPut and DoGet to have the same property if we
> > >> > >> >>> > > >>> think it is a good idea...
> > >> > >> >>> > > >>>
> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com>
> > >> > >> >>> > > >>> wrote:
> > >> > >> >>> > > >>>
> > >> > >> >>> > > >>>> I was definitely considering having control messages without data,
> > >> > >> >>> > > >>>> and I thought that could be encoded by a FlightData with only
> > >> > >> >>> > > >>>> app_metadata set. I think I understand your position now:
> > >> > >> >>> > > >>>> FlightData should always carry (some) data (with optional
> > >> > >> >>> > > >>>> metadata)?
> > >> > >> >>> > > >>>>
> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with the documentation on
> > >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried about having a
> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from happening,
> > >> > >> >>> > > >>>> and overall having a clear separation between data and control
> > >> > >> >>> > > >>>> messages is cleaner.
> > >> > >> >>> > > >>>>
> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
> > >> > >> >>> > > >>>> Protobuf by using bytes; would we want to change that now?
> > >> > >> >>> > > >>>>
> > >> > >> >>> > > >>>> Best,
> > >> > >> >>> > > >>>> David
> > >> > >> >>> > > >>>>
> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >> > >> >>> > > >>>>> Hey David,
> > >> > >> >>> > > >>>>>
> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for DoGet/DoPut
> > >> > >> >>> > > >>>>> for async. Yes, I was thinking more of Java, given gRPC-Java's
> > >> > >> >>> > > >>>>> async-always pattern.
> > >> > >> >>> > > >>>>>
> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think using metadata for
> > >> > >> >>> > > >>>>> this overloads the message. If I want to send a control message
> > >> > >> >>> > > >>>>> independently of the data message, I would have to define something
> > >> > >> >>> > > >>>>> like an empty FlightData message that has custom metadata. Why not
> > >> > >> >>> > > >>>>> support a container object with a oneof{FlightData, Any} in it
> > >> > >> >>> > > >>>>> instead, so users can add more data as desired? The default impl
> > >> > >> >>> > > >>>>> could be a no-op for the Any messages.
> > >> > >> >>> > > >>>>>
> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com>
> > >> > >> >>> > > >>>>> wrote:
> > >> > >> >>> > > >>>>>
> > >> > >> >>> > > >>>>>> Hi Jacques,
> > >> > >> >>> > > >>>>>>
> > >> > >> >>> > > >>>>>> Thanks for the comments.
> > >> > >> >>> > > >>>>>>
> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
> > >> > >> >>> > > >>>>>> proposals, so I don't think we need a new message to carry that
> > >> > >> >>> > > >>>>>> kind of information.
> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
> > >> > >> >>> > > >>>>>> messages as the fundamental API; it would actually be quite
> > >> > >> >>> > > >>>>>> natural to implement in Flight/Java. I will note that it's not
> > >> > >> >>> > > >>>>>> possible in C++/Python without spawning a thread, though. (In
> > >> > >> >>> > > >>>>>> essence, gRPC-Java is async-always and gRPC-C++ is sync-always.)
> > >> > >> >>> > > >>>>>> There are experimental C++ APIs that would let us do something
> > >> > >> >>> > > >>>>>> similar to Java, but those are only in relatively recent gRPC
> > >> > >> >>> > > >>>>>> versions and are still under development (unlike the interceptor
> > >> > >> >>> > > >>>>>> APIs, which have been around for quite a while).
> > >> > >> >>> > > >>>>>>
> > >> > >> >>> > > >>>>>> Thanks,
> > >> > >> >>> > > >>>>>> David
> > >> > >> >>> > > >>>>>>
> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
> > >> > >> >>> > > >>>>>>> discussing here, depending on your thoughts.
> > >> > >> >>> > > >>>>>>>
> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
> > >> > >> >>> > > >>>>>>>
> > >> > >> >>> > > >>>>>>>> Hey Ryan,
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave uploads
> > >> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right now, synchronous APIs
> > >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait for each
> > >> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us the
> > >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) As an
> > >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> > >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> This is mostly tangential, though; eventually we will want to design
> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream like
> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run into.
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
> > >> > >> >>> > > >>>>>>>> concern is that, depending on how you deploy, two separate calls
> > >> > >> >>> > > >>>>>>>> could get routed to different instances. Additionally, gRPC has some
> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in between the two
> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is another instance available,
> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the new server without warning.
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> Thanks,
> > >> > >> >>> > > >>>>>>>> David
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> > >> > >> >>> > > >>>>>>>>> Hey David,
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
> > >> > >> >>> > > >>>>>>>>> possibility of remote compute via Arrow buffers. One thing that would
> > >> > >> >>> > > >>>>>>>>> help me would be a concrete example of the API in a real-life use
> > >> > >> >>> > > >>>>>>>>> case. Also, what would the client experience be in terms of sync vs
> > >> > >> >>> > > >>>>>>>>> async? Would the client block until the bidirectional call returns,
> > >> > >> >>> > > >>>>>>>>> i.e. c = flight.vector_mult(a, b), or would the client wait to be
> > >> > >> >>> > > >>>>>>>>> signaled that computation was done? If the latter, how is that
> > >> > >> >>> > > >>>>>>>>> different from a DoPut then DoGet? I suppose that this could be
> > >> > >> >>> > > >>>>>>>>> implemented without extending the RPC interface, but rather by a
> > >> > >> >>> > > >>>>>>>>> function/util?
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> Best,
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> Ryan
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>> Hi all,
> > >> > >> >>> > > >>>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to both send and
> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end, I've
> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
> > >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> > >> > >> >>> > > >>>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document. I'd
> > >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively straightforward
> > >> > >> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> > >> > >> >>> > > >>>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
> > >> > >> >>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it doesn't
> > >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the interpretation of the semantics.
> > >> > >> >>> > > >>>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>> Thanks,
> > >> > >> >>> > > >>>>>>>>>> David
> > >> > >> >>> > > >>>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> --
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/>
> > >> > >> >>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join our
> > >> > >> >>> > > >>>>>>>>> community site <https://community.dremio.com/> & Download Dremio
> > >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/download>
> > >> > >> >>> > > >>>>>>>>>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Jacques Nadeau <ja...@apache.org>.
I support moving forward with the current proposal.

On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:

> Just following up here again, any other thoughts?
>
> I think we do have justifications for potentially separate streams in
> a call, but that's more of an orthogonal question - it doesn't need to
> be addressed here. I do agree that it very much complicates things.
>
> Thanks,
> David
>
> On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> > I would generally agree with this. Note that you have the possibility
> > to use unions-of-structs to send record batches with different schemas
> > in the same stream, though with some added complexity on each side.
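A rough illustration of the unions-of-structs idea, assuming a dense-union layout like Arrow's: each row stores a type id that selects one child struct (one per "schema") and an offset into that child, so rows from two different schemas can be interleaved in one stream. This is plain Python, not the pyarrow API; `DenseUnionColumn` and the `point`/`label` children are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class DenseUnionColumn:
    """Toy dense union: one child list of struct rows per type id."""
    child_names: List[str]  # type id -> child (schema) name
    type_ids: List[int] = field(default_factory=list)
    offsets: List[int] = field(default_factory=list)
    children: Dict[int, List[Dict[str, Any]]] = field(default_factory=dict)

    def append(self, type_id: int, row: Dict[str, Any]) -> None:
        child = self.children.setdefault(type_id, [])
        self.type_ids.append(type_id)
        self.offsets.append(len(child))  # position of this row within its child
        child.append(row)

    def rows(self) -> List[Tuple[str, Dict[str, Any]]]:
        # Reassemble the interleaved stream in its original order.
        return [
            (self.child_names[t], self.children[t][o])
            for t, o in zip(self.type_ids, self.offsets)
        ]


col = DenseUnionColumn(child_names=["point", "label"])
col.append(0, {"x": 1.0, "y": 2.0})
col.append(1, {"text": "origin"})
col.append(0, {"x": 3.0, "y": 4.0})
print(col.type_ids, col.offsets)  # [0, 1, 0] [0, 0, 1]
```

The "added complexity on each side" shows up here as the type-id-to-schema mapping, which sender and receiver must agree on.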
> >
> > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
> >>
> >> I'd vote for explicitly not supported. We should keep our primitives
> >> narrow.
> >>
> >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
> >>
> >> > Thanks for the feedback.
> >> >
> >> > I do think if we had explicitly embraced gRPC from the beginning,
> >> > there are a lot of places where things could be made more ergonomic,
> >> > including with the metadata fields. But it would also have locked us
> >> > out of potential future transports.
> >> >
> >> > On another note: I hesitate to put too much into this method, but we
> >> > are looking at use cases where potentially, a client may want to
> >> > upload multiple distinct datasets (with differing schemas). (This is a
> >> > little tentative, and I can get more details...) Right now, each
> >> > logical stream in Flight must have a single, consistent schema; would
> >> > it make sense to look at ways to relax this, or declare this
> >> > explicitly out of scope (and require multiple calls and coordination
> >> > with the deployment topology) in order to accomplish this?
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > > Fair enough. I'm okay with the bytes approach and the proposal looks
> >> > > good to me.
> >> > >
> >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
> >> > >
> >> > >> I've updated the proposal.
> >> > >>
> >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> >> > >> errors/metadata, I still think using bytes is preferable:
> >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
> >> > >> types,
> >> > >> - We wouldn't be able to practically expose the Protobuf field to C++
> >> > >> users without causing build pains,
> >> > >> - We can't let Python users take advantage of the Protobuf field
> >> > >> without somehow being compatible with the Protobuf wheels (by linking
> >> > >> to the same version, and doing magic to turn the C++ Protobufs into
> >> > >> the Python ones),
> >> > >> - All our other application-defined fields are already bytes.
> >> > >>
> >> > >> Applications that want structure can encode JSON or Protobuf Any into
> >> > >> the bytes field themselves, much as you can already do for Ticket,
> >> > >> commands in FlightDescriptors, and application metadata in
> >> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
> >> > >> Any directly, since Any itself is a bytes field with a tag, and you
> >> > >> still must invoke the Protobuf deserializer again to read the actual
> >> > >> message.
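As a sketch of the "encode structure into the bytes field yourself" approach, assuming a purely application-defined JSON convention (the `type`/`payload` keys and both function names are hypothetical, not part of Flight): the transport only ever sees opaque bytes, and the type tag lives inside the payload, much as `type_url` does inside `Any`.

```python
import json
from typing import Any, Dict


def pack_app_metadata(kind: str, payload: Dict[str, Any]) -> bytes:
    """Application-defined convention: a JSON object carrying its own type tag."""
    return json.dumps({"type": kind, "payload": payload}, sort_keys=True).encode("utf-8")


def unpack_app_metadata(raw: bytes) -> Dict[str, Any]:
    """Inverse of pack_app_metadata; the receiver dispatches on the 'type' key."""
    return json.loads(raw.decode("utf-8"))


raw = pack_app_metadata("example.Ack", {"rows": 1024})
```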
> >> > >>
> >> > >> If we decide on using bytes, then I don't think it makes sense to
> >> > >> define a new message with a oneof either, since it would be redundant.
> >> > >>
> >> > >> Thanks,
> >> > >> David
> >> > >>
> >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> >> > >> > I've been extremely backlogged; I will update the proposal when I
> >> > >> > get a chance and reply here when done.
> >> > >> >
> >> > >> > Best,
> >> > >> > David
> >> > >> >
> >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >> > >> >> Bumping this discussion since a couple of weeks have passed. It seems
> >> > >> >> there are still some questions here; could we summarize what the
> >> > >> >> alternatives are, along with any public API implications, so we can
> >> > >> >> try to render a decision?
> >> > >> >>
> >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.davidm96@gmail.com> wrote:
> >> > >> >>>
> >> > >> >>> Hi Wes,
> >> > >> >>>
> >> > >> >>> Responses inline:
> >> > >> >>>
> >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
> >> > >> >>>
> >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
> >> > >> >>> > >
> >> > >> >>> > > The question is whether to repurpose the existing FlightData
> >> > >> >>> > > structure, and allow for the metadata field to be filled in and data
> >> > >> >>> > > fields to be blank (as a control message), or to wrap the FlightData
> >> > >> >>> > > structure in another structure that explicitly distinguishes between
> >> > >> >>> > > control and data messages.
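A minimal sketch contrasting the two options, assuming a simplified stand-in for FlightData with only `data_header`, `data_body` and `app_metadata` byte fields. `ControlMessage`, `classify_implicit` and `default_handler` are hypothetical names. Option A treats a FlightData with empty data fields but non-empty metadata as a control message; option B uses an explicit two-case wrapper (like a Protobuf oneof) whose default handler ignores the non-data case, as suggested elsewhere in this thread.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FlightData:
    # Simplified stand-in for the Protobuf message (the real one has
    # more fields, e.g. a FlightDescriptor).
    data_header: bytes = b""
    data_body: bytes = b""
    app_metadata: bytes = b""


# Option A: overload FlightData. No data but some metadata => control.
def classify_implicit(msg: FlightData) -> str:
    if not msg.data_header and not msg.data_body:
        return "control" if msg.app_metadata else "empty"
    return "data"


# Option B: an explicit wrapper, modelled on a oneof with two cases.
@dataclass
class ControlMessage:
    type: str  # a type name, playing the role of Any's type_url
    data: bytes


Wrapped = Union[FlightData, ControlMessage]


def default_handler(msg: Wrapped, batches: List[FlightData]) -> None:
    if isinstance(msg, FlightData):
        batches.append(msg)
    # ControlMessage falls through: the default implementation is a no-op.


batches: List[FlightData] = []
for m in [FlightData(data_header=b"hdr", data_body=b"body"),
          ControlMessage(type="example.Progress", data=b"50%")]:
    default_handler(m, batches)
```

Option A needs no new message type, but every reader must share the "empty data means control" rule. Option B makes the distinction explicit at the cost of one more message definition.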
> >> > >> >>> >
> >> > >> >>> > I'm not super against having metadata-only FlightData with empty
> >> > >> >>> > body. One question to consider is what changes (if any) would need to
> >> > >> >>> > be made to public APIs in either scenario.
> >> > >> >>> >
> >> > >> >>>
> >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
> >> > >> >>> messages in the future. This would be a breaking change, but wouldn't
> >> > >> >>> change the wire format. I think the APIs could be changed backwards
> >> > >> >>> compatibly, though.
> >> > >> >>>
> >> > >> >>> > > The other question is how to handle the metadata fields. So far,
> >> > >> >>> > > we've used bytestring fields for application-defined data. This is
> >> > >> >>> > > workable if you want to use Protobuf to define the contents of those
> >> > >> >>> > > fields, but requires you to pack/unpack your Protobuf into/from the
> >> > >> >>> > > bytestring field. If we instead used the Protobuf Any field, a
> >> > >> >>> > > dynamically typed field, this would be more convenient, but then we'd
> >> > >> >>> > > be exposing Protobuf types. We could alternatively use a combination
> >> > >> >>> > > of a type field and a bytestring field, mimicking what the Protobuf
> >> > >> >>> > > Any type looks like on the wire. I'm not sure this is actually
> >> > >> >>> > > cleaner in any of the language APIs, though.
> >> > >> >>> >
> >> > >> >>> > Leaving the deserialization of the app metadata to the particular
> >> > >> >>> > Flight implementation seems on first principles like the most flexible
> >> > >> >>> > thing. If Any is used, does that mean the metadata _must_ be a
> >> > >> >>> > protobuf?
> >> > >> >>> >
> >> > >> >>>
> >> > >> >>> If Any is used, we could still expose a bytes-based API, but it would
> >> > >> >>> have some more wrapping. (We could put a ByteString in Any.) Then the
> >> > >> >>> question would just be how to expose this (would be easier in Java,
> >> > >> >>> harder in C++).
> >> > >> >>>
> >> > >> >>> > > David
> >> > >> >>> > >
> >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> >> > >> >>> > > >
> >> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf terms?
> >> > >> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
> >> > >> >>> > > >
> >> > >> >>> > > > Regards
> >> > >> >>> > > >
> >> > >> >>> > > > Antoine.
> >> > >> >>> > > >
> >> > >> >>> > > >
> >> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
> >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
> >> > >> >>> > > >> application-level logic. (The official guide doesn't even mention it
> >> > >> >>> > > >> in the encoding docs; I found
> >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >> > >> >>> > > >> as well.)
> >> > >> >>> > > >>
> >> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially inlining
> >> > >> >>> > > >> the definition of Any, e.g.
> >> > >> >>> > > >>
> >> > >> >>> > > >> message FlightMessage {
> >> > >> >>> > > >>   oneof message {
> >> > >> >>> > > >>     FlightData data = 1;
> >> > >> >>> > > >>     FlightAny metadata = 2;
> >> > >> >>> > > >>   }
> >> > >> >>> > > >> }
> >> > >> >>> > > >>
> >> > >> >>> > > >> message FlightAny {
> >> > >> >>> > > >>   string type = 1;
> >> > >> >>> > > >>   bytes data = 2;
> >> > >> >>> > > >> }
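To make the wire-level point concrete, here is a hand-rolled encoder for the `FlightMessage`/`FlightAny` sketch above, following the standard Protobuf rules: each field starts with a varint key `(field_number << 3) | wire_type`, and strings, bytes and sub-messages use wire type 2 (length-delimited). It illustrates that the oneof contributes no bytes of its own, and that `FlightAny` has the same layout as `google.protobuf.Any` (`string type_url = 1; bytes value = 2;`). The encoder and the `example.Progress` type name are illustrative assumptions, not part of Flight.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode a string/bytes/sub-message field (wire type 2)."""
    key = (field_number << 3) | 2
    return varint(key) + varint(len(payload)) + payload


def encode_flight_any(type_: str, data: bytes) -> bytes:
    # message FlightAny { string type = 1; bytes data = 2; }
    # Same field numbers and wire types as google.protobuf.Any.
    return length_delimited(1, type_.encode("utf-8")) + length_delimited(2, data)


def encode_flight_message_with_metadata(any_bytes: bytes) -> bytes:
    # FlightMessage with the oneof case `FlightAny metadata = 2` set.
    # There is no oneof marker on the wire: this is exactly what an
    # ordinary sub-message field numbered 2 would produce. Only the
    # schema tells a decoder that fields 1 and 2 are mutually exclusive.
    return length_delimited(2, any_bytes)


inner = encode_flight_any("example.Progress", b"\x01")
message = encode_flight_message_with_metadata(inner)
```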
> >> > >> >>> > > >>
> >> > >> >>> > > >> Is this correct?
> >> > >> >>> > > >>
> >> > >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut as
> >> > >> >>> > > >> well, but at that point, I'd rather we be consistent with all of them,
> >> > >> >>> > > >> rather than have one of the three methods do its own thing.
> >> > >> >>> > > >>
> >> > >> >>> > > >> Thanks,
> >> > >> >>> > > >> David
> >> > >> >>> > > >>
> >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > >> >>> > > >>> I think we could probably expose the oneof behavior without exposing
> >> > >> >>> > > >>> the protobuf functions. On the Any... hmm. I guess we could expose it
> >> > >> >>> > > >>> as two fields: type and data. Then users could use it for whatever,
> >> > >> >>> > > >>> but if people wanted to treat it as Any, it would work. (Basically a
> >> > >> >>> > > >>> user could use Any with it easily, but they could also use any other
> >> > >> >>> > > >>> mechanism.) At least in Java, the Any concepts are pretty simple/DIY.
> >> > >> >>> > > >>> Are other language bindings less DIY?
> >> > >> >>> > > >>>
> >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata, but it
> >> > >> >>> > > >>> just seemed a bit janky.
> >> > >> >>> > > >>>
> >> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I wonder if
> >> > >> >>> > > >>> we should redefine DoPut and DoGet to have the same property if we
> >> > >> >>> > > >>> think it is a good idea...
> >> > >> >>> > > >>>
> >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com> wrote:
> >> > >> >>> > > >>>
> >> > >> >>> > > >>>> I was definitely considering having control messages without data,
> >> > >> >>> > > >>>> and I thought that could be encoded by a FlightData with only
> >> > >> >>> > > >>>> app_metadata set. I think I understand your position now: FlightData
> >> > >> >>> > > >>>> should always carry (some) data (with optional metadata)?
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> That makes sense to me, and is consistent with the documentation on
> >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried about having a
> >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from happening, and
> >> > >> >>> > > >>>> overall having a clear separation between data and control messages
> >> > >> >>> > > >>>> is cleaner.
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
> >> > >> >>> > > >>>> Protobuf by using bytes; would we want to change that now?
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> Best,
> >> > >> >>> > > >>>> David
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > >> >>> > > >>>>> Hey David,
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for DoGet/DoPut
> >> > >> >>> > > >>>>> for async. Yes, more thinking Java, given Java gRPC's async-always
> >> > >> >>> > > >>>>> pattern.
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>> On the comment around the FlightData, I think it is overloading the
> >> > >> >>> > > >>>>> message to use metadata for this. If I want to send a control message
> >> > >> >>> > > >>>>> independently of the data message, I would have to define something
> >> > >> >>> > > >>>>> like an empty FlightData message that has custom metadata. Why not
> >> > >> >>> > > >>>>> support a container object with a oneof{FlightData, Any} in it
> >> > >> >>> > > >>>>> instead, so users can add more data as desired? The default impl
> >> > >> >>> > > >>>>> could be a no-op for the Any messages.
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>>> Hi Jacques,
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> Thanks for the comments.
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
> >> > >> >>> > > >>>>>> proposals, so I don't think we need a new message to carry that kind
> >> > >> >>> > > >>>>>> of information.
> >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
> >> > >> >>> > > >>>>>> messages as the fundamental API; it would actually be quite natural
> >> > >> >>> > > >>>>>> to implement in Flight/Java. I will note that it's not possible in
> >> > >> >>> > > >>>>>> C++/Python without spawning a thread, though. (In essence, gRPC-Java
> >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are experimental
> >> > >> >>> > > >>>>>> C++ APIs that would let us do something similar to Java, but those
> >> > >> >>> > > >>>>>> are only in relatively recent gRPC versions and are still under
> >> > >> >>> > > >>>>>> development (unlike the interceptor APIs, which have been around for
> >> > >> >>> > > >>>>>> quite a while).
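A minimal sketch of why a callback-style ("async handler") API needs a helper thread when the underlying reader is blocking, as in the sync-always C++/Python case described above. The iterable `reader` stands in for the read side of a bidirectional call; `start_async_reader` and its callbacks are hypothetical names, not Flight API.

```python
import threading
from typing import Callable, Iterable, List, Optional


def start_async_reader(
    reader: Iterable[bytes],
    on_message: Callable[[bytes], None],
    on_done: Callable[[Optional[Exception]], None],
) -> threading.Thread:
    """Drain a blocking reader on a helper thread, invoking callbacks."""

    def pump() -> None:
        try:
            for message in reader:  # each step may block waiting on the peer
                on_message(message)
        except Exception as exc:  # surface failures to the caller's callback
            on_done(exc)
        else:
            on_done(None)

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return thread


received: List[bytes] = []
outcomes: List[Optional[Exception]] = []
finished = threading.Event()


def on_done(err: Optional[Exception]) -> None:
    outcomes.append(err)
    finished.set()


worker = start_async_reader(iter([b"batch-1", b"batch-2"]), received.append, on_done)
finished.wait(timeout=5)
worker.join(timeout=5)
```

In gRPC-Java the library itself drives callbacks of this shape (`StreamObserver.onNext`/`onError`/`onCompleted`), so no extra thread is needed there.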
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> Thanks,
> >> > >> >>> > > >>>>>> David
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
> >> > >> >>> > > >>>>>>> discussing here, depending on your thoughts.
> >> > >> >>> > > >>>>>>>
> >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
> >> > >> >>> > > >>>>>>>
> >> > >> >>> > > >>>>>>>> Hey Ryan,
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Thanks for the comments.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave uploads
> >> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right now, synchronous APIs
> >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait for each
> >> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us the
> >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) As an
> >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
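A small simulation of the failure mode described above, under assumed simplifications: two in-process queues stand in for the two directions of one bidirectional call, and the only timeout is a single deadline for the whole call, so an individual read cannot time out on its own. With the logic bug (both peers read before writing), nothing happens until the overall deadline expires. All names are hypothetical.

```python
import queue
import threading
import time


class DeadlineExceeded(Exception):
    """Raised when the single, call-wide deadline runs out."""


def run_exchange(client_reads_first: bool, deadline_s: float = 0.3) -> int:
    to_server: queue.Queue = queue.Queue()
    to_client: queue.Queue = queue.Queue()
    deadline = time.monotonic() + deadline_s  # one deadline for the whole call

    def read(q: queue.Queue) -> int:
        # No per-read timeout: a read may wait for whatever is left of the
        # overall deadline.
        remaining = max(deadline - time.monotonic(), 0.0)
        try:
            return q.get(timeout=remaining)
        except queue.Empty:
            raise DeadlineExceeded() from None

    def server() -> None:
        try:
            value = read(to_server)  # server expects input first
            to_client.put(value * 2)  # ...then replies
        except DeadlineExceeded:
            pass

    threading.Thread(target=server, daemon=True).start()

    if client_reads_first:
        # Application logic bug: the client waits for a result before sending
        # its input, so both sides block until the deadline fires.
        result = read(to_client)
        to_server.put(21)
        return result

    to_server.put(21)  # correct order: write, then read
    return read(to_client)
```

With `client_reads_first=False` the call returns 42; with `True` it raises `DeadlineExceeded` only after the full deadline, which is why such bugs are hard to diagnose without per-read timeouts.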
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> This is mostly tangential, though; eventually we will want to design
> >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream like
> >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run into.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
> >> > >> >>> > > >>>>>>>> concern is that, depending on how you deploy, two separate calls
> >> > >> >>> > > >>>>>>>> could get routed to different instances. Additionally, gRPC has some
> >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in between the two
> >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is another instance available,
> >> > >> >>> > > >>>>>>>> the client will happily reconnect to the new server without warning.
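A toy model of the routing concern, assuming a round-robin balancer in front of two server instances that keep uploaded data in local memory (all names are hypothetical). A DoPut followed by a separate DoGet can reach an instance that never saw the upload, whereas a single bidirectional call is routed once and stays on one instance for both directions.

```python
import itertools
from typing import Dict, List


class Instance:
    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, List[int]] = {}  # local, per-instance state

    def put(self, key: str, data: List[int]) -> None:
        self.store[key] = data

    def get(self, key: str) -> List[int]:
        if key not in self.store:
            raise KeyError(f"{key!r} not found on {self.name}")
        return self.store[key]

    def exchange(self, data: List[int]) -> List[int]:
        # Upload and download happen within one call on this instance.
        return [x * 2 for x in data]


class RoundRobinBalancer:
    """Routes each new call to the next instance in turn."""

    def __init__(self, instances: List[Instance]) -> None:
        self._cycle = itertools.cycle(instances)

    def route(self) -> Instance:
        return next(self._cycle)


lb = RoundRobinBalancer([Instance("a"), Instance("b")])
lb.route().put("job-1", [1, 2, 3])  # "DoPut" lands on instance a
try:
    lb.route().get("job-1")  # "DoGet" lands on instance b
    two_call_result = "ok"
except KeyError:
    two_call_result = "missing"
one_call_result = lb.route().exchange([1, 2, 3])  # one call, one instance
```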
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Thanks,
> >> > >> >>> > > >>>>>>>> David
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> >> > >> >>> > > >>>>>>>>> Hey David,
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
> >> > >> >>> > > >>>>>>>>> possibility of remote compute via Arrow buffers. One thing that would
> >> > >> >>> > > >>>>>>>>> help me would be a concrete example of the API in a real-life use
> >> > >> >>> > > >>>>>>>>> case. Also, what would the client experience be in terms of sync vs
> >> > >> >>> > > >>>>>>>>> async? Would the client block until the bidirectional call returns,
> >> > >> >>> > > >>>>>>>>> i.e. c = flight.vector_mult(a, b), or would the client wait to be
> >> > >> >>> > > >>>>>>>>> signaled that computation was done? If the latter, how is that
> >> > >> >>> > > >>>>>>>>> different from a DoPut then DoGet? I suppose that this could be
> >> > >> >>> > > >>>>>>>>> implemented without extending the RPC interface, but rather by a
> >> > >> >>> > > >>>>>>>>> function/util?
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> Best,
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> Ryan
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
> >> > >> >>> > li.davidm96@gmail.com>
> >> > >> >>> > > >>>>>> wrote:
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> Hi all,
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so
> >> > >> >>> > > >>>>>>>>>> far,
> >> > but
> >> > >> we
> >> > >> >>> > > >>>>>>>>>> have
> >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being
> >> > >> >>> > > >>>>>>>>>> able
> >> > >> >>> > > >>>>>>>>>> to
> >> > >> >>> > > >>>>>>>>>> both
> >> > >> >>> > > >>>>>>>>>> send
> >> > >> >>> > > >>>>>>>>>> and
> >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To
> >> > >> >>> > > >>>>>>>>>> that
> >> > >> >>> > > >>>>>>>>>> end,
> >> > >> >>> > I've
> >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>
> >> > >> >>> >
> >> > >>
> >> >
> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment
> >> > >> >>> > > >>>>>>>>>> on
> >> > the
> >> > >> >>> > document.
> >> > >> >>> > > >>>>>>>>>> I'd
> >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a
> >> > >> >>> > > >>>>>>>>>> relatively
> >> > >> >>> > > >>>>>>>>>> straightforward
> >> > >> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> This is a format change and would require a
> vote.
> >> > I've
> >> > >> >>> > > >>>>>>>>>> decided
> >> > >> >>> > > >>>>>>>>>> to
> >> > >> >>> > > >>>>>>>>>> table the other format change I had proposed (on
> >> > >> >>> > > >>>>>>>>>> DoPut),
> >> > >> >>> > > >>>>>>>>>> as
> >> > >> >>> > > >>>>>>>>>> it
> >> > >> >>> > > >>>>>> doesn't
> >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the
> >> > >> >>> > > >>>>>>>>>> interpretation
> >> > of
> >> > >> >>> > > >>>>>>>>>> the
> >> > >> >>> > > >>>>>>>>>> semantics.
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> Thanks,
> >> > >> >>> > > >>>>>>>>>> David
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> --
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/>
> >> > >> >>> > > >>>>>>>>> Check out our GitHub
> >> > >> >>> > > >>>>>>>>> <https://www.github.com/dremio>,
> >> > >> join
> >> > >> >>> > > >>>>>>>>> our
> >> > >> >>> > > >>>>>>>>> community
> >> > >> >>> > > >>>>>>>>> site <https://community.dremio.com/> & Download
> >> > Dremio
> >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/download>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>
> >> > >> >>> > > >
> >> > >> >>> >
> >> > >> >>
> >> > >> >
> >> > >>
> >> > >
> >> >
> >
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Just following up here again: any other thoughts?

I think we do have justifications for potentially separate streams in
a call, but that's an orthogonal question that doesn't need to be
addressed here. I do agree that it very much complicates things.

Thanks,
David

On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> I would generally agree with this. Note that you have the possibility
> to use unions-of-structs to send record batches with different schemas
> in the same stream, though with some added complexity on each side.

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
I would generally agree with this. Note that you have the possibility
to use unions-of-structs to send record batches with different schemas
in the same stream, though with some added complexity on each side.
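To make the unions-of-structs idea concrete, here is a minimal, row-oriented Python sketch of Arrow's dense union layout. In that layout, each slot carries a type id that selects a child and an offset into that child. The class name, the "trades"/"quotes" schemas, and the dict-per-row representation are all illustrative assumptions. Real Arrow children are columnar struct arrays, and pyarrow builds the real thing with `pyarrow.UnionArray.from_dense`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class DenseUnionOfStructs:
    """Toy, row-oriented model of an Arrow dense union whose children are structs.

    type_ids[i] names the child (schema) that holds logical slot i, and
    offsets[i] is the row index of that slot within its child.
    """

    child_names: List[str]
    type_ids: List[int] = field(default_factory=list)
    offsets: List[int] = field(default_factory=list)
    children: List[List[Dict[str, Any]]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # One (initially empty) child per schema, in type-id order.
        if not self.children:
            self.children = [[] for _ in self.child_names]

    def append(self, child_name: str, row: Dict[str, Any]) -> None:
        # Sender side: tag the slot with its schema and record where it lives.
        type_id = self.child_names.index(child_name)
        child = self.children[type_id]
        self.type_ids.append(type_id)
        self.offsets.append(len(child))
        child.append(row)

    def __getitem__(self, i: int) -> Tuple[str, Dict[str, Any]]:
        # Receiver side: dispatch on the type id to find the right child.
        type_id = self.type_ids[i]
        return self.child_names[type_id], self.children[type_id][self.offsets[i]]


batch = DenseUnionOfStructs(["trades", "quotes"])
batch.append("trades", {"symbol": "A", "qty": 10})
batch.append("quotes", {"symbol": "A", "bid": 1.5, "ask": 1.6})
batch.append("trades", {"symbol": "B", "qty": 3})
print(batch.type_ids)  # [0, 1, 0]
print(batch.offsets)   # [0, 0, 1]
print(batch[2])        # ('trades', {'symbol': 'B', 'qty': 3})
```

The "added complexity on each side" shows up here directly. The sender must tag every row, and the receiver must branch on the type id before it can interpret a row.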

On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
>
> I'd vote for explicitly not supported. We should keep our primitives narrow.
>
> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
>
> > Thanks for the feedback.
> >
> > I do think if we had explicitly embraced gRPC from the beginning,
> > there are a lot of places where things could be made more ergonomic,
> > including with the metadata fields. But it would also have locked us
> > out of potential future transports.
> >
> > On another note: I hesitate to put too much into this method, but we
> > are looking at use cases where potentially, a client may want to
> > upload multiple distinct datasets (with differing schemas). (This is a
> > little tentative, and I can get more details...) Right now, each
> > logical stream in Flight must have a single, consistent schema; would
> > it make sense to look at ways to relax this, or declare this
> > explicitly out of scope (and require multiple calls and coordination
> > with the deployment topology) in order to accomplish this?
> >
> > Best,
> > David
> >
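As a rough illustration of the current constraint David describes, the sketch below models a stream writer that fixes its schema on the first write. It then rejects any later batch whose schema differs. `SingleSchemaStream` and the tuple-based schema are hypothetical. In Flight the schema is sent once at the start of the stream; the first `write` stands in for that here.

```python
from typing import List, Optional, Tuple

# Hypothetical schema representation: an ordered tuple of (field name, type name).
Schema = Tuple[Tuple[str, str], ...]


class SingleSchemaStream:
    """Toy writer enforcing a one-schema-per-stream rule."""

    def __init__(self) -> None:
        self.schema: Optional[Schema] = None
        self.batches: List[object] = []

    def write(self, schema: Schema, batch: object) -> None:
        if self.schema is None:
            # The first batch fixes the schema for the rest of the stream.
            self.schema = schema
        elif schema != self.schema:
            raise ValueError(f"schema changed mid-stream: {schema!r} != {self.schema!r}")
        self.batches.append(batch)


stream = SingleSchemaStream()
stream.write((("id", "int64"),), [1, 2, 3])
try:
    # A second dataset with a different schema needs its own stream (or call).
    stream.write((("name", "utf8"),), ["x"])
except ValueError as exc:
    print("rejected:", exc)
```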
> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> > > Fair enough. I'm okay with the bytes approach and the proposal looks good
> > > to me.
> > >
> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
> > >
> > >> I've updated the proposal.
> > >>
> > >> On the subject of Protobuf Any vs bytes, and how to handle
> > >> errors/metadata, I still think using bytes is preferable:
> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
> > >> types,
> > >> - We wouldn't be able to practically expose the Protobuf field to C++
> > >> users without causing build pains,
> > >> - We can't let Python users take advantage of the Protobuf field
> > >> without somehow being compatible with the Protobuf wheels (by linking
> > >> to the same version, and doing magic to turn the C++ Protobufs into
> > >> the Python ones),
> > >> - All our other application-defined fields are already bytes.
> > >>
> > >> Applications that want structure can encode JSON or Protobuf Any into
> > >> the bytes field themselves, much as you can already do for Ticket,
> > >> commands in FlightDescriptors, and application metadata in
> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
> > >> Any directly, since Any itself is a bytes field with a tag, and must
> > >> invoke the Protobuf deserializer again to read the actual message.
> > >>
> > >> If we decide on using bytes, then I don't think it makes sense to
> > >> define a new message with a oneof either, since it would be redundant.
> > >>
> > >> Thanks,
> > >> David
> > >>
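The "encode JSON into the bytes field" approach from David's message above can be sketched using only the Python standard library. The function names are hypothetical. The resulting bytes would go wherever Flight already accepts opaque application data: a Ticket, a FlightDescriptor command, or app_metadata.

```python
import json
from typing import Any, Dict


def encode_app_metadata(payload: Dict[str, Any]) -> bytes:
    """Pack structured, application-defined metadata into an opaque bytes field."""
    # sort_keys and compact separators keep the encoding deterministic.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode_app_metadata(raw: bytes) -> Dict[str, Any]:
    """Inverse of encode_app_metadata; an empty field decodes to an empty dict."""
    if not raw:
        return {}
    return json.loads(raw.decode("utf-8"))


raw = encode_app_metadata({"status": "ok", "seq": 7})
print(raw)                       # b'{"seq":7,"status":"ok"}'
print(decode_app_metadata(raw))  # {'seq': 7, 'status': 'ok'}
```

Flight itself never looks inside these bytes. Only the two endpoints need to agree on the encoding.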
> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> > >> > I've been extremely backlogged; I will update the proposal when I get
> > >> > a chance and reply here when done.
> > >> >
> > >> > Best,
> > >> > David
> > >> >
> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> > >> >> Bumping this discussion since a couple of weeks have passed. It seems
> > >> >> there are still some questions here. Could we summarize the
> > >> >> alternatives, along with any public API implications, so we can try to
> > >> >> render a decision?
> > >> >>
> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com>
> > >> >> wrote:
> > >> >>>
> > >> >>> Hi Wes,
> > >> >>>
> > >> >>> Responses inline:
> > >> >>>
> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
> > >> >>>
> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com>
> > >> >>> > wrote:
> > >> >>> > >
> > >> >>> > > The question is whether to repurpose the existing FlightData
> > >> >>> > > structure, and allow for the metadata field to be filled in and
> > >> >>> > > data fields to be blank (as a control message), or to wrap the
> > >> >>> > > FlightData structure in another structure that explicitly
> > >> >>> > > distinguishes between control and data messages.
> > >> >>> >
> > >> >>> > I'm not super against having metadata-only FlightData with empty
> > >> body.
> > >> >>> > One question to consider is what changes (if any) would need to be
> > >> >>> > made to public APIs in either scenario.
> > >> >>> >
> > >> >>>
> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
> > >> >>> messages
> > >> >>> in
> > >> >>> the future. This would be a breaking change, but wouldn't change the
> > >> >>> wire
> > >> >>> format. I think the APIs could be changed backwards compatibly,
> > >> >>> though.
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>> > > The other question is how to handle the metadata fields. So far,
> > >> >>> > > we've
> > >> >>> > > used bytestring fields for application-defined data. This is
> > >> >>> > > workable
> > >> >>> > > if you want to use Protobuf to define the contents of those
> > >> >>> > > fields,
> > >> >>> > > but requires you to pack/unpack your Protobuf into/from the
> > >> >>> > > bytestring
> > >> >>> > > field. If we instead used the Protobuf Any field, a dynamically
> > >> >>> > > typed
> > >> >>> > > field, this would be more convenient, but then we'd be exposing
> > >> >>> > > Protobuf types. We could alternatively use a combination of a
> > >> >>> > > type
> > >> >>> > > field and a bytestring field, mimicking what the Protobuf Any
> > >> >>> > > type
> > >> >>> > > looks like on the wire. I'm not sure this is actually cleaner in
> > >> any
> > >> >>> > > of the language APIs, though.
> > >> >>> >
> > >> >>> > Leaving the deserialization of the app metadata to the particular
> > >> >>> > Flight implementation seems on first principles like the most
> > >> flexible
> > >> >>> > thing, if Any is used, does that mean the metadata _must_ be a
> > >> >>> > protobuf?
> > >> >>> >
> > >> >>>
> > >> >>>
> > >> >>> If Any is used, we could still expose a bytes-based API, but it
> > would
> > >> >>> have
> > >> >>> some more wrapping. (We could put a ByteString in Any.) Then the
> > >> >>> question
> > >> >>> would just be how to expose this (would be easier in Java, harder in
> > >> >>> C++).
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>> > > David
> > >> >>> > >
> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> > >> >>> > > >
> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf
> > >> >>> > > > terms?
> > >> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
> > >> >>> > > >
> > >> >>> > > > Regards
> > >> >>> > > >
> > >> >>> > > > Antoine.
> > >> >>> > > >
> > >> >>> > > >
> > >> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would
> > just
> > >> be
> > >> >>> > > >> application-level logic. (The official guide doesn't even
> > >> mention
> > >> >>> > > >> it
> > >> >>> > > >> in the encoding docs; I found
> > >> >>> > > >>
> > >> >>> >
> > >>
> > https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> > >> >>> > > >> as well.)
> > >> >>> > > >>
> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially
> > >> >>> > > >> inlining
> > >> >>> > > >> the definition of Any, e.g.
> > >> >>> > > >>
> > >> >>> > > >> message FlightMessage {
> > >> >>> > > >>   oneof message {
> > >> >>> > > >>     FlightData data = 1;
> > >> >>> > > >>     FlightAny metadata = 2;
> > >> >>> > > >>   }
> > >> >>> > > >> }
> > >> >>> > > >>
> > >> >>> > > >> message FlightAny {
> > >> >>> > > >>   string type = 1;
> > >> >>> > > >>   bytes data = 2;
> > >> >>> > > >> }
> > >> >>> > > >>
> > >> >>> > > >> Is this correct?
> > >> >>> > > >>
> > >> >>> > > >> It might be nice to consider the wrapper message for
> > >> >>> > > >> DoGet/DoPut
> > >> >>> > > >> as
> > >> >>> > > >> well, but at that point, I'd rather we be consistent with all
> > >> >>> > > >> of
> > >> >>> > > >> them,
> > >> >>> > > >> rather than have one of the three methods do its own thing.
> > >> >>> > > >>
> > >> >>> > > >> Thanks,
> > >> >>> > > >> David
> > >> >>> > > >>
> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >> >>> > > >>> I think we could probably expose the oneof behavior without
> > >> >>> > > >>> exposing
> > >> >>> > the
> > >> >>> > > >>> protobuf functions. On the any... hmm. I guess we could
> > >> >>> > > >>> expose
> > >> >>> > > >>> as
> > >> >>> > > >>> two
> > >> >>> > > >>> fields: type and data. Then users could use it for whatever
> > >> >>> > > >>> but
> > >> >>> > > >>> if
> > >> >>> > > >>> people
> > >> >>> > > >>> wanted to treat it as any, it would work. (Basically a user
> > >> >>> > > >>> could
> > >> >>> > > >>> use
> > >> >>> > > >>> any
> > >> >>> > > >>> with it easily but they could also use any other mechanism).
> > >> >>> > > >>> At
> > >> >>> > least in
> > >> >>> > > >>> java, the any concepts are pretty simple/diy. Are other
> > >> language
> > >> >>> > > >>> bindings
> > >> >>> > > >>> less diy?
> > >> >>> > > >>>
> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata
> > >> >>> > > >>> but
> > >> >>> > > >>> it
> > >> >>> > just
> > >> >>> > > >>> seemed a bit janky.
> > >> >>> > > >>>
> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I
> > >> >>> > > >>> wonder
> > >> >>> > > >>> if
> > >> >>> > we
> > >> >>> > > >>> should redefine DoPut and DoGet to have the same property if
> > >> >>> > > >>> we
> > >> >>> > think it
> > >> >>> > > >>> is
> > >> >>> > > >>> a good idea...
> > >> >>> > > >>>
> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
> > >> li.davidm96@gmail.com>
> > >> >>> > wrote:
> > >> >>> > > >>>
> > >> >>> > > >>>> I was definitely considering having control messages
> > without
> > >> >>> > > >>>> data,
> > >> >>> > and
> > >> >>> > > >>>> I thought that could be encoded by a FlightData with only
> > >> >>> > app_metadata
> > >> >>> > > >>>> set. I think I understand your position now: FlightData
> > >> >>> > > >>>> should
> > >> >>> > always
> > >> >>> > > >>>> carry (some) data (with optional metadata)?
> > >> >>> > > >>>>
> > >> >>> > > >>>> That makes sense to me, and is consistent with the
> > >> >>> > > >>>> documentation
> > >> >>> > > >>>> on
> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried about having
> > >> >>> > > >>>> a
> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from
> > >> >>> > > >>>> happening,
> > >> >>> > and
> > >> >>> > > >>>> overall having a clear separation between data and control
> > >> >>> > > >>>> messages
> > >> >>> > is
> > >> >>> > > >>>> cleaner.
> > >> >>> > > >>>>
> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
> > >> >>> > > >>>> exposing
> > >> >>> > > >>>> Protobuf by using bytes, would we want to change that now?
> > >> >>> > > >>>>
> > >> >>> > > >>>> Best,
> > >> >>> > > >>>> David
> > >> >>> > > >>>>
> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >> >>> > > >>>>> Hey David,
> > >> >>> > > >>>>>
> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
> > >> >>> > > >>>>> doget/doput
> > >> >>> > > >>>>> for
> > >> >>> > > >>>>> async. Yes, more thinking java given java grpc's async
> > >> >>> > > >>>>> always
> > >> >>> > pattern.
> > >> >>> > > >>>>>
> > >> >>> > > >>>>> On the comment around the FlightData, I think it is
> > >> >>> > > >>>>> overloading
> > >> >>> > > >>>>> the
> > >> >>> > > >>>> message
> > >> >>> > > >>>>> to use metadata for this. If I want to send a control
> > >> >>> > > >>>>> message
> > >> >>> > > >>>> independently
> > >> >>> > > >>>>> of the data message, I would have to define something like
> > >> >>> > > >>>>> an
> > >> >>> > > >>>>> empty
> > >> >>> > > >>>> flight
> > >> >>> > > >>>>> data message that has custom metadata. Why not support a
> > >> >>> > > >>>>> container
> > >> >>> > > >>>>> object
> > >> >>> > > >>>>> with a oneof{FlightData, Any} in it instead so users can
> > >> >>> > > >>>>> add
> > >> >>> > > >>>>> more
> > >> >>> > data
> > >> >>> > > >>>>> as
> > >> >>> > > >>>>> desired. The default impl could be a noop for the Any
> > >> >>> > > >>>>> messages.
> > >> >>> > > >>>>>
> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
> > >> >>> > > >>>>> <li...@gmail.com>
> > >> >>> > > >>>>> wrote:
> > >> >>> > > >>>>>
> > >> >>> > > >>>>>> Hi Jacques,
> > >> >>> > > >>>>>>
> > >> >>> > > >>>>>> Thanks for the comments.
> > >> >>> > > >>>>>>
> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of
> > >> prior
> > >> >>> > > >>>>>> proposals, so I don't think we need a new message to
> > carry
> > >> >>> > > >>>>>> that
> > >> >>> > kind
> > >> >>> > > >>>>>> of information.
> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle
> > >> >>> > > >>>>>> incoming
> > >> >>> > > >>>>>> messages as the fundamental API; it would actually be
> > >> >>> > > >>>>>> quite
> > >> >>> > natural
> > >> >>> > > >>>>>> to
> > >> >>> > > >>>>>> implement in Flight/Java. I will note that it's not
> > >> >>> > > >>>>>> possible
> > >> >>> > > >>>>>> in
> > >> >>> > > >>>>>> C++/Python without spawning a thread, though. (In
> > essence,
> > >> >>> > gRPC-Java
> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are
> > >> >>> > experimental
> > >> >>> > > >>>>>> C++ APIs that would let us do something similar to Java,
> > >> >>> > > >>>>>> but
> > >> >>> > > >>>>>> those
> > >> >>> > > >>>>>> are
> > >> >>> > > >>>>>> only in relatively recent gRPC versions and are still
> > >> >>> > > >>>>>> under
> > >> >>> > > >>>>>> development (contrary to the interceptor APIs which have
> > >> been
> > >> >>> > around
> > >> >>> > > >>>>>> for quite a while).
> > >> >>> > > >>>>>>
> > >> >>> > > >>>>>> Thanks,
> > >> >>> > > >>>>>> David
> > >> >>> > > >>>>>>
> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might worth
> > >> >>> > > >>>>>>> discussion
> > >> >>> > > >>>>>>> here
> > >> >>> > > >>>>>>> depending on your thoughts.
> > >> >>> > > >>>>>>>
> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
> > >> >>> > > >>>>>>> <li...@gmail.com>
> > >> >>> > > >>>> wrote:
> > >> >>> > > >>>>>>>
> > >> >>> > > >>>>>>>> Hey Ryan,
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> Thanks for the comments.
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a
> > >> >>> > > >>>>>>>> Python
> > >> >>> > strawman.
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
> > >> >>> > > >>>>>>>> interleave
> > >> >>> > > >>>> uploads
> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right now,
> > >> >>> > > >>>>>>>> synchronous
> > >> >>> > APIs
> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client and server
> > >> >>> > > >>>>>>>> wait
> > >> >>> > > >>>>>>>> for
> > >> >>> > each
> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't
> > >> >>> > > >>>>>>>> give
> > >> >>> > > >>>>>>>> us
> > >> >>> > > >>>>>>>> the
> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an overall
> > >> >>> > > >>>>>>>> timeout.)
> > >> >>> > > >>>>>>>> As
> > >> >>> > an
> > >> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> This is mostly tangential though, eventually we will
> > >> >>> > > >>>>>>>> want
> > >> >>> > > >>>>>>>> to
> > >> >>> > design
> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A
> > bidirectional
> > >> >>> > > >>>>>>>> stream
> > >> >>> > > >>>>>>>> like
> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier
> > >> >>> > > >>>>>>>> to
> > >> >>> > > >>>>>>>> run
> > >> >>> > into.
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal,
> > but
> > >> >>> > > >>>>>>>> the
> > >> >>> > main
> > >> >>> > > >>>>>>>> concern is that depending on how you deploy, two
> > >> >>> > > >>>>>>>> separate
> > >> >>> > > >>>>>>>> calls
> > >> >>> > > >>>>>>>> could
> > >> >>> > > >>>>>>>> get routed to different instances. Additionally, gRPC
> > >> >>> > > >>>>>>>> has
> > >> >>> > > >>>>>>>> some
> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in
> > >> >>> > > >>>>>>>> between
> > >> >>> > > >>>>>>>> the
> > >> >>> > two
> > >> >>> > > >>>>>>>> calls, but it then restarts or there is another
> > instance
> > >> >>> > available,
> > >> >>> > > >>>>>>>> the client will happily reconnect to the new server
> > >> without
> > >> >>> > > >>>>>>>> warning.
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> Thanks,
> > >> >>> > > >>>>>>>> David
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> > >> >>> > > >>>>>>>>> Hey David,
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it
> > >> >>> > > >>>>>>>>> and
> > >> >>> > > >>>>>>>>> the
> > >> >>> > > >>>>>>>>> possibility
> > >> >>> > > >>>>>>>>> of remote compute via arrow buffers. One thing that
> > >> >>> > > >>>>>>>>> would
> > >> >>> > > >>>>>>>>> help
> > >> >>> > me
> > >> >>> > > >>>>>> would
> > >> >>> > > >>>>>>>> be
> > >> >>> > > >>>>>>>>> a concrete example of the API in a real life use case.
> > >> >>> > > >>>>>>>>> Also,
> > >> >>> > what
> > >> >>> > > >>>>>> would
> > >> >>> > > >>>>>>>> the
> > >> >>> > > >>>>>>>>> client experience be in terms of sync vs asyc? Would
> > >> >>> > > >>>>>>>>> the
> > >> >>> > > >>>>>>>>> client
> > >> >>> > > >>>>>>>>> block
> > >> >>> > > >>>>>>>> till
> > >> >>> > > >>>>>>>>> the bidirectional call return ie c =
> > >> flight.vector_mult(a,
> > >> >>> > > >>>>>>>>> b)
> > >> >>> > or
> > >> >>> > > >>>>>>>>> would
> > >> >>> > > >>>>>>>> the
> > >> >>> > > >>>>>>>>> client wait to be signaled that computation was done.
> > >> >>> > > >>>>>>>>> If
> > >> >>> > > >>>>>>>>> the
> > >> >>> > > >>>>>>>>> later
> > >> >>> > > >>>>>>>>> how
> > >> >>> > > >>>>>>>>> is
> > >> >>> > > >>>>>>>>> that different from a DoPut then DoGet? I suppose that
> > >> >>> > > >>>>>>>>> this
> > >> >>> > could
> > >> >>> > > >>>> be
> > >> >>> > > >>>>>>>>> implemented without extending the RPC interface but
> > >> rather
> > >> >>> > > >>>>>>>>> by a
> > >> >>> > > >>>>>>>>> function/util?
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>> Best,
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>> Ryan
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
> > >> >>> > li.davidm96@gmail.com>
> > >> >>> > > >>>>>> wrote:
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>>> Hi all,
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far,
> > but
> > >> we
> > >> >>> > > >>>>>>>>>> have
> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able
> > >> >>> > > >>>>>>>>>> to
> > >> >>> > > >>>>>>>>>> both
> > >> >>> > > >>>>>>>>>> send
> > >> >>> > > >>>>>>>>>> and
> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that
> > >> >>> > > >>>>>>>>>> end,
> > >> >>> > I've
> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>
> > >> >>> > > >>>>
> > >> >>> >
> > >>
> > https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on
> > the
> > >> >>> > document.
> > >> >>> > > >>>>>>>>>> I'd
> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively
> > >> >>> > > >>>>>>>>>> straightforward
> > >> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote.
> > I've
> > >> >>> > > >>>>>>>>>> decided
> > >> >>> > > >>>>>>>>>> to
> > >> >>> > > >>>>>>>>>> table the other format change I had proposed (on
> > >> >>> > > >>>>>>>>>> DoPut),
> > >> >>> > > >>>>>>>>>> as
> > >> >>> > > >>>>>>>>>> it
> > >> >>> > > >>>>>> doesn't
> > >> >>> > > >>>>>>>>>> functionally change Flight, just the interpretation
> > of
> > >> >>> > > >>>>>>>>>> the
> > >> >>> > > >>>>>>>>>> semantics.
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>>> Thanks,
> > >> >>> > > >>>>>>>>>> David
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>> --
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>> <https://www.dremio.com/>
> > >> >>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>,
> > >> join
> > >> >>> > > >>>>>>>>> our
> > >> >>> > > >>>>>>>>> community
> > >> >>> > > >>>>>>>>> site <https://community.dremio.com/> & Download
> > Dremio
> > >> >>> > > >>>>>>>>> <https://www.dremio.com/download>
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>
> > >> >>> > > >>>>>>
> > >> >>> > > >>>>>
> > >> >>> > > >>>>
> > >> >>> > > >>>
> > >> >>> > > >
> > >> >>> >
> > >> >>
> > >> >
> > >>
> > >
> >
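
The bytes-only position in the quoted message above leaves any structure to the application. As an illustration only, here is a minimal Python sketch that assumes an application puts JSON into an opaque bytes field such as FlightData's app_metadata. The envelope keys (`type`, `body`) and the helpers `pack_app_metadata`, `unpack_app_metadata` and `dispatch` are hypothetical conventions. They are not part of any Flight API.

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical convention: Flight only ever sees opaque bytes. The
# {"type": ..., "body": ...} envelope is purely an application-level choice.


def pack_app_metadata(type_name: str, body: Dict[str, Any]) -> bytes:
    """Encode a typed application message as UTF-8 JSON bytes."""
    envelope = {"type": type_name, "body": body}
    # sort_keys + compact separators give a deterministic encoding.
    return json.dumps(envelope, separators=(",", ":"), sort_keys=True).encode("utf-8")


def unpack_app_metadata(raw: bytes) -> Tuple[str, Dict[str, Any]]:
    """Decode bytes produced by pack_app_metadata into (type_name, body)."""
    envelope = json.loads(raw.decode("utf-8"))
    return envelope["type"], envelope["body"]


def dispatch(raw: bytes, handlers: Dict[str, Callable[[Dict[str, Any]], Any]]) -> Any:
    """Route a message to the handler registered for its type.

    Unknown types are ignored (a no-op), so peers can add new message
    types without breaking older readers.
    """
    type_name, body = unpack_app_metadata(raw)
    handler = handlers.get(type_name)
    if handler is None:
        return None
    return handler(body)


if __name__ == "__main__":
    raw = pack_app_metadata("progress", {"rows_done": 1000})
    print(raw)  # b'{"body":{"rows_done":1000},"type":"progress"}'
    print(dispatch(raw, {"progress": lambda body: body["rows_done"]}))  # 1000
```

The same bytes field could instead carry a serialized Protobuf Any, which is a `type_url` string plus a `value` bytes field. Either way the shape is a type tag plus a payload, which matches the point above that Any is itself essentially a tagged bytes field.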

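The observation in this thread that `oneof` doesn't change the wire encoding can be made concrete with a hand-rolled encoder. This is a minimal Python sketch. It assumes the `FlightMessage { oneof message { FlightData data = 1; FlightAny metadata = 2; } }` shape discussed in the thread and treats both payloads as opaque, already-serialized bytes. Each oneof member is written as an ordinary length-delimited field (wire type 2), so a receiver tells data messages from control messages purely by field number. The helper names are hypothetical. Real code would use protoc-generated classes rather than this manual encoder.

```python
from typing import Iterator, Tuple

WIRE_TYPE_LEN = 2   # length-delimited: bytes, strings, embedded messages
DATA_FIELD = 1      # FlightMessage.data (FlightData) in the thread's sketch
METADATA_FIELD = 2  # FlightMessage.metadata (FlightAny) in the thread's sketch


def encode_varint(value: int) -> bytes:
    """Base-128 varint, least-significant 7-bit group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next_position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field; a oneof member is encoded exactly like this."""
    tag = (field_number << 3) | WIRE_TYPE_LEN
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def decode_fields(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (field_number, payload) for each length-delimited field in buf."""
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        if tag & 0x7 != WIRE_TYPE_LEN:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = decode_varint(buf, pos)
        yield tag >> 3, buf[pos:pos + length]
        pos += length


def classify(message: bytes) -> Tuple[str, bytes]:
    """Decide whether a wrapper message carries data or a control message."""
    kind, payload = "empty", b""
    for field_number, value in decode_fields(message):
        # If several oneof members appear on the wire, the last one wins.
        if field_number == DATA_FIELD:
            kind, payload = "data", value
        elif field_number == METADATA_FIELD:
            kind, payload = "control", value
        # Other field numbers are skipped, like unknown fields.
    return kind, payload


if __name__ == "__main__":
    msg = encode_field(METADATA_FIELD, b"pause")
    print(msg)            # b'\x12\x05pause'
    print(classify(msg))  # ('control', b'pause')
```

With the suggested no-op default for control messages, a handler would simply ignore anything classified as `control` unless it had registered interest in it. The "last member wins" rule follows the Protobuf language guide's description of parsing oneofs.
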
Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Jacques Nadeau <ja...@apache.org>.
I'd vote for making that explicitly unsupported. We should keep our primitives narrow.

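The question in the quoted message that follows rests on Flight's current rule that each logical stream has a single, consistent schema. Under the "explicitly unsupported" stance, a receiver would simply reject a stream that changes schema midway. This is a minimal Python sketch of that check. A stream is modeled as a sequence of hypothetical `Batch` objects whose schema is a tuple of (column name, type name) pairs. `Batch`, `SchemaMismatch` and `check_single_schema` are illustrative names, not pyarrow or Flight APIs. In real Flight the schema is sent ahead of the batches; here the first batch stands in for it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Hypothetical schema representation: ((column_name, type_name), ...)
Schema = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Batch:
    """Illustrative stand-in for a record batch: its schema and row count."""
    schema: Schema
    num_rows: int


class SchemaMismatch(ValueError):
    """Raised when a batch does not match the schema fixed for the stream."""


def check_single_schema(batches: Iterable[Batch]) -> Optional[Schema]:
    """Return the stream's schema, or None for an empty stream.

    The first batch fixes the schema; any later batch with a different
    schema is rejected rather than silently starting a new dataset.
    """
    expected: Optional[Schema] = None
    for index, batch in enumerate(batches):
        if expected is None:
            expected = batch.schema
        elif batch.schema != expected:
            raise SchemaMismatch(
                f"batch {index} has schema {batch.schema}, expected {expected}"
            )
    return expected


if __name__ == "__main__":
    users = (("id", "int64"), ("name", "utf8"))
    points = (("x", "float64"), ("y", "float64"))
    print(check_single_schema([Batch(users, 10), Batch(users, 5)]))
    try:
        check_single_schema([Batch(users, 10), Batch(points, 3)])
    except SchemaMismatch as err:
        print("rejected:", err)
```

Under this rule, a client with several datasets of differing schemas would issue one call per dataset and handle the routing concerns raised in the thread itself.
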
On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:

> Thanks for the feedback.
>
> I do think if we had explicitly embraced gRPC from the beginning,
> there are a lot of places where things could be made more ergonomic,
> including with the metadata fields. But it would also have locked us
> out of potential future transports.
>
> On another note: I hesitate to put too much into this method, but we
> are looking at use cases where potentially, a client may want to
> upload multiple distinct datasets (with differing schemas). (This is a
> little tentative, and I can get more details...) Right now, each
> logical stream in Flight must have a single, consistent schema; would
> it make sense to look at ways to relax this, or declare this
> explicitly out of scope (and require multiple calls and coordination
> with the deployment topology) in order to accomplish this?
>
> Best,
> David
>
> On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> > Fair enough. I'm okay with the bytes approach and the proposal looks good
> > to me.
> >
> > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
> >
> >> I've updated the proposal.
> >>
> >> On the subject of Protobuf Any vs bytes, and how to handle
> >> errors/metadata, I still think using bytes is preferable:
> >> - It doesn't require (conditionally) exposing or wrapping Protobuf types,
> >> - We wouldn't be able to practically expose the Protobuf field to C++
> >> users without causing build pains,
> >> - We can't let Python users take advantage of the Protobuf field
> >> without somehow being compatible with the Protobuf wheels (by linking
> >> to the same version, and doing magic to turn the C++ Protobufs into
> >> the Python ones),
> >> - All our other application-defined fields are already bytes.
> >>
> >> Applications that want structure can encode JSON or Protobuf Any into
> >> the bytes field themselves, much as you can already do for Ticket,
> >> commands in FlightDescriptors, and application metadata in
> >> DoGet/DoPut. I don't think this is (much) less efficient than using
> >> Any directly, since Any itself is a bytes field with a tag, and must
> >> invoke the Protobuf deserializer again to read the actual message.
> >>
> >> If we decide on using bytes, then I don't think it makes sense to
> >> define a new message with a oneof either, since it would be redundant.
> >>
> >> Thanks,
> >> David
> >>
> >> On 11/7/19, David Li <li...@gmail.com> wrote:
> >> > I've been extremely backlogged, I will update the proposal when I get
> >> > a chance and reply here when done.
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >> >> Bumping this discussion since a couple of weeks have passed. It seems
> >> >> there are still some questions here. Could we summarize the
> >> >> alternatives, along with any public API implications, so we can try
> >> >> to render a decision?
> >> >>
> >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Hi Wes,
> >> >>>
> >> >>> Responses inline:
> >> >>>
> >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
> >> >>>
> >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com>
> >> >>> > wrote:
> >> >>> > >
> >> >>> > > The question is whether to repurpose the existing FlightData
> >> >>> > > structure, and allow for the metadata field to be filled in and
> >> >>> > > data fields to be blank (as a control message), or to wrap the
> >> >>> > > FlightData structure in another structure that explicitly
> >> >>> > > distinguishes between control and data messages.
> >> >>> >
> >> >>> > I'm not super against having metadata-only FlightData with empty
> >> >>> > body. One question to consider is what changes (if any) would need
> >> >>> > to be made to public APIs in either scenario.
> >> >>> >
> >> >>>
> >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
> >> >>> messages in the future. This would be a breaking change, but
> >> >>> wouldn't change the wire format. I think the APIs could be changed
> >> >>> backwards compatibly, though.
> >> >>>
> >> >>>
> >> >>>
> >> >>> > > The other question is how to handle the metadata fields. So far,
> >> >>> > > we've used bytestring fields for application-defined data. This
> >> >>> > > is workable if you want to use Protobuf to define the contents of
> >> >>> > > those fields, but requires you to pack/unpack your Protobuf
> >> >>> > > into/from the bytestring field. If we instead used the Protobuf
> >> >>> > > Any field, a dynamically typed field, this would be more
> >> >>> > > convenient, but then we'd be exposing Protobuf types. We could
> >> >>> > > alternatively use a combination of a type field and a bytestring
> >> >>> > > field, mimicking what the Protobuf Any type looks like on the
> >> >>> > > wire. I'm not sure this is actually cleaner in any of the
> >> >>> > > language APIs, though.
> >> >>> >
> >> >>> > Leaving the deserialization of the app metadata to the particular
> >> >>> > Flight implementation seems on first principles like the most
> >> >>> > flexible thing. If Any is used, does that mean the metadata _must_
> >> >>> > be a protobuf?
> >> >>> >
> >> >>>
> >> >>>
> >> >>> If Any is used, we could still expose a bytes-based API, but it
> >> >>> would have some more wrapping. (We could put a ByteString in Any.)
> >> >>> Then the question would just be how to expose this (would be easier
> >> >>> in Java, harder in C++).
> >> >>>
> >> >>>
> >> >>>
> >> >>> > > David
> >> >>> > >
> >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> >> >>> > > >
> >> >>> > > > Can one of you explain what is being proposed in non-protobuf
> >> >>> > > > terms? Knowledge of protobuf shouldn't be required to use Flight.
> >> >>> > > >
> >> >>> > > > Regards
> >> >>> > > >
> >> >>> > > > Antoine.
> >> >>> > > >
> >> >>> > > >
> >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
> >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just
> >> >>> > > >> be application-level logic. (The official guide doesn't even
> >> >>> > > >> mention it in the encoding docs; I found
> >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >> >>> > > >> as well.)
> >> >>> > > >>
> >> >>> > > >> If I follow you, Jacques, then you are proposing essentially
> >> >>> > > >> inlining the definition of Any, e.g.
> >> >>> > > >>
> >> >>> > > >> message FlightMessage {
> >> >>> > > >>   oneof message {
> >> >>> > > >>     FlightData data = 1;
> >> >>> > > >>     FlightAny metadata = 2;
> >> >>> > > >>   }
> >> >>> > > >> }
> >> >>> > > >>
> >> >>> > > >> message FlightAny {
> >> >>> > > >>   string type = 1;
> >> >>> > > >>   bytes data = 2;
> >> >>> > > >> }
> >> >>> > > >>
> >> >>> > > >> Is this correct?
> >> >>> > > >>
> >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
> >> >>> > > >> as well, but at that point, I'd rather we be consistent with all
> >> >>> > > >> of them, rather than have one of the three methods do its own
> >> >>> > > >> thing.
> >> >>> > > >>
> >> >>> > > >> Thanks,
> >> >>> > > >> David
> >> >>> > > >>
> >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > > >>> I think we could probably expose the oneof behavior without
> >> >>> > > >>> exposing the protobuf functions. On the Any... hmm. I guess we
> >> >>> > > >>> could expose it as two fields: type and data. Then users could
> >> >>> > > >>> use it for whatever, but if people wanted to treat it as an Any,
> >> >>> > > >>> it would work. (Basically a user could use Any with it easily,
> >> >>> > > >>> but they could also use any other mechanism.) At least in Java,
> >> >>> > > >>> the Any concepts are pretty simple/DIY. Are other language
> >> >>> > > >>> bindings less DIY?
> >> >>> > > >>>
> >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but
> >> >>> > > >>> it just seemed a bit janky.
> >> >>> > > >>>
> >> >>> > > >>> Thinking about the control message/wrapper object thing, I
> >> >>> > > >>> wonder if we should redefine DoPut and DoGet to have the same
> >> >>> > > >>> property if we think it is a good idea...
> >> >>> > > >>>
> >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com>
> >> >>> > > >>> wrote:
> >> >>> > > >>>
> >> >>> > > >>>> I was definitely considering having control messages without
> >> >>> > > >>>> data, and I thought that could be encoded by a FlightData with
> >> >>> > > >>>> only app_metadata set. I think I understand your position now:
> >> >>> > > >>>> FlightData should always carry (some) data (with optional
> >> >>> > > >>>> metadata)?
> >> >>> > > >>>>
> >> >>> > > >>>> That makes sense to me, and is consistent with the
> >> >>> > > >>>> documentation on FlightData in the Protobuf file. I was worried
> >> >>> > > >>>> about having a redundant metadata field, but oneof prevents
> >> >>> > > >>>> that from happening, and overall having a clear separation
> >> >>> > > >>>> between data and control messages is cleaner.
> >> >>> > > >>>>
> >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
> >> >>> > > >>>> exposing Protobuf by using bytes; would we want to change that
> >> >>> > > >>>> now?
> >> >>> > > >>>>
> >> >>> > > >>>> Best,
> >> >>> > > >>>> David
> >> >>> > > >>>>
> >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > > >>>>> Hey David,
> >> >>> > > >>>>>
> >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
> >> >>> > > >>>>> DoGet/DoPut for async. Yes, I was thinking more of Java, given
> >> >>> > > >>>>> Java gRPC's async-always pattern.
> >> >>> > > >>>>>
> >> >>> > > >>>>> On the comment around the FlightData, I think it is
> >> >>> > > >>>>> overloading the message to use metadata for this. If I want to
> >> >>> > > >>>>> send a control message independently of the data message, I
> >> >>> > > >>>>> would have to define something like an empty FlightData
> >> >>> > > >>>>> message that has custom metadata. Why not support a container
> >> >>> > > >>>>> object with a oneof{FlightData, Any} in it instead, so users
> >> >>> > > >>>>> can add more data as desired? The default impl could be a
> >> >>> > > >>>>> noop for the Any messages.
> >> >>> > > >>>>>
> >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com>
> >> >>> > > >>>>> wrote:
> >> >>> > > >>>>>
> >> >>> > > >>>>>> Hi Jacques,
> >> >>> > > >>>>>>
> >> >>> > > >>>>>> Thanks for the comments.
> >> >>> > > >>>>>>
> >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
> >> >>> > > >>>>>>   proposals, so I don't think we need a new message to carry
> >> >>> > > >>>>>>   that kind of information.
> >> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
> >> >>> > > >>>>>>   messages as the fundamental API; it would actually be quite
> >> >>> > > >>>>>>   natural to implement in Flight/Java. I will note that it's
> >> >>> > > >>>>>>   not possible in C++/Python without spawning a thread,
> >> >>> > > >>>>>>   though. (In essence, gRPC-Java is async-always and gRPC-C++
> >> >>> > > >>>>>>   is sync-always.) There are experimental C++ APIs that would
> >> >>> > > >>>>>>   let us do something similar to Java, but those are only in
> >> >>> > > >>>>>>   relatively recent gRPC versions and are still under
> >> >>> > > >>>>>>   development (unlike the interceptor APIs, which have been
> >> >>> > > >>>>>>   around for quite a while).
> >> >>> > > >>>>>>
> >> >>> > > >>>>>> Thanks,
> >> >>> > > >>>>>> David
> >> >>> > > >>>>>>
> >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
> >> >>> > > >>>>>>> discussing here, depending on your thoughts.
> >> >>> > > >>>>>>>
> >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com>
> >> >>> > > >>>>>>> wrote:
> >> >>> > > >>>>>>>
> >> >>> > > >>>>>>>> Hey Ryan,
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> Thanks for the comments.
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
> >> >>> > > >>>>>>>> strawman.
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
> >> >>> > > >>>>>>>> interleave uploads and downloads if you were so inclined.
> >> >>> > > >>>>>>>> Right now, synchronous APIs make this error-prone, e.g. if
> >> >>> > > >>>>>>>> both client and server wait for each other due to an
> >> >>> > > >>>>>>>> application logic bug. (gRPC doesn't give us the ability to
> >> >>> > > >>>>>>>> have per-read timeouts, only an overall timeout.) As an
> >> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> This is mostly tangential though, eventually we will
> >> >>> > > >>>>>>>> want
> >> >>> > > >>>>>>>> to
> >> >>> > design
> >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A
> bidirectional
> >> >>> > > >>>>>>>> stream
> >> >>> > > >>>>>>>> like
> >> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier
> >> >>> > > >>>>>>>> to
> >> >>> > > >>>>>>>> run
> >> >>> > into.
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal,
> but
> >> >>> > > >>>>>>>> the
> >> >>> > main
> >> >>> > > >>>>>>>> concern is that depending on how you deploy, two
> >> >>> > > >>>>>>>> separate
> >> >>> > > >>>>>>>> calls
> >> >>> > > >>>>>>>> could
> >> >>> > > >>>>>>>> get routed to different instances. Additionally, gRPC
> >> >>> > > >>>>>>>> has
> >> >>> > > >>>>>>>> some
> >> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in
> >> >>> > > >>>>>>>> between
> >> >>> > > >>>>>>>> the
> >> >>> > two
> >> >>> > > >>>>>>>> calls, but it then restarts or there is another
> instance
> >> >>> > available,
> >> >>> > > >>>>>>>> the client will happily reconnect to the new server
> >> without
> >> >>> > > >>>>>>>> warning.
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> Thanks,
> >> >>> > > >>>>>>>> David
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> >> >>> > > >>>>>>>>> Hey David,
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it
> >> >>> > > >>>>>>>>> and
> >> >>> > > >>>>>>>>> the
> >> >>> > > >>>>>>>>> possibility
> >> >>> > > >>>>>>>>> of remote compute via arrow buffers. One thing that
> >> >>> > > >>>>>>>>> would
> >> >>> > > >>>>>>>>> help
> >> >>> > me
> >> >>> > > >>>>>> would
> >> >>> > > >>>>>>>> be
> >> >>> > > >>>>>>>>> a concrete example of the API in a real life use case.
> >> >>> > > >>>>>>>>> Also,
> >> >>> > what
> >> >>> > > >>>>>> would
> >> >>> > > >>>>>>>> the
> >> >>> > > >>>>>>>>> client experience be in terms of sync vs asyc? Would
> >> >>> > > >>>>>>>>> the
> >> >>> > > >>>>>>>>> client
> >> >>> > > >>>>>>>>> block
> >> >>> > > >>>>>>>> till
> >> >>> > > >>>>>>>>> the bidirectional call return ie c =
> >> flight.vector_mult(a,
> >> >>> > > >>>>>>>>> b)
> >> >>> > or
> >> >>> > > >>>>>>>>> would
> >> >>> > > >>>>>>>> the
> >> >>> > > >>>>>>>>> client wait to be signaled that computation was done.
> >> >>> > > >>>>>>>>> If
> >> >>> > > >>>>>>>>> the
> >> >>> > > >>>>>>>>> later
> >> >>> > > >>>>>>>>> how
> >> >>> > > >>>>>>>>> is
> >> >>> > > >>>>>>>>> that different from a DoPut then DoGet? I suppose that
> >> >>> > > >>>>>>>>> this
> >> >>> > could
> >> >>> > > >>>> be
> >> >>> > > >>>>>>>>> implemented without extending the RPC interface but
> >> rather
> >> >>> > > >>>>>>>>> by a
> >> >>> > > >>>>>>>>> function/util?
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>> Best,
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>> Ryan
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
> >> >>> > li.davidm96@gmail.com>
> >> >>> > > >>>>>> wrote:
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>>> Hi all,
> >> >>> > > >>>>>>>>>>
> >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far,
> but
> >> we
> >> >>> > > >>>>>>>>>> have
> >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able
> >> >>> > > >>>>>>>>>> to
> >> >>> > > >>>>>>>>>> both
> >> >>> > > >>>>>>>>>> send
> >> >>> > > >>>>>>>>>> and
> >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that
> >> >>> > > >>>>>>>>>> end,
> >> >>> > I've
> >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
> >> >>> > > >>>>>>>>>>
> >> >>> > > >>>>>>>>>>
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>
> >> >>> > > >>>>
> >> >>> >
> >>
> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >> >>> > > >>>>>>>>>>
> >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on
> the
> >> >>> > document.
> >> >>> > > >>>>>>>>>> I'd
> >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively
> >> >>> > > >>>>>>>>>> straightforward
> >> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> >> >>> > > >>>>>>>>>>
> >> >>> > > >>>>>>>>>> This is a format change and would require a vote.
> I've
> >> >>> > > >>>>>>>>>> decided
> >> >>> > > >>>>>>>>>> to
> >> >>> > > >>>>>>>>>> table the other format change I had proposed (on
> >> >>> > > >>>>>>>>>> DoPut),
> >> >>> > > >>>>>>>>>> as
> >> >>> > > >>>>>>>>>> it
> >> >>> > > >>>>>> doesn't
> >> >>> > > >>>>>>>>>> functionally change Flight, just the interpretation
> of
> >> >>> > > >>>>>>>>>> the
> >> >>> > > >>>>>>>>>> semantics.
> >> >>> > > >>>>>>>>>>
> >> >>> > > >>>>>>>>>> Thanks,
> >> >>> > > >>>>>>>>>> David
> >> >>> > > >>>>>>>>>>
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>> --
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>> <https://www.dremio.com/>
> >> >>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>,
> >> join
> >> >>> > > >>>>>>>>> our
> >> >>> > > >>>>>>>>> community
> >> >>> > > >>>>>>>>> site <https://community.dremio.com/> & Download
> Dremio
> >> >>> > > >>>>>>>>> <https://www.dremio.com/download>
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>
> >> >>> > > >>>>>>
> >> >>> > > >>>>>
> >> >>> > > >>>>
> >> >>> > > >>>
> >> >>> > > >
> >> >>> >
> >> >>
> >> >
> >>
> >
>
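The point about C++/Python can be made concrete: on a sync-always transport, a callback-style (async) handler has to be driven from a dedicated thread that blocks on reads and forwards each message. Below is a minimal stdlib sketch, assuming a plain Python iterable stands in for the blocking stream reader; `ThreadedMessageHandler` and its callback names are hypothetical, not part of Flight, gRPC, or pyarrow.

```python
import threading
from typing import Callable, Iterable, Optional


class ThreadedMessageHandler:
    """Drive callback-style handlers from a blocking, iterator-style stream.

    Hypothetical sketch: `messages` stands in for a synchronous stream
    reader. Nothing here is a Flight, gRPC, or pyarrow API.
    """

    def __init__(
        self,
        messages: Iterable[bytes],
        on_message: Callable[[bytes], None],
        on_completed: Callable[[], None],
        on_error: Callable[[Exception], None],
    ) -> None:
        self._messages = messages
        self._on_message = on_message
        self._on_completed = on_completed
        self._on_error = on_error
        # This extra thread is the price of emulating an async-always API
        # on top of a sync-always one.
        self._thread = threading.Thread(target=self._pump, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def join(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def _pump(self) -> None:
        try:
            # Each iteration may block until the peer sends something.
            for message in self._messages:
                self._on_message(message)
        except Exception as exc:
            # Surface failures (from the stream or a callback) to the caller
            # instead of losing them inside the background thread.
            self._on_error(exc)
        else:
            self._on_completed()
```

The callbacks run on the background thread, so any state they touch needs to be thread-safe.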

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Thanks for the feedback.

I do think that if we had explicitly embraced gRPC from the beginning,
there are a lot of places where things could have been made more
ergonomic, including the metadata fields. But it would also have
locked us out of potential future transports.

On another note: I hesitate to put too much into this method, but we
are looking at use cases where a client may want to upload multiple
distinct datasets (with differing schemas). (This is a little
tentative, and I can get more details...) Right now, each logical
stream in Flight must have a single, consistent schema. Would it make
sense to look at ways to relax this, or should we declare it
explicitly out of scope (requiring multiple calls, coordinated with
the deployment topology)?

Best,
David
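To make the single-schema constraint concrete: under the current rule, a client holding datasets with different schemas has to split them into separate logical streams, one call each. A minimal sketch, assuming simplified stand-in types (`Schema`, `Batch`, `SingleSchemaStream`, `split_by_schema` are all hypothetical and are not Arrow or Flight types):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical stand-ins, not Arrow types: a schema is an ordered tuple of
# (column name, type name) pairs, and a batch is a schema plus some rows.
Schema = Tuple[Tuple[str, str], ...]


@dataclass
class Batch:
    schema: Schema
    rows: list = field(default_factory=list)


class SingleSchemaStream:
    """One logical stream: every batch must match the first schema."""

    def __init__(self, schema: Schema) -> None:
        self.schema = schema
        self.batches: List[Batch] = []

    def append(self, batch: Batch) -> None:
        if batch.schema != self.schema:
            raise ValueError(
                f"schema changed mid-stream: expected {self.schema}, got {batch.schema}"
            )
        self.batches.append(batch)


def split_by_schema(batches: List[Batch]) -> List[SingleSchemaStream]:
    """Split a mixed upload into one stream per run of identical schemas.

    Under the single-schema rule, each resulting stream would need its own
    call, coordinated with the deployment topology.
    """
    streams: List[SingleSchemaStream] = []
    for batch in batches:
        if not streams or streams[-1].schema != batch.schema:
            streams.append(SingleSchemaStream(batch.schema))
        streams[-1].append(batch)
    return streams
```

Relaxing the rule would amount to letting one call carry several such streams, which is the design question raised above.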

On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> Fair enough. I'm okay with the bytes approach and the proposal looks good
> to me.

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Jacques Nadeau <ja...@apache.org>.
Fair enough. I'm okay with the bytes approach and the proposal looks good
to me.
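One reading of where the thread lands: no new wrapper message; a control message is a FlightData with only app_metadata set (empty header and body), and any structure inside the metadata is encoded by the application itself. A minimal Python sketch under those assumptions, with JSON as an arbitrary choice of encoding. `FlightDataMessage`, `encode_control`, and `dispatch` are hypothetical names; the dataclass only mirrors the `data_header`, `data_body`, and `app_metadata` field names of FlightData and is neither the generated Protobuf class nor a pyarrow API.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class FlightDataMessage:
    """Simplified stand-in mirroring three FlightData fields (not the real class)."""

    data_header: bytes = b""
    data_body: bytes = b""
    app_metadata: bytes = b""

    def is_control(self) -> bool:
        # Convention assumed for this sketch: no Arrow header/body means the
        # message only carries application metadata.
        return not self.data_header and not self.data_body


def encode_control(kind: str, **fields: Any) -> FlightDataMessage:
    # The application chooses the encoding; JSON is just one option.
    payload = json.dumps({"kind": kind, **fields}, sort_keys=True)
    return FlightDataMessage(app_metadata=payload.encode("utf-8"))


def dispatch(
    msg: FlightDataMessage,
    on_data: Callable[[FlightDataMessage], None],
    on_control: Optional[Callable[[Dict[str, Any]], None]] = None,
) -> None:
    if not msg.is_control():
        on_data(msg)
    elif on_control is not None and msg.app_metadata:
        # Flight never interprets these bytes; only the application does.
        on_control(json.loads(msg.app_metadata.decode("utf-8")))
    # Otherwise the control message is ignored (a no-op default).
```

Swapping JSON for a serialized Protobuf message (or a packed Any) only changes `encode_control` and the decode step; the wire format stays plain bytes either way.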

On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:

> I've updated the proposal.
>
> On the subject of Protobuf Any vs bytes, and how to handle
> errors/metadata, I still think using bytes is preferable:
> - It doesn't require (conditionally) exposing or wrapping Protobuf types,
> - We wouldn't be able to practically expose the Protobuf field to C++
> users without causing build pains,
> - We can't let Python users take advantage of the Protobuf field
> without somehow being compatible with the Protobuf wheels (by linking
> to the same version, and doing magic to turn the C++ Protobufs into
> the Python ones),
> - All our other application-defined fields are already bytes.
>
> Applications that want structure can encode JSON or Protobuf Any into
> the bytes field themselves, much as you can already do for Ticket,
> commands in FlightDescriptors, and application metadata in
> DoGet/DoPut. I don't think this is (much) less efficient than using
> Any directly, since Any itself is a bytes field with a tag, and must
> invoke the Protobuf deserializer again to read the actual message.
>
> If we decide on using bytes, then I don't think it makes sense to
> define a new message with a oneof either, since it would be redundant.
>
> Thanks,
> David
>
> On 11/7/19, David Li <li...@gmail.com> wrote:
> > I've been extremely backlogged; I will update the proposal when I get
> > a chance and reply here when done.
> >
> > Best,
> > David
> >
> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >> Bumping this discussion since a couple of weeks have passed. It seems
> >> there are still some open questions here; could we summarize the
> >> alternatives, along with any public API implications, so we can try to
> >> reach a decision?
> >>
> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com> wrote:
> >>>
> >>> Hi Wes,
> >>>
> >>> Responses inline:
> >>>
> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
> >>>
> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
> >>> > >
> >>> > > The question is whether to repurpose the existing FlightData
> >>> > > structure, and allow for the metadata field to be filled in and data
> >>> > > fields to be blank (as a control message), or to wrap the FlightData
> >>> > > structure in another structure that explicitly distinguishes between
> >>> > > control and data messages.
> >>> >
> >>> > I'm not super against having metadata-only FlightData with empty body.
> >>> > One question to consider is what changes (if any) would need to be
> >>> > made to public APIs in either scenario.
> >>> >
> >>>
> >>> We could leave DoGet/DoPut as-is for now, and allow empty data messages
> >>> in the future. This would be a breaking change, but wouldn't change the
> >>> wire format. I think the APIs could be changed backwards compatibly,
> >>> though.
> >>>
> >>>
> >>>
> >>> > > The other question is how to handle the metadata fields. So far, we've
> >>> > > used bytestring fields for application-defined data. This is workable
> >>> > > if you want to use Protobuf to define the contents of those fields,
> >>> > > but requires you to pack/unpack your Protobuf into/from the bytestring
> >>> > > field. If we instead used the Protobuf Any field, a dynamically typed
> >>> > > field, this would be more convenient, but then we'd be exposing
> >>> > > Protobuf types. We could alternatively use a combination of a type
> >>> > > field and a bytestring field, mimicking what the Protobuf Any type
> >>> > > looks like on the wire. I'm not sure this is actually cleaner in any
> >>> > > of the language APIs, though.
> >>> >
> >>> > Leaving the deserialization of the app metadata to the particular
> >>> > Flight implementation seems on first principles like the most flexible
> >>> > thing. If Any is used, does that mean the metadata _must_ be a
> >>> > protobuf?
> >>> >
> >>>
> >>>
> >>> If Any is used, we could still expose a bytes-based API, but it would
> >>> require some more wrapping. (We could put a ByteString in Any.) Then the
> >>> question would just be how to expose this (easier in Java, harder in
> >>> C++).
> >>>
> >>>
> >>>
> >>> > > David
> >>> > >
> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> >>> > > >
> >>> > > > Can one of you explain what is being proposed in non-protobuf terms?
> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
> >>> > > >
> >>> > > > Regards
> >>> > > >
> >>> > > > Antoine.
> >>> > > >
> >>> > > >
> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
> >>> > > >> application-level logic. (The official guide doesn't even mention
> >>> > > >> it in the encoding docs; I found
> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >>> > > >> as well.)
> >>> > > >>
> >>> > > >> If I follow you, Jacques, then you are essentially proposing
> >>> > > >> inlining the definition of Any, e.g.
> >>> > > >>
> >>> > > >> message FlightMessage {
> >>> > > >>   oneof message {
> >>> > > >>     FlightData data = 1;
> >>> > > >>     FlightAny metadata = 2;
> >>> > > >>   }
> >>> > > >> }
> >>> > > >>
> >>> > > >> message FlightAny {
> >>> > > >>   string type = 1;
> >>> > > >>   bytes data = 2;
> >>> > > >> }
> >>> > > >>
> >>> > > >> Is this correct?
> >>> > > >>
> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
> >>> > > >> as well, but at that point, I'd rather we be consistent with all of
> >>> > > >> them, rather than have one of the three methods do its own thing.
> >>> > > >>
> >>> > > >> Thanks,
> >>> > > >> David
> >>> > > >>
> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > > >>> I think we could probably expose the oneof behavior without exposing
> >>> > > >>> the protobuf functions. On the Any... hmm. I guess we could expose
> >>> > > >>> it as two fields: type and data. Then users could use it for
> >>> > > >>> whatever, but if people wanted to treat it as Any, it would work.
> >>> > > >>> (Basically, a user could easily use Any with it, but they could also
> >>> > > >>> use any other mechanism.) At least in Java, the Any concepts are
> >>> > > >>> pretty simple/DIY. Are other language bindings less DIY?
> >>> > > >>>
> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata, but it
> >>> > > >>> just seemed a bit janky.
> >>> > > >>>
> >>> > > >>> Thinking about the control message/wrapper object thing, I wonder if
> >>> > > >>> we should redefine DoPut and DoGet to have the same property, if we
> >>> > > >>> think it is a good idea...
> >>> > > >>>
> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
> li.davidm96@gmail.com>
> >>> > wrote:
> >>> > > >>>
> >>> > > >>>> I was definitely considering having control messages without
> >>> > > >>>> data, and I thought that could be encoded by a FlightData with
> >>> > > >>>> only app_metadata set. I think I understand your position now:
> >>> > > >>>> FlightData should always carry (some) data (with optional
> >>> > > >>>> metadata)?
> >>> > > >>>>
> >>> > > >>>> That makes sense to me, and is consistent with the documentation
> >>> > > >>>> on FlightData in the Protobuf file. I was worried about having a
> >>> > > >>>> redundant metadata field, but oneof prevents that from happening,
> >>> > > >>>> and overall having a clear separation between data and control
> >>> > > >>>> messages is cleaner.
> >>> > > >>>>
> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
> >>> > > >>>> Protobuf by using bytes. Would we want to change that now?
> >>> > > >>>>
> >>> > > >>>> Best,
> >>> > > >>>> David
> >>> > > >>>>
> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > > >>>>> Hey David,
> >>> > > >>>>>
> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
> >>> > > >>>>> DoGet/DoPut for async. Yes, more thinking Java, given Java gRPC's
> >>> > > >>>>> async-always pattern.
> >>> > > >>>>>
> >>> > > >>>>> On the comment around the FlightData, I think it is overloading
> >>> > > >>>>> the message to use metadata for this. If I want to send a control
> >>> > > >>>>> message independently of the data message, I would have to define
> >>> > > >>>>> something like an empty FlightData message that has custom
> >>> > > >>>>> metadata. Why not support a container object with a
> >>> > > >>>>> oneof{FlightData, Any} in it instead, so users can add more data
> >>> > > >>>>> as desired? The default impl could be a no-op for the Any
> >>> > > >>>>> messages.
> >>> > > >>>>>
> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
> >>> > > >>>>>
> >>> > > >>>>>> Hi Jacques,
> >>> > > >>>>>>
> >>> > > >>>>>> Thanks for the comments.
> >>> > > >>>>>>
> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
> >>> > > >>>>>> proposals, so I don't think we need a new message to carry that
> >>> > > >>>>>> kind of information.
> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
> >>> > > >>>>>> messages as the fundamental API; it would actually be quite
> >>> > > >>>>>> natural to implement in Flight/Java. I will note that it's not
> >>> > > >>>>>> possible in C++/Python without spawning a thread, though. (In
> >>> > > >>>>>> essence, gRPC-Java is async-always and gRPC-C++ is sync-always.)
> >>> > > >>>>>> There are experimental C++ APIs that would let us do something
> >>> > > >>>>>> similar to Java, but those are only in relatively recent gRPC
> >>> > > >>>>>> versions and are still under development (unlike the interceptor
> >>> > > >>>>>> APIs, which have been around for quite a while).
> >>> > > >>>>>>
> >>> > > >>>>>> Thanks,
> >>> > > >>>>>> David
> >>> > > >>>>>>
> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
> >>> > > >>>>>>> discussing here, depending on your thoughts.
> >>> > > >>>>>>>
> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
> >>> > > >>>>>>>
> >>> > > >>>>>>>> Hey Ryan,
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Thanks for the comments.
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
> >>> > > >>>>>>>> strawman.
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave
> >>> > > >>>>>>>> uploads and downloads if you were so inclined. Right now,
> >>> > > >>>>>>>> synchronous APIs make this error-prone, e.g. if both client and
> >>> > > >>>>>>>> server wait for each other due to an application logic bug.
> >>> > > >>>>>>>> (gRPC doesn't give us the ability to have per-read timeouts,
> >>> > > >>>>>>>> only an overall timeout.) As an example of this happening with
> >>> > > >>>>>>>> DoPut, see ARROW-6063:
> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> This is mostly tangential, though; eventually we will want to
> >>> > > >>>>>>>> design asynchronous APIs for Flight as a whole. A bidirectional
> >>> > > >>>>>>>> stream like this (and like DoPut) just makes these pitfalls
> >>> > > >>>>>>>> easier to run into.
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the
> >>> > > >>>>>>>> main concern is that, depending on how you deploy, two separate
> >>> > > >>>>>>>> calls could get routed to different instances. Additionally,
> >>> > > >>>>>>>> gRPC has some reconnection behaviors; if the server goes away
> >>> > > >>>>>>>> in between the two calls, but it then restarts or there is
> >>> > > >>>>>>>> another instance available, the client will happily reconnect
> >>> > > >>>>>>>> to the new server without warning.
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Thanks,
> >>> > > >>>>>>>> David
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> >>> > > >>>>>>>>> Hey David,
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
> >>> > > >>>>>>>>> possibility of remote compute via Arrow buffers. One thing
> >>> > > >>>>>>>>> that would help me would be a concrete example of the API in
> >>> > > >>>>>>>>> a real-life use case. Also, what would the client experience
> >>> > > >>>>>>>>> be in terms of sync vs async? Would the client block until the
> >>> > > >>>>>>>>> bidirectional call returns, i.e. c = flight.vector_mult(a, b),
> >>> > > >>>>>>>>> or would the client wait to be signaled that computation was
> >>> > > >>>>>>>>> done? If the latter, how is that different from a DoPut then
> >>> > > >>>>>>>>> DoGet? I suppose that this could be implemented without
> >>> > > >>>>>>>>> extending the RPC interface, but rather by a function/util?
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> Best,
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> Ryan
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>>> Hi all,
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we
> >>> > > >>>>>>>>>> have identified a new use case on the horizon: being able to
> >>> > > >>>>>>>>>> both send and retrieve Arrow data within a single RPC call.
> >>> > > >>>>>>>>>> To that end, I've written up a proposal for a new RPC method:
> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the
> >>> > > >>>>>>>>>> document. I'd appreciate any feedback; I think this is a
> >>> > > >>>>>>>>>> relatively straightforward addition - it is essentially
> >>> > > >>>>>>>>>> "DoPutThenGet".
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've
> >>> > > >>>>>>>>>> decided to table the other format change I had proposed (on
> >>> > > >>>>>>>>>> DoPut), as it doesn't functionally change Flight, just the
> >>> > > >>>>>>>>>> interpretation of the semantics.
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>> Thanks,
> >>> > > >>>>>>>>>> David
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> --
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> <https://www.dremio.com/>
> >>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join our
> >>> > > >>>>>>>>> community site <https://community.dremio.com/> & Download Dremio
> >>> > > >>>>>>>>> <https://www.dremio.com/download>
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>
> >>> > > >>>>>>>
> >>> > > >>>>>>
> >>> > > >>>>>
> >>> > > >>>>
> >>> > > >>>
> >>> > > >
> >>> >
> >>
> >
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
I've updated the proposal.

On the subject of Protobuf Any vs bytes, and how to handle
errors/metadata, I still think using bytes is preferable:
- It doesn't require (conditionally) exposing or wrapping Protobuf types,
- We wouldn't be able to practically expose the Protobuf field to C++
users without causing build pains,
- We can't let Python users take advantage of the Protobuf field
without somehow being compatible with the Protobuf wheels (by linking
to the same version, and doing magic to turn the C++ Protobufs into
the Python ones),
- All our other application-defined fields are already bytes.

Applications that want structure can encode JSON or Protobuf Any into
the bytes field themselves, much as you can already do for Ticket,
commands in FlightDescriptors, and application metadata in
DoGet/DoPut. I don't think this is (much) less efficient than using
Any directly, since Any itself is just a type URL plus a bytes field,
and reading the actual message out of it still requires invoking the
Protobuf deserializer a second time.
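
To make the efficiency argument concrete, here is a minimal Python sketch that hand-encodes Protobuf wire format, with no Protobuf library involved. It assumes the standard google.protobuf.Any definition (string type_url = 1; bytes value = 2) and the FlightAny message proposed in this thread (string type = 1; bytes data = 2). The helper names and the example type URL are illustrative only. It shows that both messages serialize to identical bytes, and that the payload they carry is still serialized bytes that need a second decoding pass.

```python
# Minimal hand-rolled Protobuf wire encoding, just enough to compare two messages.
# Assumed schemas:
#   google.protobuf.Any      { string type_url = 1; bytes value = 2; }
#   FlightAny (hypothetical) { string type     = 1; bytes data  = 2; }


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def read_varint(buf: bytes, i: int):
    """Decode a varint starting at buf[i]; return (value, next_index)."""
    value, shift = 0, 0
    while True:
        b = buf[i]
        i += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, i
        shift += 7


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """One wire-type-2 field: tag varint, length varint, then the raw bytes."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_any(type_url: str, value: bytes) -> bytes:
    return length_delimited(1, type_url.encode("utf-8")) + length_delimited(2, value)


def encode_flight_any(type_: str, data: bytes) -> bytes:
    return length_delimited(1, type_.encode("utf-8")) + length_delimited(2, data)


def decode_length_delimited(buf: bytes) -> dict:
    """First decoding pass: split wire-type-2 fields into {field_number: bytes}."""
    fields, i = {}, 0
    while i < len(buf):
        tag, i = read_varint(buf, i)
        if tag & 0x7 != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, i = read_varint(buf, i)
        fields[tag >> 3] = buf[i:i + length]
        i += length
    return fields


# Usage: an already-serialized application message, treated as opaque bytes here.
inner = b"\x08\x01"
url = "type.googleapis.com/example.Control"  # hypothetical type URL
wire_any = encode_any(url, inner)
wire_flight_any = encode_flight_any(url, inner)
print(wire_any == wire_flight_any)

# Getting at the real message takes a second pass: field 2 is still serialized bytes.
fields = decode_length_delimited(wire_any)
print(fields[1].decode("utf-8"), fields[2])
```

Whether the application puts these bytes in an Any or in a plain bytes field, the receiver does the same two steps: read the outer field, then deserialize the inner payload.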

If we decide on using bytes, then I don't think it makes sense to
define a new message with a oneof either, since it would be redundant.
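
A minimal Python sketch of the bytes-only approach, assuming that a FlightData with empty data fields and only app_metadata set is treated as a control message. FlightDataStub is a hypothetical stand-in that keeps only FlightData's three bytes fields (it is not the pyarrow.flight API), make_control and classify are illustrative helpers, and JSON is just one possible encoding; a serialized Protobuf Any could go in the same field instead.

```python
import json
from dataclasses import dataclass


@dataclass
class FlightDataStub:
    """Hypothetical stand-in for FlightData's bytes fields (not the pyarrow API)."""
    data_header: bytes = b""   # serialized Arrow IPC message header
    data_body: bytes = b""     # Arrow IPC message body
    app_metadata: bytes = b""  # opaque, application-defined bytes


def make_control(kind: str, **payload) -> FlightDataStub:
    """Build a metadata-only message: JSON in app_metadata, no Arrow data."""
    encoded = json.dumps({"kind": kind, **payload}).encode("utf-8")
    return FlightDataStub(app_metadata=encoded)


def classify(msg: FlightDataStub):
    """Return ("control" or "data", decoded app_metadata or None)."""
    is_control = not msg.data_header and not msg.data_body
    meta = json.loads(msg.app_metadata) if msg.app_metadata else None
    return ("control" if is_control else "data"), meta


# Usage: a stream mixing data batches and a control message.
stream = [
    FlightDataStub(data_header=b"<header>", data_body=b"<body>"),
    make_control("progress", rows=1024),
    FlightDataStub(data_header=b"<header>", data_body=b"<body>",
                   app_metadata=json.dumps({"batch": 2}).encode("utf-8")),
]
for msg in stream:
    print(classify(msg))
```

Because the receiver tells control and data messages apart by inspecting which fields are set, no wrapper message or oneof is involved.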

Thanks,
David

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
I've been extremely backlogged; I will update the proposal when I get
a chance and reply here when done.

Best,
David

On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> Bumping this discussion since a couple of weeks have passed. It seems
> there are still some questions here. Could we summarize the alternatives,
> along with any public API implications, so we can try to render a
> decision?
>
> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com> wrote:
>>
>> Hi Wes,
>>
>> Responses inline:
>>
>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
>>
>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
>> > >
>> > > The question is whether to repurpose the existing FlightData
>> > > structure, and allow for the metadata field to be filled in and data
>> > > fields to be blank (as a control message), or to wrap the FlightData
>> > > structure in another structure that explicitly distinguishes between
>> > > control and data messages.
>> >
>> > I'm not super against having metadata-only FlightData with empty body.
>> > One question to consider is what changes (if any) would need to be
>> > made to public APIs in either scenario.
>> >
>>
>> We could leave DoGet/DoPut as-is for now, and allow empty data messages in
>> the future. This would be a breaking change, but wouldn't change the wire
>> format. I think the APIs could be changed backwards compatibly, though.
>>
>>
>>
>> > > The other question is how to handle the metadata fields. So far, we've
>> > > used bytestring fields for application-defined data. This is workable
>> > > if you want to use Protobuf to define the contents of those fields,
>> > > but requires you to pack/unpack your Protobuf into/from the bytestring
>> > > field. If we instead used the Protobuf Any field, a dynamically typed
>> > > field, this would be more convenient, but then we'd be exposing
>> > > Protobuf types. We could alternatively use a combination of a type
>> > > field and a bytestring field, mimicking what the Protobuf Any type
>> > > looks like on the wire. I'm not sure this is actually cleaner in any
>> > > of the language APIs, though.
>> >
>> > Leaving the deserialization of the app metadata to the particular
>> > Flight implementation seems on first principles like the most flexible
>> > thing. If Any is used, does that mean the metadata _must_ be a
>> > protobuf?
>> >
>>
>>
>> If Any is used, we could still expose a bytes-based API, but it would have
>> some more wrapping. (We could put a ByteString in Any.) Then the question
>> would just be how to expose this (it would be easier in Java, harder in
>> C++).
>>
>>
>>
>> > > David
>> > >
>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
>> > > >
>> > > > Can one of you explain what is being proposed in non-protobuf terms?
>> > > > Knowledge of protobuf shouldn't be required to use Flight.
>> > > >
>> > > > Regards
>> > > >
>> > > > Antoine.
>> > > >
>> > > >
>> > > > On 21/10/2019 at 15:46, David Li wrote:
>> > > >> Oneof doesn't actually change the wire encoding; it would just be
>> > > >> application-level logic. (The official guide doesn't even mention
>> > > >> it in the encoding docs; I found
>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> > > >> as well.)
>> > > >>
>> > > >> If I follow you, Jacques, then you are proposing essentially
>> > > >> inlining the definition of Any, e.g.
>> > > >>
>> > > >> message FlightMessage {
>> > > >>   oneof message {
>> > > >>     FlightData data = 1;
>> > > >>     FlightAny metadata = 2;
>> > > >>   }
>> > > >> }
>> > > >>
>> > > >> message FlightAny {
>> > > >>   string type = 1;
>> > > >>   bytes data = 2;
>> > > >> }
>> > > >>
>> > > >> Is this correct?
>> > > >>
>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
>> > > >> as well, but at that point, I'd rather we be consistent with all
>> > > >> of them, rather than have one of the three methods do its own thing.
>> > > >>
>> > > >> Thanks,
>> > > >> David
>> > > >>
>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > > >>> I think we could probably expose the oneof behavior without
>> > > >>> exposing the protobuf functions. On the Any... hmm. I guess we
>> > > >>> could expose it as two fields: type and data. Then users could use
>> > > >>> it for whatever, but if people wanted to treat it as Any, it would
>> > > >>> work. (Basically a user could use Any with it easily, but they could
>> > > >>> also use any other mechanism.) At least in Java, the Any concepts
>> > > >>> are pretty simple/DIY. Are other language bindings less DIY?
>> > > >>>
>> > > >>> I'm *not* hardcore against the empty FlightData + metadata, but it
>> > > >>> just seemed a bit janky.
>> > > >>>
>> > > >>> Thinking about the control message/wrapper object thing, I wonder
>> > > >>> if we should redefine DoPut and DoGet to have the same property if
>> > > >>> we think it is a good idea...
>> > > >>>
>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li...@gmail.com> wrote:
>> > > >>>
>> > > >>>> I was definitely considering having control messages without data, and
>> > > >>>> I thought that could be encoded by a FlightData with only app_metadata
>> > > >>>> set. I think I understand your position now: FlightData should always
>> > > >>>> carry (some) data (with optional metadata)?
>> > > >>>>
>> > > >>>> That makes sense to me, and is consistent with the documentation on
>> > > >>>> FlightData in the Protobuf file. I was worried about having a
>> > > >>>> redundant metadata field, but oneof prevents that from happening, and
>> > > >>>> overall having a clear separation between data and control messages is
>> > > >>>> cleaner.
>> > > >>>>
>> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
>> > > >>>> Protobuf by using bytes; would we want to change that now?
>> > > >>>>
>> > > >>>> Best,
>> > > >>>> David
>> > > >>>>
>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > > >>>>> Hey David,
>> > > >>>>>
>> > > >>>>> RE: Async: I was trying to match the pattern we use for DoGet/DoPut for
>> > > >>>>> async. Yes, I was thinking more of Java, given Java gRPC's async-always
>> > > >>>>> pattern.
>> > > >>>>>
>> > > >>>>> On the comment around the FlightData, I think it is overloading the
>> > > >>>>> message to use metadata for this. If I want to send a control message
>> > > >>>>> independently of the data message, I would have to define something like
>> > > >>>>> an empty flight data message that has custom metadata. Why not support a
>> > > >>>>> container object with a oneof{FlightData, Any} in it instead, so users
>> > > >>>>> can add more data as desired? The default impl could be a no-op for the
>> > > >>>>> Any messages.
>> > > >>>>>
>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
>> > > >>>>>
>> > > >>>>>> Hi Jacques,
>> > > >>>>>>
>> > > >>>>>> Thanks for the comments.
>> > > >>>>>>
>> > > >>>>>> - I do agree DoExchange is a better name!
>> > > >>>>>> - FlightData already has metadata fields as a result of prior
>> > > >>>>>> proposals, so I don't think we need a new message to carry that kind
>> > > >>>>>> of information.
>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
>> > > >>>>>> messages as the fundamental API; it would actually be quite natural to
>> > > >>>>>> implement in Flight/Java. I will note that it's not possible in
>> > > >>>>>> C++/Python without spawning a thread, though. (In essence, gRPC-Java
>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are experimental
>> > > >>>>>> C++ APIs that would let us do something similar to Java, but those are
>> > > >>>>>> only in relatively recent gRPC versions and are still under
>> > > >>>>>> development (unlike the interceptor APIs, which have been around for
>> > > >>>>>> quite a while).
>> > > >>>>>>
>> > > >>>>>> Thanks,
>> > > >>>>>> David
>> > > >>>>>>
>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth discussing
>> > > >>>>>>> here depending on your thoughts.
>> > > >>>>>>>
>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
>> > > >>>>>>>
>> > > >>>>>>>> Hey Ryan,
>> > > >>>>>>>>
>> > > >>>>>>>> Thanks for the comments.
>> > > >>>>>>>>
>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
>> > > >>>>>>>>
>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave uploads
>> > > >>>>>>>> and downloads if you were so inclined. Right now, synchronous APIs
>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait for each
>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us the
>> > > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) As an
>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>> > > >>>>>>>>
>> > > >>>>>>>> This is mostly tangential, though; eventually we will want to design
>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream like
>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run into.
>> > > >>>>>>>>
>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
>> > > >>>>>>>> concern is that depending on how you deploy, two separate calls could
>> > > >>>>>>>> get routed to different instances. Additionally, gRPC has some
>> > > >>>>>>>> reconnection behaviors; if the server goes away in between the two
>> > > >>>>>>>> calls, but it then restarts or there is another instance available,
>> > > >>>>>>>> the client will happily reconnect to the new server without warning.
>> > > >>>>>>>>
>> > > >>>>>>>> Thanks,
>> > > >>>>>>>> David
>> > > >>>>>>>>
>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
>> > > >>>>>>>>> Hey David,
>> > > >>>>>>>>>
>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the possibility
>> > > >>>>>>>>> of remote compute via Arrow buffers. One thing that would help me would be
>> > > >>>>>>>>> a concrete example of the API in a real-life use case. Also, what would
>> > > >>>>>>>>> the client experience be in terms of sync vs async? Would the client block
>> > > >>>>>>>>> till the bidirectional call returns, i.e. c = flight.vector_mult(a, b), or
>> > > >>>>>>>>> would the client wait to be signaled that computation was done? If the
>> > > >>>>>>>>> latter, how is that different from a DoPut then DoGet? I suppose that this
>> > > >>>>>>>>> could be implemented without extending the RPC interface but rather by a
>> > > >>>>>>>>> function/util?
>> > > >>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>> Best,
>> > > >>>>>>>>>
>> > > >>>>>>>>> Ryan
>> > > >>>>>>>>>
>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
>> > > >>>>>>>>>
>> > > >>>>>>>>>> Hi all,
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
>> > > >>>>>>>>>> identified a new use case on the horizon: being able to both send and
>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end, I've
>> > > >>>>>>>>>> written up a proposal for a new RPC method:
>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document. I'd
>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively straightforward
>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it doesn't
>> > > >>>>>>>>>> functionally change Flight, just the interpretation of the semantics.
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> Thanks,
>> > > >>>>>>>>>> David
>> > > >>>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>> --
>> > > >>>>>>>>>
>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
>> > > >>>>>>>>>
>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
>> > > >>>>>>>>>
>> > > >>>>>>>>> <https://www.dremio.com/>
>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join
>> > > >>>>>>>>> our
>> > > >>>>>>>>> community
>> > > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio
>> > > >>>>>>>>> <https://www.dremio.com/download>
>> > > >>>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>
>> > > >>>>>>
>> > > >>>>>
>> > > >>>>
>> > > >>>
>> > > >
>> >
>
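A rough illustration of the wrapper-message option quoted above, in plain Python rather than Protobuf: a oneof means each message on the stream is exactly one of the listed variants, so a receiver branches on which variant arrived. This is a minimal sketch under stated assumptions, not Flight code: `FlightData` here is a simplified stand-in holding only a body and app_metadata, `FlightAny` mirrors the strawman `{type, data}` message, and `dispatch` is a hypothetical helper. Control messages with no registered handler are dropped, matching the suggestion that the default implementation be a no-op for them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Union


@dataclass
class FlightData:
    # Simplified stand-in for Flight's FlightData (hypothetical fields).
    body: bytes
    app_metadata: bytes = b""


@dataclass
class FlightAny:
    # Mirrors the strawman "message FlightAny { string type; bytes data; }".
    type: str
    data: bytes


# A protobuf oneof behaves like a tagged union: exactly one arm is set.
FlightMessage = Union[FlightData, FlightAny]


def dispatch(messages: Iterable[FlightMessage],
             on_data: Callable[[FlightData], None],
             on_control: Optional[Callable[[FlightAny], None]] = None) -> None:
    """Route each message to a handler according to which arm is set."""
    for msg in messages:
        if isinstance(msg, FlightData):
            on_data(msg)
        elif isinstance(msg, FlightAny):
            # Default behavior: ignore control messages nobody asked for.
            if on_control is not None:
                on_control(msg)
        else:
            raise TypeError(f"unexpected message type: {type(msg).__name__}")


# Usage: two data messages with a control message interleaved.
stream: List[FlightMessage] = [
    FlightData(b"batch-1"),
    FlightAny("example.Progress", b"50%"),
    FlightData(b"batch-2"),
]
bodies: List[bytes] = []
control_types: List[str] = []
dispatch(stream, lambda d: bodies.append(d.body),
         lambda c: control_types.append(c.type))
```

After the call, `bodies` holds the two data payloads in order and `control_types` holds the single control message's type string.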

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
Bumping this discussion since a couple of weeks have passed. It seems
there are still some open questions here; could we summarize the
alternatives, along with any public API implications, so we can try to
render a decision?

On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com> wrote:
>
> Hi Wes,
>
> Responses inline:
>
> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
>
> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
> > >
> > > The question is whether to repurpose the existing FlightData
> > > structure, and allow for the metadata field to be filled in and data
> > > fields to be blank (as a control message), or to wrap the FlightData
> > > structure in another structure that explicitly distinguishes between
> > > control and data messages.
> >
> > I'm not super against having metadata-only FlightData with empty body.
> > One question to consider is what changes (if any) would need to be
> > made to public APIs in either scenario.
> >
>
> We could leave DoGet/DoPut as-is for now, and allow empty data messages in
> the future. This would be a breaking change, but wouldn't change the wire
> format. I think the APIs could be changed backwards compatibly, though.
>
>
>
> > > The other question is how to handle the metadata fields. So far, we've
> > > used bytestring fields for application-defined data. This is workable
> > > if you want to use Protobuf to define the contents of those fields,
> > > but requires you to pack/unpack your Protobuf into/from the bytestring
> > > field. If we instead used the Protobuf Any field, a dynamically typed
> > > field, this would be more convenient, but then we'd be exposing
> > > Protobuf types. We could alternatively use a combination of a type
> > > field and a bytestring field, mimicking what the Protobuf Any type
> > > looks like on the wire. I'm not sure this is actually cleaner in any
> > > of the language APIs, though.
> >
> > Leaving the deserialization of the app metadata to the particular
> > Flight implementation seems on first principles like the most flexible
> > thing. If Any is used, does that mean the metadata _must_ be a
> > protobuf?
> >
>
>
> If Any is used, we could still expose a bytes-based API, but it would have
> some more wrapping. (We could put a ByteString in Any.) Then the question
> would just be how to expose this (would be easier in Java, harder in C++).
>
>
>
> > > David
> > >
> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> > > >
> > > > Can one of you explain what is being proposed in non-protobuf terms?
> > > > Knowledge of protobuf shouldn't be required to use Flight.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > > On 21/10/2019 at 15:46, David Li wrote:
> > > >> Oneof doesn't actually change the wire encoding; it would just be
> > > >> application-level logic. (The official guide doesn't even mention it
> > > >> in the encoding docs; I found
> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> > > >> as well.)
> > > >>
> > > >> If I follow you, Jacques, then you are proposing essentially inlining
> > > >> the definition of Any, e.g.
> > > >>
> > > >> message FlightMessage {
> > > >>   oneof message {
> > > >>     FlightData data = 1;
> > > >>     FlightAny metadata = 2;
> > > >>   }
> > > >> }
> > > >>
> > > >> message FlightAny {
> > > >>   string type = 1;
> > > >>   bytes data = 2;
> > > >> }
> > > >>
> > > >> Is this correct?
> > > >>
> > > >> It might be nice to consider the wrapper message for DoGet/DoPut as
> > > >> well, but at that point, I'd rather we be consistent with all of them,
> > > >> rather than have one of the three methods do its own thing.
> > > >>
> > > >> Thanks,
> > > >> David
> > > >>
> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> > > >>> I think we could probably expose the oneof behavior without exposing the
> > > >>> protobuf functions. On the any... hmm. I guess we could expose as two
> > > >>> fields: type and data. Then users could use it for whatever but if people
> > > >>> wanted to treat it as any, it would work. (Basically a user could use any
> > > >>> with it easily but they could also use any other mechanism). At least in
> > > >>> java, the any concepts are pretty simple/diy. Are other language bindings
> > > >>> less diy?
> > > >>>
> > > >>> I'm *not* hardcore against the empty FlightData + metadata but it just
> > > >>> seemed a bit janky.
> > > >>>
> > > >>> Thinking about the control message/wrapper object thing, I wonder if we
> > > >>> should redefine DoPut and DoGet to have the same property if we think it
> > > >>> is a good idea...
> > > >>>
> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li...@gmail.com> wrote:
> > > >>>
> > > >>>> I was definitely considering having control messages without data, and
> > > >>>> I thought that could be encoded by a FlightData with only app_metadata
> > > >>>> set. I think I understand your position now: FlightData should always
> > > >>>> carry (some) data (with optional metadata)?
> > > >>>>
> > > >>>> That makes sense to me, and is consistent with the documentation on
> > > >>>> FlightData in the Protobuf file. I was worried about having a
> > > >>>> redundant metadata field, but oneof prevents that from happening, and
> > > >>>> overall having a clear separation between data and control messages is
> > > >>>> cleaner.
> > > >>>>
> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
> > > >>>> Protobuf by using bytes; would we want to change that now?
> > > >>>>
> > > >>>> Best,
> > > >>>> David
> > > >>>>
> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> > > >>>>> Hey David,
> > > >>>>>
> > > >>>>> RE: Async: I was trying to match the pattern we use for DoGet/DoPut for
> > > >>>>> async. Yes, I was thinking more of Java, given Java gRPC's async-always
> > > >>>>> pattern.
> > > >>>>>
> > > >>>>> On the comment around the FlightData, I think it is overloading the
> > > >>>>> message to use metadata for this. If I want to send a control message
> > > >>>>> independently of the data message, I would have to define something like
> > > >>>>> an empty flight data message that has custom metadata. Why not support a
> > > >>>>> container object with a oneof{FlightData, Any} in it instead, so users
> > > >>>>> can add more data as desired? The default impl could be a no-op for the
> > > >>>>> Any messages.
> > > >>>>>
> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Hi Jacques,
> > > >>>>>>
> > > >>>>>> Thanks for the comments.
> > > >>>>>>
> > > >>>>>> - I do agree DoExchange is a better name!
> > > >>>>>> - FlightData already has metadata fields as a result of prior
> > > >>>>>> proposals, so I don't think we need a new message to carry that kind
> > > >>>>>> of information.
> > > >>>>>> - I like the suggestion of an async handler to handle incoming
> > > >>>>>> messages as the fundamental API; it would actually be quite natural to
> > > >>>>>> implement in Flight/Java. I will note that it's not possible in
> > > >>>>>> C++/Python without spawning a thread, though. (In essence, gRPC-Java
> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are experimental
> > > >>>>>> C++ APIs that would let us do something similar to Java, but those are
> > > >>>>>> only in relatively recent gRPC versions and are still under
> > > >>>>>> development (unlike the interceptor APIs, which have been around for
> > > >>>>>> quite a while).
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> David
> > > >>>>>>
> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> > > >>>>>>> I like it. Added some comments to the doc. Might be worth discussing
> > > >>>>>>> here depending on your thoughts.
> > > >>>>>>>
> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hey Ryan,
> > > >>>>>>>>
> > > >>>>>>>> Thanks for the comments.
> > > >>>>>>>>
> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
> > > >>>>>>>>
> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave uploads
> > > >>>>>>>> and downloads if you were so inclined. Right now, synchronous APIs
> > > >>>>>>>> make this error-prone, e.g. if both client and server wait for each
> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us the
> > > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) As an
> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> > > >>>>>>>>
> > > >>>>>>>> This is mostly tangential, though; eventually we will want to design
> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream like
> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run into.
> > > >>>>>>>>
> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
> > > >>>>>>>> concern is that depending on how you deploy, two separate calls could
> > > >>>>>>>> get routed to different instances. Additionally, gRPC has some
> > > >>>>>>>> reconnection behaviors; if the server goes away in between the two
> > > >>>>>>>> calls, but it then restarts or there is another instance available,
> > > >>>>>>>> the client will happily reconnect to the new server without warning.
> > > >>>>>>>>
> > > >>>>>>>> Thanks,
> > > >>>>>>>> David
> > > >>>>>>>>
> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> > > >>>>>>>>> Hey David,
> > > >>>>>>>>>
> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the possibility
> > > >>>>>>>>> of remote compute via Arrow buffers. One thing that would help me would be
> > > >>>>>>>>> a concrete example of the API in a real-life use case. Also, what would
> > > >>>>>>>>> the client experience be in terms of sync vs async? Would the client block
> > > >>>>>>>>> till the bidirectional call returns, i.e. c = flight.vector_mult(a, b), or
> > > >>>>>>>>> would the client wait to be signaled that computation was done? If the
> > > >>>>>>>>> latter, how is that different from a DoPut then DoGet? I suppose that this
> > > >>>>>>>>> could be implemented without extending the RPC interface but rather by a
> > > >>>>>>>>> function/util?
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Best,
> > > >>>>>>>>>
> > > >>>>>>>>> Ryan
> > > >>>>>>>>>
> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hi all,
> > > >>>>>>>>>>
> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
> > > >>>>>>>>>> identified a new use case on the horizon: being able to both send and
> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end, I've
> > > >>>>>>>>>> written up a proposal for a new RPC method:
> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> > > >>>>>>>>>>
> > > >>>>>>>>>> Please let me know if you can't view or comment on the document. I'd
> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively straightforward
> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> > > >>>>>>>>>>
> > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it doesn't
> > > >>>>>>>>>> functionally change Flight, just the interpretation of the semantics.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thanks,
> > > >>>>>>>>>> David
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >
> >
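The point in the thread that a callback-style handler is natural in gRPC-Java but, on top of a blocking API such as gRPC-C++ or Python, requires spawning a thread can be sketched as follows. This is illustrative only: `reader` stands in for any blocking iterator of incoming messages (not a real Flight reader class), and `serve_with_handler` is a hypothetical name. The background thread performs all the blocking reads, invokes the message callback, and finally reports the error (or None) through `on_done`.

```python
import threading
from typing import Callable, Iterable, Optional


def serve_with_handler(reader: Iterable[bytes],
                       on_message: Callable[[bytes], None],
                       on_done: Optional[Callable[[Optional[Exception]], None]] = None,
                       ) -> threading.Thread:
    """Adapt a blocking reader to a callback API by pumping it on a thread."""

    def pump() -> None:
        error: Optional[Exception] = None
        try:
            for msg in reader:  # each step may block waiting on the peer
                on_message(msg)
        except Exception as exc:  # surface failures instead of losing them
            error = exc
        finally:
            if on_done is not None:
                on_done(error)

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return thread


# Usage: the caller's thread is free while messages are delivered.
received = []
outcome = []
worker = serve_with_handler(iter([b"a", b"b"]), received.append, outcome.append)
worker.join(timeout=5)
```

The cost of this adapter is one extra thread per call, which is the overhead David points out for C++/Python; the experimental gRPC C++ callback APIs mentioned in the thread are presumably what could avoid it.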

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Hi Wes,

Responses inline:

On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:

> On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
> >
> > The question is whether to repurpose the existing FlightData
> > structure, and allow for the metadata field to be filled in and data
> > fields to be blank (as a control message), or to wrap the FlightData
> > structure in another structure that explicitly distinguishes between
> > control and data messages.
>
> I'm not super against having metadata-only FlightData with empty body.
> One question to consider is what changes (if any) would need to be
> made to public APIs in either scenario.
>

We could leave DoGet/DoPut as-is for now, and allow empty data messages in
the future. This would be a breaking change, but wouldn't change the wire
format. I think the APIs could be changed backwards compatibly, though.
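As a sketch of the first option (reusing FlightData and allowing metadata-only messages), a receiver would decide whether a message is data or control by looking at which fields are populated; nothing changes on the wire. The field names below are assumed to follow FlightData in Flight.proto (data_header, data_body, app_metadata; flight_descriptor is omitted), and `classify` is a hypothetical helper, not a Flight API.

```python
from dataclasses import dataclass


@dataclass
class FlightData:
    # Simplified stand-in; flight_descriptor is omitted for brevity.
    data_header: bytes = b""
    data_body: bytes = b""
    app_metadata: bytes = b""


def classify(msg: FlightData) -> str:
    """Label a message as 'data' (carries Arrow IPC content) or 'control'
    (metadata only); reject messages that carry nothing at all."""
    if msg.data_header or msg.data_body:
        return "data"
    if msg.app_metadata:
        return "control"
    raise ValueError("FlightData carries neither data nor app_metadata")
```

A data message may still carry app_metadata alongside it, so only the absence of header and body marks a control message. Existing DoGet/DoPut readers that assume every message carries a data header would need a check along these lines, which is presumably where the breaking change would come from even though the wire format stays the same.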



> > The other question is how to handle the metadata fields. So far, we've
> > used bytestring fields for application-defined data. This is workable
> > if you want to use Protobuf to define the contents of those fields,
> > but requires you to pack/unpack your Protobuf into/from the bytestring
> > field. If we instead used the Protobuf Any field, a dynamically typed
> > field, this would be more convenient, but then we'd be exposing
> > Protobuf types. We could alternatively use a combination of a type
> > field and a bytestring field, mimicking what the Protobuf Any type
> > looks like on the wire. I'm not sure this is actually cleaner in any
> > of the language APIs, though.
>
> Leaving the deserialization of the app metadata to the particular
> Flight implementation seems on first principles like the most flexible
> thing. If Any is used, does that mean the metadata _must_ be a
> protobuf?
>


If Any is used, we could still expose a bytes-based API, but it would have
some more wrapping. (We could put a ByteString in Any.) Then the question
would just be how to expose this (would be easier in Java, harder in C++).
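To make the "type field plus bytestring field" option concrete: google.protobuf.Any is defined as `string type_url = 1; bytes value = 2;`, and both are length-delimited fields on the wire. A message shaped like the strawman FlightAny with the same field numbers therefore serializes to the same bytes (proto3 omits fields left empty, and the sketch does the same). The code below hand-encodes that two-field layout in pure Python to show that an implementation could hand raw bytes to applications without exposing Protobuf types. It covers only the varint and length-delimited parts of the wire format, and the function names and example type string are made up for illustration.

```python
from typing import Dict, Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_len_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field (wire type 2): tag, length, payload."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_type_and_bytes(type_name: str, data: bytes) -> bytes:
    """Serialize {string type = 1; bytes data = 2;}, the same layout as
    google.protobuf.Any {string type_url = 1; bytes value = 2;}.
    Empty fields are skipped, as proto3 does for default values."""
    out = b""
    if type_name:
        out += encode_len_delimited(1, type_name.encode("utf-8"))
    if data:
        out += encode_len_delimited(2, data)
    return out


def decode_type_and_bytes(buf: bytes) -> Tuple[str, bytes]:
    """Parse the two-field message back into (type, data)."""
    fields: Dict[int, bytes] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = decode_varint(buf, pos)
        fields[field_number] = buf[pos:pos + length]
        pos += length
    return fields.get(1, b"").decode("utf-8"), fields.get(2, b"")
```

For example, `decode_type_and_bytes(encode_type_and_bytes("example.Progress", b"abc"))` returns `("example.Progress", b"abc")`, and `encode_type_and_bytes("t", b"\x01")` produces `b"\n\x01t\x12\x01\x01"` (tag 0x0A for field 1, tag 0x12 for field 2).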



> > David
> >
> > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> > >
> > > Can one of you explain what is being proposed in non-protobuf terms?
> > > Knowledge of protobuf shouldn't be required to use Flight.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > On 21/10/2019 at 15:46, David Li wrote:
> > >> Oneof doesn't actually change the wire encoding; it would just be
> > >> application-level logic. (The official guide doesn't even mention it
> > >> in the encoding docs; I found
> > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> > >> as well.)
> > >>
> > >> If I follow you, Jacques, then you are proposing essentially inlining
> > >> the definition of Any, e.g.
> > >>
> > >> message FlightMessage {
> > >>   oneof message {
> > >>     FlightData data = 1;
> > >>     FlightAny metadata = 2;
> > >>   }
> > >> }
> > >>
> > >> message FlightAny {
> > >>   string type = 1;
> > >>   bytes data = 2;
> > >> }
> > >>
> > >> Is this correct?
> > >>
> > >> It might be nice to consider the wrapper message for DoGet/DoPut as
> > >> well, but at that point, I'd rather we be consistent with all of them,
> > >> rather than have one of the three methods do its own thing.
> > >>
> > >> Thanks,
> > >> David
> > >>
> > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >>> I think we could probably expose the oneof behavior without exposing the
> > >>> protobuf functions. On the any... hmm. I guess we could expose as two
> > >>> fields: type and data. Then users could use it for whatever but if people
> > >>> wanted to treat it as any, it would work. (Basically a user could use any
> > >>> with it easily but they could also use any other mechanism). At least in
> > >>> java, the any concepts are pretty simple/diy. Are other language bindings
> > >>> less diy?
> > >>>
> > >>> I'm *not* hardcore against the empty FlightData + metadata but it just
> > >>> seemed a bit janky.
> > >>>
> > >>> Thinking about the control message/wrapper object thing, I wonder if we
> > >>> should redefine DoPut and DoGet to have the same property if we think it
> > >>> is a good idea...
> > >>>
> > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li...@gmail.com> wrote:
> > >>>
> > >>>> I was definitely considering having control messages without data, and
> > >>>> I thought that could be encoded by a FlightData with only app_metadata
> > >>>> set. I think I understand your position now: FlightData should always
> > >>>> carry (some) data (with optional metadata)?
> > >>>>
> > >>>> That makes sense to me, and is consistent with the documentation on
> > >>>> FlightData in the Protobuf file. I was worried about having a
> > >>>> redundant metadata field, but oneof prevents that from happening, and
> > >>>> overall having a clear separation between data and control messages is
> > >>>> cleaner.
> > >>>>
> > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
> > >>>> Protobuf by using bytes; would we want to change that now?
> > >>>>
> > >>>> Best,
> > >>>> David
> > >>>>
> > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >>>>> Hey David,
> > >>>>>
> > >>>>> RE: Async: I was trying to match the pattern we use for DoGet/DoPut for
> > >>>>> async. Yes, I was thinking more of Java, given Java gRPC's async-always
> > >>>>> pattern.
> > >>>>>
> > >>>>> On the comment around the FlightData, I think it is overloading the
> > >>>>> message to use metadata for this. If I want to send a control message
> > >>>>> independently of the data message, I would have to define something like
> > >>>>> an empty flight data message that has custom metadata. Why not support a
> > >>>>> container object with a oneof{FlightData, Any} in it instead, so users
> > >>>>> can add more data as desired? The default impl could be a no-op for the
> > >>>>> Any messages.
> > >>>>>
> > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
> > >>>>>
> > >>>>>> Hi Jacques,
> > >>>>>>
> > >>>>>> Thanks for the comments.
> > >>>>>>
> > >>>>>> - I do agree DoExchange is a better name!
> > >>>>>> - FlightData already has metadata fields as a result of prior
> > >>>>>> proposals, so I don't think we need a new message to carry that kind
> > >>>>>> of information.
> > >>>>>> - I like the suggestion of an async handler to handle incoming
> > >>>>>> messages as the fundamental API; it would actually be quite natural to
> > >>>>>> implement in Flight/Java. I will note that it's not possible in
> > >>>>>> C++/Python without spawning a thread, though. (In essence, gRPC-Java
> > >>>>>> is async-always and gRPC-C++ is sync-always.) There are experimental
> > >>>>>> C++ APIs that would let us do something similar to Java, but those are
> > >>>>>> only in relatively recent gRPC versions and are still under
> > >>>>>> development (unlike the interceptor APIs, which have been around for
> > >>>>>> quite a while).
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> David
> > >>>>>>
> > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >>>>>>> I like it. Added some comments to the doc. Might be worth discussing
> > >>>>>>> here depending on your thoughts.
> > >>>>>>>
> > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> Hey Ryan,
> > >>>>>>>>
> > >>>>>>>> Thanks for the comments.
> > >>>>>>>>
> > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
> > >>>>>>>>
> > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave uploads
> > >>>>>>>> and downloads if you were so inclined. Right now, synchronous APIs
> > >>>>>>>> make this error-prone, e.g. if both client and server wait for each
> > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us the
> > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) As an
> > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> > >>>>>>>>
> > >>>>>>>> This is mostly tangential, though; eventually we will want to design
> > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream like
> > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run into.
> > >>>>>>>>
> > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
> > >>>>>>>> concern is that depending on how you deploy, two separate calls could
> > >>>>>>>> get routed to different instances. Additionally, gRPC has some
> > >>>>>>>> reconnection behaviors; if the server goes away in between the two
> > >>>>>>>> calls, but it then restarts or there is another instance available,
> > >>>>>>>> the client will happily reconnect to the new server without warning.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> David
> > >>>>>>>>
> > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> > >>>>>>>>> Hey David,
> > >>>>>>>>>
> > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the possibility
> > >>>>>>>>> of remote compute via Arrow buffers. One thing that would help me would be
> > >>>>>>>>> a concrete example of the API in a real-life use case. Also, what would
> > >>>>>>>>> the client experience be in terms of sync vs async? Would the client block
> > >>>>>>>>> till the bidirectional call returns, i.e. c = flight.vector_mult(a, b), or
> > >>>>>>>>> would the client wait to be signaled that computation was done? If the
> > >>>>>>>>> latter, how is that different from a DoPut then DoGet? I suppose that this
> > >>>>>>>>> could be implemented without extending the RPC interface but rather by a
> > >>>>>>>>> function/util?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>>
> > >>>>>>>>> Ryan
> > >>>>>>>>>
> > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
> li.davidm96@gmail.com>
> > >>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi all,
> > >>>>>>>>>>
> > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
> > >>>>>>>>>> identified a new use case on the horizon: being able to both
> > >>>>>>>>>> send
> > >>>>>>>>>> and
> > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end,
> I've
> > >>>>>>>>>> written up a proposal for a new RPC method:
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>
> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> > >>>>>>>>>>
> > >>>>>>>>>> Please let me know if you can't view or comment on the
> document.
> > >>>>>>>>>> I'd
> > >>>>>>>>>> appreciate any feedback; I think this is a relatively
> > >>>>>>>>>> straightforward
> > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> > >>>>>>>>>>
> > >>>>>>>>>> This is a format change and would require a vote. I've decided
> > >>>>>>>>>> to
> > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it
> > >>>>>> doesn't
> > >>>>>>>>>> functionally change Flight, just the interpretation of the
> > >>>>>>>>>> semantics.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> David
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>>
> > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> > >>>>>>>>>
> > >>>>>>>>> +447540852009 | rymurr@dremio.com
> > >>>>>>>>>
> > >>>>>>>>> <https://www.dremio.com/>
> > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join our
> > >>>>>>>>> community
> > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio
> > >>>>>>>>> <https://www.dremio.com/download>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
>
> The question is whether to repurpose the existing FlightData
> structure, and allow for the metadata field to be filled in and data
> fields to be blank (as a control message), or to wrap the FlightData
> structure in another structure that explicitly distinguishes between
> control and data messages.

I'm not super against having metadata-only FlightData with an empty body.
One question to consider is what changes (if any) would need to be
made to public APIs in either scenario.

> The other question is how to handle the metadata fields. So far, we've
> used bytestring fields for application-defined data. This is workable
> if you want to use Protobuf to define the contents of those fields,
> but requires you to pack/unpack your Protobuf into/from the bytestring
> field. If we instead used the Protobuf Any field, a dynamically typed
> field, this would be more convenient, but then we'd be exposing
> Protobuf types. We could alternatively use a combination of a type
> field and a bytestring field, mimicking what the Protobuf Any type
> looks like on the wire. I'm not sure this is actually cleaner in any
> of the language APIs, though.

Leaving the deserialization of the app metadata to the particular
Flight implementation seems, on first principles, like the most flexible
approach. If Any is used, does that mean the metadata _must_ be a
protobuf?


Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
The question is whether to repurpose the existing FlightData
structure, and allow for the metadata field to be filled in and data
fields to be blank (as a control message), or to wrap the FlightData
structure in another structure that explicitly distinguishes between
control and data messages.
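A rough Python sketch of what the first option asks of a receiver: it has to infer the message kind from which fields are populated. The dataclass below mirrors the `data_header`, `app_metadata`, and `data_body` fields of `FlightData` in Flight.proto (the descriptor field is omitted) instead of using generated Protobuf classes, and `classify` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class FlightData:
    """Plain stand-in for the FlightData message (descriptor field omitted)."""
    data_header: bytes = b""   # IPC message header; empty when no batch is carried
    app_metadata: bytes = b""  # application-defined bytes
    data_body: bytes = b""     # IPC body buffers


def classify(msg: FlightData) -> str:
    """Infer the message kind from which fields are set (the first option)."""
    has_data = bool(msg.data_header or msg.data_body)
    if has_data:
        return "data+metadata" if msg.app_metadata else "data"
    if msg.app_metadata:
        return "control"  # metadata-only FlightData doubles as a control message
    return "empty"        # neither data nor metadata: meaning must be agreed on


print(classify(FlightData(data_header=b"hdr", data_body=b"body")))  # data
print(classify(FlightData(app_metadata=b"ack")))                     # control
```

The sketch shows that "control message" becomes a convention about empty fields rather than something the message definition states; the wrapper alternative moves that distinction into the schema.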

The other question is how to handle the metadata fields. So far, we've
used bytestring fields for application-defined data. This is workable
if you want to use Protobuf to define the contents of those fields,
but requires you to pack/unpack your Protobuf into/from the bytestring
field. If we instead used the Protobuf Any field, a dynamically typed
field, this would be more convenient, but then we'd be exposing
Protobuf types. We could alternatively use a combination of a type
field and a bytestring field, mimicking what the Protobuf Any type
looks like on the wire. I'm not sure this is actually cleaner in any
of the language APIs, though.
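To make the "mimicking Any on the wire" point concrete: `google.protobuf.Any` is declared as `string type_url = 1; bytes value = 2;`, and string and bytes fields are both encoded as length-delimited fields (wire type 2). Any message with a string in field 1 and bytes in field 2 therefore serializes to the same bytes as an Any. The hand-rolled Python encoder below only sketches that wire layout for illustration; real code would use a Protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Protobuf base-128 varint encoding (non-negative integers only)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode one field with wire type 2 (length-delimited: strings, bytes, messages)."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_type_and_data(type_name: str, data: bytes) -> bytes:
    """Encode a message shaped like {string type = 1; bytes data = 2;}."""
    return length_delimited(1, type_name.encode("utf-8")) + length_delimited(2, data)


print(encode_type_and_data("ab", b"xyz").hex())  # 0a026162120378797a
```

Here `0a` is the tag for field 1 and `12` the tag for field 2, each followed by a length and the raw bytes; an Any with the same type URL and value would produce identical output.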

David

On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
>
> Can one of you explain what is being proposed in non-protobuf terms?
> Knowledge of protobuf shouldn't be required to use Flight.
>
> Regards
>
> Antoine.

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Antoine Pitrou <an...@python.org>.
Can one of you explain what is being proposed in non-protobuf terms?
Knowledge of protobuf shouldn't be required to use Flight.

Regards

Antoine.


On 21/10/2019 at 15:46, David Li wrote:
> Oneof doesn't actually change the wire encoding; it would just be
> application-level logic. (The official guide doesn't even mention it
> in the encoding docs; I found
> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> as well.)
> 
> If I follow you, Jacques, then you are proposing essentially inlining
> the definition of Any, e.g.
> 
> message FlightMessage {
>   oneof message {
>     FlightData data = 1;
>     FlightAny metadata = 2;
>   }
> }
> 
> message FlightAny {
>   string type = 1;
>   bytes data = 2;
> }
> 
> Is this correct?
> 
> It might be nice to consider the wrapper message for DoGet/DoPut as
> well, but at that point, I'd rather we be consistent with all of them,
> rather than have one of the three methods do its own thing.
> 
> Thanks,
> David

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Oneof doesn't actually change the wire encoding; it is purely
application-level logic. (The official guide doesn't even mention oneof
in its encoding docs; I also found this Stack Overflow thread:
https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct)
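For illustration, a minimal hand-rolled encoder (standard library only, not the protobuf runtime) shows why: a set oneof member is serialized exactly like an ordinary field, i.e. a varint tag of (field_number << 3) | wire_type, then a length, then the payload. The field numbers below are assumed to match the FlightMessage/FlightAny sketch in this message; the helper names are illustrative and not from any library.

```python
"""Sketch: a oneof member is encoded exactly like a plain field.

Hand-rolled for illustration only; real code should use protoc output.
"""

WIRE_TYPE_LEN = 2  # length-delimited: strings, bytes, embedded messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field: tag, length, payload."""
    tag = (field_number << 3) | WIRE_TYPE_LEN
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Hypothetical FlightAny { type = "ctl"; data = b"\x01" }.
flight_any = encode_len_field(1, b"ctl") + encode_len_field(2, b"\x01")

# FlightMessage with the `metadata` arm (field 2) set. These bytes are the
# same as for a message declaring `FlightAny metadata = 2;` outside a oneof.
flight_message = encode_len_field(2, flight_any)
```

A receiver tells the arms apart purely by field number (1 for data, 2 for metadata); the "at most one arm" rule is enforced by the generated code, not by the wire format.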

If I follow you, Jacques, you are essentially proposing to inline the
definition of Any, e.g.:

message FlightMessage {
  oneof message {
    FlightData data = 1;
    FlightAny metadata = 2;
  }
}

message FlightAny {
  string type = 1;
  bytes data = 2;
}

Is this correct?

It might be nice to consider the wrapper message for DoGet/DoPut as
well, but in that case I'd rather we be consistent across all of them
than have one of the three methods do its own thing.

Thanks,
David

On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> I think we could probably expose the oneof behavior without exposing the
> protobuf functions. On the Any... hmm. I guess we could expose it as two
> fields: type and data. Users could then use it for whatever they like, and
> anyone who wanted to treat it as an Any could do so (basically, a user could
> easily use Any with it, but could also use any other mechanism). At least
> in Java, the Any concepts are pretty simple/DIY. Are other language
> bindings less DIY?
>
> I'm *not* hardcore against the empty FlightData + metadata but it just
> seemed a bit janky.
>
> Thinking about the control message/wrapper object thing, I wonder if we
> should redefine DoPut and DoGet to have the same property if we think it is
> a good idea...

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Jacques Nadeau <ja...@apache.org>.
I think we could probably expose the oneof behavior without exposing the
protobuf functions. On the Any... hmm. I guess we could expose it as two
fields: type and data. Users could then use it for whatever they like, and
anyone who wanted to treat it as an Any could do so (basically, a user could
easily use Any with it, but could also use any other mechanism). At least
in Java, the Any concepts are pretty simple/DIY. Are other language
bindings less DIY?
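A minimal sketch of the two-field idea, assuming a plain `type` string plus opaque `data` bytes (the class and helper names are hypothetical, not part of Flight): anyone who wants Any semantics can follow google.protobuf.Any's convention of a `type.googleapis.com/<full message name>` type URL, while everyone else can tag messages however they like.

```python
from dataclasses import dataclass

# Default type URL prefix used by google.protobuf.Any.
ANY_TYPE_URL_PREFIX = "type.googleapis.com/"


@dataclass(frozen=True)
class FlightAny:
    """Hypothetical two-field message: a free-form type tag plus opaque bytes."""
    type: str
    data: bytes


def pack_any_style(full_name: str, serialized: bytes) -> FlightAny:
    """Tag already-serialized bytes the way google.protobuf.Any would."""
    return FlightAny(type=ANY_TYPE_URL_PREFIX + full_name, data=serialized)


def is_any_style(message: FlightAny, full_name: str) -> bool:
    """Roughly like Any's Is(): compare the part after the last '/'."""
    return message.type.rsplit("/", 1)[-1] == full_name


# Any-style usage, with a hypothetical application message name.
wrapped = pack_any_style("example.Ping", b"\x08\x01")

# DIY usage: since `type` is just a string, any tagging scheme works.
progress = FlightAny(type="myapp/progress", data=b"42%")
```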

I'm *not* hardcore against the empty FlightData + metadata but it just
seemed a bit janky.

Thinking about the control message/wrapper object thing, I wonder if we
should redefine DoPut and DoGet to have the same property if we think it is
a good idea...

On Wed, Oct 16, 2019 at 5:13 PM David Li <li...@gmail.com> wrote:

> I was definitely considering having control messages without data, and
> I thought that could be encoded by a FlightData with only app_metadata
> set. I think I understand your position now: FlightData should always
> carry (some) data (with optional metadata)?
>
> That makes sense to me, and is consistent with the documentation on
> FlightData in the Protobuf file. I was worried about having a
> redundant metadata field, but oneof prevents that from happening, and
> overall having a clear separation between data and control messages is
> cleaner.
>
> As for using Protobuf's Any: so far, we've refrained from exposing
> Protobuf by using bytes; would we want to change that now?
>
> Best,
> David

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
I was definitely considering having control messages without data, and
I thought that could be encoded by a FlightData with only app_metadata
set. I think I understand your position now: FlightData should always
carry (some) data (with optional metadata)?

That makes sense to me, and is consistent with the documentation on
FlightData in the Protobuf file. I was worried about having a
redundant metadata field, but oneof prevents that from happening, and
overall having a clear separation between data and control messages is
cleaner.

As for using Protobuf's Any: so far, we've refrained from exposing
Protobuf by using bytes; would we want to change that now?

Best,
David

On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> Hey David,
>
> RE: Async: I was trying to match the async pattern we use for DoGet/DoPut.
> Yes, I was thinking more of Java, given gRPC-Java's async-always pattern.
>
> On the comment about FlightData, I think using metadata for this
> overloads the message. If I want to send a control message independently
> of the data messages, I would have to send something like an empty
> FlightData message that carries custom metadata. Why not support a
> container object with a oneof {FlightData, Any} in it instead, so users
> can add more data as desired? The default implementation could be a
> no-op for the Any messages.

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Jacques Nadeau <ja...@apache.org>.
Hey David,

RE: Async: I was trying to match the async pattern we use for DoGet/DoPut.
Yes, I was thinking more of Java, given gRPC-Java's async-always pattern.

On the comment about FlightData, I think using metadata for this
overloads the message. If I want to send a control message independently
of the data messages, I would have to send something like an empty
FlightData message that carries custom metadata. Why not support a
container object with a oneof {FlightData, Any} in it instead, so users
can add more data as desired? The default implementation could be a
no-op for the Any messages.
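A rough Python sketch of such a container and a handler whose default treatment of the Any arm is a no-op. All class names are hypothetical stand-ins (the real FlightData also carries a descriptor and an IPC header), and unlike protobuf, which allows a oneof with no member set, this sketch rejects empty messages for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FlightData:
    """Stand-in: the real FlightData also has a descriptor and IPC header."""
    data_body: bytes
    app_metadata: bytes = b""


@dataclass
class FlightAny:
    """Hypothetical control message: type tag plus opaque bytes."""
    type: str
    data: bytes


@dataclass
class FlightMessage:
    """Hypothetical wrapper modeling oneof { FlightData data; FlightAny metadata; }."""
    data: Optional[FlightData] = None
    metadata: Optional[FlightAny] = None

    def __post_init__(self) -> None:
        # Protobuf permits an unset oneof; this sketch requires exactly one arm.
        if (self.data is None) == (self.metadata is None):
            raise ValueError("exactly one of data/metadata must be set")


class ExchangeHandler:
    """Subclasses must handle data; control messages default to a no-op."""

    def on_data(self, data: FlightData) -> None:
        raise NotImplementedError

    def on_metadata(self, message: FlightAny) -> None:
        pass  # default: ignore control messages the application doesn't use

    def dispatch(self, message: FlightMessage) -> None:
        if message.data is not None:
            self.on_data(message.data)
        else:
            self.on_metadata(message.metadata)


class BatchCollector(ExchangeHandler):
    """Example handler that only cares about data messages."""

    def __init__(self) -> None:
        self.bodies: List[bytes] = []

    def on_data(self, data: FlightData) -> None:
        self.bodies.append(data.data_body)


collector = BatchCollector()
for msg in [
    FlightMessage(data=FlightData(b"batch-1")),
    FlightMessage(metadata=FlightAny("ping", b"")),  # silently ignored
    FlightMessage(data=FlightData(b"batch-2")),
]:
    collector.dispatch(msg)
```

The design choice here is that data handling is mandatory while control handling is opt-in, so existing data-only handlers keep working if new control message types are introduced later.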

On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:

> Hi Jacques,
>
> Thanks for the comments.
>
> - I do agree DoExchange is a better name!
> - FlightData already has metadata fields as a result of prior
> proposals, so I don't think we need a new message to carry that kind
> of information.
> - I like the suggestion of an async handler to handle incoming
> messages as the fundamental API; it would actually be quite natural to
> implement in Flight/Java. I will note that it's not possible in
> C++/Python without spawning a thread, though. (In essence, gRPC-Java
> is async-always and gRPC-C++ is sync-always.) There are experimental
> C++ APIs that would let us do something similar to Java, but those are
> only in relatively recent gRPC versions and are still under
> development (unlike the interceptor APIs, which have been around
> for quite a while).
>
> Thanks,
> David

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Hi Jacques,

Thanks for the comments.

- I do agree DoExchange is a better name!
- FlightData already has metadata fields as a result of prior
proposals, so I don't think we need a new message to carry that kind
of information.
- I like the suggestion of an async handler to handle incoming
messages as the fundamental API; it would actually be quite natural to
implement in Flight/Java. I will note that it's not possible in
C++/Python without spawning a thread, though. (In essence, gRPC-Java
is async-always and gRPC-C++ is sync-always.) There are experimental
C++ APIs that would let us do something similar to Java, but those are
only in relatively recent gRPC versions and are still under
development (unlike the interceptor APIs, which have been around
for quite a while).
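As a rough illustration of the "spawning a thread" cost in Python, a callback-style handler can be layered over a blocking reader by pumping it on a background thread. The reader here is any iterable standing in for a synchronous stream reader; this is an assumption for the sketch, not a pyarrow.flight API.

```python
import threading
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def start_reader_thread(
    reader: Iterable[T],
    on_message: Callable[[T], None],
    on_done: Callable[[], None],
) -> threading.Thread:
    """Turn a blocking reader into callbacks by pumping it on a thread.

    `reader` stands in for a synchronous stream reader (hypothetical);
    each iteration may block until the peer sends the next message.
    """

    def pump() -> None:
        try:
            for message in reader:
                on_message(message)
        finally:
            # Signal completion even if the reader or a callback raised.
            on_done()

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return thread


# Usage: the calling thread stays free (e.g. to keep writing) while
# incoming messages are delivered to the callback.
received = []
finished = threading.Event()
worker = start_reader_thread(
    iter([b"batch-1", b"batch-2"]), received.append, finished.set
)
worker.join()
```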

Thanks,
David

On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> I like it. Added some comments to the doc. Might be worth discussing here
> depending on your thoughts.

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Jacques Nadeau <ja...@apache.org>.
I like it. Added some comments to the doc. Might be worth discussing here
depending on your thoughts.

On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:

> Hey Ryan,
>
> Thanks for the comments.
>
> Concrete example: I've edited the doc to provide a Python strawman.
>
> Sync vs async: while I don't touch on it, you could interleave uploads
> and downloads if you were so inclined. Right now, synchronous APIs
> make this error-prone, e.g. if both client and server wait for each
> other due to an application logic bug. (gRPC doesn't give us the
> ability to have per-read timeouts, only an overall timeout.) As an
> example of this happening with DoPut, see ARROW-6063:
> https://issues.apache.org/jira/browse/ARROW-6063
>
> This is mostly tangential, though; eventually we will want to design
> asynchronous APIs for Flight as a whole. A bidirectional stream like
> this (and like DoPut) just makes these pitfalls easier to run into.
>
> Using DoPut+DoGet: I discussed this in the proposal, but the main
> concern is that, depending on how you deploy, two separate calls could
> get routed to different instances. Additionally, gRPC has some
> reconnection behaviors: if the server goes away in between the two
> calls, but it then restarts or there is another instance available,
> the client will happily reconnect to the new server without warning.
>
> Thanks,
> David
>
> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> > Hey David,
> >
> > I think this proposal makes a lot of sense. I like it, and I like the
> > possibility of remote compute via Arrow buffers. One thing that would
> > help me would be a concrete example of the API in a real-life use case.
> > Also, what would the client experience be in terms of sync vs async?
> > Would the client block until the bidirectional call returns, i.e.
> > c = flight.vector_mult(a, b), or would the client wait to be signaled
> > that the computation was done? If the latter, how is that different
> > from a DoPut then DoGet? I suppose that this could be implemented
> > without extending the RPC interface but rather by a function/util?
> >
> >
> > Best,
> >
> > Ryan
> >
> > On Sun, Oct 13, 2019 at 9:24 PM David Li <li...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> We've been using Flight quite successfully so far, but we have
> >> identified a new use case on the horizon: being able to both send and
> >> retrieve Arrow data within a single RPC call. To that end, I've
> >> written up a proposal for a new RPC method:
> >>
> >> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >>
> >> Please let me know if you can't view or comment on the document. I'd
> >> appreciate any feedback; I think this is a relatively straightforward
> >> addition - it is essentially "DoPutThenGet".
> >>
> >> This is a format change and would require a vote. I've decided to
> >> table the other format change I had proposed (on DoPut), as it doesn't
> >> functionally change Flight, just the interpretation of the semantics.
> >>
> >> Thanks,
> >> David
> >>
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Hey Ryan,

Thanks for the comments.

Concrete example: I've edited the doc to provide a Python strawman.

Sync vs async: while I don't touch on it, you could interleave uploads
and downloads if you were so inclined. Right now, synchronous APIs
make this error-prone, e.g. if both client and server wait for each
other due to an application logic bug. (gRPC doesn't give us the
ability to have per-read timeouts, only an overall timeout.) As an
example of this happening with DoPut, see ARROW-6063:
https://issues.apache.org/jira/browse/ARROW-6063
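
To make the failure mode concrete, here is a minimal sketch in plain
Python (standard library only, not Flight or gRPC code). Each direction of
the stream is modeled as a queue.Queue, and the hypothetical
read_with_timeout helper stands in for the per-read timeout that gRPC does
not offer. With client_sends_first=False, both sides wait for the other to
send first, so each read times out instead of hanging. It illustrates the
general deadlock only; it is not a reproduction of ARROW-6063.

```python
import queue
import threading


class StallDetected(Exception):
    """Raised when a single read waits longer than its per-read timeout."""


def read_with_timeout(inbox, timeout):
    # Hypothetical per-read timeout: each read gives up on its own,
    # instead of relying on one deadline for the whole call.
    try:
        return inbox.get(timeout=timeout)
    except queue.Empty:
        raise StallDetected(f"no message within {timeout}s") from None


def run_exchange(client_sends_first, per_read_timeout=1.0):
    """Simulate one bidirectional call; each direction is a queue.Queue."""
    to_server, to_client = queue.Queue(), queue.Queue()
    outcome = {}

    def server():
        try:
            # The server always waits for a request before replying.
            request = read_with_timeout(to_server, per_read_timeout)
            to_client.put(request * 2)
            outcome["server"] = "ok"
        except StallDetected:
            outcome["server"] = "stalled"

    def client():
        try:
            if client_sends_first:
                to_server.put(21)
                outcome["reply"] = read_with_timeout(to_client, per_read_timeout)
            else:
                # Application logic bug: the client also waits for the other
                # side to send first, so neither side ever sends anything.
                outcome["reply"] = read_with_timeout(to_client, per_read_timeout)
                to_server.put(21)
            outcome["client"] = "ok"
        except StallDetected:
            outcome["client"] = "stalled"

    threads = [threading.Thread(target=server), threading.Thread(target=client)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome
```

run_exchange(client_sends_first=False, per_read_timeout=0.2) reports both
sides as stalled; flipping the flag lets the call complete with a reply of
42. With only an overall deadline, the buggy case would instead sit idle
until the whole call expired.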

This is mostly tangential, though; eventually we will want to design
asynchronous APIs for Flight as a whole. A bidirectional stream like
this (and like DoPut) just makes these pitfalls easier to run into.
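
As a rough illustration of why interleaving matters, the following
asyncio sketch (plain Python again; the names are hypothetical, and small
bounded queues stand in for flow-controlled stream buffers) runs the same
exchange two ways. When uploads and downloads proceed concurrently, the
call completes. When the client uploads everything before it starts
reading, the server blocks on a full outbound buffer, the client blocks on
a full inbound one, and only the assumed timeout ends the call.

```python
import asyncio

_DONE = object()  # sentinel marking the end of a stream


async def server(inbox, outbox):
    # Hypothetical server: transforms each uploaded batch as soon as it arrives.
    while (batch := await inbox.get()) is not _DONE:
        await outbox.put([x * 2 for x in batch])
    await outbox.put(_DONE)


async def exchange(batches, interleave=True, timeout=1.0):
    # Small bounded queues stand in for flow-controlled stream buffers.
    to_server = asyncio.Queue(maxsize=1)
    to_client = asyncio.Queue(maxsize=1)
    results = []

    async def upload():
        for batch in batches:
            await to_server.put(batch)
        await to_server.put(_DONE)

    async def download():
        while (batch := await to_client.get()) is not _DONE:
            results.append(batch)

    async def client():
        if interleave:
            await asyncio.gather(upload(), download())  # both directions at once
        else:
            await upload()    # send everything first...
            await download()  # ...then start reading

    try:
        await asyncio.wait_for(
            asyncio.gather(client(), server(to_server, to_client)), timeout
        )
    except asyncio.TimeoutError:
        return None  # the call stalled: each side is waiting on the other
    return results
```

With batches [[1, 2], [3], [4, 5]], asyncio.run(exchange(batches)) returns
[[2, 4], [6], [8, 10]], while exchange(batches, interleave=False) returns
None once the timeout fires. The buffer size of one only makes the stall
easy to trigger; larger but still bounded buffers just raise the number of
batches needed to hit it.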

Using DoPut+DoGet: I discussed this in the proposal, but the main
concern is that, depending on how you deploy, two separate calls could
get routed to different instances. Additionally, gRPC has some
reconnection behaviors: if the server goes away in between the two
calls, but it then restarts or there is another instance available,
the client will happily reconnect to the new server without warning.
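
A toy model of the routing concern, assuming a hypothetical round-robin
balancer that picks a server instance per call and instances that keep
uploads in local memory (plain Python; no real load balancer, gRPC, or
Flight code is involved):

```python
import itertools


class Instance:
    """Hypothetical server instance that keeps uploaded data in local memory."""

    def __init__(self, name):
        self.name = name
        self.uploads = {}

    def do_put(self, key, values):
        self.uploads[key] = values

    def do_get(self, key):
        # Returns None when this instance never saw the upload.
        values = self.uploads.get(key)
        return None if values is None else [v * 2 for v in values]

    def do_bidirectional(self, values):
        # Upload and download happen within one call on one instance.
        return [v * 2 for v in values]


class RoundRobinBalancer:
    """Hypothetical balancer that routes each new call to the next instance."""

    def __init__(self, instances):
        self._instances = itertools.cycle(instances)

    def route(self):
        return next(self._instances)


def put_then_get(balancer, key, values):
    balancer.route().do_put(key, values)  # first call
    return balancer.route().do_get(key)   # second call, possibly elsewhere


def single_call(balancer, values):
    return balancer.route().do_bidirectional(values)
```

With two instances, put_then_get stores the data on one instance and asks
the other for it, so it returns None; single_call keeps both directions on
whichever instance received the call. With a single instance the problem
does not appear, which is one reason it can slip past local testing.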

Thanks,
David

On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> Hey David,
>
> I think this proposal makes a lot of sense. I like it, and I like the
> possibility of remote compute via Arrow buffers. One thing that would help
> me would be a concrete example of the API in a real-life use case. Also,
> what would the client experience be in terms of sync vs async? Would the
> client block until the bidirectional call returns, i.e.
> c = flight.vector_mult(a, b), or would the client wait to be signaled that
> the computation was done? If the latter, how is that different from a
> DoPut then DoGet? I suppose that this could be implemented without
> extending the RPC interface but rather by a function/util?
>
>
> Best,
>
> Ryan
>
> On Sun, Oct 13, 2019 at 9:24 PM David Li <li...@gmail.com> wrote:
>
>> Hi all,
>>
>> We've been using Flight quite successfully so far, but we have
>> identified a new use case on the horizon: being able to both send and
>> retrieve Arrow data within a single RPC call. To that end, I've
>> written up a proposal for a new RPC method:
>>
>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>>
>> Please let me know if you can't view or comment on the document. I'd
>> appreciate any feedback; I think this is a relatively straightforward
>> addition - it is essentially "DoPutThenGet".
>>
>> This is a format change and would require a vote. I've decided to
>> table the other format change I had proposed (on DoPut), as it doesn't
>> functionally change Flight, just the interpretation of the semantics.
>>
>> Thanks,
>> David
>>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Ryan Murray <ry...@dremio.com>.
Hey David,

I think this proposal makes a lot of sense. I like it, and I like the
possibility of remote compute via Arrow buffers. One thing that would help
me would be a concrete example of the API in a real-life use case. Also,
what would the client experience be in terms of sync vs async? Would the
client block until the bidirectional call returns, i.e.
c = flight.vector_mult(a, b), or would the client wait to be signaled that
the computation was done? If the latter, how is that different from a
DoPut then DoGet? I suppose that this could be implemented without
extending the RPC interface but rather by a function/util?
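
One way to picture the two client experiences, sketched with
concurrent.futures under the assumption that the remote work is a local
stand-in function (remote_vector_mult, vector_mult_async, and vector_mult
are hypothetical names, not Flight APIs): the non-blocking form hands back
a Future that signals completion, and the blocking
c = vector_mult(a, b) form simply waits on that Future's result().

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Shared worker pool standing in for "the call is in flight".
_pool = ThreadPoolExecutor(max_workers=2)


def remote_vector_mult(a, b):
    """Stand-in for the server-side work of one call (hypothetical)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


def vector_mult_async(a, b) -> Future:
    """Non-blocking style: return at once; the Future signals completion."""
    return _pool.submit(remote_vector_mult, a, b)


def vector_mult(a, b):
    """Blocking style: waits until the result is ready."""
    return vector_mult_async(a, b).result()


# c = vector_mult([1, 2, 3], [4, 5, 6])  # -> [4, 10, 18]
# vector_mult_async(a, b).add_done_callback(...)  # signaled on completion
```

Either style could be layered over a single bidirectional call; the sketch
does not settle whether that call should be a new RPC or a helper built on
DoPut and DoGet.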


Best,

Ryan

On Sun, Oct 13, 2019 at 9:24 PM David Li <li...@gmail.com> wrote:

> Hi all,
>
> We've been using Flight quite successfully so far, but we have
> identified a new use case on the horizon: being able to both send and
> retrieve Arrow data within a single RPC call. To that end, I've
> written up a proposal for a new RPC method:
>
> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>
> Please let me know if you can't view or comment on the document. I'd
> appreciate any feedback; I think this is a relatively straightforward
> addition - it is essentially "DoPutThenGet".
>
> This is a format change and would require a vote. I've decided to
> table the other format change I had proposed (on DoPut), as it doesn't
> functionally change Flight, just the interpretation of the semantics.
>
> Thanks,
> David
>


-- 

Ryan Murray  | Principal Consulting Engineer

+447540852009 | rymurr@dremio.com

<https://www.dremio.com/>
Check out our GitHub <https://www.github.com/dremio>, join our community
site <https://community.dremio.com/> & Download Dremio
<https://www.dremio.com/download>