Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2019/11/07 16:59:30 UTC

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Bumping this discussion since a couple of weeks have passed. It seems
there are still some open questions here. Could we summarize the
alternatives, along with any public API implications, so we can try to
render a decision?

On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com> wrote:
>
> Hi Wes,
>
> Responses inline:
>
> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
>
> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
> > >
> > > The question is whether to repurpose the existing FlightData
> > > structure, and allow for the metadata field to be filled in and data
> > > fields to be blank (as a control message), or to wrap the FlightData
> > > structure in another structure that explicitly distinguishes between
> > > control and data messages.
> >
> > I'm not super against having metadata-only FlightData with empty body.
> > One question to consider is what changes (if any) would need to be
> > made to public APIs in either scenario.
> >
>
> We could leave DoGet/DoPut as-is for now, and allow empty data messages in
> the future. This would be a breaking change, but wouldn't change the wire
> format. I think the APIs could be changed backwards compatibly, though.
>
>
>
> > > The other question is how to handle the metadata fields. So far, we've
> > > used bytestring fields for application-defined data. This is workable
> > > if you want to use Protobuf to define the contents of those fields,
> > > but requires you to pack/unpack your Protobuf into/from the bytestring
> > > field. If we instead used the Protobuf Any field, a dynamically typed
> > > field, this would be more convenient, but then we'd be exposing
> > > Protobuf types. We could alternatively use a combination of a type
> > > field and a bytestring field, mimicking what the Protobuf Any type
> > > looks like on the wire. I'm not sure this is actually cleaner in any
> > > of the language APIs, though.
> >
> > Leaving the deserialization of the app metadata to the particular
> > Flight implementation seems on first principles like the most flexible
> > thing. If Any is used, does that mean the metadata _must_ be a
> > protobuf?
> >
>
>
> If Any is used, we could still expose a bytes-based API, but it would have
> some more wrapping. (We could put a ByteString in Any.) Then the question
> would just be how to expose this (would be easier in Java, harder in C++).
>
>
>
> > > David
> > >
> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> > > >
> > > > Can one of you explain what is being proposed in non-protobuf terms?
> > > > Knowledge of protobuf shouldn't be required to use Flight.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > > Le 21/10/2019 à 15:46, David Li a écrit :
> > > >> Oneof doesn't actually change the wire encoding; it would just be
> > > >> application-level logic. (The official guide doesn't even mention it
> > > > >> in the encoding docs; I found
> > > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> > > > >> as well.)
> > > >>
> > > >> If I follow you, Jacques, then you are proposing essentially inlining
> > > >> the definition of Any, e.g.
> > > >>
> > > >> message FlightMessage {
> > > >>   oneof message {
> > > >>     FlightData data = 1;
> > > >>     FlightAny metadata = 2;
> > > >>   }
> > > >> }
> > > >>
> > > >> message FlightAny {
> > > >>   string type = 1;
> > > >>   bytes data = 2;
> > > >> }
> > > >>
> > > >> Is this correct?
> > > >>
> > > >> It might be nice to consider the wrapper message for DoGet/DoPut as
> > > >> well, but at that point, I'd rather we be consistent with all of them,
> > > >> rather than have one of the three methods do its own thing.
> > > >>
> > > >> Thanks,
> > > >> David
> > > >>
> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> > > > >>> I think we could probably expose the oneof behavior without exposing the
> > > > >>> protobuf functions. On the any... hmm. I guess we could expose as two
> > > > >>> fields: type and data. Then users could use it for whatever but if people
> > > > >>> wanted to treat it as any, it would work. (Basically a user could use any
> > > > >>> with it easily but they could also use any other mechanism). At least in
> > > > >>> java, the any concepts are pretty simple/diy. Are other language bindings
> > > > >>> less diy?
> > > > >>>
> > > > >>> I'm *not* hardcore against the empty FlightData + metadata but it just
> > > > >>> seemed a bit janky.
> > > > >>>
> > > > >>> Thinking about the control message/wrapper object thing, I wonder if we
> > > > >>> should redefine DoPut and DoGet to have the same property if we think it is
> > > > >>> a good idea...
> > > >>>
> > > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li...@gmail.com> wrote:
> > > > >>>
> > > > >>>> I was definitely considering having control messages without data, and
> > > > >>>> I thought that could be encoded by a FlightData with only app_metadata
> > > > >>>> set. I think I understand your position now: FlightData should always
> > > > >>>> carry (some) data (with optional metadata)?
> > > > >>>>
> > > > >>>> That makes sense to me, and is consistent with the documentation on
> > > > >>>> FlightData in the Protobuf file. I was worried about having a
> > > > >>>> redundant metadata field, but oneof prevents that from happening, and
> > > > >>>> overall having a clear separation between data and control messages is
> > > > >>>> cleaner.
> > > >>>>
> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
> > > >>>> Protobuf by using bytes, would we want to change that now?
> > > >>>>
> > > >>>> Best,
> > > >>>> David
> > > >>>>
> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> > > >>>>> Hey David,
> > > >>>>>
> > > > >>>>> RE: Async: I was trying to match the pattern we use for doget/doput for
> > > > >>>>> async. Yes, more thinking java given java grpc's async always pattern.
> > > > >>>>>
> > > > >>>>> On the comment around the FlightData, I think it is overloading the message
> > > > >>>>> to use metadata for this. If I want to send a control message independently
> > > > >>>>> of the data message, I would have to define something like an empty flight
> > > > >>>>> data message that has custom metadata. Why not support a container object
> > > > >>>>> with a oneof{FlightData, Any} in it instead so users can add more data as
> > > > >>>>> desired? The default impl could be a noop for the Any messages.
> > > > >>>>>
> > > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Hi Jacques,
> > > >>>>>>
> > > >>>>>> Thanks for the comments.
> > > >>>>>>
> > > >>>>>> - I do agree DoExchange is a better name!
> > > > >>>>>> - FlightData already has metadata fields as a result of prior
> > > > >>>>>> proposals, so I don't think we need a new message to carry that kind
> > > > >>>>>> of information.
> > > > >>>>>> - I like the suggestion of an async handler to handle incoming
> > > > >>>>>> messages as the fundamental API; it would actually be quite natural to
> > > > >>>>>> implement in Flight/Java. I will note that it's not possible in
> > > > >>>>>> C++/Python without spawning a thread, though. (In essence, gRPC-Java
> > > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are experimental
> > > > >>>>>> C++ APIs that would let us do something similar to Java, but those are
> > > > >>>>>> only in relatively recent gRPC versions and are still under
> > > > >>>>>> development (contrary to the interceptor APIs which have been around
> > > > >>>>>> for quite a while).
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> David
> > > >>>>>>
> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> > > > >>>>>>> I like it. Added some comments to the doc. Might be worth discussing
> > > > >>>>>>> here, depending on your thoughts.
> > > > >>>>>>>
> > > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hey Ryan,
> > > >>>>>>>>
> > > >>>>>>>> Thanks for the comments.
> > > >>>>>>>>
> > > > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
> > > > >>>>>>>>
> > > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave uploads
> > > > >>>>>>>> and downloads if you were so inclined. Right now, synchronous APIs
> > > > >>>>>>>> make this error-prone, e.g. if both client and server wait for each
> > > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us the
> > > > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) As an
> > > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> > > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> > > > >>>>>>>>
> > > > >>>>>>>> This is mostly tangential, though; eventually we will want to design
> > > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream like
> > > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run into.
> > > > >>>>>>>>
> > > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
> > > > >>>>>>>> concern is that depending on how you deploy, two separate calls could
> > > > >>>>>>>> get routed to different instances. Additionally, gRPC has some
> > > > >>>>>>>> reconnection behaviors; if the server goes away in between the two
> > > > >>>>>>>> calls, but it then restarts or there is another instance available,
> > > > >>>>>>>> the client will happily reconnect to the new server without warning.
> > > >>>>>>>>
> > > >>>>>>>> Thanks,
> > > >>>>>>>> David
> > > >>>>>>>>
> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> > > >>>>>>>>> Hey David,
> > > >>>>>>>>>
> > > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the possibility
> > > > >>>>>>>>> of remote compute via Arrow buffers. One thing that would help me would be
> > > > >>>>>>>>> a concrete example of the API in a real-life use case. Also, what would the
> > > > >>>>>>>>> client experience be in terms of sync vs async? Would the client block until
> > > > >>>>>>>>> the bidirectional call returns, i.e. c = flight.vector_mult(a, b), or would
> > > > >>>>>>>>> the client wait to be signaled that the computation was done? If the latter,
> > > > >>>>>>>>> how is that different from a DoPut then DoGet? I suppose that this could be
> > > > >>>>>>>>> implemented without extending the RPC interface but rather by a
> > > > >>>>>>>>> function/util?
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Best,
> > > >>>>>>>>>
> > > >>>>>>>>> Ryan
> > > >>>>>>>>>
> > > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hi all,
> > > >>>>>>>>>>
> > > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
> > > > >>>>>>>>>> identified a new use case on the horizon: being able to both send and
> > > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end, I've
> > > > >>>>>>>>>> written up a proposal for a new RPC method:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Please let me know if you can't view or comment on the document. I'd
> > > > >>>>>>>>>> appreciate any feedback; I think this is a relatively straightforward
> > > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
> > > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it doesn't
> > > > >>>>>>>>>> functionally change Flight, just the interpretation of the
> > > > >>>>>>>>>> semantics.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thanks,
> > > >>>>>>>>>> David
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> --
> > > >>>>>>>>>
> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> > > >>>>>>>>>
> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> > > >>>>>>>>>
> > > >>>>>>>>> <https://www.dremio.com/>
> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join our
> > > >>>>>>>>> community
> > > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio
> > > >>>>>>>>> <https://www.dremio.com/download>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >
> >
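
For readers following along without a Protobuf background, the two
options debated above can be sketched in plain Python. This is only an
illustration under stated assumptions: FlightDataLike, FlightAnyLike and
FlightMessageLike are hypothetical stand-ins, not Flight APIs, and the
field names are simplified. Option 1 treats a message with metadata but
no data body as a control message; option 2 uses an explicit wrapper in
which exactly one of the two members is set, mirroring the oneof.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlightDataLike:
    """Hypothetical, simplified stand-in for FlightData."""
    data_body: bytes = b""      # serialized record batch (empty if absent)
    app_metadata: bytes = b""   # opaque application-defined bytes


def is_control_message(msg: FlightDataLike) -> bool:
    # Option 1: a control message is metadata with an empty data body.
    return not msg.data_body and bool(msg.app_metadata)


@dataclass
class FlightAnyLike:
    """Hypothetical stand-in for the proposed type + bytes pair."""
    type: str
    data: bytes


@dataclass
class FlightMessageLike:
    """Option 2: an explicit wrapper; exactly one member may be set."""
    data: Optional[FlightDataLike] = None
    metadata: Optional[FlightAnyLike] = None

    def __post_init__(self) -> None:
        # Emulate oneof semantics: reject "both set" and "neither set".
        if (self.data is None) == (self.metadata is None):
            raise ValueError("exactly one of data or metadata must be set")

    def kind(self) -> str:
        return "data" if self.data is not None else "control"
```

With option 1 the receiver has to inspect the body to tell the two kinds
apart; with option 2 the distinction is carried by the wrapper itself,
at the cost of one more message type.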

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
I agree we should handle the issue of potentially-multiple-streams
separately from the BiDirectional RPC design / implementation.

On Thu, Dec 12, 2019 at 2:20 PM David Li <li...@gmail.com> wrote:
>
> Just following up here again, any other thoughts?
>
> I think we do have justifications for potentially separate streams in
> a call, but that's more of an orthogonal question - it doesn't need to
> be addressed here. I do agree that it very much complicates things.
>
> Thanks,
> David
>
> On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> > I would generally agree with this. Note that you have the possibility
> > to use unions-of-structs to send record batches with different schemas
> > in the same stream, though with some added complexity on each side.
> >
> > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
> >>
> >> I'd vote for explicitly not supported. We should keep our primitives
> >> narrow.
> >>
> >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
> >>
> >> > Thanks for the feedback.
> >> >
> >> > I do think if we had explicitly embraced gRPC from the beginning,
> >> > there are a lot of places where things could be made more ergonomic,
> >> > including with the metadata fields. But it would also have locked us
> >> > out of potential future transports.
> >> >
> >> > On another note: I hesitate to put too much into this method, but we
> >> > are looking at use cases where potentially, a client may want to
> >> > upload multiple distinct datasets (with differing schemas). (This is a
> >> > little tentative, and I can get more details...) Right now, each
> >> > logical stream in Flight must have a single, consistent schema; would
> >> > it make sense to look at ways to relax this, or declare this
> >> > explicitly out of scope (and require multiple calls and coordination
> >> > with the deployment topology) in order to accomplish this?
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > > Fair enough. I'm okay with the bytes approach and the proposal looks good
> >> > > to me.
> >> > >
> >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
> >> > >
> >> > >> I've updated the proposal.
> >> > >>
> >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> >> > >> errors/metadata, I still think using bytes is preferable:
> >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf types,
> >> > >> - We wouldn't be able to practically expose the Protobuf field to C++
> >> > >> users without causing build pains,
> >> > >> - We can't let Python users take advantage of the Protobuf field
> >> > >> without somehow being compatible with the Protobuf wheels (by linking
> >> > >> to the same version, and doing magic to turn the C++ Protobufs into
> >> > >> the Python ones),
> >> > >> - All our other application-defined fields are already bytes.
> >> > >>
> >> > >> Applications that want structure can encode JSON or Protobuf Any into
> >> > >> the bytes field themselves, much as you can already do for Ticket,
> >> > >> commands in FlightDescriptors, and application metadata in
> >> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
> >> > >> Any directly, since Any itself is a bytes field with a tag, and must
> >> > >> invoke the Protobuf deserializer again to read the actual message.
> >> > >>
> >> > >> If we decide on using bytes, then I don't think it makes sense to
> >> > >> define a new message with a oneof either, since it would be redundant.
> >> > >>
> >> > >> Thanks,
> >> > >> David
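
As a rough illustration of the bytes-based approach described in the
message above, an application could layer its own structure on top of an
opaque bytes field. The sketch below uses JSON and only the Python
standard library; encode_metadata, decode_metadata and the
"command"/"params" layout are hypothetical choices for this example, not
part of Flight.

```python
import json


def encode_metadata(command: str, **params) -> bytes:
    # Flight would carry this value as opaque bytes; the JSON layout is
    # purely an application-level convention assumed for this sketch.
    payload = {"command": command, "params": params}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode_metadata(raw: bytes) -> dict:
    # The receiving application reverses the same convention.
    return json.loads(raw.decode("utf-8"))


raw = encode_metadata("set_threshold", value=0.5)
decoded = decode_metadata(raw)
```

The same pattern would apply to any serialization format the two sides
agree on, since the transport never interprets the bytes.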
> >> > >>
> >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> >> > >> > I've been extremely backlogged; I will update the proposal when I get
> >> > >> > a chance and reply here when done.
> >> > >> >
> >> > >> > Best,
> >> > >> > David
> >> > >> >
> >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >> > >> >> Bumping this discussion since a couple of weeks have passed. It
> >> > >> >> seems
> >> > >> >> there are still some questions here, could we summarize what are
> >> > >> >> the
> >> > >> >> alternatives along with any public API implications so we can try
> >> > >> >> to
> >> > >> >> render a decision?
> >> > >> >>
> >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com>
> >> > >> >> wrote:
> >> > >> >>>
> >> > >> >>> Hi Wes,
> >> > >> >>>
> >> > >> >>> Responses inline:
> >> > >> >>>
> >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com>
> >> > wrote:
> >> > >> >>>
> >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li
> >> > >> >>> > <li...@gmail.com>
> >> > >> >>> > wrote:
> >> > >> >>> > >
> >> > >> >>> > > The question is whether to repurpose the existing FlightData
> >> > >> >>> > > structure, and allow for the metadata field to be filled in
> >> > >> >>> > > and
> >> > >> data
> >> > >> >>> > > fields to be blank (as a control message), or to wrap the
> >> > >> FlightData
> >> > >> >>> > > structure in another structure that explicitly distinguishes
> >> > >> between
> >> > >> >>> > > control and data messages.
> >> > >> >>> >
> >> > >> >>> > I'm not super against having metadata-only FlightData with
> >> > >> >>> > empty
> >> > >> body.
> >> > >> >>> > One question to consider is what changes (if any) would need to
> >> > >> >>> > be
> >> > >> >>> > made to public APIs in either scenario.
> >> > >> >>> >
> >> > >> >>>
> >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
> >> > >> >>> messages
> >> > >> >>> in
> >> > >> >>> the future. This would be a breaking change, but wouldn't change
> >> > >> >>> the
> >> > >> >>> wire
> >> > >> >>> format. I think the APIs could be changed backwards compatibly,
> >> > >> >>> though.
> >> > >> >>>
> >> > >> >>>
> >> > >> >>>
> >> > >> >>> > > The other question is how to handle the metadata fields. So
> >> > >> >>> > > far,
> >> > >> >>> > > we've
> >> > >> >>> > > used bytestring fields for application-defined data. This is
> >> > >> >>> > > workable
> >> > >> >>> > > if you want to use Protobuf to define the contents of those
> >> > >> >>> > > fields,
> >> > >> >>> > > but requires you to pack/unpack your Protobuf into/from the
> >> > >> >>> > > bytestring
> >> > >> >>> > > field. If we instead used the Protobuf Any field, a
> >> > >> >>> > > dynamically
> >> > >> >>> > > typed
> >> > >> >>> > > field, this would be more convenient, but then we'd be
> >> > >> >>> > > exposing
> >> > >> >>> > > Protobuf types. We could alternatively use a combination of
> >> > >> >>> > > a
> >> > >> >>> > > type
> >> > >> >>> > > field and a bytestring field, mimicking what the Protobuf
> >> > >> >>> > > Any
> >> > >> >>> > > type
> >> > >> >>> > > looks like on the wire. I'm not sure this is actually cleaner
> >> > >> >>> > > in
> >> > >> any
> >> > >> >>> > > of the language APIs, though.
> >> > >> >>> >
> >> > >> >>> > Leaving the deserialization of the app metadata to the
> >> > >> >>> > particular
> >> > >> >>> > Flight implementation seems on first principles like the most
> >> > >> flexible
> >> > >> >>> > thing, if Any is used, does that mean the metadata _must_ be a
> >> > >> >>> > protobuf?
> >> > >> >>> >
> >> > >> >>>
> >> > >> >>>
> >> > >> >>> If Any is used, we could still expose a bytes-based API, but it
> >> > would
> >> > >> >>> have
> >> > >> >>> some more wrapping. (We could put a ByteString in Any.) Then the
> >> > >> >>> question
> >> > >> >>> would just be how to expose this (would be easier in Java, harder
> >> > >> >>> in
> >> > >> >>> C++).
> >> > >> >>>
> >> > >> >>>
> >> > >> >>>
> >> > >> >>> > > David
> >> > >> >>> > >
> >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> >> > >> >>> > > >
> >> > >> >>> > > > Can one of you explain what is being proposed in
> >> > >> >>> > > > non-protobuf
> >> > >> >>> > > > terms?
> >> > >> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
> >> > >> >>> > > >
> >> > >> >>> > > > Regards
> >> > >> >>> > > >
> >> > >> >>> > > > Antoine.
> >> > >> >>> > > >
> >> > >> >>> > > >
> >> > >> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
> >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would
> >> > just
> >> > >> be
> >> > >> >>> > > >> application-level logic. (The official guide doesn't even
> >> > >> mention
> >> > >> >>> > > >> it
> >> > >> >>> > > >> in the encoding docs; I found
> >> > >> >>> > > >>
> >> > >> >>> >
> >> > >>
> >> > https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >> > >> >>> > > >> as well.)
> >> > >> >>> > > >>
> >> > >> >>> > > >> If I follow you, Jacques, then you are proposing
> >> > >> >>> > > >> essentially
> >> > >> >>> > > >> inlining
> >> > >> >>> > > >> the definition of Any, e.g.
> >> > >> >>> > > >>
> >> > >> >>> > > >> message FlightMessage {
> >> > >> >>> > > >>   oneof message {
> >> > >> >>> > > >>     FlightData data = 1;
> >> > >> >>> > > >>     FlightAny metadata = 2;
> >> > >> >>> > > >>   }
> >> > >> >>> > > >> }
> >> > >> >>> > > >>
> >> > >> >>> > > >> message FlightAny {
> >> > >> >>> > > >>   string type = 1;
> >> > >> >>> > > >>   bytes data = 2;
> >> > >> >>> > > >> }
> >> > >> >>> > > >>
> >> > >> >>> > > >> Is this correct?
> >> > >> >>> > > >>
> >> > >> >>> > > >> It might be nice to consider the wrapper message for
> >> > >> >>> > > >> DoGet/DoPut
> >> > >> >>> > > >> as
> >> > >> >>> > > >> well, but at that point, I'd rather we be consistent with
> >> > >> >>> > > >> all
> >> > >> >>> > > >> of
> >> > >> >>> > > >> them,
> >> > >> >>> > > >> rather than have one of the three methods do its own
> >> > >> >>> > > >> thing.
> >> > >> >>> > > >>
> >> > >> >>> > > >> Thanks,
> >> > >> >>> > > >> David
> >> > >> >>> > > >>
> >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > >> >>> > > >>> I think we could probably expose the oneof behavior
> >> > >> >>> > > >>> without
> >> > >> >>> > > >>> exposing
> >> > >> >>> > the
> >> > >> >>> > > >>> protobuf functions. On the any... hmm. I guess we could
> >> > >> >>> > > >>> expose
> >> > >> >>> > > >>> as
> >> > >> >>> > > >>> two
> >> > >> >>> > > >>> fields: type and data. Then users could use it for
> >> > >> >>> > > >>> whatever
> >> > >> >>> > > >>> but
> >> > >> >>> > > >>> if
> >> > >> >>> > > >>> people
> >> > >> >>> > > >>> wanted to treat it as any, it would work. (Basically a
> >> > >> >>> > > >>> user
> >> > >> >>> > > >>> could
> >> > >> >>> > > >>> use
> >> > >> >>> > > >>> any
> >> > >> >>> > > >>> with it easily but they could also use any other
> >> > >> >>> > > >>> mechanism).
> >> > >> >>> > > >>> At
> >> > >> >>> > least in
> >> > >> >>> > > >>> java, the any concepts are pretty simple/diy. Are other
> >> > >> language
> >> > >> >>> > > >>> bindings
> >> > >> >>> > > >>> less diy?
> >> > >> >>> > > >>>
> >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData +
> >> > >> >>> > > >>> metadata
> >> > >> >>> > > >>> but
> >> > >> >>> > > >>> it
> >> > >> >>> > just
> >> > >> >>> > > >>> seemed a bit janky.
> >> > >> >>> > > >>>
> >> > >> >>> > > >>> Thinking about the control message/wrapper object thing,
> >> > >> >>> > > >>> I
> >> > >> >>> > > >>> wonder
> >> > >> >>> > > >>> if
> >> > >> >>> > we
> >> > >> >>> > > >>> should redefine DoPut and DoGet to have the same property
> >> > >> >>> > > >>> if
> >> > >> >>> > > >>> we
> >> > >> >>> > think it
> >> > >> >>> > > >>> is
> >> > >> >>> > > >>> a good idea...
> >> > >> >>> > > >>>
> >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
> >> > >> li.davidm96@gmail.com>
> >> > >> >>> > wrote:
> >> > >> >>> > > >>>
> >> > >> >>> > > >>>> I was definitely considering having control messages
> >> > without
> >> > >> >>> > > >>>> data,
> >> > >> >>> > and
> >> > >> >>> > > >>>> I thought that could be encoded by a FlightData with
> >> > >> >>> > > >>>> only
> >> > >> >>> > app_metadata
> >> > >> >>> > > >>>> set. I think I understand your position now: FlightData
> >> > >> >>> > > >>>> should
> >> > >> >>> > always
> >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> That makes sense to me, and is consistent with the
> >> > >> >>> > > >>>> documentation
> >> > >> >>> > > >>>> on
> >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried about
> >> > >> >>> > > >>>> having
> >> > >> >>> > > >>>> a
> >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from
> >> > >> >>> > > >>>> happening,
> >> > >> >>> > and
> >> > >> >>> > > >>>> overall having a clear separation between data and
> >> > >> >>> > > >>>> control
> >> > >> >>> > > >>>> messages
> >> > >> >>> > is
> >> > >> >>> > > >>>> cleaner.
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained
> >> > >> >>> > > >>>> from
> >> > >> >>> > > >>>> exposing
> >> > >> >>> > > >>>> Protobuf by using bytes, would we want to change that
> >> > >> >>> > > >>>> now?
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> Best,
> >> > >> >>> > > >>>> David
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > >> >>> > > >>>>> Hey David,
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use
> >> > >> >>> > > >>>>> for
> >> > >> >>> > > >>>>> doget/doput
> >> > >> >>> > > >>>>> for
> >> > >> >>> > > >>>>> async. Yes, more thinking java given java grpc's async
> >> > >> >>> > > >>>>> always
> >> > >> >>> > pattern.
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>> On the comment around the FlightData, I think it is
> >> > >> >>> > > >>>>> overloading
> >> > >> >>> > > >>>>> the
> >> > >> >>> > > >>>> message
> >> > >> >>> > > >>>>> to use metadata for this. If I want to send a control
> >> > >> >>> > > >>>>> message
> >> > >> >>> > > >>>> independently
> >> > >> >>> > > >>>>> of the data message, I would have to define something
> >> > >> >>> > > >>>>> like
> >> > >> >>> > > >>>>> an
> >> > >> >>> > > >>>>> empty
> >> > >> >>> > > >>>> flight
> >> > >> >>> > > >>>>> data message that has custom metadata. Why not support
> >> > >> >>> > > >>>>> a
> >> > >> >>> > > >>>>> container
> >> > >> >>> > > >>>>> object
> >> > >> >>> > > >>>>> with a oneof{FlightData, Any} in it instead so users
> >> > >> >>> > > >>>>> can
> >> > >> >>> > > >>>>> add
> >> > >> >>> > > >>>>> more
> >> > >> >>> > data
> >> > >> >>> > > >>>>> as
> >> > >> >>> > > >>>>> desired. The default impl could be a noop for the Any
> >> > >> >>> > > >>>>> messages.
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
> >> > >> >>> > > >>>>> <li...@gmail.com>
> >> > >> >>> > > >>>>> wrote:
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>>> Hi Jacques,
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> Thanks for the comments.
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result
> >> > >> >>> > > >>>>>> of
> >> > >> prior
> >> > >> >>> > > >>>>>> proposals, so I don't think we need a new message to
> >> > carry
> >> > >> >>> > > >>>>>> that
> >> > >> >>> > kind
> >> > >> >>> > > >>>>>> of information.
> >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle
> >> > >> >>> > > >>>>>> incoming
> >> > >> >>> > > >>>>>> messages as the fundamental API; it would actually be
> >> > >> >>> > > >>>>>> quite
> >> > >> >>> > natural
> >> > >> >>> > > >>>>>> to
> >> > >> >>> > > >>>>>> implement in Flight/Java. I will note that it's not
> >> > >> >>> > > >>>>>> possible
> >> > >> >>> > > >>>>>> in
> >> > >> >>> > > >>>>>> C++/Python without spawning a thread, though. (In
> >> > essence,
> >> > >> >>> > gRPC-Java
> >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There
> >> > >> >>> > > >>>>>> are
> >> > >> >>> > experimental
> >> > >> >>> > > >>>>>> C++ APIs that would let us do something similar to
> >> > >> >>> > > >>>>>> Java,
> >> > >> >>> > > >>>>>> but
> >> > >> >>> > > >>>>>> those
> >> > >> >>> > > >>>>>> are
> >> > >> >>> > > >>>>>> only in relatively recent gRPC versions and are still
> >> > >> >>> > > >>>>>> under
> >> > >> >>> > > >>>>>> development (contrary to the interceptor APIs which
> >> > >> >>> > > >>>>>> have
> >> > >> been
> >> > >> >>> > around
> >> > >> >>> > > >>>>>> for quite a while).
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> Thanks,
> >> > >> >>> > > >>>>>> David
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org>
> >> > >> >>> > > >>>>>> wrote:
> >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might
> >> > >> >>> > > >>>>>>> worth
> >> > >> >>> > > >>>>>>> discussion
> >> > >> >>> > > >>>>>>> here
> >> > >> >>> > > >>>>>>> depending on your thoughts.
> >> > >> >>> > > >>>>>>>
> >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
> >> > >> >>> > > >>>>>>> <li...@gmail.com>
> >> > >> >>> > > >>>> wrote:
> >> > >> >>> > > >>>>>>>
> >> > >> >>> > > >>>>>>>> Hey Ryan,
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Thanks for the comments.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a
> >> > >> >>> > > >>>>>>>> Python
> >> > >> >>> > strawman.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
> >> > >> >>> > > >>>>>>>> interleave
> >> > >> >>> > > >>>> uploads
> >> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right now,
> >> > >> >>> > > >>>>>>>> synchronous
> >> > >> >>> > APIs
> >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client and
> >> > >> >>> > > >>>>>>>> server
> >> > >> >>> > > >>>>>>>> wait
> >> > >> >>> > > >>>>>>>> for
> >> > >> >>> > each
> >> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC
> >> > >> >>> > > >>>>>>>> doesn't
> >> > >> >>> > > >>>>>>>> give
> >> > >> >>> > > >>>>>>>> us
> >> > >> >>> > > >>>>>>>> the
> >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an overall
> >> > >> >>> > > >>>>>>>> timeout.)
> >> > >> >>> > > >>>>>>>> As
> >> > >> >>> > an
> >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see
> >> > >> >>> > > >>>>>>>> ARROW-6063:
> >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> This is mostly tangential, though; eventually we will want
> >> > >> >>> > > >>>>>>>> to design asynchronous APIs for Flight as a whole. A
> >> > >> >>> > > >>>>>>>> bidirectional stream like this (and like DoPut) just makes
> >> > >> >>> > > >>>>>>>> these pitfalls easier to run into.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the
> >> > >> >>> > > >>>>>>>> main concern is that depending on how you deploy, two
> >> > >> >>> > > >>>>>>>> separate calls could get routed to different instances.
> >> > >> >>> > > >>>>>>>> Additionally, gRPC has some reconnection behaviors; if the
> >> > >> >>> > > >>>>>>>> server goes away in between the two calls, but it then
> >> > >> >>> > > >>>>>>>> restarts or there is another instance available, the client
> >> > >> >>> > > >>>>>>>> will happily reconnect to the new server without warning.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Thanks,
> >> > >> >>> > > >>>>>>>> David
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> >> > >> >>> > > >>>>>>>>> Hey David,
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and
> >> > >> >>> > > >>>>>>>>> the possibility of remote compute via Arrow buffers. One
> >> > >> >>> > > >>>>>>>>> thing that would help me would be a concrete example of the
> >> > >> >>> > > >>>>>>>>> API in a real-life use case. Also, what would the client
> >> > >> >>> > > >>>>>>>>> experience be in terms of sync vs async? Would the client
> >> > >> >>> > > >>>>>>>>> block until the bidirectional call returns, i.e.
> >> > >> >>> > > >>>>>>>>> c = flight.vector_mult(a, b), or would the client wait to be
> >> > >> >>> > > >>>>>>>>> signaled that the computation was done? If the latter, how
> >> > >> >>> > > >>>>>>>>> is that different from a DoPut then DoGet? I suppose that
> >> > >> >>> > > >>>>>>>>> this could be implemented without extending the RPC
> >> > >> >>> > > >>>>>>>>> interface, but rather by a function/util?
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> Best,
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> Ryan
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> Hi all,
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we
> >> > >> >>> > > >>>>>>>>>> have identified a new use case on the horizon: being able
> >> > >> >>> > > >>>>>>>>>> to both send and retrieve Arrow data within a single RPC
> >> > >> >>> > > >>>>>>>>>> call. To that end, I've written up a proposal for a new
> >> > >> >>> > > >>>>>>>>>> RPC method:
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the
> >> > >> >>> > > >>>>>>>>>> document. I'd appreciate any feedback; I think this is a
> >> > >> >>> > > >>>>>>>>>> relatively straightforward addition - it is essentially
> >> > >> >>> > > >>>>>>>>>> "DoPutThenGet".
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've
> >> > >> >>> > > >>>>>>>>>> decided to table the other format change I had proposed
> >> > >> >>> > > >>>>>>>>>> (on DoPut), as it doesn't functionally change Flight, just
> >> > >> >>> > > >>>>>>>>>> the interpretation of the semantics.
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> Thanks,
> >> > >> >>> > > >>>>>>>>>> David
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> --
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>
> >> > >> >>> > > >
> >> > >> >>> >
> >> > >> >>
> >> > >> >
> >> > >>
> >> > >
> >> >
> >

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
Looks like there is consensus about this. I'll start a vote about the
format change soon if no further comments.

On Mon, Mar 23, 2020 at 7:41 AM David Li <li...@gmail.com> wrote:
>
> Hey Wes,
>
> Thanks for the review. I've broken out the format change into this PR:
> https://github.com/apache/arrow/pull/6686
>
> Best,
> David
>
> On 3/22/20, Wes McKinney <we...@gmail.com> wrote:
> > hi David,
> >
> > I did a preliminary review and things look to be on the right track
> > there. What do you think about breaking out the protocol changes (and
> > adding appropriate comments) so we can have a vote on that in
> > relatively short order?
> >
> > - Wes
> >
> > On Wed, Mar 18, 2020 at 9:06 AM David Li <li...@gmail.com> wrote:
> >>
> >> Following up here, I've submitted a draft implementation for C++:
> >> https://github.com/apache/arrow/pull/6656
> >>
> >> The core functionality is there, but there are still holes that I need
> >> to implement. Compared to the draft spec, the client also sends a
> >> FlightDescriptor to begin with, though it's currently not exposed.
> >> This provides consistency with DoGet/DoPut which also send a message
> >> to begin with to describe the stream to the server.
> >>
> >> Andy, I hope this helps clarify whether it meets your needs.
> >>
> >> Best,
> >> David
> >>
> >> On 2/25/20, David Li <li...@gmail.com> wrote:
> >> > Hey Andy,
> >> >
> >> > I've been rather busy unfortunately. I had started on an
> >> > implementation in C++ to provide as part of this discussion, but it's
> >> > not complete. I'm hoping to have more done in March.
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On 2/25/20, Andy Grove <an...@gmail.com> wrote:
> >> >> I was wondering if there had been any momentum on this (the
> >> >> BiDirectional RPC design)?
> >> >>
> >> >> I'm interested in this for the use case of Apache Spark sending a
> >> >> stream of data to another process to invoke custom code and then
> >> >> receive a stream back with the transformed data.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Andy.
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org> wrote:
> >> >>
> >> >>> I support moving forward with the current proposal.
> >> >>>
> >> >>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:
> >> >>>
> >> >>> > Just following up here again, any other thoughts?
> >> >>> >
> >> >>> > I think we do have justifications for potentially separate streams in
> >> >>> > a call, but that's more of an orthogonal question - it doesn't need to
> >> >>> > be addressed here. I do agree that it very much complicates things.
> >> >>> >
> >> >>> > Thanks,
> >> >>> > David
> >> >>> >
> >> >>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> >> >>> > > I would generally agree with this. Note that you have the possibility
> >> >>> > > to use unions-of-structs to send record batches with different schemas
> >> >>> > > in the same stream, though with some added complexity on each side.
> >> >>> > >
> >> >>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > >>
> >> >>> > >> I'd vote for explicitly not supported. We should keep our primitives
> >> >>> > >> narrow.
> >> >>> > >>
> >> >>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
> >> >>> > >>
> >> >>> > >> > Thanks for the feedback.
> >> >>> > >> >
> >> >>> > >> > I do think if we had explicitly embraced gRPC from the beginning,
> >> >>> > >> > there are a lot of places where things could be made more ergonomic,
> >> >>> > >> > including with the metadata fields. But it would also have locked
> >> >>> > >> > us out of potential future transports.
> >> >>> > >> >
> >> >>> > >> > On another note: I hesitate to put too much into this method, but we
> >> >>> > >> > are looking at use cases where potentially, a client may want to
> >> >>> > >> > upload multiple distinct datasets (with differing schemas). (This is a
> >> >>> > >> > little tentative, and I can get more details...) Right now, each
> >> >>> > >> > logical stream in Flight must have a single, consistent schema; would
> >> >>> > >> > it make sense to look at ways to relax this, or declare this
> >> >>> > >> > explicitly out of scope (and require multiple calls and coordination
> >> >>> > >> > with the deployment topology) in order to accomplish this?
> >> >>> > >> >
> >> >>> > >> > Best,
> >> >>> > >> > David
> >> >>> > >> >
> >> >>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > >> > > Fair enough. I'm okay with the bytes approach and the proposal looks
> >> >>> > >> > > good to me.
> >> >>> > >> > >
> >> >>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
> >> >>> > >> > >
> >> >>> > >> > >> I've updated the proposal.
> >> >>> > >> > >>
> >> >>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> >> >>> > >> > >> errors/metadata, I still think using bytes is preferable:
> >> >>> > >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
> >> >>> > >> > >> types,
> >> >>> > >> > >> - We wouldn't be able to practically expose the Protobuf field to C++
> >> >>> > >> > >> users without causing build pains,
> >> >>> > >> > >> - We can't let Python users take advantage of the Protobuf field
> >> >>> > >> > >> without somehow being compatible with the Protobuf wheels (by linking
> >> >>> > >> > >> to the same version, and doing magic to turn the C++ Protobufs into
> >> >>> > >> > >> the Python ones),
> >> >>> > >> > >> - All our other application-defined fields are already bytes.
> >> >>> > >> > >>
> >> >>> > >> > >> Applications that want structure can encode JSON or Protobuf Any into
> >> >>> > >> > >> the bytes field themselves, much as you can already do for Ticket,
> >> >>> > >> > >> commands in FlightDescriptors, and application metadata in
> >> >>> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
> >> >>> > >> > >> Any directly, since Any itself is a bytes field with a tag, and must
> >> >>> > >> > >> invoke the Protobuf deserializer again to read the actual message.
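
As a minimal sketch of the "encode structure into the bytes field yourself" approach described above, assuming JSON is the application's chosen encoding: `pack_metadata` and `unpack_metadata` are hypothetical helper names, the `example.Progress` type name is made up, and the `type` tag merely mimics the type identifier that Protobuf Any carries. Nothing here is part of the Flight API.

```python
import json


def pack_metadata(type_name, payload):
    """Encode an application-defined message as opaque bytes."""
    # The envelope pairs a type tag with the payload, loosely mirroring
    # the "type field plus bytes field" shape discussed in the thread.
    envelope = {"type": type_name, "payload": payload}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def unpack_metadata(raw):
    """Decode bytes produced by pack_metadata back into (type, payload)."""
    envelope = json.loads(raw.decode("utf-8"))
    return envelope["type"], envelope["payload"]


# The resulting bytes object is what an application would place in an
# application-defined metadata field.
app_metadata = pack_metadata("example.Progress", {"rows_done": 1024})
kind, body = unpack_metadata(app_metadata)
```

The transport only ever sees `bytes`, so the same field could carry a serialized Protobuf message instead without any change to the surrounding API.
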
> >> >>> > >> > >>
> >> >>> > >> > >> If we decide on using bytes, then I don't think it makes sense to
> >> >>> > >> > >> define a new message with a oneof either, since it would be
> >> >>> > >> > >> redundant.
> >> >>> > >> > >>
> >> >>> > >> > >> Thanks,
> >> >>> > >> > >> David
> >> >>> > >> > >>
> >> >>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> >> >>> > >> > >> > I've been extremely backlogged; I will update the proposal when I
> >> >>> > >> > >> > get a chance and reply here when done.
> >> >>> > >> > >> >
> >> >>> > >> > >> > Best,
> >> >>> > >> > >> > David
> >> >>> > >> > >> >
> >> >>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >> >>> > >> > >> >> Bumping this discussion since a couple of weeks have passed. It seems
> >> >>> > >> > >> >> there are still some questions here, could we summarize what are the
> >> >>> > >> > >> >> alternatives along with any public API implications so we can try to
> >> >>> > >> > >> >> render a decision?
> >> >>> > >> > >> >>
> >> >>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.davidm96@gmail.com> wrote:
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> Hi Wes,
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> Responses inline:
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmckinn@gmail.com> wrote:
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
> >> >>> > >> > >> >>> > >
> >> >>> > >> > >> >>> > > The question is whether to repurpose the existing FlightData
> >> >>> > >> > >> >>> > > structure, and allow for the metadata field to be filled in and data
> >> >>> > >> > >> >>> > > fields to be blank (as a control message), or to wrap the FlightData
> >> >>> > >> > >> >>> > > structure in another structure that explicitly distinguishes between
> >> >>> > >> > >> >>> > > control and data messages.
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>> > I'm not super against having metadata-only FlightData with empty body.
> >> >>> > >> > >> >>> > One question to consider is what changes (if any) would need to be
> >> >>> > >> > >> >>> > made to public APIs in either scenario.
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data messages in
> >> >>> > >> > >> >>> the future. This would be a breaking change, but wouldn't change the wire
> >> >>> > >> > >> >>> format. I think the APIs could be changed backwards compatibly, though.
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> > > The other question is how to handle the metadata fields. So far, we've
> >> >>> > >> > >> >>> > > used bytestring fields for application-defined data. This is workable
> >> >>> > >> > >> >>> > > if you want to use Protobuf to define the contents of those fields,
> >> >>> > >> > >> >>> > > but requires you to pack/unpack your Protobuf into/from the bytestring
> >> >>> > >> > >> >>> > > field. If we instead used the Protobuf Any field, a dynamically typed
> >> >>> > >> > >> >>> > > field, this would be more convenient, but then we'd be exposing
> >> >>> > >> > >> >>> > > Protobuf types. We could alternatively use a combination of a type
> >> >>> > >> > >> >>> > > field and a bytestring field, mimicking what the Protobuf Any type
> >> >>> > >> > >> >>> > > looks like on the wire. I'm not sure this is actually cleaner in any
> >> >>> > >> > >> >>> > > of the language APIs, though.
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>> > Leaving the deserialization of the app metadata to the particular
> >> >>> > >> > >> >>> > Flight implementation seems on first principles like the most flexible
> >> >>> > >> > >> >>> > thing. If Any is used, does that mean the metadata _must_ be a
> >> >>> > >> > >> >>> > protobuf?
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> If Any is used, we could still expose a bytes-based API, but it would
> >> >>> > >> > >> >>> have some more wrapping. (We could put a ByteString in Any.) Then the
> >> >>> > >> > >> >>> question would just be how to expose this (would be easier in Java,
> >> >>> > >> > >> >>> harder in C++).
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> > > David
> >> >>> > >> > >> >>> > >
> >> >>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf
> >> >>> > >> > >> >>> > > > terms? Knowledge of protobuf shouldn't be required to use Flight.
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > > Regards
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > > Antoine.
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
> >> >>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
> >> >>> > >> > >> >>> > > >> application-level logic. (The official guide doesn't even mention it
> >> >>> > >> > >> >>> > > >> in the encoding docs; I found
> >> >>> > >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >> >>> > >> > >> >>> > > >> as well.)
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially
> >> >>> > >> > >> >>> > > >> inlining the definition of Any, e.g.
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> message FlightMessage {
> >> >>> > >> > >> >>> > > >>   oneof message {
> >> >>> > >> > >> >>> > > >>     FlightData data = 1;
> >> >>> > >> > >> >>> > > >>     FlightAny metadata = 2;
> >> >>> > >> > >> >>> > > >>   }
> >> >>> > >> > >> >>> > > >> }
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> message FlightAny {
> >> >>> > >> > >> >>> > > >>   string type = 1;
> >> >>> > >> > >> >>> > > >>   bytes data = 2;
> >> >>> > >> > >> >>> > > >> }
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> Is this correct?
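
To see what such a wrapper could mean for a language API, here is a sketch in Python, assuming plain dataclasses stand in for the generated messages. The two classes mirror the FlightMessage/FlightAny definition quoted above, but the Python names, the use of raw bytes in place of FlightData, and the validation are illustrative assumptions, not an existing Flight binding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlightAny:
    """Type tag plus opaque payload, as in the quoted FlightAny message."""
    type: str
    data: bytes


@dataclass(frozen=True)
class FlightMessage:
    """Wrapper holding either a data message or a control message."""
    data: Optional[bytes] = None          # stands in for FlightData
    metadata: Optional[FlightAny] = None  # control message

    def __post_init__(self):
        # Assumption of this sketch: exactly one branch must be set.
        # A Protobuf oneof itself also permits neither to be set.
        if (self.data is None) == (self.metadata is None):
            raise ValueError("exactly one of data or metadata must be set")

    @property
    def is_control(self):
        return self.metadata is not None
```

A receiver could then branch on `is_control` instead of inspecting whether a FlightData body happens to be empty, which is the distinction the wrapper proposal is after.
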
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut as
> >> >>> > >> > >> >>> > > >> well, but at that point, I'd rather we be consistent with all of
> >> >>> > >> > >> >>> > > >> them, rather than have one of the three methods do its own thing.
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> Thanks,
> >> >>> > >> > >> >>> > > >> David
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > >> > >> >>> > > >>> I think we could probably expose the oneof behavior without exposing
> >> >>> > >> > >> >>> > > >>> the protobuf functions. On the Any... hmm. I guess we could expose it
> >> >>> > >> > >> >>> > > >>> as two fields: type and data. Then users could use it for whatever,
> >> >>> > >> > >> >>> > > >>> but if people wanted to treat it as Any, it would work. (Basically a
> >> >>> > >> > >> >>> > > >>> user could use Any with it easily, but they could also use any other
> >> >>> > >> > >> >>> > > >>> mechanism.) At least in Java, the Any concepts are pretty simple/DIY.
> >> >>> > >> > >> >>> > > >>> Are other language bindings less DIY?
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but it
> >> >>> > >> > >> >>> > > >>> just seemed a bit janky.
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I wonder if
> >> >>> > >> > >> >>> > > >>> we should redefine DoPut and DoGet to have the same property if we
> >> >>> > >> > >> >>> > > >>> think it is a good idea...
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com> wrote:
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >>>> I was definitely considering having control messages without data,
> >> >>> > >> > >> >>> > > >>>> and I thought that could be encoded by a FlightData with only
> >> >>> > >> > >> >>> > > >>>> app_metadata set. I think I understand your position now: FlightData
> >> >>> > >> > >> >>> > > >>>> should always carry (some) data (with optional metadata)?
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with the documentation on
> >> >>> > >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried about having a
> >> >>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from happening, and
> >> >>> > >> > >> >>> > > >>>> overall having a clear separation between data and control messages
> >> >>> > >> > >> >>> > > >>>> is cleaner.
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
> >> >>> > >> > >> >>> > > >>>> Protobuf by using bytes; would we want to change that now?
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>> Best,
> >> >>> > >> > >> >>> > > >>>> David
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > >> > >> >>> > > >>>>> Hey David,
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for DoGet/DoPut
> >> >>> > >> > >> >>> > > >>>>> for async. Yes, I was thinking more of Java, given Java gRPC's
> >> >>> > >> > >> >>> > > >>>>> async-always pattern.
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think it is overloading the
> >> >>> > >> > >> >>> > > >>>>> message to use metadata for this. If I want to send a control message
> >> >>> > >> > >> >>> > > >>>>> independently of the data message, I would have to define something
> >> >>> > >> > >> >>> > > >>>>> like an empty flight data message that has custom metadata. Why not
> >> >>> > >> > >> >>> > > >>>>> support a container object with a oneof{FlightData, Any} in it
> >> >>> > >> > >> >>> > > >>>>> instead, so users can add more data as desired? The default impl
> >> >>> > >> > >> >>> > > >>>>> could be a noop for the Any messages.
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>>> Hi Jacques,
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>> Thanks for the comments.
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >> >>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
> >> >>> > >> > >> >>> > > >>>>>> proposals, so I don't think we need a new message to carry that kind
> >> >>> > >> > >> >>> > > >>>>>> of information.
> >> >>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
> >> >>> > >> > >> >>> > > >>>>>> messages as the fundamental API; it would actually be quite natural
> >> >>> > >> > >> >>> > > >>>>>> to implement in Flight/Java. I will note that it's not possible in
> >> >>> > >> > >> >>> > > >>>>>> C++/Python without spawning a thread, though. (In essence, gRPC-Java
> >> >>> > >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are experimental
> >> >>> > >> > >> >>> > > >>>>>> C++ APIs that would let us do something similar to Java, but those
> >> >>> > >> > >> >>> > > >>>>>> are only in relatively recent gRPC versions and are still under
> >> >>> > >> > >> >>> > > >>>>>> development (contrary to the interceptor APIs which have been around
> >> >>> > >> > >> >>> > > >>>>>> for quite a while).
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>> Thanks,
> >> >>> > >> > >> >>> > > >>>>>> David
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau
> >> >>> > >> > >> >>> > > >>>>>> <ja...@apache.org>
> >> >>> > >> > >> >>> > > >>>>>> wrote:
> >> >>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc.
> >> >>> > >> > >> >>> > > >>>>>>> Might be
> >> >>> > >> > >> >>> > > >>>>>>> worth
> >> >>> > >> > >> >>> > > >>>>>>> discussion
> >> >>> > >> > >> >>> > > >>>>>>> here
> >> >>> > >> > >> >>> > > >>>>>>> depending on your thoughts.
> >> >>> > >> > >> >>> > > >>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
> >> >>> > >> > >> >>> > > >>>>>>> <li...@gmail.com>
> >> >>> > >> > >> >>> > > >>>> wrote:
> >> >>> > >> > >> >>> > > >>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to
> >> >>> provide a
> >> >>> > >> > >> >>> > > >>>>>>>> Python
> >> >>> > >> > >> >>> > strawman.
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it,
> >> >>> > >> > >> >>> > > >>>>>>>> you
> >> >>> > could
> >> >>> > >> > >> >>> > > >>>>>>>> interleave
> >> >>> > >> > >> >>> > > >>>> uploads
> >> >>> > >> > >> >>> > > >>>>>>>> and downloads if you were so inclined.
> >> >>> > >> > >> >>> > > >>>>>>>> Right
> >> >>> now,
> >> >>> > >> > >> >>> > > >>>>>>>> synchronous
> >> >>> > >> > >> >>> > APIs
> >> >>> > >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client
> >> >>> > >> > >> >>> > > >>>>>>>> and
> >> >>> > >> > >> >>> > > >>>>>>>> server
> >> >>> > >> > >> >>> > > >>>>>>>> wait
> >> >>> > >> > >> >>> > > >>>>>>>> for
> >> >>> > >> > >> >>> > each
> >> >>> > >> > >> >>> > > >>>>>>>> other due to an application logic bug.
> >> >>> > >> > >> >>> > > >>>>>>>> (gRPC
> >> >>> > >> > >> >>> > > >>>>>>>> doesn't
> >> >>> > >> > >> >>> > > >>>>>>>> give
> >> >>> > >> > >> >>> > > >>>>>>>> us
> >> >>> > >> > >> >>> > > >>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an
> >> >>> overall
> >> >>> > >> > >> >>> > > >>>>>>>> timeout.)
> >> >>> > >> > >> >>> > > >>>>>>>> As
> >> >>> > >> > >> >>> > an
> >> >>> > >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see
> >> >>> > >> > >> >>> > > >>>>>>>> ARROW-6063:
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> https://issues.apache.org/jira/browse/ARROW-6063
> >> >>> > >> > >> >>> > > >>>>>>>>
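The interleaving described above can be sketched in plain Python. This is only an illustration under stated assumptions: the bounded queues stand in for flow-controlled gRPC streams, and exchange, transform and capacity are hypothetical names, not Flight APIs. The point is that the upload runs on its own thread while the caller drains the download side, so neither peer blocks forever waiting on the other.

```python
import queue
import threading


def exchange(batches, transform, capacity=1):
    """Upload batches and download results concurrently over bounded channels."""
    upload = queue.Queue(maxsize=capacity)    # client -> server (hypothetical stream)
    download = queue.Queue(maxsize=capacity)  # server -> client (hypothetical stream)
    done = object()  # end-of-stream sentinel

    def server():
        # Simulated server: transform each incoming batch and send it back.
        while (item := upload.get()) is not done:
            download.put(transform(item))
        download.put(done)

    def writer():
        # Client upload side, run on its own thread so reads can proceed.
        for batch in batches:
            upload.put(batch)
        upload.put(done)

    threads = [
        threading.Thread(target=server, daemon=True),
        threading.Thread(target=writer, daemon=True),
    ]
    for t in threads:
        t.start()

    # Client download side: drain results while the writer is still uploading.
    results = []
    while (item := download.get()) is not done:
        results.append(item)
    for t in threads:
        t.join()
    return results


print(exchange(range(5), lambda x: x * 2))
```

If the client instead wrote every batch before reading anything, the bounded channels would fill up after a few batches and both sides would block, which is the kind of application-level deadlock mentioned above.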
> >> >>> > >> > >> >>> > > >>>>>>>> This is mostly tangential, though;
> >> >>> > >> > >> >>> > > >>>>>>>> eventually
> >> >>> > >> > >> >>> > > >>>>>>>> we
> >> >>> > >> > >> >>> > > >>>>>>>> will
> >> >>> > >> > >> >>> > > >>>>>>>> want
> >> >>> > >> > >> >>> > > >>>>>>>> to
> >> >>> > >> > >> >>> > design
> >> >>> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A
> >> >>> > >> > bidirectional
> >> >>> > >> > >> >>> > > >>>>>>>> stream
> >> >>> > >> > >> >>> > > >>>>>>>> like
> >> >>> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these
> >> >>> > >> > >> >>> > > >>>>>>>> pitfalls
> >> >>> > >> > >> >>> > > >>>>>>>> easier
> >> >>> > >> > >> >>> > > >>>>>>>> to
> >> >>> > >> > >> >>> > > >>>>>>>> run
> >> >>> > >> > >> >>> > into.
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the
> >> >>> > >> > >> >>> > > >>>>>>>> proposal,
> >> >>> > >> > but
> >> >>> > >> > >> >>> > > >>>>>>>> the
> >> >>> > >> > >> >>> > main
> >> >>> > >> > >> >>> > > >>>>>>>> concern is that depending on how you
> >> >>> > >> > >> >>> > > >>>>>>>> deploy,
> >> >>> > >> > >> >>> > > >>>>>>>> two
> >> >>> > >> > >> >>> > > >>>>>>>> separate
> >> >>> > >> > >> >>> > > >>>>>>>> calls
> >> >>> > >> > >> >>> > > >>>>>>>> could
> >> >>> > >> > >> >>> > > >>>>>>>> get routed to different instances.
> >> >>> > >> > >> >>> > > >>>>>>>> Additionally,
> >> >>> > >> > >> >>> > > >>>>>>>> gRPC
> >> >>> > >> > >> >>> > > >>>>>>>> has
> >> >>> > >> > >> >>> > > >>>>>>>> some
> >> >>> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes
> >> >>> > >> > >> >>> > > >>>>>>>> away
> >> >>> in
> >> >>> > >> > >> >>> > > >>>>>>>> between
> >> >>> > >> > >> >>> > > >>>>>>>> the
> >> >>> > >> > >> >>> > two
> >> >>> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is
> >> >>> > >> > >> >>> > > >>>>>>>> another
> >> >>> > >> > instance
> >> >>> > >> > >> >>> > available,
> >> >>> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the
> >> >>> > >> > >> >>> > > >>>>>>>> new
> >> >>> > server
> >> >>> > >> > >> without
> >> >>> > >> > >> >>> > > >>>>>>>> warning.
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Thanks,
> >> >>> > >> > >> >>> > > >>>>>>>> David
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray
> >> >>> > >> > >> >>> > > >>>>>>>> <ry...@dremio.com>
> >> >>> > wrote:
> >> >>> > >> > >> >>> > > >>>>>>>>> Hey David,
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of
> >> >>> > >> > >> >>> > > >>>>>>>>> sense.
> >> >>> > >> > >> >>> > > >>>>>>>>> I
> >> >>> > like
> >> >>> > >> > >> >>> > > >>>>>>>>> it
> >> >>> > >> > >> >>> > > >>>>>>>>> and
> >> >>> > >> > >> >>> > > >>>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>>> possibility
> >> >>> > >> > >> >>> > > >>>>>>>>> of remote compute via arrow buffers. One
> >> >>> > >> > >> >>> > > >>>>>>>>> thing
> >> >>> > >> > >> >>> > > >>>>>>>>> that
> >> >>> > >> > >> >>> > > >>>>>>>>> would
> >> >>> > >> > >> >>> > > >>>>>>>>> help
> >> >>> > >> > >> >>> > me
> >> >>> > >> > >> >>> > > >>>>>> would
> >> >>> > >> > >> >>> > > >>>>>>>> be
> >> >>> > >> > >> >>> > > >>>>>>>>> a concrete example of the API in a real
> >> >>> > >> > >> >>> > > >>>>>>>>> life
> >> >>> use
> >> >>> > >> > >> >>> > > >>>>>>>>> case.
> >> >>> > >> > >> >>> > > >>>>>>>>> Also,
> >> >>> > >> > >> >>> > what
> >> >>> > >> > >> >>> > > >>>>>> would
> >> >>> > >> > >> >>> > > >>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>>> client experience be in terms of sync vs
> >> >>> > >> > >> >>> > > >>>>>>>>> async?
> >> >>> > >> > >> >>> > > >>>>>>>>> Would
> >> >>> > >> > >> >>> > > >>>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>>> client
> >> >>> > >> > >> >>> > > >>>>>>>>> block
> >> >>> > >> > >> >>> > > >>>>>>>> till
> >> >>> > >> > >> >>> > > >>>>>>>>> the bidirectional call returns, i.e. c =
> >> >>> > >> > >> flight.vector_mult(a,
> >> >>> > >> > >> >>> > > >>>>>>>>> b)
> >> >>> > >> > >> >>> > or
> >> >>> > >> > >> >>> > > >>>>>>>>> would
> >> >>> > >> > >> >>> > > >>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>>> client wait to be signaled that
> >> >>> > >> > >> >>> > > >>>>>>>>> computation
> >> >>> > >> > >> >>> > > >>>>>>>>> was
> >> >>> > >> > >> >>> > > >>>>>>>>> done.
> >> >>> > >> > >> >>> > > >>>>>>>>> If
> >> >>> > >> > >> >>> > > >>>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>>> latter
> >> >>> > >> > >> >>> > > >>>>>>>>> how
> >> >>> > >> > >> >>> > > >>>>>>>>> is
> >> >>> > >> > >> >>> > > >>>>>>>>> that different from a DoPut then DoGet? I
> >> >>> suppose
> >> >>> > >> > >> >>> > > >>>>>>>>> that
> >> >>> > >> > >> >>> > > >>>>>>>>> this
> >> >>> > >> > >> >>> > could
> >> >>> > >> > >> >>> > > >>>> be
> >> >>> > >> > >> >>> > > >>>>>>>>> implemented without extending the RPC
> >> >>> > >> > >> >>> > > >>>>>>>>> interface
> >> >>> > >> > >> >>> > > >>>>>>>>> but
> >> >>> > >> > >> rather
> >> >>> > >> > >> >>> > > >>>>>>>>> by a
> >> >>> > >> > >> >>> > > >>>>>>>>> function/util?
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> Best,
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> Ryan
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li
> >> >>> > >> > >> >>> > > >>>>>>>>> <
> >> >>> > >> > >> >>> > li.davidm96@gmail.com>
> >> >>> > >> > >> >>> > > >>>>>> wrote:
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> Hi all,
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite
> >> >>> > >> > >> >>> > > >>>>>>>>>> successfully
> >> >>> > >> > >> >>> > > >>>>>>>>>> so
> >> >>> > >> > >> >>> > > >>>>>>>>>> far,
> >> >>> > >> > but
> >> >>> > >> > >> we
> >> >>> > >> > >> >>> > > >>>>>>>>>> have
> >> >>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the
> >> >>> > >> > >> >>> > > >>>>>>>>>> horizon:
> >> >>> being
> >> >>> > >> > >> >>> > > >>>>>>>>>> able
> >> >>> > >> > >> >>> > > >>>>>>>>>> to
> >> >>> > >> > >> >>> > > >>>>>>>>>> both
> >> >>> > >> > >> >>> > > >>>>>>>>>> send
> >> >>> > >> > >> >>> > > >>>>>>>>>> and
> >> >>> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC
> >> >>> > >> > >> >>> > > >>>>>>>>>> call.
> >> >>> To
> >> >>> > >> > >> >>> > > >>>>>>>>>> that
> >> >>> > >> > >> >>> > > >>>>>>>>>> end,
> >> >>> > >> > >> >>> > I've
> >> >>> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC
> >> >>> > >> > >> >>> > > >>>>>>>>>> method:
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> >
> >> >>> > >> > >>
> >> >>> > >> >
> >> >>> >
> >> >>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or
> >> >>> comment
> >> >>> > >> > >> >>> > > >>>>>>>>>> on
> >> >>> > >> > the
> >> >>> > >> > >> >>> > document.
> >> >>> > >> > >> >>> > > >>>>>>>>>> I'd
> >> >>> > >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is
> >> >>> > >> > >> >>> > > >>>>>>>>>> a
> >> >>> > >> > >> >>> > > >>>>>>>>>> relatively
> >> >>> > >> > >> >>> > > >>>>>>>>>> straightforward
> >> >>> > >> > >> >>> > > >>>>>>>>>> addition - it is essentially
> >> >>> > >> > >> >>> > > >>>>>>>>>> "DoPutThenGet".
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require
> >> >>> > >> > >> >>> > > >>>>>>>>>> a
> >> >>> > vote.
> >> >>> > >> > I've
> >> >>> > >> > >> >>> > > >>>>>>>>>> decided
> >> >>> > >> > >> >>> > > >>>>>>>>>> to
> >> >>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had
> >> >>> > >> > >> >>> > > >>>>>>>>>> proposed
> >> >>> (on
> >> >>> > >> > >> >>> > > >>>>>>>>>> DoPut),
> >> >>> > >> > >> >>> > > >>>>>>>>>> as
> >> >>> > >> > >> >>> > > >>>>>>>>>> it
> >> >>> > >> > >> >>> > > >>>>>> doesn't
> >> >>> > >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the
> >> >>> > >> > >> >>> > > >>>>>>>>>> interpretation
> >> >>> > >> > of
> >> >>> > >> > >> >>> > > >>>>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>>>> semantics.
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> Thanks,
> >> >>> > >> > >> >>> > > >>>>>>>>>> David
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>
> >> >>> > >> > >> >
> >> >>> > >> > >>
> >> >>> > >> > >
> >> >>> > >> >
> >> >>> > >
> >> >>> >
> >> >>>
> >> >>
> >> >
> >
>


Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Hey Wes,

Thanks for the review. I've broken out the format change into this PR:
https://github.com/apache/arrow/pull/6686

Best,
David

On 3/22/20, Wes McKinney <we...@gmail.com> wrote:
> hi David,
>
> I did a preliminary review and things look to be on the right track
> there. What do you think about breaking out the protocol changes (and
> adding appropriate comments) so we can have a vote on that in
> relatively short order?
>
> - Wes
>
> On Wed, Mar 18, 2020 at 9:06 AM David Li <li...@gmail.com> wrote:
>>
>> Following up here, I've submitted a draft implementation for C++:
>> https://github.com/apache/arrow/pull/6656
>>
>> The core functionality is there, but there are still holes that I need
>> to implement. Compared to the draft spec, the client also sends a
>> FlightDescriptor to begin with, though it's currently not exposed.
>> This provides consistency with DoGet/DoPut which also send a message
>> to begin with to describe the stream to the server.
>>
>> Andy, I hope this helps clarify whether it meets your needs.
>>
>> Best,
>> David
>>
>> On 2/25/20, David Li <li...@gmail.com> wrote:
>> > Hey Andy,
>> >
>> > I've been rather busy unfortunately. I had started on an
>> > implementation in C++ to provide as part of this discussion, but it's
>> > not complete. I'm hoping to have more done in March.
>> >
>> > Best,
>> > David
>> >
>> > On 2/25/20, Andy Grove <an...@gmail.com> wrote:
>> >> I was wondering if there had been any momentum on this (the
>> >> BiDirectional
>> >> RPC design)?
>> >>
>> >> I'm interested in this for the use case of Apache Spark sending a
>> >> stream
>> >> of
>> >> data to another process to invoke custom code and then receive a
>> >> stream
>> >> back with the transformed data.
>> >>
>> >> Thanks,
>> >>
>> >> Andy.
>> >>
>> >>
>> >>
>> >> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org>
>> >> wrote:
>> >>
>> >>> I support moving forward with the current proposal.
>> >>>
>> >>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Just following up here again, any other thoughts?
>> >>> >
>> >>> > I think we do have justifications for potentially separate streams
>> >>> > in
>> >>> > a call, but that's more of an orthogonal question - it doesn't need
>> >>> > to
>> >>> > be addressed here. I do agree that it very much complicates things.
>> >>> >
>> >>> > Thanks,
>> >>> > David
>> >>> >
>> >>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
>> >>> > > I would generally agree with this. Note that you have the
>> >>> > > possibility
>> >>> > > to use unions-of-structs to send record batches with different
>> >>> > > schemas
>> >>> > > in the same stream, though with some added complexity on each
>> >>> > > side
>> >>> > >
>> >>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau
>> >>> > > <ja...@apache.org>
>> >>> > wrote:
>> >>> > >>
>> >>> > >> I'd vote for explicitly not supported. We should keep our
>> >>> > >> primitives
>> >>> > >> narrow.
>> >>> > >>
>> >>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com>
>> >>> > >> wrote:
>> >>> > >>
>> >>> > >> > Thanks for the feedback.
>> >>> > >> >
>> >>> > >> > I do think if we had explicitly embraced gRPC from the
>> >>> > >> > beginning,
>> >>> > >> > there are a lot of places where things could be made more
>> >>> > >> > ergonomic,
>> >>> > >> > including with the metadata fields. But it would also have
>> >>> > >> > locked
>> >>> us
>> >>> > >> > out of potential future transports.
>> >>> > >> >
>> >>> > >> > On another note: I hesitate to put too much into this method,
>> >>> > >> > but
>> >>> > >> > we
>> >>> > >> > are looking at use cases where potentially, a client may want
>> >>> > >> > to
>> >>> > >> > upload multiple distinct datasets (with differing schemas).
>> >>> > >> > (This
>> >>> is a
>> >>> > >> > little tentative, and I can get more details...) Right now,
>> >>> > >> > each
>> >>> > >> > logical stream in Flight must have a single, consistent
>> >>> > >> > schema;
>> >>> would
>> >>> > >> > it make sense to look at ways to relax this, or declare this
>> >>> > >> > explicitly out of scope (and require multiple calls and
>> >>> > >> > coordination
>> >>> > >> > with the deployment topology) in order to accomplish this?
>> >>> > >> >
>> >>> > >> > Best,
>> >>> > >> > David
>> >>> > >> >
>> >>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
>> >>> > >> > > Fair enough. I'm okay with the bytes approach and the
>> >>> > >> > > proposal
>> >>> looks
>> >>> > >> > > good
>> >>> > >> > > to me.
>> >>> > >> > >
>> >>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li
>> >>> > >> > > <li...@gmail.com>
>> >>> > >> > > wrote:
>> >>> > >> > >
>> >>> > >> > >> I've updated the proposal.
>> >>> > >> > >>
>> >>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
>> >>> > >> > >> errors/metadata, I still think using bytes is preferable:
>> >>> > >> > >> - It doesn't require (conditionally) exposing or wrapping
>> >>> Protobuf
>> >>> > >> > types,
>> >>> > >> > >> - We wouldn't be able to practically expose the Protobuf
>> >>> > >> > >> field
>> >>> > >> > >> to
>> >>> > >> > >> C++
>> >>> > >> > >> users without causing build pains,
>> >>> > >> > >> - We can't let Python users take advantage of the Protobuf
>> >>> > >> > >> field
>> >>> > >> > >> without somehow being compatible with the Protobuf wheels
>> >>> > >> > >> (by
>> >>> > >> > >> linking
>> >>> > >> > >> to the same version, and doing magic to turn the C++
>> >>> > >> > >> Protobufs
>> >>> into
>> >>> > >> > >> the Python ones),
>> >>> > >> > >> - All our other application-defined fields are already
>> >>> > >> > >> bytes.
>> >>> > >> > >>
>> >>> > >> > >> Applications that want structure can encode JSON or
>> >>> > >> > >> Protobuf
>> >>> > >> > >> Any
>> >>> > >> > >> into
>> >>> > >> > >> the bytes field themselves, much as you can already do for
>> >>> Ticket,
>> >>> > >> > >> commands in FlightDescriptors, and application metadata in
>> >>> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient
>> >>> > >> > >> than
>> >>> using
>> >>> > >> > >> Any directly, since Any itself is a bytes field with a tag,
>> >>> > >> > >> and
>> >>> > must
>> >>> > >> > >> invoke the Protobuf deserializer again to read the actual
>> >>> message.
>> >>> > >> > >>
>> >>> > >> > >> If we decide on using bytes, then I don't think it makes
>> >>> > >> > >> sense
>> >>> > >> > >> to
>> >>> > >> > >> define a new message with a oneof either, since it would be
>> >>> > >> > >> redundant.
>> >>> > >> > >>
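As a minimal sketch of the "encode it yourself" option, assuming an application that chooses JSON for its structured metadata: the helper names and the payload keys below are hypothetical, and only the Python standard library is used. Flight would just carry the resulting bytes opaquely.

```python
import json


def pack_app_metadata(payload: dict) -> bytes:
    """Serialize a dict to compact UTF-8 JSON for an opaque bytes field."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def unpack_app_metadata(raw: bytes) -> dict:
    """Decode bytes produced by pack_app_metadata; empty bytes mean no metadata."""
    return json.loads(raw.decode("utf-8")) if raw else {}


# Hypothetical application payload; the keys are illustrative only.
meta = pack_app_metadata({"command": "vector_mult", "batch": 3})
print(meta)
print(unpack_app_metadata(meta))
```

The same pattern would apply to a serialized Protobuf message: the application serializes on one side and parses on the other, and nothing Protobuf-specific appears in the Flight API.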
>> >>> > >> > >> Thanks,
>> >>> > >> > >> David
>> >>> > >> > >>
>> >>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
>> >>> > >> > >> > I've been extremely backlogged, I will update the
>> >>> > >> > >> > proposal
>> >>> when I
>> >>> > >> > >> > get
>> >>> > >> > >> > a chance and reply here when done.
>> >>> > >> > >> >
>> >>> > >> > >> > Best,
>> >>> > >> > >> > David
>> >>> > >> > >> >
>> >>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
>> >>> > >> > >> >> Bumping this discussion since a couple of weeks have
>> >>> > >> > >> >> passed.
>> >>> It
>> >>> > >> > >> >> seems
>> >>> > >> > >> >> there are still some questions here, could we summarize
>> >>> > >> > >> >> what
>> >>> are
>> >>> > >> > >> >> the
>> >>> > >> > >> >> alternatives along with any public API implications so
>> >>> > >> > >> >> we
>> >>> > >> > >> >> can
>> >>> > try
>> >>> > >> > >> >> to
>> >>> > >> > >> >> render a decision?
>> >>> > >> > >> >>
>> >>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <
>> >>> li.davidm96@gmail.com
>> >>> > >
>> >>> > >> > >> >> wrote:
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> Hi Wes,
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> Responses inline:
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <
>> >>> wesmckinn@gmail.com>
>> >>> > >> > wrote:
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li
>> >>> > >> > >> >>> > <li...@gmail.com>
>> >>> > >> > >> >>> > wrote:
>> >>> > >> > >> >>> > >
>> >>> > >> > >> >>> > > The question is whether to repurpose the existing
>> >>> > FlightData
>> >>> > >> > >> >>> > > structure, and allow for the metadata field to be
>> >>> > >> > >> >>> > > filled
>> >>> in
>> >>> > >> > >> >>> > > and
>> >>> > >> > >> data
>> >>> > >> > >> >>> > > fields to be blank (as a control message), or to
>> >>> > >> > >> >>> > > wrap
>> >>> > >> > >> >>> > > the
>> >>> > >> > >> FlightData
>> >>> > >> > >> >>> > > structure in another structure that explicitly
>> >>> > distinguishes
>> >>> > >> > >> between
>> >>> > >> > >> >>> > > control and data messages.
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>> > I'm not super against having metadata-only FlightData
>> >>> > >> > >> >>> > with
>> >>> > >> > >> >>> > empty
>> >>> > >> > >> body.
>> >>> > >> > >> >>> > One question to consider is what changes (if any)
>> >>> > >> > >> >>> > would
>> >>> need
>> >>> > to
>> >>> > >> > >> >>> > be
>> >>> > >> > >> >>> > made to public APIs in either scenario.
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow
>> >>> > >> > >> >>> empty
>> >>> data
>> >>> > >> > >> >>> messages
>> >>> > >> > >> >>> in
>> >>> > >> > >> >>> the future. This would be a breaking change, but
>> >>> > >> > >> >>> wouldn't
>> >>> > change
>> >>> > >> > >> >>> the
>> >>> > >> > >> >>> wire
>> >>> > >> > >> >>> format. I think the APIs could be changed backwards
>> >>> compatibly,
>> >>> > >> > >> >>> though.
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> > > The other question is how to handle the metadata
>> >>> > >> > >> >>> > > fields.
>> >>> So
>> >>> > >> > >> >>> > > far,
>> >>> > >> > >> >>> > > we've
>> >>> > >> > >> >>> > > used bytestring fields for application-defined
>> >>> > >> > >> >>> > > data.
>> >>> > >> > >> >>> > > This
>> >>> > is
>> >>> > >> > >> >>> > > workable
>> >>> > >> > >> >>> > > if you want to use Protobuf to define the contents
>> >>> > >> > >> >>> > > of
>> >>> those
>> >>> > >> > >> >>> > > fields,
>> >>> > >> > >> >>> > > but requires you to pack/unpack your Protobuf
>> >>> > >> > >> >>> > > into/from
>> >>> the
>> >>> > >> > >> >>> > > bytestring
>> >>> > >> > >> >>> > > field. If we instead used the Protobuf Any field, a
>> >>> > >> > >> >>> > > dynamically
>> >>> > >> > >> >>> > > typed
>> >>> > >> > >> >>> > > field, this would be more convenient, but then we'd
>> >>> > >> > >> >>> > > be
>> >>> > >> > >> >>> > > exposing
>> >>> > >> > >> >>> > > Protobuf types. We could alternatively use a
>> >>> > >> > >> >>> > > combination
>> >>> of
>> >>> > >> > >> >>> > > a
>> >>> > >> > >> >>> > > type
>> >>> > >> > >> >>> > > field and a bytestring field, mimicking what the
>> >>> > >> > >> >>> > > Protobuf
>> >>> > >> > >> >>> > > Any
>> >>> > >> > >> >>> > > type
>> >>> > >> > >> >>> > > looks like on the wire. I'm not sure this is
>> >>> > >> > >> >>> > > actually
>> >>> > cleaner
>> >>> > >> > >> >>> > > in
>> >>> > >> > >> any
>> >>> > >> > >> >>> > > of the language APIs, though.
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>> > Leaving the deserialization of the app metadata to
>> >>> > >> > >> >>> > the
>> >>> > >> > >> >>> > particular
>> >>> > >> > >> >>> > Flight implementation seems on first principles like
>> >>> > >> > >> >>> > the
>> >>> most
>> >>> > >> > >> flexible
>> >>> > >> > >> >>> > thing. If Any is used, does that mean the metadata
>> >>> > >> > >> >>> > _must_
>> >>> be
>> >>> > a
>> >>> > >> > >> >>> > protobuf?
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> If Any is used, we could still expose a bytes-based
>> >>> > >> > >> >>> API,
>> >>> > >> > >> >>> but
>> >>> it
>> >>> > >> > would
>> >>> > >> > >> >>> have
>> >>> > >> > >> >>> some more wrapping. (We could put a ByteString in Any.)
>> >>> > >> > >> >>> Then
>> >>> > the
>> >>> > >> > >> >>> question
>> >>> > >> > >> >>> would just be how to expose this (would be easier in
>> >>> > >> > >> >>> Java,
>> >>> > harder
>> >>> > >> > >> >>> in
>> >>> > >> > >> >>> C++).
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> > > David
>> >>> > >> > >> >>> > >
>> >>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org>
>> >>> > >> > >> >>> > > wrote:
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Can one of you explain what is being proposed in
>> >>> > >> > >> >>> > > > non-protobuf
>> >>> > >> > >> >>> > > > terms?
>> >>> > >> > >> >>> > > > Knowledge of protobuf shouldn't be required to
>> >>> > >> > >> >>> > > > use
>> >>> > Flight.
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Regards
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Antoine.
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
>> >>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding;
>> >>> > >> > >> >>> > > >> it
>> >>> > would
>> >>> > >> > just
>> >>> > >> > >> be
>> >>> > >> > >> >>> > > >> application-level logic. (The official guide
>> >>> > >> > >> >>> > > >> doesn't
>> >>> > even
>> >>> > >> > >> mention
>> >>> > >> > >> >>> > > >> it
>> >>> > >> > >> >>> > > >> in the encoding docs; I found
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> >
>> >>> > >> > >>
>> >>> > >> >
>> >>> >
>> >>> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> >>> > >> > >> >>> > > >> as well.)
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing
>> >>> > >> > >> >>> > > >> essentially
>> >>> > >> > >> >>> > > >> inlining
>> >>> > >> > >> >>> > > >> the definition of Any, e.g.
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> message FlightMessage {
>> >>> > >> > >> >>> > > >>   oneof message {
>> >>> > >> > >> >>> > > >>     FlightData data = 1;
>> >>> > >> > >> >>> > > >>     FlightAny metadata = 2;
>> >>> > >> > >> >>> > > >>   }
>> >>> > >> > >> >>> > > >> }
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> message FlightAny {
>> >>> > >> > >> >>> > > >>   string type = 1;
>> >>> > >> > >> >>> > > >>   bytes data = 2;
>> >>> > >> > >> >>> > > >> }
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> Is this correct?
>> >>> > >> > >> >>> > > >>
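To make the shape of that wrapper concrete without requiring Protobuf, here is a rough Python analogue. It is a sketch under assumptions: the class names mirror the messages quoted above, a plain bytes value stands in for FlightData, and the "exactly one field set" check only imitates oneof at the application level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlightAny:
    """Type tag plus opaque payload, mirroring the quoted FlightAny message."""
    type: str
    data: bytes


@dataclass(frozen=True)
class FlightMessage:
    """Either a data message or a control (metadata) message, never both."""
    data: Optional[bytes] = None           # stand-in for FlightData
    metadata: Optional[FlightAny] = None   # control message

    def __post_init__(self):
        # Imitate oneof: exactly one of the two fields must be set.
        if (self.data is None) == (self.metadata is None):
            raise ValueError("exactly one of data or metadata must be set")

    def which(self) -> str:
        """Return the name of the field that is set."""
        return "data" if self.data is not None else "metadata"


control = FlightMessage(metadata=FlightAny(type="example.Ack", data=b""))
print(control.which())
```

A receiver would branch on which() to separate control messages from data messages, which is the explicit distinction the wrapper is meant to provide over a FlightData with an empty body.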
>> >>> > >> > >> >>> > > >> It might be nice to consider the wrapper message
>> >>> > >> > >> >>> > > >> for
>> >>> > >> > >> >>> > > >> DoGet/DoPut
>> >>> > >> > >> >>> > > >> as
>> >>> > >> > >> >>> > > >> well, but at that point, I'd rather we be
>> >>> > >> > >> >>> > > >> consistent
>> >>> > with
>> >>> > >> > >> >>> > > >> all
>> >>> > >> > >> >>> > > >> of
>> >>> > >> > >> >>> > > >> them,
>> >>> > >> > >> >>> > > >> rather than have one of the three methods do its
>> >>> > >> > >> >>> > > >> own
>> >>> > >> > >> >>> > > >> thing.
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> Thanks,
>> >>> > >> > >> >>> > > >> David
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org>
>> >>> wrote:
>> >>> > >> > >> >>> > > >>> I think we could probably expose the oneof
>> >>> > >> > >> >>> > > >>> behavior
>> >>> > >> > >> >>> > > >>> without
>> >>> > >> > >> >>> > > >>> exposing
>> >>> > >> > >> >>> > the
>> >>> > >> > >> >>> > > >>> protobuf functions. On the any... hmm. I guess
>> >>> > >> > >> >>> > > >>> we
>> >>> could
>> >>> > >> > >> >>> > > >>> expose
>> >>> > >> > >> >>> > > >>> as
>> >>> > >> > >> >>> > > >>> two
>> >>> > >> > >> >>> > > >>> fields: type and data. Then users could use it
>> >>> > >> > >> >>> > > >>> for
>> >>> > >> > >> >>> > > >>> whatever
>> >>> > >> > >> >>> > > >>> but
>> >>> > >> > >> >>> > > >>> if
>> >>> > >> > >> >>> > > >>> people
>> >>> > >> > >> >>> > > >>> wanted to treat it as any, it would work.
>> >>> > >> > >> >>> > > >>> (Basically
>> >>> a
>> >>> > >> > >> >>> > > >>> user
>> >>> > >> > >> >>> > > >>> could
>> >>> > >> > >> >>> > > >>> use
>> >>> > >> > >> >>> > > >>> any
>> >>> > >> > >> >>> > > >>> with it easily but they could also use any
>> >>> > >> > >> >>> > > >>> other
>> >>> > >> > >> >>> > > >>> mechanism).
>> >>> > >> > >> >>> > > >>> At
>> >>> > >> > >> >>> > least in
>> >>> > >> > >> >>> > > >>> java, the any concepts are pretty simple/diy.
>> >>> > >> > >> >>> > > >>> Are
>> >>> other
>> >>> > >> > >> language
>> >>> > >> > >> >>> > > >>> bindings
>> >>> > >> > >> >>> > > >>> less diy?
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData
>> >>> > >> > >> >>> > > >>> +
>> >>> > >> > >> >>> > > >>> metadata
>> >>> > >> > >> >>> > > >>> but
>> >>> > >> > >> >>> > > >>> it
>> >>> > >> > >> >>> > just
>> >>> > >> > >> >>> > > >>> seemed a bit janky.
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>> Thinking about the control message/wrapper
>> >>> > >> > >> >>> > > >>> object
>> >>> > thing,
>> >>> > >> > >> >>> > > >>> I
>> >>> > >> > >> >>> > > >>> wonder
>> >>> > >> > >> >>> > > >>> if
>> >>> > >> > >> >>> > we
>> >>> > >> > >> >>> > > >>> should redefine DoPut and DoGet to have the
>> >>> > >> > >> >>> > > >>> same
>> >>> > property
>> >>> > >> > >> >>> > > >>> if
>> >>> > >> > >> >>> > > >>> we
>> >>> > >> > >> >>> > think it
>> >>> > >> > >> >>> > > >>> is
>> >>> > >> > >> >>> > > >>> a good idea...
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
>> >>> > >> > >> li.davidm96@gmail.com>
>> >>> > >> > >> >>> > wrote:
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>>> I was definitely considering having control
>> >>> > >> > >> >>> > > >>>> messages
>> >>> > >> > without
>> >>> > >> > >> >>> > > >>>> data,
>> >>> > >> > >> >>> > and
>> >>> > >> > >> >>> > > >>>> I thought that could be encoded by a
>> >>> > >> > >> >>> > > >>>> FlightData
>> >>> > >> > >> >>> > > >>>> with
>> >>> > >> > >> >>> > > >>>> only
>> >>> > >> > >> >>> > app_metadata
>> >>> > >> > >> >>> > > >>>> set. I think I understand your position now:
>> >>> > FlightData
>> >>> > >> > >> >>> > > >>>> should
>> >>> > >> > >> >>> > always
>> >>> > >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with the
>> >>> > >> > >> >>> > > >>>> documentation on FlightData in the Protobuf file. I was
>> >>> > >> > >> >>> > > >>>> worried about having a redundant metadata field, but oneof
>> >>> > >> > >> >>> > > >>>> prevents that from happening, and overall having a clear
>> >>> > >> > >> >>> > > >>>> separation between data and control messages is cleaner.
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
>> >>> > >> > >> >>> > > >>>> exposing Protobuf by using bytes; would we want to change
>> >>> > >> > >> >>> > > >>>> that now?
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> Best,
>> >>> > >> > >> >>> > > >>>> David
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
>> >>> > >> > >> >>> > > >>>>> Hey David,
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
>> >>> > >> > >> >>> > > >>>>> DoGet/DoPut for async. Yes, I was thinking more of Java,
>> >>> > >> > >> >>> > > >>>>> given Java gRPC's async-always pattern.
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think it is
>> >>> > >> > >> >>> > > >>>>> overloading the message to use metadata for this. If I
>> >>> > >> > >> >>> > > >>>>> want to send a control message independently of the data
>> >>> > >> > >> >>> > > >>>>> message, I would have to define something like an empty
>> >>> > >> > >> >>> > > >>>>> flight data message that has custom metadata. Why not
>> >>> > >> > >> >>> > > >>>>> support a container object with a oneof{FlightData, Any}
>> >>> > >> > >> >>> > > >>>>> in it instead so users can add more data as desired? The
>> >>> > >> > >> >>> > > >>>>> default impl could be a noop for the Any messages.
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com>
>> >>> > >> > >> >>> > > >>>>> wrote:
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>>> Hi Jacques,
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> Thanks for the comments.
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
>> >>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of
>> >>> > >> > >> >>> > > >>>>>> prior proposals, so I don't think we need a new message
>> >>> > >> > >> >>> > > >>>>>> to carry that kind of information.
>> >>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle
>> >>> > >> > >> >>> > > >>>>>> incoming messages as the fundamental API; it would
>> >>> > >> > >> >>> > > >>>>>> actually be quite natural to implement in Flight/Java. I
>> >>> > >> > >> >>> > > >>>>>> will note that it's not possible in C++/Python without
>> >>> > >> > >> >>> > > >>>>>> spawning a thread, though. (In essence, gRPC-Java is
>> >>> > >> > >> >>> > > >>>>>> async-always and gRPC-C++ is sync-always.) There are
>> >>> > >> > >> >>> > > >>>>>> experimental C++ APIs that would let us do something
>> >>> > >> > >> >>> > > >>>>>> similar to Java, but those are only in relatively recent
>> >>> > >> > >> >>> > > >>>>>> gRPC versions and are still under development (contrary
>> >>> > >> > >> >>> > > >>>>>> to the interceptor APIs which have been around for quite
>> >>> > >> > >> >>> > > >>>>>> a while).
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> Thanks,
>> >>> > >> > >> >>> > > >>>>>> David
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
>> >>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
>> >>> > >> > >> >>> > > >>>>>>> discussing here depending on your thoughts.
>> >>> > >> > >> >>> > > >>>>>>>
>> >>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com>
>> >>> > >> > >> >>> > > >>>>>>> wrote:
>> >>> > >> > >> >>> > > >>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
>> >>> > >> > >> >>> > > >>>>>>>> strawman.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
>> >>> > >> > >> >>> > > >>>>>>>> interleave uploads and downloads if you were so inclined.
>> >>> > >> > >> >>> > > >>>>>>>> Right now, synchronous APIs make this error-prone, e.g.
>> >>> > >> > >> >>> > > >>>>>>>> if both client and server wait for each other due to an
>> >>> > >> > >> >>> > > >>>>>>>> application logic bug. (gRPC doesn't give us the ability
>> >>> > >> > >> >>> > > >>>>>>>> to have per-read timeouts, only an overall timeout.) As
>> >>> > >> > >> >>> > > >>>>>>>> an example of this happening with DoPut, see ARROW-6063:
>> >>> > >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> This is mostly tangential, though; eventually we will
>> >>> > >> > >> >>> > > >>>>>>>> want to design asynchronous APIs for Flight as a whole. A
>> >>> > >> > >> >>> > > >>>>>>>> bidirectional stream like this (and like DoPut) just
>> >>> > >> > >> >>> > > >>>>>>>> makes these pitfalls easier to run into.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but
>> >>> > >> > >> >>> > > >>>>>>>> the main concern is that depending on how you deploy, two
>> >>> > >> > >> >>> > > >>>>>>>> separate calls could get routed to different instances.
>> >>> > >> > >> >>> > > >>>>>>>> Additionally, gRPC has some reconnection behaviors; if
>> >>> > >> > >> >>> > > >>>>>>>> the server goes away in between the two calls, but it
>> >>> > >> > >> >>> > > >>>>>>>> then restarts or there is another instance available, the
>> >>> > >> > >> >>> > > >>>>>>>> client will happily reconnect to the new server without
>> >>> > >> > >> >>> > > >>>>>>>> warning.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Thanks,
>> >>> > >> > >> >>> > > >>>>>>>> David
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
>> >>> > >> > >> >>> > > >>>>>>>>> Hey David,
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it
>> >>> > >> > >> >>> > > >>>>>>>>> and the possibility of remote compute via Arrow buffers.
>> >>> > >> > >> >>> > > >>>>>>>>> One thing that would help me would be a concrete example
>> >>> > >> > >> >>> > > >>>>>>>>> of the API in a real-life use case. Also, what would the
>> >>> > >> > >> >>> > > >>>>>>>>> client experience be in terms of sync vs async? Would
>> >>> > >> > >> >>> > > >>>>>>>>> the client block till the bidirectional call returns,
>> >>> > >> > >> >>> > > >>>>>>>>> i.e. c = flight.vector_mult(a, b), or would the client
>> >>> > >> > >> >>> > > >>>>>>>>> wait to be signaled that computation was done? If the
>> >>> > >> > >> >>> > > >>>>>>>>> latter, how is that different from a DoPut then DoGet? I
>> >>> > >> > >> >>> > > >>>>>>>>> suppose that this could be implemented without extending
>> >>> > >> > >> >>> > > >>>>>>>>> the RPC interface but rather by a function/util?
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> Best,
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> Ryan
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li
>> >>> > >> > >> >>> > > >>>>>>>>> <li.davidm96@gmail.com> wrote:
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> Hi all,
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but
>> >>> > >> > >> >>> > > >>>>>>>>>> we have identified a new use case on the horizon: being
>> >>> > >> > >> >>> > > >>>>>>>>>> able to both send and retrieve Arrow data within a
>> >>> > >> > >> >>> > > >>>>>>>>>> single RPC call. To that end, I've written up a
>> >>> > >> > >> >>> > > >>>>>>>>>> proposal for a new RPC method:
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the
>> >>> > >> > >> >>> > > >>>>>>>>>> document. I'd appreciate any feedback; I think this is
>> >>> > >> > >> >>> > > >>>>>>>>>> a relatively straightforward addition - it is
>> >>> > >> > >> >>> > > >>>>>>>>>> essentially "DoPutThenGet".
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've
>> >>> > >> > >> >>> > > >>>>>>>>>> decided to table the other format change I had proposed
>> >>> > >> > >> >>> > > >>>>>>>>>> (on DoPut), as it doesn't functionally change Flight,
>> >>> > >> > >> >>> > > >>>>>>>>>> just the interpretation of the semantics.
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> Thanks,
>> >>> > >> > >> >>> > > >>>>>>>>>> David
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>
>> >>> > >> > >> >
>> >>> > >> > >>
>> >>> > >> > >
>> >>> > >> >
>> >>> > >
>> >>> >
>> >>>
>> >>
>> >
>

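As an aside for readers of the archive, the wrapper idea debated in the quoted
messages above (a container holding either a data message or a control message,
with control messages ignored by default) can be sketched in plain Python. This
is only an illustration: DataMessage, ControlMessage, and handle_stream are
hypothetical names, not Flight or Protobuf APIs, and the later messages in the
thread lean toward plain bytes metadata rather than a wrapper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Union


@dataclass
class DataMessage:
    """Stand-in for a FlightData payload (hypothetical, not the Flight type)."""
    body: bytes


@dataclass
class ControlMessage:
    """Stand-in for an application-defined control message (type tag + bytes)."""
    type: str
    data: bytes


# Exactly one variant per message, similar in spirit to a Protobuf oneof.
ExchangeMessage = Union[DataMessage, ControlMessage]


def handle_stream(
    messages: Iterable[ExchangeMessage],
    on_data: Callable[[DataMessage], None],
    on_control: Callable[[ControlMessage], None] = lambda msg: None,
) -> None:
    """Dispatch each message by variant; control messages default to a no-op."""
    for msg in messages:
        if isinstance(msg, DataMessage):
            on_data(msg)
        elif isinstance(msg, ControlMessage):
            on_control(msg)
        else:
            raise TypeError(f"unexpected message variant: {type(msg).__name__}")


received = []
control_seen = []
stream = [
    DataMessage(b"batch-0"),
    ControlMessage("flush", b""),
    DataMessage(b"batch-1"),
]
handle_stream(stream, received.append, control_seen.append)
```

Calling handle_stream without an on_control callback simply skips the control
message, which is the "default impl could be a noop" behavior suggested in the
thread.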

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
hi David,

I did a preliminary review and things look to be on the right track.
What do you think about breaking out the protocol changes (and
adding appropriate comments) so we can have a vote on that in
relatively short order?

- Wes

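A rough illustration, not taken from the thread, of the bytes-based option
discussed in the quoted messages below: the application packs its own type tag
and payload into a single opaque bytes value, loosely mirroring the
type-plus-data layout of Protobuf's Any. The function names, the
length-prefixed layout, and the "example.Progress" tag are all assumptions made
for this sketch; Flight itself would only see the resulting bytes.

```python
import json
import struct


def pack_metadata(type_name: str, payload: dict) -> bytes:
    """Encode a type tag plus a JSON payload into one opaque bytes value."""
    tag = type_name.encode("utf-8")
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    # Assumed layout for this sketch: 4-byte big-endian tag length, tag, body.
    return struct.pack(">I", len(tag)) + tag + body


def unpack_metadata(blob: bytes):
    """Reverse pack_metadata, returning (type_name, payload)."""
    (tag_len,) = struct.unpack_from(">I", blob, 0)
    tag = blob[4:4 + tag_len].decode("utf-8")
    payload = json.loads(blob[4 + tag_len:].decode("utf-8"))
    return tag, payload


# The sender would place `blob` in an application metadata field;
# the receiver decodes it with application-level logic.
blob = pack_metadata("example.Progress", {"rows_written": 1024})
tag, payload = unpack_metadata(blob)
```

The receiver can switch on the tag to pick a decoder, so the same field could
carry JSON, Protobuf, or anything else without the transport knowing about it.
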
On Wed, Mar 18, 2020 at 9:06 AM David Li <li...@gmail.com> wrote:
>
> Following up here, I've submitted a draft implementation for C++:
> https://github.com/apache/arrow/pull/6656
>
> The core functionality is there, but there are still gaps that I need
> to fill in. Compared to the draft spec, the client also sends a
> FlightDescriptor at the start of the call, though it's currently not
> exposed. This is consistent with DoGet/DoPut, which also begin by
> sending a message that describes the stream to the server.
>
> Andy, I hope this helps clarify whether it meets your needs.
>
> Best,
> David
>
> On 2/25/20, David Li <li...@gmail.com> wrote:
> > Hey Andy,
> >
> > I've been rather busy unfortunately. I had started on an
> > implementation in C++ to provide as part of this discussion, but it's
> > not complete. I'm hoping to have more done in March.
> >
> > Best,
> > David
> >
> > On 2/25/20, Andy Grove <an...@gmail.com> wrote:
> >> I was wondering if there had been any momentum on this (the
> >> bidirectional RPC design)?
> >>
> >> I'm interested in this for the use case of Apache Spark sending a
> >> stream of data to another process to invoke custom code and then
> >> receiving a stream back with the transformed data.
> >>
> >> Thanks,
> >>
> >> Andy.
> >>
> >>
> >>
> >> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org>
> >> wrote:
> >>
> >>> I support moving forward with the current proposal.
> >>>
> >>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:
> >>>
> >>> > Just following up here again, any other thoughts?
> >>> >
> >>> > I think we do have justifications for potentially separate streams in
> >>> > a call, but that's more of an orthogonal question - it doesn't need to
> >>> > be addressed here. I do agree that it very much complicates things.
> >>> >
> >>> > Thanks,
> >>> > David
> >>> >
> >>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> >>> > > I would generally agree with this. Note that you have the
> >>> > > possibility to use unions-of-structs to send record batches with
> >>> > > different schemas in the same stream, though with some added
> >>> > > complexity on each side.
> >>> > >
> >>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org>
> >>> > wrote:
> >>> > >>
> >>> > >> I'd vote for explicitly not supported. We should keep our primitives
> >>> > >> narrow.
> >>> > >>
> >>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com>
> >>> > >> wrote:
> >>> > >>
> >>> > >> > Thanks for the feedback.
> >>> > >> >
> >>> > >> > I do think if we had explicitly embraced gRPC from the beginning,
> >>> > >> > there are a lot of places where things could be made more
> >>> > >> > ergonomic, including with the metadata fields. But it would also
> >>> > >> > have locked us out of potential future transports.
> >>> > >> >
> >>> > >> > On another note: I hesitate to put too much into this method, but
> >>> > >> > we are looking at use cases where a client may want to upload
> >>> > >> > multiple distinct datasets (with differing schemas). (This is a
> >>> > >> > little tentative, and I can get more details...) Right now, each
> >>> > >> > logical stream in Flight must have a single, consistent schema;
> >>> > >> > would it make sense to look at ways to relax this, or declare this
> >>> > >> > explicitly out of scope (and require multiple calls and
> >>> > >> > coordination with the deployment topology) in order to accomplish
> >>> > >> > this?
> >>> > >> >
> >>> > >> > Best,
> >>> > >> > David
> >>> > >> >
> >>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > >> > > Fair enough. I'm okay with the bytes approach and the proposal
> >>> > >> > > looks good to me.
> >>> > >> > >
> >>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li
> >>> > >> > > <li...@gmail.com>
> >>> > >> > > wrote:
> >>> > >> > >
> >>> > >> > >> I've updated the proposal.
> >>> > >> > >>
> >>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> >>> > >> > >> errors/metadata, I still think using bytes is preferable:
> >>> > >> > >> - It doesn't require (conditionally) exposing or wrapping
> >>> > >> > >> Protobuf types,
> >>> > >> > >> - We wouldn't be able to practically expose the Protobuf field
> >>> > >> > >> to C++ users without causing build pains,
> >>> > >> > >> - We can't let Python users take advantage of the Protobuf
> >>> > >> > >> field without somehow being compatible with the Protobuf wheels
> >>> > >> > >> (by linking to the same version, and doing magic to turn the
> >>> > >> > >> C++ Protobufs into the Python ones),
> >>> > >> > >> - All our other application-defined fields are already bytes.
> >>> > >> > >>
> >>> > >> > >> Applications that want structure can encode JSON or Protobuf
> >>> > >> > >> Any into the bytes field themselves, much as you can already do
> >>> > >> > >> for Ticket, commands in FlightDescriptors, and application
> >>> > >> > >> metadata in DoGet/DoPut. I don't think this is (much) less
> >>> > >> > >> efficient than using Any directly, since Any itself is a bytes
> >>> > >> > >> field with a tag, and you must invoke the Protobuf deserializer
> >>> > >> > >> again to read the actual message.
> >>> > >> > >>
> >>> > >> > >> If we decide on using bytes, then I don't think it makes sense
> >>> > >> > >> to define a new message with a oneof either, since it would be
> >>> > >> > >> redundant.
> >>> > >> > >>
> >>> > >> > >> Thanks,
> >>> > >> > >> David
> >>> > >> > >>
> >>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> >>> > >> > >> > I've been extremely backlogged; I will update the proposal
> >>> > >> > >> > when I get a chance and reply here when done.
> >>> > >> > >> >
> >>> > >> > >> > Best,
> >>> > >> > >> > David
> >>> > >> > >> >
> >>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >>> > >> > >> >> Bumping this discussion since a couple of weeks have passed.
> >>> > >> > >> >> It seems there are still some questions here; could we
> >>> > >> > >> >> summarize what the alternatives are, along with any public
> >>> > >> > >> >> API implications, so we can try to render a decision?
> >>> > >> > >> >>
> >>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.davidm96@gmail.com>
> >>> > >> > >> >> wrote:
> >>> > >> > >> >>>
> >>> > >> > >> >>> Hi Wes,
> >>> > >> > >> >>>
> >>> > >> > >> >>> Responses inline:
> >>> > >> > >> >>>
> >>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmckinn@gmail.com>
> >>> > >> > >> >>> wrote:
> >>> > >> > >> >>>
> >>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com>
> >>> > >> > >> >>> > wrote:
> >>> > >> > >> >>> > >
> >>> > >> > >> >>> > > The question is whether to repurpose the existing FlightData
> >>> > >> > >> >>> > > structure, and allow for the metadata field to be filled in
> >>> > >> > >> >>> > > and data fields to be blank (as a control message), or to
> >>> > >> > >> >>> > > wrap the FlightData structure in another structure that
> >>> > >> > >> >>> > > explicitly distinguishes between control and data messages.
> >>> > >> > >> >>> >
> >>> > >> > >> >>> > I'm not super against having metadata-only FlightData with
> >>> > >> > >> >>> > empty body. One question to consider is what changes (if any)
> >>> > >> > >> >>> > would need to be made to public APIs in either scenario.
> >>> > >> > >> >>> >
> >>> > >> > >> >>>
> >>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
> >>> > >> > >> >>> messages in the future. This would be a breaking change, but
> >>> > >> > >> >>> wouldn't change the wire format. I think the APIs could be
> >>> > >> > >> >>> changed backwards compatibly, though.
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>> > > The other question is how to handle the metadata fields. So
> >>> > >> > >> >>> > > far, we've used bytestring fields for application-defined
> >>> > >> > >> >>> > > data. This is workable if you want to use Protobuf to define
> >>> > >> > >> >>> > > the contents of those fields, but requires you to pack/unpack
> >>> > >> > >> >>> > > your Protobuf into/from the bytestring field. If we instead
> >>> > >> > >> >>> > > used the Protobuf Any field, a dynamically typed field, this
> >>> > >> > >> >>> > > would be more convenient, but then we'd be exposing Protobuf
> >>> > >> > >> >>> > > types. We could alternatively use a combination of a type
> >>> > >> > >> >>> > > field and a bytestring field, mimicking what the Protobuf Any
> >>> > >> > >> >>> > > type looks like on the wire. I'm not sure this is actually
> >>> > >> > >> >>> > > cleaner in any of the language APIs, though.
> >>> > >> > >> >>> >
> >>> > >> > >> >>> > Leaving the deserialization of the app metadata to the
> >>> > >> > >> >>> > particular Flight implementation seems on first principles
> >>> > >> > >> >>> > like the most flexible thing. If Any is used, does that mean
> >>> > >> > >> >>> > the metadata _must_ be a protobuf?
> >>> > >> > >> >>> >
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>> If Any is used, we could still expose a bytes-based API, but it
> >>> > >> > >> >>> would have some more wrapping. (We could put a ByteString in
> >>> > >> > >> >>> Any.) Then the question would just be how to expose this (would
> >>> > >> > >> >>> be easier in Java, harder in C++).
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>> > > David
> >>> > >> > >> >>> > >
> >>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org>
> >>> > >> > >> >>> > > wrote:
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > > Can one of you explain what is being proposed in
> >>> > >> > >> >>> > > > non-protobuf terms? Knowledge of protobuf shouldn't be
> >>> > >> > >> >>> > > > required to use Flight.
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > > Regards
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > > Antoine.
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
> >>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would
> >>> > >> > >> >>> > > >> just be application-level logic. (The official guide doesn't
> >>> > >> > >> >>> > > >> even mention it in the encoding docs; I found
> >>> > >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >>> > >> > >> >>> > > >> as well.)
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially
> >>> > >> > >> >>> > > >> inlining the definition of Any, e.g.
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> message FlightMessage {
> >>> > >> > >> >>> > > >>   oneof message {
> >>> > >> > >> >>> > > >>     FlightData data = 1;
> >>> > >> > >> >>> > > >>     FlightAny metadata = 2;
> >>> > >> > >> >>> > > >>   }
> >>> > >> > >> >>> > > >> }
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> message FlightAny {
> >>> > >> > >> >>> > > >>   string type = 1;
> >>> > >> > >> >>> > > >>   bytes data = 2;
> >>> > >> > >> >>> > > >> }
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> Is this correct?
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> It might be nice to consider the wrapper message for
> >>> > >> > >> >>> > > >> DoGet/DoPut as well, but at that point, I'd rather we be
> >>> > >> > >> >>> > > >> consistent with all of them, rather than have one of the
> >>> > >> > >> >>> > > >> three methods do its own thing.
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> Thanks,
> >>> > >> > >> >>> > > >> David
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > >> > >> >>> > > >>> I think we could probably expose the oneof behavior without
> >>> > >> > >> >>> > > >>> exposing the protobuf functions. On the Any... hmm. I guess
> >>> > >> > >> >>> > > >>> we could expose it as two fields: type and data. Then users
> >>> > >> > >> >>> > > >>> could use it for whatever, but if people wanted to treat it
> >>> > >> > >> >>> > > >>> as Any, it would work. (Basically a user could use Any with
> >>> > >> > >> >>> > > >>> it easily but they could also use any other mechanism.) At
> >>> > >> > >> >>> > > >>> least in Java, the Any concepts are pretty simple/DIY. Are
> >>> > >> > >> >>> > > >>> other language bindings less DIY?
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData +
> >>> > >> > >> >>> > > >>> metadata
> >>> > >> > >> >>> > > >>> but
> >>> > >> > >> >>> > > >>> it
> >>> > >> > >> >>> > just
> >>> > >> > >> >>> > > >>> seemed a bit janky.
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >>> Thinking about the control message/wrapper object
> >>> > thing,
> >>> > >> > >> >>> > > >>> I
> >>> > >> > >> >>> > > >>> wonder
> >>> > >> > >> >>> > > >>> if
> >>> > >> > >> >>> > we
> >>> > >> > >> >>> > > >>> should redefine DoPut and DoGet to have the same
> >>> > property
> >>> > >> > >> >>> > > >>> if
> >>> > >> > >> >>> > > >>> we
> >>> > >> > >> >>> > think it
> >>> > >> > >> >>> > > >>> is
> >>> > >> > >> >>> > > >>> a good idea...
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
> >>> > >> > >> li.davidm96@gmail.com>
> >>> > >> > >> >>> > wrote:
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >>>> I was definitely considering having control
> >>> > >> > >> >>> > > >>>> messages
> >>> > >> > without
> >>> > >> > >> >>> > > >>>> data,
> >>> > >> > >> >>> > and
> >>> > >> > >> >>> > > >>>> I thought that could be encoded by a FlightData
> >>> > >> > >> >>> > > >>>> with
> >>> > >> > >> >>> > > >>>> only
> >>> > >> > >> >>> > app_metadata
> >>> > >> > >> >>> > > >>>> set. I think I understand your position now:
> >>> > FlightData
> >>> > >> > >> >>> > > >>>> should
> >>> > >> > >> >>> > always
> >>> > >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with
> >>> > >> > >> >>> > > >>>> the
> >>> > >> > >> >>> > > >>>> documentation
> >>> > >> > >> >>> > > >>>> on
> >>> > >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried
> >>> > >> > >> >>> > > >>>> about
> >>> > >> > >> >>> > > >>>> having
> >>> > >> > >> >>> > > >>>> a
> >>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that
> >>> from
> >>> > >> > >> >>> > > >>>> happening,
> >>> > >> > >> >>> > and
> >>> > >> > >> >>> > > >>>> overall having a clear separation between data
> >>> > >> > >> >>> > > >>>> and
> >>> > >> > >> >>> > > >>>> control
> >>> > >> > >> >>> > > >>>> messages
> >>> > >> > >> >>> > is
> >>> > >> > >> >>> > > >>>> cleaner.
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've
> >>> > >> > >> >>> > > >>>> refrained
> >>> > >> > >> >>> > > >>>> from
> >>> > >> > >> >>> > > >>>> exposing
> >>> > >> > >> >>> > > >>>> Protobuf by using bytes, would we want to change
> >>> that
> >>> > >> > >> >>> > > >>>> now?
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>> Best,
> >>> > >> > >> >>> > > >>>> David
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org>
> >>> > wrote:
> >>> > >> > >> >>> > > >>>>> Hey David,
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we
> >>> > >> > >> >>> > > >>>>> use
> >>> > >> > >> >>> > > >>>>> for
> >>> > >> > >> >>> > > >>>>> doget/doput
> >>> > >> > >> >>> > > >>>>> for
> >>> > >> > >> >>> > > >>>>> async. Yes, more thinking java given java grpc's
> >>> > async
> >>> > >> > >> >>> > > >>>>> always
> >>> > >> > >> >>> > pattern.
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think it
> >>> > >> > >> >>> > > >>>>> is
> >>> > >> > >> >>> > > >>>>> overloading
> >>> > >> > >> >>> > > >>>>> the
> >>> > >> > >> >>> > > >>>> message
> >>> > >> > >> >>> > > >>>>> to use metadata for this. If I want to send a
> >>> control
> >>> > >> > >> >>> > > >>>>> message
> >>> > >> > >> >>> > > >>>> independently
> >>> > >> > >> >>> > > >>>>> of the data message, I would have to define
> >>> something
> >>> > >> > >> >>> > > >>>>> like
> >>> > >> > >> >>> > > >>>>> an
> >>> > >> > >> >>> > > >>>>> empty
> >>> > >> > >> >>> > > >>>> flight
> >>> > >> > >> >>> > > >>>>> data message that has custom metadata. Why not
> >>> > support
> >>> > >> > >> >>> > > >>>>> a
> >>> > >> > >> >>> > > >>>>> container
> >>> > >> > >> >>> > > >>>>> object
> >>> > >> > >> >>> > > >>>>> with a oneof{FlightData, Any} in it instead so
> >>> users
> >>> > >> > >> >>> > > >>>>> can
> >>> > >> > >> >>> > > >>>>> add
> >>> > >> > >> >>> > > >>>>> more
> >>> > >> > >> >>> > data
> >>> > >> > >> >>> > > >>>>> as
> >>> > >> > >> >>> > > >>>>> desired. The default impl could be a noop for
> >>> > >> > >> >>> > > >>>>> the
> >>> Any
> >>> > >> > >> >>> > > >>>>> messages.
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
> >>> > >> > >> >>> > > >>>>> <li...@gmail.com>
> >>> > >> > >> >>> > > >>>>> wrote:
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>>> Hi Jacques,
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>> Thanks for the comments.
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a
> >>> > >> > >> >>> > > >>>>>>   result of prior proposals, so I don't think
> >>> > >> > >> >>> > > >>>>>>   we need a new message to carry that kind of
> >>> > >> > >> >>> > > >>>>>>   information.
> >>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to
> >>> > >> > >> >>> > > >>>>>>   handle incoming messages as the fundamental
> >>> > >> > >> >>> > > >>>>>>   API; it would actually be quite natural to
> >>> > >> > >> >>> > > >>>>>>   implement in Flight/Java. I will note that
> >>> > >> > >> >>> > > >>>>>>   it's not possible in C++/Python without
> >>> > >> > >> >>> > > >>>>>>   spawning a thread, though. (In essence,
> >>> > >> > >> >>> > > >>>>>>   gRPC-Java is async-always and gRPC-C++ is
> >>> > >> > >> >>> > > >>>>>>   sync-always.) There are experimental C++ APIs
> >>> > >> > >> >>> > > >>>>>>   that would let us do something similar to
> >>> > >> > >> >>> > > >>>>>>   Java, but those are only in relatively recent
> >>> > >> > >> >>> > > >>>>>>   gRPC versions and are still under development
> >>> > >> > >> >>> > > >>>>>>   (contrary to the interceptor APIs, which have
> >>> > >> > >> >>> > > >>>>>>   been around for quite a while).
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>> Thanks,
> >>> > >> > >> >>> > > >>>>>> David
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau
> >>> > >> > >> >>> > > >>>>>> <ja...@apache.org>
> >>> > >> > >> >>> > > >>>>>> wrote:
> >>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc.
> >>> > >> > >> >>> > > >>>>>>> Might be worth discussing here, depending on
> >>> > >> > >> >>> > > >>>>>>> your thoughts.
> >>> > >> > >> >>> > > >>>>>>>
> >>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
> >>> > >> > >> >>> > > >>>>>>> <li...@gmail.com>
> >>> > >> > >> >>> > > >>>> wrote:
> >>> > >> > >> >>> > > >>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to
> >>> provide a
> >>> > >> > >> >>> > > >>>>>>>> Python
> >>> > >> > >> >>> > strawman.
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you
> >>> > could
> >>> > >> > >> >>> > > >>>>>>>> interleave
> >>> > >> > >> >>> > > >>>> uploads
> >>> > >> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right
> >>> now,
> >>> > >> > >> >>> > > >>>>>>>> synchronous
> >>> > >> > >> >>> > APIs
> >>> > >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client
> >>> > >> > >> >>> > > >>>>>>>> and
> >>> > >> > >> >>> > > >>>>>>>> server
> >>> > >> > >> >>> > > >>>>>>>> wait
> >>> > >> > >> >>> > > >>>>>>>> for
> >>> > >> > >> >>> > each
> >>> > >> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC
> >>> > >> > >> >>> > > >>>>>>>> doesn't
> >>> > >> > >> >>> > > >>>>>>>> give
> >>> > >> > >> >>> > > >>>>>>>> us
> >>> > >> > >> >>> > > >>>>>>>> the
> >>> > >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an
> >>> overall
> >>> > >> > >> >>> > > >>>>>>>> timeout.)
> >>> > >> > >> >>> > > >>>>>>>> As
> >>> > >> > >> >>> > an
> >>> > >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see
> >>> > >> > >> >>> > > >>>>>>>> ARROW-6063:
> >>> > >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> This is mostly tangential though, eventually
> >>> > >> > >> >>> > > >>>>>>>> we
> >>> > >> > >> >>> > > >>>>>>>> will
> >>> > >> > >> >>> > > >>>>>>>> want
> >>> > >> > >> >>> > > >>>>>>>> to
> >>> > >> > >> >>> > design
> >>> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A
> >>> > >> > bidirectional
> >>> > >> > >> >>> > > >>>>>>>> stream
> >>> > >> > >> >>> > > >>>>>>>> like
> >>> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these
> >>> > >> > >> >>> > > >>>>>>>> pitfalls
> >>> > >> > >> >>> > > >>>>>>>> easier
> >>> > >> > >> >>> > > >>>>>>>> to
> >>> > >> > >> >>> > > >>>>>>>> run
> >>> > >> > >> >>> > into.
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the
> >>> > >> > >> >>> > > >>>>>>>> proposal,
> >>> > >> > but
> >>> > >> > >> >>> > > >>>>>>>> the
> >>> > >> > >> >>> > main
> >>> > >> > >> >>> > > >>>>>>>> concern is that depending on how you deploy,
> >>> > >> > >> >>> > > >>>>>>>> two
> >>> > >> > >> >>> > > >>>>>>>> separate
> >>> > >> > >> >>> > > >>>>>>>> calls
> >>> > >> > >> >>> > > >>>>>>>> could
> >>> > >> > >> >>> > > >>>>>>>> get routed to different instances.
> >>> > >> > >> >>> > > >>>>>>>> Additionally,
> >>> > >> > >> >>> > > >>>>>>>> gRPC
> >>> > >> > >> >>> > > >>>>>>>> has
> >>> > >> > >> >>> > > >>>>>>>> some
> >>> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes
> >>> > >> > >> >>> > > >>>>>>>> away
> >>> in
> >>> > >> > >> >>> > > >>>>>>>> between
> >>> > >> > >> >>> > > >>>>>>>> the
> >>> > >> > >> >>> > two
> >>> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is
> >>> > >> > >> >>> > > >>>>>>>> another
> >>> > >> > instance
> >>> > >> > >> >>> > available,
> >>> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the new
> >>> > server
> >>> > >> > >> without
> >>> > >> > >> >>> > > >>>>>>>> warning.
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Thanks,
> >>> > >> > >> >>> > > >>>>>>>> David
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com>
> >>> > wrote:
> >>> > >> > >> >>> > > >>>>>>>>> Hey David,
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense.
> >>> > >> > >> >>> > > >>>>>>>>> I like it and the possibility of remote
> >>> > >> > >> >>> > > >>>>>>>>> compute via Arrow buffers. One thing that
> >>> > >> > >> >>> > > >>>>>>>>> would help me would be a concrete example of
> >>> > >> > >> >>> > > >>>>>>>>> the API in a real-life use case. Also, what
> >>> > >> > >> >>> > > >>>>>>>>> would the client experience be in terms of
> >>> > >> > >> >>> > > >>>>>>>>> sync vs async? Would the client block until
> >>> > >> > >> >>> > > >>>>>>>>> the bidirectional call returns, i.e.
> >>> > >> > >> >>> > > >>>>>>>>> c = flight.vector_mult(a, b), or would the
> >>> > >> > >> >>> > > >>>>>>>>> client wait to be signaled that computation
> >>> > >> > >> >>> > > >>>>>>>>> was done? If the latter, how is that
> >>> > >> > >> >>> > > >>>>>>>>> different from a DoPut then DoGet? I suppose
> >>> > >> > >> >>> > > >>>>>>>>> that this could be implemented without
> >>> > >> > >> >>> > > >>>>>>>>> extending the RPC interface, but rather by a
> >>> > >> > >> >>> > > >>>>>>>>> function/util?
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> Best,
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> Ryan
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
> >>> > >> > >> >>> > li.davidm96@gmail.com>
> >>> > >> > >> >>> > > >>>>>> wrote:
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> Hi all,
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully
> >>> > >> > >> >>> > > >>>>>>>>>> so
> >>> > >> > >> >>> > > >>>>>>>>>> far,
> >>> > >> > but
> >>> > >> > >> we
> >>> > >> > >> >>> > > >>>>>>>>>> have
> >>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon:
> >>> being
> >>> > >> > >> >>> > > >>>>>>>>>> able
> >>> > >> > >> >>> > > >>>>>>>>>> to
> >>> > >> > >> >>> > > >>>>>>>>>> both
> >>> > >> > >> >>> > > >>>>>>>>>> send
> >>> > >> > >> >>> > > >>>>>>>>>> and
> >>> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC
> >>> > >> > >> >>> > > >>>>>>>>>> call.
> >>> To
> >>> > >> > >> >>> > > >>>>>>>>>> that
> >>> > >> > >> >>> > > >>>>>>>>>> end,
> >>> > >> > >> >>> > I've
> >>> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or
> >>> comment
> >>> > >> > >> >>> > > >>>>>>>>>> on
> >>> > >> > the
> >>> > >> > >> >>> > document.
> >>> > >> > >> >>> > > >>>>>>>>>> I'd
> >>> > >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a
> >>> > >> > >> >>> > > >>>>>>>>>> relatively
> >>> > >> > >> >>> > > >>>>>>>>>> straightforward
> >>> > >> > >> >>> > > >>>>>>>>>> addition - it is essentially
> >>> > >> > >> >>> > > >>>>>>>>>> "DoPutThenGet".
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a
> >>> > vote.
> >>> > >> > I've
> >>> > >> > >> >>> > > >>>>>>>>>> decided
> >>> > >> > >> >>> > > >>>>>>>>>> to
> >>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had
> >>> > >> > >> >>> > > >>>>>>>>>> proposed
> >>> (on
> >>> > >> > >> >>> > > >>>>>>>>>> DoPut),
> >>> > >> > >> >>> > > >>>>>>>>>> as
> >>> > >> > >> >>> > > >>>>>>>>>> it
> >>> > >> > >> >>> > > >>>>>> doesn't
> >>> > >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the
> >>> > >> > >> >>> > > >>>>>>>>>> interpretation
> >>> > >> > of
> >>> > >> > >> >>> > > >>>>>>>>>> the
> >>> > >> > >> >>> > > >>>>>>>>>> semantics.
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> Thanks,
> >>> > >> > >> >>> > > >>>>>>>>>> David
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> --
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> >
> >>> > >> > >> >>
> >>> > >> > >> >
> >>> > >> > >>
> >>> > >> > >
> >>> > >> >
> >>> > >
> >>> >
> >>>
> >>
> >

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Following up here, I've submitted a draft implementation for C++:
https://github.com/apache/arrow/pull/6656

The core functionality is there, but there are still gaps that I need
to fill in. Compared to the draft spec, the client also sends a
FlightDescriptor as its first message, though it's currently not
exposed. This is consistent with DoGet/DoPut, which also begin by
sending a message that describes the stream to the server.
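
To illustrate the message ordering described above, here is a minimal
sketch in plain Python. It does not use the Flight API at all: Message,
client_stream and serve_exchange are hypothetical names made up for
this example, and the "uppercase" transformation is only a stand-in for
whatever the server would actually compute. The only point is that the
first message on the stream carries the descriptor and every later
message carries data.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Message:
    """Hypothetical stand-in for one message on the exchange stream."""
    descriptor: Optional[str] = None  # only set on the first message
    body: bytes = b""                 # stand-in for Arrow record batch data
    app_metadata: bytes = b""         # application-defined bytes


def client_stream(descriptor: str, chunks: Iterable[bytes]) -> Iterator[Message]:
    # The first message only describes the stream; it carries no data.
    yield Message(descriptor=descriptor)
    for chunk in chunks:
        yield Message(body=chunk)


def serve_exchange(incoming: Iterable[Message]) -> Iterator[Message]:
    stream = iter(incoming)
    first = next(stream, None)
    if first is None or first.descriptor is None:
        raise ValueError("stream must begin with a descriptor")
    # Every later message is data; respond to each one as it arrives.
    for msg in stream:
        yield Message(body=msg.body.upper(),
                      app_metadata=first.descriptor.encode())


responses = list(serve_exchange(client_stream("uppercase", [b"ab", b"cd"])))
```

Because the server reads and writes on the same stream, it can respond
to each data message as soon as it arrives instead of waiting for the
upload to finish.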

Andy, I hope this helps clarify whether it meets your needs.

Best,
David

On 2/25/20, David Li <li...@gmail.com> wrote:
> Hey Andy,
>
> I've been rather busy unfortunately. I had started on an
> implementation in C++ to provide as part of this discussion, but it's
> not complete. I'm hoping to have more done in March.
>
> Best,
> David
>
> On 2/25/20, Andy Grove <an...@gmail.com> wrote:
>> I was wondering if there had been any momentum on this (the BiDirectional
>> RPC design)?
>>
>> I'm interested in this for the use case of Apache Spark sending a stream
>> of data to another process to invoke custom code and then receive a
>> stream back with the transformed data.
>>
>> Thanks,
>>
>> Andy.
>>
>>
>>
>> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org>
>> wrote:
>>
>>> I support moving forward with the current proposal.
>>>
>>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:
>>>
>>> > Just following up here again, any other thoughts?
>>> >
>>> > I think we do have justifications for potentially separate streams in
>>> > a call, but that's more of an orthogonal question - it doesn't need to
>>> > be addressed here. I do agree that it very much complicates things.
>>> >
>>> > Thanks,
>>> > David
>>> >
>>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
>>> > > I would generally agree with this. Note that you have the
>>> > > possibility to use unions-of-structs to send record batches with
>>> > > different schemas in the same stream, though with some added
>>> > > complexity on each side.
>>> > >
>>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org>
>>> > wrote:
>>> > >>
>>> > >> I'd vote for explicitly not supported. We should keep our
>>> > >> primitives
>>> > >> narrow.
>>> > >>
>>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com>
>>> > >> wrote:
>>> > >>
>>> > >> > Thanks for the feedback.
>>> > >> >
>>> > >> > I do think if we had explicitly embraced gRPC from the beginning,
>>> > >> > there are a lot of places where things could be made more
>>> > >> > ergonomic,
>>> > >> > including with the metadata fields. But it would also have locked
>>> > >> > us out of potential future transports.
>>> > >> >
>>> > >> > On another note: I hesitate to put too much into this method, but
>>> > >> > we are looking at use cases where potentially, a client may want
>>> > >> > to upload multiple distinct datasets (with differing schemas).
>>> > >> > (This is a little tentative, and I can get more details...) Right
>>> > >> > now, each logical stream in Flight must have a single, consistent
>>> > >> > schema; would it make sense to look at ways to relax this, or
>>> > >> > declare this explicitly out of scope (and require multiple calls
>>> > >> > and coordination with the deployment topology) in order to
>>> > >> > accomplish this?
>>> > >> >
>>> > >> > Best,
>>> > >> > David
>>> > >> >
>>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
>>> > >> > > Fair enough. I'm okay with the bytes approach and the proposal
>>> > >> > > looks good to me.
>>> > >> > >
>>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li
>>> > >> > > <li...@gmail.com>
>>> > >> > > wrote:
>>> > >> > >
>>> > >> > >> I've updated the proposal.
>>> > >> > >>
>>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
>>> > >> > >> errors/metadata, I still think using bytes is preferable:
>>> > >> > >> - It doesn't require (conditionally) exposing or wrapping
>>> > >> > >>   Protobuf types,
>>> > >> > >> - We wouldn't be able to practically expose the Protobuf field
>>> > >> > >>   to C++ users without causing build pains,
>>> > >> > >> - We can't let Python users take advantage of the Protobuf
>>> > >> > >>   field without somehow being compatible with the Protobuf
>>> > >> > >>   wheels (by linking to the same version, and doing magic to
>>> > >> > >>   turn the C++ Protobufs into the Python ones),
>>> > >> > >> - All our other application-defined fields are already bytes.
>>> > >> > >>
>>> > >> > >> Applications that want structure can encode JSON or Protobuf
>>> > >> > >> Any into the bytes field themselves, much as you can already
>>> > >> > >> do for Ticket, commands in FlightDescriptors, and application
>>> > >> > >> metadata in DoGet/DoPut. I don't think this is (much) less
>>> > >> > >> efficient than using Any directly, since Any itself is a bytes
>>> > >> > >> field with a tag, and the reader must still invoke the
>>> > >> > >> Protobuf deserializer again to read the actual message.
>>> > >> > >>
>>> > >> > >> If we decide on using bytes, then I don't think it makes sense
>>> > >> > >> to define a new message with a oneof either, since it would be
>>> > >> > >> redundant.
>>> > >> > >>
>>> > >> > >> Thanks,
>>> > >> > >> David
>>> > >> > >>
>>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
>>> > >> > >> > I've been extremely backlogged, I will update the proposal
>>> > >> > >> > when I get a chance and reply here when done.
>>> > >> > >> >
>>> > >> > >> > Best,
>>> > >> > >> > David
>>> > >> > >> >
>>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
>>> > >> > >> >> Bumping this discussion since a couple of weeks have
>>> > >> > >> >> passed. It seems there are still some questions here,
>>> > >> > >> >> could we summarize what are the alternatives along with
>>> > >> > >> >> any public API implications so we can try to render a
>>> > >> > >> >> decision?
>>> > >> > >> >>
>>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <
>>> li.davidm96@gmail.com
>>> > >
>>> > >> > >> >> wrote:
>>> > >> > >> >>>
>>> > >> > >> >>> Hi Wes,
>>> > >> > >> >>>
>>> > >> > >> >>> Responses inline:
>>> > >> > >> >>>
>>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <
>>> wesmckinn@gmail.com>
>>> > >> > wrote:
>>> > >> > >> >>>
>>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li
>>> > >> > >> >>> > <li...@gmail.com>
>>> > >> > >> >>> > wrote:
>>> > >> > >> >>> > >
>>> > >> > >> >>> > > The question is whether to repurpose the existing
>>> > >> > >> >>> > > FlightData structure, and allow for the metadata field
>>> > >> > >> >>> > > to be filled in and data fields to be blank (as a
>>> > >> > >> >>> > > control message), or to wrap the FlightData structure
>>> > >> > >> >>> > > in another structure that explicitly distinguishes
>>> > >> > >> >>> > > between control and data messages.
>>> > >> > >> >>> >
>>> > >> > >> >>> > I'm not super against having metadata-only FlightData
>>> > >> > >> >>> > with
>>> > >> > >> >>> > empty
>>> > >> > >> body.
>>> > >> > >> >>> > One question to consider is what changes (if any) would
>>> need
>>> > to
>>> > >> > >> >>> > be
>>> > >> > >> >>> > made to public APIs in either scenario.
>>> > >> > >> >>> >
>>> > >> > >> >>>
>>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty
>>> > >> > >> >>> data messages in the future. This would be a breaking
>>> > >> > >> >>> change, but wouldn't change the wire format. I think the
>>> > >> > >> >>> APIs could be changed backwards compatibly, though.
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>> > > The other question is how to handle the metadata
>>> > >> > >> >>> > > fields. So far, we've used bytestring fields for
>>> > >> > >> >>> > > application-defined data. This is workable if you want
>>> > >> > >> >>> > > to use Protobuf to define the contents of those
>>> > >> > >> >>> > > fields, but requires you to pack/unpack your Protobuf
>>> > >> > >> >>> > > into/from the bytestring field. If we instead used the
>>> > >> > >> >>> > > Protobuf Any field, a dynamically typed field, this
>>> > >> > >> >>> > > would be more convenient, but then we'd be exposing
>>> > >> > >> >>> > > Protobuf types. We could alternatively use a
>>> > >> > >> >>> > > combination of a type field and a bytestring field,
>>> > >> > >> >>> > > mimicking what the Protobuf Any type looks like on the
>>> > >> > >> >>> > > wire. I'm not sure this is actually cleaner in any of
>>> > >> > >> >>> > > the language APIs, though.
>>> > >> > >> >>> >
>>> > >> > >> >>> > Leaving the deserialization of the app metadata to the
>>> > >> > >> >>> > particular
>>> > >> > >> >>> > Flight implementation seems on first principles like the
>>> most
>>> > >> > >> flexible
>>> > >> > >> >>> > thing, if Any is used, does that mean the metadata
>>> > >> > >> >>> > _must_
>>> be
>>> > a
>>> > >> > >> >>> > protobuf?
>>> > >> > >> >>> >
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>> If Any is used, we could still expose a bytes-based API,
>>> > >> > >> >>> but it would have some more wrapping. (We could put a
>>> > >> > >> >>> ByteString in Any.) Then the question would just be how to
>>> > >> > >> >>> expose this (would be easier in Java, harder in C++).
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>> > > David
>>> > >> > >> >>> > >
>>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org>
>>> > >> > >> >>> > > wrote:
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > > Can one of you explain what is being proposed in
>>> > >> > >> >>> > > > non-protobuf
>>> > >> > >> >>> > > > terms?
>>> > >> > >> >>> > > > Knowledge of protobuf shouldn't be required to use
>>> > Flight.
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > > Regards
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > > Antoine.
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
>>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it
>>> > would
>>> > >> > just
>>> > >> > >> be
>>> > >> > >> >>> > > >> application-level logic. (The official guide
>>> > >> > >> >>> > > >> doesn't
>>> > even
>>> > >> > >> mention
>>> > >> > >> >>> > > >> it
>>> > >> > >> >>> > > >> in the encoding docs; I found
>>> > >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>>> > >> > >> >>> > > >> as well.)
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing
>>> > >> > >> >>> > > >> essentially
>>> > >> > >> >>> > > >> inlining
>>> > >> > >> >>> > > >> the definition of Any, e.g.
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> message FlightMessage {
>>> > >> > >> >>> > > >>   oneof message {
>>> > >> > >> >>> > > >>     FlightData data = 1;
>>> > >> > >> >>> > > >>     FlightAny metadata = 2;
>>> > >> > >> >>> > > >>   }
>>> > >> > >> >>> > > >> }
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> message FlightAny {
>>> > >> > >> >>> > > >>   string type = 1;
>>> > >> > >> >>> > > >>   bytes data = 2;
>>> > >> > >> >>> > > >> }
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> Is this correct?
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> It might be nice to consider the wrapper message
>>> > >> > >> >>> > > >> for
>>> > >> > >> >>> > > >> DoGet/DoPut
>>> > >> > >> >>> > > >> as
>>> > >> > >> >>> > > >> well, but at that point, I'd rather we be
>>> > >> > >> >>> > > >> consistent
>>> > with
>>> > >> > >> >>> > > >> all
>>> > >> > >> >>> > > >> of
>>> > >> > >> >>> > > >> them,
>>> > >> > >> >>> > > >> rather than have one of the three methods do its
>>> > >> > >> >>> > > >> own
>>> > >> > >> >>> > > >> thing.
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> Thanks,
>>> > >> > >> >>> > > >> David
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
>>> > >> > >> >>> > > >>> I think we could probably expose the oneof behavior without exposing
>>> > >> > >> >>> > > >>> the protobuf functions. On the any... hmm. I guess we could expose as
>>> > >> > >> >>> > > >>> two fields: type and data. Then users could use it for whatever but if
>>> > >> > >> >>> > > >>> people wanted to treat it as any, it would work. (Basically a user
>>> > >> > >> >>> > > >>> could use any with it easily but they could also use any other
>>> > >> > >> >>> > > >>> mechanism). At least in java, the any concepts are pretty simple/diy.
>>> > >> > >> >>> > > >>> Are other language bindings less diy?
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but it just
>>> > >> > >> >>> > > >>> seemed a bit janky.
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I wonder if
>>> > >> > >> >>> > > >>> we should redefine DoPut and DoGet to have the same property if we
>>> > >> > >> >>> > > >>> think it is a good idea...
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com> wrote:
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >>>> I was definitely considering having control messages without data, and
>>> > >> > >> >>> > > >>>> I thought that could be encoded by a FlightData with only app_metadata
>>> > >> > >> >>> > > >>>> set. I think I understand your position now: FlightData should always
>>> > >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with the documentation on
>>> > >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried about having a
>>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from happening, and
>>> > >> > >> >>> > > >>>> overall having a clear separation between data and control messages is
>>> > >> > >> >>> > > >>>> cleaner.
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
>>> > >> > >> >>> > > >>>> Protobuf by using bytes, would we want to change that now?
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>> Best,
>>> > >> > >> >>> > > >>>> David
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
>>> > >> > >> >>> > > >>>>> Hey David,
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for doget/doput
>>> > >> > >> >>> > > >>>>> for async. Yes, more thinking java given java grpc's async always
>>> > >> > >> >>> > > >>>>> pattern.
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think it is overloading the
>>> > >> > >> >>> > > >>>>> message to use metadata for this. If I want to send a control message
>>> > >> > >> >>> > > >>>>> independently of the data message, I would have to define something
>>> > >> > >> >>> > > >>>>> like an empty flight data message that has custom metadata. Why not
>>> > >> > >> >>> > > >>>>> support a container object with a oneof{FlightData, Any} in it instead
>>> > >> > >> >>> > > >>>>> so users can add more data as desired. The default impl could be a
>>> > >> > >> >>> > > >>>>> noop for the Any messages.
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>>> Hi Jacques,
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>> Thanks for the comments.
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
>>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
>>> > >> > >> >>> > > >>>>>> proposals, so I don't think we need a new message to carry that kind
>>> > >> > >> >>> > > >>>>>> of information.
>>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
>>> > >> > >> >>> > > >>>>>> messages as the fundamental API; it would actually be quite natural
>>> > >> > >> >>> > > >>>>>> to implement in Flight/Java. I will note that it's not possible in
>>> > >> > >> >>> > > >>>>>> C++/Python without spawning a thread, though. (In essence, gRPC-Java
>>> > >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are experimental
>>> > >> > >> >>> > > >>>>>> C++ APIs that would let us do something similar to Java, but those
>>> > >> > >> >>> > > >>>>>> are only in relatively recent gRPC versions and are still under
>>> > >> > >> >>> > > >>>>>> development (contrary to the interceptor APIs which have been around
>>> > >> > >> >>> > > >>>>>> for quite a while).
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>> Thanks,
>>> > >> > >> >>> > > >>>>>> David
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
>>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth discussion
>>> > >> > >> >>> > > >>>>>>> here depending on your thoughts.
>>> > >> > >> >>> > > >>>>>>>
>>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
>>> > >> > >> >>> > > >>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave
>>> > >> > >> >>> > > >>>>>>>> uploads and downloads if you were so inclined. Right now, synchronous
>>> > >> > >> >>> > > >>>>>>>> APIs make this error-prone, e.g. if both client and server wait for
>>> > >> > >> >>> > > >>>>>>>> each other due to an application logic bug. (gRPC doesn't give us
>>> > >> > >> >>> > > >>>>>>>> the ability to have per-read timeouts, only an overall timeout.) As
>>> > >> > >> >>> > > >>>>>>>> an example of this happening with DoPut, see ARROW-6063:
>>> > >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> This is mostly tangential though, eventually we will want to design
>>> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream like
>>> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run into.
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
>>> > >> > >> >>> > > >>>>>>>> concern is that depending on how you deploy, two separate calls could
>>> > >> > >> >>> > > >>>>>>>> get routed to different instances. Additionally, gRPC has some
>>> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in between the two
>>> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is another instance available,
>>> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the new server without warning.
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Thanks,
>>> > >> > >> >>> > > >>>>>>>> David
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
>>> > >> > >> >>> > > >>>>>>>>> Hey David,
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
>>> > >> > >> >>> > > >>>>>>>>> possibility of remote compute via arrow buffers. One thing that
>>> > >> > >> >>> > > >>>>>>>>> would help me would be a concrete example of the API in a real life
>>> > >> > >> >>> > > >>>>>>>>> use case. Also, what would the client experience be in terms of sync
>>> > >> > >> >>> > > >>>>>>>>> vs async? Would the client block till the bidirectional call
>>> > >> > >> >>> > > >>>>>>>>> returns, i.e. c = flight.vector_mult(a, b), or would the client wait
>>> > >> > >> >>> > > >>>>>>>>> to be signaled that computation was done? If the latter, how is that
>>> > >> > >> >>> > > >>>>>>>>> different from a DoPut then DoGet? I suppose that this could be
>>> > >> > >> >>> > > >>>>>>>>> implemented without extending the RPC interface but rather by a
>>> > >> > >> >>> > > >>>>>>>>> function/util?
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> Best,
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> Ryan
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
>>> > >> > >> >>> > li.davidm96@gmail.com>
>>> > >> > >> >>> > > >>>>>> wrote:
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> Hi all,
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
>>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to both send
>>> > >> > >> >>> > > >>>>>>>>>> and retrieve Arrow data within a single RPC call. To that end, I've
>>> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document.
>>> > >> > >> >>> > > >>>>>>>>>> I'd appreciate any feedback; I think this is a relatively
>>> > >> > >> >>> > > >>>>>>>>>> straightforward addition - it is essentially "DoPutThenGet".
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
>>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it
>>> > >> > >> >>> > > >>>>>>>>>> doesn't functionally change Flight, just the interpretation of the
>>> > >> > >> >>> > > >>>>>>>>>> semantics.
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> Thanks,
>>> > >> > >> >>> > > >>>>>>>>>> David
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >
>>> > >> > >> >>> >
>>> > >> > >> >>
>>> > >> > >> >
>>> > >> > >>
>>> > >> > >
>>> > >> >
>>> > >
>>> >
>>>
>>
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Hey Andy,

I've been rather busy unfortunately. I had started on an
implementation in C++ to provide as part of this discussion, but it's
not complete. I'm hoping to have more done in March.

Best,
David

On 2/25/20, Andy Grove <an...@gmail.com> wrote:
> I was wondering if there had been any momentum on this (the BiDirectional
> RPC design)?
>
> I'm interested in this for the use case of Apache Spark sending a stream of
> data to another process to invoke custom code and then receive a stream
> back with the transformed data.
>
> Thanks,
>
> Andy.
>
>
>
> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org> wrote:
>
>> I support moving forward with the current proposal.
>>
>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:
>>
>> > Just following up here again, any other thoughts?
>> >
>> > I think we do have justifications for potentially separate streams in
>> > a call, but that's more of an orthogonal question - it doesn't need to
>> > be addressed here. I do agree that it very much complicates things.
>> >
>> > Thanks,
>> > David
>> >
>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
>> > > I would generally agree with this. Note that you have the possibility
>> > > to use unions-of-structs to send record batches with different schemas
>> > > in the same stream, though with some added complexity on each side.
>> > >
>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org>
>> > wrote:
>> > >>
>> > >> I'd vote for explicitly not supported. We should keep our primitives
>> > >> narrow.
>> > >>
>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > Thanks for the feedback.
>> > >> >
>> > >> > I do think if we had explicitly embraced gRPC from the beginning,
>> > >> > there are a lot of places where things could be made more ergonomic,
>> > >> > including with the metadata fields. But it would also have locked us
>> > >> > out of potential future transports.
>> > >> >
>> > >> > On another note: I hesitate to put too much into this method, but we
>> > >> > are looking at use cases where potentially, a client may want to
>> > >> > upload multiple distinct datasets (with differing schemas). (This is a
>> > >> > little tentative, and I can get more details...) Right now, each
>> > >> > logical stream in Flight must have a single, consistent schema; would
>> > >> > it make sense to look at ways to relax this, or declare this
>> > >> > explicitly out of scope (and require multiple calls and coordination
>> > >> > with the deployment topology) in order to accomplish this?
>> > >> >
>> > >> > Best,
>> > >> > David
>> > >> >
>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > >> > > Fair enough. I'm okay with the bytes approach and the proposal looks
>> > >> > > good to me.
>> > >> > >
>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com>
>> > >> > > wrote:
>> > >> > >
>> > >> > >> I've updated the proposal.
>> > >> > >>
>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
>> > >> > >> errors/metadata, I still think using bytes is preferable:
>> > >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
>> > >> > >> types,
>> > >> > >> - We wouldn't be able to practically expose the Protobuf field to C++
>> > >> > >> users without causing build pains,
>> > >> > >> - We can't let Python users take advantage of the Protobuf field
>> > >> > >> without somehow being compatible with the Protobuf wheels (by linking
>> > >> > >> to the same version, and doing magic to turn the C++ Protobufs into
>> > >> > >> the Python ones),
>> > >> > >> - All our other application-defined fields are already bytes.
>> > >> > >>
>> > >> > >> Applications that want structure can encode JSON or Protobuf Any into
>> > >> > >> the bytes field themselves, much as you can already do for Ticket,
>> > >> > >> commands in FlightDescriptors, and application metadata in
>> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
>> > >> > >> Any directly, since Any itself is a bytes field with a tag, and must
>> > >> > >> invoke the Protobuf deserializer again to read the actual message.
>> > >> > >>
>> > >> > >> If we decide on using bytes, then I don't think it makes sense to
>> > >> > >> define a new message with a oneof either, since it would be redundant.
>> > >> > >>
>> > >> > >> Thanks,
>> > >> > >> David
>> > >> > >>
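
As a rough illustration of the point above about applications layering
their own structure on top of an opaque bytes field, here is a minimal
Python sketch. The envelope layout (a JSON object with "type" and "data"
keys) and the helper names pack_metadata/unpack_metadata are assumptions
made for this example; they are not part of Flight or of the proposal.

```python
import json
from typing import Tuple


def pack_metadata(kind: str, payload: dict) -> bytes:
    """Serialize a type tag plus a JSON payload into one opaque bytes value.

    Hypothetical helper: the "type"/"data" envelope is an application
    convention chosen for this sketch, not something Flight defines.
    """
    envelope = {"type": kind, "data": payload}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def unpack_metadata(raw: bytes) -> Tuple[str, dict]:
    """Reverse pack_metadata: decode the bytes and split tag from payload."""
    envelope = json.loads(raw.decode("utf-8"))
    return envelope["type"], envelope["data"]


# The transport only ever sees `raw`; the structure is the application's own.
raw = pack_metadata("example.Progress", {"rows_done": 1024})
kind, payload = unpack_metadata(raw)
```

The same pattern would apply with a serialized Protobuf message in place
of JSON; only the two helpers would change.
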
>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
>> > >> > >> > I've been extremely backlogged, I will update the proposal when I get
>> > >> > >> > a chance and reply here when done.
>> > >> > >> >
>> > >> > >> > Best,
>> > >> > >> > David
>> > >> > >> >
>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
>> > >> > >> >> Bumping this discussion since a couple of weeks have passed. It seems
>> > >> > >> >> there are still some questions here, could we summarize what are the
>> > >> > >> >> alternatives along with any public API implications so we can try to
>> > >> > >> >> render a decision?
>> > >> > >> >>
>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.davidm96@gmail.com> wrote:
>> > >> > >> >>>
>> > >> > >> >>> Hi Wes,
>> > >> > >> >>>
>> > >> > >> >>> Responses inline:
>> > >> > >> >>>
>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmckinn@gmail.com> wrote:
>> > >> > >> >>>
>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
>> > >> > >> >>> > >
>> > >> > >> >>> > > The question is whether to repurpose the existing FlightData
>> > >> > >> >>> > > structure, and allow for the metadata field to be filled in and data
>> > >> > >> >>> > > fields to be blank (as a control message), or to wrap the FlightData
>> > >> > >> >>> > > structure in another structure that explicitly distinguishes between
>> > >> > >> >>> > > control and data messages.
>> > >> > >> >>> >
>> > >> > >> >>> > I'm not super against having metadata-only FlightData with empty body.
>> > >> > >> >>> > One question to consider is what changes (if any) would need to be
>> > >> > >> >>> > made to public APIs in either scenario.
>> > >> > >> >>> >
>> > >> > >> >>>
>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data messages in
>> > >> > >> >>> the future. This would be a breaking change, but wouldn't change the wire
>> > >> > >> >>> format. I think the APIs could be changed backwards compatibly, though.
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>> > > The other question is how to handle the metadata fields. So far, we've
>> > >> > >> >>> > > used bytestring fields for application-defined data. This is workable
>> > >> > >> >>> > > if you want to use Protobuf to define the contents of those fields,
>> > >> > >> >>> > > but requires you to pack/unpack your Protobuf into/from the bytestring
>> > >> > >> >>> > > field. If we instead used the Protobuf Any field, a dynamically typed
>> > >> > >> >>> > > field, this would be more convenient, but then we'd be exposing
>> > >> > >> >>> > > Protobuf types. We could alternatively use a combination of a type
>> > >> > >> >>> > > field and a bytestring field, mimicking what the Protobuf Any type
>> > >> > >> >>> > > looks like on the wire. I'm not sure this is actually cleaner in any
>> > >> > >> >>> > > of the language APIs, though.
>> > >> > >> >>> >
>> > >> > >> >>> > Leaving the deserialization of the app metadata to the particular
>> > >> > >> >>> > Flight implementation seems on first principles like the most flexible
>> > >> > >> >>> > thing, if Any is used, does that mean the metadata _must_ be a
>> > >> > >> >>> > protobuf?
>> > >> > >> >>> >
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>> If Any is used, we could still expose a bytes-based API, but it would have
>> > >> > >> >>> some more wrapping. (We could put a ByteString in Any.) Then the question
>> > >> > >> >>> would just be how to expose this (would be easier in Java, harder in C++).
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>> > > David
>> > >> > >> >>> > >
>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
>> > >> > >> >>> > > >
>> > >> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf terms?
>> > >> > >> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
>> > >> > >> >>> > > >
>> > >> > >> >>> > > > Regards
>> > >> > >> >>> > > >
>> > >> > >> >>> > > > Antoine.
>> > >> > >> >>> > > >
>> > >> > >> >>> > > >
>> > >> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
>> > >> > >> >>> > > >> application-level logic. (The official guide doesn't even mention it
>> > >> > >> >>> > > >> in the encoding docs; I found
>> > >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> > >> > >> >>> > > >> as well.)
>> > >> > >> >>> > > >>
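
To make the wire-encoding remark concrete, the following Python sketch
hand-encodes a single length-delimited Protobuf field. It relies only on
the documented Protobuf wire format (key = field number shifted left by 3,
OR'd with the wire type; varint length; payload). The field number 2 and
the payload b"hi" are arbitrary values picked for the example. A oneof
member is written as an ordinary tagged field, so nothing in these bytes
indicates whether the field was declared inside a oneof.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_length_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field (wire type 2): key, length, payload."""
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


# Arbitrary example: field number 2 carrying the bytes b"hi". The output is
# the same whether field 2 is a plain field or a member of a oneof.
encoded = encode_length_delimited(2, b"hi")
print(encoded.hex())  # prints 12026869
```
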
>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially inlining
>> > >> > >> >>> > > >> the definition of Any, e.g.
>> > >> > >> >>> > > >>
>> > >> > >> >>> > > >> message FlightMessage {
>> > >> > >> >>> > > >>   oneof message {
>> > >> > >> >>> > > >>     FlightData data = 1;
>> > >> > >> >>> > > >>     FlightAny metadata = 2;
>> > >> > >> >>> > > >>   }
>> > >> > >> >>> > > >> }
>> > >> > >> >>> > > >>
>> > >> > >> >>> > > >> message FlightAny {
>> > >> > >> >>> > > >>   string type = 1;
>> > >> > >> >>> > > >>   bytes data = 2;
>> > >> > >> >>> > > >> }
>> > >> > >> >>> > > >>
>> > >> > >> >>> > > >> Is this correct?
>> > >> > >> >>> > > >>
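
One way a language binding might surface the data/control split from the
definition above without exposing Protobuf types is a small tagged union.
The Python sketch below is only an illustration under that assumption: the
class names DataMessage and ControlMessage, the alias ExchangeMessage, and
the function describe are hypothetical and do not exist in any Flight API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DataMessage:
    """Stands in for the FlightData arm of the oneof (hypothetical)."""
    body: bytes


@dataclass(frozen=True)
class ControlMessage:
    """Stands in for the FlightAny arm: a type tag plus opaque bytes."""
    type: str
    data: bytes


# Exactly one of the two arms is present, mirroring the oneof.
ExchangeMessage = Union[DataMessage, ControlMessage]


def describe(message: ExchangeMessage) -> str:
    """Dispatch on which arm is present, as a handler would."""
    if isinstance(message, DataMessage):
        return f"data ({len(message.body)} bytes)"
    if isinstance(message, ControlMessage):
        return f"control {message.type!r} ({len(message.data)} bytes)"
    raise TypeError(f"unexpected message type: {type(message).__name__}")
```

A handler written against such a union never has to inspect an empty data
body to decide whether a message is a control message.
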
>> > >> > >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut as
>> > >> > >> >>> > > >> well, but at that point, I'd rather we be consistent with all of them,
>> > >> > >> >>> > > >> rather than have one of the three methods do its own thing.
>> > >> > >> >>> > > >>
>> > >> > >> >>> > > >> Thanks,
>> > >> > >> >>> > > >> David
>> > >> > >> >>> > > >>
>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > >> > >> >>> > > >>> I think we could probably expose the oneof behavior without exposing
>> > >> > >> >>> > > >>> the protobuf functions. On the any... hmm. I guess we could expose as
>> > >> > >> >>> > > >>> two fields: type and data. Then users could use it for whatever but if
>> > >> > >> >>> > > >>> people wanted to treat it as any, it would work. (Basically a user
>> > >> > >> >>> > > >>> could use any with it easily but they could also use any other
>> > >> > >> >>> > > >>> mechanism). At least in java, the any concepts are pretty simple/diy.
>> > >> > >> >>> > > >>> Are other language bindings less diy?
>> > >> > >> >>> > > >>>
>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but it just
>> > >> > >> >>> > > >>> seemed a bit janky.
>> > >> > >> >>> > > >>>
>> > >> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I wonder if
>> > >> > >> >>> > > >>> we should redefine DoPut and DoGet to have the same property if we
>> > >> > >> >>> > > >>> think it is a good idea...
>> > >> > >> >>> > > >>>
>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com> wrote:
>> > >> > >> >>> > > >>>
>> > >> > >> >>> > > >>>> I was definitely considering having control
>> > >> > >> >>> > > >>>> messages
>> > >> > without
>> > >> > >> >>> > > >>>> data,
>> > >> > >> >>> > and
>> > >> > >> >>> > > >>>> I thought that could be encoded by a FlightData
>> > >> > >> >>> > > >>>> with
>> > >> > >> >>> > > >>>> only
>> > >> > >> >>> > app_metadata
>> > >> > >> >>> > > >>>> set. I think I understand your position now:
>> > FlightData
>> > >> > >> >>> > > >>>> should
>> > >> > >> >>> > always
>> > >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
>> > >> > >> >>> > > >>>>
>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with the
>> > >> > >> >>> > > >>>> documentation
>> > >> > >> >>> > > >>>> on
>> > >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried
>> > >> > >> >>> > > >>>> about
>> > >> > >> >>> > > >>>> having
>> > >> > >> >>> > > >>>> a
>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that
>> from
>> > >> > >> >>> > > >>>> happening,
>> > >> > >> >>> > and
>> > >> > >> >>> > > >>>> overall having a clear separation between data and
>> > >> > >> >>> > > >>>> control
>> > >> > >> >>> > > >>>> messages
>> > >> > >> >>> > is
>> > >> > >> >>> > > >>>> cleaner.
>> > >> > >> >>> > > >>>>
>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've
>> > >> > >> >>> > > >>>> refrained
>> > >> > >> >>> > > >>>> from
>> > >> > >> >>> > > >>>> exposing
>> > >> > >> >>> > > >>>> Protobuf by using bytes, would we want to change
>> that
>> > >> > >> >>> > > >>>> now?
>> > >> > >> >>> > > >>>>
>> > >> > >> >>> > > >>>> Best,
>> > >> > >> >>> > > >>>> David
>> > >> > >> >>> > > >>>>
>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org>
>> > wrote:
>> > >> > >> >>> > > >>>>> Hey David,
>> > >> > >> >>> > > >>>>>
>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we
>> > >> > >> >>> > > >>>>> use
>> > >> > >> >>> > > >>>>> for
>> > >> > >> >>> > > >>>>> doget/doput
>> > >> > >> >>> > > >>>>> for
>> > >> > >> >>> > > >>>>> async. Yes, more thinking java given java grpc's
>> > async
>> > >> > >> >>> > > >>>>> always
>> > >> > >> >>> > pattern.
>> > >> > >> >>> > > >>>>>
>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think it
>> > >> > >> >>> > > >>>>> is
>> > >> > >> >>> > > >>>>> overloading
>> > >> > >> >>> > > >>>>> the
>> > >> > >> >>> > > >>>> message
>> > >> > >> >>> > > >>>>> to use metadata for this. If I want to send a
>> control
>> > >> > >> >>> > > >>>>> message
>> > >> > >> >>> > > >>>> independently
>> > >> > >> >>> > > >>>>> of the data message, I would have to define
>> something
>> > >> > >> >>> > > >>>>> like
>> > >> > >> >>> > > >>>>> an
>> > >> > >> >>> > > >>>>> empty
>> > >> > >> >>> > > >>>> flight
>> > >> > >> >>> > > >>>>> data message that has custom metadata. Why not
>> > support
>> > >> > >> >>> > > >>>>> a
>> > >> > >> >>> > > >>>>> container
>> > >> > >> >>> > > >>>>> object
>> > >> > >> >>> > > >>>>> with a oneof{FlightData, Any} in it instead so
>> users
>> > >> > >> >>> > > >>>>> can
>> > >> > >> >>> > > >>>>> add
>> > >> > >> >>> > > >>>>> more
>> > >> > >> >>> > data
>> > >> > >> >>> > > >>>>> as
>> > >> > >> >>> > > >>>>> desired. The default impl could be a noop for the
>> Any
>> > >> > >> >>> > > >>>>> messages.
>> > >> > >> >>> > > >>>>>
>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
>> > >> > >> >>> > > >>>>> <li...@gmail.com>
>> > >> > >> >>> > > >>>>> wrote:
>> > >> > >> >>> > > >>>>>
>> > >> > >> >>> > > >>>>>> Hi Jacques,
>> > >> > >> >>> > > >>>>>>
>> > >> > >> >>> > > >>>>>> Thanks for the comments.
>> > >> > >> >>> > > >>>>>>
>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a
>> result
>> > >> > >> >>> > > >>>>>> of
>> > >> > >> prior
>> > >> > >> >>> > > >>>>>> proposals, so I don't think we need a new
>> > >> > >> >>> > > >>>>>> message
>> to
>> > >> > carry
>> > >> > >> >>> > > >>>>>> that
>> > >> > >> >>> > kind
>> > >> > >> >>> > > >>>>>> of information.
>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to
>> > handle
>> > >> > >> >>> > > >>>>>> incoming
>> > >> > >> >>> > > >>>>>> messages as the fundamental API; it would
>> > >> > >> >>> > > >>>>>> actually
>> > be
>> > >> > >> >>> > > >>>>>> quite
>> > >> > >> >>> > natural
>> > >> > >> >>> > > >>>>>> to
>> > >> > >> >>> > > >>>>>> implement in Flight/Java. I will note that it's
>> not
>> > >> > >> >>> > > >>>>>> possible
>> > >> > >> >>> > > >>>>>> in
>> > >> > >> >>> > > >>>>>> C++/Python without spawning a thread, though.
>> > >> > >> >>> > > >>>>>> (In
>> > >> > essence,
>> > >> > >> >>> > gRPC-Java
>> > >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.)
>> There
>> > >> > >> >>> > > >>>>>> are
>> > >> > >> >>> > experimental
>> > >> > >> >>> > > >>>>>> C++ APIs that would let us do something similar
>> > >> > >> >>> > > >>>>>> to
>> > >> > >> >>> > > >>>>>> Java,
>> > >> > >> >>> > > >>>>>> but
>> > >> > >> >>> > > >>>>>> those
>> > >> > >> >>> > > >>>>>> are
>> > >> > >> >>> > > >>>>>> only in relatively recent gRPC versions and are
>> > still
>> > >> > >> >>> > > >>>>>> under
>> > >> > >> >>> > > >>>>>> development (contrary to the interceptor APIs
>> which
>> > >> > >> >>> > > >>>>>> have
>> > >> > >> been
>> > >> > >> >>> > around
>> > >> > >> >>> > > >>>>>> for quite a while).
>> > >> > >> >>> > > >>>>>>
>> > >> > >> >>> > > >>>>>> Thanks,
>> > >> > >> >>> > > >>>>>> David
>> > >> > >> >>> > > >>>>>>
>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org>
>> > >> > >> >>> > > >>>>>> wrote:
>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. It
>> > >> > >> >>> > > >>>>>>> might be worth discussing here, depending on
>> > >> > >> >>> > > >>>>>>> your thoughts.
>> > >> > >> >>> > > >>>>>>>
>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
>> > >> > >> >>> > > >>>>>>> <li...@gmail.com>
>> > >> > >> >>> > > >>>> wrote:
>> > >> > >> >>> > > >>>>>>>
>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to
>> provide a
>> > >> > >> >>> > > >>>>>>>> Python
>> > >> > >> >>> > strawman.
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you
>> > could
>> > >> > >> >>> > > >>>>>>>> interleave
>> > >> > >> >>> > > >>>> uploads
>> > >> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right
>> now,
>> > >> > >> >>> > > >>>>>>>> synchronous
>> > >> > >> >>> > APIs
>> > >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client and
>> > >> > >> >>> > > >>>>>>>> server
>> > >> > >> >>> > > >>>>>>>> wait
>> > >> > >> >>> > > >>>>>>>> for
>> > >> > >> >>> > each
>> > >> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC
>> > >> > >> >>> > > >>>>>>>> doesn't
>> > >> > >> >>> > > >>>>>>>> give
>> > >> > >> >>> > > >>>>>>>> us
>> > >> > >> >>> > > >>>>>>>> the
>> > >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an
>> overall
>> > >> > >> >>> > > >>>>>>>> timeout.)
>> > >> > >> >>> > > >>>>>>>> As
>> > >> > >> >>> > an
>> > >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see
>> > >> > >> >>> > > >>>>>>>> ARROW-6063:
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> This is mostly tangential though, eventually
>> > >> > >> >>> > > >>>>>>>> we
>> > >> > >> >>> > > >>>>>>>> will
>> > >> > >> >>> > > >>>>>>>> want
>> > >> > >> >>> > > >>>>>>>> to
>> > >> > >> >>> > design
>> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A
>> > >> > bidirectional
>> > >> > >> >>> > > >>>>>>>> stream
>> > >> > >> >>> > > >>>>>>>> like
>> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these
>> > >> > >> >>> > > >>>>>>>> pitfalls
>> > >> > >> >>> > > >>>>>>>> easier
>> > >> > >> >>> > > >>>>>>>> to
>> > >> > >> >>> > > >>>>>>>> run
>> > >> > >> >>> > into.
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the
>> > >> > >> >>> > > >>>>>>>> proposal,
>> > >> > but
>> > >> > >> >>> > > >>>>>>>> the
>> > >> > >> >>> > main
>> > >> > >> >>> > > >>>>>>>> concern is that depending on how you deploy,
>> > >> > >> >>> > > >>>>>>>> two
>> > >> > >> >>> > > >>>>>>>> separate
>> > >> > >> >>> > > >>>>>>>> calls
>> > >> > >> >>> > > >>>>>>>> could
>> > >> > >> >>> > > >>>>>>>> get routed to different instances.
>> > >> > >> >>> > > >>>>>>>> Additionally,
>> > >> > >> >>> > > >>>>>>>> gRPC
>> > >> > >> >>> > > >>>>>>>> has
>> > >> > >> >>> > > >>>>>>>> some
>> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes
>> > >> > >> >>> > > >>>>>>>> away
>> in
>> > >> > >> >>> > > >>>>>>>> between
>> > >> > >> >>> > > >>>>>>>> the
>> > >> > >> >>> > two
>> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is
>> > >> > >> >>> > > >>>>>>>> another
>> > >> > instance
>> > >> > >> >>> > available,
>> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the new
>> > server
>> > >> > >> without
>> > >> > >> >>> > > >>>>>>>> warning.
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> Thanks,
>> > >> > >> >>> > > >>>>>>>> David
>> > >> > >> >>> > > >>>>>>>>
>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com>
>> > wrote:
>> > >> > >> >>> > > >>>>>>>>> Hey David,
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I
>> > >> > >> >>> > > >>>>>>>>> like it and the possibility of remote compute
>> > >> > >> >>> > > >>>>>>>>> via arrow buffers. One thing that would help
>> > >> > >> >>> > > >>>>>>>>> me would be a concrete example of the API in a
>> > >> > >> >>> > > >>>>>>>>> real life use case. Also, what would the
>> > >> > >> >>> > > >>>>>>>>> client experience be in terms of sync vs
>> > >> > >> >>> > > >>>>>>>>> async? Would the client block till the
>> > >> > >> >>> > > >>>>>>>>> bidirectional call returns, i.e.
>> > >> > >> >>> > > >>>>>>>>> c = flight.vector_mult(a, b), or would the
>> > >> > >> >>> > > >>>>>>>>> client wait to be signaled that computation
>> > >> > >> >>> > > >>>>>>>>> was done? If the latter, how is that different
>> > >> > >> >>> > > >>>>>>>>> from a DoPut then DoGet? I suppose that this
>> > >> > >> >>> > > >>>>>>>>> could be implemented without extending the RPC
>> > >> > >> >>> > > >>>>>>>>> interface but rather by a function/util?
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> Best,
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> Ryan
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
>> > >> > >> >>> > li.davidm96@gmail.com>
>> > >> > >> >>> > > >>>>>> wrote:
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>> Hi all,
>> > >> > >> >>> > > >>>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully
>> > >> > >> >>> > > >>>>>>>>>> so
>> > >> > >> >>> > > >>>>>>>>>> far,
>> > >> > but
>> > >> > >> we
>> > >> > >> >>> > > >>>>>>>>>> have
>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon:
>> being
>> > >> > >> >>> > > >>>>>>>>>> able
>> > >> > >> >>> > > >>>>>>>>>> to
>> > >> > >> >>> > > >>>>>>>>>> both
>> > >> > >> >>> > > >>>>>>>>>> send
>> > >> > >> >>> > > >>>>>>>>>> and
>> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC
>> > >> > >> >>> > > >>>>>>>>>> call.
>> To
>> > >> > >> >>> > > >>>>>>>>>> that
>> > >> > >> >>> > > >>>>>>>>>> end,
>> > >> > >> >>> > I've
>> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
>> > >> > >> >>> > > >>>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>> > >> > >> >>> > > >>>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or
>> comment
>> > >> > >> >>> > > >>>>>>>>>> on
>> > >> > the
>> > >> > >> >>> > document.
>> > >> > >> >>> > > >>>>>>>>>> I'd
>> > >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a
>> > >> > >> >>> > > >>>>>>>>>> relatively
>> > >> > >> >>> > > >>>>>>>>>> straightforward
>> > >> > >> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
>> > >> > >> >>> > > >>>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a
>> > vote.
>> > >> > I've
>> > >> > >> >>> > > >>>>>>>>>> decided
>> > >> > >> >>> > > >>>>>>>>>> to
>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had proposed
>> (on
>> > >> > >> >>> > > >>>>>>>>>> DoPut),
>> > >> > >> >>> > > >>>>>>>>>> as
>> > >> > >> >>> > > >>>>>>>>>> it
>> > >> > >> >>> > > >>>>>> doesn't
>> > >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the
>> > >> > >> >>> > > >>>>>>>>>> interpretation
>> > >> > of
>> > >> > >> >>> > > >>>>>>>>>> the
>> > >> > >> >>> > > >>>>>>>>>> semantics.
>> > >> > >> >>> > > >>>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>> Thanks,
>> > >> > >> >>> > > >>>>>>>>>> David
>> > >> > >> >>> > > >>>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> --
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
>> > >> > >> >>> > > >>>>>>>>>
>> > >> > >> >>> > > >>>>>>>>>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Andy Grove <an...@gmail.com>.
I was wondering if there had been any momentum on this (the BiDirectional
RPC design)?

I'm interested in this for the use case of Apache Spark sending a stream of
data to another process to invoke custom code and then receive a stream
back with the transformed data.

Thanks,

Andy.
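
To make this use case concrete, here is a minimal sketch of the
exchange pattern, simulated with plain Python generators. It is not
the Flight API and assumes nothing about what the final RPC will look
like; the names exchange and double_values are hypothetical, and plain
lists stand in for Arrow record batches.

```python
from typing import Callable, Iterable, Iterator, List

# Stand-in for an Arrow record batch; a real client would use Arrow data.
Batch = List[int]


def exchange(
    batches: Iterable[Batch], transform: Callable[[Batch], Batch]
) -> Iterator[Batch]:
    """Simulate a single bidirectional call (hypothetical helper).

    Each uploaded batch is transformed and handed back before the next
    one is read, so uploads and downloads interleave within one call.
    """
    for batch in batches:
        yield transform(batch)


def double_values(batch: Batch) -> Batch:
    """Hypothetical custom code run by the remote process."""
    return [2 * value for value in batch]


results = list(exchange([[1, 2, 3], [4, 5]], double_values))
```

Because the generator is lazy, the caller can consume each transformed
batch before the next input batch is produced, which is the property a
single bidirectional call would give over a separate DoPut and DoGet.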



On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org> wrote:

> I support moving forward with the current proposal.
>
> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:
>
> > Just following up here again, any other thoughts?
> >
> > I think we do have justifications for potentially separate streams in
> > a call, but that's more of an orthogonal question - it doesn't need to
> > be addressed here. I do agree that it very much complicates things.
> >
> > Thanks,
> > David
> >
> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> > > I would generally agree with this. Note that you have the possibility
> > > to use unions-of-structs to send record batches with different schemas
> > > in the same stream, though with some added complexity on each side
> > >
> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org>
> > wrote:
> > >>
> > >> I'd vote for explicitly not supported. We should keep our primitives
> > >> narrow.
> > >>
> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
> > >>
> > >> > Thanks for the feedback.
> > >> >
> > >> > I do think if we had explicitly embraced gRPC from the beginning,
> > >> > there are a lot of places where things could be made more ergonomic,
> > >> > including with the metadata fields. But it would also have locked
> > >> > us out of potential future transports.
> > >> >
> > >> > On another note: I hesitate to put too much into this method, but we
> > >> > are looking at use cases where potentially, a client may want to
> > >> > upload multiple distinct datasets (with differing schemas). (This is
> > >> > a little tentative, and I can get more details...) Right now, each
> > >> > logical stream in Flight must have a single, consistent schema;
> > >> > would it make sense to look at ways to relax this, or declare this
> > >> > explicitly out of scope (and require multiple calls and coordination
> > >> > with the deployment topology) in order to accomplish this?
> > >> >
> > >> > Best,
> > >> > David
> > >> >
> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >> > > Fair enough. I'm okay with the bytes approach and the proposal
> > >> > > looks good to me.
> > >> > >
> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > >> I've updated the proposal.
> > >> > >>
> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> > >> > >> errors/metadata, I still think using bytes is preferable:
> > >> > >> - It doesn't require (conditionally) exposing or wrapping
> > >> > >>   Protobuf types,
> > >> > >> - We wouldn't be able to practically expose the Protobuf field
> > >> > >>   to C++ users without causing build pains,
> > >> > >> - We can't let Python users take advantage of the Protobuf field
> > >> > >>   without somehow being compatible with the Protobuf wheels (by
> > >> > >>   linking to the same version, and doing magic to turn the C++
> > >> > >>   Protobufs into the Python ones),
> > >> > >> - All our other application-defined fields are already bytes.
> > >> > >>
> > >> > >> Applications that want structure can encode JSON or Protobuf Any
> > >> > >> into the bytes field themselves, much as you can already do for
> > >> > >> Ticket, commands in FlightDescriptors, and application metadata
> > >> > >> in DoGet/DoPut. I don't think this is (much) less efficient than
> > >> > >> using Any directly, since Any itself is a bytes field with a tag,
> > >> > >> and must invoke the Protobuf deserializer again to read the
> > >> > >> actual message.
> > >> > >>
> > >> > >> If we decide on using bytes, then I don't think it makes sense to
> > >> > >> define a new message with a oneof either, since it would be
> > >> > >> redundant.
> > >> > >>
> > >> > >> Thanks,
> > >> > >> David
> > >> > >>
> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> > >> > >> > I've been extremely backlogged, I will update the proposal
> > >> > >> > when I get a chance and reply here when done.
> > >> > >> >
> > >> > >> > Best,
> > >> > >> > David
> > >> > >> >
> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> > >> > >> >> Bumping this discussion since a couple of weeks have passed.
> > >> > >> >> It seems there are still some questions here, could we
> > >> > >> >> summarize what are the alternatives along with any public API
> > >> > >> >> implications so we can try to render a decision?
> > >> > >> >>
> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <
> li.davidm96@gmail.com
> > >
> > >> > >> >> wrote:
> > >> > >> >>>
> > >> > >> >>> Hi Wes,
> > >> > >> >>>
> > >> > >> >>> Responses inline:
> > >> > >> >>>
> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <
> wesmckinn@gmail.com>
> > >> > wrote:
> > >> > >> >>>
> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li
> > >> > >> >>> > <li...@gmail.com>
> > >> > >> >>> > wrote:
> > >> > >> >>> > >
> > >> > >> >>> > > The question is whether to repurpose the existing
> > >> > >> >>> > > FlightData structure, and allow for the metadata field to
> > >> > >> >>> > > be filled in and data fields to be blank (as a control
> > >> > >> >>> > > message), or to wrap the FlightData structure in another
> > >> > >> >>> > > structure that explicitly distinguishes between control
> > >> > >> >>> > > and data messages.
> > >> > >> >>> >
> > >> > >> >>> > I'm not super against having metadata-only FlightData with
> > >> > >> >>> > empty body. One question to consider is what changes (if
> > >> > >> >>> > any) would need to be made to public APIs in either
> > >> > >> >>> > scenario.
> > >> > >> >>> >
> > >> > >> >>>
> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
> > >> > >> >>> messages in the future. This would be a breaking change, but
> > >> > >> >>> wouldn't change the wire format. I think the APIs could be
> > >> > >> >>> changed backwards compatibly, though.
> > >> > >> >>>
> > >> > >> >>>
> > >> > >> >>>
> > >> > >> >>> > > The other question is how to handle the metadata fields.
> > >> > >> >>> > > So far, we've used bytestring fields for
> > >> > >> >>> > > application-defined data. This is workable if you want to
> > >> > >> >>> > > use Protobuf to define the contents of those fields, but
> > >> > >> >>> > > requires you to pack/unpack your Protobuf into/from the
> > >> > >> >>> > > bytestring field. If we instead used the Protobuf Any
> > >> > >> >>> > > field, a dynamically typed field, this would be more
> > >> > >> >>> > > convenient, but then we'd be exposing Protobuf types. We
> > >> > >> >>> > > could alternatively use a combination of a type field and
> > >> > >> >>> > > a bytestring field, mimicking what the Protobuf Any type
> > >> > >> >>> > > looks like on the wire. I'm not sure this is actually
> > >> > >> >>> > > cleaner in any of the language APIs, though.
> > >> > >> >>> >
> > >> > >> >>> > Leaving the deserialization of the app metadata to the
> > >> > >> >>> > particular Flight implementation seems on first principles
> > >> > >> >>> > like the most flexible thing, if Any is used, does that mean
> > >> > >> >>> > the metadata _must_ be a protobuf?
> > >> > >> >>> >
> > >> > >> >>>
> > >> > >> >>>
> > >> > >> >>> If Any is used, we could still expose a bytes-based API, but it
> > >> > >> >>> would have some more wrapping. (We could put a ByteString in
> > >> > >> >>> Any.) Then the question would just be how to expose this (would
> > >> > >> >>> be easier in Java, harder in C++).
> > >> > >> >>>
> > >> > >> >>>
> > >> > >> >>>
> > >> > >> >>> > > David
> > >> > >> >>> > >
> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> > >> > >> >>> > > >
> > >> > >> >>> > > > Can one of you explain what is being proposed in
> > >> > >> >>> > > > non-protobuf terms? Knowledge of protobuf shouldn't be
> > >> > >> >>> > > > required to use Flight.
> > >> > >> >>> > > >
> > >> > >> >>> > > > Regards
> > >> > >> >>> > > >
> > >> > >> >>> > > > Antoine.
> > >> > >> >>> > > >
> > >> > >> >>> > > >
> > >> > >> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it
> > >> > >> >>> > > >> would just be application-level logic. (The official
> > >> > >> >>> > > >> guide doesn't even mention it in the encoding docs; I
> > >> > >> >>> > > >> found
> > >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> > >> > >> >>> > > >> as well.)
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing
> > >> > >> >>> > > >> essentially
> > >> > >> >>> > > >> inlining
> > >> > >> >>> > > >> the definition of Any, e.g.
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> message FlightMessage {
> > >> > >> >>> > > >>   oneof message {
> > >> > >> >>> > > >>     FlightData data = 1;
> > >> > >> >>> > > >>     FlightAny metadata = 2;
> > >> > >> >>> > > >>   }
> > >> > >> >>> > > >> }
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> message FlightAny {
> > >> > >> >>> > > >>   string type = 1;
> > >> > >> >>> > > >>   bytes data = 2;
> > >> > >> >>> > > >> }
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> Is this correct?
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> It might be nice to consider the wrapper message for
> > >> > >> >>> > > >> DoGet/DoPut
> > >> > >> >>> > > >> as
> > >> > >> >>> > > >> well, but at that point, I'd rather we be consistent
> > with
> > >> > >> >>> > > >> all
> > >> > >> >>> > > >> of
> > >> > >> >>> > > >> them,
> > >> > >> >>> > > >> rather than have one of the three methods do its own
> > >> > >> >>> > > >> thing.
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> Thanks,
> > >> > >> >>> > > >> David
> > >> > >> >>> > > >>
> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org>
> wrote:
> > >> > >> >>> > > >>> I think we could probably expose the oneof behavior
> > >> > >> >>> > > >>> without
> > >> > >> >>> > > >>> exposing
> > >> > >> >>> > the
> > >> > >> >>> > > >>> protobuf functions. On the any... hmm. I guess we
> could
> > >> > >> >>> > > >>> expose
> > >> > >> >>> > > >>> as
> > >> > >> >>> > > >>> two
> > >> > >> >>> > > >>> fields: type and data. Then users could use it for
> > >> > >> >>> > > >>> whatever
> > >> > >> >>> > > >>> but
> > >> > >> >>> > > >>> if
> > >> > >> >>> > > >>> people
> > >> > >> >>> > > >>> wanted to treat it as any, it would work. (Basically
> a
> > >> > >> >>> > > >>> user
> > >> > >> >>> > > >>> could
> > >> > >> >>> > > >>> use
> > >> > >> >>> > > >>> any
> > >> > >> >>> > > >>> with it easily but they could also use any other
> > >> > >> >>> > > >>> mechanism).
> > >> > >> >>> > > >>> At
> > >> > >> >>> > least in
> > >> > >> >>> > > >>> java, the any concepts are pretty simple/diy. Are
> other
> > >> > >> language
> > >> > >> >>> > > >>> bindings
> > >> > >> >>> > > >>> less diy?
> > >> > >> >>> > > >>>
> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData +
> > >> > >> >>> > > >>> metadata
> > >> > >> >>> > > >>> but
> > >> > >> >>> > > >>> it
> > >> > >> >>> > just
> > >> > >> >>> > > >>> seemed a bit janky.
> > >> > >> >>> > > >>>
> > >> > >> >>> > > >>> Thinking about the control message/wrapper object
> > thing,
> > >> > >> >>> > > >>> I
> > >> > >> >>> > > >>> wonder
> > >> > >> >>> > > >>> if
> > >> > >> >>> > we
> > >> > >> >>> > > >>> should redefine DoPut and DoGet to have the same
> > property
> > >> > >> >>> > > >>> if
> > >> > >> >>> > > >>> we
> > >> > >> >>> > think it
> > >> > >> >>> > > >>> is
> > >> > >> >>> > > >>> a good idea...
> > >> > >> >>> > > >>>
> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
> > >> > >> li.davidm96@gmail.com>
> > >> > >> >>> > wrote:
> > >> > >> >>> > > >>>
> > >> > >> >>> > > >>>> I was definitely considering having control messages
> > >> > without
> > >> > >> >>> > > >>>> data,
> > >> > >> >>> > and
> > >> > >> >>> > > >>>> I thought that could be encoded by a FlightData with
> > >> > >> >>> > > >>>> only
> > >> > >> >>> > app_metadata
> > >> > >> >>> > > >>>> set. I think I understand your position now:
> > FlightData
> > >> > >> >>> > > >>>> should
> > >> > >> >>> > always
> > >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
> > >> > >> >>> > > >>>>
> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with the
> > >> > >> >>> > > >>>> documentation
> > >> > >> >>> > > >>>> on
> > >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried about
> > >> > >> >>> > > >>>> having
> > >> > >> >>> > > >>>> a
> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that
> from
> > >> > >> >>> > > >>>> happening,
> > >> > >> >>> > and
> > >> > >> >>> > > >>>> overall having a clear separation between data and
> > >> > >> >>> > > >>>> control
> > >> > >> >>> > > >>>> messages
> > >> > >> >>> > is
> > >> > >> >>> > > >>>> cleaner.
> > >> > >> >>> > > >>>>
> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained
> > >> > >> >>> > > >>>> from
> > >> > >> >>> > > >>>> exposing
> > >> > >> >>> > > >>>> Protobuf by using bytes, would we want to change
> that
> > >> > >> >>> > > >>>> now?
> > >> > >> >>> > > >>>>
> > >> > >> >>> > > >>>> Best,
> > >> > >> >>> > > >>>> David
> > >> > >> >>> > > >>>>
> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org>
> > wrote:
> > >> > >> >>> > > >>>>> Hey David,
> > >> > >> >>> > > >>>>>
> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use
> > >> > >> >>> > > >>>>> for
> > >> > >> >>> > > >>>>> doget/doput
> > >> > >> >>> > > >>>>> for
> > >> > >> >>> > > >>>>> async. Yes, more thinking java given java grpc's
> > async
> > >> > >> >>> > > >>>>> always
> > >> > >> >>> > pattern.
> > >> > >> >>> > > >>>>>
> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think it is
> > >> > >> >>> > > >>>>> overloading
> > >> > >> >>> > > >>>>> the
> > >> > >> >>> > > >>>> message
> > >> > >> >>> > > >>>>> to use metadata for this. If I want to send a
> control
> > >> > >> >>> > > >>>>> message
> > >> > >> >>> > > >>>> independently
> > >> > >> >>> > > >>>>> of the data message, I would have to define
> something
> > >> > >> >>> > > >>>>> like
> > >> > >> >>> > > >>>>> an
> > >> > >> >>> > > >>>>> empty
> > >> > >> >>> > > >>>> flight
> > >> > >> >>> > > >>>>> data message that has custom metadata. Why not
> > support
> > >> > >> >>> > > >>>>> a
> > >> > >> >>> > > >>>>> container
> > >> > >> >>> > > >>>>> object
> > >> > >> >>> > > >>>>> with a oneof{FlightData, Any} in it instead so
> users
> > >> > >> >>> > > >>>>> can
> > >> > >> >>> > > >>>>> add
> > >> > >> >>> > > >>>>> more
> > >> > >> >>> > data
> > >> > >> >>> > > >>>>> as
> > >> > >> >>> > > >>>>> desired. The default impl could be a noop for the
> Any
> > >> > >> >>> > > >>>>> messages.
> > >> > >> >>> > > >>>>>
> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
> > >> > >> >>> > > >>>>> <li...@gmail.com>
> > >> > >> >>> > > >>>>> wrote:
> > >> > >> >>> > > >>>>>
> > >> > >> >>> > > >>>>>> Hi Jacques,
> > >> > >> >>> > > >>>>>>
> > >> > >> >>> > > >>>>>> Thanks for the comments.
> > >> > >> >>> > > >>>>>>
> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a
> result
> > >> > >> >>> > > >>>>>> of
> > >> > >> prior
> > >> > >> >>> > > >>>>>> proposals, so I don't think we need a new message
> to
> > >> > carry
> > >> > >> >>> > > >>>>>> that
> > >> > >> >>> > kind
> > >> > >> >>> > > >>>>>> of information.
> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to
> > handle
> > >> > >> >>> > > >>>>>> incoming
> > >> > >> >>> > > >>>>>> messages as the fundamental API; it would actually
> > be
> > >> > >> >>> > > >>>>>> quite
> > >> > >> >>> > natural
> > >> > >> >>> > > >>>>>> to
> > >> > >> >>> > > >>>>>> implement in Flight/Java. I will note that it's
> not
> > >> > >> >>> > > >>>>>> possible
> > >> > >> >>> > > >>>>>> in
> > >> > >> >>> > > >>>>>> C++/Python without spawning a thread, though. (In
> > >> > essence,
> > >> > >> >>> > gRPC-Java
> > >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.)
> There
> > >> > >> >>> > > >>>>>> are
> > >> > >> >>> > experimental
> > >> > >> >>> > > >>>>>> C++ APIs that would let us do something similar to
> > >> > >> >>> > > >>>>>> Java,
> > >> > >> >>> > > >>>>>> but
> > >> > >> >>> > > >>>>>> those
> > >> > >> >>> > > >>>>>> are
> > >> > >> >>> > > >>>>>> only in relatively recent gRPC versions and are
> > still
> > >> > >> >>> > > >>>>>> under
> > >> > >> >>> > > >>>>>> development (contrary to the interceptor APIs
> which
> > >> > >> >>> > > >>>>>> have
> > >> > >> been
> > >> > >> >>> > around
> > >> > >> >>> > > >>>>>> for quite a while).
> > >> > >> >>> > > >>>>>>
> > >> > >> >>> > > >>>>>> Thanks,
> > >> > >> >>> > > >>>>>> David
> > >> > >> >>> > > >>>>>>
> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org>
> > >> > >> >>> > > >>>>>> wrote:
> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
> > >> > >> >>> > > >>>>>>> discussing here depending on your thoughts.
> > >> > >> >>> > > >>>>>>>
> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
> > >> > >> >>> > > >>>>>>> <li...@gmail.com>
> > >> > >> >>> > > >>>> wrote:
> > >> > >> >>> > > >>>>>>>
> > >> > >> >>> > > >>>>>>>> Hey Ryan,
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to
> provide a
> > >> > >> >>> > > >>>>>>>> Python
> > >> > >> >>> > strawman.
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you
> > could
> > >> > >> >>> > > >>>>>>>> interleave
> > >> > >> >>> > > >>>> uploads
> > >> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right
> now,
> > >> > >> >>> > > >>>>>>>> synchronous
> > >> > >> >>> > APIs
> > >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client and
> > >> > >> >>> > > >>>>>>>> server
> > >> > >> >>> > > >>>>>>>> wait
> > >> > >> >>> > > >>>>>>>> for
> > >> > >> >>> > each
> > >> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC
> > >> > >> >>> > > >>>>>>>> doesn't
> > >> > >> >>> > > >>>>>>>> give
> > >> > >> >>> > > >>>>>>>> us
> > >> > >> >>> > > >>>>>>>> the
> > >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an
> overall
> > >> > >> >>> > > >>>>>>>> timeout.)
> > >> > >> >>> > > >>>>>>>> As
> > >> > >> >>> > an
> > >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see
> > >> > >> >>> > > >>>>>>>> ARROW-6063:
> > >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> This is mostly tangential though, eventually we
> > >> > >> >>> > > >>>>>>>> will
> > >> > >> >>> > > >>>>>>>> want
> > >> > >> >>> > > >>>>>>>> to
> > >> > >> >>> > design
> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A
> > >> > bidirectional
> > >> > >> >>> > > >>>>>>>> stream
> > >> > >> >>> > > >>>>>>>> like
> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls
> > >> > >> >>> > > >>>>>>>> easier
> > >> > >> >>> > > >>>>>>>> to
> > >> > >> >>> > > >>>>>>>> run
> > >> > >> >>> > into.
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the
> > >> > >> >>> > > >>>>>>>> proposal,
> > >> > but
> > >> > >> >>> > > >>>>>>>> the
> > >> > >> >>> > main
> > >> > >> >>> > > >>>>>>>> concern is that depending on how you deploy, two
> > >> > >> >>> > > >>>>>>>> separate
> > >> > >> >>> > > >>>>>>>> calls
> > >> > >> >>> > > >>>>>>>> could
> > >> > >> >>> > > >>>>>>>> get routed to different instances. Additionally,
> > >> > >> >>> > > >>>>>>>> gRPC
> > >> > >> >>> > > >>>>>>>> has
> > >> > >> >>> > > >>>>>>>> some
> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes away
> in
> > >> > >> >>> > > >>>>>>>> between
> > >> > >> >>> > > >>>>>>>> the
> > >> > >> >>> > two
> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is another
> > >> > instance
> > >> > >> >>> > available,
> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the new
> > server
> > >> > >> without
> > >> > >> >>> > > >>>>>>>> warning.
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> Thanks,
> > >> > >> >>> > > >>>>>>>> David
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com>
> > wrote:
> > >> > >> >>> > > >>>>>>>>> Hey David,
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I
> > like
> > >> > >> >>> > > >>>>>>>>> it
> > >> > >> >>> > > >>>>>>>>> and
> > >> > >> >>> > > >>>>>>>>> the
> > >> > >> >>> > > >>>>>>>>> possibility
> > >> > >> >>> > > >>>>>>>>> of remote compute via arrow buffers. One thing
> > >> > >> >>> > > >>>>>>>>> that
> > >> > >> >>> > > >>>>>>>>> would
> > >> > >> >>> > > >>>>>>>>> help
> > >> > >> >>> > me
> > >> > >> >>> > > >>>>>> would
> > >> > >> >>> > > >>>>>>>> be
> > >> > >> >>> > > >>>>>>>>> a concrete example of the API in a real life
> use
> > >> > >> >>> > > >>>>>>>>> case.
> > >> > >> >>> > > >>>>>>>>> Also,
> > >> > >> >>> > what
> > >> > >> >>> > > >>>>>> would
> > >> > >> >>> > > >>>>>>>> the
> > >> > >> >>> > > >>>>>>>>> client experience be in terms of sync vs async?
> > >> > >> >>> > > >>>>>>>>> Would
> > >> > >> >>> > > >>>>>>>>> the
> > >> > >> >>> > > >>>>>>>>> client
> > >> > >> >>> > > >>>>>>>>> block
> > >> > >> >>> > > >>>>>>>> till
> > >> > >> >>> > > >>>>>>>>> the bidirectional call return ie c =
> > >> > >> flight.vector_mult(a,
> > >> > >> >>> > > >>>>>>>>> b)
> > >> > >> >>> > or
> > >> > >> >>> > > >>>>>>>>> would
> > >> > >> >>> > > >>>>>>>> the
> > >> > >> >>> > > >>>>>>>>> client wait to be signaled that computation was
> > >> > >> >>> > > >>>>>>>>> done.
> > >> > >> >>> > > >>>>>>>>> If
> > >> > >> >>> > > >>>>>>>>> the
> > >> > >> >>> > > >>>>>>>>> latter,
> > >> > >> >>> > > >>>>>>>>> how
> > >> > >> >>> > > >>>>>>>>> is
> > >> > >> >>> > > >>>>>>>>> that different from a DoPut then DoGet? I
> suppose
> > >> > >> >>> > > >>>>>>>>> that
> > >> > >> >>> > > >>>>>>>>> this
> > >> > >> >>> > could
> > >> > >> >>> > > >>>> be
> > >> > >> >>> > > >>>>>>>>> implemented without extending the RPC interface
> > >> > >> >>> > > >>>>>>>>> but
> > >> > >> rather
> > >> > >> >>> > > >>>>>>>>> by a
> > >> > >> >>> > > >>>>>>>>> function/util?
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> Best,
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> Ryan
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
> > >> > >> >>> > li.davidm96@gmail.com>
> > >> > >> >>> > > >>>>>> wrote:
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>> Hi all,
> > >> > >> >>> > > >>>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so
> > >> > >> >>> > > >>>>>>>>>> far,
> > >> > but
> > >> > >> we
> > >> > >> >>> > > >>>>>>>>>> have
> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon:
> being
> > >> > >> >>> > > >>>>>>>>>> able
> > >> > >> >>> > > >>>>>>>>>> to
> > >> > >> >>> > > >>>>>>>>>> both
> > >> > >> >>> > > >>>>>>>>>> send
> > >> > >> >>> > > >>>>>>>>>> and
> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call.
> To
> > >> > >> >>> > > >>>>>>>>>> that
> > >> > >> >>> > > >>>>>>>>>> end,
> > >> > >> >>> > I've
> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
> > >> > >> >>> > > >>>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> > >> > >> >>> > > >>>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or
> comment
> > >> > >> >>> > > >>>>>>>>>> on
> > >> > the
> > >> > >> >>> > document.
> > >> > >> >>> > > >>>>>>>>>> I'd
> > >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a
> > >> > >> >>> > > >>>>>>>>>> relatively
> > >> > >> >>> > > >>>>>>>>>> straightforward
> > >> > >> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> > >> > >> >>> > > >>>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a
> > vote.
> > >> > I've
> > >> > >> >>> > > >>>>>>>>>> decided
> > >> > >> >>> > > >>>>>>>>>> to
> > >> > >> >>> > > >>>>>>>>>> table the other format change I had proposed
> (on
> > >> > >> >>> > > >>>>>>>>>> DoPut),
> > >> > >> >>> > > >>>>>>>>>> as
> > >> > >> >>> > > >>>>>>>>>> it
> > >> > >> >>> > > >>>>>> doesn't
> > >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the
> > >> > >> >>> > > >>>>>>>>>> interpretation
> > >> > of
> > >> > >> >>> > > >>>>>>>>>> the
> > >> > >> >>> > > >>>>>>>>>> semantics.
> > >> > >> >>> > > >>>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>> Thanks,
> > >> > >> >>> > > >>>>>>>>>> David
> > >> > >> >>> > > >>>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> --
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>>
> > >> > >> >>> > > >>>>>>>>
> > >> > >> >>> > > >>>>>>>
> > >> > >> >>> > > >>>>>>
> > >> > >> >>> > > >>>>>
> > >> > >> >>> > > >>>>
> > >> > >> >>> > > >>>
> > >> > >> >>> > > >
> > >> > >> >>> >
> > >> > >> >>
> > >> > >> >
> > >> > >>
> > >> > >
> > >> >
> > >
> >
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Jacques Nadeau <ja...@apache.org>.
I support moving forward with the current proposal.

On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:

> Just following up here again, any other thoughts?
>
> I think we do have justifications for potentially separate streams in
> a call, but that's more of an orthogonal question - it doesn't need to
> be addressed here. I do agree that it very much complicates things.
>
> Thanks,
> David
>
> On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> > I would generally agree with this. Note that you have the possibility
> > to use unions-of-structs to send record batches with different schemas
> > in the same stream, though with some added complexity on each side
> >
> > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
> >>
> >> I'd vote for explicitly not supported. We should keep our primitives
> >> narrow.
> >>
> >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
> >>
> >> > Thanks for the feedback.
> >> >
> >> > I do think if we had explicitly embraced gRPC from the beginning,
> >> > there are a lot of places where things could be made more ergonomic,
> >> > including with the metadata fields. But it would also have locked us
> >> > out of potential future transports.
> >> >
> >> > On another note: I hesitate to put too much into this method, but we
> >> > are looking at use cases where potentially, a client may want to
> >> > upload multiple distinct datasets (with differing schemas). (This is a
> >> > little tentative, and I can get more details...) Right now, each
> >> > logical stream in Flight must have a single, consistent schema; would
> >> > it make sense to look at ways to relax this, or declare this
> >> > explicitly out of scope (and require multiple calls and coordination
> >> > with the deployment topology) in order to accomplish this?
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > > Fair enough. I'm okay with the bytes approach and the proposal looks
> >> > > good to me.
> >> > >
> >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
> >> > >
> >> > >> I've updated the proposal.
> >> > >>
> >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> >> > >> errors/metadata, I still think using bytes is preferable:
> >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
> >> > >> types,
> >> > >> - We wouldn't be able to practically expose the Protobuf field to C++
> >> > >> users without causing build pains,
> >> > >> - We can't let Python users take advantage of the Protobuf field
> >> > >> without somehow being compatible with the Protobuf wheels (by linking
> >> > >> to the same version, and doing magic to turn the C++ Protobufs into
> >> > >> the Python ones),
> >> > >> - All our other application-defined fields are already bytes.
> >> > >>
> >> > >> Applications that want structure can encode JSON or Protobuf Any into
> >> > >> the bytes field themselves, much as you can already do for Ticket,
> >> > >> commands in FlightDescriptors, and application metadata in
> >> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
> >> > >> Any directly, since Any itself is a bytes field with a tag, and must
> >> > >> invoke the Protobuf deserializer again to read the actual message.
> >> > >>
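A minimal sketch of the "encode structure into the bytes field yourself" idea described above, using only the Python standard library. The helper names (pack_metadata, unpack_metadata) and the "type"/"data" envelope keys are hypothetical and not part of any Flight API; the envelope simply imitates the type-tag-plus-payload shape attributed to Any in this thread, with JSON standing in for Protobuf.

```python
import json
from typing import Any, Dict, Tuple


def pack_metadata(type_name: str, payload: Dict[str, Any]) -> bytes:
    """Serialize a typed, structured payload into one opaque bytes value.

    The result is what an application would place in a bytes-typed
    metadata field; the transport never needs to understand it.
    """
    envelope = {"type": type_name, "data": payload}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def unpack_metadata(raw: bytes) -> Tuple[str, Dict[str, Any]]:
    """Recover the type tag and payload written by pack_metadata."""
    envelope = json.loads(raw.decode("utf-8"))
    return envelope["type"], envelope["data"]


# Hypothetical usage: the type name and payload are made up for illustration.
raw = pack_metadata("example.Progress", {"rows_written": 128})
type_name, payload = unpack_metadata(raw)
```

The receiving side decides how to interpret the payload from the type tag, which is the same second decoding step that a reader of an Any field would have to perform.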
> >> > >> If we decide on using bytes, then I don't think it makes sense to
> >> > >> define a new message with a oneof either, since it would be redundant.
> >> > >>
> >> > >> Thanks,
> >> > >> David
> >> > >>
> >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> >> > >> > I've been extremely backlogged, I will update the proposal when I get
> >> > >> > a chance and reply here when done.
> >> > >> >
> >> > >> > Best,
> >> > >> > David
> >> > >> >
> >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >> > >> >> Bumping this discussion since a couple of weeks have passed. It seems
> >> > >> >> there are still some questions here, could we summarize what are the
> >> > >> >> alternatives along with any public API implications so we can try to
> >> > >> >> render a decision?
> >> > >> >>
> >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.davidm96@gmail.com> wrote:
> >> > >> >>>
> >> > >> >>> Hi Wes,
> >> > >> >>>
> >> > >> >>> Responses inline:
> >> > >> >>>
> >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
> >> > >> >>>
> >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
> >> > >> >>> > >
> >> > >> >>> > > The question is whether to repurpose the existing FlightData
> >> > >> >>> > > structure, and allow for the metadata field to be filled in and data
> >> > >> >>> > > fields to be blank (as a control message), or to wrap the FlightData
> >> > >> >>> > > structure in another structure that explicitly distinguishes between
> >> > >> >>> > > control and data messages.
> >> > >> >>> >
> >> > >> >>> > I'm not super against having metadata-only FlightData with empty body.
> >> > >> >>> > One question to consider is what changes (if any) would need to be
> >> > >> >>> > made to public APIs in either scenario.
> >> > >> >>> >
> >> > >> >>>
> >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data messages in
> >> > >> >>> the future. This would be a breaking change, but wouldn't change the wire
> >> > >> >>> format. I think the APIs could be changed backwards compatibly, though.
> >> > >> >>>
> >> > >> >>>
> >> > >> >>>
> >> > >> >>> > > The other question is how to handle the metadata fields. So far, we've
> >> > >> >>> > > used bytestring fields for application-defined data. This is workable
> >> > >> >>> > > if you want to use Protobuf to define the contents of those fields,
> >> > >> >>> > > but requires you to pack/unpack your Protobuf into/from the bytestring
> >> > >> >>> > > field. If we instead used the Protobuf Any field, a dynamically typed
> >> > >> >>> > > field, this would be more convenient, but then we'd be exposing
> >> > >> >>> > > Protobuf types. We could alternatively use a combination of a type
> >> > >> >>> > > field and a bytestring field, mimicking what the Protobuf Any type
> >> > >> >>> > > looks like on the wire. I'm not sure this is actually cleaner in any
> >> > >> >>> > > of the language APIs, though.
> >> > >> >>> >
> >> > >> >>> > Leaving the deserialization of the app metadata to the particular
> >> > >> >>> > Flight implementation seems on first principles like the most flexible
> >> > >> >>> > thing, if Any is used, does that mean the metadata _must_ be a
> >> > >> >>> > protobuf?
> >> > >> >>> >
> >> > >> >>>
> >> > >> >>>
> >> > >> >>> If Any is used, we could still expose a bytes-based API, but it would have
> >> > >> >>> some more wrapping. (We could put a ByteString in Any.) Then the question
> >> > >> >>> would just be how to expose this (would be easier in Java, harder in C++).
> >> > >> >>>
> >> > >> >>>
> >> > >> >>>
> >> > >> >>> > > David
> >> > >> >>> > >
> >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> >> > >> >>> > > >
> >> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf
> >> > >> >>> > > > terms? Knowledge of protobuf shouldn't be required to use Flight.
> >> > >> >>> > > >
> >> > >> >>> > > > Regards
> >> > >> >>> > > >
> >> > >> >>> > > > Antoine.
> >> > >> >>> > > >
> >> > >> >>> > > >
> >> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
> >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
> >> > >> >>> > > >> application-level logic. (The official guide doesn't even mention it
> >> > >> >>> > > >> in the encoding docs; I found
> >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >> > >> >>> > > >> as well.)
> >> > >> >>> > > >>
> >> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially inlining
> >> > >> >>> > > >> the definition of Any, e.g.
> >> > >> >>> > > >>
> >> > >> >>> > > >> message FlightMessage {
> >> > >> >>> > > >>   oneof message {
> >> > >> >>> > > >>     FlightData data = 1;
> >> > >> >>> > > >>     FlightAny metadata = 2;
> >> > >> >>> > > >>   }
> >> > >> >>> > > >> }
> >> > >> >>> > > >>
> >> > >> >>> > > >> message FlightAny {
> >> > >> >>> > > >>   string type = 1;
> >> > >> >>> > > >>   bytes data = 2;
> >> > >> >>> > > >> }
> >> > >> >>> > > >>
> >> > >> >>> > > >> Is this correct?
> >> > >> >>> > > >>
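As a rough illustration of how the wrapper message above might look from application code, here is a small Python sketch that models the oneof as a class holding exactly one of two optional fields, plus a handler whose default for control messages is a no-op (as suggested elsewhere in this thread). All class and method names are hypothetical; this is not the Flight API. The sketch is also stricter than Protobuf, which permits a oneof with no member set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FlightData:
    """Stand-in for a data message: a serialized record batch body."""
    body: bytes
    app_metadata: bytes = b""


@dataclass(frozen=True)
class FlightAny:
    """Stand-in for the inlined Any: a type tag plus an opaque payload."""
    type: str
    data: bytes


@dataclass(frozen=True)
class FlightMessage:
    """Wrapper carrying either a data message or a control message."""
    data: Optional[FlightData] = None
    metadata: Optional[FlightAny] = None

    def __post_init__(self) -> None:
        # Assumption of this sketch: exactly one member must be set.
        if (self.data is None) == (self.metadata is None):
            raise ValueError("exactly one of data or metadata must be set")


class ExchangeHandler:
    """Routes each incoming message to the matching callback."""

    def __init__(self) -> None:
        self.batches: List[bytes] = []

    def on_data(self, data: FlightData) -> None:
        self.batches.append(data.body)

    def on_control(self, control: FlightAny) -> None:
        # Default behavior: ignore control messages the app does not know.
        pass

    def dispatch(self, message: FlightMessage) -> None:
        if message.data is not None:
            self.on_data(message.data)
        else:
            self.on_control(message.metadata)


handler = ExchangeHandler()
handler.dispatch(FlightMessage(data=FlightData(body=b"batch-0")))
handler.dispatch(FlightMessage(metadata=FlightAny(type="ping", data=b"")))
```

An application that cares about control messages would override on_control and branch on the type tag; applications that do not can ignore the field entirely.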
> >> > >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut as
> >> > >> >>> > > >> well, but at that point, I'd rather we be consistent with all of them,
> >> > >> >>> > > >> rather than have one of the three methods do its own thing.
> >> > >> >>> > > >>
> >> > >> >>> > > >> Thanks,
> >> > >> >>> > > >> David
> >> > >> >>> > > >>
> >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > >> >>> > > >>> I think we could probably expose the oneof behavior without exposing
> >> > >> >>> > > >>> the protobuf functions. On the any... hmm. I guess we could expose as
> >> > >> >>> > > >>> two fields: type and data. Then users could use it for whatever but if
> >> > >> >>> > > >>> people wanted to treat it as any, it would work. (Basically a user
> >> > >> >>> > > >>> could use any with it easily but they could also use any other
> >> > >> >>> > > >>> mechanism). At least in java, the any concepts are pretty simple/diy.
> >> > >> >>> > > >>> Are other language bindings less diy?
> >> > >> >>> > > >>>
> >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but it just
> >> > >> >>> > > >>> seemed a bit janky.
> >> > >> >>> > > >>>
> >> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I wonder if
> >> > >> >>> > > >>> we should redefine DoPut and DoGet to have the same property if we
> >> > >> >>> > > >>> think it is a good idea...
> >> > >> >>> > > >>>
> >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com> wrote:
> >> > >> >>> > > >>>
> >> > >> >>> > > >>>> I was definitely considering having control messages without data, and
> >> > >> >>> > > >>>> I thought that could be encoded by a FlightData with only app_metadata
> >> > >> >>> > > >>>> set. I think I understand your position now: FlightData should always
> >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> That makes sense to me, and is consistent with the documentation on
> >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried about having a
> >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from happening, and
> >> > >> >>> > > >>>> overall having a clear separation between data and control messages is
> >> > >> >>> > > >>>> cleaner.
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
> >> > >> >>> > > >>>> Protobuf by using bytes, would we want to change that now?
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> Best,
> >> > >> >>> > > >>>> David
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > >> >>> > > >>>>> Hey David,
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for doget/doput
> >> > >> >>> > > >>>>> for async. Yes, more thinking java given java grpc's async always
> >> > >> >>> > > >>>>> pattern.
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>> On the comment around the FlightData, I think it is overloading the
> >> > >> >>> > > >>>>> message to use metadata for this. If I want to send a control message
> >> > >> >>> > > >>>>> independently of the data message, I would have to define something
> >> > >> >>> > > >>>>> like an empty flight data message that has custom metadata. Why not
> >> > >> >>> > > >>>>> support a container object with a oneof{FlightData, Any} in it instead
> >> > >> >>> > > >>>>> so users can add more data as desired. The default impl could be a
> >> > >> >>> > > >>>>> noop for the Any messages.
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>>> Hi Jacques,
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> Thanks for the comments.
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
> >> > >> >>> > > >>>>>> proposals, so I don't think we need a new message to carry that kind
> >> > >> >>> > > >>>>>> of information.
> >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
> >> > >> >>> > > >>>>>> messages as the fundamental API; it would actually be quite natural
> >> > >> >>> > > >>>>>> to implement in Flight/Java. I will note that it's not possible in
> >> > >> >>> > > >>>>>> C++/Python without spawning a thread, though. (In essence, gRPC-Java
> >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are experimental
> >> > >> >>> > > >>>>>> C++ APIs that would let us do something similar to Java, but those
> >> > >> >>> > > >>>>>> are only in relatively recent gRPC versions and are still under
> >> > >> >>> > > >>>>>> development (contrary to the interceptor APIs which have been around
> >> > >> >>> > > >>>>>> for quite a while).
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> Thanks,
> >> > >> >>> > > >>>>>> David
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
> >> > >> >>> > > >>>>>>> discussing here depending on your thoughts.
> >> > >> >>> > > >>>>>>>
> >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
> >> > >> >>> > > >>>>>>>
> >> > >> >>> > > >>>>>>>> Hey Ryan,
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Thanks for the comments.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave
> >> > >> >>> > > >>>>>>>> uploads and downloads if you were so inclined. Right now,
> >> > >> >>> > > >>>>>>>> synchronous APIs make this error-prone, e.g. if both client and
> >> > >> >>> > > >>>>>>>> server wait for each other due to an application logic bug. (gRPC
> >> > >> >>> > > >>>>>>>> doesn't give us the ability to have per-read timeouts, only an
> >> > >> >>> > > >>>>>>>> overall timeout.) As an example of this happening with DoPut, see
> >> > >> >>> > > >>>>>>>> ARROW-6063:
> >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> This is mostly tangential though, eventually we will want to design
> >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream
> >> > >> >>> > > >>>>>>>> like this (and like DoPut) just makes these pitfalls easier to run
> >> > >> >>> > > >>>>>>>> into.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
> >> > >> >>> > > >>>>>>>> concern is that depending on how you deploy, two separate calls
> >> > >> >>> > > >>>>>>>> could get routed to different instances. Additionally, gRPC has
> >> > >> >>> > > >>>>>>>> some reconnection behaviors; if the server goes away in between the
> >> > >> >>> > > >>>>>>>> two calls, but it then restarts or there is another instance
> >> > >> >>> > > >>>>>>>> available, the client will happily reconnect to the new server
> >> > >> >>> > > >>>>>>>> without warning.
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>> Thanks,
> >> > >> >>> > > >>>>>>>> David
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> >> > >> >>> > > >>>>>>>>> Hey David,
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
> >> > >> >>> > > >>>>>>>>> possibility of remote compute via Arrow buffers. One thing that
> >> > >> >>> > > >>>>>>>>> would help me would be a concrete example of the API in a
> >> > >> >>> > > >>>>>>>>> real-life use case. Also, what would the client experience be in
> >> > >> >>> > > >>>>>>>>> terms of sync vs async? Would the client block until the
> >> > >> >>> > > >>>>>>>>> bidirectional call returns, i.e. c = flight.vector_mult(a, b), or
> >> > >> >>> > > >>>>>>>>> would the client wait to be signaled that computation was done?
> >> > >> >>> > > >>>>>>>>> If the latter, how is that different from a DoPut then DoGet? I
> >> > >> >>> > > >>>>>>>>> suppose that this could be implemented without extending the RPC
> >> > >> >>> > > >>>>>>>>> interface but rather by a function/util?
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> Best,
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> Ryan
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> Hi all,
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
> >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to both send and
> >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end, I've
> >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document. I'd
> >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively straightforward
> >> > >> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
> >> > >> >>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it doesn't
> >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the interpretation of the
> >> > >> >>> > > >>>>>>>>>> semantics.
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>> Thanks,
> >> > >> >>> > > >>>>>>>>>> David
> >> > >> >>> > > >>>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>>
> >> > >> >>> > > >>>>>>>>
> >> > >> >>> > > >>>>>>>
> >> > >> >>> > > >>>>>>
> >> > >> >>> > > >>>>>
> >> > >> >>> > > >>>>
> >> > >> >>> > > >>>
> >> > >> >>> > > >
> >> > >> >>> >
> >> > >> >>
> >> > >> >
> >> > >>
> >> > >
> >> >
> >
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Just following up here again, any other thoughts?

I think we do have justifications for potentially separate streams in
a call, but that's more of an orthogonal question - it doesn't need to
be addressed here. I do agree that it very much complicates things.

Thanks,
David

On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> I would generally agree with this. Note that you have the possibility
> to use unions-of-structs to send record batches with different schemas
> in the same stream, though with some added complexity on each side
>
> On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
>>
>> I'd vote for explicitly not supported. We should keep our primitives
>> narrow.
>>
>> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
>>
>> > Thanks for the feedback.
>> >
>> > I do think if we had explicitly embraced gRPC from the beginning,
>> > there are a lot of places where things could be made more ergonomic,
>> > including with the metadata fields. But it would also have locked us
>> > out of potential future transports.
>> >
>> > On another note: I hesitate to put too much into this method, but we
>> > are looking at use cases where potentially, a client may want to
>> > upload multiple distinct datasets (with differing schemas). (This is a
>> > little tentative, and I can get more details...) Right now, each
>> > logical stream in Flight must have a single, consistent schema; would
>> > it make sense to look at ways to relax this, or declare this
>> > explicitly out of scope (and require multiple calls and coordination
>> > with the deployment topology) in order to accomplish this?
>> >
>> > Best,
>> > David
>> >
>> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > > Fair enough. I'm okay with the bytes approach and the proposal looks
>> > > good to me.
>> > >
>> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
>> > >
>> > >> I've updated the proposal.
>> > >>
>> > >> On the subject of Protobuf Any vs bytes, and how to handle
>> > >> errors/metadata, I still think using bytes is preferable:
>> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
>> > >> types,
>> > >> - We wouldn't be able to practically expose the Protobuf field to C++
>> > >> users without causing build pains,
>> > >> - We can't let Python users take advantage of the Protobuf field
>> > >> without somehow being compatible with the Protobuf wheels (by linking
>> > >> to the same version, and doing magic to turn the C++ Protobufs into
>> > >> the Python ones),
>> > >> - All our other application-defined fields are already bytes.
>> > >>
>> > >> Applications that want structure can encode JSON or Protobuf Any into
>> > >> the bytes field themselves, much as you can already do for Ticket,
>> > >> commands in FlightDescriptors, and application metadata in
>> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
>> > >> Any directly, since Any itself is a bytes field with a tag, and must
>> > >> invoke the Protobuf deserializer again to read the actual message.
>> > >>
>> > >> If we decide on using bytes, then I don't think it makes sense to
>> > >> define a new message with a oneof either, since it would be redundant.
>> > >>
>> > >> Thanks,
>> > >> David
>> > >>
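
As a rough illustration of the bytes approach argued for above: an
application can give an opaque bytes field its own structure by packing a
type tag next to a serialized payload, much as Any pairs a type name with
bytes. The sketch below uses only the Python standard library, with JSON as
the payload encoding. pack_metadata, unpack_metadata, and the tag
"example.Progress" are hypothetical names chosen for this example; none of
them are part of Flight.

```python
import json
import struct
from typing import Any, Tuple


def pack_metadata(type_name: str, payload: Any) -> bytes:
    """Encode (type_name, payload) into one opaque bytes value (hypothetical helper)."""
    name = type_name.encode("utf-8")
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    # Layout: 4-byte big-endian length of the type name, the name, then the JSON body.
    return struct.pack(">I", len(name)) + name + body


def unpack_metadata(blob: bytes) -> Tuple[str, Any]:
    """Inverse of pack_metadata: recover the type tag and the decoded payload."""
    (name_len,) = struct.unpack_from(">I", blob, 0)
    name = blob[4:4 + name_len].decode("utf-8")
    payload = json.loads(blob[4 + name_len:].decode("utf-8"))
    return name, payload


# The resulting bytes could travel in any application-defined bytes field.
app_metadata = pack_metadata("example.Progress", {"rows_written": 1024})
kind, value = unpack_metadata(app_metadata)
```

The receiver can switch on the tag before decoding the payload, so the
transport only ever sees bytes.
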
>> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
>> > >> > I've been extremely backlogged, I will update the proposal when I get
>> > >> > a chance and reply here when done.
>> > >> >
>> > >> > Best,
>> > >> > David
>> > >> >
>> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
>> > >> >> Bumping this discussion since a couple of weeks have passed. It seems
>> > >> >> there are still some questions here, could we summarize what are the
>> > >> >> alternatives along with any public API implications so we can try to
>> > >> >> render a decision?
>> > >> >>
>> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com> wrote:
>> > >> >>>
>> > >> >>> Hi Wes,
>> > >> >>>
>> > >> >>> Responses inline:
>> > >> >>>
>> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
>> > >> >>>
>> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
>> > >> >>> > >
>> > >> >>> > > The question is whether to repurpose the existing FlightData
>> > >> >>> > > structure, and allow for the metadata field to be filled in and data
>> > >> >>> > > fields to be blank (as a control message), or to wrap the FlightData
>> > >> >>> > > structure in another structure that explicitly distinguishes between
>> > >> >>> > > control and data messages.
>> > >> >>> >
>> > >> >>> > I'm not super against having metadata-only FlightData with empty body.
>> > >> >>> > One question to consider is what changes (if any) would need to be
>> > >> >>> > made to public APIs in either scenario.
>> > >> >>> >
>> > >> >>>
>> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data messages in
>> > >> >>> the future. This would be a breaking change, but wouldn't change the wire
>> > >> >>> format. I think the APIs could be changed backwards compatibly, though.
>> > >> >>>
>> > >> >>>
>> > >> >>>
>> > >> >>> > > The other question is how to handle the metadata fields. So far, we've
>> > >> >>> > > used bytestring fields for application-defined data. This is workable
>> > >> >>> > > if you want to use Protobuf to define the contents of those fields,
>> > >> >>> > > but requires you to pack/unpack your Protobuf into/from the bytestring
>> > >> >>> > > field. If we instead used the Protobuf Any field, a dynamically typed
>> > >> >>> > > field, this would be more convenient, but then we'd be exposing
>> > >> >>> > > Protobuf types. We could alternatively use a combination of a type
>> > >> >>> > > field and a bytestring field, mimicking what the Protobuf Any type
>> > >> >>> > > looks like on the wire. I'm not sure this is actually cleaner in any
>> > >> >>> > > of the language APIs, though.
>> > >> >>> >
>> > >> >>> > Leaving the deserialization of the app metadata to the particular
>> > >> >>> > Flight implementation seems on first principles like the most flexible
>> > >> >>> > thing. If Any is used, does that mean the metadata _must_ be a
>> > >> >>> > protobuf?
>> > >> >>> >
>> > >> >>>
>> > >> >>>
>> > >> >>> If Any is used, we could still expose a bytes-based API, but it would
>> > >> >>> have some more wrapping. (We could put a ByteString in Any.) Then the
>> > >> >>> question would just be how to expose this (would be easier in Java,
>> > >> >>> harder in C++).
>> > >> >>>
>> > >> >>>
>> > >> >>>
>> > >> >>> > > David
>> > >> >>> > >
>> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
>> > >> >>> > > >
>> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf terms?
>> > >> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
>> > >> >>> > > >
>> > >> >>> > > > Regards
>> > >> >>> > > >
>> > >> >>> > > > Antoine.
>> > >> >>> > > >
>> > >> >>> > > >
>> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
>> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
>> > >> >>> > > >> application-level logic. (The official guide doesn't even mention it
>> > >> >>> > > >> in the encoding docs; I found
>> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> > >> >>> > > >> as well.)
>> > >> >>> > > >>
>> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially inlining
>> > >> >>> > > >> the definition of Any, e.g.
>> > >> >>> > > >>
>> > >> >>> > > >> message FlightMessage {
>> > >> >>> > > >>   oneof message {
>> > >> >>> > > >>     FlightData data = 1;
>> > >> >>> > > >>     FlightAny metadata = 2;
>> > >> >>> > > >>   }
>> > >> >>> > > >> }
>> > >> >>> > > >>
>> > >> >>> > > >> message FlightAny {
>> > >> >>> > > >>   string type = 1;
>> > >> >>> > > >>   bytes data = 2;
>> > >> >>> > > >> }
>> > >> >>> > > >>
>> > >> >>> > > >> Is this correct?
>> > >> >>> > > >>
>> > >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut as
>> > >> >>> > > >> well, but at that point, I'd rather we be consistent with all of them,
>> > >> >>> > > >> rather than have one of the three methods do its own thing.
>> > >> >>> > > >>
>> > >> >>> > > >> Thanks,
>> > >> >>> > > >> David
>> > >> >>> > > >>
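
For readers who would rather not read Protobuf, the wrapper idea in the
definitions above can be modelled as a container that holds exactly one of
a data message or a typed control message. This is a plain-Python sketch,
not generated Protobuf code. The class names mirror the message definitions
above; data_body (standing in for the contents of FlightData) and dispatch
are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FlightAny:
    """Typed control payload: a type name plus opaque bytes."""
    type: str
    data: bytes


@dataclass(frozen=True)
class FlightMessage:
    """Holds exactly one of a data body or a control message (oneof-like)."""
    data_body: Optional[bytes] = None      # stand-in for FlightData contents
    metadata: Optional[FlightAny] = None   # control message

    def __post_init__(self) -> None:
        # Reject "both set" and "neither set", mimicking oneof semantics.
        if (self.data_body is None) == (self.metadata is None):
            raise ValueError("exactly one of data_body/metadata must be set")


def dispatch(
    msg: FlightMessage,
    on_data: Callable[[bytes], Any],
    on_control: Callable[[FlightAny], Any] = lambda m: None,
) -> Any:
    """Route a message to the matching handler; control handling defaults to a no-op."""
    if msg.data_body is not None:
        return on_data(msg.data_body)
    return on_control(msg.metadata)
```

The default no-op control handler follows the suggestion elsewhere in this
thread that the default implementation could ignore Any messages.
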
>> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > >> >>> > > >>> I think we could probably expose the oneof behavior without exposing the
>> > >> >>> > > >>> protobuf functions. On the any... hmm. I guess we could expose as two
>> > >> >>> > > >>> fields: type and data. Then users could use it for whatever but if people
>> > >> >>> > > >>> wanted to treat it as any, it would work. (Basically a user could use any
>> > >> >>> > > >>> with it easily but they could also use any other mechanism). At least in
>> > >> >>> > > >>> java, the any concepts are pretty simple/diy. Are other language bindings
>> > >> >>> > > >>> less diy?
>> > >> >>> > > >>>
>> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but it just
>> > >> >>> > > >>> seemed a bit janky.
>> > >> >>> > > >>>
>> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I wonder if we
>> > >> >>> > > >>> should redefine DoPut and DoGet to have the same property if we think it
>> > >> >>> > > >>> is a good idea...
>> > >> >>> > > >>>
>> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com> wrote:
>> > >> >>> > > >>>
>> > >> >>> > > >>>> I was definitely considering having control messages without data, and
>> > >> >>> > > >>>> I thought that could be encoded by a FlightData with only app_metadata
>> > >> >>> > > >>>> set. I think I understand your position now: FlightData should always
>> > >> >>> > > >>>> carry (some) data (with optional metadata)?
>> > >> >>> > > >>>>
>> > >> >>> > > >>>> That makes sense to me, and is consistent with the documentation on
>> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried about having a
>> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from happening, and
>> > >> >>> > > >>>> overall having a clear separation between data and control messages is
>> > >> >>> > > >>>> cleaner.
>> > >> >>> > > >>>>
>> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
>> > >> >>> > > >>>> Protobuf by using bytes, would we want to change that now?
>> > >> >>> > > >>>>
>> > >> >>> > > >>>> Best,
>> > >> >>> > > >>>> David
>> > >> >>> > > >>>>
>> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > >> >>> > > >>>>> Hey David,
>> > >> >>> > > >>>>>
>> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for doget/doput for
>> > >> >>> > > >>>>> async. Yes, more thinking java given java grpc's async always pattern.
>> > >> >>> > > >>>>>
>> > >> >>> > > >>>>> On the comment around the FlightData, I think it is overloading the message
>> > >> >>> > > >>>>> to use metadata for this. If I want to send a control message independently
>> > >> >>> > > >>>>> of the data message, I would have to define something like an empty flight
>> > >> >>> > > >>>>> data message that has custom metadata. Why not support a container object
>> > >> >>> > > >>>>> with a oneof{FlightData, Any} in it instead so users can add more data as
>> > >> >>> > > >>>>> desired. The default impl could be a noop for the Any messages.
>> > >> >>> > > >>>>>
>> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
>> > >> >>> > > >>>>>
>> > >> >>> > > >>>>>> Hi Jacques,
>> > >> >>> > > >>>>>>
>> > >> >>> > > >>>>>> Thanks for the comments.
>> > >> >>> > > >>>>>>
>> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
>> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
>> > >> >>> > > >>>>>> proposals, so I don't think we need a new message to carry that kind
>> > >> >>> > > >>>>>> of information.
>> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
>> > >> >>> > > >>>>>> messages as the fundamental API; it would actually be quite natural to
>> > >> >>> > > >>>>>> implement in Flight/Java. I will note that it's not possible in
>> > >> >>> > > >>>>>> C++/Python without spawning a thread, though. (In essence, gRPC-Java
>> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are experimental
>> > >> >>> > > >>>>>> C++ APIs that would let us do something similar to Java, but those are
>> > >> >>> > > >>>>>> only in relatively recent gRPC versions and are still under
>> > >> >>> > > >>>>>> development (contrary to the interceptor APIs which have been around
>> > >> >>> > > >>>>>> for quite a while).
>> > >> >>> > > >>>>>>
>> > >> >>> > > >>>>>> Thanks,
>> > >> >>> > > >>>>>> David
>> > >> >>> > > >>>>>>
>> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth discussion
>> > >> >>> > > >>>>>>> here depending on your thoughts.
>> > >> >>> > > >>>>>>>
>> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
>> > >> >>> > > >>>>>>>
>> > >> >>> > > >>>>>>>> Hey Ryan,
>> > >> >>> > > >>>>>>>>
>> > >> >>> > > >>>>>>>> Thanks for the comments.
>> > >> >>> > > >>>>>>>>
>> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
>> > >> >>> > > >>>>>>>>
>> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave uploads
>> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right now, synchronous APIs
>> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait for each
>> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us the
>> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) As an
>> > >> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
>> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>> > >> >>> > > >>>>>>>>
>> > >> >>> > > >>>>>>>> This is mostly tangential though, eventually we will want to design
>> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream like
>> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run into.
>> > >> >>> > > >>>>>>>>
>> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
>> > >> >>> > > >>>>>>>> concern is that depending on how you deploy, two separate calls could
>> > >> >>> > > >>>>>>>> get routed to different instances. Additionally, gRPC has some
>> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in between the two
>> > >> >>> > > >>>>>>>> calls, but it then restarts or there is another instance available,
>> > >> >>> > > >>>>>>>> the client will happily reconnect to the new server without warning.
>> > >> >>> > > >>>>>>>>
>> > >> >>> > > >>>>>>>> Thanks,
>> > >> >>> > > >>>>>>>> David
>> > >> >>> > > >>>>>>>>
>> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
>> > >> >>> > > >>>>>>>>> Hey David,
>> > >> >>> > > >>>>>>>>>
>> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
>> > >> >>> > > >>>>>>>>> possibility of remote compute via Arrow buffers. One thing that would
>> > >> >>> > > >>>>>>>>> help me would be a concrete example of the API in a real-life use
>> > >> >>> > > >>>>>>>>> case. Also, what would the client experience be in terms of sync vs
>> > >> >>> > > >>>>>>>>> async? Would the client block until the bidirectional call returns,
>> > >> >>> > > >>>>>>>>> i.e. c = flight.vector_mult(a, b), or would the client wait to be
>> > >> >>> > > >>>>>>>>> signaled that computation was done? If the latter, how is that
>> > >> >>> > > >>>>>>>>> different from a DoPut then DoGet? I suppose that this could be
>> > >> >>> > > >>>>>>>>> implemented without extending the RPC interface but rather by a
>> > >> >>> > > >>>>>>>>> function/util?
>> > >> >>> > > >>>>>>>>>
>> > >> >>> > > >>>>>>>>>
>> > >> >>> > > >>>>>>>>> Best,
>> > >> >>> > > >>>>>>>>>
>> > >> >>> > > >>>>>>>>> Ryan
>> > >> >>> > > >>>>>>>>>
>> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
>> > >> >>> > > >>>>>>>>>
>> > >> >>> > > >>>>>>>>>> Hi all,
>> > >> >>> > > >>>>>>>>>>
>> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
>> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to both send and
>> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end, I've
>> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
>> > >> >>> > > >>>>>>>>>>
>> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>> > >> >>> > > >>>>>>>>>>
>> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document. I'd
>> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively straightforward
>> > >> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
>> > >> >>> > > >>>>>>>>>>
>> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
>> > >> >>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it doesn't
>> > >> >>> > > >>>>>>>>>> functionally change Flight, just the interpretation of the
>> > >> >>> > > >>>>>>>>>> semantics.
>> > >> >>> > > >>>>>>>>>>
>> > >> >>> > > >>>>>>>>>> Thanks,
>> > >> >>> > > >>>>>>>>>> David
>> > >> >>> > > >>>>>>>>>>
>> > >> >>> > > >>>>>>>>>
>> > >> >>> > > >>>>>>>>>
>> > >> >>> > > >>>>>>>>
>> > >> >>> > > >>>>>>>
>> > >> >>> > > >>>>>>
>> > >> >>> > > >>>>>
>> > >> >>> > > >>>>
>> > >> >>> > > >>>
>> > >> >>> > > >
>> > >> >>> >
>> > >> >>
>> > >> >
>> > >>
>> > >
>> >
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
I would generally agree with this. Note that you have the possibility
to use unions-of-structs to send record batches with different schemas
in the same stream, though with some added complexity on each side.
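
To make the union-of-structs idea a little more concrete, here is a minimal
pure-Python sketch of a dense-union layout: a per-row type id picks which
struct child the row belongs to, and a per-row offset indexes into that
child. It deliberately does not use the Arrow libraries. The names
DenseUnionOfStructs, append, and row are hypothetical and only model the
layout; a real implementation would build the equivalent Arrow union type.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class DenseUnionOfStructs:
    """Toy model of a dense union whose children are structs.

    Each child stands in for one logical dataset with its own schema.
    """
    # One tuple of field names per child, i.e. one schema per dataset.
    child_fields: List[Tuple[str, ...]]
    # Per-row tag saying which child the row lives in.
    type_ids: List[int] = field(default_factory=list)
    # Per-row index into the selected child.
    offsets: List[int] = field(default_factory=list)
    # Row storage for each child.
    children: List[List[Dict[str, Any]]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.children:
            self.children = [[] for _ in self.child_fields]

    def append(self, type_id: int, record: Dict[str, Any]) -> None:
        """Add one row to the child selected by type_id, checking its fields."""
        expected = self.child_fields[type_id]
        if tuple(record) != expected:
            raise ValueError(f"record fields {tuple(record)} != {expected}")
        self.type_ids.append(type_id)
        self.offsets.append(len(self.children[type_id]))
        self.children[type_id].append(record)

    def row(self, i: int) -> Tuple[int, Dict[str, Any]]:
        """Return (type_id, record) for logical row i of the combined stream."""
        type_id = self.type_ids[i]
        return type_id, self.children[type_id][self.offsets[i]]


# Two datasets with different schemas interleaved in one logical stream.
stream = DenseUnionOfStructs(child_fields=[("id", "price"), ("name",)])
stream.append(0, {"id": 1, "price": 9.5})
stream.append(1, {"name": "alice"})
stream.append(0, {"id": 2, "price": 3.0})
```

The added complexity on each side shows up here as the tagging on write and
the type-id lookup on read.
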

On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org> wrote:
>
> I'd vote for explicitly not supported. We should keep our primitives narrow.
>
> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:
>
> > Thanks for the feedback.
> >
> > I do think if we had explicitly embraced gRPC from the beginning,
> > there are a lot of places where things could be made more ergonomic,
> > > including with the metadata fields. But it would also have locked us
> > > out of potential future transports.
> >
> > On another note: I hesitate to put too much into this method, but we
> > are looking at use cases where potentially, a client may want to
> > upload multiple distinct datasets (with differing schemas). (This is a
> > little tentative, and I can get more details...) Right now, each
> > logical stream in Flight must have a single, consistent schema; would
> > it make sense to look at ways to relax this, or declare this
> > explicitly out of scope (and require multiple calls and coordination
> > with the deployment topology) in order to accomplish this?
> >
> > Best,
> > David
> >
> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> > > Fair enough. I'm okay with the bytes approach and the proposal looks good
> > > to me.
> > >
> > > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
> > >
> > >> I've updated the proposal.
> > >>
> > >> On the subject of Protobuf Any vs bytes, and how to handle
> > >> errors/metadata, I still think using bytes is preferable:
> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
> > > >> types,
> > >> - We wouldn't be able to practically expose the Protobuf field to C++
> > >> users without causing build pains,
> > >> - We can't let Python users take advantage of the Protobuf field
> > >> without somehow being compatible with the Protobuf wheels (by linking
> > >> to the same version, and doing magic to turn the C++ Protobufs into
> > >> the Python ones),
> > >> - All our other application-defined fields are already bytes.
> > >>
> > >> Applications that want structure can encode JSON or Protobuf Any into
> > >> the bytes field themselves, much as you can already do for Ticket,
> > >> commands in FlightDescriptors, and application metadata in
> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
> > >> Any directly, since Any itself is a bytes field with a tag, and must
> > >> invoke the Protobuf deserializer again to read the actual message.
> > >>
> > >> If we decide on using bytes, then I don't think it makes sense to
> > >> define a new message with a oneof either, since it would be redundant.
> > >>
> > >> Thanks,
> > >> David
> > >>
> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> > >> > I've been extremely backlogged, I will update the proposal when I get
> > >> > a chance and reply here when done.
> > >> >
> > >> > Best,
> > >> > David
> > >> >
> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> > >> >> Bumping this discussion since a couple of weeks have passed. It seems
> > >> >> there are still some questions here, could we summarize what are the
> > >> >> alternatives along with any public API implications so we can try to
> > >> >> render a decision?
> > >> >>
> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com>
> > >> >> wrote:
> > >> >>>
> > >> >>> Hi Wes,
> > >> >>>
> > >> >>> Responses inline:
> > >> >>>
> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com>
> > wrote:
> > >> >>>
> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com>
> > >> >>> > wrote:
> > >> >>> > >
> > >> >>> > > The question is whether to repurpose the existing FlightData
> > >> >>> > > structure, and allow for the metadata field to be filled in and
> > >> data
> > >> >>> > > fields to be blank (as a control message), or to wrap the
> > >> FlightData
> > >> >>> > > structure in another structure that explicitly distinguishes
> > >> between
> > >> >>> > > control and data messages.
> > >> >>> >
> > >> >>> > I'm not super against having metadata-only FlightData with empty
> > >> body.
> > >> >>> > One question to consider is what changes (if any) would need to be
> > >> >>> > made to public APIs in either scenario.
> > >> >>> >
> > >> >>>
> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
> > >> >>> messages
> > >> >>> in
> > >> >>> the future. This would be a breaking change, but wouldn't change the
> > >> >>> wire
> > >> >>> format. I think the APIs could be changed backwards compatibly,
> > >> >>> though.
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>> > > The other question is how to handle the metadata fields. So far,
> > >> >>> > > we've
> > >> >>> > > used bytestring fields for application-defined data. This is
> > >> >>> > > workable
> > >> >>> > > if you want to use Protobuf to define the contents of those
> > >> >>> > > fields,
> > >> >>> > > but requires you to pack/unpack your Protobuf into/from the
> > >> >>> > > bytestring
> > >> >>> > > field. If we instead used the Protobuf Any field, a dynamically
> > >> >>> > > typed
> > >> >>> > > field, this would be more convenient, but then we'd be exposing
> > >> >>> > > Protobuf types. We could alternatively use a combination of a
> > >> >>> > > type
> > >> >>> > > field and a bytestring field, mimicking what the Protobuf Any
> > >> >>> > > type
> > >> >>> > > looks like on the wire. I'm not sure this is actually cleaner in
> > >> any
> > >> >>> > > of the language APIs, though.
> > >> >>> >
> > >> >>> > Leaving the deserialization of the app metadata to the particular
> > >> >>> > Flight implementation seems on first principles like the most
> > >> flexible
> > >> >>> > thing, if Any is used, does that mean the metadata _must_ be a
> > >> >>> > protobuf?
> > >> >>> >
> > >> >>>
> > >> >>>
> > >> >>> If Any is used, we could still expose a bytes-based API, but it
> > would
> > >> >>> have
> > >> >>> some more wrapping. (We could put a ByteString in Any.) Then the
> > >> >>> question
> > >> >>> would just be how to expose this (would be easier in Java, harder in
> > >> >>> C++).
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>> > > David
> > >> >>> > >
> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> > >> >>> > > >
> > >> >>> > > > Can one of you explain what is being proposed in non-protobuf
> > >> >>> > > > terms?
> > >> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
> > >> >>> > > >
> > >> >>> > > > Regards
> > >> >>> > > >
> > >> >>> > > > Antoine.
> > >> >>> > > >
> > >> >>> > > >
> > >> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it would
> > just
> > >> be
> > >> >>> > > >> application-level logic. (The official guide doesn't even
> > >> mention
> > >> >>> > > >> it
> > >> >>> > > >> in the encoding docs; I found
> > >> >>> > > >>
> > >> >>> >
> > >>
> > https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> > >> >>> > > >> as well.)
> > >> >>> > > >>
> > >> >>> > > >> If I follow you, Jacques, then you are proposing essentially
> > >> >>> > > >> inlining
> > >> >>> > > >> the definition of Any, e.g.
> > >> >>> > > >>
> > >> >>> > > >> message FlightMessage {
> > >> >>> > > >>   oneof message {
> > >> >>> > > >>     FlightData data = 1;
> > >> >>> > > >>     FlightAny metadata = 2;
> > >> >>> > > >>   }
> > >> >>> > > >> }
> > >> >>> > > >>
> > >> >>> > > >> message FlightAny {
> > >> >>> > > >>   string type = 1;
> > >> >>> > > >>   bytes data = 2;
> > >> >>> > > >> }
> > >> >>> > > >>
> > >> >>> > > >> Is this correct?
> > >> >>> > > >>
> > >> >>> > > >> It might be nice to consider the wrapper message for
> > >> >>> > > >> DoGet/DoPut
> > >> >>> > > >> as
> > >> >>> > > >> well, but at that point, I'd rather we be consistent with all
> > >> >>> > > >> of
> > >> >>> > > >> them,
> > >> >>> > > >> rather than have one of the three methods do its own thing.
> > >> >>> > > >>
> > >> >>> > > >> Thanks,
> > >> >>> > > >> David
> > >> >>> > > >>
> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >> >>> > > >>> I think we could probably expose the oneof behavior without
> > >> >>> > > >>> exposing the Protobuf functions. On the Any... hmm. I guess we
> > >> >>> > > >>> could expose it as two fields: type and data. Then users could
> > >> >>> > > >>> use it for whatever, but if people wanted to treat it as Any,
> > >> >>> > > >>> it would work. (Basically a user could use Any with it easily,
> > >> >>> > > >>> but they could also use any other mechanism.) At least in Java,
> > >> >>> > > >>> the Any concepts are pretty simple/DIY. Are other language
> > >> >>> > > >>> bindings less DIY?
> > >> >>> > > >>>
> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata, but
> > >> >>> > > >>> it just seemed a bit janky.
> > >> >>> > > >>>
> > >> >>> > > >>> Thinking about the control message/wrapper object thing, I
> > >> >>> > > >>> wonder if we should redefine DoPut and DoGet to have the same
> > >> >>> > > >>> property if we think it is a good idea...
> > >> >>> > > >>>
> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
> > >> li.davidm96@gmail.com>
> > >> >>> > wrote:
> > >> >>> > > >>>
> > >> >>> > > >>>> I was definitely considering having control messages
> > without
> > >> >>> > > >>>> data,
> > >> >>> > and
> > >> >>> > > >>>> I thought that could be encoded by a FlightData with only
> > >> >>> > app_metadata
> > >> >>> > > >>>> set. I think I understand your position now: FlightData
> > >> >>> > > >>>> should
> > >> >>> > always
> > >> >>> > > >>>> carry (some) data (with optional metadata)?
> > >> >>> > > >>>>
> > >> >>> > > >>>> That makes sense to me, and is consistent with the
> > >> >>> > > >>>> documentation
> > >> >>> > > >>>> on
> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried about having
> > >> >>> > > >>>> a
> > >> >>> > > >>>> redundant metadata field, but oneof prevents that from
> > >> >>> > > >>>> happening,
> > >> >>> > and
> > >> >>> > > >>>> overall having a clear separation between data and control
> > >> >>> > > >>>> messages
> > >> >>> > is
> > >> >>> > > >>>> cleaner.
> > >> >>> > > >>>>
> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
> > >> >>> > > >>>> exposing
> > >> >>> > > >>>> Protobuf by using bytes, would we want to change that now?
> > >> >>> > > >>>>
> > >> >>> > > >>>> Best,
> > >> >>> > > >>>> David
> > >> >>> > > >>>>
> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >> >>> > > >>>>> Hey David,
> > >> >>> > > >>>>>
> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
> > >> >>> > > >>>>> DoGet/DoPut for async. Yes, I was mostly thinking of Java,
> > >> >>> > > >>>>> given gRPC-Java's async-always pattern.
> > >> >>> > > >>>>>
> > >> >>> > > >>>>> On the comment around the FlightData, I think it is
> > >> >>> > > >>>>> overloading the message to use metadata for this. If I want
> > >> >>> > > >>>>> to send a control message independently of the data message,
> > >> >>> > > >>>>> I would have to define something like an empty FlightData
> > >> >>> > > >>>>> message that has custom metadata. Why not support a container
> > >> >>> > > >>>>> object with a oneof{FlightData, Any} in it instead, so users
> > >> >>> > > >>>>> can add more data as desired? The default impl could be a
> > >> >>> > > >>>>> no-op for the Any messages.
> > >> >>> > > >>>>>
> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
> > >> >>> > > >>>>> <li...@gmail.com>
> > >> >>> > > >>>>> wrote:
> > >> >>> > > >>>>>
> > >> >>> > > >>>>>> Hi Jacques,
> > >> >>> > > >>>>>>
> > >> >>> > > >>>>>> Thanks for the comments.
> > >> >>> > > >>>>>>
> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> > >> >>> > > >>>>>> - FlightData already has metadata fields as a result of
> > >> prior
> > >> >>> > > >>>>>> proposals, so I don't think we need a new message to
> > carry
> > >> >>> > > >>>>>> that
> > >> >>> > kind
> > >> >>> > > >>>>>> of information.
> > >> >>> > > >>>>>> - I like the suggestion of an async handler to handle
> > >> >>> > > >>>>>> incoming
> > >> >>> > > >>>>>> messages as the fundamental API; it would actually be
> > >> >>> > > >>>>>> quite
> > >> >>> > natural
> > >> >>> > > >>>>>> to
> > >> >>> > > >>>>>> implement in Flight/Java. I will note that it's not
> > >> >>> > > >>>>>> possible
> > >> >>> > > >>>>>> in
> > >> >>> > > >>>>>> C++/Python without spawning a thread, though. (In
> > essence,
> > >> >>> > gRPC-Java
> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are
> > >> >>> > experimental
> > >> >>> > > >>>>>> C++ APIs that would let us do something similar to Java,
> > >> >>> > > >>>>>> but
> > >> >>> > > >>>>>> those
> > >> >>> > > >>>>>> are
> > >> >>> > > >>>>>> only in relatively recent gRPC versions and are still
> > >> >>> > > >>>>>> under
> > >> >>> > > >>>>>> development (contrary to the interceptor APIs which have
> > >> been
> > >> >>> > around
> > >> >>> > > >>>>>> for quite a while).
> > >> >>> > > >>>>>>
> > >> >>> > > >>>>>> Thanks,
> > >> >>> > > >>>>>> David
> > >> >>> > > >>>>>>
> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> > >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
> > >> >>> > > >>>>>>> discussing here depending on your thoughts.
> > >> >>> > > >>>>>>>
> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
> > >> >>> > > >>>>>>> <li...@gmail.com>
> > >> >>> > > >>>> wrote:
> > >> >>> > > >>>>>>>
> > >> >>> > > >>>>>>>> Hey Ryan,
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> Thanks for the comments.
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
> > >> >>> > > >>>>>>>> strawman.
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
> > >> >>> > > >>>>>>>> interleave uploads and downloads if you were so inclined.
> > >> >>> > > >>>>>>>> Right now, synchronous APIs make this error-prone, e.g. if
> > >> >>> > > >>>>>>>> both client and server wait for each other due to an
> > >> >>> > > >>>>>>>> application logic bug. (gRPC doesn't give us the ability to
> > >> >>> > > >>>>>>>> have per-read timeouts, only an overall timeout.) As an
> > >> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> This is mostly tangential though, eventually we will want
> > >> >>> > > >>>>>>>> to design asynchronous APIs for Flight as a whole. A
> > >> >>> > > >>>>>>>> bidirectional stream like this (and like DoPut) just makes
> > >> >>> > > >>>>>>>> these pitfalls easier to run into.
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but
> > >> >>> > > >>>>>>>> the main concern is that depending on how you deploy, two
> > >> >>> > > >>>>>>>> separate calls could get routed to different instances.
> > >> >>> > > >>>>>>>> Additionally, gRPC has some reconnection behaviors; if the
> > >> >>> > > >>>>>>>> server goes away in between the two calls, but it then
> > >> >>> > > >>>>>>>> restarts or there is another instance available, the client
> > >> >>> > > >>>>>>>> will happily reconnect to the new server without warning.
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> Thanks,
> > >> >>> > > >>>>>>>> David
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> > >> >>> > > >>>>>>>>> Hey David,
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and
> > >> >>> > > >>>>>>>>> the possibility of remote compute via Arrow buffers. One
> > >> >>> > > >>>>>>>>> thing that would help me would be a concrete example of
> > >> >>> > > >>>>>>>>> the API in a real-life use case. Also, what would the
> > >> >>> > > >>>>>>>>> client experience be in terms of sync vs. async? Would the
> > >> >>> > > >>>>>>>>> client block until the bidirectional call returns, i.e.
> > >> >>> > > >>>>>>>>> c = flight.vector_mult(a, b), or would the client wait to
> > >> >>> > > >>>>>>>>> be signaled that the computation was done? If the latter,
> > >> >>> > > >>>>>>>>> how is that different from a DoPut followed by a DoGet? I
> > >> >>> > > >>>>>>>>> suppose this could be implemented without extending the
> > >> >>> > > >>>>>>>>> RPC interface, as a function/util instead?
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>> Best,
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>> Ryan
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
> > >> >>> > li.davidm96@gmail.com>
> > >> >>> > > >>>>>> wrote:
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>>> Hi all,
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we
> > >> >>> > > >>>>>>>>>> have identified a new use case on the horizon: being able
> > >> >>> > > >>>>>>>>>> to both send and retrieve Arrow data within a single RPC
> > >> >>> > > >>>>>>>>>> call. To that end, I've written up a proposal for a new
> > >> >>> > > >>>>>>>>>> RPC method:
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the
> > >> >>> > > >>>>>>>>>> document. I'd appreciate any feedback; I think this is a
> > >> >>> > > >>>>>>>>>> relatively straightforward addition - it is essentially
> > >> >>> > > >>>>>>>>>> "DoPutThenGet".
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've
> > >> >>> > > >>>>>>>>>> decided to table the other format change I had proposed
> > >> >>> > > >>>>>>>>>> (on DoPut), as it doesn't functionally change Flight,
> > >> >>> > > >>>>>>>>>> just the interpretation of the semantics.
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>>> Thanks,
> > >> >>> > > >>>>>>>>>> David
> > >> >>> > > >>>>>>>>>>
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>>
> > >> >>> > > >>>>>>>>
> > >> >>> > > >>>>>>>
> > >> >>> > > >>>>>>
> > >> >>> > > >>>>>
> > >> >>> > > >>>>
> > >> >>> > > >>>
> > >> >>> > > >
> > >> >>> >
> > >> >>
> > >> >
> > >>
> > >
> >

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Jacques Nadeau <ja...@apache.org>.
I'd vote for explicitly not supported. We should keep our primitives narrow.
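
As a rough sketch of what "explicitly not supported" would mean on the
client side: the client groups its batches by schema and issues one call
per group. Everything here is hypothetical and for illustration only;
`upload` stands in for whatever issues a single DoPut/DoExchange call, and
schemas are plain tuples rather than Arrow schemas.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-ins: a schema is a tuple of (name, type) pairs and a
# batch is a list of rows. Real code would use Arrow schemas and batches.
Schema = Tuple[Tuple[str, str], ...]
Batch = List[tuple]


def upload_by_schema(
    batches: Iterable[Tuple[Schema, Batch]],
    upload: Callable[[Schema, List[Batch]], None],
) -> int:
    """Group batches by schema and make one upload call per schema.

    Returns the number of calls made.
    """
    grouped: Dict[Schema, List[Batch]] = defaultdict(list)
    for schema, batch in batches:
        grouped[schema].append(batch)
    for schema, group in grouped.items():
        # Each call carries a single, consistent schema.
        upload(schema, group)
    return len(grouped)
```

Coordinating those calls (for example, making sure they all reach the same
server instance) is then left to the application and its deployment.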

On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com> wrote:

> Thanks for the feedback.
>
> I do think if we had explicitly embraced gRPC from the beginning,
> there are a lot of places where things could be made more ergonomic,
> including with the metadata fields. But it would also have locked us
> out of potential future transports.
>
> On another note: I hesitate to put too much into this method, but we
> are looking at use cases where potentially, a client may want to
> upload multiple distinct datasets (with differing schemas). (This is a
> little tentative, and I can get more details...) Right now, each
> logical stream in Flight must have a single, consistent schema; would
> it make sense to look at ways to relax this, or declare this
> explicitly out of scope (and require multiple calls and coordination
> with the deployment topology) in order to accomplish this?
>
> Best,
> David
>
> On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> > Fair enough. I'm okay with the bytes approach and the proposal looks good
> > to me.
> >
> > On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
> >
> >> I've updated the proposal.
> >>
> >> On the subject of Protobuf Any vs bytes, and how to handle
> >> errors/metadata, I still think using bytes is preferable:
> >> - It doesn't require (conditionally) exposing or wrapping Protobuf types,
> >> - We wouldn't be able to practically expose the Protobuf field to C++
> >> users without causing build pains,
> >> - We can't let Python users take advantage of the Protobuf field
> >> without somehow being compatible with the Protobuf wheels (by linking
> >> to the same version, and doing magic to turn the C++ Protobufs into
> >> the Python ones),
> >> - All our other application-defined fields are already bytes.
> >>
> >> Applications that want structure can encode JSON or Protobuf Any into
> >> the bytes field themselves, much as you can already do for Ticket,
> >> commands in FlightDescriptors, and application metadata in
> >> DoGet/DoPut. I don't think this is (much) less efficient than using
> >> Any directly, since Any itself is a bytes field with a tag, and must
> >> invoke the Protobuf deserializer again to read the actual message.
> >>
> >> If we decide on using bytes, then I don't think it makes sense to
> >> define a new message with a oneof either, since it would be redundant.
> >>
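To make the "encode it yourself" option concrete, here is a minimal sketch,
using only the Python standard library, of an application packing a type
name plus a JSON payload into one bytes value and unpacking it on the other
side. The envelope layout and the function names are assumptions made up
for illustration; they are not part of Flight or Protobuf.

```python
import json
from typing import Any, Tuple


def pack_metadata(type_name: str, payload: Any) -> bytes:
    """Serialize a (type, payload) pair into a single bytes value."""
    envelope = {"type": type_name, "data": payload}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def unpack_metadata(raw: bytes) -> Tuple[str, Any]:
    """Inverse of pack_metadata; raises ValueError on malformed input."""
    try:
        envelope = json.loads(raw.decode("utf-8"))
        return envelope["type"], envelope["data"]
    except (UnicodeDecodeError, json.JSONDecodeError, KeyError, TypeError) as exc:
        raise ValueError("not a valid metadata envelope") from exc
```

The bytes returned by pack_metadata could be placed in any of the
application-defined bytes fields; Flight itself never needs to know what is
inside.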
> >> Thanks,
> >> David
> >>
> >> On 11/7/19, David Li <li...@gmail.com> wrote:
> >> > I've been extremely backlogged, I will update the proposal when I get
> >> > a chance and reply here when done.
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >> >> Bumping this discussion since a couple of weeks have passed. It seems
> >> >> there are still some questions here, could we summarize what are the
> >> >> alternatives along with any public API implications so we can try to
> >> >> render a decision?
> >> >>
> >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Hi Wes,
> >> >>>
> >> >>> Responses inline:
> >> >>>
> >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com>
> wrote:
> >> >>>
> >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com>
> >> >>> > wrote:
> >> >>> > >
> >> >>> > > The question is whether to repurpose the existing FlightData
> >> >>> > > structure, and allow for the metadata field to be filled in and
> >> >>> > > data fields to be blank (as a control message), or to wrap the
> >> >>> > > FlightData structure in another structure that explicitly
> >> >>> > > distinguishes between control and data messages.
> >> >>> >
> >> >>> > I'm not super against having metadata-only FlightData with empty
> >> >>> > body. One question to consider is what changes (if any) would need
> >> >>> > to be made to public APIs in either scenario.
> >> >>> >
> >> >>>
> >> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
> >> >>> messages in the future. This would be a breaking change, but
> >> >>> wouldn't change the wire format. I think the APIs could be changed
> >> >>> backwards compatibly, though.
> >> >>>
> >> >>>
> >> >>>
> >> >>> > > The other question is how to handle the metadata fields. So far,
> >> >>> > > we've used bytestring fields for application-defined data. This
> >> >>> > > is workable if you want to use Protobuf to define the contents
> >> >>> > > of those fields, but requires you to pack/unpack your Protobuf
> >> >>> > > into/from the bytestring field. If we instead used the Protobuf
> >> >>> > > Any field, a dynamically typed field, this would be more
> >> >>> > > convenient, but then we'd be exposing Protobuf types. We could
> >> >>> > > alternatively use a combination of a type field and a bytestring
> >> >>> > > field, mimicking what the Protobuf Any type looks like on the
> >> >>> > > wire. I'm not sure this is actually cleaner in any of the
> >> >>> > > language APIs, though.
> >> >>> >
> >> >>> > Leaving the deserialization of the app metadata to the particular
> >> >>> > Flight implementation seems on first principles like the most
> >> >>> > flexible thing, if Any is used, does that mean the metadata _must_
> >> >>> > be a protobuf?
> >> >>> >
> >> >>>
> >> >>>
> >> >>> If Any is used, we could still expose a bytes-based API, but it
> >> >>> would have some more wrapping. (We could put a ByteString in Any.)
> >> >>> Then the question would just be how to expose this (would be easier
> >> >>> in Java, harder in C++).
> >> >>>
> >> >>>
> >> >>>
> >> >>> > > David
> >> >>> > >
> >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> >> >>> > > >
> >> >>> > > > Can one of you explain what is being proposed in non-protobuf
> >> >>> > > > terms?
> >> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
> >> >>> > > >
> >> >>> > > > Regards
> >> >>> > > >
> >> >>> > > > Antoine.
> >> >>> > > >
> >> >>> > > >
> >> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
> >> >>> > > >> Oneof doesn't actually change the wire encoding; it would just
> >> >>> > > >> be application-level logic. (The official guide doesn't even
> >> >>> > > >> mention it in the encoding docs; I found
> >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >> >>> > > >> as well.)
> >> >>> > > >>
> >> >>> > > >> If I follow you, Jacques, then you are proposing essentially
> >> >>> > > >> inlining
> >> >>> > > >> the definition of Any, e.g.
> >> >>> > > >>
> >> >>> > > >> message FlightMessage {
> >> >>> > > >>   oneof message {
> >> >>> > > >>     FlightData data = 1;
> >> >>> > > >>     FlightAny metadata = 2;
> >> >>> > > >>   }
> >> >>> > > >> }
> >> >>> > > >>
> >> >>> > > >> message FlightAny {
> >> >>> > > >>   string type = 1;
> >> >>> > > >>   bytes data = 2;
> >> >>> > > >> }
> >> >>> > > >>
> >> >>> > > >> Is this correct?
> >> >>> > > >>
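In non-Protobuf terms, the wrapper above is a message that holds either a
data payload or a (type, bytes) metadata payload. A minimal Python sketch of
that shape follows. The class names mirror the quoted definition, but
FlightData is reduced to a single hypothetical `body` field, and the check
is stricter than Protobuf's oneof, which also allows neither field to be
set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlightData:
    # Simplified stand-in: the real FlightData has more fields.
    body: bytes


@dataclass
class FlightAny:
    type: str
    data: bytes


@dataclass
class FlightMessage:
    """Holds either a data message or a metadata message, never both."""

    data: Optional[FlightData] = None
    metadata: Optional[FlightAny] = None

    def __post_init__(self) -> None:
        # Assumption for this sketch: exactly one of the two must be set.
        if (self.data is None) == (self.metadata is None):
            raise ValueError("exactly one of data/metadata must be set")

    def which(self) -> str:
        """Name of the field that is set, like a oneof case accessor."""
        return "data" if self.data is not None else "metadata"
```

A receiver would branch on which() to tell control messages apart from data
messages, instead of inspecting an otherwise empty FlightData.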
> >> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
> >> >>> > > >> as well, but at that point, I'd rather we be consistent with all
> >> >>> > > >> of them, rather than have one of the three methods do its own
> >> >>> > > >> thing.
> >> >>> > > >>
> >> >>> > > >> Thanks,
> >> >>> > > >> David
> >> >>> > > >>
> >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > > >>> I think we could probably expose the oneof behavior without
> >> >>> > > >>> exposing the Protobuf functions. On the Any... hmm. I guess we
> >> >>> > > >>> could expose it as two fields: type and data. Then users could
> >> >>> > > >>> use it for whatever, but if people wanted to treat it as Any,
> >> >>> > > >>> it would work. (Basically a user could use Any with it easily,
> >> >>> > > >>> but they could also use any other mechanism.) At least in Java,
> >> >>> > > >>> the Any concepts are pretty simple/DIY. Are other language
> >> >>> > > >>> bindings less DIY?
> >> >>> > > >>>
> >> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata, but
> >> >>> > > >>> it just seemed a bit janky.
> >> >>> > > >>>
> >> >>> > > >>> Thinking about the control message/wrapper object thing, I
> >> >>> > > >>> wonder if we should redefine DoPut and DoGet to have the same
> >> >>> > > >>> property if we think it is a good idea...
> >> >>> > > >>>
> >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
> >> li.davidm96@gmail.com>
> >> >>> > wrote:
> >> >>> > > >>>
> >> >>> > > >>>> I was definitely considering having control messages without
> >> >>> > > >>>> data, and I thought that could be encoded by a FlightData with
> >> >>> > > >>>> only app_metadata set. I think I understand your position now:
> >> >>> > > >>>> FlightData should always carry (some) data (with optional
> >> >>> > > >>>> metadata)?
> >> >>> > > >>>>
> >> >>> > > >>>> That makes sense to me, and is consistent with the
> >> >>> > > >>>> documentation on FlightData in the Protobuf file. I was worried
> >> >>> > > >>>> about having a redundant metadata field, but oneof prevents
> >> >>> > > >>>> that from happening, and overall having a clear separation
> >> >>> > > >>>> between data and control messages is cleaner.
> >> >>> > > >>>>
> >> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
> >> >>> > > >>>> exposing Protobuf by using bytes, would we want to change that
> >> >>> > > >>>> now?
> >> >>> > > >>>>
> >> >>> > > >>>> Best,
> >> >>> > > >>>> David
> >> >>> > > >>>>
> >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > > >>>>> Hey David,
> >> >>> > > >>>>>
> >> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
> >> >>> > > >>>>> DoGet/DoPut for async. Yes, I was mostly thinking of Java,
> >> >>> > > >>>>> given gRPC-Java's async-always pattern.
> >> >>> > > >>>>>
> >> >>> > > >>>>> On the comment around the FlightData, I think it is
> >> >>> > > >>>>> overloading the message to use metadata for this. If I want
> >> >>> > > >>>>> to send a control message independently of the data message,
> >> >>> > > >>>>> I would have to define something like an empty FlightData
> >> >>> > > >>>>> message that has custom metadata. Why not support a container
> >> >>> > > >>>>> object with a oneof{FlightData, Any} in it instead, so users
> >> >>> > > >>>>> can add more data as desired? The default impl could be a
> >> >>> > > >>>>> no-op for the Any messages.
> >> >>> > > >>>>>
> >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
> >> >>> > > >>>>> <li...@gmail.com>
> >> >>> > > >>>>> wrote:
> >> >>> > > >>>>>
> >> >>> > > >>>>>> Hi Jacques,
> >> >>> > > >>>>>>
> >> >>> > > >>>>>> Thanks for the comments.
> >> >>> > > >>>>>>
> >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >> >>> > > >>>>>> - FlightData already has metadata fields as a result of prior
> >> >>> > > >>>>>> proposals, so I don't think we need a new message to carry
> >> >>> > > >>>>>> that kind of information.
> >> >>> > > >>>>>> - I like the suggestion of an async handler to handle
> >> >>> > > >>>>>> incoming messages as the fundamental API; it would actually
> >> >>> > > >>>>>> be quite natural to implement in Flight/Java. I will note
> >> >>> > > >>>>>> that it's not possible in C++/Python without spawning a
> >> >>> > > >>>>>> thread, though. (In essence, gRPC-Java is async-always and
> >> >>> > > >>>>>> gRPC-C++ is sync-always.) There are experimental C++ APIs
> >> >>> > > >>>>>> that would let us do something similar to Java, but those
> >> >>> > > >>>>>> are only in relatively recent gRPC versions and are still
> >> >>> > > >>>>>> under development (contrary to the interceptor APIs which
> >> >>> > > >>>>>> have been around for quite a while).
> >> >>> > > >>>>>>
> >> >>> > > >>>>>> Thanks,
> >> >>> > > >>>>>> David
> >> >>> > > >>>>>>
> >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
> >> >>> > > >>>>>>> discussion here depending on your thoughts.
> >> >>> > > >>>>>>>
> >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
> >> >>> > > >>>>>>>
> >> >>> > > >>>>>>>> Hey Ryan,
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> Thanks for the comments.
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
> >> >>> > > >>>>>>>> strawman.
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
> >> >>> > > >>>>>>>> interleave uploads and downloads if you were so inclined.
> >> >>> > > >>>>>>>> Right now, synchronous APIs make this error-prone, e.g. if
> >> >>> > > >>>>>>>> both client and server wait for each other due to an
> >> >>> > > >>>>>>>> application logic bug. (gRPC doesn't give us the ability
> >> >>> > > >>>>>>>> to have per-read timeouts, only an overall timeout.) As an
> >> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> This is mostly tangential though, eventually we will want
> >> >>> > > >>>>>>>> to design asynchronous APIs for Flight as a whole. A
> >> >>> > > >>>>>>>> bidirectional stream like this (and like DoPut) just makes
> >> >>> > > >>>>>>>> these pitfalls easier to run into.
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but
> >> >>> > > >>>>>>>> the main concern is that depending on how you deploy, two
> >> >>> > > >>>>>>>> separate calls could get routed to different instances.
> >> >>> > > >>>>>>>> Additionally, gRPC has some reconnection behaviors; if the
> >> >>> > > >>>>>>>> server goes away in between the two calls, but it then
> >> >>> > > >>>>>>>> restarts or there is another instance available, the
> >> >>> > > >>>>>>>> client will happily reconnect to the new server without
> >> >>> > > >>>>>>>> warning.
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> Thanks,
> >> >>> > > >>>>>>>> David
> >> >>> > > >>>>>>>>
> >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> >> >>> > > >>>>>>>>> Hey David,
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and
> >> >>> > > >>>>>>>>> the possibility of remote compute via arrow buffers. One
> >> >>> > > >>>>>>>>> thing that would help me would be a concrete example of
> >> >>> > > >>>>>>>>> the API in a real life use case. Also, what would the
> >> >>> > > >>>>>>>>> client experience be in terms of sync vs async? Would the
> >> >>> > > >>>>>>>>> client block till the bidirectional call returns, i.e.
> >> >>> > > >>>>>>>>> c = flight.vector_mult(a, b), or would the client wait to
> >> >>> > > >>>>>>>>> be signaled that computation was done? If the latter, how
> >> >>> > > >>>>>>>>> is that different from a DoPut then DoGet? I suppose that
> >> >>> > > >>>>>>>>> this could be implemented without extending the RPC
> >> >>> > > >>>>>>>>> interface but rather by a function/util?
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>> Best,
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>> Ryan
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
> >> >>> > > >>>>>>>>>
> >> >>> > > >>>>>>>>>> Hi all,
> >> >>> > > >>>>>>>>>>
> >> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but
> >> >>> > > >>>>>>>>>> we have identified a new use case on the horizon: being
> >> >>> > > >>>>>>>>>> able to both send and retrieve Arrow data within a
> >> >>> > > >>>>>>>>>> single RPC call. To that end, I've written up a proposal
> >> >>> > > >>>>>>>>>> for a new RPC method:
> >> >>> > > >>>>>>>>>>
> >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >> >>> > > >>>>>>>>>>
> >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the
> >> >>> > > >>>>>>>>>> document. I'd appreciate any feedback; I think this is a
> >> >>> > > >>>>>>>>>> relatively straightforward addition - it is essentially
> >> >>> > > >>>>>>>>>> "DoPutThenGet".
> >> >>> > > >>>>>>>>>>
> >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've
> >> >>> > > >>>>>>>>>> decided to table the other format change I had proposed
> >> >>> > > >>>>>>>>>> (on DoPut), as it doesn't functionally change Flight,
> >> >>> > > >>>>>>>>>> just the interpretation of the semantics.
> >> >>> > > >>>>>>>>>>
> >> >>> > > >>>>>>>>>> Thanks,
> >> >>> > > >>>>>>>>>> David
> >> >>> > > >>>>>>>>>>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Thanks for the feedback.

I do think that if we had explicitly embraced gRPC from the beginning,
there are a lot of places where things could have been made more
ergonomic, including the metadata fields. But it would also have locked
us out of potential future transports.

On another note: I hesitate to put too much into this method, but we
are looking at use cases where a client may want to upload multiple
distinct datasets (with differing schemas). (This is a little
tentative, and I can get more details...) Right now, each logical
stream in Flight must have a single, consistent schema. Would it make
sense to look at ways to relax this, or should we declare it explicitly
out of scope (and require multiple calls, plus coordination with the
deployment topology, to accomplish it)?

Best,
David

On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> Fair enough. I'm okay with the bytes approach and the proposal looks good
> to me.
>
> On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:
>
>> I've updated the proposal.
>>
>> On the subject of Protobuf Any vs bytes, and how to handle
>> errors/metadata, I still think using bytes is preferable:
>> - It doesn't require (conditionally) exposing or wrapping Protobuf types,
>> - We wouldn't be able to practically expose the Protobuf field to C++
>> users without causing build pains,
>> - We can't let Python users take advantage of the Protobuf field
>> without somehow being compatible with the Protobuf wheels (by linking
>> to the same version, and doing magic to turn the C++ Protobufs into
>> the Python ones),
>> - All our other application-defined fields are already bytes.
>>
>> Applications that want structure can encode JSON or Protobuf Any into
>> the bytes field themselves, much as you can already do for Ticket,
>> commands in FlightDescriptors, and application metadata in
>> DoGet/DoPut. I don't think this is (much) less efficient than using
>> Any directly, since Any itself is a bytes field with a tag, and must
>> invoke the Protobuf deserializer again to read the actual message.
>>
>> If we decide on using bytes, then I don't think it makes sense to
>> define a new message with a oneof either, since it would be redundant.
>>
>> Thanks,
>> David
>>
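As a rough illustration of the bytes-only option described above (this is
a sketch, not code from the proposal): without a oneof wrapper, a receiver
could tell control messages from data messages simply by checking whether
a message carries a body. The `Message` class and the `classify` and
`split_stream` helpers are hypothetical stand-ins invented for this
example; they are not Flight APIs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Message:
    """Hypothetical stand-in for one FlightData-like message in a stream."""
    body: bytes = b""                     # serialized record batch, if any
    app_metadata: Optional[bytes] = None  # opaque application-defined bytes


def classify(msg: Message) -> str:
    """Return 'data', 'control', or 'empty' for a single message."""
    if msg.body:
        return "data"      # carries data (metadata, if any, rides along)
    if msg.app_metadata is not None:
        return "control"   # metadata-only message acts as a control message
    return "empty"


def split_stream(messages: Iterable[Message]) -> Tuple[List[bytes], List[bytes]]:
    """Separate data bodies from control payloads, preserving order."""
    data: List[bytes] = []
    control: List[bytes] = []
    for msg in messages:
        kind = classify(msg)
        if kind == "data":
            data.append(msg.body)
        elif kind == "control":
            control.append(msg.app_metadata)
    return data, control
```

The trade-off raised in the thread still applies: this keeps a single
message type, but the control/data distinction is implicit in which
fields are populated rather than explicit in the schema.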
>> On 11/7/19, David Li <li...@gmail.com> wrote:
>> > I've been extremely backlogged, I will update the proposal when I get
>> > a chance and reply here when done.
>> >
>> > Best,
>> > David
>> >
>> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
>> >> Bumping this discussion since a couple of weeks have passed. It seems
>> >> there are still some questions here, could we summarize what are the
>> >> alternatives along with any public API implications so we can try to
>> >> render a decision?
>> >>
>> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com> wrote:
>> >>>
>> >>> Hi Wes,
>> >>>
>> >>> Responses inline:
>> >>>
>> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
>> >>>
>> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
>> >>> > >
>> >>> > > The question is whether to repurpose the existing FlightData
>> >>> > > structure, and allow for the metadata field to be filled in and
>> >>> > > data fields to be blank (as a control message), or to wrap the
>> >>> > > FlightData structure in another structure that explicitly
>> >>> > > distinguishes between control and data messages.
>> >>> >
>> >>> > I'm not super against having metadata-only FlightData with empty
>> >>> > body. One question to consider is what changes (if any) would need
>> >>> > to be made to public APIs in either scenario.
>> >>> >
>> >>>
>> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
>> >>> messages in the future. This would be a breaking change, but wouldn't
>> >>> change the wire format. I think the APIs could be changed backwards
>> >>> compatibly, though.
>> >>>
>> >>>
>> >>>
>> >>> > > The other question is how to handle the metadata fields. So far,
>> >>> > > we've used bytestring fields for application-defined data. This is
>> >>> > > workable if you want to use Protobuf to define the contents of
>> >>> > > those fields, but requires you to pack/unpack your Protobuf
>> >>> > > into/from the bytestring field. If we instead used the Protobuf
>> >>> > > Any field, a dynamically typed field, this would be more
>> >>> > > convenient, but then we'd be exposing Protobuf types. We could
>> >>> > > alternatively use a combination of a type field and a bytestring
>> >>> > > field, mimicking what the Protobuf Any type looks like on the
>> >>> > > wire. I'm not sure this is actually cleaner in any of the language
>> >>> > > APIs, though.
>> >>> >
>> >>> > Leaving the deserialization of the app metadata to the particular
>> >>> > Flight implementation seems on first principles like the most
>> >>> > flexible thing, if Any is used, does that mean the metadata _must_
>> >>> > be a protobuf?
>> >>> >
>> >>>
>> >>>
>> >>> If Any is used, we could still expose a bytes-based API, but it would
>> >>> have some more wrapping. (We could put a ByteString in Any.) Then the
>> >>> question would just be how to expose this (would be easier in Java,
>> >>> harder in C++).
>> >>>
>> >>>
>> >>>
>> >>> > > David
>> >>> > >
>> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
>> >>> > > >
>> >>> > > > Can one of you explain what is being proposed in non-protobuf
>> >>> > > > terms?
>> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
>> >>> > > >
>> >>> > > > Regards
>> >>> > > >
>> >>> > > > Antoine.
>> >>> > > >
>> >>> > > >
>> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
>> >>> > > >> Oneof doesn't actually change the wire encoding; it would just
>> >>> > > >> be application-level logic. (The official guide doesn't even
>> >>> > > >> mention it in the encoding docs; I found
>> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> >>> > > >> as well.)
>> >>> > > >>
>> >>> > > >> If I follow you, Jacques, then you are proposing essentially
>> >>> > > >> inlining the definition of Any, e.g.
>> >>> > > >>
>> >>> > > >> message FlightMessage {
>> >>> > > >>   oneof message {
>> >>> > > >>     FlightData data = 1;
>> >>> > > >>     FlightAny metadata = 2;
>> >>> > > >>   }
>> >>> > > >> }
>> >>> > > >>
>> >>> > > >> message FlightAny {
>> >>> > > >>   string type = 1;
>> >>> > > >>   bytes data = 2;
>> >>> > > >> }
>> >>> > > >>
>> >>> > > >> Is this correct?
>> >>> > > >>
>> >>> > > >> It might be nice to consider the wrapper message for
>> >>> > > >> DoGet/DoPut as well, but at that point, I'd rather we be
>> >>> > > >> consistent with all of them, rather than have one of the three
>> >>> > > >> methods do its own thing.
>> >>> > > >>
>> >>> > > >> Thanks,
>> >>> > > >> David
>> >>> > > >>
>> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
>> >>> > > >>> I think we could probably expose the oneof behavior without
>> >>> > > >>> exposing the protobuf functions. On the any... hmm. I guess we
>> >>> > > >>> could expose as two fields: type and data. Then users could use
>> >>> > > >>> it for whatever but if people wanted to treat it as any, it
>> >>> > > >>> would work. (Basically a user could use any with it easily but
>> >>> > > >>> they could also use any other mechanism). At least in java, the
>> >>> > > >>> any concepts are pretty simple/diy. Are other language bindings
>> >>> > > >>> less diy?
>> >>> > > >>>
>> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but
>> >>> > > >>> it just seemed a bit janky.
>> >>> > > >>>
>> >>> > > >>> Thinking about the control message/wrapper object thing, I
>> >>> > > >>> wonder if we should redefine DoPut and DoGet to have the same
>> >>> > > >>> property if we think it is a good idea...
>> >>> > > >>>
>> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.davidm96@gmail.com> wrote:
>> >>> > > >>>
>> >>> > > >>>> I was definitely considering having control messages without
>> >>> > > >>>> data, and I thought that could be encoded by a FlightData with
>> >>> > > >>>> only app_metadata set. I think I understand your position now:
>> >>> > > >>>> FlightData should always carry (some) data (with optional
>> >>> > > >>>> metadata)?
>> >>> > > >>>>
>> >>> > > >>>> That makes sense to me, and is consistent with the
>> >>> > > >>>> documentation on FlightData in the Protobuf file. I was worried
>> >>> > > >>>> about having a redundant metadata field, but oneof prevents
>> >>> > > >>>> that from happening, and overall having a clear separation
>> >>> > > >>>> between data and control messages is cleaner.
>> >>> > > >>>>
>> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
>> >>> > > >>>> exposing
>> >>> > > >>>> Protobuf by using bytes, would we want to change that now?
>> >>> > > >>>>
>> >>> > > >>>> Best,
>> >>> > > >>>> David
>> >>> > > >>>>
>> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
>> >>> > > >>>>> Hey David,
>> >>> > > >>>>>
>> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
>> >>> > > >>>>> doget/doput for async. Yes, more thinking java given java
>> >>> > > >>>>> grpc's async always pattern.
>> >>> > > >>>>>
>> >>> > > >>>>> On the comment around the FlightData, I think it is
>> >>> > > >>>>> overloading the message to use metadata for this. If I want
>> >>> > > >>>>> to send a control message independently of the data message,
>> >>> > > >>>>> I would have to define something like an empty flight data
>> >>> > > >>>>> message that has custom metadata. Why not support a container
>> >>> > > >>>>> object with a oneof{FlightData, Any} in it instead so users
>> >>> > > >>>>> can add more data as desired? The default impl could be a
>> >>> > > >>>>> noop for the Any messages.
>> >>> > > >>>>>
>> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
>> >>> > > >>>>>
>> >>> > > >>>>>> Hi Jacques,
>> >>> > > >>>>>>
>> >>> > > >>>>>> Thanks for the comments.
>> >>> > > >>>>>>
>> >>> > > >>>>>> - I do agree DoExchange is a better name!
>> >>> > > >>>>>> - FlightData already has metadata fields as a result of
>> >>> > > >>>>>> prior proposals, so I don't think we need a new message to
>> >>> > > >>>>>> carry that kind of information.
>> >>> > > >>>>>> - I like the suggestion of an async handler to handle
>> >>> > > >>>>>> incoming messages as the fundamental API; it would actually
>> >>> > > >>>>>> be quite natural to implement in Flight/Java. I will note
>> >>> > > >>>>>> that it's not possible in C++/Python without spawning a
>> >>> > > >>>>>> thread, though. (In essence, gRPC-Java is async-always and
>> >>> > > >>>>>> gRPC-C++ is sync-always.) There are experimental C++ APIs
>> >>> > > >>>>>> that would let us do something similar to Java, but those
>> >>> > > >>>>>> are only in relatively recent gRPC versions and are still
>> >>> > > >>>>>> under development (contrary to the interceptor APIs which
>> >>> > > >>>>>> have been around for quite a while).
>> >>> > > >>>>>>
>> >>> > > >>>>>> Thanks,
>> >>> > > >>>>>> David
>> >>> > > >>>>>>
>> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
>> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
>> >>> > > >>>>>>> discussion here depending on your thoughts.
>> >>> > > >>>>>>>
>> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
>> >>> > > >>>>>>>
>> >>> > > >>>>>>>> Hey Ryan,
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> Thanks for the comments.
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
>> >>> > > >>>>>>>> strawman.
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
>> >>> > > >>>>>>>> interleave uploads and downloads if you were so inclined.
>> >>> > > >>>>>>>> Right now, synchronous APIs make this error-prone, e.g. if
>> >>> > > >>>>>>>> both client and server wait for each other due to an
>> >>> > > >>>>>>>> application logic bug. (gRPC doesn't give us the ability to
>> >>> > > >>>>>>>> have per-read timeouts, only an overall timeout.) As an
>> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
>> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> This is mostly tangential though, eventually we will want
>> >>> > > >>>>>>>> to design asynchronous APIs for Flight as a whole. A
>> >>> > > >>>>>>>> bidirectional stream like this (and like DoPut) just makes
>> >>> > > >>>>>>>> these pitfalls easier to run into.
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but
>> >>> > > >>>>>>>> the main concern is that depending on how you deploy, two
>> >>> > > >>>>>>>> separate calls could get routed to different instances.
>> >>> > > >>>>>>>> Additionally, gRPC has some reconnection behaviors; if the
>> >>> > > >>>>>>>> server goes away in between the two calls, but it then
>> >>> > > >>>>>>>> restarts or there is another instance available, the client
>> >>> > > >>>>>>>> will happily reconnect to the new server without warning.
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> Thanks,
>> >>> > > >>>>>>>> David
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
>> >>> > > >>>>>>>>> Hey David,
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and
>> >>> > > >>>>>>>>> the possibility of remote compute via arrow buffers. One
>> >>> > > >>>>>>>>> thing that would help me would be a concrete example of
>> >>> > > >>>>>>>>> the API in a real life use case. Also, what would the
>> >>> > > >>>>>>>>> client experience be in terms of sync vs async? Would the
>> >>> > > >>>>>>>>> client block till the bidirectional call returns, i.e.
>> >>> > > >>>>>>>>> c = flight.vector_mult(a, b), or would the client wait to
>> >>> > > >>>>>>>>> be signaled that computation was done? If the latter, how
>> >>> > > >>>>>>>>> is that different from a DoPut then DoGet? I suppose that
>> >>> > > >>>>>>>>> this could be implemented without extending the RPC
>> >>> > > >>>>>>>>> interface but rather by a function/util?
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>> Best,
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>> Ryan
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>>> Hi all,
>> >>> > > >>>>>>>>>>
>> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we
>> >>> > > >>>>>>>>>> have identified a new use case on the horizon: being able
>> >>> > > >>>>>>>>>> to both send and retrieve Arrow data within a single RPC
>> >>> > > >>>>>>>>>> call. To that end, I've written up a proposal for a new
>> >>> > > >>>>>>>>>> RPC method:
>> >>> > > >>>>>>>>>>
>> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>> >>> > > >>>>>>>>>>
>> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the
>> >>> > > >>>>>>>>>> document. I'd appreciate any feedback; I think this is a
>> >>> > > >>>>>>>>>> relatively straightforward addition - it is essentially
>> >>> > > >>>>>>>>>> "DoPutThenGet".
>> >>> > > >>>>>>>>>>
>> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've
>> >>> > > >>>>>>>>>> decided to table the other format change I had proposed
>> >>> > > >>>>>>>>>> (on DoPut), as it doesn't functionally change Flight,
>> >>> > > >>>>>>>>>> just the interpretation of the semantics.
>> >>> > > >>>>>>>>>>
>> >>> > > >>>>>>>>>> Thanks,
>> >>> > > >>>>>>>>>> David
>> >>> > > >>>>>>>>>>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Jacques Nadeau <ja...@apache.org>.
Fair enough. I'm okay with the bytes approach and the proposal looks good
to me.
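
For illustration only, here is a minimal sketch of how an application
might carry typed, structured metadata in a plain bytes field, mimicking
the type-plus-data layout discussed in this thread. The length-prefixed
framing and the names `pack_metadata` and `unpack_metadata` are
assumptions made for this example; they are not part of Flight or
Protobuf.

```python
import json
import struct
from typing import Any, Dict, Tuple


def pack_metadata(type_name: str, payload: Dict[str, Any]) -> bytes:
    """Pack a type tag and a JSON payload into one opaque bytes value."""
    tag = type_name.encode("utf-8")
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    # Layout (assumed for this sketch): 4-byte big-endian tag length,
    # then the UTF-8 tag, then the JSON body.
    return struct.pack(">I", len(tag)) + tag + body


def unpack_metadata(blob: bytes) -> Tuple[str, Dict[str, Any]]:
    """Inverse of pack_metadata: recover the type tag and the payload."""
    (tag_len,) = struct.unpack_from(">I", blob, 0)
    tag = blob[4:4 + tag_len].decode("utf-8")
    payload = json.loads(blob[4 + tag_len:].decode("utf-8"))
    return tag, payload
```

The receiving side would call `unpack_metadata` on whatever bytes arrive
in the application metadata field and dispatch on the returned tag; the
transport itself never needs to know what the bytes contain.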

On Fri, Nov 8, 2019 at 11:37 AM David Li <li...@gmail.com> wrote:

> I've updated the proposal.
>
> On the subject of Protobuf Any vs bytes, and how to handle
> errors/metadata, I still think using bytes is preferable:
> - It doesn't require (conditionally) exposing or wrapping Protobuf types,
> - We wouldn't be able to practically expose the Protobuf field to C++
> users without causing build pains,
> - We can't let Python users take advantage of the Protobuf field
> without somehow being compatible with the Protobuf wheels (by linking
> to the same version, and doing magic to turn the C++ Protobufs into
> the Python ones),
> - All our other application-defined fields are already bytes.
>
> Applications that want structure can encode JSON or Protobuf Any into
> the bytes field themselves, much as you can already do for Ticket,
> commands in FlightDescriptors, and application metadata in
> DoGet/DoPut. I don't think this is (much) less efficient than using
> Any directly, since Any itself is a bytes field with a tag, and must
> invoke the Protobuf deserializer again to read the actual message.
>
> If we decide on using bytes, then I don't think it makes sense to
> define a new message with a oneof either, since it would be redundant.
>
> Thanks,
> David
>
> On 11/7/19, David Li <li...@gmail.com> wrote:
> > I've been extremely backlogged, I will update the proposal when I get
> > a chance and reply here when done.
> >
> > Best,
> > David
> >
> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >> Bumping this discussion since a couple of weeks have passed. It seems
> >> there are still some questions here, could we summarize what are the
> >> alternatives along with any public API implications so we can try to
> >> render a decision?
> >>
> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com> wrote:
> >>>
> >>> Hi Wes,
> >>>
> >>> Responses inline:
> >>>
> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
> >>>
> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
> >>> > >
> >>> > > The question is whether to repurpose the existing FlightData
> >>> > > structure, and allow for the metadata field to be filled in and
> >>> > > data fields to be blank (as a control message), or to wrap the
> >>> > > FlightData structure in another structure that explicitly
> >>> > > distinguishes between control and data messages.
> >>> >
> >>> > I'm not super against having metadata-only FlightData with empty
> >>> > body. One question to consider is what changes (if any) would need
> >>> > to be made to public APIs in either scenario.
> >>> >
> >>>
> >>> We could leave DoGet/DoPut as-is for now, and allow empty data messages
> >>> in the future. This would be a breaking change, but wouldn't change the
> >>> wire format. I think the APIs could be changed backwards compatibly,
> >>> though.
> >>>
> >>>
> >>>
> >>> > > The other question is how to handle the metadata fields. So far,
> >>> > > we've used bytestring fields for application-defined data. This is
> >>> > > workable if you want to use Protobuf to define the contents of those
> >>> > > fields, but requires you to pack/unpack your Protobuf into/from the
> >>> > > bytestring field. If we instead used the Protobuf Any field, a
> >>> > > dynamically typed field, this would be more convenient, but then
> >>> > > we'd be exposing Protobuf types. We could alternatively use a
> >>> > > combination of a type field and a bytestring field, mimicking what
> >>> > > the Protobuf Any type looks like on the wire. I'm not sure this is
> >>> > > actually cleaner in any of the language APIs, though.
> >>> >
> >>> > Leaving the deserialization of the app metadata to the particular
> >>> > Flight implementation seems on first principles like the most
> >>> > flexible thing, if Any is used, does that mean the metadata _must_ be
> >>> > a protobuf?
> >>> >
> >>>
> >>>
> >>> If Any is used, we could still expose a bytes-based API, but it would
> >>> have some more wrapping. (We could put a ByteString in Any.) Then the
> >>> question would just be how to expose this (would be easier in Java,
> >>> harder in C++).
> >>>
> >>>
> >>>
> >>> > > David
> >>> > >
> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
> >>> > > >
> >>> > > > Can one of you explain what is being proposed in non-protobuf
> >>> > > > terms?
> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
> >>> > > >
> >>> > > > Regards
> >>> > > >
> >>> > > > Antoine.
> >>> > > >
> >>> > > >
> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
> >>> > > >> Oneof doesn't actually change the wire encoding; it would just be
> >>> > > >> application-level logic. (The official guide doesn't even mention
> >>> > > >> it in the encoding docs; I found
> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >>> > > >> as well.)
> >>> > > >>
> >>> > > >> If I follow you, Jacques, then you are proposing essentially
> >>> > > >> inlining
> >>> > > >> the definition of Any, e.g.
> >>> > > >>
> >>> > > >> message FlightMessage {
> >>> > > >>   oneof message {
> >>> > > >>     FlightData data = 1;
> >>> > > >>     FlightAny metadata = 2;
> >>> > > >>   }
> >>> > > >> }
> >>> > > >>
> >>> > > >> message FlightAny {
> >>> > > >>   string type = 1;
> >>> > > >>   bytes data = 2;
> >>> > > >> }
> >>> > > >>
> >>> > > >> Is this correct?
> >>> > > >>
> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
> >>> > > >> as
> >>> > > >> well, but at that point, I'd rather we be consistent with all of
> >>> > > >> them,
> >>> > > >> rather than have one of the three methods do its own thing.
> >>> > > >>
> >>> > > >> Thanks,
> >>> > > >> David
> >>> > > >>
> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > > >>> I think we could probably expose the oneof behavior without
> >>> > > >>> exposing
> >>> > the
> >>> > > >>> protobuf functions. On the any... hmm. I guess we could expose
> >>> > > >>> as
> >>> > > >>> two
> >>> > > >>> fields: type and data. Then users could use it for whatever but
> >>> > > >>> if
> >>> > > >>> people
> >>> > > >>> wanted to treat it as any, it would work. (Basically a user
> >>> > > >>> could
> >>> > > >>> use
> >>> > > >>> any
> >>> > > >>> with it easily but they could also use any other mechanism). At
> >>> > least in
> >>> > > >>> java, the any concepts are pretty simple/diy. Are other
> language
> >>> > > >>> bindings
> >>> > > >>> less diy?
> >>> > > >>>
> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but
> >>> > > >>> it
> >>> > just
> >>> > > >>> seemed a bit janky.
> >>> > > >>>
> >>> > > >>> Thinking about the control message/wrapper object thing, I
> >>> > > >>> wonder
> >>> > > >>> if
> >>> > we
> >>> > > >>> should redefine DoPut and DoGet to have the same property if we
> >>> > think it
> >>> > > >>> is
> >>> > > >>> a good idea...
> >>> > > >>>
> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
> li.davidm96@gmail.com>
> >>> > wrote:
> >>> > > >>>
> >>> > > >>>> I was definitely considering having control messages without
> >>> > > >>>> data,
> >>> > and
> >>> > > >>>> I thought that could be encoded by a FlightData with only
> >>> > app_metadata
> >>> > > >>>> set. I think I understand your position now: FlightData should
> >>> > always
> >>> > > >>>> carry (some) data (with optional metadata)?
> >>> > > >>>>
> >>> > > >>>> That makes sense to me, and is consistent with the
> >>> > > >>>> documentation
> >>> > > >>>> on
> >>> > > >>>> FlightData in the Protobuf file. I was worried about having a
> >>> > > >>>> redundant metadata field, but oneof prevents that from
> >>> > > >>>> happening,
> >>> > and
> >>> > > >>>> overall having a clear separation between data and control
> >>> > > >>>> messages
> >>> > is
> >>> > > >>>> cleaner.
> >>> > > >>>>
> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
> >>> > > >>>> exposing
> >>> > > >>>> Protobuf by using bytes, would we want to change that now?
> >>> > > >>>>
> >>> > > >>>> Best,
> >>> > > >>>> David
> >>> > > >>>>
> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > > >>>>> Hey David,
> >>> > > >>>>>
> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
> >>> > > >>>>> doget/doput
> >>> > > >>>>> for
> >>> > > >>>>> async. Yes, more thinking java given java grpc's async always
> >>> > pattern.
> >>> > > >>>>>
> >>> > > >>>>> On the comment around the FlightData, I think it is
> >>> > > >>>>> overloading
> >>> > > >>>>> the
> >>> > > >>>> message
> >>> > > >>>>> to use metadata for this. If I want to send a control message
> >>> > > >>>> independently
> >>> > > >>>>> of the data message, I would have to define something like an
> >>> > > >>>>> empty
> >>> > > >>>> flight
> >>> > > >>>>> data message that has custom metadata. Why not support a
> >>> > > >>>>> container
> >>> > > >>>>> object
> >>> > > >>>>> with a oneof{FlightData, Any} in it instead so users can add
> >>> > > >>>>> more
> >>> > data
> >>> > > >>>>> as
> >>> > > >>>>> desired. The default impl could be a noop for the Any
> >>> > > >>>>> messages.
> >>> > > >>>>>
> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
> >>> > > >>>>> <li...@gmail.com>
> >>> > > >>>>> wrote:
> >>> > > >>>>>
> >>> > > >>>>>> Hi Jacques,
> >>> > > >>>>>>
> >>> > > >>>>>> Thanks for the comments.
> >>> > > >>>>>>
> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >>> > > >>>>>> - FlightData already has metadata fields as a result of
> prior
> >>> > > >>>>>> proposals, so I don't think we need a new message to carry
> >>> > > >>>>>> that
> >>> > kind
> >>> > > >>>>>> of information.
> >>> > > >>>>>> - I like the suggestion of an async handler to handle
> >>> > > >>>>>> incoming
> >>> > > >>>>>> messages as the fundamental API; it would actually be quite
> >>> > natural
> >>> > > >>>>>> to
> >>> > > >>>>>> implement in Flight/Java. I will note that it's not possible
> >>> > > >>>>>> in
> >>> > > >>>>>> C++/Python without spawning a thread, though. (In essence,
> >>> > gRPC-Java
> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are
> >>> > experimental
> >>> > > >>>>>> C++ APIs that would let us do something similar to Java, but
> >>> > > >>>>>> those
> >>> > > >>>>>> are
> >>> > > >>>>>> only in relatively recent gRPC versions and are still under
> >>> > > >>>>>> development (contrary to the interceptor APIs which have
> been
> >>> > around
> >>> > > >>>>>> for quite a while).
> >>> > > >>>>>>
> >>> > > >>>>>> Thanks,
> >>> > > >>>>>> David
> >>> > > >>>>>>
> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
> >>> > > >>>>>>> discussion
> >>> > > >>>>>>> here
> >>> > > >>>>>>> depending on your thoughts.
> >>> > > >>>>>>>
> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
> >>> > > >>>>>>> <li...@gmail.com>
> >>> > > >>>> wrote:
> >>> > > >>>>>>>
> >>> > > >>>>>>>> Hey Ryan,
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Thanks for the comments.
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
> >>> > strawman.
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
> >>> > > >>>>>>>> interleave
> >>> > > >>>> uploads
> >>> > > >>>>>>>> and downloads if you were so inclined. Right now,
> >>> > > >>>>>>>> synchronous
> >>> > APIs
> >>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait
> >>> > > >>>>>>>> for
> >>> > each
> >>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give
> >>> > > >>>>>>>> us
> >>> > > >>>>>>>> the
> >>> > > >>>>>>>> ability to have per-read timeouts, only an overall
> >>> > > >>>>>>>> timeout.)
> >>> > > >>>>>>>> As
> >>> > an
> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> This is mostly tangential though, eventually we will want
> >>> > > >>>>>>>> to
> >>> > design
> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional
> >>> > > >>>>>>>> stream
> >>> > > >>>>>>>> like
> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to
> >>> > > >>>>>>>> run
> >>> > into.
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but
> >>> > > >>>>>>>> the
> >>> > main
> >>> > > >>>>>>>> concern is that depending on how you deploy, two separate
> >>> > > >>>>>>>> calls
> >>> > > >>>>>>>> could
> >>> > > >>>>>>>> get routed to different instances. Additionally, gRPC has
> >>> > > >>>>>>>> some
> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in between
> >>> > > >>>>>>>> the
> >>> > two
> >>> > > >>>>>>>> calls, but it then restarts or there is another instance
> >>> > available,
> >>> > > >>>>>>>> the client will happily reconnect to the new server
> without
> >>> > > >>>>>>>> warning.
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Thanks,
> >>> > > >>>>>>>> David
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
> >>> > > >>>>>>>>> Hey David,
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and
> >>> > > >>>>>>>>> the
> >>> > > >>>>>>>>> possibility
> >>> > > >>>>>>>>> of remote compute via arrow buffers. One thing that would
> >>> > > >>>>>>>>> help
> >>> > me
> >>> > > >>>>>> would
> >>> > > >>>>>>>> be
> >>> > > >>>>>>>>> a concrete example of the API in a real life use case.
> >>> > > >>>>>>>>> Also,
> >>> > what
> >>> > > >>>>>> would
> >>> > > >>>>>>>> the
> >>> > > >>>>>>>>> client experience be in terms of sync vs async? Would the
> >>> > > >>>>>>>>> client
> >>> > > >>>>>>>>> block
> >>> > > >>>>>>>> till
> >>> > > >>>>>>>>> the bidirectional call returns, i.e. c =
> flight.vector_mult(a,
> >>> > > >>>>>>>>> b)
> >>> > or
> >>> > > >>>>>>>>> would
> >>> > > >>>>>>>> the
> >>> > > >>>>>>>>> client wait to be signaled that computation was done. If
> >>> > > >>>>>>>>> the
> >>> > > >>>>>>>>> latter
> >>> > > >>>>>>>>> how
> >>> > > >>>>>>>>> is
> >>> > > >>>>>>>>> that different from a DoPut then DoGet? I suppose that
> >>> > > >>>>>>>>> this
> >>> > could
> >>> > > >>>> be
> >>> > > >>>>>>>>> implemented without extending the RPC interface but
> rather
> >>> > > >>>>>>>>> by a
> >>> > > >>>>>>>>> function/util?
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> Best,
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> Ryan
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
> >>> > li.davidm96@gmail.com>
> >>> > > >>>>>> wrote:
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>>> Hi all,
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but
> we
> >>> > > >>>>>>>>>> have
> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to
> >>> > > >>>>>>>>>> both
> >>> > > >>>>>>>>>> send
> >>> > > >>>>>>>>>> and
> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that
> >>> > > >>>>>>>>>> end,
> >>> > I've
> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>
> >>> > > >>>>>>
> >>> > > >>>>
> >>> >
> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the
> >>> > document.
> >>> > > >>>>>>>>>> I'd
> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively
> >>> > > >>>>>>>>>> straightforward
> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've
> >>> > > >>>>>>>>>> decided
> >>> > > >>>>>>>>>> to
> >>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut),
> >>> > > >>>>>>>>>> as
> >>> > > >>>>>>>>>> it
> >>> > > >>>>>> doesn't
> >>> > > >>>>>>>>>> functionally change Flight, just the interpretation of
> >>> > > >>>>>>>>>> the
> >>> > > >>>>>>>>>> semantics.
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>> Thanks,
> >>> > > >>>>>>>>>> David
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> --
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> <https://www.dremio.com/>
> >>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>,
> join
> >>> > > >>>>>>>>> our
> >>> > > >>>>>>>>> community
> >>> > > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio
> >>> > > >>>>>>>>> <https://www.dremio.com/download>
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>
> >>> > > >>>>>>>
> >>> > > >>>>>>
> >>> > > >>>>>
> >>> > > >>>>
> >>> > > >>>
> >>> > > >
> >>> >
> >>
> >
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
I've updated the proposal.

On the subject of Protobuf Any vs bytes, and how to handle
errors/metadata, I still think using bytes is preferable:
- It doesn't require (conditionally) exposing or wrapping Protobuf types,
- We wouldn't be able to practically expose the Protobuf field to C++
users without causing build pains,
- We can't let Python users take advantage of the Protobuf field
without somehow being compatible with the Protobuf wheels (by linking
to the same version, and doing magic to turn the C++ Protobufs into
the Python ones),
- All our other application-defined fields are already bytes.

Applications that want structure can encode JSON or Protobuf Any into
the bytes field themselves, much as you can already do for Ticket,
commands in FlightDescriptors, and application metadata in
DoGet/DoPut. I don't think this is (much) less efficient than using
Any directly, since Any is itself just a type tag plus a bytes field,
and reading the actual message still requires a second pass through
the Protobuf deserializer.
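
As an illustration of applications encoding their own structure into a
bytes field, here is a minimal Python sketch. It is not part of Flight's
API: the layout (a 2-byte length-prefixed type tag followed by a JSON
payload) is an assumption chosen to loosely mimic the type-plus-bytes
shape of Any, and the names pack_metadata, unpack_metadata, and
"example.Progress" are hypothetical.

```python
import json
import struct
from typing import Tuple


def pack_metadata(type_name: str, payload: dict) -> bytes:
    """Pack a type tag and a JSON payload into one opaque bytes value."""
    tag = type_name.encode("utf-8")
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    # Hypothetical layout: 2-byte big-endian tag length, the tag, the payload.
    return struct.pack(">H", len(tag)) + tag + body


def unpack_metadata(blob: bytes) -> Tuple[str, dict]:
    """Reverse pack_metadata: recover the type tag and the payload."""
    (tag_len,) = struct.unpack_from(">H", blob, 0)
    tag = blob[2:2 + tag_len].decode("utf-8")
    payload = json.loads(blob[2 + tag_len:].decode("utf-8"))
    return tag, payload


# Round-trip a made-up progress message through the bytes representation.
blob = pack_metadata("example.Progress", {"rows_done": 1024})
print(unpack_metadata(blob))
```

The resulting bytes value could go into any of the existing
application-defined bytes fields; the receiver picks a decoder based on
the tag, which is roughly the work Any would do on its behalf.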

If we settle on bytes, then I don't think it makes sense to define a
new wrapper message with a oneof either: a FlightData carrying only
application metadata already covers the control-message case, so the
wrapper would be redundant.
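
To make the redundancy argument concrete, here is a small Python sketch
of how a receiver might tell control messages from data messages
without a wrapper. The Message class is a hypothetical stand-in for a
FlightData-like message, not the real Flight type, and the rule "no
body but some metadata means control" is an assumption for
illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a FlightData-like message (not the real API)."""
    data_body: bytes = b""
    app_metadata: bytes = b""


def classify(msg: Message) -> str:
    """Classify a message by which of its fields are populated."""
    if msg.data_body:
        # Any message with a body is a data message; metadata is optional.
        return "data"
    if msg.app_metadata:
        # Metadata with no body plays the role of a control message.
        return "control"
    return "empty"


print(classify(Message(data_body=b"\x01\x02")))
print(classify(Message(app_metadata=b"cancel")))
```

Under this assumption the distinction a oneof would encode is already
recoverable from the existing fields, which is why a separate wrapper
adds little.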

Thanks,
David

On 11/7/19, David Li <li...@gmail.com> wrote:
> I've been extremely backlogged; I will update the proposal when I get
> a chance and reply here when done.
>
> Best,
> David
>
> On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
>> Bumping this discussion since a couple of weeks have passed. It seems
>> there are still some questions here, could we summarize what are the
>> alternatives along with any public API implications so we can try to
>> render a decision?
>>
>> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com> wrote:
>>>
>>> Hi Wes,
>>>
>>> Responses inline:
>>>
>>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
>>>
>>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com>
>>> > wrote:
>>> > >
>>> > > The question is whether to repurpose the existing FlightData
>>> > > structure, and allow for the metadata field to be filled in and data
>>> > > fields to be blank (as a control message), or to wrap the FlightData
>>> > > structure in another structure that explicitly distinguishes between
>>> > > control and data messages.
>>> >
>>> > I'm not super against having metadata-only FlightData with empty body.
>>> > One question to consider is what changes (if any) would need to be
>>> > made to public APIs in either scenario.
>>> >
>>>
>>> We could leave DoGet/DoPut as-is for now, and allow empty data messages
>>> in
>>> the future. This would be a breaking change, but wouldn't change the
>>> wire
>>> format. I think the APIs could be changed backwards compatibly, though.
>>>
>>>
>>>
>>> > > The other question is how to handle the metadata fields. So far,
>>> > > we've
>>> > > used bytestring fields for application-defined data. This is
>>> > > workable
>>> > > if you want to use Protobuf to define the contents of those fields,
>>> > > but requires you to pack/unpack your Protobuf into/from the
>>> > > bytestring
>>> > > field. If we instead used the Protobuf Any field, a dynamically
>>> > > typed
>>> > > field, this would be more convenient, but then we'd be exposing
>>> > > Protobuf types. We could alternatively use a combination of a type
>>> > > field and a bytestring field, mimicking what the Protobuf Any type
>>> > > looks like on the wire. I'm not sure this is actually cleaner in any
>>> > > of the language APIs, though.
>>> >
>>> > Leaving the deserialization of the app metadata to the particular
>>> > Flight implementation seems on first principles like the most flexible
>>> > thing, if Any is used, does that mean the metadata _must_ be a
>>> > protobuf?
>>> >
>>>
>>>
>>> If Any is used, we could still expose a bytes-based API, but it would
>>> have
>>> some more wrapping. (We could put a ByteString in Any.) Then the
>>> question
>>> would just be how to expose this (would be easier in Java, harder in
>>> C++).
>>>
>>>
>>>
>>> > > David
>>> > >
>>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
>>> > > >
>>> > > > Can one of you explain what is being proposed in non-protobuf
>>> > > > terms?
>>> > > > Knowledge of protobuf shouldn't be required to use Flight.
>>> > > >
>>> > > > Regards
>>> > > >
>>> > > > Antoine.
>>> > > >
>>> > > >
>>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
>>> > > >> Oneof doesn't actually change the wire encoding; it would just be
>>> > > >> application-level logic. (The official guide doesn't even mention
>>> > > >> it
>>> > > >> in the encoding docs; I found
>>> > > >>
>>> > https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>>> > > >> as well.)
>>> > > >>
>>> > > >> If I follow you, Jacques, then you are proposing essentially
>>> > > >> inlining
>>> > > >> the definition of Any, e.g.
>>> > > >>
>>> > > >> message FlightMessage {
>>> > > >>   oneof message {
>>> > > >>     FlightData data = 1;
>>> > > >>     FlightAny metadata = 2;
>>> > > >>   }
>>> > > >> }
>>> > > >>
>>> > > >> message FlightAny {
>>> > > >>   string type = 1;
>>> > > >>   bytes data = 2;
>>> > > >> }
>>> > > >>
>>> > > >> Is this correct?
>>> > > >>
>>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
>>> > > >> as
>>> > > >> well, but at that point, I'd rather we be consistent with all of
>>> > > >> them,
>>> > > >> rather than have one of the three methods do its own thing.
>>> > > >>
>>> > > >> Thanks,
>>> > > >> David
>>> > > >>
>>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
>>> > > >>> I think we could probably expose the oneof behavior without
>>> > > >>> exposing
>>> > the
>>> > > >>> protobuf functions. On the any... hmm. I guess we could expose
>>> > > >>> as
>>> > > >>> two
>>> > > >>> fields: type and data. Then users could use it for whatever but
>>> > > >>> if
>>> > > >>> people
>>> > > >>> wanted to treat it as any, it would work. (Basically a user
>>> > > >>> could
>>> > > >>> use
>>> > > >>> any
>>> > > >>> with it easily but they could also use any other mechanism). At
>>> > least in
>>> > > >>> java, the any concepts are pretty simple/diy. Are other language
>>> > > >>> bindings
>>> > > >>> less diy?
>>> > > >>>
>>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but
>>> > > >>> it
>>> > just
>>> > > >>> seemed a bit janky.
>>> > > >>>
>>> > > >>> Thinking about the control message/wrapper object thing, I
>>> > > >>> wonder
>>> > > >>> if
>>> > we
>>> > > >>> should redefine DoPut and DoGet to have the same property if we
>>> > think it
>>> > > >>> is
>>> > > >>> a good idea...
>>> > > >>>
>>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li...@gmail.com>
>>> > wrote:
>>> > > >>>
>>> > > >>>> I was definitely considering having control messages without
>>> > > >>>> data,
>>> > and
>>> > > >>>> I thought that could be encoded by a FlightData with only
>>> > app_metadata
>>> > > >>>> set. I think I understand your position now: FlightData should
>>> > always
>>> > > >>>> carry (some) data (with optional metadata)?
>>> > > >>>>
>>> > > >>>> That makes sense to me, and is consistent with the
>>> > > >>>> documentation
>>> > > >>>> on
>>> > > >>>> FlightData in the Protobuf file. I was worried about having a
>>> > > >>>> redundant metadata field, but oneof prevents that from
>>> > > >>>> happening,
>>> > and
>>> > > >>>> overall having a clear separation between data and control
>>> > > >>>> messages
>>> > is
>>> > > >>>> cleaner.
>>> > > >>>>
>>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
>>> > > >>>> exposing
>>> > > >>>> Protobuf by using bytes, would we want to change that now?
>>> > > >>>>
>>> > > >>>> Best,
>>> > > >>>> David
>>> > > >>>>
>>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
>>> > > >>>>> Hey David,
>>> > > >>>>>
>>> > > >>>>> RE: Async: I was trying to match the pattern we use for
>>> > > >>>>> doget/doput
>>> > > >>>>> for
>>> > > >>>>> async. Yes, more thinking java given java grpc's async always
>>> > pattern.
>>> > > >>>>>
>>> > > >>>>> On the comment around the FlightData, I think it is
>>> > > >>>>> overloading
>>> > > >>>>> the
>>> > > >>>> message
>>> > > >>>>> to use metadata for this. If I want to send a control message
>>> > > >>>> independently
>>> > > >>>>> of the data message, I would have to define something like an
>>> > > >>>>> empty
>>> > > >>>> flight
>>> > > >>>>> data message that has custom metadata. Why not support a
>>> > > >>>>> container
>>> > > >>>>> object
>>> > > >>>>> with a oneof{FlightData, Any} in it instead so users can add
>>> > > >>>>> more
>>> > data
>>> > > >>>>> as
>>> > > >>>>> desired. The default impl could be a noop for the Any
>>> > > >>>>> messages.
>>> > > >>>>>
>>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
>>> > > >>>>> <li...@gmail.com>
>>> > > >>>>> wrote:
>>> > > >>>>>
>>> > > >>>>>> Hi Jacques,
>>> > > >>>>>>
>>> > > >>>>>> Thanks for the comments.
>>> > > >>>>>>
>>> > > >>>>>> - I do agree DoExchange is a better name!
>>> > > >>>>>> - FlightData already has metadata fields as a result of prior
>>> > > >>>>>> proposals, so I don't think we need a new message to carry
>>> > > >>>>>> that
>>> > kind
>>> > > >>>>>> of information.
>>> > > >>>>>> - I like the suggestion of an async handler to handle
>>> > > >>>>>> incoming
>>> > > >>>>>> messages as the fundamental API; it would actually be quite
>>> > natural
>>> > > >>>>>> to
>>> > > >>>>>> implement in Flight/Java. I will note that it's not possible
>>> > > >>>>>> in
>>> > > >>>>>> C++/Python without spawning a thread, though. (In essence,
>>> > gRPC-Java
>>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are
>>> > experimental
>>> > > >>>>>> C++ APIs that would let us do something similar to Java, but
>>> > > >>>>>> those
>>> > > >>>>>> are
>>> > > >>>>>> only in relatively recent gRPC versions and are still under
>>> > > >>>>>> development (contrary to the interceptor APIs which have been
>>> > around
>>> > > >>>>>> for quite a while).
>>> > > >>>>>>
>>> > > >>>>>> Thanks,
>>> > > >>>>>> David
>>> > > >>>>>>
>>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
>>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
>>> > > >>>>>>> discussion
>>> > > >>>>>>> here
>>> > > >>>>>>> depending on your thoughts.
>>> > > >>>>>>>
>>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
>>> > > >>>>>>> <li...@gmail.com>
>>> > > >>>> wrote:
>>> > > >>>>>>>
>>> > > >>>>>>>> Hey Ryan,
>>> > > >>>>>>>>
>>> > > >>>>>>>> Thanks for the comments.
>>> > > >>>>>>>>
>>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
>>> > strawman.
>>> > > >>>>>>>>
>>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
>>> > > >>>>>>>> interleave
>>> > > >>>> uploads
>>> > > >>>>>>>> and downloads if you were so inclined. Right now,
>>> > > >>>>>>>> synchronous
>>> > APIs
>>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait
>>> > > >>>>>>>> for
>>> > each
>>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give
>>> > > >>>>>>>> us
>>> > > >>>>>>>> the
>>> > > >>>>>>>> ability to have per-read timeouts, only an overall
>>> > > >>>>>>>> timeout.)
>>> > > >>>>>>>> As
>>> > an
>>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
>>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>>> > > >>>>>>>>
>>> > > >>>>>>>> This is mostly tangential though, eventually we will want
>>> > > >>>>>>>> to
>>> > design
>>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional
>>> > > >>>>>>>> stream
>>> > > >>>>>>>> like
>>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to
>>> > > >>>>>>>> run
>>> > into.
>>> > > >>>>>>>>
>>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but
>>> > > >>>>>>>> the
>>> > main
>>> > > >>>>>>>> concern is that depending on how you deploy, two separate
>>> > > >>>>>>>> calls
>>> > > >>>>>>>> could
>>> > > >>>>>>>> get routed to different instances. Additionally, gRPC has
>>> > > >>>>>>>> some
>>> > > >>>>>>>> reconnection behaviors; if the server goes away in between
>>> > > >>>>>>>> the
>>> > two
>>> > > >>>>>>>> calls, but it then restarts or there is another instance
>>> > available,
>>> > > >>>>>>>> the client will happily reconnect to the new server without
>>> > > >>>>>>>> warning.
>>> > > >>>>>>>>
>>> > > >>>>>>>> Thanks,
>>> > > >>>>>>>> David
>>> > > >>>>>>>>
>>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
>>> > > >>>>>>>>> Hey David,
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and
>>> > > >>>>>>>>> the
>>> > > >>>>>>>>> possibility
>>> > > >>>>>>>>> of remote compute via arrow buffers. One thing that would
>>> > > >>>>>>>>> help
>>> > me
>>> > > >>>>>> would
>>> > > >>>>>>>> be
>>> > > >>>>>>>>> a concrete example of the API in a real life use case.
>>> > > >>>>>>>>> Also,
>>> > what
>>> > > >>>>>> would
>>> > > >>>>>>>> the
>>> > > >>>>>>>>> client experience be in terms of sync vs async? Would the
>>> > > >>>>>>>>> client
>>> > > >>>>>>>>> block
>>> > > >>>>>>>> till
>>> > > >>>>>>>>> the bidirectional call returns, i.e. c = flight.vector_mult(a,
>>> > > >>>>>>>>> b)
>>> > or
>>> > > >>>>>>>>> would
>>> > > >>>>>>>> the
>>> > > >>>>>>>>> client wait to be signaled that computation was done. If
>>> > > >>>>>>>>> the
>>> > > >>>>>>>>> latter
>>> > > >>>>>>>>> how
>>> > > >>>>>>>>> is
>>> > > >>>>>>>>> that different from a DoPut then DoGet? I suppose that
>>> > > >>>>>>>>> this
>>> > could
>>> > > >>>> be
>>> > > >>>>>>>>> implemented without extending the RPC interface but rather
>>> > > >>>>>>>>> by a
>>> > > >>>>>>>>> function/util?
>>> > > >>>>>>>>>
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> Best,
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> Ryan
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
>>> > li.davidm96@gmail.com>
>>> > > >>>>>> wrote:
>>> > > >>>>>>>>>
>>> > > >>>>>>>>>> Hi all,
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we
>>> > > >>>>>>>>>> have
>>> > > >>>>>>>>>> identified a new use case on the horizon: being able to
>>> > > >>>>>>>>>> both
>>> > > >>>>>>>>>> send
>>> > > >>>>>>>>>> and
>>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that
>>> > > >>>>>>>>>> end,
>>> > I've
>>> > > >>>>>>>>>> written up a proposal for a new RPC method:
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>
>>> > > >>>>>>
>>> > > >>>>
>>> > https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> Please let me know if you can't view or comment on the
>>> > document.
>>> > > >>>>>>>>>> I'd
>>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively
>>> > > >>>>>>>>>> straightforward
>>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> This is a format change and would require a vote. I've
>>> > > >>>>>>>>>> decided
>>> > > >>>>>>>>>> to
>>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut),
>>> > > >>>>>>>>>> as
>>> > > >>>>>>>>>> it
>>> > > >>>>>> doesn't
>>> > > >>>>>>>>>> functionally change Flight, just the interpretation of
>>> > > >>>>>>>>>> the
>>> > > >>>>>>>>>> semantics.
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> Thanks,
>>> > > >>>>>>>>>> David
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> --
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> <https://www.dremio.com/>
>>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join
>>> > > >>>>>>>>> our
>>> > > >>>>>>>>> community
>>> > > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio
>>> > > >>>>>>>>> <https://www.dremio.com/download>
>>> > > >>>>>>>>>
>>> > > >>>>>>>>
>>> > > >>>>>>>
>>> > > >>>>>>
>>> > > >>>>>
>>> > > >>>>
>>> > > >>>
>>> > > >
>>> >
>>
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
I've been extremely backlogged; I will update the proposal when I get
a chance and reply here when done.

Best,
David

On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> Bumping this discussion since a couple of weeks have passed. It seems
> there are still some questions here, could we summarize what are the
> alternatives along with any public API implications so we can try to
> render a decision?
>
> On Sat, Oct 26, 2019 at 7:19 PM David Li <li...@gmail.com> wrote:
>>
>> Hi Wes,
>>
>> Responses inline:
>>
>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <we...@gmail.com> wrote:
>>
>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li...@gmail.com> wrote:
>> > >
>> > > The question is whether to repurpose the existing FlightData
>> > > structure, and allow for the metadata field to be filled in and data
>> > > fields to be blank (as a control message), or to wrap the FlightData
>> > > structure in another structure that explicitly distinguishes between
>> > > control and data messages.
>> >
>> > I'm not super against having metadata-only FlightData with empty body.
>> > One question to consider is what changes (if any) would need to be
>> > made to public APIs in either scenario.
>> >
>>
>> We could leave DoGet/DoPut as-is for now, and allow empty data messages
>> in
>> the future. This would be a breaking change, but wouldn't change the wire
>> format. I think the APIs could be changed backwards compatibly, though.
>>
>>
>>
>> > > The other question is how to handle the metadata fields. So far,
>> > > we've
>> > > used bytestring fields for application-defined data. This is workable
>> > > if you want to use Protobuf to define the contents of those fields,
>> > > but requires you to pack/unpack your Protobuf into/from the
>> > > bytestring
>> > > field. If we instead used the Protobuf Any field, a dynamically typed
>> > > field, this would be more convenient, but then we'd be exposing
>> > > Protobuf types. We could alternatively use a combination of a type
>> > > field and a bytestring field, mimicking what the Protobuf Any type
>> > > looks like on the wire. I'm not sure this is actually cleaner in any
>> > > of the language APIs, though.
>> >
>> > Leaving the deserialization of the app metadata to the particular
>> > Flight implementation seems on first principles like the most flexible
>> > thing, if Any is used, does that mean the metadata _must_ be a
>> > protobuf?
>> >
>>
>>
>> If Any is used, we could still expose a bytes-based API, but it would
>> have
>> some more wrapping. (We could put a ByteString in Any.) Then the question
>> would just be how to expose this (would be easier in Java, harder in
>> C++).
>>
>>
>>
>> > > David
>> > >
>> > > On 10/21/19, Antoine Pitrou <an...@python.org> wrote:
>> > > >
>> > > > Can one of you explain what is being proposed in non-protobuf
>> > > > terms?
>> > > > Knowledge of protobuf shouldn't be required to use Flight.
>> > > >
>> > > > Regards
>> > > >
>> > > > Antoine.
>> > > >
>> > > >
>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
>> > > >> Oneof doesn't actually change the wire encoding; it would just be
>> > > >> application-level logic. (The official guide doesn't even mention
>> > > >> it
>> > > >> in the encoding docs; I found
>> > > >>
>> > https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> > > >> as well.)
>> > > >>
>> > > >> If I follow you, Jacques, then you are proposing essentially inlining
>> > > >> the definition of Any, e.g.
>> > > >>
>> > > >> message FlightMessage {
>> > > >>   oneof message {
>> > > >>     FlightData data = 1;
>> > > >>     FlightAny metadata = 2;
>> > > >>   }
>> > > >> }
>> > > >>
>> > > >> message FlightAny {
>> > > >>   string type = 1;
>> > > >>   bytes data = 2;
>> > > >> }
>> > > >>
>> > > >> Is this correct?
>> > > >>
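For readers who would rather not read Protobuf, the wrapper above can be modeled roughly in Python. This is a sketch under stated assumptions: the `FlightData` class here is a simplified stand-in with only two fields, `handle` is a hypothetical dispatcher, and requiring exactly one member to be set is a choice made for this sketch (a Protobuf oneof also permits none).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlightData:
    """Simplified stand-in for the real FlightData message."""
    data_body: bytes
    app_metadata: bytes = b""


@dataclass
class FlightAny:
    """Mirrors the proposed message: a type tag plus opaque bytes."""
    type: str
    data: bytes


@dataclass
class FlightMessage:
    """Wrapper holding either a data message or a control message."""
    data: Optional[FlightData] = None
    metadata: Optional[FlightAny] = None

    def __post_init__(self) -> None:
        # Sketch-only rule: exactly one of the two members must be set.
        if (self.data is None) == (self.metadata is None):
            raise ValueError("exactly one of data/metadata must be set")


def handle(message: FlightMessage) -> str:
    """Hypothetical dispatcher showing how a receiver tells the cases apart."""
    if message.data is not None:
        return f"data: {len(message.data.data_body)} bytes"
    return f"control: {message.metadata.type}"
```

The receiver branches on which member is present, so a control message never has to masquerade as an empty data message.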
>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut as
>> > > >> well, but at that point, I'd rather we be consistent with all of them,
>> > > >> rather than have one of the three methods do its own thing.
>> > > >>
>> > > >> Thanks,
>> > > >> David
>> > > >>
>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > > >>> I think we could probably expose the oneof behavior without exposing
>> > > >>> the protobuf functions. On the Any... hmm. I guess we could expose it
>> > > >>> as two fields: type and data. Then users could use it for whatever,
>> > > >>> but if people wanted to treat it as Any, it would work. (Basically, a
>> > > >>> user could use Any with it easily, but they could also use any other
>> > > >>> mechanism.) At least in Java, the Any concepts are pretty simple/DIY.
>> > > >>> Are other language bindings less DIY?
>> > > >>>
>> > > >>> I'm *not* hardcore against the empty FlightData + metadata, but it
>> > > >>> just seemed a bit janky.
>> > > >>>
>> > > >>> Thinking about the control message/wrapper object thing, I wonder if
>> > > >>> we should redefine DoPut and DoGet to have the same property if we
>> > > >>> think it is a good idea...
>> > > >>>
>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li...@gmail.com> wrote:
>> > > >>>
>> > > >>>> I was definitely considering having control messages without data,
>> > > >>>> and I thought that could be encoded by a FlightData with only
>> > > >>>> app_metadata set. I think I understand your position now: FlightData
>> > > >>>> should always carry (some) data (with optional metadata)?
>> > > >>>>
>> > > >>>> That makes sense to me, and is consistent with the documentation on
>> > > >>>> FlightData in the Protobuf file. I was worried about having a
>> > > >>>> redundant metadata field, but oneof prevents that from happening,
>> > > >>>> and overall having a clear separation between data and control
>> > > >>>> messages is cleaner.
>> > > >>>>
>> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
>> > > >>>> Protobuf by using bytes. Would we want to change that now?
>> > > >>>>
>> > > >>>> Best,
>> > > >>>> David
>> > > >>>>
>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > > >>>>> Hey David,
>> > > >>>>>
>> > > >>>>> RE: Async: I was trying to match the pattern we use for DoGet/DoPut
>> > > >>>>> for async. Yes, I was thinking more of Java, given gRPC-Java's
>> > > >>>>> async-always pattern.
>> > > >>>>>
>> > > >>>>> On the comment around the FlightData, I think it is overloading the
>> > > >>>>> message to use metadata for this. If I want to send a control message
>> > > >>>>> independently of the data message, I would have to define something
>> > > >>>>> like an empty flight data message that has custom metadata. Why not
>> > > >>>>> support a container object with a oneof{FlightData, Any} in it
>> > > >>>>> instead, so users can add more data as desired? The default impl
>> > > >>>>> could be a noop for the Any messages.
>> > > >>>>>
>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li...@gmail.com> wrote:
>> > > >>>>>
>> > > >>>>>> Hi Jacques,
>> > > >>>>>>
>> > > >>>>>> Thanks for the comments.
>> > > >>>>>>
>> > > >>>>>> - I do agree DoExchange is a better name!
>> > > >>>>>> - FlightData already has metadata fields as a result of prior
>> > > >>>>>> proposals, so I don't think we need a new message to carry that
>> > > >>>>>> kind of information.
>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
>> > > >>>>>> messages as the fundamental API; it would actually be quite natural
>> > > >>>>>> to implement in Flight/Java. I will note that it's not possible in
>> > > >>>>>> C++/Python without spawning a thread, though. (In essence, gRPC-Java
>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are experimental
>> > > >>>>>> C++ APIs that would let us do something similar to Java, but those
>> > > >>>>>> are only in relatively recent gRPC versions and are still under
>> > > >>>>>> development (contrary to the interceptor APIs which have been
>> > > >>>>>> around for quite a while).
>> > > >>>>>>
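The point about needing a thread to get an async-style handler on top of a blocking reader can be illustrated with a generic Python sketch. Nothing here is Flight or gRPC API: `start_reader`, `on_next`, and `on_completed` are hypothetical names, and any blocking iterable stands in for the incoming message stream.

```python
import threading
from typing import Callable, Iterable


def start_reader(messages: Iterable[bytes],
                 on_next: Callable[[bytes], None],
                 on_completed: Callable[[], None]) -> threading.Thread:
    """Drive callbacks from a blocking stream by reading on a background thread."""

    def pump() -> None:
        for message in messages:  # each iteration may block until data arrives
            on_next(message)
        on_completed()            # the stream ended normally

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return thread
```

The caller returns immediately and is notified through the callbacks, at the cost of one extra thread per stream.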
>> > > >>>>>> Thanks,
>> > > >>>>>> David
>> > > >>>>>>
>> > > >>>>>> On 10/15/19, Jacques Nadeau <ja...@apache.org> wrote:
>> > > >>>>>>> I like it. Added some comments to the doc. Might be worth
>> > > >>>>>>> discussing here, depending on your thoughts.
>> > > >>>>>>>
>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li...@gmail.com> wrote:
>> > > >>>>>>>
>> > > >>>>>>>> Hey Ryan,
>> > > >>>>>>>>
>> > > >>>>>>>> Thanks for the comments.
>> > > >>>>>>>>
>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
>> > > >>>>>>>> strawman.
>> > > >>>>>>>>
>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave
>> > > >>>>>>>> uploads and downloads if you were so inclined. Right now,
>> > > >>>>>>>> synchronous APIs make this error-prone, e.g. if both client and
>> > > >>>>>>>> server wait for each other due to an application logic bug. (gRPC
>> > > >>>>>>>> doesn't give us the ability to have per-read timeouts, only an
>> > > >>>>>>>> overall timeout.) As an example of this happening with DoPut, see
>> > > >>>>>>>> ARROW-6063:
>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>> > > >>>>>>>>
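One generic way to approximate a per-read timeout when the underlying read can only block is to move the read onto a helper thread and wait on a queue instead. The sketch below is plain Python and does not use any Flight or gRPC call; `read_with_timeout` is a hypothetical helper, and it only stops the caller from waiting forever. It does not cancel the underlying call.

```python
import queue
import threading
from typing import Iterable, Iterator

_DONE = object()  # sentinel marking the normal end of the stream


def read_with_timeout(messages: Iterable[bytes], timeout: float) -> Iterator[bytes]:
    """Yield messages, raising TimeoutError if any single read takes too long."""
    buffer: queue.Queue = queue.Queue()

    def pump() -> None:
        for message in messages:  # the blocking read happens on this thread
            buffer.put(message)
        buffer.put(_DONE)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = buffer.get(timeout=timeout)  # the only place the caller waits
        except queue.Empty:
            raise TimeoutError(f"no message within {timeout} seconds") from None
        if item is _DONE:
            return
        yield item
```

If both peers are stuck waiting on each other, the caller sees a `TimeoutError` after `timeout` seconds rather than hanging until the overall deadline.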
>> > > >>>>>>>> This is mostly tangential, though; eventually we will want to
>> > > >>>>>>>> design asynchronous APIs for Flight as a whole. A bidirectional
>> > > >>>>>>>> stream like this (and like DoPut) just makes these pitfalls easier
>> > > >>>>>>>> to run into.
>> > > >>>>>>>>
>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
>> > > >>>>>>>> concern is that depending on how you deploy, two separate calls
>> > > >>>>>>>> could get routed to different instances. Additionally, gRPC has
>> > > >>>>>>>> some reconnection behaviors; if the server goes away in between
>> > > >>>>>>>> the two calls, but it then restarts or there is another instance
>> > > >>>>>>>> available, the client will happily reconnect to the new server
>> > > >>>>>>>> without warning.
>> > > >>>>>>>>
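The routing concern can be shown with a toy simulation. Everything in it is hypothetical: `Instance` and `RoundRobin` are stand-ins for server replicas behind a load balancer, and doubling each value stands in for real computation. It assumes replicas keep uploaded data only in local memory.

```python
import itertools
from typing import Dict, List


class Instance:
    """Toy server replica that keeps uploads in local memory only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.uploads: Dict[str, List[int]] = {}

    def do_put(self, key: str, rows: List[int]) -> None:
        self.uploads[key] = rows

    def do_get(self, key: str) -> List[int]:
        # Raises KeyError if this replica never saw the upload.
        return [row * 2 for row in self.uploads[key]]

    def do_exchange(self, rows: List[int]) -> List[int]:
        # Upload and download happen inside one call on one replica.
        return [row * 2 for row in rows]


class RoundRobin:
    """Toy load balancer that sends each new call to the next replica."""

    def __init__(self, instances: List[Instance]) -> None:
        self._cycle = itertools.cycle(instances)

    def pick(self) -> Instance:
        return next(self._cycle)
```

With two replicas, `balancer.pick().do_put("job", [1, 2])` followed by `balancer.pick().do_get("job")` lands on different instances and fails with `KeyError`, while a single `balancer.pick().do_exchange([1, 2])` cannot be split across replicas.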
>> > > >>>>>>>> Thanks,
>> > > >>>>>>>> David
>> > > >>>>>>>>
>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com> wrote:
>> > > >>>>>>>>> Hey David,
>> > > >>>>>>>>>
>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
>> > > >>>>>>>>> possibility of remote compute via Arrow buffers. One thing that
>> > > >>>>>>>>> would help me would be a concrete example of the API in a
>> > > >>>>>>>>> real-life use case. Also, what would the client experience be in
>> > > >>>>>>>>> terms of sync vs async? Would the client block until the
>> > > >>>>>>>>> bidirectional call returns, i.e. c = flight.vector_mult(a, b), or
>> > > >>>>>>>>> would the client wait to be signaled that computation was done?
>> > > >>>>>>>>> If the latter, how is that different from a DoPut then DoGet? I
>> > > >>>>>>>>> suppose that this could be implemented without extending the RPC
>> > > >>>>>>>>> interface, but rather by a function/util?
>> > > >>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>> Best,
>> > > >>>>>>>>>
>> > > >>>>>>>>> Ryan
>> > > >>>>>>>>>
>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.davidm96@gmail.com> wrote:
>> > > >>>>>>>>>
>> > > >>>>>>>>>> Hi all,
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
>> > > >>>>>>>>>> identified a new use case on the horizon: being able to both send
>> > > >>>>>>>>>> and retrieve Arrow data within a single RPC call. To that end,
>> > > >>>>>>>>>> I've written up a proposal for a new RPC method:
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document.
>> > > >>>>>>>>>> I'd appreciate any feedback; I think this is a relatively
>> > > >>>>>>>>>> straightforward addition - it is essentially "DoPutThenGet".
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it
>> > > >>>>>>>>>> doesn't functionally change Flight, just the interpretation of the
>> > > >>>>>>>>>> semantics.
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> Thanks,
>> > > >>>>>>>>>> David
>> > > >>>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>> --
>> > > >>>>>>>>>
>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
>> > > >>>>>>>>>
>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
>> > > >>>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>
>> > > >>>>>>
>> > > >>>>>
>> > > >>>>
>> > > >>>
>> > > >
>> >
>