You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by David Li <li...@gmail.com> on 2020/03/18 14:05:57 UTC

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Following up here, I've submitted a draft implementation for C++:
https://github.com/apache/arrow/pull/6656

The core functionality is there, but there are still holes that I need
to implement. Compared to the draft spec, the client also sends a
FlightDescriptor to begin with, though it's currently not exposed.
This provides consistency with DoGet/DoPut which also send a message
to begin with to describe the stream to the server.

Andy, I hope this helps clarify whether it meets your needs.

Best,
David

On 2/25/20, David Li <li...@gmail.com> wrote:
> Hey Andy,
>
> I've been rather busy unfortunately. I had started on an
> implementation in C++ to provide as part of this discussion, but it's
> not complete. I'm hoping to have more done in March.
>
> Best,
> David
>
> On 2/25/20, Andy Grove <an...@gmail.com> wrote:
>> I was wondering if there had been any momentum on this (the BiDirectional
>> RPC design)?
>>
>> I'm interested in this for the use case of Apache Spark sending a stream
>> of
>> data to another process to invoke custom code and then receive a stream
>> back with the transformed data.
>>
>> Thanks,
>>
>> Andy.
>>
>>
>>
>> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org>
>> wrote:
>>
>>> I support moving forward with the current proposal.
>>>
>>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:
>>>
>>> > Just following up here again, any other thoughts?
>>> >
>>> > I think we do have justifications for potentially separate streams in
>>> > a call, but that's more of an orthogonal question - it doesn't need to
>>> > be addressed here. I do agree that it very much complicates things.
>>> >
>>> > Thanks,
>>> > David
>>> >
>>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
>>> > > I would generally agree with this. Note that you have the
>>> > > possibility
>>> > > to use unions-of-structs to send record batches with different
>>> > > schemas
>>> > > in the same stream, though with some added complexity on each side
>>> > >
>>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org>
>>> > wrote:
>>> > >>
>>> > >> I'd vote for explicitly not supported. We should keep our
>>> > >> primitives
>>> > >> narrow.
>>> > >>
>>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com>
>>> > >> wrote:
>>> > >>
>>> > >> > Thanks for the feedback.
>>> > >> >
>>> > >> > I do think if we had explicitly embraced gRPC from the beginning,
>>> > >> > there are a lot of places where things could be made more
>>> > >> > ergonomic,
>>> > >> > including with the metadata fields. But it would also have locked
>>> out
>>> > >> > us of potential future transports.
>>> > >> >
>>> > >> > On another note: I hesitate to put too much into this method, but
>>> > >> > we
>>> > >> > are looking at use cases where potentially, a client may want to
>>> > >> > upload multiple distinct datasets (with differing schemas). (This
>>> is a
>>> > >> > little tentative, and I can get more details...) Right now, each
>>> > >> > logical stream in Flight must have a single, consistent schema;
>>> would
>>> > >> > it make sense to look at ways to relax this, or declare this
>>> > >> > explicitly out of scope (and require multiple calls and
>>> > >> > coordination
>>> > >> > with the deployment topology) in order to accomplish this?
>>> > >> >
>>> > >> > Best,
>>> > >> > David
>>> > >> >
>>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
>>> > >> > > Fair enough. I'm okay with the bytes approach and the proposal
>>> looks
>>> > >> > > good
>>> > >> > > to me.
>>> > >> > >
>>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li
>>> > >> > > <li...@gmail.com>
>>> > >> > > wrote:
>>> > >> > >
>>> > >> > >> I've updated the proposal.
>>> > >> > >>
>>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
>>> > >> > >> errors/metadata, I still think using bytes is preferable:
>>> > >> > >> - It doesn't require (conditionally) exposing or wrapping
>>> Protobuf
>>> > >> > types,
>>> > >> > >> - We wouldn't be able to practically expose the Protobuf field
>>> > >> > >> to
>>> > >> > >> C++
>>> > >> > >> users without causing build pains,
>>> > >> > >> - We can't let Python users take advantage of the Protobuf
>>> > >> > >> field
>>> > >> > >> without somehow being compatible with the Protobuf wheels (by
>>> > >> > >> linking
>>> > >> > >> to the same version, and doing magic to turn the C++ Protobufs
>>> into
>>> > >> > >> the Python ones),
>>> > >> > >> - All our other application-defined fields are already bytes.
>>> > >> > >>
>>> > >> > >> Applications that want structure can encode JSON or Protobuf
>>> > >> > >> Any
>>> > >> > >> into
>>> > >> > >> the bytes field themselves, much as you can already do for
>>> Ticket,
>>> > >> > >> commands in FlightDescriptors, and application metadata in
>>> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient than
>>> using
>>> > >> > >> Any directly, since Any itself is a bytes field with a tag,
>>> > >> > >> and
>>> > must
>>> > >> > >> invoke the Protobuf deserializer again to read the actual
>>> message.
>>> > >> > >>
>>> > >> > >> If we decide on using bytes, then I don't think it makes sense
>>> > >> > >> to
>>> > >> > >> define a new message with a oneof either, since it would be
>>> > >> > >> redundant.
>>> > >> > >>
>>> > >> > >> Thanks,
>>> > >> > >> David
>>> > >> > >>
>>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
>>> > >> > >> > I've been extremely backlogged, I will update the proposal
>>> when I
>>> > >> > >> > get
>>> > >> > >> > a chance and reply here when done.
>>> > >> > >> >
>>> > >> > >> > Best,
>>> > >> > >> > David
>>> > >> > >> >
>>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
>>> > >> > >> >> Bumping this discussion since a couple of weeks have
>>> > >> > >> >> passed.
>>> It
>>> > >> > >> >> seems
>>> > >> > >> >> there are still some questions here, could we summarize
>>> > >> > >> >> what
>>> are
>>> > >> > >> >> the
>>> > >> > >> >> alternatives along with any public API implications so we
>>> > >> > >> >> can
>>> > try
>>> > >> > >> >> to
>>> > >> > >> >> render a decision?
>>> > >> > >> >>
>>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <
>>> li.davidm96@gmail.com
>>> > >
>>> > >> > >> >> wrote:
>>> > >> > >> >>>
>>> > >> > >> >>> Hi Wes,
>>> > >> > >> >>>
>>> > >> > >> >>> Responses inline:
>>> > >> > >> >>>
>>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <
>>> wesmckinn@gmail.com>
>>> > >> > wrote:
>>> > >> > >> >>>
>>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li
>>> > >> > >> >>> > <li...@gmail.com>
>>> > >> > >> >>> > wrote:
>>> > >> > >> >>> > >
>>> > >> > >> >>> > > The question is whether to repurpose the existing
>>> > FlightData
>>> > >> > >> >>> > > structure, and allow for the metadata field to be
>>> > >> > >> >>> > > filled
>>> in
>>> > >> > >> >>> > > and
>>> > >> > >> data
>>> > >> > >> >>> > > fields to be blank (as a control message), or to wrap
>>> > >> > >> >>> > > the
>>> > >> > >> FlightData
>>> > >> > >> >>> > > structure in another structure that explicitly
>>> > distinguishes
>>> > >> > >> between
>>> > >> > >> >>> > > control and data messages.
>>> > >> > >> >>> >
>>> > >> > >> >>> > I'm not super against having metadata-only FlightData
>>> > >> > >> >>> > with
>>> > >> > >> >>> > empty
>>> > >> > >> body.
>>> > >> > >> >>> > One question to consider is what changes (if any) would
>>> need
>>> > to
>>> > >> > >> >>> > be
>>> > >> > >> >>> > made to public APIs in either scenario.
>>> > >> > >> >>> >
>>> > >> > >> >>>
>>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty
>>> data
>>> > >> > >> >>> messages
>>> > >> > >> >>> in
>>> > >> > >> >>> the future. This would be a breaking change, but wouldn't
>>> > change
>>> > >> > >> >>> the
>>> > >> > >> >>> wire
>>> > >> > >> >>> format. I think the APIs could be changed backwards
>>> compatibly,
>>> > >> > >> >>> though.
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>> > > The other question is how to handle the metadata
>>> > >> > >> >>> > > fields.
>>> So
>>> > >> > >> >>> > > far,
>>> > >> > >> >>> > > we've
>>> > >> > >> >>> > > used bytestring fields for application-defined data.
>>> > >> > >> >>> > > This
>>> > is
>>> > >> > >> >>> > > workable
>>> > >> > >> >>> > > if you want to use Protobuf to define the contents of
>>> those
>>> > >> > >> >>> > > fields,
>>> > >> > >> >>> > > but requires you to pack/unpack your Protobuf
>>> > >> > >> >>> > > into/from
>>> the
>>> > >> > >> >>> > > bytestring
>>> > >> > >> >>> > > field. If we instead used the Protobuf Any field, a
>>> > >> > >> >>> > > dynamically
>>> > >> > >> >>> > > typed
>>> > >> > >> >>> > > field, this would be more convenient, but then we'd be
>>> > >> > >> >>> > > exposing
>>> > >> > >> >>> > > Protobuf types. We could alternatively use a
>>> > >> > >> >>> > > combination
>>> of
>>> > >> > >> >>> > > a
>>> > >> > >> >>> > > type
>>> > >> > >> >>> > > field and a bytestring field, mimicking what the
>>> > >> > >> >>> > > Protobuf
>>> > >> > >> >>> > > Any
>>> > >> > >> >>> > > type
>>> > >> > >> >>> > > looks like on the wire. I'm not sure this is actually
>>> > cleaner
>>> > >> > >> >>> > > in
>>> > >> > >> any
>>> > >> > >> >>> > > of the language APIs, though.
>>> > >> > >> >>> >
>>> > >> > >> >>> > Leaving the deserialization of the app metadata to the
>>> > >> > >> >>> > particular
>>> > >> > >> >>> > Flight implementation seems on first principles like the
>>> most
>>> > >> > >> flexible
>>> > >> > >> >>> > thing, if Any is used, does that mean the metadata
>>> > >> > >> >>> > _must_
>>> be
>>> > a
>>> > >> > >> >>> > protobuf?
>>> > >> > >> >>> >
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>> If Any is used, we could still expose a bytes-based API,
>>> > >> > >> >>> but
>>> it
>>> > >> > would
>>> > >> > >> >>> have
>>> > >> > >> >>> some more wrapping. (We could put a ByteString in Any.)
>>> > >> > >> >>> Then
>>> > the
>>> > >> > >> >>> question
>>> > >> > >> >>> would just be how to expose this (would be easier in Java,
>>> > harder
>>> > >> > >> >>> in
>>> > >> > >> >>> C++).
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>> > > David
>>> > >> > >> >>> > >
>>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org>
>>> > >> > >> >>> > > wrote:
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > > Can one of you explain what is being proposed in
>>> > >> > >> >>> > > > non-protobuf
>>> > >> > >> >>> > > > terms?
>>> > >> > >> >>> > > > Knowledge of protobuf shouldn't be required to use
>>> > Flight.
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > > Regards
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > > Antoine.
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > >
>>> > >> > >> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
>>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it
>>> > would
>>> > >> > just
>>> > >> > >> be
>>> > >> > >> >>> > > >> application-level logic. (The official guide
>>> > >> > >> >>> > > >> doesn't
>>> > even
>>> > >> > >> mention
>>> > >> > >> >>> > > >> it
>>> > >> > >> >>> > > >> in the encoding docs; I found
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> >
>>> > >> > >>
>>> > >> >
>>> >
>>> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>>> > >> > >> >>> > > >> as well.)
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing
>>> > >> > >> >>> > > >> essentially
>>> > >> > >> >>> > > >> inlining
>>> > >> > >> >>> > > >> the definition of Any, e.g.
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> message FlightMessage {
>>> > >> > >> >>> > > >>   oneof message {
>>> > >> > >> >>> > > >>     FlightData data = 1;
>>> > >> > >> >>> > > >>     FlightAny metadata = 2;
>>> > >> > >> >>> > > >>   }
>>> > >> > >> >>> > > >> }
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> message FlightAny {
>>> > >> > >> >>> > > >>   string type = 1;
>>> > >> > >> >>> > > >>   bytes data = 2;
>>> > >> > >> >>> > > >> }
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> Is this correct?
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> It might be nice to consider the wrapper message
>>> > >> > >> >>> > > >> for
>>> > >> > >> >>> > > >> DoGet/DoPut
>>> > >> > >> >>> > > >> as
>>> > >> > >> >>> > > >> well, but at that point, I'd rather we be
>>> > >> > >> >>> > > >> consistent
>>> > with
>>> > >> > >> >>> > > >> all
>>> > >> > >> >>> > > >> of
>>> > >> > >> >>> > > >> them,
>>> > >> > >> >>> > > >> rather than have one of the three methods do its
>>> > >> > >> >>> > > >> own
>>> > >> > >> >>> > > >> thing.
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> Thanks,
>>> > >> > >> >>> > > >> David
>>> > >> > >> >>> > > >>
>>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org>
>>> wrote:
>>> > >> > >> >>> > > >>> I think we could probably expose the oneof
>>> > >> > >> >>> > > >>> behavior
>>> > >> > >> >>> > > >>> without
>>> > >> > >> >>> > > >>> exposing
>>> > >> > >> >>> > the
>>> > >> > >> >>> > > >>> protobuf functions. On the any... hmm. I guess we
>>> could
>>> > >> > >> >>> > > >>> expose
>>> > >> > >> >>> > > >>> as
>>> > >> > >> >>> > > >>> two
>>> > >> > >> >>> > > >>> fields: type and data. Then users could use it for
>>> > >> > >> >>> > > >>> whatever
>>> > >> > >> >>> > > >>> but
>>> > >> > >> >>> > > >>> if
>>> > >> > >> >>> > > >>> people
>>> > >> > >> >>> > > >>> wanted to treat it as any, it would work.
>>> > >> > >> >>> > > >>> (Basically
>>> a
>>> > >> > >> >>> > > >>> user
>>> > >> > >> >>> > > >>> could
>>> > >> > >> >>> > > >>> use
>>> > >> > >> >>> > > >>> any
>>> > >> > >> >>> > > >>> with it easily but they could also use any other
>>> > >> > >> >>> > > >>> mechanism).
>>> > >> > >> >>> > > >>> At
>>> > >> > >> >>> > least in
>>> > >> > >> >>> > > >>> java, the any concepts are pretty simple/diy. Are
>>> other
>>> > >> > >> language
>>> > >> > >> >>> > > >>> bindings
>>> > >> > >> >>> > > >>> less diy?
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData +
>>> > >> > >> >>> > > >>> metadata
>>> > >> > >> >>> > > >>> but
>>> > >> > >> >>> > > >>> it
>>> > >> > >> >>> > just
>>> > >> > >> >>> > > >>> seemed a bit janky.
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >>> Thinking about the control message/wrapper object
>>> > thing,
>>> > >> > >> >>> > > >>> I
>>> > >> > >> >>> > > >>> wonder
>>> > >> > >> >>> > > >>> if
>>> > >> > >> >>> > we
>>> > >> > >> >>> > > >>> should redefine DoPut and DoGet to have the same
>>> > property
>>> > >> > >> >>> > > >>> if
>>> > >> > >> >>> > > >>> we
>>> > >> > >> >>> > think it
>>> > >> > >> >>> > > >>> is
>>> > >> > >> >>> > > >>> a good idea...
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
>>> > >> > >> li.davidm96@gmail.com>
>>> > >> > >> >>> > wrote:
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >>>> I was definitely considering having control
>>> > >> > >> >>> > > >>>> messages
>>> > >> > without
>>> > >> > >> >>> > > >>>> data,
>>> > >> > >> >>> > and
>>> > >> > >> >>> > > >>>> I thought that could be encoded by a FlightData
>>> > >> > >> >>> > > >>>> with
>>> > >> > >> >>> > > >>>> only
>>> > >> > >> >>> > app_metadata
>>> > >> > >> >>> > > >>>> set. I think I understand your position now:
>>> > FlightData
>>> > >> > >> >>> > > >>>> should
>>> > >> > >> >>> > always
>>> > >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with
>>> > >> > >> >>> > > >>>> the
>>> > >> > >> >>> > > >>>> documentation
>>> > >> > >> >>> > > >>>> on
>>> > >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried
>>> > >> > >> >>> > > >>>> about
>>> > >> > >> >>> > > >>>> having
>>> > >> > >> >>> > > >>>> a
>>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that
>>> from
>>> > >> > >> >>> > > >>>> happening,
>>> > >> > >> >>> > and
>>> > >> > >> >>> > > >>>> overall having a clear separation between data
>>> > >> > >> >>> > > >>>> and
>>> > >> > >> >>> > > >>>> control
>>> > >> > >> >>> > > >>>> messages
>>> > >> > >> >>> > is
>>> > >> > >> >>> > > >>>> cleaner.
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've
>>> > >> > >> >>> > > >>>> refrained
>>> > >> > >> >>> > > >>>> from
>>> > >> > >> >>> > > >>>> exposing
>>> > >> > >> >>> > > >>>> Protobuf by using bytes, would we want to change
>>> that
>>> > >> > >> >>> > > >>>> now?
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>> Best,
>>> > >> > >> >>> > > >>>> David
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org>
>>> > wrote:
>>> > >> > >> >>> > > >>>>> Hey David,
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we
>>> > >> > >> >>> > > >>>>> use
>>> > >> > >> >>> > > >>>>> for
>>> > >> > >> >>> > > >>>>> doget/doput
>>> > >> > >> >>> > > >>>>> for
>>> > >> > >> >>> > > >>>>> async. Yes, more thinking java given java grpc's
>>> > async
>>> > >> > >> >>> > > >>>>> always
>>> > >> > >> >>> > pattern.
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think it
>>> > >> > >> >>> > > >>>>> is
>>> > >> > >> >>> > > >>>>> overloading
>>> > >> > >> >>> > > >>>>> the
>>> > >> > >> >>> > > >>>> message
>>> > >> > >> >>> > > >>>>> to use metadata for this. If I want to send a
>>> control
>>> > >> > >> >>> > > >>>>> message
>>> > >> > >> >>> > > >>>> independently
>>> > >> > >> >>> > > >>>>> of the data message, I would have to define
>>> something
>>> > >> > >> >>> > > >>>>> like
>>> > >> > >> >>> > > >>>>> an
>>> > >> > >> >>> > > >>>>> empty
>>> > >> > >> >>> > > >>>> flight
>>> > >> > >> >>> > > >>>>> data message that has custom metadata. Why not
>>> > support
>>> > >> > >> >>> > > >>>>> a
>>> > >> > >> >>> > > >>>>> container
>>> > >> > >> >>> > > >>>>> object
>>> > >> > >> >>> > > >>>>> with a oneof{FlightData, Any} in it instead so
>>> users
>>> > >> > >> >>> > > >>>>> can
>>> > >> > >> >>> > > >>>>> add
>>> > >> > >> >>> > > >>>>> more
>>> > >> > >> >>> > data
>>> > >> > >> >>> > > >>>>> as
>>> > >> > >> >>> > > >>>>> desired. The default impl could be a noop for
>>> > >> > >> >>> > > >>>>> the
>>> Any
>>> > >> > >> >>> > > >>>>> messages.
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
>>> > >> > >> >>> > > >>>>> <li...@gmail.com>
>>> > >> > >> >>> > > >>>>> wrote:
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>>> Hi Jacques,
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>> Thanks for the comments.
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
>>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a
>>> result
>>> > >> > >> >>> > > >>>>>> of
>>> > >> > >> prior
>>> > >> > >> >>> > > >>>>>> proposals, so I don't think we need a new
>>> > >> > >> >>> > > >>>>>> message
>>> to
>>> > >> > carry
>>> > >> > >> >>> > > >>>>>> that
>>> > >> > >> >>> > kind
>>> > >> > >> >>> > > >>>>>> of information.
>>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to
>>> > handle
>>> > >> > >> >>> > > >>>>>> incoming
>>> > >> > >> >>> > > >>>>>> messages as the fundamental API; it would
>>> > >> > >> >>> > > >>>>>> actually
>>> > be
>>> > >> > >> >>> > > >>>>>> quite
>>> > >> > >> >>> > natural
>>> > >> > >> >>> > > >>>>>> to
>>> > >> > >> >>> > > >>>>>> implement in Flight/Java. I will note that it's
>>> not
>>> > >> > >> >>> > > >>>>>> possible
>>> > >> > >> >>> > > >>>>>> in
>>> > >> > >> >>> > > >>>>>> C++/Python without spawning a thread, though.
>>> > >> > >> >>> > > >>>>>> (In
>>> > >> > essence,
>>> > >> > >> >>> > gRPC-Java
>>> > >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.)
>>> There
>>> > >> > >> >>> > > >>>>>> are
>>> > >> > >> >>> > experimental
>>> > >> > >> >>> > > >>>>>> C++ APIs that would let us do something similar
>>> > >> > >> >>> > > >>>>>> to
>>> > >> > >> >>> > > >>>>>> Java,
>>> > >> > >> >>> > > >>>>>> but
>>> > >> > >> >>> > > >>>>>> those
>>> > >> > >> >>> > > >>>>>> are
>>> > >> > >> >>> > > >>>>>> only in relatively recent gRPC versions and are
>>> > still
>>> > >> > >> >>> > > >>>>>> under
>>> > >> > >> >>> > > >>>>>> development (contrary to the interceptor APIs
>>> which
>>> > >> > >> >>> > > >>>>>> have
>>> > >> > >> been
>>> > >> > >> >>> > around
>>> > >> > >> >>> > > >>>>>> for quite a while).
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>> Thanks,
>>> > >> > >> >>> > > >>>>>> David
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau
>>> > >> > >> >>> > > >>>>>> <ja...@apache.org>
>>> > >> > >> >>> > > >>>>>> wrote:
>>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc.
>>> > >> > >> >>> > > >>>>>>> Might
>>> > >> > >> >>> > > >>>>>>> worth
>>> > >> > >> >>> > > >>>>>>> discussion
>>> > >> > >> >>> > > >>>>>>> here
>>> > >> > >> >>> > > >>>>>>> depending on your thoughts.
>>> > >> > >> >>> > > >>>>>>>
>>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
>>> > >> > >> >>> > > >>>>>>> <li...@gmail.com>
>>> > >> > >> >>> > > >>>> wrote:
>>> > >> > >> >>> > > >>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to
>>> provide a
>>> > >> > >> >>> > > >>>>>>>> Python
>>> > >> > >> >>> > strawman.
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you
>>> > could
>>> > >> > >> >>> > > >>>>>>>> interleave
>>> > >> > >> >>> > > >>>> uploads
>>> > >> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right
>>> now,
>>> > >> > >> >>> > > >>>>>>>> synchronous
>>> > >> > >> >>> > APIs
>>> > >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client
>>> > >> > >> >>> > > >>>>>>>> and
>>> > >> > >> >>> > > >>>>>>>> server
>>> > >> > >> >>> > > >>>>>>>> wait
>>> > >> > >> >>> > > >>>>>>>> for
>>> > >> > >> >>> > each
>>> > >> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC
>>> > >> > >> >>> > > >>>>>>>> doesn't
>>> > >> > >> >>> > > >>>>>>>> give
>>> > >> > >> >>> > > >>>>>>>> us
>>> > >> > >> >>> > > >>>>>>>> the
>>> > >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an
>>> overall
>>> > >> > >> >>> > > >>>>>>>> timeout.)
>>> > >> > >> >>> > > >>>>>>>> As
>>> > >> > >> >>> > an
>>> > >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see
>>> > >> > >> >>> > > >>>>>>>> ARROW-6063:
>>> > >> > >> >>> > > >>>>>>>>
>>> https://issues.apache.org/jira/browse/ARROW-6063
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> This is mostly tangential though, eventually
>>> > >> > >> >>> > > >>>>>>>> we
>>> > >> > >> >>> > > >>>>>>>> will
>>> > >> > >> >>> > > >>>>>>>> want
>>> > >> > >> >>> > > >>>>>>>> to
>>> > >> > >> >>> > design
>>> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A
>>> > >> > bidirectional
>>> > >> > >> >>> > > >>>>>>>> stream
>>> > >> > >> >>> > > >>>>>>>> like
>>> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these
>>> > >> > >> >>> > > >>>>>>>> pitfalls
>>> > >> > >> >>> > > >>>>>>>> easier
>>> > >> > >> >>> > > >>>>>>>> to
>>> > >> > >> >>> > > >>>>>>>> run
>>> > >> > >> >>> > into.
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the
>>> > >> > >> >>> > > >>>>>>>> proposal,
>>> > >> > but
>>> > >> > >> >>> > > >>>>>>>> the
>>> > >> > >> >>> > main
>>> > >> > >> >>> > > >>>>>>>> concern is that depending on how you deploy,
>>> > >> > >> >>> > > >>>>>>>> two
>>> > >> > >> >>> > > >>>>>>>> separate
>>> > >> > >> >>> > > >>>>>>>> calls
>>> > >> > >> >>> > > >>>>>>>> could
>>> > >> > >> >>> > > >>>>>>>> get routed to different instances.
>>> > >> > >> >>> > > >>>>>>>> Additionally,
>>> > >> > >> >>> > > >>>>>>>> gRPC
>>> > >> > >> >>> > > >>>>>>>> has
>>> > >> > >> >>> > > >>>>>>>> some
>>> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes
>>> > >> > >> >>> > > >>>>>>>> away
>>> in
>>> > >> > >> >>> > > >>>>>>>> between
>>> > >> > >> >>> > > >>>>>>>> the
>>> > >> > >> >>> > two
>>> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is
>>> > >> > >> >>> > > >>>>>>>> another
>>> > >> > instance
>>> > >> > >> >>> > available,
>>> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the new
>>> > server
>>> > >> > >> without
>>> > >> > >> >>> > > >>>>>>>> warning.
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> Thanks,
>>> > >> > >> >>> > > >>>>>>>> David
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com>
>>> > wrote:
>>> > >> > >> >>> > > >>>>>>>>> Hey David,
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense.
>>> > >> > >> >>> > > >>>>>>>>> I
>>> > like
>>> > >> > >> >>> > > >>>>>>>>> it
>>> > >> > >> >>> > > >>>>>>>>> and
>>> > >> > >> >>> > > >>>>>>>>> the
>>> > >> > >> >>> > > >>>>>>>>> possibility
>>> > >> > >> >>> > > >>>>>>>>> of remote compute via arrow buffers. One
>>> > >> > >> >>> > > >>>>>>>>> thing
>>> > >> > >> >>> > > >>>>>>>>> that
>>> > >> > >> >>> > > >>>>>>>>> would
>>> > >> > >> >>> > > >>>>>>>>> help
>>> > >> > >> >>> > me
>>> > >> > >> >>> > > >>>>>> would
>>> > >> > >> >>> > > >>>>>>>> be
>>> > >> > >> >>> > > >>>>>>>>> a concrete example of the API in a real life
>>> use
>>> > >> > >> >>> > > >>>>>>>>> case.
>>> > >> > >> >>> > > >>>>>>>>> Also,
>>> > >> > >> >>> > what
>>> > >> > >> >>> > > >>>>>> would
>>> > >> > >> >>> > > >>>>>>>> the
>>> > >> > >> >>> > > >>>>>>>>> client experience be in terms of sync vs
>>> > >> > >> >>> > > >>>>>>>>> asyc?
>>> > >> > >> >>> > > >>>>>>>>> Would
>>> > >> > >> >>> > > >>>>>>>>> the
>>> > >> > >> >>> > > >>>>>>>>> client
>>> > >> > >> >>> > > >>>>>>>>> block
>>> > >> > >> >>> > > >>>>>>>> till
>>> > >> > >> >>> > > >>>>>>>>> the bidirectional call return ie c =
>>> > >> > >> flight.vector_mult(a,
>>> > >> > >> >>> > > >>>>>>>>> b)
>>> > >> > >> >>> > or
>>> > >> > >> >>> > > >>>>>>>>> would
>>> > >> > >> >>> > > >>>>>>>> the
>>> > >> > >> >>> > > >>>>>>>>> client wait to be signaled that computation
>>> > >> > >> >>> > > >>>>>>>>> was
>>> > >> > >> >>> > > >>>>>>>>> done.
>>> > >> > >> >>> > > >>>>>>>>> If
>>> > >> > >> >>> > > >>>>>>>>> the
>>> > >> > >> >>> > > >>>>>>>>> later
>>> > >> > >> >>> > > >>>>>>>>> how
>>> > >> > >> >>> > > >>>>>>>>> is
>>> > >> > >> >>> > > >>>>>>>>> that different from a DoPut then DoGet? I
>>> suppose
>>> > >> > >> >>> > > >>>>>>>>> that
>>> > >> > >> >>> > > >>>>>>>>> this
>>> > >> > >> >>> > could
>>> > >> > >> >>> > > >>>> be
>>> > >> > >> >>> > > >>>>>>>>> implemented without extending the RPC
>>> > >> > >> >>> > > >>>>>>>>> interface
>>> > >> > >> >>> > > >>>>>>>>> but
>>> > >> > >> rather
>>> > >> > >> >>> > > >>>>>>>>> by a
>>> > >> > >> >>> > > >>>>>>>>> function/util?
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> Best,
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> Ryan
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
>>> > >> > >> >>> > li.davidm96@gmail.com>
>>> > >> > >> >>> > > >>>>>> wrote:
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> Hi all,
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully
>>> > >> > >> >>> > > >>>>>>>>>> so
>>> > >> > >> >>> > > >>>>>>>>>> far,
>>> > >> > but
>>> > >> > >> we
>>> > >> > >> >>> > > >>>>>>>>>> have
>>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon:
>>> being
>>> > >> > >> >>> > > >>>>>>>>>> able
>>> > >> > >> >>> > > >>>>>>>>>> to
>>> > >> > >> >>> > > >>>>>>>>>> both
>>> > >> > >> >>> > > >>>>>>>>>> send
>>> > >> > >> >>> > > >>>>>>>>>> and
>>> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC
>>> > >> > >> >>> > > >>>>>>>>>> call.
>>> To
>>> > >> > >> >>> > > >>>>>>>>>> that
>>> > >> > >> >>> > > >>>>>>>>>> end,
>>> > >> > >> >>> > I've
>>> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> >
>>> > >> > >>
>>> > >> >
>>> >
>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or
>>> comment
>>> > >> > >> >>> > > >>>>>>>>>> on
>>> > >> > the
>>> > >> > >> >>> > document.
>>> > >> > >> >>> > > >>>>>>>>>> I'd
>>> > >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a
>>> > >> > >> >>> > > >>>>>>>>>> relatively
>>> > >> > >> >>> > > >>>>>>>>>> straightforward
>>> > >> > >> >>> > > >>>>>>>>>> addition - it is essentially
>>> > >> > >> >>> > > >>>>>>>>>> "DoPutThenGet".
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a
>>> > vote.
>>> > >> > I've
>>> > >> > >> >>> > > >>>>>>>>>> decided
>>> > >> > >> >>> > > >>>>>>>>>> to
>>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had
>>> > >> > >> >>> > > >>>>>>>>>> proposed
>>> (on
>>> > >> > >> >>> > > >>>>>>>>>> DoPut),
>>> > >> > >> >>> > > >>>>>>>>>> as
>>> > >> > >> >>> > > >>>>>>>>>> it
>>> > >> > >> >>> > > >>>>>> doesn't
>>> > >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the
>>> > >> > >> >>> > > >>>>>>>>>> interpretation
>>> > >> > of
>>> > >> > >> >>> > > >>>>>>>>>> the
>>> > >> > >> >>> > > >>>>>>>>>> semantics.
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>> Thanks,
>>> > >> > >> >>> > > >>>>>>>>>> David
>>> > >> > >> >>> > > >>>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> --
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/>
>>> > >> > >> >>> > > >>>>>>>>> Check out our GitHub
>>> > >> > >> >>> > > >>>>>>>>> <https://www.github.com/dremio>,
>>> > >> > >> join
>>> > >> > >> >>> > > >>>>>>>>> our
>>> > >> > >> >>> > > >>>>>>>>> community
>>> > >> > >> >>> > > >>>>>>>>> site <https://community.dremio.com/> &
>>> Download
>>> > >> > Dremio
>>> > >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/download>
>>> > >> > >> >>> > > >>>>>>>>>
>>> > >> > >> >>> > > >>>>>>>>
>>> > >> > >> >>> > > >>>>>>>
>>> > >> > >> >>> > > >>>>>>
>>> > >> > >> >>> > > >>>>>
>>> > >> > >> >>> > > >>>>
>>> > >> > >> >>> > > >>>
>>> > >> > >> >>> > > >
>>> > >> > >> >>> >
>>> > >> > >> >>
>>> > >> > >> >
>>> > >> > >>
>>> > >> > >
>>> > >> >
>>> > >
>>> >
>>>
>>
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
Looks like there is consensus about this. I'll start a vote about the
format change soon if no further comments.

On Mon, Mar 23, 2020 at 7:41 AM David Li <li...@gmail.com> wrote:
>
> Hey Wes,
>
> Thanks for the review. I've broken out the format change into this PR:
> https://github.com/apache/arrow/pull/6686
>
> Best,
> David
>
> On 3/22/20, Wes McKinney <we...@gmail.com> wrote:
> > hi David,
> >
> > I did a preliminary view and things look to be on the right track
> > there. What do you think about breaking out the protocol changes (and
> > adding appropriate comments) so we can have a vote on that in
> > relatively short order?
> >
> > - Wes
> >
> > On Wed, Mar 18, 2020 at 9:06 AM David Li <li...@gmail.com> wrote:
> >>
> >> Following up here, I've submitted a draft implementation for C++:
> >> https://github.com/apache/arrow/pull/6656
> >>
> >> The core functionality is there, but there are still holes that I need
> >> to implement. Compared to the draft spec, the client also sends a
> >> FlightDescriptor to begin with, though it's currently not exposed.
> >> This provides consistency with DoGet/DoPut which also send a message
> >> to begin with to describe the stream to the server.
> >>
> >> Andy, I hope this helps clarify whether it meets your needs.
> >>
> >> Best,
> >> David
> >>
> >> On 2/25/20, David Li <li...@gmail.com> wrote:
> >> > Hey Andy,
> >> >
> >> > I've been rather busy unfortunately. I had started on an
> >> > implementation in C++ to provide as part of this discussion, but it's
> >> > not complete. I'm hoping to have more done in March.
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On 2/25/20, Andy Grove <an...@gmail.com> wrote:
> >> >> I was wondering if there had been any momentum on this (the
> >> >> BiDirectional
> >> >> RPC design)?
> >> >>
> >> >> I'm interested in this for the use case of Apache Spark sending a
> >> >> stream
> >> >> of
> >> >> data to another process to invoke custom code and then receive a
> >> >> stream
> >> >> back with the transformed data.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Andy.
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org>
> >> >> wrote:
> >> >>
> >> >>> I support moving forward with the current proposal.
> >> >>>
> >> >>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>> > Just following up here again, any other thoughts?
> >> >>> >
> >> >>> > I think we do have justifications for potentially separate streams
> >> >>> > in
> >> >>> > a call, but that's more of an orthogonal question - it doesn't need
> >> >>> > to
> >> >>> > be addressed here. I do agree that it very much complicates things.
> >> >>> >
> >> >>> > Thanks,
> >> >>> > David
> >> >>> >
> >> >>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> >> >>> > > I would generally agree with this. Note that you have the
> >> >>> > > possibility
> >> >>> > > to use unions-of-structs to send record batches with different
> >> >>> > > schemas
> >> >>> > > in the same stream, though with some added complexity on each
> >> >>> > > side
> >> >>> > >
> >> >>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau
> >> >>> > > <ja...@apache.org>
> >> >>> > wrote:
> >> >>> > >>
> >> >>> > >> I'd vote for explicitly not supported. We should keep our
> >> >>> > >> primitives
> >> >>> > >> narrow.
> >> >>> > >>
> >> >>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com>
> >> >>> > >> wrote:
> >> >>> > >>
> >> >>> > >> > Thanks for the feedback.
> >> >>> > >> >
> >> >>> > >> > I do think if we had explicitly embraced gRPC from the
> >> >>> > >> > beginning,
> >> >>> > >> > there are a lot of places where things could be made more
> >> >>> > >> > ergonomic,
> >> >>> > >> > including with the metadata fields. But it would also have
> >> >>> > >> > locked
> >> >>> out
> >> >>> > >> > us of potential future transports.
> >> >>> > >> >
> >> >>> > >> > On another note: I hesitate to put too much into this method,
> >> >>> > >> > but
> >> >>> > >> > we
> >> >>> > >> > are looking at use cases where potentially, a client may want
> >> >>> > >> > to
> >> >>> > >> > upload multiple distinct datasets (with differing schemas).
> >> >>> > >> > (This
> >> >>> is a
> >> >>> > >> > little tentative, and I can get more details...) Right now,
> >> >>> > >> > each
> >> >>> > >> > logical stream in Flight must have a single, consistent
> >> >>> > >> > schema;
> >> >>> would
> >> >>> > >> > it make sense to look at ways to relax this, or declare this
> >> >>> > >> > explicitly out of scope (and require multiple calls and
> >> >>> > >> > coordination
> >> >>> > >> > with the deployment topology) in order to accomplish this?
> >> >>> > >> >
> >> >>> > >> > Best,
> >> >>> > >> > David
> >> >>> > >> >
> >> >>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> >> >>> > >> > > Fair enough. I'm okay with the bytes approach and the
> >> >>> > >> > > proposal
> >> >>> looks
> >> >>> > >> > > good
> >> >>> > >> > > to me.
> >> >>> > >> > >
> >> >>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li
> >> >>> > >> > > <li...@gmail.com>
> >> >>> > >> > > wrote:
> >> >>> > >> > >
> >> >>> > >> > >> I've updated the proposal.
> >> >>> > >> > >>
> >> >>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> >> >>> > >> > >> errors/metadata, I still think using bytes is preferable:
> >> >>> > >> > >> - It doesn't require (conditionally) exposing or wrapping
> >> >>> Protobuf
> >> >>> > >> > types,
> >> >>> > >> > >> - We wouldn't be able to practically expose the Protobuf
> >> >>> > >> > >> field
> >> >>> > >> > >> to
> >> >>> > >> > >> C++
> >> >>> > >> > >> users without causing build pains,
> >> >>> > >> > >> - We can't let Python users take advantage of the Protobuf
> >> >>> > >> > >> field
> >> >>> > >> > >> without somehow being compatible with the Protobuf wheels
> >> >>> > >> > >> (by
> >> >>> > >> > >> linking
> >> >>> > >> > >> to the same version, and doing magic to turn the C++
> >> >>> > >> > >> Protobufs
> >> >>> into
> >> >>> > >> > >> the Python ones),
> >> >>> > >> > >> - All our other application-defined fields are already
> >> >>> > >> > >> bytes.
> >> >>> > >> > >>
> >> >>> > >> > >> Applications that want structure can encode JSON or
> >> >>> > >> > >> Protobuf
> >> >>> > >> > >> Any
> >> >>> > >> > >> into
> >> >>> > >> > >> the bytes field themselves, much as you can already do for
> >> >>> Ticket,
> >> >>> > >> > >> commands in FlightDescriptors, and application metadata in
> >> >>> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient
> >> >>> > >> > >> than
> >> >>> using
> >> >>> > >> > >> Any directly, since Any itself is a bytes field with a tag,
> >> >>> > >> > >> and
> >> >>> > must
> >> >>> > >> > >> invoke the Protobuf deserializer again to read the actual
> >> >>> message.
> >> >>> > >> > >>
> >> >>> > >> > >> If we decide on using bytes, then I don't think it makes
> >> >>> > >> > >> sense
> >> >>> > >> > >> to
> >> >>> > >> > >> define a new message with a oneof either, since it would be
> >> >>> > >> > >> redundant.
> >> >>> > >> > >>
> >> >>> > >> > >> Thanks,
> >> >>> > >> > >> David
> >> >>> > >> > >>
> >> >>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> >> >>> > >> > >> > I've been extremely backlogged, I will update the
> >> >>> > >> > >> > proposal
> >> >>> when I
> >> >>> > >> > >> > get
> >> >>> > >> > >> > a chance and reply here when done.
> >> >>> > >> > >> >
> >> >>> > >> > >> > Best,
> >> >>> > >> > >> > David
> >> >>> > >> > >> >
> >> >>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >> >>> > >> > >> >> Bumping this discussion since a couple of weeks have
> >> >>> > >> > >> >> passed.
> >> >>> It
> >> >>> > >> > >> >> seems
> >> >>> > >> > >> >> there are still some questions here, could we summarize
> >> >>> > >> > >> >> what
> >> >>> are
> >> >>> > >> > >> >> the
> >> >>> > >> > >> >> alternatives along with any public API implications so
> >> >>> > >> > >> >> we
> >> >>> > >> > >> >> can
> >> >>> > try
> >> >>> > >> > >> >> to
> >> >>> > >> > >> >> render a decision?
> >> >>> > >> > >> >>
> >> >>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <
> >> >>> li.davidm96@gmail.com
> >> >>> > >
> >> >>> > >> > >> >> wrote:
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> Hi Wes,
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> Responses inline:
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <
> >> >>> wesmckinn@gmail.com>
> >> >>> > >> > wrote:
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li
> >> >>> > >> > >> >>> > <li...@gmail.com>
> >> >>> > >> > >> >>> > wrote:
> >> >>> > >> > >> >>> > >
> >> >>> > >> > >> >>> > > The question is whether to repurpose the existing
> >> >>> > FlightData
> >> >>> > >> > >> >>> > > structure, and allow for the metadata field to be
> >> >>> > >> > >> >>> > > filled
> >> >>> in
> >> >>> > >> > >> >>> > > and
> >> >>> > >> > >> data
> >> >>> > >> > >> >>> > > fields to be blank (as a control message), or to
> >> >>> > >> > >> >>> > > wrap
> >> >>> > >> > >> >>> > > the
> >> >>> > >> > >> FlightData
> >> >>> > >> > >> >>> > > structure in another structure that explicitly
> >> >>> > distinguishes
> >> >>> > >> > >> between
> >> >>> > >> > >> >>> > > control and data messages.
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>> > I'm not super against having metadata-only FlightData
> >> >>> > >> > >> >>> > with
> >> >>> > >> > >> >>> > empty
> >> >>> > >> > >> body.
> >> >>> > >> > >> >>> > One question to consider is what changes (if any)
> >> >>> > >> > >> >>> > would
> >> >>> need
> >> >>> > to
> >> >>> > >> > >> >>> > be
> >> >>> > >> > >> >>> > made to public APIs in either scenario.
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow
> >> >>> > >> > >> >>> empty
> >> >>> data
> >> >>> > >> > >> >>> messages
> >> >>> > >> > >> >>> in
> >> >>> > >> > >> >>> the future. This would be a breaking change, but
> >> >>> > >> > >> >>> wouldn't
> >> >>> > change
> >> >>> > >> > >> >>> the
> >> >>> > >> > >> >>> wire
> >> >>> > >> > >> >>> format. I think the APIs could be changed backwards
> >> >>> compatibly,
> >> >>> > >> > >> >>> though.
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> > > The other question is how to handle the metadata
> >> >>> > >> > >> >>> > > fields.
> >> >>> So
> >> >>> > >> > >> >>> > > far,
> >> >>> > >> > >> >>> > > we've
> >> >>> > >> > >> >>> > > used bytestring fields for application-defined
> >> >>> > >> > >> >>> > > data.
> >> >>> > >> > >> >>> > > This
> >> >>> > is
> >> >>> > >> > >> >>> > > workable
> >> >>> > >> > >> >>> > > if you want to use Protobuf to define the contents
> >> >>> > >> > >> >>> > > of
> >> >>> those
> >> >>> > >> > >> >>> > > fields,
> >> >>> > >> > >> >>> > > but requires you to pack/unpack your Protobuf
> >> >>> > >> > >> >>> > > into/from
> >> >>> the
> >> >>> > >> > >> >>> > > bytestring
> >> >>> > >> > >> >>> > > field. If we instead used the Protobuf Any field, a
> >> >>> > >> > >> >>> > > dynamically
> >> >>> > >> > >> >>> > > typed
> >> >>> > >> > >> >>> > > field, this would be more convenient, but then we'd
> >> >>> > >> > >> >>> > > be
> >> >>> > >> > >> >>> > > exposing
> >> >>> > >> > >> >>> > > Protobuf types. We could alternatively use a
> >> >>> > >> > >> >>> > > combination
> >> >>> of
> >> >>> > >> > >> >>> > > a
> >> >>> > >> > >> >>> > > type
> >> >>> > >> > >> >>> > > field and a bytestring field, mimicking what the
> >> >>> > >> > >> >>> > > Protobuf
> >> >>> > >> > >> >>> > > Any
> >> >>> > >> > >> >>> > > type
> >> >>> > >> > >> >>> > > looks like on the wire. I'm not sure this is
> >> >>> > >> > >> >>> > > actually
> >> >>> > cleaner
> >> >>> > >> > >> >>> > > in
> >> >>> > >> > >> any
> >> >>> > >> > >> >>> > > of the language APIs, though.
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>> > Leaving the deserialization of the app metadata to
> >> >>> > >> > >> >>> > the
> >> >>> > >> > >> >>> > particular
> >> >>> > >> > >> >>> > Flight implementation seems on first principles like
> >> >>> > >> > >> >>> > the
> >> >>> most
> >> >>> > >> > >> flexible
> >> >>> > >> > >> >>> > thing, if Any is used, does that mean the metadata
> >> >>> > >> > >> >>> > _must_
> >> >>> be
> >> >>> > a
> >> >>> > >> > >> >>> > protobuf?
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> If Any is used, we could still expose a bytes-based
> >> >>> > >> > >> >>> API,
> >> >>> > >> > >> >>> but
> >> >>> it
> >> >>> > >> > would
> >> >>> > >> > >> >>> have
> >> >>> > >> > >> >>> some more wrapping. (We could put a ByteString in Any.)
> >> >>> > >> > >> >>> Then
> >> >>> > the
> >> >>> > >> > >> >>> question
> >> >>> > >> > >> >>> would just be how to expose this (would be easier in
> >> >>> > >> > >> >>> Java,
> >> >>> > harder
> >> >>> > >> > >> >>> in
> >> >>> > >> > >> >>> C++).
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>>
> >> >>> > >> > >> >>> > > David
> >> >>> > >> > >> >>> > >
> >> >>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org>
> >> >>> > >> > >> >>> > > wrote:
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > > Can one of you explain what is being proposed in
> >> >>> > >> > >> >>> > > > non-protobuf
> >> >>> > >> > >> >>> > > > terms?
> >> >>> > >> > >> >>> > > > Knowledge of protobuf shouldn't be required to
> >> >>> > >> > >> >>> > > > use
> >> >>> > Flight.
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > > Regards
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > > Antoine.
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
> >> >>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding;
> >> >>> > >> > >> >>> > > >> it
> >> >>> > would
> >> >>> > >> > just
> >> >>> > >> > >> be
> >> >>> > >> > >> >>> > > >> application-level logic. (The official guide
> >> >>> > >> > >> >>> > > >> doesn't
> >> >>> > even
> >> >>> > >> > >> mention
> >> >>> > >> > >> >>> > > >> it
> >> >>> > >> > >> >>> > > >> in the encoding docs; I found
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> >
> >> >>> > >> > >>
> >> >>> > >> >
> >> >>> >
> >> >>> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >> >>> > >> > >> >>> > > >> as well.)
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing
> >> >>> > >> > >> >>> > > >> essentially
> >> >>> > >> > >> >>> > > >> inlining
> >> >>> > >> > >> >>> > > >> the definition of Any, e.g.
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> message FlightMessage {
> >> >>> > >> > >> >>> > > >>   oneof message {
> >> >>> > >> > >> >>> > > >>     FlightData data = 1;
> >> >>> > >> > >> >>> > > >>     FlightAny metadata = 2;
> >> >>> > >> > >> >>> > > >>   }
> >> >>> > >> > >> >>> > > >> }
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> message FlightAny {
> >> >>> > >> > >> >>> > > >>   string type = 1;
> >> >>> > >> > >> >>> > > >>   bytes data = 2;
> >> >>> > >> > >> >>> > > >> }
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> Is this correct?
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> It might be nice to consider the wrapper message
> >> >>> > >> > >> >>> > > >> for
> >> >>> > >> > >> >>> > > >> DoGet/DoPut
> >> >>> > >> > >> >>> > > >> as
> >> >>> > >> > >> >>> > > >> well, but at that point, I'd rather we be
> >> >>> > >> > >> >>> > > >> consistent
> >> >>> > with
> >> >>> > >> > >> >>> > > >> all
> >> >>> > >> > >> >>> > > >> of
> >> >>> > >> > >> >>> > > >> them,
> >> >>> > >> > >> >>> > > >> rather than have one of the three methods do its
> >> >>> > >> > >> >>> > > >> own
> >> >>> > >> > >> >>> > > >> thing.
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> Thanks,
> >> >>> > >> > >> >>> > > >> David
> >> >>> > >> > >> >>> > > >>
> >> >>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org>
> >> >>> wrote:
> >> >>> > >> > >> >>> > > >>> I think we could probably expose the oneof
> >> >>> > >> > >> >>> > > >>> behavior
> >> >>> > >> > >> >>> > > >>> without
> >> >>> > >> > >> >>> > > >>> exposing
> >> >>> > >> > >> >>> > the
> >> >>> > >> > >> >>> > > >>> protobuf functions. On the any... hmm. I guess
> >> >>> > >> > >> >>> > > >>> we
> >> >>> could
> >> >>> > >> > >> >>> > > >>> expose
> >> >>> > >> > >> >>> > > >>> as
> >> >>> > >> > >> >>> > > >>> two
> >> >>> > >> > >> >>> > > >>> fields: type and data. Then users could use it
> >> >>> > >> > >> >>> > > >>> for
> >> >>> > >> > >> >>> > > >>> whatever
> >> >>> > >> > >> >>> > > >>> but
> >> >>> > >> > >> >>> > > >>> if
> >> >>> > >> > >> >>> > > >>> people
> >> >>> > >> > >> >>> > > >>> wanted to treat it as any, it would work.
> >> >>> > >> > >> >>> > > >>> (Basically
> >> >>> a
> >> >>> > >> > >> >>> > > >>> user
> >> >>> > >> > >> >>> > > >>> could
> >> >>> > >> > >> >>> > > >>> use
> >> >>> > >> > >> >>> > > >>> any
> >> >>> > >> > >> >>> > > >>> with it easily but they could also use any
> >> >>> > >> > >> >>> > > >>> other
> >> >>> > >> > >> >>> > > >>> mechanism).
> >> >>> > >> > >> >>> > > >>> At
> >> >>> > >> > >> >>> > least in
> >> >>> > >> > >> >>> > > >>> java, the any concepts are pretty simple/diy.
> >> >>> > >> > >> >>> > > >>> Are
> >> >>> other
> >> >>> > >> > >> language
> >> >>> > >> > >> >>> > > >>> bindings
> >> >>> > >> > >> >>> > > >>> less diy?
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData
> >> >>> > >> > >> >>> > > >>> +
> >> >>> > >> > >> >>> > > >>> metadata
> >> >>> > >> > >> >>> > > >>> but
> >> >>> > >> > >> >>> > > >>> it
> >> >>> > >> > >> >>> > just
> >> >>> > >> > >> >>> > > >>> seemed a bit janky.
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >>> Thinking about the control message/wrapper
> >> >>> > >> > >> >>> > > >>> object
> >> >>> > thing,
> >> >>> > >> > >> >>> > > >>> I
> >> >>> > >> > >> >>> > > >>> wonder
> >> >>> > >> > >> >>> > > >>> if
> >> >>> > >> > >> >>> > we
> >> >>> > >> > >> >>> > > >>> should redefine DoPut and DoGet to have the
> >> >>> > >> > >> >>> > > >>> same
> >> >>> > property
> >> >>> > >> > >> >>> > > >>> if
> >> >>> > >> > >> >>> > > >>> we
> >> >>> > >> > >> >>> > think it
> >> >>> > >> > >> >>> > > >>> is
> >> >>> > >> > >> >>> > > >>> a good idea...
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
> >> >>> > >> > >> li.davidm96@gmail.com>
> >> >>> > >> > >> >>> > wrote:
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >>>> I was definitely considering having control
> >> >>> > >> > >> >>> > > >>>> messages
> >> >>> > >> > without
> >> >>> > >> > >> >>> > > >>>> data,
> >> >>> > >> > >> >>> > and
> >> >>> > >> > >> >>> > > >>>> I thought that could be encoded by a
> >> >>> > >> > >> >>> > > >>>> FlightData
> >> >>> > >> > >> >>> > > >>>> with
> >> >>> > >> > >> >>> > > >>>> only
> >> >>> > >> > >> >>> > app_metadata
> >> >>> > >> > >> >>> > > >>>> set. I think I understand your position now:
> >> >>> > FlightData
> >> >>> > >> > >> >>> > > >>>> should
> >> >>> > >> > >> >>> > always
> >> >>> > >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with
> >> >>> > >> > >> >>> > > >>>> the
> >> >>> > >> > >> >>> > > >>>> documentation
> >> >>> > >> > >> >>> > > >>>> on
> >> >>> > >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried
> >> >>> > >> > >> >>> > > >>>> about
> >> >>> > >> > >> >>> > > >>>> having
> >> >>> > >> > >> >>> > > >>>> a
> >> >>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents
> >> >>> > >> > >> >>> > > >>>> that
> >> >>> from
> >> >>> > >> > >> >>> > > >>>> happening,
> >> >>> > >> > >> >>> > and
> >> >>> > >> > >> >>> > > >>>> overall having a clear separation between data
> >> >>> > >> > >> >>> > > >>>> and
> >> >>> > >> > >> >>> > > >>>> control
> >> >>> > >> > >> >>> > > >>>> messages
> >> >>> > >> > >> >>> > is
> >> >>> > >> > >> >>> > > >>>> cleaner.
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've
> >> >>> > >> > >> >>> > > >>>> refrained
> >> >>> > >> > >> >>> > > >>>> from
> >> >>> > >> > >> >>> > > >>>> exposing
> >> >>> > >> > >> >>> > > >>>> Protobuf by using bytes, would we want to
> >> >>> > >> > >> >>> > > >>>> change
> >> >>> that
> >> >>> > >> > >> >>> > > >>>> now?
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>> Best,
> >> >>> > >> > >> >>> > > >>>> David
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau
> >> >>> > >> > >> >>> > > >>>> <ja...@apache.org>
> >> >>> > wrote:
> >> >>> > >> > >> >>> > > >>>>> Hey David,
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern
> >> >>> > >> > >> >>> > > >>>>> we
> >> >>> > >> > >> >>> > > >>>>> use
> >> >>> > >> > >> >>> > > >>>>> for
> >> >>> > >> > >> >>> > > >>>>> doget/doput
> >> >>> > >> > >> >>> > > >>>>> for
> >> >>> > >> > >> >>> > > >>>>> async. Yes, more thinking java given java
> >> >>> > >> > >> >>> > > >>>>> grpc's
> >> >>> > async
> >> >>> > >> > >> >>> > > >>>>> always
> >> >>> > >> > >> >>> > pattern.
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think
> >> >>> > >> > >> >>> > > >>>>> it
> >> >>> > >> > >> >>> > > >>>>> is
> >> >>> > >> > >> >>> > > >>>>> overloading
> >> >>> > >> > >> >>> > > >>>>> the
> >> >>> > >> > >> >>> > > >>>> message
> >> >>> > >> > >> >>> > > >>>>> to use metadata for this. If I want to send a
> >> >>> control
> >> >>> > >> > >> >>> > > >>>>> message
> >> >>> > >> > >> >>> > > >>>> independently
> >> >>> > >> > >> >>> > > >>>>> of the data message, I would have to define
> >> >>> something
> >> >>> > >> > >> >>> > > >>>>> like
> >> >>> > >> > >> >>> > > >>>>> an
> >> >>> > >> > >> >>> > > >>>>> empty
> >> >>> > >> > >> >>> > > >>>> flight
> >> >>> > >> > >> >>> > > >>>>> data message that has custom metadata. Why
> >> >>> > >> > >> >>> > > >>>>> not
> >> >>> > support
> >> >>> > >> > >> >>> > > >>>>> a
> >> >>> > >> > >> >>> > > >>>>> container
> >> >>> > >> > >> >>> > > >>>>> object
> >> >>> > >> > >> >>> > > >>>>> with a oneof{FlightData, Any} in it instead
> >> >>> > >> > >> >>> > > >>>>> so
> >> >>> users
> >> >>> > >> > >> >>> > > >>>>> can
> >> >>> > >> > >> >>> > > >>>>> add
> >> >>> > >> > >> >>> > > >>>>> more
> >> >>> > >> > >> >>> > data
> >> >>> > >> > >> >>> > > >>>>> as
> >> >>> > >> > >> >>> > > >>>>> desired. The default impl could be a noop for
> >> >>> > >> > >> >>> > > >>>>> the
> >> >>> Any
> >> >>> > >> > >> >>> > > >>>>> messages.
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
> >> >>> > >> > >> >>> > > >>>>> <li...@gmail.com>
> >> >>> > >> > >> >>> > > >>>>> wrote:
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>>> Hi Jacques,
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>> Thanks for the comments.
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >> >>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as
> >> >>> > >> > >> >>> > > >>>>>> a
> >> >>> result
> >> >>> > >> > >> >>> > > >>>>>> of
> >> >>> > >> > >> prior
> >> >>> > >> > >> >>> > > >>>>>> proposals, so I don't think we need a new
> >> >>> > >> > >> >>> > > >>>>>> message
> >> >>> to
> >> >>> > >> > carry
> >> >>> > >> > >> >>> > > >>>>>> that
> >> >>> > >> > >> >>> > kind
> >> >>> > >> > >> >>> > > >>>>>> of information.
> >> >>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler
> >> >>> > >> > >> >>> > > >>>>>> to
> >> >>> > handle
> >> >>> > >> > >> >>> > > >>>>>> incoming
> >> >>> > >> > >> >>> > > >>>>>> messages as the fundamental API; it would
> >> >>> > >> > >> >>> > > >>>>>> actually
> >> >>> > be
> >> >>> > >> > >> >>> > > >>>>>> quite
> >> >>> > >> > >> >>> > natural
> >> >>> > >> > >> >>> > > >>>>>> to
> >> >>> > >> > >> >>> > > >>>>>> implement in Flight/Java. I will note that
> >> >>> > >> > >> >>> > > >>>>>> it's
> >> >>> not
> >> >>> > >> > >> >>> > > >>>>>> possible
> >> >>> > >> > >> >>> > > >>>>>> in
> >> >>> > >> > >> >>> > > >>>>>> C++/Python without spawning a thread,
> >> >>> > >> > >> >>> > > >>>>>> though.
> >> >>> > >> > >> >>> > > >>>>>> (In
> >> >>> > >> > essence,
> >> >>> > >> > >> >>> > gRPC-Java
> >> >>> > >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is
> >> >>> > >> > >> >>> > > >>>>>> sync-always.)
> >> >>> There
> >> >>> > >> > >> >>> > > >>>>>> are
> >> >>> > >> > >> >>> > experimental
> >> >>> > >> > >> >>> > > >>>>>> C++ APIs that would let us do something
> >> >>> > >> > >> >>> > > >>>>>> similar
> >> >>> > >> > >> >>> > > >>>>>> to
> >> >>> > >> > >> >>> > > >>>>>> Java,
> >> >>> > >> > >> >>> > > >>>>>> but
> >> >>> > >> > >> >>> > > >>>>>> those
> >> >>> > >> > >> >>> > > >>>>>> are
> >> >>> > >> > >> >>> > > >>>>>> only in relatively recent gRPC versions and
> >> >>> > >> > >> >>> > > >>>>>> are
> >> >>> > still
> >> >>> > >> > >> >>> > > >>>>>> under
> >> >>> > >> > >> >>> > > >>>>>> development (contrary to the interceptor
> >> >>> > >> > >> >>> > > >>>>>> APIs
> >> >>> which
> >> >>> > >> > >> >>> > > >>>>>> have
> >> >>> > >> > >> been
> >> >>> > >> > >> >>> > around
> >> >>> > >> > >> >>> > > >>>>>> for quite a while).
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>> Thanks,
> >> >>> > >> > >> >>> > > >>>>>> David
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau
> >> >>> > >> > >> >>> > > >>>>>> <ja...@apache.org>
> >> >>> > >> > >> >>> > > >>>>>> wrote:
> >> >>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc.
> >> >>> > >> > >> >>> > > >>>>>>> Might
> >> >>> > >> > >> >>> > > >>>>>>> worth
> >> >>> > >> > >> >>> > > >>>>>>> discussion
> >> >>> > >> > >> >>> > > >>>>>>> here
> >> >>> > >> > >> >>> > > >>>>>>> depending on your thoughts.
> >> >>> > >> > >> >>> > > >>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
> >> >>> > >> > >> >>> > > >>>>>>> <li...@gmail.com>
> >> >>> > >> > >> >>> > > >>>> wrote:
> >> >>> > >> > >> >>> > > >>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to
> >> >>> provide a
> >> >>> > >> > >> >>> > > >>>>>>>> Python
> >> >>> > >> > >> >>> > strawman.
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it,
> >> >>> > >> > >> >>> > > >>>>>>>> you
> >> >>> > could
> >> >>> > >> > >> >>> > > >>>>>>>> interleave
> >> >>> > >> > >> >>> > > >>>> uploads
> >> >>> > >> > >> >>> > > >>>>>>>> and downloads if you were so inclined.
> >> >>> > >> > >> >>> > > >>>>>>>> Right
> >> >>> now,
> >> >>> > >> > >> >>> > > >>>>>>>> synchronous
> >> >>> > >> > >> >>> > APIs
> >> >>> > >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client
> >> >>> > >> > >> >>> > > >>>>>>>> and
> >> >>> > >> > >> >>> > > >>>>>>>> server
> >> >>> > >> > >> >>> > > >>>>>>>> wait
> >> >>> > >> > >> >>> > > >>>>>>>> for
> >> >>> > >> > >> >>> > each
> >> >>> > >> > >> >>> > > >>>>>>>> other due to an application logic bug.
> >> >>> > >> > >> >>> > > >>>>>>>> (gRPC
> >> >>> > >> > >> >>> > > >>>>>>>> doesn't
> >> >>> > >> > >> >>> > > >>>>>>>> give
> >> >>> > >> > >> >>> > > >>>>>>>> us
> >> >>> > >> > >> >>> > > >>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an
> >> >>> overall
> >> >>> > >> > >> >>> > > >>>>>>>> timeout.)
> >> >>> > >> > >> >>> > > >>>>>>>> As
> >> >>> > >> > >> >>> > an
> >> >>> > >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see
> >> >>> > >> > >> >>> > > >>>>>>>> ARROW-6063:
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> https://issues.apache.org/jira/browse/ARROW-6063
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> This is mostly tangential though,
> >> >>> > >> > >> >>> > > >>>>>>>> eventually
> >> >>> > >> > >> >>> > > >>>>>>>> we
> >> >>> > >> > >> >>> > > >>>>>>>> will
> >> >>> > >> > >> >>> > > >>>>>>>> want
> >> >>> > >> > >> >>> > > >>>>>>>> to
> >> >>> > >> > >> >>> > design
> >> >>> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A
> >> >>> > >> > bidirectional
> >> >>> > >> > >> >>> > > >>>>>>>> stream
> >> >>> > >> > >> >>> > > >>>>>>>> like
> >> >>> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these
> >> >>> > >> > >> >>> > > >>>>>>>> pitfalls
> >> >>> > >> > >> >>> > > >>>>>>>> easier
> >> >>> > >> > >> >>> > > >>>>>>>> to
> >> >>> > >> > >> >>> > > >>>>>>>> run
> >> >>> > >> > >> >>> > into.
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the
> >> >>> > >> > >> >>> > > >>>>>>>> proposal,
> >> >>> > >> > but
> >> >>> > >> > >> >>> > > >>>>>>>> the
> >> >>> > >> > >> >>> > main
> >> >>> > >> > >> >>> > > >>>>>>>> concern is that depending on how you
> >> >>> > >> > >> >>> > > >>>>>>>> deploy,
> >> >>> > >> > >> >>> > > >>>>>>>> two
> >> >>> > >> > >> >>> > > >>>>>>>> separate
> >> >>> > >> > >> >>> > > >>>>>>>> calls
> >> >>> > >> > >> >>> > > >>>>>>>> could
> >> >>> > >> > >> >>> > > >>>>>>>> get routed to different instances.
> >> >>> > >> > >> >>> > > >>>>>>>> Additionally,
> >> >>> > >> > >> >>> > > >>>>>>>> gRPC
> >> >>> > >> > >> >>> > > >>>>>>>> has
> >> >>> > >> > >> >>> > > >>>>>>>> some
> >> >>> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes
> >> >>> > >> > >> >>> > > >>>>>>>> away
> >> >>> in
> >> >>> > >> > >> >>> > > >>>>>>>> between
> >> >>> > >> > >> >>> > > >>>>>>>> the
> >> >>> > >> > >> >>> > two
> >> >>> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is
> >> >>> > >> > >> >>> > > >>>>>>>> another
> >> >>> > >> > instance
> >> >>> > >> > >> >>> > available,
> >> >>> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the
> >> >>> > >> > >> >>> > > >>>>>>>> new
> >> >>> > server
> >> >>> > >> > >> without
> >> >>> > >> > >> >>> > > >>>>>>>> warning.
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> Thanks,
> >> >>> > >> > >> >>> > > >>>>>>>> David
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray
> >> >>> > >> > >> >>> > > >>>>>>>> <ry...@dremio.com>
> >> >>> > wrote:
> >> >>> > >> > >> >>> > > >>>>>>>>> Hey David,
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of
> >> >>> > >> > >> >>> > > >>>>>>>>> sense.
> >> >>> > >> > >> >>> > > >>>>>>>>> I
> >> >>> > like
> >> >>> > >> > >> >>> > > >>>>>>>>> it
> >> >>> > >> > >> >>> > > >>>>>>>>> and
> >> >>> > >> > >> >>> > > >>>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>>> possibility
> >> >>> > >> > >> >>> > > >>>>>>>>> of remote compute via arrow buffers. One
> >> >>> > >> > >> >>> > > >>>>>>>>> thing
> >> >>> > >> > >> >>> > > >>>>>>>>> that
> >> >>> > >> > >> >>> > > >>>>>>>>> would
> >> >>> > >> > >> >>> > > >>>>>>>>> help
> >> >>> > >> > >> >>> > me
> >> >>> > >> > >> >>> > > >>>>>> would
> >> >>> > >> > >> >>> > > >>>>>>>> be
> >> >>> > >> > >> >>> > > >>>>>>>>> a concrete example of the API in a real
> >> >>> > >> > >> >>> > > >>>>>>>>> life
> >> >>> use
> >> >>> > >> > >> >>> > > >>>>>>>>> case.
> >> >>> > >> > >> >>> > > >>>>>>>>> Also,
> >> >>> > >> > >> >>> > what
> >> >>> > >> > >> >>> > > >>>>>> would
> >> >>> > >> > >> >>> > > >>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>>> client experience be in terms of sync vs
> >> >>> > >> > >> >>> > > >>>>>>>>> asyc?
> >> >>> > >> > >> >>> > > >>>>>>>>> Would
> >> >>> > >> > >> >>> > > >>>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>>> client
> >> >>> > >> > >> >>> > > >>>>>>>>> block
> >> >>> > >> > >> >>> > > >>>>>>>> till
> >> >>> > >> > >> >>> > > >>>>>>>>> the bidirectional call return ie c =
> >> >>> > >> > >> flight.vector_mult(a,
> >> >>> > >> > >> >>> > > >>>>>>>>> b)
> >> >>> > >> > >> >>> > or
> >> >>> > >> > >> >>> > > >>>>>>>>> would
> >> >>> > >> > >> >>> > > >>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>>> client wait to be signaled that
> >> >>> > >> > >> >>> > > >>>>>>>>> computation
> >> >>> > >> > >> >>> > > >>>>>>>>> was
> >> >>> > >> > >> >>> > > >>>>>>>>> done.
> >> >>> > >> > >> >>> > > >>>>>>>>> If
> >> >>> > >> > >> >>> > > >>>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>>> later
> >> >>> > >> > >> >>> > > >>>>>>>>> how
> >> >>> > >> > >> >>> > > >>>>>>>>> is
> >> >>> > >> > >> >>> > > >>>>>>>>> that different from a DoPut then DoGet? I
> >> >>> suppose
> >> >>> > >> > >> >>> > > >>>>>>>>> that
> >> >>> > >> > >> >>> > > >>>>>>>>> this
> >> >>> > >> > >> >>> > could
> >> >>> > >> > >> >>> > > >>>> be
> >> >>> > >> > >> >>> > > >>>>>>>>> implemented without extending the RPC
> >> >>> > >> > >> >>> > > >>>>>>>>> interface
> >> >>> > >> > >> >>> > > >>>>>>>>> but
> >> >>> > >> > >> rather
> >> >>> > >> > >> >>> > > >>>>>>>>> by a
> >> >>> > >> > >> >>> > > >>>>>>>>> function/util?
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> Best,
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> Ryan
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li
> >> >>> > >> > >> >>> > > >>>>>>>>> <
> >> >>> > >> > >> >>> > li.davidm96@gmail.com>
> >> >>> > >> > >> >>> > > >>>>>> wrote:
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> Hi all,
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite
> >> >>> > >> > >> >>> > > >>>>>>>>>> successfully
> >> >>> > >> > >> >>> > > >>>>>>>>>> so
> >> >>> > >> > >> >>> > > >>>>>>>>>> far,
> >> >>> > >> > but
> >> >>> > >> > >> we
> >> >>> > >> > >> >>> > > >>>>>>>>>> have
> >> >>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the
> >> >>> > >> > >> >>> > > >>>>>>>>>> horizon:
> >> >>> being
> >> >>> > >> > >> >>> > > >>>>>>>>>> able
> >> >>> > >> > >> >>> > > >>>>>>>>>> to
> >> >>> > >> > >> >>> > > >>>>>>>>>> both
> >> >>> > >> > >> >>> > > >>>>>>>>>> send
> >> >>> > >> > >> >>> > > >>>>>>>>>> and
> >> >>> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC
> >> >>> > >> > >> >>> > > >>>>>>>>>> call.
> >> >>> To
> >> >>> > >> > >> >>> > > >>>>>>>>>> that
> >> >>> > >> > >> >>> > > >>>>>>>>>> end,
> >> >>> > >> > >> >>> > I've
> >> >>> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC
> >> >>> > >> > >> >>> > > >>>>>>>>>> method:
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> >
> >> >>> > >> > >>
> >> >>> > >> >
> >> >>> >
> >> >>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or
> >> >>> comment
> >> >>> > >> > >> >>> > > >>>>>>>>>> on
> >> >>> > >> > the
> >> >>> > >> > >> >>> > document.
> >> >>> > >> > >> >>> > > >>>>>>>>>> I'd
> >> >>> > >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is
> >> >>> > >> > >> >>> > > >>>>>>>>>> a
> >> >>> > >> > >> >>> > > >>>>>>>>>> relatively
> >> >>> > >> > >> >>> > > >>>>>>>>>> straightforward
> >> >>> > >> > >> >>> > > >>>>>>>>>> addition - it is essentially
> >> >>> > >> > >> >>> > > >>>>>>>>>> "DoPutThenGet".
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require
> >> >>> > >> > >> >>> > > >>>>>>>>>> a
> >> >>> > vote.
> >> >>> > >> > I've
> >> >>> > >> > >> >>> > > >>>>>>>>>> decided
> >> >>> > >> > >> >>> > > >>>>>>>>>> to
> >> >>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had
> >> >>> > >> > >> >>> > > >>>>>>>>>> proposed
> >> >>> (on
> >> >>> > >> > >> >>> > > >>>>>>>>>> DoPut),
> >> >>> > >> > >> >>> > > >>>>>>>>>> as
> >> >>> > >> > >> >>> > > >>>>>>>>>> it
> >> >>> > >> > >> >>> > > >>>>>> doesn't
> >> >>> > >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the
> >> >>> > >> > >> >>> > > >>>>>>>>>> interpretation
> >> >>> > >> > of
> >> >>> > >> > >> >>> > > >>>>>>>>>> the
> >> >>> > >> > >> >>> > > >>>>>>>>>> semantics.
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>> Thanks,
> >> >>> > >> > >> >>> > > >>>>>>>>>> David
> >> >>> > >> > >> >>> > > >>>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> --
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting
> >> >>> > >> > >> >>> > > >>>>>>>>> Engineer
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/>
> >> >>> > >> > >> >>> > > >>>>>>>>> Check out our GitHub
> >> >>> > >> > >> >>> > > >>>>>>>>> <https://www.github.com/dremio>,
> >> >>> > >> > >> join
> >> >>> > >> > >> >>> > > >>>>>>>>> our
> >> >>> > >> > >> >>> > > >>>>>>>>> community
> >> >>> > >> > >> >>> > > >>>>>>>>> site <https://community.dremio.com/> &
> >> >>> Download
> >> >>> > >> > Dremio
> >> >>> > >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/download>
> >> >>> > >> > >> >>> > > >>>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>>
> >> >>> > >> > >> >>> > > >>>>>>
> >> >>> > >> > >> >>> > > >>>>>
> >> >>> > >> > >> >>> > > >>>>
> >> >>> > >> > >> >>> > > >>>
> >> >>> > >> > >> >>> > > >
> >> >>> > >> > >> >>> >
> >> >>> > >> > >> >>
> >> >>> > >> > >> >
> >> >>> > >> > >>
> >> >>> > >> > >
> >> >>> > >> >
> >> >>> > >
> >> >>> >
> >> >>>
> >> >>
> >> >
> >
>


Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by David Li <li...@gmail.com>.
Hey Wes,

Thanks for the review. I've broken out the format change into this PR:
https://github.com/apache/arrow/pull/6686

Best,
David

On 3/22/20, Wes McKinney <we...@gmail.com> wrote:
> hi David,
>
> I did a preliminary view and things look to be on the right track
> there. What do you think about breaking out the protocol changes (and
> adding appropriate comments) so we can have a vote on that in
> relatively short order?
>
> - Wes
>
> On Wed, Mar 18, 2020 at 9:06 AM David Li <li...@gmail.com> wrote:
>>
>> Following up here, I've submitted a draft implementation for C++:
>> https://github.com/apache/arrow/pull/6656
>>
>> The core functionality is there, but there are still holes that I need
>> to implement. Compared to the draft spec, the client also sends a
>> FlightDescriptor to begin with, though it's currently not exposed.
>> This provides consistency with DoGet/DoPut which also send a message
>> to begin with to describe the stream to the server.
>>
>> Andy, I hope this helps clarify whether it meets your needs.
>>
>> Best,
>> David
>>
>> On 2/25/20, David Li <li...@gmail.com> wrote:
>> > Hey Andy,
>> >
>> > I've been rather busy unfortunately. I had started on an
>> > implementation in C++ to provide as part of this discussion, but it's
>> > not complete. I'm hoping to have more done in March.
>> >
>> > Best,
>> > David
>> >
>> > On 2/25/20, Andy Grove <an...@gmail.com> wrote:
>> >> I was wondering if there had been any momentum on this (the
>> >> BiDirectional
>> >> RPC design)?
>> >>
>> >> I'm interested in this for the use case of Apache Spark sending a
>> >> stream
>> >> of
>> >> data to another process to invoke custom code and then receive a
>> >> stream
>> >> back with the transformed data.
>> >>
>> >> Thanks,
>> >>
>> >> Andy.
>> >>
>> >>
>> >>
>> >> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org>
>> >> wrote:
>> >>
>> >>> I support moving forward with the current proposal.
>> >>>
>> >>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Just following up here again, any other thoughts?
>> >>> >
>> >>> > I think we do have justifications for potentially separate streams
>> >>> > in
>> >>> > a call, but that's more of an orthogonal question - it doesn't need
>> >>> > to
>> >>> > be addressed here. I do agree that it very much complicates things.
>> >>> >
>> >>> > Thanks,
>> >>> > David
>> >>> >
>> >>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
>> >>> > > I would generally agree with this. Note that you have the
>> >>> > > possibility
>> >>> > > to use unions-of-structs to send record batches with different
>> >>> > > schemas
>> >>> > > in the same stream, though with some added complexity on each
>> >>> > > side
>> >>> > >
>> >>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau
>> >>> > > <ja...@apache.org>
>> >>> > wrote:
>> >>> > >>
>> >>> > >> I'd vote for explicitly not supported. We should keep our
>> >>> > >> primitives
>> >>> > >> narrow.
>> >>> > >>
>> >>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com>
>> >>> > >> wrote:
>> >>> > >>
>> >>> > >> > Thanks for the feedback.
>> >>> > >> >
>> >>> > >> > I do think if we had explicitly embraced gRPC from the
>> >>> > >> > beginning,
>> >>> > >> > there are a lot of places where things could be made more
>> >>> > >> > ergonomic,
>> >>> > >> > including with the metadata fields. But it would also have
>> >>> > >> > locked
>> >>> out
>> >>> > >> > us of potential future transports.
>> >>> > >> >
>> >>> > >> > On another note: I hesitate to put too much into this method,
>> >>> > >> > but
>> >>> > >> > we
>> >>> > >> > are looking at use cases where potentially, a client may want
>> >>> > >> > to
>> >>> > >> > upload multiple distinct datasets (with differing schemas).
>> >>> > >> > (This
>> >>> is a
>> >>> > >> > little tentative, and I can get more details...) Right now,
>> >>> > >> > each
>> >>> > >> > logical stream in Flight must have a single, consistent
>> >>> > >> > schema;
>> >>> would
>> >>> > >> > it make sense to look at ways to relax this, or declare this
>> >>> > >> > explicitly out of scope (and require multiple calls and
>> >>> > >> > coordination
>> >>> > >> > with the deployment topology) in order to accomplish this?
>> >>> > >> >
>> >>> > >> > Best,
>> >>> > >> > David
>> >>> > >> >
>> >>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
>> >>> > >> > > Fair enough. I'm okay with the bytes approach and the
>> >>> > >> > > proposal
>> >>> looks
>> >>> > >> > > good
>> >>> > >> > > to me.
>> >>> > >> > >
>> >>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li
>> >>> > >> > > <li...@gmail.com>
>> >>> > >> > > wrote:
>> >>> > >> > >
>> >>> > >> > >> I've updated the proposal.
>> >>> > >> > >>
>> >>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
>> >>> > >> > >> errors/metadata, I still think using bytes is preferable:
>> >>> > >> > >> - It doesn't require (conditionally) exposing or wrapping
>> >>> Protobuf
>> >>> > >> > types,
>> >>> > >> > >> - We wouldn't be able to practically expose the Protobuf
>> >>> > >> > >> field
>> >>> > >> > >> to
>> >>> > >> > >> C++
>> >>> > >> > >> users without causing build pains,
>> >>> > >> > >> - We can't let Python users take advantage of the Protobuf
>> >>> > >> > >> field
>> >>> > >> > >> without somehow being compatible with the Protobuf wheels
>> >>> > >> > >> (by
>> >>> > >> > >> linking
>> >>> > >> > >> to the same version, and doing magic to turn the C++
>> >>> > >> > >> Protobufs
>> >>> into
>> >>> > >> > >> the Python ones),
>> >>> > >> > >> - All our other application-defined fields are already
>> >>> > >> > >> bytes.
>> >>> > >> > >>
>> >>> > >> > >> Applications that want structure can encode JSON or
>> >>> > >> > >> Protobuf
>> >>> > >> > >> Any
>> >>> > >> > >> into
>> >>> > >> > >> the bytes field themselves, much as you can already do for
>> >>> Ticket,
>> >>> > >> > >> commands in FlightDescriptors, and application metadata in
>> >>> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient
>> >>> > >> > >> than
>> >>> using
>> >>> > >> > >> Any directly, since Any itself is a bytes field with a tag,
>> >>> > >> > >> and
>> >>> > must
>> >>> > >> > >> invoke the Protobuf deserializer again to read the actual
>> >>> message.
>> >>> > >> > >>
>> >>> > >> > >> If we decide on using bytes, then I don't think it makes
>> >>> > >> > >> sense
>> >>> > >> > >> to
>> >>> > >> > >> define a new message with a oneof either, since it would be
>> >>> > >> > >> redundant.
>> >>> > >> > >>
>> >>> > >> > >> Thanks,
>> >>> > >> > >> David
>> >>> > >> > >>
>> >>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
>> >>> > >> > >> > I've been extremely backlogged, I will update the
>> >>> > >> > >> > proposal
>> >>> when I
>> >>> > >> > >> > get
>> >>> > >> > >> > a chance and reply here when done.
>> >>> > >> > >> >
>> >>> > >> > >> > Best,
>> >>> > >> > >> > David
>> >>> > >> > >> >
>> >>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
>> >>> > >> > >> >> Bumping this discussion since a couple of weeks have
>> >>> > >> > >> >> passed.
>> >>> It
>> >>> > >> > >> >> seems
>> >>> > >> > >> >> there are still some questions here, could we summarize
>> >>> > >> > >> >> what
>> >>> are
>> >>> > >> > >> >> the
>> >>> > >> > >> >> alternatives along with any public API implications so
>> >>> > >> > >> >> we
>> >>> > >> > >> >> can
>> >>> > try
>> >>> > >> > >> >> to
>> >>> > >> > >> >> render a decision?
>> >>> > >> > >> >>
>> >>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <
>> >>> li.davidm96@gmail.com
>> >>> > >
>> >>> > >> > >> >> wrote:
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> Hi Wes,
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> Responses inline:
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <
>> >>> wesmckinn@gmail.com>
>> >>> > >> > wrote:
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li
>> >>> > >> > >> >>> > <li...@gmail.com>
>> >>> > >> > >> >>> > wrote:
>> >>> > >> > >> >>> > >
>> >>> > >> > >> >>> > > The question is whether to repurpose the existing
>> >>> > FlightData
>> >>> > >> > >> >>> > > structure, and allow for the metadata field to be
>> >>> > >> > >> >>> > > filled
>> >>> in
>> >>> > >> > >> >>> > > and
>> >>> > >> > >> data
>> >>> > >> > >> >>> > > fields to be blank (as a control message), or to
>> >>> > >> > >> >>> > > wrap
>> >>> > >> > >> >>> > > the
>> >>> > >> > >> FlightData
>> >>> > >> > >> >>> > > structure in another structure that explicitly
>> >>> > distinguishes
>> >>> > >> > >> between
>> >>> > >> > >> >>> > > control and data messages.
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>> > I'm not super against having metadata-only FlightData
>> >>> > >> > >> >>> > with
>> >>> > >> > >> >>> > empty
>> >>> > >> > >> body.
>> >>> > >> > >> >>> > One question to consider is what changes (if any)
>> >>> > >> > >> >>> > would
>> >>> need
>> >>> > to
>> >>> > >> > >> >>> > be
>> >>> > >> > >> >>> > made to public APIs in either scenario.
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow
>> >>> > >> > >> >>> empty
>> >>> data
>> >>> > >> > >> >>> messages
>> >>> > >> > >> >>> in
>> >>> > >> > >> >>> the future. This would be a breaking change, but
>> >>> > >> > >> >>> wouldn't
>> >>> > change
>> >>> > >> > >> >>> the
>> >>> > >> > >> >>> wire
>> >>> > >> > >> >>> format. I think the APIs could be changed backwards
>> >>> compatibly,
>> >>> > >> > >> >>> though.
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> > > The other question is how to handle the metadata
>> >>> > >> > >> >>> > > fields.
>> >>> So
>> >>> > >> > >> >>> > > far,
>> >>> > >> > >> >>> > > we've
>> >>> > >> > >> >>> > > used bytestring fields for application-defined
>> >>> > >> > >> >>> > > data.
>> >>> > >> > >> >>> > > This
>> >>> > is
>> >>> > >> > >> >>> > > workable
>> >>> > >> > >> >>> > > if you want to use Protobuf to define the contents
>> >>> > >> > >> >>> > > of
>> >>> those
>> >>> > >> > >> >>> > > fields,
>> >>> > >> > >> >>> > > but requires you to pack/unpack your Protobuf
>> >>> > >> > >> >>> > > into/from
>> >>> the
>> >>> > >> > >> >>> > > bytestring
>> >>> > >> > >> >>> > > field. If we instead used the Protobuf Any field, a
>> >>> > >> > >> >>> > > dynamically
>> >>> > >> > >> >>> > > typed
>> >>> > >> > >> >>> > > field, this would be more convenient, but then we'd
>> >>> > >> > >> >>> > > be
>> >>> > >> > >> >>> > > exposing
>> >>> > >> > >> >>> > > Protobuf types. We could alternatively use a
>> >>> > >> > >> >>> > > combination
>> >>> of
>> >>> > >> > >> >>> > > a
>> >>> > >> > >> >>> > > type
>> >>> > >> > >> >>> > > field and a bytestring field, mimicking what the
>> >>> > >> > >> >>> > > Protobuf
>> >>> > >> > >> >>> > > Any
>> >>> > >> > >> >>> > > type
>> >>> > >> > >> >>> > > looks like on the wire. I'm not sure this is
>> >>> > >> > >> >>> > > actually
>> >>> > cleaner
>> >>> > >> > >> >>> > > in
>> >>> > >> > >> any
>> >>> > >> > >> >>> > > of the language APIs, though.
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>> > Leaving the deserialization of the app metadata to
>> >>> > >> > >> >>> > the
>> >>> > >> > >> >>> > particular
>> >>> > >> > >> >>> > Flight implementation seems on first principles like
>> >>> > >> > >> >>> > the
>> >>> most
>> >>> > >> > >> flexible
>> >>> > >> > >> >>> > thing, if Any is used, does that mean the metadata
>> >>> > >> > >> >>> > _must_
>> >>> be
>> >>> > a
>> >>> > >> > >> >>> > protobuf?
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> If Any is used, we could still expose a bytes-based
>> >>> > >> > >> >>> API,
>> >>> > >> > >> >>> but
>> >>> it
>> >>> > >> > would
>> >>> > >> > >> >>> have
>> >>> > >> > >> >>> some more wrapping. (We could put a ByteString in Any.)
>> >>> > >> > >> >>> Then
>> >>> > the
>> >>> > >> > >> >>> question
>> >>> > >> > >> >>> would just be how to expose this (would be easier in
>> >>> > >> > >> >>> Java,
>> >>> > harder
>> >>> > >> > >> >>> in
>> >>> > >> > >> >>> C++).
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> > > David
>> >>> > >> > >> >>> > >
>> >>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org>
>> >>> > >> > >> >>> > > wrote:
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Can one of you explain what is being proposed in
>> >>> > >> > >> >>> > > > non-protobuf
>> >>> > >> > >> >>> > > > terms?
>> >>> > >> > >> >>> > > > Knowledge of protobuf shouldn't be required to
>> >>> > >> > >> >>> > > > use
>> >>> > Flight.
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Regards
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Antoine.
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
>> >>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding;
>> >>> > >> > >> >>> > > >> it
>> >>> > would
>> >>> > >> > just
>> >>> > >> > >> be
>> >>> > >> > >> >>> > > >> application-level logic. (The official guide
>> >>> > >> > >> >>> > > >> doesn't
>> >>> > even
>> >>> > >> > >> mention
>> >>> > >> > >> >>> > > >> it
>> >>> > >> > >> >>> > > >> in the encoding docs; I found
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> >
>> >>> > >> > >>
>> >>> > >> >
>> >>> >
>> >>> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> >>> > >> > >> >>> > > >> as well.)
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing
>> >>> > >> > >> >>> > > >> essentially
>> >>> > >> > >> >>> > > >> inlining
>> >>> > >> > >> >>> > > >> the definition of Any, e.g.
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> message FlightMessage {
>> >>> > >> > >> >>> > > >>   oneof message {
>> >>> > >> > >> >>> > > >>     FlightData data = 1;
>> >>> > >> > >> >>> > > >>     FlightAny metadata = 2;
>> >>> > >> > >> >>> > > >>   }
>> >>> > >> > >> >>> > > >> }
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> message FlightAny {
>> >>> > >> > >> >>> > > >>   string type = 1;
>> >>> > >> > >> >>> > > >>   bytes data = 2;
>> >>> > >> > >> >>> > > >> }
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> Is this correct?
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> It might be nice to consider the wrapper message
>> >>> > >> > >> >>> > > >> for
>> >>> > >> > >> >>> > > >> DoGet/DoPut
>> >>> > >> > >> >>> > > >> as
>> >>> > >> > >> >>> > > >> well, but at that point, I'd rather we be
>> >>> > >> > >> >>> > > >> consistent
>> >>> > with
>> >>> > >> > >> >>> > > >> all
>> >>> > >> > >> >>> > > >> of
>> >>> > >> > >> >>> > > >> them,
>> >>> > >> > >> >>> > > >> rather than have one of the three methods do its
>> >>> > >> > >> >>> > > >> own
>> >>> > >> > >> >>> > > >> thing.
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> Thanks,
>> >>> > >> > >> >>> > > >> David
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org>
>> >>> wrote:
>> >>> > >> > >> >>> > > >>> I think we could probably expose the oneof
>> >>> > >> > >> >>> > > >>> behavior
>> >>> > >> > >> >>> > > >>> without
>> >>> > >> > >> >>> > > >>> exposing
>> >>> > >> > >> >>> > the
>> >>> > >> > >> >>> > > >>> protobuf functions. On the any... hmm. I guess
>> >>> > >> > >> >>> > > >>> we
>> >>> could
>> >>> > >> > >> >>> > > >>> expose
>> >>> > >> > >> >>> > > >>> as
>> >>> > >> > >> >>> > > >>> two
>> >>> > >> > >> >>> > > >>> fields: type and data. Then users could use it
>> >>> > >> > >> >>> > > >>> for
>> >>> > >> > >> >>> > > >>> whatever
>> >>> > >> > >> >>> > > >>> but
>> >>> > >> > >> >>> > > >>> if
>> >>> > >> > >> >>> > > >>> people
>> >>> > >> > >> >>> > > >>> wanted to treat it as any, it would work.
>> >>> > >> > >> >>> > > >>> (Basically
>> >>> a
>> >>> > >> > >> >>> > > >>> user
>> >>> > >> > >> >>> > > >>> could
>> >>> > >> > >> >>> > > >>> use
>> >>> > >> > >> >>> > > >>> any
>> >>> > >> > >> >>> > > >>> with it easily but they could also use any
>> >>> > >> > >> >>> > > >>> other
>> >>> > >> > >> >>> > > >>> mechanism).
>> >>> > >> > >> >>> > > >>> At
>> >>> > >> > >> >>> > least in
>> >>> > >> > >> >>> > > >>> java, the any concepts are pretty simple/diy.
>> >>> > >> > >> >>> > > >>> Are
>> >>> other
>> >>> > >> > >> language
>> >>> > >> > >> >>> > > >>> bindings
>> >>> > >> > >> >>> > > >>> less diy?
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData
>> >>> > >> > >> >>> > > >>> +
>> >>> > >> > >> >>> > > >>> metadata
>> >>> > >> > >> >>> > > >>> but
>> >>> > >> > >> >>> > > >>> it
>> >>> > >> > >> >>> > just
>> >>> > >> > >> >>> > > >>> seemed a bit janky.
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>> Thinking about the control message/wrapper
>> >>> > >> > >> >>> > > >>> object
>> >>> > thing,
>> >>> > >> > >> >>> > > >>> I
>> >>> > >> > >> >>> > > >>> wonder
>> >>> > >> > >> >>> > > >>> if
>> >>> > >> > >> >>> > we
>> >>> > >> > >> >>> > > >>> should redefine DoPut and DoGet to have the
>> >>> > >> > >> >>> > > >>> same
>> >>> > property
>> >>> > >> > >> >>> > > >>> if
>> >>> > >> > >> >>> > > >>> we
>> >>> > >> > >> >>> > think it
>> >>> > >> > >> >>> > > >>> is
>> >>> > >> > >> >>> > > >>> a good idea...
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
>> >>> > >> > >> li.davidm96@gmail.com>
>> >>> > >> > >> >>> > wrote:
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>>> I was definitely considering having control
>> >>> > >> > >> >>> > > >>>> messages
>> >>> > >> > without
>> >>> > >> > >> >>> > > >>>> data,
>> >>> > >> > >> >>> > and
>> >>> > >> > >> >>> > > >>>> I thought that could be encoded by a
>> >>> > >> > >> >>> > > >>>> FlightData
>> >>> > >> > >> >>> > > >>>> with
>> >>> > >> > >> >>> > > >>>> only
>> >>> > >> > >> >>> > app_metadata
>> >>> > >> > >> >>> > > >>>> set. I think I understand your position now:
>> >>> > FlightData
>> >>> > >> > >> >>> > > >>>> should
>> >>> > >> > >> >>> > always
>> >>> > >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with
>> >>> > >> > >> >>> > > >>>> the
>> >>> > >> > >> >>> > > >>>> documentation
>> >>> > >> > >> >>> > > >>>> on
>> >>> > >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried
>> >>> > >> > >> >>> > > >>>> about
>> >>> > >> > >> >>> > > >>>> having
>> >>> > >> > >> >>> > > >>>> a
>> >>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents
>> >>> > >> > >> >>> > > >>>> that
>> >>> from
>> >>> > >> > >> >>> > > >>>> happening,
>> >>> > >> > >> >>> > and
>> >>> > >> > >> >>> > > >>>> overall having a clear separation between data
>> >>> > >> > >> >>> > > >>>> and
>> >>> > >> > >> >>> > > >>>> control
>> >>> > >> > >> >>> > > >>>> messages
>> >>> > >> > >> >>> > is
>> >>> > >> > >> >>> > > >>>> cleaner.
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've
>> >>> > >> > >> >>> > > >>>> refrained
>> >>> > >> > >> >>> > > >>>> from
>> >>> > >> > >> >>> > > >>>> exposing
>> >>> > >> > >> >>> > > >>>> Protobuf by using bytes, would we want to
>> >>> > >> > >> >>> > > >>>> change
>> >>> that
>> >>> > >> > >> >>> > > >>>> now?
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> Best,
>> >>> > >> > >> >>> > > >>>> David
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau
>> >>> > >> > >> >>> > > >>>> <ja...@apache.org>
>> >>> > wrote:
>> >>> > >> > >> >>> > > >>>>> Hey David,
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern
>> >>> > >> > >> >>> > > >>>>> we
>> >>> > >> > >> >>> > > >>>>> use
>> >>> > >> > >> >>> > > >>>>> for
>> >>> > >> > >> >>> > > >>>>> doget/doput
>> >>> > >> > >> >>> > > >>>>> for
>> >>> > >> > >> >>> > > >>>>> async. Yes, more thinking java given java
>> >>> > >> > >> >>> > > >>>>> grpc's
>> >>> > async
>> >>> > >> > >> >>> > > >>>>> always
>> >>> > >> > >> >>> > pattern.
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think
>> >>> > >> > >> >>> > > >>>>> it
>> >>> > >> > >> >>> > > >>>>> is
>> >>> > >> > >> >>> > > >>>>> overloading
>> >>> > >> > >> >>> > > >>>>> the
>> >>> > >> > >> >>> > > >>>> message
>> >>> > >> > >> >>> > > >>>>> to use metadata for this. If I want to send a
>> >>> control
>> >>> > >> > >> >>> > > >>>>> message
>> >>> > >> > >> >>> > > >>>> independently
>> >>> > >> > >> >>> > > >>>>> of the data message, I would have to define
>> >>> something
>> >>> > >> > >> >>> > > >>>>> like
>> >>> > >> > >> >>> > > >>>>> an
>> >>> > >> > >> >>> > > >>>>> empty
>> >>> > >> > >> >>> > > >>>> flight
>> >>> > >> > >> >>> > > >>>>> data message that has custom metadata. Why
>> >>> > >> > >> >>> > > >>>>> not
>> >>> > support
>> >>> > >> > >> >>> > > >>>>> a
>> >>> > >> > >> >>> > > >>>>> container
>> >>> > >> > >> >>> > > >>>>> object
>> >>> > >> > >> >>> > > >>>>> with a oneof{FlightData, Any} in it instead
>> >>> > >> > >> >>> > > >>>>> so
>> >>> users
>> >>> > >> > >> >>> > > >>>>> can
>> >>> > >> > >> >>> > > >>>>> add
>> >>> > >> > >> >>> > > >>>>> more
>> >>> > >> > >> >>> > data
>> >>> > >> > >> >>> > > >>>>> as
>> >>> > >> > >> >>> > > >>>>> desired. The default impl could be a noop for
>> >>> > >> > >> >>> > > >>>>> the
>> >>> Any
>> >>> > >> > >> >>> > > >>>>> messages.
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
>> >>> > >> > >> >>> > > >>>>> <li...@gmail.com>
>> >>> > >> > >> >>> > > >>>>> wrote:
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>>> Hi Jacques,
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> Thanks for the comments.
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
>> >>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as
>> >>> > >> > >> >>> > > >>>>>> a
>> >>> result
>> >>> > >> > >> >>> > > >>>>>> of
>> >>> > >> > >> prior
>> >>> > >> > >> >>> > > >>>>>> proposals, so I don't think we need a new
>> >>> > >> > >> >>> > > >>>>>> message
>> >>> to
>> >>> > >> > carry
>> >>> > >> > >> >>> > > >>>>>> that
>> >>> > >> > >> >>> > kind
>> >>> > >> > >> >>> > > >>>>>> of information.
>> >>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler
>> >>> > >> > >> >>> > > >>>>>> to
>> >>> > handle
>> >>> > >> > >> >>> > > >>>>>> incoming
>> >>> > >> > >> >>> > > >>>>>> messages as the fundamental API; it would
>> >>> > >> > >> >>> > > >>>>>> actually
>> >>> > be
>> >>> > >> > >> >>> > > >>>>>> quite
>> >>> > >> > >> >>> > natural
>> >>> > >> > >> >>> > > >>>>>> to
>> >>> > >> > >> >>> > > >>>>>> implement in Flight/Java. I will note that
>> >>> > >> > >> >>> > > >>>>>> it's
>> >>> not
>> >>> > >> > >> >>> > > >>>>>> possible
>> >>> > >> > >> >>> > > >>>>>> in
>> >>> > >> > >> >>> > > >>>>>> C++/Python without spawning a thread,
>> >>> > >> > >> >>> > > >>>>>> though.
>> >>> > >> > >> >>> > > >>>>>> (In
>> >>> > >> > essence,
>> >>> > >> > >> >>> > gRPC-Java
>> >>> > >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is
>> >>> > >> > >> >>> > > >>>>>> sync-always.)
>> >>> There
>> >>> > >> > >> >>> > > >>>>>> are
>> >>> > >> > >> >>> > experimental
>> >>> > >> > >> >>> > > >>>>>> C++ APIs that would let us do something
>> >>> > >> > >> >>> > > >>>>>> similar
>> >>> > >> > >> >>> > > >>>>>> to
>> >>> > >> > >> >>> > > >>>>>> Java,
>> >>> > >> > >> >>> > > >>>>>> but
>> >>> > >> > >> >>> > > >>>>>> those
>> >>> > >> > >> >>> > > >>>>>> are
>> >>> > >> > >> >>> > > >>>>>> only in relatively recent gRPC versions and
>> >>> > >> > >> >>> > > >>>>>> are
>> >>> > still
>> >>> > >> > >> >>> > > >>>>>> under
>> >>> > >> > >> >>> > > >>>>>> development (contrary to the interceptor
>> >>> > >> > >> >>> > > >>>>>> APIs
>> >>> which
>> >>> > >> > >> >>> > > >>>>>> have
>> >>> > >> > >> been
>> >>> > >> > >> >>> > around
>> >>> > >> > >> >>> > > >>>>>> for quite a while).
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> Thanks,
>> >>> > >> > >> >>> > > >>>>>> David
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau
>> >>> > >> > >> >>> > > >>>>>> <ja...@apache.org>
>> >>> > >> > >> >>> > > >>>>>> wrote:
>> >>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc.
>> >>> > >> > >> >>> > > >>>>>>> Might
>> >>> > >> > >> >>> > > >>>>>>> worth
>> >>> > >> > >> >>> > > >>>>>>> discussion
>> >>> > >> > >> >>> > > >>>>>>> here
>> >>> > >> > >> >>> > > >>>>>>> depending on your thoughts.
>> >>> > >> > >> >>> > > >>>>>>>
>> >>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
>> >>> > >> > >> >>> > > >>>>>>> <li...@gmail.com>
>> >>> > >> > >> >>> > > >>>> wrote:
>> >>> > >> > >> >>> > > >>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to
>> >>> provide a
>> >>> > >> > >> >>> > > >>>>>>>> Python
>> >>> > >> > >> >>> > strawman.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it,
>> >>> > >> > >> >>> > > >>>>>>>> you
>> >>> > could
>> >>> > >> > >> >>> > > >>>>>>>> interleave
>> >>> > >> > >> >>> > > >>>> uploads
>> >>> > >> > >> >>> > > >>>>>>>> and downloads if you were so inclined.
>> >>> > >> > >> >>> > > >>>>>>>> Right
>> >>> now,
>> >>> > >> > >> >>> > > >>>>>>>> synchronous
>> >>> > >> > >> >>> > APIs
>> >>> > >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client
>> >>> > >> > >> >>> > > >>>>>>>> and
>> >>> > >> > >> >>> > > >>>>>>>> server
>> >>> > >> > >> >>> > > >>>>>>>> wait
>> >>> > >> > >> >>> > > >>>>>>>> for
>> >>> > >> > >> >>> > each
>> >>> > >> > >> >>> > > >>>>>>>> other due to an application logic bug.
>> >>> > >> > >> >>> > > >>>>>>>> (gRPC
>> >>> > >> > >> >>> > > >>>>>>>> doesn't
>> >>> > >> > >> >>> > > >>>>>>>> give
>> >>> > >> > >> >>> > > >>>>>>>> us
>> >>> > >> > >> >>> > > >>>>>>>> the
>> >>> > >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an
>> >>> overall
>> >>> > >> > >> >>> > > >>>>>>>> timeout.)
>> >>> > >> > >> >>> > > >>>>>>>> As
>> >>> > >> > >> >>> > an
>> >>> > >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see
>> >>> > >> > >> >>> > > >>>>>>>> ARROW-6063:
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> https://issues.apache.org/jira/browse/ARROW-6063
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> This is mostly tangential though,
>> >>> > >> > >> >>> > > >>>>>>>> eventually
>> >>> > >> > >> >>> > > >>>>>>>> we
>> >>> > >> > >> >>> > > >>>>>>>> will
>> >>> > >> > >> >>> > > >>>>>>>> want
>> >>> > >> > >> >>> > > >>>>>>>> to
>> >>> > >> > >> >>> > design
>> >>> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A
>> >>> > >> > bidirectional
>> >>> > >> > >> >>> > > >>>>>>>> stream
>> >>> > >> > >> >>> > > >>>>>>>> like
>> >>> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these
>> >>> > >> > >> >>> > > >>>>>>>> pitfalls
>> >>> > >> > >> >>> > > >>>>>>>> easier
>> >>> > >> > >> >>> > > >>>>>>>> to
>> >>> > >> > >> >>> > > >>>>>>>> run
>> >>> > >> > >> >>> > into.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the
>> >>> > >> > >> >>> > > >>>>>>>> proposal,
>> >>> > >> > but
>> >>> > >> > >> >>> > > >>>>>>>> the
>> >>> > >> > >> >>> > main
>> >>> > >> > >> >>> > > >>>>>>>> concern is that depending on how you
>> >>> > >> > >> >>> > > >>>>>>>> deploy,
>> >>> > >> > >> >>> > > >>>>>>>> two
>> >>> > >> > >> >>> > > >>>>>>>> separate
>> >>> > >> > >> >>> > > >>>>>>>> calls
>> >>> > >> > >> >>> > > >>>>>>>> could
>> >>> > >> > >> >>> > > >>>>>>>> get routed to different instances.
>> >>> > >> > >> >>> > > >>>>>>>> Additionally,
>> >>> > >> > >> >>> > > >>>>>>>> gRPC
>> >>> > >> > >> >>> > > >>>>>>>> has
>> >>> > >> > >> >>> > > >>>>>>>> some
>> >>> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes
>> >>> > >> > >> >>> > > >>>>>>>> away
>> >>> in
>> >>> > >> > >> >>> > > >>>>>>>> between
>> >>> > >> > >> >>> > > >>>>>>>> the
>> >>> > >> > >> >>> > two
>> >>> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is
>> >>> > >> > >> >>> > > >>>>>>>> another
>> >>> > >> > instance
>> >>> > >> > >> >>> > available,
>> >>> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the
>> >>> > >> > >> >>> > > >>>>>>>> new
>> >>> > server
>> >>> > >> > >> without
>> >>> > >> > >> >>> > > >>>>>>>> warning.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Thanks,
>> >>> > >> > >> >>> > > >>>>>>>> David
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray
>> >>> > >> > >> >>> > > >>>>>>>> <ry...@dremio.com>
>> >>> > wrote:
>> >>> > >> > >> >>> > > >>>>>>>>> Hey David,
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of
>> >>> > >> > >> >>> > > >>>>>>>>> sense.
>> >>> > >> > >> >>> > > >>>>>>>>> I
>> >>> > like
>> >>> > >> > >> >>> > > >>>>>>>>> it
>> >>> > >> > >> >>> > > >>>>>>>>> and
>> >>> > >> > >> >>> > > >>>>>>>>> the
>> >>> > >> > >> >>> > > >>>>>>>>> possibility
>> >>> > >> > >> >>> > > >>>>>>>>> of remote compute via arrow buffers. One
>> >>> > >> > >> >>> > > >>>>>>>>> thing
>> >>> > >> > >> >>> > > >>>>>>>>> that
>> >>> > >> > >> >>> > > >>>>>>>>> would
>> >>> > >> > >> >>> > > >>>>>>>>> help
>> >>> > >> > >> >>> > me
>> >>> > >> > >> >>> > > >>>>>> would
>> >>> > >> > >> >>> > > >>>>>>>> be
>> >>> > >> > >> >>> > > >>>>>>>>> a concrete example of the API in a real
>> >>> > >> > >> >>> > > >>>>>>>>> life
>> >>> use
>> >>> > >> > >> >>> > > >>>>>>>>> case.
>> >>> > >> > >> >>> > > >>>>>>>>> Also,
>> >>> > >> > >> >>> > what
>> >>> > >> > >> >>> > > >>>>>> would
>> >>> > >> > >> >>> > > >>>>>>>> the
>> >>> > >> > >> >>> > > >>>>>>>>> client experience be in terms of sync vs
>> >>> > >> > >> >>> > > >>>>>>>>> asyc?
>> >>> > >> > >> >>> > > >>>>>>>>> Would
>> >>> > >> > >> >>> > > >>>>>>>>> the
>> >>> > >> > >> >>> > > >>>>>>>>> client
>> >>> > >> > >> >>> > > >>>>>>>>> block
>> >>> > >> > >> >>> > > >>>>>>>> till
>> >>> > >> > >> >>> > > >>>>>>>>> the bidirectional call return ie c =
>> >>> > >> > >> flight.vector_mult(a,
>> >>> > >> > >> >>> > > >>>>>>>>> b)
>> >>> > >> > >> >>> > or
>> >>> > >> > >> >>> > > >>>>>>>>> would
>> >>> > >> > >> >>> > > >>>>>>>> the
>> >>> > >> > >> >>> > > >>>>>>>>> client wait to be signaled that
>> >>> > >> > >> >>> > > >>>>>>>>> computation
>> >>> > >> > >> >>> > > >>>>>>>>> was
>> >>> > >> > >> >>> > > >>>>>>>>> done.
>> >>> > >> > >> >>> > > >>>>>>>>> If
>> >>> > >> > >> >>> > > >>>>>>>>> the
>> >>> > >> > >> >>> > > >>>>>>>>> later
>> >>> > >> > >> >>> > > >>>>>>>>> how
>> >>> > >> > >> >>> > > >>>>>>>>> is
>> >>> > >> > >> >>> > > >>>>>>>>> that different from a DoPut then DoGet? I
>> >>> suppose
>> >>> > >> > >> >>> > > >>>>>>>>> that
>> >>> > >> > >> >>> > > >>>>>>>>> this
>> >>> > >> > >> >>> > could
>> >>> > >> > >> >>> > > >>>> be
>> >>> > >> > >> >>> > > >>>>>>>>> implemented without extending the RPC
>> >>> > >> > >> >>> > > >>>>>>>>> interface
>> >>> > >> > >> >>> > > >>>>>>>>> but
>> >>> > >> > >> rather
>> >>> > >> > >> >>> > > >>>>>>>>> by a
>> >>> > >> > >> >>> > > >>>>>>>>> function/util?
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> Best,
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> Ryan
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li
>> >>> > >> > >> >>> > > >>>>>>>>> <
>> >>> > >> > >> >>> > li.davidm96@gmail.com>
>> >>> > >> > >> >>> > > >>>>>> wrote:
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> Hi all,
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite
>> >>> > >> > >> >>> > > >>>>>>>>>> successfully
>> >>> > >> > >> >>> > > >>>>>>>>>> so
>> >>> > >> > >> >>> > > >>>>>>>>>> far,
>> >>> > >> > but
>> >>> > >> > >> we
>> >>> > >> > >> >>> > > >>>>>>>>>> have
>> >>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the
>> >>> > >> > >> >>> > > >>>>>>>>>> horizon:
>> >>> being
>> >>> > >> > >> >>> > > >>>>>>>>>> able
>> >>> > >> > >> >>> > > >>>>>>>>>> to
>> >>> > >> > >> >>> > > >>>>>>>>>> both
>> >>> > >> > >> >>> > > >>>>>>>>>> send
>> >>> > >> > >> >>> > > >>>>>>>>>> and
>> >>> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC
>> >>> > >> > >> >>> > > >>>>>>>>>> call.
>> >>> To
>> >>> > >> > >> >>> > > >>>>>>>>>> that
>> >>> > >> > >> >>> > > >>>>>>>>>> end,
>> >>> > >> > >> >>> > I've
>> >>> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC
>> >>> > >> > >> >>> > > >>>>>>>>>> method:
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> >
>> >>> > >> > >>
>> >>> > >> >
>> >>> >
>> >>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or
>> >>> comment
>> >>> > >> > >> >>> > > >>>>>>>>>> on
>> >>> > >> > the
>> >>> > >> > >> >>> > document.
>> >>> > >> > >> >>> > > >>>>>>>>>> I'd
>> >>> > >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is
>> >>> > >> > >> >>> > > >>>>>>>>>> a
>> >>> > >> > >> >>> > > >>>>>>>>>> relatively
>> >>> > >> > >> >>> > > >>>>>>>>>> straightforward
>> >>> > >> > >> >>> > > >>>>>>>>>> addition - it is essentially
>> >>> > >> > >> >>> > > >>>>>>>>>> "DoPutThenGet".
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require
>> >>> > >> > >> >>> > > >>>>>>>>>> a
>> >>> > vote.
>> >>> > >> > I've
>> >>> > >> > >> >>> > > >>>>>>>>>> decided
>> >>> > >> > >> >>> > > >>>>>>>>>> to
>> >>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had
>> >>> > >> > >> >>> > > >>>>>>>>>> proposed
>> >>> (on
>> >>> > >> > >> >>> > > >>>>>>>>>> DoPut),
>> >>> > >> > >> >>> > > >>>>>>>>>> as
>> >>> > >> > >> >>> > > >>>>>>>>>> it
>> >>> > >> > >> >>> > > >>>>>> doesn't
>> >>> > >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the
>> >>> > >> > >> >>> > > >>>>>>>>>> interpretation
>> >>> > >> > of
>> >>> > >> > >> >>> > > >>>>>>>>>> the
>> >>> > >> > >> >>> > > >>>>>>>>>> semantics.
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> Thanks,
>> >>> > >> > >> >>> > > >>>>>>>>>> David
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> --
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting
>> >>> > >> > >> >>> > > >>>>>>>>> Engineer
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/>
>> >>> > >> > >> >>> > > >>>>>>>>> Check out our GitHub
>> >>> > >> > >> >>> > > >>>>>>>>> <https://www.github.com/dremio>,
>> >>> > >> > >> join
>> >>> > >> > >> >>> > > >>>>>>>>> our
>> >>> > >> > >> >>> > > >>>>>>>>> community
>> >>> > >> > >> >>> > > >>>>>>>>> site <https://community.dremio.com/> &
>> >>> Download
>> >>> > >> > Dremio
>> >>> > >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/download>
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>
>> >>> > >> > >> >
>> >>> > >> > >>
>> >>> > >> > >
>> >>> > >> >
>> >>> > >
>> >>> >
>> >>>
>> >>
>> >
>


Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

Posted by Wes McKinney <we...@gmail.com>.
hi David,

I did a preliminary view and things look to be on the right track
there. What do you think about breaking out the protocol changes (and
adding appropriate comments) so we can have a vote on that in
relatively short order?

- Wes

On Wed, Mar 18, 2020 at 9:06 AM David Li <li...@gmail.com> wrote:
>
> Following up here, I've submitted a draft implementation for C++:
> https://github.com/apache/arrow/pull/6656
>
> The core functionality is there, but there are still holes that I need
> to implement. Compared to the draft spec, the client also sends a
> FlightDescriptor to begin with, though it's currently not exposed.
> This provides consistency with DoGet/DoPut which also send a message
> to begin with to describe the stream to the server.
>
> Andy, I hope this helps clarify whether it meets your needs.
>
> Best,
> David
>
> On 2/25/20, David Li <li...@gmail.com> wrote:
> > Hey Andy,
> >
> > I've been rather busy unfortunately. I had started on an
> > implementation in C++ to provide as part of this discussion, but it's
> > not complete. I'm hoping to have more done in March.
> >
> > Best,
> > David
> >
> > On 2/25/20, Andy Grove <an...@gmail.com> wrote:
> >> I was wondering if there had been any momentum on this (the BiDirectional
> >> RPC design)?
> >>
> >> I'm interested in this for the use case of Apache Spark sending a stream
> >> of
> >> data to another process to invoke custom code and then receive a stream
> >> back with the transformed data.
> >>
> >> Thanks,
> >>
> >> Andy.
> >>
> >>
> >>
> >> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <ja...@apache.org>
> >> wrote:
> >>
> >>> I support moving forward with the current proposal.
> >>>
> >>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li...@gmail.com> wrote:
> >>>
> >>> > Just following up here again, any other thoughts?
> >>> >
> >>> > I think we do have justifications for potentially separate streams in
> >>> > a call, but that's more of an orthogonal question - it doesn't need to
> >>> > be addressed here. I do agree that it very much complicates things.
> >>> >
> >>> > Thanks,
> >>> > David
> >>> >
> >>> > On 11/29/19, Wes McKinney <we...@gmail.com> wrote:
> >>> > > I would generally agree with this. Note that you have the
> >>> > > possibility
> >>> > > to use unions-of-structs to send record batches with different
> >>> > > schemas
> >>> > > in the same stream, though with some added complexity on each side
> >>> > >
> >>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <ja...@apache.org>
> >>> > wrote:
> >>> > >>
> >>> > >> I'd vote for explicitly not supported. We should keep our
> >>> > >> primitives
> >>> > >> narrow.
> >>> > >>
> >>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li...@gmail.com>
> >>> > >> wrote:
> >>> > >>
> >>> > >> > Thanks for the feedback.
> >>> > >> >
> >>> > >> > I do think if we had explicitly embraced gRPC from the beginning,
> >>> > >> > there are a lot of places where things could be made more
> >>> > >> > ergonomic,
> >>> > >> > including with the metadata fields. But it would also have locked
> >>> out
> >>> > >> > us of potential future transports.
> >>> > >> >
> >>> > >> > On another note: I hesitate to put too much into this method, but
> >>> > >> > we
> >>> > >> > are looking at use cases where potentially, a client may want to
> >>> > >> > upload multiple distinct datasets (with differing schemas). (This
> >>> is a
> >>> > >> > little tentative, and I can get more details...) Right now, each
> >>> > >> > logical stream in Flight must have a single, consistent schema;
> >>> would
> >>> > >> > it make sense to look at ways to relax this, or declare this
> >>> > >> > explicitly out of scope (and require multiple calls and
> >>> > >> > coordination
> >>> > >> > with the deployment topology) in order to accomplish this?
> >>> > >> >
> >>> > >> > Best,
> >>> > >> > David
> >>> > >> >
> >>> > >> > On 11/27/19, Jacques Nadeau <ja...@apache.org> wrote:
> >>> > >> > > Fair enough. I'm okay with the bytes approach and the proposal
> >>> looks
> >>> > >> > > good
> >>> > >> > > to me.
> >>> > >> > >
> >>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li
> >>> > >> > > <li...@gmail.com>
> >>> > >> > > wrote:
> >>> > >> > >
> >>> > >> > >> I've updated the proposal.
> >>> > >> > >>
> >>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> >>> > >> > >> errors/metadata, I still think using bytes is preferable:
> >>> > >> > >> - It doesn't require (conditionally) exposing or wrapping
> >>> Protobuf
> >>> > >> > types,
> >>> > >> > >> - We wouldn't be able to practically expose the Protobuf field
> >>> > >> > >> to
> >>> > >> > >> C++
> >>> > >> > >> users without causing build pains,
> >>> > >> > >> - We can't let Python users take advantage of the Protobuf
> >>> > >> > >> field
> >>> > >> > >> without somehow being compatible with the Protobuf wheels (by
> >>> > >> > >> linking
> >>> > >> > >> to the same version, and doing magic to turn the C++ Protobufs
> >>> into
> >>> > >> > >> the Python ones),
> >>> > >> > >> - All our other application-defined fields are already bytes.
> >>> > >> > >>
> >>> > >> > >> Applications that want structure can encode JSON or Protobuf
> >>> > >> > >> Any
> >>> > >> > >> into
> >>> > >> > >> the bytes field themselves, much as you can already do for
> >>> Ticket,
> >>> > >> > >> commands in FlightDescriptors, and application metadata in
> >>> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient than
> >>> using
> >>> > >> > >> Any directly, since Any itself is a bytes field with a tag,
> >>> > >> > >> and
> >>> > must
> >>> > >> > >> invoke the Protobuf deserializer again to read the actual
> >>> message.
> >>> > >> > >>
> >>> > >> > >> If we decide on using bytes, then I don't think it makes sense
> >>> > >> > >> to
> >>> > >> > >> define a new message with a oneof either, since it would be
> >>> > >> > >> redundant.
> >>> > >> > >>
> >>> > >> > >> Thanks,
> >>> > >> > >> David
> >>> > >> > >>
> >>> > >> > >> On 11/7/19, David Li <li...@gmail.com> wrote:
> >>> > >> > >> > I've been extremely backlogged, I will update the proposal
> >>> when I
> >>> > >> > >> > get
> >>> > >> > >> > a chance and reply here when done.
> >>> > >> > >> >
> >>> > >> > >> > Best,
> >>> > >> > >> > David
> >>> > >> > >> >
> >>> > >> > >> > On 11/7/19, Wes McKinney <we...@gmail.com> wrote:
> >>> > >> > >> >> Bumping this discussion since a couple of weeks have
> >>> > >> > >> >> passed.
> >>> It
> >>> > >> > >> >> seems
> >>> > >> > >> >> there are still some questions here, could we summarize
> >>> > >> > >> >> what
> >>> are
> >>> > >> > >> >> the
> >>> > >> > >> >> alternatives along with any public API implications so we
> >>> > >> > >> >> can
> >>> > try
> >>> > >> > >> >> to
> >>> > >> > >> >> render a decision?
> >>> > >> > >> >>
> >>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <
> >>> li.davidm96@gmail.com
> >>> > >
> >>> > >> > >> >> wrote:
> >>> > >> > >> >>>
> >>> > >> > >> >>> Hi Wes,
> >>> > >> > >> >>>
> >>> > >> > >> >>> Responses inline:
> >>> > >> > >> >>>
> >>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <
> >>> wesmckinn@gmail.com>
> >>> > >> > wrote:
> >>> > >> > >> >>>
> >>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li
> >>> > >> > >> >>> > <li...@gmail.com>
> >>> > >> > >> >>> > wrote:
> >>> > >> > >> >>> > >
> >>> > >> > >> >>> > > The question is whether to repurpose the existing
> >>> > FlightData
> >>> > >> > >> >>> > > structure, and allow for the metadata field to be
> >>> > >> > >> >>> > > filled
> >>> in
> >>> > >> > >> >>> > > and
> >>> > >> > >> data
> >>> > >> > >> >>> > > fields to be blank (as a control message), or to wrap
> >>> > >> > >> >>> > > the
> >>> > >> > >> FlightData
> >>> > >> > >> >>> > > structure in another structure that explicitly
> >>> > distinguishes
> >>> > >> > >> between
> >>> > >> > >> >>> > > control and data messages.
> >>> > >> > >> >>> >
> >>> > >> > >> >>> > I'm not super against having metadata-only FlightData
> >>> > >> > >> >>> > with
> >>> > >> > >> >>> > empty
> >>> > >> > >> body.
> >>> > >> > >> >>> > One question to consider is what changes (if any) would
> >>> need
> >>> > to
> >>> > >> > >> >>> > be
> >>> > >> > >> >>> > made to public APIs in either scenario.
> >>> > >> > >> >>> >
> >>> > >> > >> >>>
> >>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow empty
> >>> data
> >>> > >> > >> >>> messages
> >>> > >> > >> >>> in
> >>> > >> > >> >>> the future. This would be a breaking change, but wouldn't
> >>> > change
> >>> > >> > >> >>> the
> >>> > >> > >> >>> wire
> >>> > >> > >> >>> format. I think the APIs could be changed backwards
> >>> compatibly,
> >>> > >> > >> >>> though.
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>> > > The other question is how to handle the metadata
> >>> > >> > >> >>> > > fields.
> >>> So
> >>> > >> > >> >>> > > far,
> >>> > >> > >> >>> > > we've
> >>> > >> > >> >>> > > used bytestring fields for application-defined data.
> >>> > >> > >> >>> > > This
> >>> > is
> >>> > >> > >> >>> > > workable
> >>> > >> > >> >>> > > if you want to use Protobuf to define the contents of
> >>> those
> >>> > >> > >> >>> > > fields,
> >>> > >> > >> >>> > > but requires you to pack/unpack your Protobuf
> >>> > >> > >> >>> > > into/from
> >>> the
> >>> > >> > >> >>> > > bytestring
> >>> > >> > >> >>> > > field. If we instead used the Protobuf Any field, a
> >>> > >> > >> >>> > > dynamically
> >>> > >> > >> >>> > > typed
> >>> > >> > >> >>> > > field, this would be more convenient, but then we'd be
> >>> > >> > >> >>> > > exposing
> >>> > >> > >> >>> > > Protobuf types. We could alternatively use a
> >>> > >> > >> >>> > > combination
> >>> of
> >>> > >> > >> >>> > > a
> >>> > >> > >> >>> > > type
> >>> > >> > >> >>> > > field and a bytestring field, mimicking what the
> >>> > >> > >> >>> > > Protobuf
> >>> > >> > >> >>> > > Any
> >>> > >> > >> >>> > > type
> >>> > >> > >> >>> > > looks like on the wire. I'm not sure this is actually
> >>> > cleaner
> >>> > >> > >> >>> > > in
> >>> > >> > >> any
> >>> > >> > >> >>> > > of the language APIs, though.
> >>> > >> > >> >>> >
> >>> > >> > >> >>> > Leaving the deserialization of the app metadata to the
> >>> > >> > >> >>> > particular
> >>> > >> > >> >>> > Flight implementation seems on first principles like the
> >>> most
> >>> > >> > >> flexible
> >>> > >> > >> >>> > thing, if Any is used, does that mean the metadata
> >>> > >> > >> >>> > _must_
> >>> be
> >>> > a
> >>> > >> > >> >>> > protobuf?
> >>> > >> > >> >>> >
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>> If Any is used, we could still expose a bytes-based API,
> >>> > >> > >> >>> but
> >>> it
> >>> > >> > would
> >>> > >> > >> >>> have
> >>> > >> > >> >>> some more wrapping. (We could put a ByteString in Any.)
> >>> > >> > >> >>> Then
> >>> > the
> >>> > >> > >> >>> question
> >>> > >> > >> >>> would just be how to expose this (would be easier in Java,
> >>> > harder
> >>> > >> > >> >>> in
> >>> > >> > >> >>> C++).
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>> > > David
> >>> > >> > >> >>> > >
> >>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <an...@python.org>
> >>> > >> > >> >>> > > wrote:
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > > Can one of you explain what is being proposed in
> >>> > >> > >> >>> > > > non-protobuf
> >>> > >> > >> >>> > > > terms?
> >>> > >> > >> >>> > > > Knowledge of protobuf shouldn't be required to use
> >>> > Flight.
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > > Regards
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > > Antoine.
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
> >>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding; it
> >>> > would
> >>> > >> > just
> >>> > >> > >> be
> >>> > >> > >> >>> > > >> application-level logic. (The official guide
> >>> > >> > >> >>> > > >> doesn't
> >>> > even
> >>> > >> > >> mention
> >>> > >> > >> >>> > > >> it
> >>> > >> > >> >>> > > >> in the encoding docs; I found
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> >
> >>> > >> > >>
> >>> > >> >
> >>> >
> >>> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >>> > >> > >> >>> > > >> as well.)
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing
> >>> > >> > >> >>> > > >> essentially
> >>> > >> > >> >>> > > >> inlining
> >>> > >> > >> >>> > > >> the definition of Any, e.g.
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> message FlightMessage {
> >>> > >> > >> >>> > > >>   oneof message {
> >>> > >> > >> >>> > > >>     FlightData data = 1;
> >>> > >> > >> >>> > > >>     FlightAny metadata = 2;
> >>> > >> > >> >>> > > >>   }
> >>> > >> > >> >>> > > >> }
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> message FlightAny {
> >>> > >> > >> >>> > > >>   string type = 1;
> >>> > >> > >> >>> > > >>   bytes data = 2;
> >>> > >> > >> >>> > > >> }
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> Is this correct?
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> It might be nice to consider the wrapper message
> >>> > >> > >> >>> > > >> for
> >>> > >> > >> >>> > > >> DoGet/DoPut
> >>> > >> > >> >>> > > >> as
> >>> > >> > >> >>> > > >> well, but at that point, I'd rather we be
> >>> > >> > >> >>> > > >> consistent
> >>> > with
> >>> > >> > >> >>> > > >> all
> >>> > >> > >> >>> > > >> of
> >>> > >> > >> >>> > > >> them,
> >>> > >> > >> >>> > > >> rather than have one of the three methods do its
> >>> > >> > >> >>> > > >> own
> >>> > >> > >> >>> > > >> thing.
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> Thanks,
> >>> > >> > >> >>> > > >> David
> >>> > >> > >> >>> > > >>
> >>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <ja...@apache.org>
> >>> wrote:
> >>> > >> > >> >>> > > >>> I think we could probably expose the oneof
> >>> > >> > >> >>> > > >>> behavior
> >>> > >> > >> >>> > > >>> without
> >>> > >> > >> >>> > > >>> exposing
> >>> > >> > >> >>> > the
> >>> > >> > >> >>> > > >>> protobuf functions. On the any... hmm. I guess we
> >>> could
> >>> > >> > >> >>> > > >>> expose
> >>> > >> > >> >>> > > >>> as
> >>> > >> > >> >>> > > >>> two
> >>> > >> > >> >>> > > >>> fields: type and data. Then users could use it for
> >>> > >> > >> >>> > > >>> whatever
> >>> > >> > >> >>> > > >>> but
> >>> > >> > >> >>> > > >>> if
> >>> > >> > >> >>> > > >>> people
> >>> > >> > >> >>> > > >>> wanted to treat it as any, it would work.
> >>> > >> > >> >>> > > >>> (Basically
> >>> a
> >>> > >> > >> >>> > > >>> user
> >>> > >> > >> >>> > > >>> could
> >>> > >> > >> >>> > > >>> use
> >>> > >> > >> >>> > > >>> any
> >>> > >> > >> >>> > > >>> with it easily but they could also use any other
> >>> > >> > >> >>> > > >>> mechanism).
> >>> > >> > >> >>> > > >>> At
> >>> > >> > >> >>> > least in
> >>> > >> > >> >>> > > >>> java, the any concepts are pretty simple/diy. Are
> >>> other
> >>> > >> > >> language
> >>> > >> > >> >>> > > >>> bindings
> >>> > >> > >> >>> > > >>> less diy?
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData +
> >>> > >> > >> >>> > > >>> metadata
> >>> > >> > >> >>> > > >>> but
> >>> > >> > >> >>> > > >>> it
> >>> > >> > >> >>> > just
> >>> > >> > >> >>> > > >>> seemed a bit janky.
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >>> Thinking about the control message/wrapper object
> >>> > thing,
> >>> > >> > >> >>> > > >>> I
> >>> > >> > >> >>> > > >>> wonder
> >>> > >> > >> >>> > > >>> if
> >>> > >> > >> >>> > we
> >>> > >> > >> >>> > > >>> should redefine DoPut and DoGet to have the same
> >>> > property
> >>> > >> > >> >>> > > >>> if
> >>> > >> > >> >>> > > >>> we
> >>> > >> > >> >>> > think it
> >>> > >> > >> >>> > > >>> is
> >>> > >> > >> >>> > > >>> a good idea...
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
> >>> > >> > >> li.davidm96@gmail.com>
> >>> > >> > >> >>> > wrote:
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >>>> I was definitely considering having control
> >>> > >> > >> >>> > > >>>> messages
> >>> > >> > without
> >>> > >> > >> >>> > > >>>> data,
> >>> > >> > >> >>> > and
> >>> > >> > >> >>> > > >>>> I thought that could be encoded by a FlightData
> >>> > >> > >> >>> > > >>>> with
> >>> > >> > >> >>> > > >>>> only
> >>> > >> > >> >>> > app_metadata
> >>> > >> > >> >>> > > >>>> set. I think I understand your position now:
> >>> > FlightData
> >>> > >> > >> >>> > > >>>> should
> >>> > >> > >> >>> > always
> >>> > >> > >> >>> > > >>>> carry (some) data (with optional metadata)?
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with
> >>> > >> > >> >>> > > >>>> the
> >>> > >> > >> >>> > > >>>> documentation
> >>> > >> > >> >>> > > >>>> on
> >>> > >> > >> >>> > > >>>> FlightData in the Protobuf file. I was worried
> >>> > >> > >> >>> > > >>>> about
> >>> > >> > >> >>> > > >>>> having
> >>> > >> > >> >>> > > >>>> a
> >>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents that
> >>> from
> >>> > >> > >> >>> > > >>>> happening,
> >>> > >> > >> >>> > and
> >>> > >> > >> >>> > > >>>> overall having a clear separation between data
> >>> > >> > >> >>> > > >>>> and
> >>> > >> > >> >>> > > >>>> control
> >>> > >> > >> >>> > > >>>> messages
> >>> > >> > >> >>> > is
> >>> > >> > >> >>> > > >>>> cleaner.
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've
> >>> > >> > >> >>> > > >>>> refrained
> >>> > >> > >> >>> > > >>>> from
> >>> > >> > >> >>> > > >>>> exposing
> >>> > >> > >> >>> > > >>>> Protobuf by using bytes, would we want to change
> >>> that
> >>> > >> > >> >>> > > >>>> now?
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>> Best,
> >>> > >> > >> >>> > > >>>> David
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau <ja...@apache.org>
> >>> > wrote:
> >>> > >> > >> >>> > > >>>>> Hey David,
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern we
> >>> > >> > >> >>> > > >>>>> use
> >>> > >> > >> >>> > > >>>>> for
> >>> > >> > >> >>> > > >>>>> doget/doput
> >>> > >> > >> >>> > > >>>>> for
> >>> > >> > >> >>> > > >>>>> async. Yes, more thinking java given java grpc's
> >>> > async
> >>> > >> > >> >>> > > >>>>> always
> >>> > >> > >> >>> > pattern.
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think it
> >>> > >> > >> >>> > > >>>>> is
> >>> > >> > >> >>> > > >>>>> overloading
> >>> > >> > >> >>> > > >>>>> the
> >>> > >> > >> >>> > > >>>> message
> >>> > >> > >> >>> > > >>>>> to use metadata for this. If I want to send a
> >>> control
> >>> > >> > >> >>> > > >>>>> message
> >>> > >> > >> >>> > > >>>> independently
> >>> > >> > >> >>> > > >>>>> of the data message, I would have to define
> >>> something
> >>> > >> > >> >>> > > >>>>> like
> >>> > >> > >> >>> > > >>>>> an
> >>> > >> > >> >>> > > >>>>> empty
> >>> > >> > >> >>> > > >>>> flight
> >>> > >> > >> >>> > > >>>>> data message that has custom metadata. Why not
> >>> > support
> >>> > >> > >> >>> > > >>>>> a
> >>> > >> > >> >>> > > >>>>> container
> >>> > >> > >> >>> > > >>>>> object
> >>> > >> > >> >>> > > >>>>> with a oneof{FlightData, Any} in it instead so
> >>> users
> >>> > >> > >> >>> > > >>>>> can
> >>> > >> > >> >>> > > >>>>> add
> >>> > >> > >> >>> > > >>>>> more
> >>> > >> > >> >>> > data
> >>> > >> > >> >>> > > >>>>> as
> >>> > >> > >> >>> > > >>>>> desired. The default impl could be a noop for
> >>> > >> > >> >>> > > >>>>> the
> >>> Any
> >>> > >> > >> >>> > > >>>>> messages.
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
> >>> > >> > >> >>> > > >>>>> <li...@gmail.com>
> >>> > >> > >> >>> > > >>>>> wrote:
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>>> Hi Jacques,
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>> Thanks for the comments.
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as a
> >>> result
> >>> > >> > >> >>> > > >>>>>> of
> >>> > >> > >> prior
> >>> > >> > >> >>> > > >>>>>> proposals, so I don't think we need a new
> >>> > >> > >> >>> > > >>>>>> message
> >>> to
> >>> > >> > carry
> >>> > >> > >> >>> > > >>>>>> that
> >>> > >> > >> >>> > kind
> >>> > >> > >> >>> > > >>>>>> of information.
> >>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler to
> >>> > handle
> >>> > >> > >> >>> > > >>>>>> incoming
> >>> > >> > >> >>> > > >>>>>> messages as the fundamental API; it would
> >>> > >> > >> >>> > > >>>>>> actually
> >>> > be
> >>> > >> > >> >>> > > >>>>>> quite
> >>> > >> > >> >>> > natural
> >>> > >> > >> >>> > > >>>>>> to
> >>> > >> > >> >>> > > >>>>>> implement in Flight/Java. I will note that it's
> >>> not
> >>> > >> > >> >>> > > >>>>>> possible
> >>> > >> > >> >>> > > >>>>>> in
> >>> > >> > >> >>> > > >>>>>> C++/Python without spawning a thread, though.
> >>> > >> > >> >>> > > >>>>>> (In
> >>> > >> > essence,
> >>> > >> > >> >>> > gRPC-Java
> >>> > >> > >> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.)
> >>> There
> >>> > >> > >> >>> > > >>>>>> are
> >>> > >> > >> >>> > experimental
> >>> > >> > >> >>> > > >>>>>> C++ APIs that would let us do something similar
> >>> > >> > >> >>> > > >>>>>> to
> >>> > >> > >> >>> > > >>>>>> Java,
> >>> > >> > >> >>> > > >>>>>> but
> >>> > >> > >> >>> > > >>>>>> those
> >>> > >> > >> >>> > > >>>>>> are
> >>> > >> > >> >>> > > >>>>>> only in relatively recent gRPC versions and are
> >>> > still
> >>> > >> > >> >>> > > >>>>>> under
> >>> > >> > >> >>> > > >>>>>> development (contrary to the interceptor APIs
> >>> which
> >>> > >> > >> >>> > > >>>>>> have
> >>> > >> > >> been
> >>> > >> > >> >>> > around
> >>> > >> > >> >>> > > >>>>>> for quite a while).
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>> Thanks,
> >>> > >> > >> >>> > > >>>>>> David
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau
> >>> > >> > >> >>> > > >>>>>> <ja...@apache.org>
> >>> > >> > >> >>> > > >>>>>> wrote:
> >>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc.
> >>> > >> > >> >>> > > >>>>>>> Might
> >>> > >> > >> >>> > > >>>>>>> worth
> >>> > >> > >> >>> > > >>>>>>> discussion
> >>> > >> > >> >>> > > >>>>>>> here
> >>> > >> > >> >>> > > >>>>>>> depending on your thoughts.
> >>> > >> > >> >>> > > >>>>>>>
> >>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
> >>> > >> > >> >>> > > >>>>>>> <li...@gmail.com>
> >>> > >> > >> >>> > > >>>> wrote:
> >>> > >> > >> >>> > > >>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to
> >>> provide a
> >>> > >> > >> >>> > > >>>>>>>> Python
> >>> > >> > >> >>> > strawman.
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you
> >>> > could
> >>> > >> > >> >>> > > >>>>>>>> interleave
> >>> > >> > >> >>> > > >>>> uploads
> >>> > >> > >> >>> > > >>>>>>>> and downloads if you were so inclined. Right
> >>> now,
> >>> > >> > >> >>> > > >>>>>>>> synchronous
> >>> > >> > >> >>> > APIs
> >>> > >> > >> >>> > > >>>>>>>> make this error-prone, e.g. if both client
> >>> > >> > >> >>> > > >>>>>>>> and
> >>> > >> > >> >>> > > >>>>>>>> server
> >>> > >> > >> >>> > > >>>>>>>> wait
> >>> > >> > >> >>> > > >>>>>>>> for
> >>> > >> > >> >>> > each
> >>> > >> > >> >>> > > >>>>>>>> other due to an application logic bug. (gRPC
> >>> > >> > >> >>> > > >>>>>>>> doesn't
> >>> > >> > >> >>> > > >>>>>>>> give
> >>> > >> > >> >>> > > >>>>>>>> us
> >>> > >> > >> >>> > > >>>>>>>> the
> >>> > >> > >> >>> > > >>>>>>>> ability to have per-read timeouts, only an
> >>> overall
> >>> > >> > >> >>> > > >>>>>>>> timeout.)
> >>> > >> > >> >>> > > >>>>>>>> As
> >>> > >> > >> >>> > an
> >>> > >> > >> >>> > > >>>>>>>> example of this happening with DoPut, see
> >>> > >> > >> >>> > > >>>>>>>> ARROW-6063:
> >>> > >> > >> >>> > > >>>>>>>>
> >>> https://issues.apache.org/jira/browse/ARROW-6063
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> This is mostly tangential though, eventually
> >>> > >> > >> >>> > > >>>>>>>> we
> >>> > >> > >> >>> > > >>>>>>>> will
> >>> > >> > >> >>> > > >>>>>>>> want
> >>> > >> > >> >>> > > >>>>>>>> to
> >>> > >> > >> >>> > design
> >>> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A
> >>> > >> > bidirectional
> >>> > >> > >> >>> > > >>>>>>>> stream
> >>> > >> > >> >>> > > >>>>>>>> like
> >>> > >> > >> >>> > > >>>>>>>> this (and like DoPut) just makes these
> >>> > >> > >> >>> > > >>>>>>>> pitfalls
> >>> > >> > >> >>> > > >>>>>>>> easier
> >>> > >> > >> >>> > > >>>>>>>> to
> >>> > >> > >> >>> > > >>>>>>>> run
> >>> > >> > >> >>> > into.
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the
> >>> > >> > >> >>> > > >>>>>>>> proposal,
> >>> > >> > but
> >>> > >> > >> >>> > > >>>>>>>> the
> >>> > >> > >> >>> > main
> >>> > >> > >> >>> > > >>>>>>>> concern is that depending on how you deploy,
> >>> > >> > >> >>> > > >>>>>>>> two
> >>> > >> > >> >>> > > >>>>>>>> separate
> >>> > >> > >> >>> > > >>>>>>>> calls
> >>> > >> > >> >>> > > >>>>>>>> could
> >>> > >> > >> >>> > > >>>>>>>> get routed to different instances.
> >>> > >> > >> >>> > > >>>>>>>> Additionally,
> >>> > >> > >> >>> > > >>>>>>>> gRPC
> >>> > >> > >> >>> > > >>>>>>>> has
> >>> > >> > >> >>> > > >>>>>>>> some
> >>> > >> > >> >>> > > >>>>>>>> reconnection behaviors; if the server goes
> >>> > >> > >> >>> > > >>>>>>>> away
> >>> in
> >>> > >> > >> >>> > > >>>>>>>> between
> >>> > >> > >> >>> > > >>>>>>>> the
> >>> > >> > >> >>> > two
> >>> > >> > >> >>> > > >>>>>>>> calls, but it then restarts or there is
> >>> > >> > >> >>> > > >>>>>>>> another
> >>> > >> > instance
> >>> > >> > >> >>> > available,
> >>> > >> > >> >>> > > >>>>>>>> the client will happily reconnect to the new
> >>> > server
> >>> > >> > >> without
> >>> > >> > >> >>> > > >>>>>>>> warning.
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> Thanks,
> >>> > >> > >> >>> > > >>>>>>>> David
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <ry...@dremio.com>
> >>> > wrote:
> >>> > >> > >> >>> > > >>>>>>>>> Hey David,
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of sense.
> >>> > >> > >> >>> > > >>>>>>>>> I
> >>> > like
> >>> > >> > >> >>> > > >>>>>>>>> it
> >>> > >> > >> >>> > > >>>>>>>>> and
> >>> > >> > >> >>> > > >>>>>>>>> the
> >>> > >> > >> >>> > > >>>>>>>>> possibility
> >>> > >> > >> >>> > > >>>>>>>>> of remote compute via arrow buffers. One
> >>> > >> > >> >>> > > >>>>>>>>> thing
> >>> > >> > >> >>> > > >>>>>>>>> that
> >>> > >> > >> >>> > > >>>>>>>>> would
> >>> > >> > >> >>> > > >>>>>>>>> help
> >>> > >> > >> >>> > me
> >>> > >> > >> >>> > > >>>>>> would
> >>> > >> > >> >>> > > >>>>>>>> be
> >>> > >> > >> >>> > > >>>>>>>>> a concrete example of the API in a real life
> >>> use
> >>> > >> > >> >>> > > >>>>>>>>> case.
> >>> > >> > >> >>> > > >>>>>>>>> Also,
> >>> > >> > >> >>> > what
> >>> > >> > >> >>> > > >>>>>> would
> >>> > >> > >> >>> > > >>>>>>>> the
> >>> > >> > >> >>> > > >>>>>>>>> client experience be in terms of sync vs
> >>> > >> > >> >>> > > >>>>>>>>> asyc?
> >>> > >> > >> >>> > > >>>>>>>>> Would
> >>> > >> > >> >>> > > >>>>>>>>> the
> >>> > >> > >> >>> > > >>>>>>>>> client
> >>> > >> > >> >>> > > >>>>>>>>> block
> >>> > >> > >> >>> > > >>>>>>>> till
> >>> > >> > >> >>> > > >>>>>>>>> the bidirectional call return ie c =
> >>> > >> > >> flight.vector_mult(a,
> >>> > >> > >> >>> > > >>>>>>>>> b)
> >>> > >> > >> >>> > or
> >>> > >> > >> >>> > > >>>>>>>>> would
> >>> > >> > >> >>> > > >>>>>>>> the
> >>> > >> > >> >>> > > >>>>>>>>> client wait to be signaled that computation
> >>> > >> > >> >>> > > >>>>>>>>> was
> >>> > >> > >> >>> > > >>>>>>>>> done.
> >>> > >> > >> >>> > > >>>>>>>>> If
> >>> > >> > >> >>> > > >>>>>>>>> the
> >>> > >> > >> >>> > > >>>>>>>>> later
> >>> > >> > >> >>> > > >>>>>>>>> how
> >>> > >> > >> >>> > > >>>>>>>>> is
> >>> > >> > >> >>> > > >>>>>>>>> that different from a DoPut then DoGet? I
> >>> suppose
> >>> > >> > >> >>> > > >>>>>>>>> that
> >>> > >> > >> >>> > > >>>>>>>>> this
> >>> > >> > >> >>> > could
> >>> > >> > >> >>> > > >>>> be
> >>> > >> > >> >>> > > >>>>>>>>> implemented without extending the RPC
> >>> > >> > >> >>> > > >>>>>>>>> interface
> >>> > >> > >> >>> > > >>>>>>>>> but
> >>> > >> > >> rather
> >>> > >> > >> >>> > > >>>>>>>>> by a
> >>> > >> > >> >>> > > >>>>>>>>> function/util?
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> Best,
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> Ryan
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
> >>> > >> > >> >>> > li.davidm96@gmail.com>
> >>> > >> > >> >>> > > >>>>>> wrote:
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> Hi all,
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite successfully
> >>> > >> > >> >>> > > >>>>>>>>>> so
> >>> > >> > >> >>> > > >>>>>>>>>> far,
> >>> > >> > but
> >>> > >> > >> we
> >>> > >> > >> >>> > > >>>>>>>>>> have
> >>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the horizon:
> >>> being
> >>> > >> > >> >>> > > >>>>>>>>>> able
> >>> > >> > >> >>> > > >>>>>>>>>> to
> >>> > >> > >> >>> > > >>>>>>>>>> both
> >>> > >> > >> >>> > > >>>>>>>>>> send
> >>> > >> > >> >>> > > >>>>>>>>>> and
> >>> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC
> >>> > >> > >> >>> > > >>>>>>>>>> call.
> >>> To
> >>> > >> > >> >>> > > >>>>>>>>>> that
> >>> > >> > >> >>> > > >>>>>>>>>> end,
> >>> > >> > >> >>> > I've
> >>> > >> > >> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> >
> >>> > >> > >>
> >>> > >> >
> >>> >
> >>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or
> >>> comment
> >>> > >> > >> >>> > > >>>>>>>>>> on
> >>> > >> > the
> >>> > >> > >> >>> > document.
> >>> > >> > >> >>> > > >>>>>>>>>> I'd
> >>> > >> > >> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a
> >>> > >> > >> >>> > > >>>>>>>>>> relatively
> >>> > >> > >> >>> > > >>>>>>>>>> straightforward
> >>> > >> > >> >>> > > >>>>>>>>>> addition - it is essentially
> >>> > >> > >> >>> > > >>>>>>>>>> "DoPutThenGet".
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a
> >>> > vote.
> >>> > >> > I've
> >>> > >> > >> >>> > > >>>>>>>>>> decided
> >>> > >> > >> >>> > > >>>>>>>>>> to
> >>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had
> >>> > >> > >> >>> > > >>>>>>>>>> proposed
> >>> (on
> >>> > >> > >> >>> > > >>>>>>>>>> DoPut),
> >>> > >> > >> >>> > > >>>>>>>>>> as
> >>> > >> > >> >>> > > >>>>>>>>>> it
> >>> > >> > >> >>> > > >>>>>> doesn't
> >>> > >> > >> >>> > > >>>>>>>>>> functionally change Flight, just the
> >>> > >> > >> >>> > > >>>>>>>>>> interpretation
> >>> > >> > of
> >>> > >> > >> >>> > > >>>>>>>>>> the
> >>> > >> > >> >>> > > >>>>>>>>>> semantics.
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>> Thanks,
> >>> > >> > >> >>> > > >>>>>>>>>> David
> >>> > >> > >> >>> > > >>>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> --
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> +447540852009 | rymurr@dremio.com
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/>
> >>> > >> > >> >>> > > >>>>>>>>> Check out our GitHub
> >>> > >> > >> >>> > > >>>>>>>>> <https://www.github.com/dremio>,
> >>> > >> > >> join
> >>> > >> > >> >>> > > >>>>>>>>> our
> >>> > >> > >> >>> > > >>>>>>>>> community
> >>> > >> > >> >>> > > >>>>>>>>> site <https://community.dremio.com/> &
> >>> Download
> >>> > >> > Dremio
> >>> > >> > >> >>> > > >>>>>>>>> <https://www.dremio.com/download>
> >>> > >> > >> >>> > > >>>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>>
> >>> > >> > >> >>> > > >>>>>>>
> >>> > >> > >> >>> > > >>>>>>
> >>> > >> > >> >>> > > >>>>>
> >>> > >> > >> >>> > > >>>>
> >>> > >> > >> >>> > > >>>
> >>> > >> > >> >>> > > >
> >>> > >> > >> >>> >
> >>> > >> > >> >>
> >>> > >> > >> >
> >>> > >> > >>
> >>> > >> > >
> >>> > >> >
> >>> > >
> >>> >
> >>>
> >>
> >