Posted to dev@arrow.apache.org by Paul Whalen <pg...@gmail.com> on 2021/06/02 03:42:30 UTC

Re: [Flight Extension] Request for Comments

Hopefully this thread isn't too stale to pick back up with an open-ended
question.  What interface would a Barrage client library expose?  With
Flight, application code cares about RecordBatches, but with Barrage it
seems as though a client library ought to handle the updating of the table
and expose that updated view to a client application.  But what
specifically would that view be?

In the last few months I've built out some Flight services that would
benefit from a protocol like Barrage, and it renewed my interest enough to
casually start a Go implementation based on Nate's documentation, just as a
way of wrapping my head around the problem.  I was watching the repo Nate
shared which ultimately led to the Java implementation embedded in
Deephaven's open source offering, but since that is part of a larger
application, it's a little hard to tell where the lines would be drawn.

Paul
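To make Paul's question concrete, here is a minimal Python sketch of one possible client-side view. It assumes a Barrage client keeps the subscribed table in memory, keyed by row key, and hands applications both the updated table and the update that produced it. `Update`, `TableView`, `subscribe`, `apply` and `snapshot` are hypothetical names, not part of any Barrage or Flight API, and applying removes, then adds, then modifications is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Update:
    """Hypothetical shape of one atomic incremental update."""
    added: Dict[int, Row] = field(default_factory=dict)     # row key -> full row
    removed: List[int] = field(default_factory=list)        # row keys to drop
    modified: Dict[int, Row] = field(default_factory=dict)  # row key -> changed columns only


class TableView:
    """Hypothetical client-side table a Barrage client library might expose."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = list(columns)
        self._rows: Dict[int, Row] = {}
        self._listeners: List[Callable[["TableView", Update], None]] = []

    def subscribe(self, listener: Callable[["TableView", Update], None]) -> None:
        self._listeners.append(listener)

    def apply(self, update: Update) -> None:
        # Apply the whole update first (removes, adds, then modifications) ...
        for key in update.removed:
            self._rows.pop(key, None)
        for key, row in update.added.items():
            self._rows[key] = dict(row)
        for key, changes in update.modified.items():
            self._rows[key].update(changes)
        # ... and only then notify, so listeners never see a half-applied update.
        for listener in self._listeners:
            listener(self, update)

    def snapshot(self) -> List[Row]:
        """Copy of the current rows in row-key order."""
        return [dict(self._rows[key]) for key in sorted(self._rows)]
```

In this design an application either reads `snapshot()` or reacts to the `Update` passed to each listener, which would let incremental consumers avoid rescanning the whole table.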

On Tue, Mar 9, 2021 at 9:45 PM Micah Kornfield <em...@gmail.com> wrote:

> >
> > As for schema evolution, I agree with what Micah proposes as a first
> > step. That would again add some overhead, perhaps. As for feasibility, at
> > least on the C++/Python side, I think there would be a decent amount of
> > refactoring needed, and there's also the question of how to expose this
> > in the API - the APIs there are based on reader/writer interfaces that
> > don't expose schema evolution.
>
> One more option, which might be too slow: if a schema change is necessary,
> a new Flight endpoint could be communicated and a new RPC used. (Reusing
> the same underlying channel could mitigate some performance issues here.)
>
> On Tue, Mar 9, 2021 at 3:17 PM David Li <li...@apache.org> wrote:
>
> > There's not really any convention for the app_metadata field or any of
> > the other application-defined fields (e.g. DoAction, Criteria). That said,
> > I wouldn't necessarily worry about conflicting with other projects - if a
> > client connects to a Barrage service, presumably it knows what to expect.
> > And an arbitrary Flight client connecting to an arbitrary Flight server
> > isn't really something we've thought about. For instance, see the Flight
> > SQL proposal on this mailing list, which similarly defines expected
> > message formats and schemas for various fields - but doesn't provide any
> > sort of reflection or way for a completely generic client to discover
> > what's going on from first principles. (There is no OpenAPI/Swagger for
> > Flight!)
> >
> > As for schema evolution, I agree with what Micah proposes as a first
> > step. That would again add some overhead, perhaps. As for feasibility, at
> > least on the C++/Python side, I think there would be a decent amount of
> > refactoring needed, and there's also the question of how to expose this
> > in the API - the APIs there are based on reader/writer interfaces that
> > don't expose schema evolution.
> >
> > It may be cleaner on the Java side given you've poked there already. That
> > said, even if the Flight API is flexible but not so convenient,
> > presumably part of the value of Barrage is to take that and present a
> > clean interface with a stable schema again.
> >
> > Best,
> > David
> >
> > On Tue, Mar 9, 2021, at 00:03, Micah Kornfield wrote:
> > > >
> > > > You know what? This is actually a nicer solution than I am giving it
> > > > credit for. I've been trying to think about how to handle the
> > > > Integer.MAX_VALUE limit that Arrow strongly suggests observing to
> > > > maintain compatibility with Java, while still respecting the need to
> > > > apply an update atomically.
> > >
> > > For Flight, the constraint actually is a maximum of a 32-bit length
> > > payload (I don't recall exactly if it is 2GB or 4GB, but either way, you
> > > are probably going to run into issues sending a single payload anywhere
> > > near that large).
> > >
> > > > Are you suggesting this pattern of messages per incremental update?
> > > > - FlightData with [the new] metadata header that includes
> > > > added/removed/modified information, the number of added record
> > > > batches, and the number of modified record batches. Note that there
> > > > could be more than one record batch per added or modified set to
> > > > enable serializing more than 2^31-1 rows in a single update. Also note
> > > > that it would have an empty body (similar to Schema).
> > > > - A set of FlightData record batches using the normal RecordBatch
> > > > flatbuffer.
> > > > - A set of FlightData record batches also using the normal RecordBatch
> > > > flatbuffer.
> > >
> > >
> > > I haven't thought about this too deeply. I think depending on recovery
> > > needs it could differ. One place to start is to avoid the extra metadata
> > > message, and just have a marker bit indicating there are more messages
> > > that will be coming that are required to be in this transaction, and
> > > another bit/value indicating the end of the transaction.
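A minimal sketch of the marker-bit idea, assuming each message carries a small flags field: the client buffers messages flagged as "more to come" and releases the whole group once it sees the end-of-transaction bit, so the group can be applied atomically. The flag names and values are hypothetical, and strings stand in for record batches.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical flag bits carried in each message's metadata.
MORE_IN_TXN = 0x1  # more messages belonging to this transaction will follow
END_OF_TXN = 0x2   # this message closes the transaction


@dataclass
class Message:
    flags: int
    payload: str  # stands in for a record batch


class TxnAssembler:
    """Buffers messages until the end-of-transaction bit, then releases them together."""

    def __init__(self) -> None:
        self._pending: List[Message] = []

    def feed(self, msg: Message) -> Optional[List[Message]]:
        if not (msg.flags & (MORE_IN_TXN | END_OF_TXN)):
            raise ValueError("message neither continues nor ends a transaction")
        self._pending.append(msg)
        if msg.flags & END_OF_TXN:
            complete, self._pending = self._pending, []
            return complete  # the caller applies these as one atomic update
        return None  # transaction still open; nothing to apply yet
```

A single-message update simply carries `END_OF_TXN` alone, so small updates pay no extra metadata message.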
> > >
> > > > My biggest concern with this approach is that small updates are likely
> > > > going to have significant overhead. Maybe it won't matter, but it is
> > > > the first thing that jumps out. We do typically coalesce updates
> > > > somewhere between 50ms and 1s depending on the sensitivity of the
> > > > listener, so maybe that's enough to eliminate my concern. I might just
> > > > need to get data/statistics to get a better feeling for this concern.
> > >
> > > I think this is definitely something to measure.  I wouldn't expect the
> > > performance differential to be that large.
> > >
> > > > Regarding the schema evolution idea:
> > > > What can I do to get started? Does it make sense to target the feature
> > > > as a new field in the protobuf so that it can be used in contexts with
> > > > other header metadata types? Do you have time to riff on the format
> > > > that will apply to the other contexts? I believe all I would need is a
> > > > bitset identifying which columns are included, but if
> > > > enabling/disabling features is a nice-to-have then a bitset is going to
> > > > be a bit weak. I can also, for now, cheat and send empty field nodes and
> > > > empty buffers for those columns (but I am, already, slightly concerned
> > > > with overhead).
> > >
> > > I think David might be able to give more guidance. My recollection of
> > > the library specifics is hazy, but I think we could potentially just
> > > interpret a new schema arriving as indicating that all record batches
> > > after that schema would follow the new schema. Would that work for your
> > > use case? David would probably be able to give guidance on how feasible
> > > a change like that would be. Typically, before we officially alter the
> > > specification we want to see working implementations in Java and C++
> > > that pass an integration test. But I think we can figure out the
> > > specifics here if we can understand concrete requirements.
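Micah's suggestion, treating a newly arrived schema as governing every following record batch, can be sketched as a reader that tracks the most recent schema. This is a hypothetical illustration with plain tuples standing in for schemas and rows; it is not an existing Arrow reader API (per David, those reader/writer interfaces don't expose schema evolution).

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Schema = Tuple[str, ...]


def read_with_schema_evolution(
    messages: Iterable[Tuple[str, Sequence]],
) -> Iterator[Tuple[Schema, List[tuple]]]:
    """Pair every record batch with the most recent schema seen on the stream.

    Messages are ("schema", column_names) or ("batch", rows); a new schema
    message applies to all batches that follow it.
    """
    current = None
    for kind, body in messages:
        if kind == "schema":
            current = tuple(body)
        elif kind == "batch":
            if current is None:
                raise ValueError("record batch arrived before any schema")
            rows = list(body)
            if any(len(row) != len(current) for row in rows):
                raise ValueError("row width does not match the current schema")
            yield current, rows
        else:
            raise ValueError(f"unknown message kind: {kind!r}")
```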
> > >
> > > On Mon, Mar 8, 2021 at 6:42 PM Nate Bauernfeind <natebauernfeind@deephaven.io> wrote:
> > >
> > > > > note that FlightData already has a separate app_metadata field
> > > >
> > > > That is an interesting point; are there any conventions on how to use
> > > > the app_metadata compatibly without stepping on other ideas/projects
> > > > doing the same? It would be convenient for the server to verify that
> > > > the client is making the request that the server interprets. Do
> > > > projects use a magic number prefix? Or possibly is there some sort of
> > > > common header? I suspect that other projects may benefit from having
> > > > the ability to publish incremental updates, too. So, I'm just curious
> > > > if there is any pre-existing domain knowledge in this respect.
> > > >
> > > > Nate
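On the magic-number question: a hypothetical Python sketch of prefixing `app_metadata` bytes with a magic number and a version, so a server can cheaply reject metadata it does not understand. As David points out in the thread, Flight has no such convention; `BARRAGE_MAGIC` and the 6-byte header layout are invented purely for illustration.

```python
import struct
from typing import Optional, Tuple

# Hypothetical values: Flight defines no convention for app_metadata contents.
BARRAGE_MAGIC = 0x6E687064
HEADER = struct.Struct("<IH")  # little-endian uint32 magic, uint16 version (6 bytes)


def wrap_app_metadata(body: bytes, version: int = 1) -> bytes:
    """Prefix serialized update metadata with the magic number and a version."""
    return HEADER.pack(BARRAGE_MAGIC, version) + body


def unwrap_app_metadata(app_metadata: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (version, body), or None if the bytes lack the expected prefix."""
    if len(app_metadata) < HEADER.size:
        return None
    magic, version = HEADER.unpack_from(app_metadata)
    if magic != BARRAGE_MAGIC:
        return None
    return version, app_metadata[HEADER.size:]
```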
> > > >
> > > > On Mon, Mar 8, 2021 at 1:55 PM David Li <li...@apache.org> wrote:
> > > >
> > > > > Hey - pretty much, I think. I'd just like to note that FlightData
> > > > > already has a separate app_metadata field, for metadata on top of any
> > > > > Arrow-level data, so you could ship the Barrage metadata alongside the
> > > > > first record batch, without having to modify anything about the
> > > > > record batch itself, and without having to define a new metadata
> > > > > header at the Arrow level - everything could be implemented on top of
> > > > > the existing definitions.
> > > > >
> > > > > David
> > > > >
> > > > > On Sat, Mar 6, 2021, at 01:07, Nate Bauernfeind wrote:
> > > > > > Eww. I didn't specify why I had two sets of record batches.
> > > > > > Slightly revised:
> > > > > >
> > > > > > Are you suggesting this pattern of messages per incremental update?
> > > > > > - FlightData with [the new] metadata header that includes
> > > > > > added/removed/modified information, the number of added record
> > > > > > batches, and the number of modified record batches. Note that there
> > > > > > could be more than one record batch per added or modified set to
> > > > > > enable serializing more than 2^31-1 rows in a single update. Also
> > > > > > note that it would have an empty body (similar to Schema).
> > > > > > - A set of FlightData record batches using the normal RecordBatch
> > > > > > flatbuffer for added rows.
> > > > > > - A set of FlightData record batches also using the normal
> > > > > > RecordBatch flatbuffer for modified rows.
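The message pattern above can be sketched as a layout plan. The sketch assumes a header message (empty body, carrying the two batch counts) followed by the added-row batches and then the modified-row batches, each batch capped at 2^31-1 rows. The header field names and helper functions are hypothetical; `max_rows` is a parameter only so the splitting can be exercised with small numbers.

```python
from typing import Dict, List, Tuple, Union

MAX_ROWS_PER_BATCH = 2**31 - 1  # Integer.MAX_VALUE, the Java-friendly row limit

Chunk = Tuple[int, int]  # (row offset, row count)


def split_rows(num_rows: int, max_rows: int = MAX_ROWS_PER_BATCH) -> List[Chunk]:
    """Split num_rows into consecutive chunks of at most max_rows rows."""
    return [(off, min(max_rows, num_rows - off)) for off in range(0, num_rows, max_rows)]


def plan_update(num_added: int, num_modified: int,
                max_rows: int = MAX_ROWS_PER_BATCH) -> List[Tuple[str, Union[Dict[str, int], Chunk]]]:
    """Lay out one incremental update: header, then added batches, then modified batches."""
    added = split_rows(num_added, max_rows)
    modified = split_rows(num_modified, max_rows)
    # The header message has an empty body and only announces the batch counts.
    header = {"num_added_batches": len(added), "num_modified_batches": len(modified)}
    return ([("header", header)]
            + [("added", chunk) for chunk in added]
            + [("modified", chunk) for chunk in modified])
```

Because the header announces both counts up front, a reader knows exactly how many batches to collect before applying the update atomically.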
> > > > > >
> > > > > > On Fri, Mar 5, 2021 at 11:00 PM Nate Bauernfeind <natebauernfeind@deephaven.io> wrote:
> > > > > >
> > > > > > > > It seems that atomic application could also be something
> > > > > > > > controlled in metadata (i.e. this is batch 1 of X)?
> > > > > > >
> > > > > > > You know what? This is actually a nicer solution than I am giving
> > > > > > > it credit for. I've been trying to think about how to handle the
> > > > > > > Integer.MAX_VALUE limit that Arrow strongly suggests observing to
> > > > > > > maintain compatibility with Java, while still respecting the need
> > > > > > > to apply an update atomically.
> > > > > > >
> > > > > > > Alright, yeah, I'm game with this approach.
> > > > > > >
> > > > > > > > Right - presumably this could go in the Flight metadata instead
> > > > > > > > of having to be inlined into the batch's metadata.
> > > > > > >
> > > > > > > I'm not sure I follow. These fields (addedRows, addedRowsIncluded,
> > > > > > > removedRows, modifiedRows, and modifiedRowsIncluded) apply only to
> > > > > > > a specific atomic incremental update. For a given update these are
> > > > > > > the indices for the rows that were added/removed/modified -- and
> > > > > > > therefore cannot be part of the "global" Flight metadata.
> > > > > > >
> > > > > > > Are you suggesting this pattern of messages per incremental update?
> > > > > > > - FlightData with [the new] metadata header that includes
> > > > > > > added/removed/modified information, the number of added record
> > > > > > > batches, and the number of modified record batches. Note that
> > > > > > > there could be more than one record batch per added or modified
> > > > > > > set to enable serializing more than 2^31-1 rows in a single
> > > > > > > update. Also note that it would have an empty body (similar to
> > > > > > > Schema).
> > > > > > > - A set of FlightData record batches using the normal RecordBatch
> > > > > > > flatbuffer.
> > > > > > > - A set of FlightData record batches also using the normal
> > > > > > > RecordBatch flatbuffer.
> > > > > > >
> > > > > > > My biggest concern with this approach is that small updates are
> > > > > > > likely going to have significant overhead. Maybe it won't matter,
> > > > > > > but it is the first thing that jumps out. We do typically coalesce
> > > > > > > updates somewhere between 50ms and 1s depending on the sensitivity
> > > > > > > of the listener, so maybe that's enough to eliminate my concern. I
> > > > > > > might just need to get data/statistics to get a better feeling for
> > > > > > > this concern.
> > > > > > >
> > > > > > > Regarding the schema evolution idea:
> > > > > > > What can I do to get started? Does it make sense to target the
> > > > > > > feature as a new field in the protobuf so that it can be used in
> > > > > > > contexts with other header metadata types? Do you have time to riff
> > > > > > > on the format that will apply to the other contexts? I believe all
> > > > > > > I would need is a bitset identifying which columns are included,
> > > > > > > but if enabling/disabling features is a nice-to-have then a bitset
> > > > > > > is going to be a bit weak. I can also, for now, cheat and send
> > > > > > > empty field nodes and empty buffers for those columns (but I am,
> > > > > > > already, slightly concerned with overhead).
> > > > > > >
> > > > > > > So, based on the feedback so far, I should be able to boil down
> > > > > > > the way I integrate with Arrow to, more or less, a pair of
> > > > > > > flatbuffers. I'm going to start riffing on these changes and see
> > > > > > > where I end up. Feel free to jump up and down if I misunderstood
> > > > > > > you.
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Mar 5, 2021 at 9:23 PM Micah Kornfield <emkornfield@gmail.com> wrote:
> > > > > > >
> > > > > > >> >
> > > > > > >> > And then having two sets of buffers is the same as having two
> > > > > > >> > record batches, albeit you need both sets to be delivered
> > > > > > >> > together, as noted.
> > > > > > >>
> > > > > > >>
> > > > > > >> It seems that atomic application could also be something
> > > > > > >> controlled in metadata (i.e. this is batch 1 of X)?
> > > > > > >>
> > > > > > >> The schema evolution question is interesting; it could be useful
> > > > > > >> in other contexts as well (e.g. switching dictionary encoding
> > > > > > >> on/off).
> > > > > > >>
> > > > > > >> -Micah
> > > > > > >>
> > > > > > >>
> > > > > > >> On Fri, Mar 5, 2021 at 11:42 AM David Li <lidavidm@apache.org> wrote:
> > > > > > >>
> > > > > > >> > (responses inline)
> > > > > > >> >
> > > > > > >> > On Thu, Mar 4, 2021, at 17:26, Nate Bauernfeind wrote:
> > > > > > >> > > Regarding the BarrageRecordBatch:
> > > > > > >> > >
> > > > > > >> > > I have been concatenating them; it’s one batch with two sets of
> > > > > > >> > > arrow payloads. They don’t have separate metadata headers; the
> > > > > > >> > > update is to be applied atomically. I have only studied the
> > > > > > >> > > Java Arrow Flight implementation, and I believe it is usable,
> > > > > > >> > > maybe with some minor changes. The piece of code in Flight that
> > > > > > >> > > does the deserialization takes two parallel lists/iterators, a
> > > > > > >> > > `Buffer` list (these describe the length of a section of the
> > > > > > >> > > body payload) and a `FieldNode` list (these describe num rows
> > > > > > >> > > and null_count). Each field node corresponds to 2-3 buffers
> > > > > > >> > > depending on schema type. Buffers are allowed to have a length
> > > > > > >> > > of 0, to omit their payloads; this, for example, is how you omit
> > > > > > >> > > the validity buffer when null_count is zero.
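A simplified Python model of the parallel `FieldNode`/`Buffer` walk described above, assuming only flat (non-nested) columns: fixed-width types contribute a validity and a values buffer, variable-width types (utf8, binary) add an offsets buffer, and a zero-length validity buffer means it was omitted. The dataclasses mirror the IPC metadata fields (`length`/`null_count` and `offset`/`length`), but the type table and `assign_buffers` are illustrative, not the Java Flight code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldNode:
    length: int      # number of rows in this column
    null_count: int


@dataclass
class Buffer:
    offset: int  # position within the message body
    length: int  # byte length; 0 means the buffer's payload is omitted


# Flat column types only: validity + values, plus offsets for variable-width types.
BUFFERS_PER_TYPE = {"int64": 2, "float64": 2, "utf8": 3, "binary": 3}


def assign_buffers(column_types: List[str], nodes: List[FieldNode],
                   buffers: List[Buffer]) -> Dict[int, List[Buffer]]:
    """Walk the parallel FieldNode/Buffer lists and give each column its buffers."""
    if len(nodes) != len(column_types):
        raise ValueError("expected one FieldNode per flat column")
    out: Dict[int, List[Buffer]] = {}
    cursor = 0
    for index, (col_type, node) in enumerate(zip(column_types, nodes)):
        count = BUFFERS_PER_TYPE[col_type]
        col_buffers = buffers[cursor:cursor + count]
        if len(col_buffers) != count:
            raise ValueError("ran out of buffers")
        # The first buffer is the validity bitmap; it may only be empty without nulls.
        if node.null_count > 0 and col_buffers[0].length == 0:
            raise ValueError("column has nulls but no validity buffer")
        out[index] = col_buffers
        cursor += count
    if cursor != len(buffers):
        raise ValueError("unconsumed buffers left over")
    return out
```

Note that the walk only works because the caller supplies `column_types`; without knowing each column's type, the parser cannot tell how many buffers belong to which field node.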
> > > > > > >> > >
> > > > > > >> > > The proposed barrage payload keeps this structural pattern
> > > > > > >> > > (list of buffer, list of field node) with the following
> > > > > > >> > > modifications:
> > > > > > >> > > - we only include field nodes / buffers for subscribed columns
> > > > > > >> > > - the first set of field nodes are for added rows; these may be
> > > > > > >> > > omitted if there are no added rows included in the update
> > > > > > >> > > - the second set of field nodes are for modified rows; we omit
> > > > > > >> > > columns that have no modifications included in the update
> > > > > > >> > >
> > > > > > >> > > I believe the only thing that is missing is the ability to
> > > > > > >> > > control the field types to be deserialized (like a third
> > > > > > >> > > list/iterator parallel to field nodes and buffers).
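To illustrate how the parser could know which columns appear in each section, here is a hypothetical sketch using integer bitsets (the bitset idea raised elsewhere in the thread). Added rows carry every subscribed column, modified rows carry only the subscribed columns that changed, and the result is the list of column types a parser would need to expect in each section. The function and parameter names are invented for illustration.

```python
from typing import List, Tuple


def columns_in(bitset: int, num_columns: int) -> List[int]:
    """Indices of the set bits, lowest column first."""
    return [i for i in range(num_columns) if (bitset >> i) & 1]


def expected_layout(schema: List[str], subscribed: int, modified: int,
                    has_added: bool) -> Tuple[List[str], List[str]]:
    """Column types to expect in the added section and in the modified section.

    subscribed: bitset of columns the client subscribed to
    modified:   bitset of columns with modifications in this update
    """
    n = len(schema)
    # Added rows must carry every subscribed column (or the section is omitted).
    added_cols = columns_in(subscribed, n) if has_added else []
    # Modified rows only carry subscribed columns that actually changed.
    modified_cols = columns_in(subscribed & modified, n)
    return [schema[i] for i in added_cols], [schema[i] for i in modified_cols]
```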
> > > > > > >> >
> > > > > > >> > Right. I think we're on the same page here, but looking at this
> > > > > > >> > from different angles. I think being able to control which
> > > > > > >> > columns to deserialize/being able to only include a subset of
> > > > > > >> > buffers is essentially equivalent to having a stream with schema
> > > > > > >> > evolution. And then having two sets of buffers is the same as
> > > > > > >> > having two record batches, albeit you need both sets to be
> > > > > > >> > delivered together, as noted. Regardless, we can work out how to
> > > > > > >> > handle this.
> > > > > > >> >
> > > > > > >> > >
> > > > > > >> > > Note that the BarrageRecordBatch.addedRowsIncluded,
> > > > > > >> > > BarrageFieldNode.addedRows, BarrageFieldNode.modifiedRows and
> > > > > > >> > > BarrageFieldNode.includedRows (all part of the flatbuffer
> > > > > > >> > > metadata) are intended to be used by code one layer of
> > > > > > >> > > abstraction higher than the actual wire-format parser. The parser
> > > > > > >> > > doesn't really need them except to know which columns to expect
> > > > > > >> > > in the payload. Technically, we could encode the field nodes /
> > > > > > >> > > buffers as empty, too (but why be wasteful if this information
> > > > > > >> > > is already encoded?).
> > > > > > >> >
> > > > > > >> > Right - presumably this could go in the Flight metadata instead of
> > > > > > >> > having to be inlined into the batch's metadata.
> > > > > > >> >
> > > > > > >> > >
> > > > > > >> > > Regarding Browser Flight Support:
> > > > > > >> > >
> > > > > > >> > > Was this company FactSet by chance? (I saw they are mentioned in
> > > > > > >> > > the JS thread that recently was bumped on the dev list.)
> > > > > > >> > >
> > > > > > >> > > I looked at the ticket and wanted to comment on how we are
> > > > > > >> > > handling bidirectional streams for our web-ui. We use
> > > > > > >> > > ArrowFlight's concept of Ticket to allow a client to create and
> > > > > > >> > > identify temporary state (new tables / views / REPL sessions /
> > > > > > >> > > etc). Any bidirectional stream we support also has a
> > > > > > >> > > server-streaming only variant with the ability for the client to
> > > > > > >> > > attach a Ticket to reference/identify that stream. The client
> > > > > > >> > > may then send a message, out-of-band, to the Ticket. They are
> > > > > > >> > > sequenced by the client (since gRPC doesn't guarantee ordered
> > > > > > >> > > delivery across separate calls) and delivered to the piece of
> > > > > > >> > > code controlling that server-stream. It does require that the
> > > > > > >> > > server be a bit stateful; but it works =).
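The client-sequenced, out-of-band messaging described above implies the server must deliver messages in the client's order even when the separate unary calls arrive out of order. A minimal sketch, assuming one inbox per Ticket and integer sequence numbers starting at 0; `SequencedInbox` is a hypothetical name, not Deephaven's or Flight's API.

```python
from typing import Callable, Dict


class SequencedInbox:
    """Delivers out-of-band messages to a server stream in client-assigned order."""

    def __init__(self, deliver: Callable[[str], None], first_seq: int = 0) -> None:
        self._deliver = deliver          # hands a message to the stream's handler
        self._next = first_seq           # next sequence number we may deliver
        self._held: Dict[int, str] = {}  # early arrivals waiting for a gap to fill

    def receive(self, seq: int, message: str) -> None:
        if seq < self._next or seq in self._held:
            return  # duplicate or already delivered; ignore it
        self._held[seq] = message
        # Flush every message that is now contiguous with what was delivered.
        while self._next in self._held:
            self._deliver(self._held.pop(self._next))
            self._next += 1
```

Holding early arrivals is the "bit stateful" part: the server keeps per-Ticket state until the gap is filled.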
> > > > > > >> >
> > > > > > >> > I still can't figure out who it was and now I wonder if it was all
> > > > > > >> > in my imagination. I'm hoping they'll see this and chime in, in
> > > > > > >> > the spirit of community participation :)
> > > > > > >> >
> > > > > > >> > I agree bidirectionality will be a challenge. I think WebSockets
> > > > > > >> > has been proposed as well, but that is also stateful (well, as
> > > > > > >> > soon as you have bidirectionality, you're going to have
> > > > > > >> > statefulness).
> > > > > > >> >
> > > > > > >> > >
> > > > > > >> > > On Thu, Mar 4, 2021 at 6:58 AM David Li <lidavidm@apache.org> wrote:
> > > > > > >> > >
> > > > > > >> > > > Re: the multiple batches, that makes sense. In that case,
> > > > > > >> > > > depending on how exactly the two record batches are laid out,
> > > > > > >> > > > I'd suggest considering a Union of Struct columns (where a
> > > > > > >> > > > Struct is essentially interchangeable with a record batch or
> > > > > > >> > > > table) - that would let you encode two distinct record batches
> > > > > > >> > > > inside the same physical batch. Or if the two batches have
> > > > > > >> > > > identical schemas, you could just concatenate them and include
> > > > > > >> > > > indices in your metadata.
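David's second option, concatenating two same-schema row sets and recording in the metadata where each one starts, can be sketched with plain Python lists. The metadata keys are hypothetical, and a real implementation would slice Arrow arrays rather than lists of tuples.

```python
from typing import Dict, List, Tuple


def concat_with_index(added: List[tuple],
                      modified: List[tuple]) -> Tuple[List[tuple], Dict[str, int]]:
    """Concatenate two same-schema row sets into one batch plus split metadata."""
    rows = added + modified
    metadata = {"added_start": 0, "added_len": len(added),
                "modified_start": len(added), "modified_len": len(modified)}
    return rows, metadata


def split_by_index(rows: List[tuple],
                   metadata: Dict[str, int]) -> Tuple[List[tuple], List[tuple]]:
    """Recover the added and modified row sets from one physical batch."""
    a0, an = metadata["added_start"], metadata["added_len"]
    m0, mn = metadata["modified_start"], metadata["modified_len"]
    return rows[a0:a0 + an], rows[m0:m0 + mn]
```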
> > > > > > >> > > >
> > > > > > >> > > > As for browser Flight support - there's an existing ticket:
> > > > > > >> > > > https://issues.apache.org/jira/browse/ARROW-9860
> > > > > > >> > > >
> > > > > > >> > > > I was sure I had seen another organization talking about
> > > > > > >> > > > browser support recently, but now I can't find them. I'll
> > > > > > >> > > > update here if I do figure it out.
> > > > > > >> > > >
> > > > > > >> > > > Best,
> > > > > > >> > > > David
> > > > > > >> > > >
> > > > > > >> > > > On Wed, Mar 3, 2021, at 21:00, Nate Bauernfeind wrote:
> > > > > > >> > > > > > if each payload has two batches with different purposes [...]
> > > > > > >> > > > >
> > > > > > >> > > > > The purposes of the payloads are slightly different; however,
> > > > > > >> > > > > they are intended to be applied atomically. If there are
> > > > > > >> > > > > guarantees by the table operation generating the updates, then
> > > > > > >> > > > > those guarantees are only valid at each boundary of applying
> > > > > > >> > > > > the update to your local state. In a sense, one is relatively
> > > > > > >> > > > > useless without the other. Record batches fit well in
> > > > > > >> > > > > map-reduce paradigms / algorithms, but what we have is
> > > > > > >> > > > > stateful to enable/support incremental updates. For example,
> > > > > > >> > > > > sorting a flight of data is best done map-reduce-style and
> > > > > > >> > > > > requires one to re-sort the entire data set when it changes.
> > > > > > >> > > > > Our approach focuses on producing incremental updates which
> > > > > > >> > > > > are used to manipulate your existing client state using a much
> > > > > > >> > > > > smaller footprint (in both time and space). You can imagine,
> > > > > > >> > > > > in the sort scenario, if you evaluate the table after adding
> > > > > > >> > > > > rows but before modifying existing rows, your table won’t be
> > > > > > >> > > > > sorted between the two updates. The client would then need to
> > > > > > >> > > > > wait until it receives the pair of RecordBatches anyway, so it
> > > > > > >> > > > > seems more natural to deliver them together.
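The sort scenario can be reproduced with a toy model, assuming a client table keyed by row keys whose order is the sort order. After applying only the added rows the values are out of order; only after the modifications are applied too does the sorted guarantee hold again. The helper names are hypothetical.

```python
from typing import Dict

Table = Dict[int, int]  # row key -> sorted-on value


def is_sorted(table: Table) -> bool:
    """True if values are non-decreasing when read in row-key order."""
    values = [table[key] for key in sorted(table)]
    return all(a <= b for a, b in zip(values, values[1:]))


def apply_added(table: Table, added: Table) -> None:
    table.update(added)


def apply_modified(table: Table, modified: Table) -> None:
    for key, value in modified.items():
        table[key] = value


def apply_update(table: Table, added: Table, modified: Table) -> None:
    """Apply both halves before anyone reads the table (the atomic boundary)."""
    apply_added(table, added)
    apply_modified(table, modified)
```

For example, starting from `{10: 1, 20: 3}`, an update that adds `{15: 4}` and modifies `{20: 5}` leaves the values as `[1, 4, 3]` after only the added half, but `[1, 4, 5]` once both halves are applied.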
> > > > > > >> > > > >
> > > > > > >> > > > > > As a side note - is said UI browser-based? Another project
> > > > > > >> > > > > > recently was planning to look at JavaScript support for
> > > > > > >> > > > > > Flight (using WebSockets as the transport, IIRC) and it might
> > > > > > >> > > > > > make sense to join forces if that’s a path you were also
> > > > > > >> > > > > > going to pursue.
> > > > > > >> > > > >
> > > > > > >> > > > > Yes, our UI runs in the browser, although table operations
> > > > > > >> > > > > themselves run on the server to keep the browser lean and
> > > > > > >> > > > > fast. That said, the browser isn’t the only target for the API
> > > > > > >> > > > > we’re iterating on. We’re engaged in a rewrite to unify our
> > > > > > >> > > > > “first-class” Java API for intra-engine (server, heavyweight
> > > > > > >> > > > > client) usage and our cross-language
> > > > > > >> > > > > (JavaScript/C++/C#/Python) “open” API. Our existing customers
> > > > > > >> > > > > use the engine to drive multi-process data applications,
> > > > > > >> > > > > REPL/notebook experiences, and dashboards. We are preserving
> > > > > > >> > > > > these capabilities as we make the engine available as open
> > > > > > >> > > > > source software. One goal of the OSS effort is to produce a
> > > > > > >> > > > > single modern API that’s more interoperable with the data
> > > > > > >> > > > > science and development community as a whole. In the interest
> > > > > > >> > > > > of minimizing entry/egress points, we are migrating to gRPC
> > > > > > >> > > > > for everything, not just the data IPC layer (the
> > > > > > >> > > > > barrage/arrow-flight piece).
> > > > > > >> > > > >
> > > > > > >> > > > > The point of all this is to make the Deephaven engine as
> > > > > > >> > > > > accessible as possible for a broad user base, including
> > > > > > >> > > > > developers using the API from their language of choice or
> > > > > > >> > > > > scripts/code running co-located within an engine process. Our
> > > > > > >> > > > > software can be used to explore or build applications and
> > > > > > >> > > > > visualizations around static as well as real-time data
> > > > > > >> > > > > (imagine joins, aggregations, sorts, filters, time-series
> > > > > > >> > > > > joins, etc), to perform table operations with code or with a
> > > > > > >> > > > > few clicks in a GUI, or as a building block in a multi-stage
> > > > > > >> > > > > data pipeline. We think making ourselves as interoperable as
> > > > > > >> > > > > possible with tools built on Arrow is an important part of
> > > > > > >> > > > > attaining this goal.
> > > > > > >> > > > >
> > > > > > >> > > > > That said, we have run into quite a few pain points migrating
> > > > > > >> > > > > to gRPC, such as 1) no browser supports client-side streaming,
> > > > > > >> > > > > 2) today, server-side streams require a proxy layer of some
> > > > > > >> > > > > sort (such as Envoy), 3) FlatBuffers’ JavaScript/TypeScript
> > > > > > >> > > > > support is a little weak, and I’m sure there are others that
> > > > > > >> > > > > aren’t coming to mind at the moment. We have some interesting
> > > > > > >> > > > > solutions to these problems, but, today, these issues are a
> > > > > > >> > > > > decent chunk of our focus. That said, the UI is usable today
> > > > > > >> > > > > by our enterprise clients, but it interacts with the server
> > > > > > >> > > > > over websockets and a protocol that is heavily influenced by
> > > > > > >> > > > > 10 years of existing proprietary Java-to-Java IPC (which is
> > > > > > >> > > > > NOT friendly to being robust over intermittent failures).
> > > > > > >> > > > > Today, we’re just heads-down going the gRPC route and hoping
> > > > > > >> > > > > that eventually browsers get around to better support for
> > > > > > >> > > > > some of this stuff (so, maybe one day a proxy isn’t required,
> > > > > > >> > > > > etc). Some of our RPCs make most sense as bidirectional
> > > > > > >> > > > > streams, but to support our web-ui we also have a
> > > > > > >> > > > > server-streaming variant that we can pass data to
> > > > > > >> > > > > “out-of-band” via a unary call referencing the particular
> > > > > > >> > > > > server stream. It’s fun stuff! I’m actually very excited about
> > > > > > >> > > > > it even if the text doesn’t sound that way =).
> > > > > > >> > > > >
> > > > > > >> > > > > If you can point me to that project/person/post we’d love to
> > > > > > >> > > > > get in touch and are excited to share whatever can be shared.
> > > > > > >> > > > >
> > > > > > >> > > > > Nate
> > > > > > >> > > > >
> > > > > > >> > > > > On Wed, Mar 3, 2021 at 4:22 PM David Li <lidavidm@apache.org> wrote:
> > > > > > >> > > > >
> > > > > > >> > > > > > Ah okay, thank you for clarifying! In that case, if each
> > > > > > >> > > > > > payload has two batches with different purposes - might it
> > > > > > >> > > > > > make sense to just make those two different payloads, and
> > > > > > >> > > > > > set a flag/enum in the metadata to indicate how to interpret
> > > > > > >> > > > > > the batch? Then you'd be officially the same as Arrow Flight :)
> > > > > > >> > > > > >
> > > > > > >> > > > > > As a side note - is said UI browser-based? Another project
> > > > > > >> > > > > > recently was planning to look at JavaScript support for
> > > > > > >> > > > > > Flight (using WebSockets as the transport, IIRC) and it might
> > > > > > >> > > > > > make sense to join forces if that's a path you were also
> > > > > > >> > > > > > going to pursue.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Best,
> > > > > > >> > > > > > David
> > > > > > >> > > > > >
> > > > > > >> > > > > > On Wed, Mar 3, 2021, at 18:05, Nate Bauernfeind wrote:
> > > > > > >> > > > > > > Thanks for the interest =).
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > > However, if I understand right, you're sending data
> > > > > > >> > > > > > > > without a fixed schema [...]
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > The dataset does have a known schema ahead of time, which
> > > > > > >> > > > > > > is similar to Flight. However, as you point out, the
> > > > > > >> > > > > > > subscription can change which columns it is interested in
> > > > > > >> > > > > > > without re-acquiring data for columns it was already
> > > > > > >> > > > > > > subscribed to. This is mostly for convenience. We use it
> > > > > > >> > > > > > > primarily to limit which columns are sent to our user
> > > > > > >> > > > > > > interface until the user scrolls them into view.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > The enhancement of the RecordBatch here, aside from the
> > > > > > >> > > > > > > additional metadata, is only in that the payload has two sets
> > > > > > >> > > > > > > of RecordBatch payloads. The first payload is for added rows;
> > > > > > >> > > > > > > every added row must send data for each subscribed column, so
> > > > > > >> > > > > > > given the subscribed columns this payload is otherwise fixed
> > > > > > >> > > > > > > width (in the number of columns / buffers). The second payload
> > > > > > >> > > > > > > is for modified rows. Here we only send the columns that have
> > > > > > >> > > > > > > rows that are modified. Aside from this difference, I have been
> > > > > > >> > > > > > > aiming to be compatible enough to be able to reuse the payload
> > > > > > >> > > > > > > parsing that is already written for Arrow.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > > I don't quite see why it couldn't be carried as metadata on
> > > > > > >> > > > > > > > the side of a record batch, instead of having to duplicate
> > > > > > >> > > > > > > > the record batch structure [...]
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Whoa, this is a good point. I have iterated on this a few
> > > > > > >> > > > > > > times to get it closer to Arrow's setup and did not realize
> > > > > > >> > > > > > > that 'BarrageData' is now officially identical to
> > > > > > >> > > > > > > `FlightData`. This is an instance of being too close to the
> > > > > > >> > > > > > > project and forgetting to step back once in a while.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > > Flight already has a bidirectional streaming endpoint,
> > > > > > >> > > > > > > > DoExchange, that allows arbitrary payloads (with mixed
> > > > > > >> > > > > > > > metadata/data or only one of the two), which seems like it
> > > > > > >> > > > > > > > should be able to cover the SubscriptionRequest endpoint.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > This is exactly the kind of feedback I'm looking for! I
> > > > > > >> > > > > > > wasn't seeing the solution where the client-side stream
> > > > > > >> > > > > > > doesn't actually need a payload and the subscription changes
> > > > > > >> > > > > > > can be described with another flatbuffer metadata type. I like
> > > > > > >> > > > > > > that.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Thanks David!
> > > > > > >> > > > > > > Nate
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > On Wed, Mar 3, 2021 at 3:28 PM David Li <lidavidm@apache.org>
> > > > > > >> > > > > > > wrote:
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > > Hey Nate,
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Thanks for sharing this & for the detailed docs and writeup.
> > > > > > >> > > > > > > > I think your use case is interesting, but I'd like to clarify
> > > > > > >> > > > > > > > a few things.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > I would say Arrow Flight doesn't try to impose a particular
> > > > > > >> > > > > > > > model, but I agree that Barrage does things that aren't easily
> > > > > > >> > > > > > > > doable with Flight. Flight does name concepts in a way that
> > > > > > >> > > > > > > > suggests how to apply it to something that looks like a
> > > > > > >> > > > > > > > database, but you can mostly think of Flight as an efficient
> > > > > > >> > > > > > > > way to transfer Arrow data over the network upon which you can
> > > > > > >> > > > > > > > layer further semantics.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > However, if I understand right, you're sending data without a
> > > > > > >> > > > > > > > fixed schema, in the sense that each BarrageRecordBatch may
> > > > > > >> > > > > > > > have only a subset of the columns declared up front, or may
> > > > > > >> > > > > > > > carry new columns? I think this is the main thing you can't
> > > > > > >> > > > > > > > easily do currently, as Flight (and Arrow IPC in general)
> > > > > > >> > > > > > > > assumes a fixed schema (and expects all columns in a batch to
> > > > > > >> > > > > > > > have the same length).
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Otherwise, the encoding for identifying rows and changes is
> > > > > > >> > > > > > > > interesting, but I don't quite see why it couldn't be carried
> > > > > > >> > > > > > > > as metadata on the side of a record batch, instead of having
> > > > > > >> > > > > > > > to duplicate the record batch structure, except for the
> > > > > > >> > > > > > > > aforementioned schema issue. And in that case it might be
> > > > > > >> > > > > > > > better to work out the schema evolution issue & any ergonomic
> > > > > > >> > > > > > > > issues with Flight's existing metadata fields/API that would
> > > > > > >> > > > > > > > prevent you from using them, as that way you (and we!) don't
> > > > > > >> > > > > > > > have to fully duplicate one of Arrow's format definitions.
> > > > > > >> > > > > > > > Similarly, Flight already has a bidirectional streaming
> > > > > > >> > > > > > > > endpoint, DoExchange, that allows arbitrary payloads (with
> > > > > > >> > > > > > > > mixed metadata/data or only one of the two), which seems like
> > > > > > >> > > > > > > > it should be able to cover the SubscriptionRequest endpoint.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Best,
> > > > > > >> > > > > > > > David
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > On Wed, Mar 3, 2021, at 16:08, Nate Bauernfeind wrote:
> > > > > > >> > > > > > > > > Hello,
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > My colleagues at Deephaven Data Labs and I have been
> > > > > > >> > > > > > > > > addressing problems at the intersection of data-driven
> > > > > > >> > > > > > > > > applications, data science, and updating (/ticking) data for
> > > > > > >> > > > > > > > > some years.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Deephaven has a query engine that supports updating tabular
> > > > > > >> > > > > > > > > data via a protocol that communicates precise changes about
> > > > > > >> > > > > > > > > datasets, such as 1) which rows were removed, 2) which rows
> > > > > > >> > > > > > > > > were added, 3) which rows were modified (and for which
> > > > > > >> > > > > > > > > columns). We are inspired by Arrow and would like to adopt a
> > > > > > >> > > > > > > > > version of this protocol that adheres to goals similar to
> > > > > > >> > > > > > > > > Arrow and Arrow Flight.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Out of the box, Arrow Flight is insufficient to represent
> > > > > > >> > > > > > > > > such a stream of changes. For example, because you cannot
> > > > > > >> > > > > > > > > identify a particular row within an Arrow Flight, you cannot
> > > > > > >> > > > > > > > > indicate which rows were removed or modified.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > The project integrates with Arrow Flight at the
> > > > > > >> > > > > > > > > header-metadata level. We have preliminarily named the project
> > > > > > >> > > > > > > > > Barrage as in a "barrage of arrows" which plays in the same
> > > > > > >> > > > > > > > > "namespace" as a "flight of arrows."
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > We built this as part of an initiative to modernize and open
> > > > > > >> > > > > > > > > up our table IPC mechanisms. This is part of a larger open
> > > > > > >> > > > > > > > > source effort which will become more visible in the next
> > > > > > >> > > > > > > > > month or so once we've finished the work necessary to share
> > > > > > >> > > > > > > > > our core software components, including a unified static and
> > > > > > >> > > > > > > > > real time query engine complete with data visualization
> > > > > > >> > > > > > > > > tools, a REPL experience, Jupyter integration, and more.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > I would like to find out:
> > > > > > >> > > > > > > > > - if we have understood the primary goals of Arrow, and are
> > > > > > >> > > > > > > > > honoring them as closely as possible
> > > > > > >> > > > > > > > > - if there are other projects that might benefit from sharing
> > > > > > >> > > > > > > > > this extension of Arrow Flight
> > > > > > >> > > > > > > > > - if there are any gaps that are best addressed early on to
> > > > > > >> > > > > > > > > maximize future compatibility
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > A great place to digest the concepts that differ from Arrow
> > > > > > >> > > > > > > > > Flight is here:
> > > > > > >> > > > > > > > > https://deephaven.github.io/barrage/Concepts.html
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > The proposed protocol can be perused here:
> > > > > > >> > > > > > > > > https://github.com/deephaven/barrage
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Internally, we already have a Java server and Java client
> > > > > > >> > > > > > > > > implemented as a working proof of concept for our use case.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > I really look forward to your feedback; thank you!
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Nate Bauernfeind
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Deephaven Data Labs - https://deephaven.io/
> > > > > > >> > > > > > > > > --
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > --
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > --
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > >
> >
>

Re: [Flight Extension] Request for Comments

Posted by Nate Bauernfeind <na...@deephaven.io>.

The thread isn't stale, and this is an appropriate question.

Caveat: I have not yet finished applying the feedback from this thread, so
some of what I say below is not yet reflected in the OSS offering (nor in
the existing main branch of the barrage repo).

IMO there are two kinds of listener patterns:

1) The listener who wants to listen straight from Arrow. This listener
would open an Arrow Flight DoExchange to initiate the subscription (the
details of the subscription are stored in a SubscriptionRequest flatbuffer
encoded in the app_metadata of the FlightData sent by the client). An
update is a set of sequential RecordBatches. The first RecordBatch of an
update carries a BarrageUpdateMetadata flatbuffer in its app_metadata
(this includes information like which rows were added, modified, removed,
etc.). This metadata includes the number of add record batches and the
number of mod record batches. First the add batches come, then the mod
batches. In aggregate, this set of record batches represents a full
update. Thus, the listener could receive the set of batches and the
metadata for the update all at once.

2) The listener who wants a shared object to maintain the data in the
subscription (which might be a subscription on the whole table). The
shared object provides a lighter callback that only describes which rows
changed, not the data, since the shared object can be asked for that. This
pattern is ideal if you have multiple listeners for the same set of data.
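The shared-object pattern can be sketched as a view that owns the row data and notifies listeners with row keys only. This is an illustration under simplifying assumptions (rows addressed by stable integer keys, removals applied before additions within one update); `SharedTableView` and its callback signature are hypothetical and do not mirror the details of Deephaven's actual listener interface.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Mapping, Tuple

Row = Tuple[object, ...]
# Hypothetical callback: (added, removed, modified) row keys, no row data.
Listener = Callable[[FrozenSet[int], FrozenSet[int], FrozenSet[int]], None]


class SharedTableView:
    """Owns the subscribed rows; listeners get row keys and call back for data."""

    def __init__(self) -> None:
        self._rows: Dict[int, Row] = {}
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def __contains__(self, key: int) -> bool:
        return key in self._rows

    def get(self, key: int) -> Row:
        return self._rows[key]

    def apply_update(self, removed: Iterable[int], added: Mapping[int, Row],
                     modified: Mapping[int, Row]) -> None:
        removed_keys = frozenset(removed)
        # Apply the whole update before notifying anyone, so every listener
        # observes the same post-update state when it calls get().
        for key in removed_keys:
            del self._rows[key]
        self._rows.update(added)
        self._rows.update(modified)
        for listener in self._listeners:
            listener(frozenset(added), removed_keys, frozenset(modified))
```

Since the notification carries only keys, adding another listener costs one callback per update rather than another copy of the data, which is the multi-listener benefit described above.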

Our Java client adopted approach #2; our OSS offering contains a Java
worker process that executes table operations. The table operations are
applied iteratively using an update mechanism very similar to the IPC
format. Our Java client implementation allows you to pump the results of a
subscription into the local query engine's update mechanism; this
effectively chains multiple workers together.

Here is that implementation; note that it is in Java, it doesn't
technically use the official implementation of Arrow Flight, and it
doesn't reflect all of the feedback we would like to apply.
https://github.com/nbauernfeind/deephaven-core/blob/doput/grpc-api-client/src/main/java/io/deephaven/grpc_api_client/table/BarrageSourcedTable.java

Our query engine listener interface is here:
https://github.com/deephaven/deephaven-core/blob/main/DB/src/main/java/io/deephaven/db/v2/ShiftAwareListener.java.
Again, it is very similar to the IPC format, just without the accompanying
data; the actual row data is intended to be accessed via other APIs.

For our C++ client, we are planning on stopping at the BarrageSourcedTable
equivalent. Our users could choose between the Arrow Flight stream or the
slightly more language-friendly version that maintains the view. However,
if they want to do any client-side analysis of the data, they are on their
own (no filtering, no aggregations, no ticking aside from the gRPC
subscription, etc.).

Nate

P.S. While the deephaven-core repo is technically public, it is relatively
young and will move through a few API-breaking changes over the next few
months. (For example, applying the feedback from earlier in the thread will
break some of what exists today.)

On Tue, Jun 1, 2021 at 9:43 PM Paul Whalen <pg...@gmail.com> wrote:

> Hopefully this thread isn't too stale to pick back up with an open ended
> question.  What interface would a Barrage client library expose?  With
> Flight, application code cares about RecordBatches, but with Barrage it
> seems as though a client library ought to handle the updating of the table
> and expose that updated view to a client application.  But what
> specifically would that view be?
>
> In the last few months I've built out some Flight services that would
> benefit from a protocol like Barrage, and it renewed my interest enough to
> casually start a Go implementation based on Nate's documentation, just as a
> way of wrapping my head around the problem.  I was watching the repo Nate
> shared which ultimately led to the Java implementation embedded in
> Deephaven's open source offering, but since that is part of a larger
> application, it's a little hard to tell where the lines would be drawn.
>
> Paul
>
> On Tue, Mar 9, 2021 at 9:45 PM Micah Kornfield <em...@gmail.com>
> wrote:
>
> > >
> > > As for schema evolution, I agree with what Micah proposes as a first
> > > step. That would again add some overhead, perhaps. As for feasibility,
> > > at least on the C++/Python side, I think there would be a decent amount
> > > of refactoring needed, and there's also the question of how to expose
> > > this in the API - the APIs there are based on reader/writer interfaces
> > > that don't expose schema evolution.
> >
> > One more option that might be too slow, is if a schema change is
> > necessary, a new flight endpoint is communicated and a new RPC is used?
> > (reusing the same underlying channel could mitigate some performance
> > issues here).
> >
> > On Tue, Mar 9, 2021 at 3:17 PM David Li <li...@apache.org> wrote:
> >
> > > There's not really any convention for the app_metadata field or any of
> > > the other application-defined fields (e.g. DoAction, Criteria). That
> > > said, I wouldn't necessarily worry about conflicting with other
> > > projects - if a client connects to a Barrage service, presumably it
> > > knows what to expect. And an arbitrary Flight client connecting to an
> > > arbitrary Flight server isn't really something we've thought about. For
> > > instance, see the Flight SQL proposal on this mailing list, which
> > > similarly defines expected message formats and schemas for various
> > > fields - but doesn't provide any sort of reflection or way for a
> > > completely generic client to discover what's going on from first
> > > principles. (There is no OpenAPI/Swagger for Flight!)
> > >
> > > As for schema evolution, I agree with what Micah proposes as a first
> > > step. That would again add some overhead, perhaps. As for feasibility,
> > > at least on the C++/Python side, I think there would be a decent amount
> > > of refactoring needed, and there's also the question of how to expose
> > > this in the API - the APIs there are based on reader/writer interfaces
> > > that don't expose schema evolution.
> > >
> > > It may be cleaner on the Java side given you've poked there already.
> > > That said, even if the Flight API is flexible but not so convenient,
> > > presumably part of the value of Barrage is to take that and present a
> > > clean interface with a stable schema again.
> > >
> > > Best,
> > > David
> > >
> > > On Tue, Mar 9, 2021, at 00:03, Micah Kornfield wrote:
> > > > >
> > > > > You know what? This is actually a nicer solution than I am giving it
> > > > > credit for. I've been trying to think about how to handle the
> > > > > Integer.MAX_VALUE limit that arrow strongly suggests to maintain
> > > > > compatibility with Java, while still respecting the need to apply an
> > > > > update atomically.
> > > >
> > > > For Flight, the constraint actually is a maximum of a 32-bit length
> > > > payload (I don't recall exactly if it is 2GB or 4GB but either way, you
> > > > are probably going to run into issues sending a single payload anywhere
> > > > near that large).
> > > >
> > > > > Are you suggesting this pattern of messages per incremental update?
> > > > > - FlightData with [the new] metadata header that includes
> > > > > added/removed/modified information, the number of add record
> > > > > batches, and the number of modified record batches. Noting that
> > > > > there could be more than one record batch per added or modified to
> > > > > enable serializing more than 2^31-1 rows in a single update. Also
> > > > > noting that it would have an empty body (similar to Schema).
> > > > > - A set of FlightData record batches using the normal RecordBatch
> > > > > flatbuffer.
> > > > > - A set of FlightData record batches also using the normal
> > > > > RecordBatch flatbuffer.
> > > >
> > > >
> > > > I haven't thought about this too deeply. I think depending on recovery
> > > > needs it could differ.  One place to start is to avoid the extra
> > > > metadata message, and just have a marker bit indicating there are more
> > > > messages coming that are required to be in this transaction, and
> > > > another bit/value indicating the end of the transaction.
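Micah's marker-bit idea, sketched with a single hypothetical `end_of_transaction` flag per message standing in for the two markers he describes (in a real design the flag would presumably travel in app_metadata). The consumer buffers messages and only releases them once the transaction is closed, so they can be applied atomically. `Chunk` and `transactions` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Chunk:
    rows: List[int]
    # Hypothetical flag: True on the last message of a transaction.
    end_of_transaction: bool


def transactions(chunks: Iterable[Chunk]) -> Iterator[List[int]]:
    """Buffer chunks until an end-of-transaction marker, then release them
    together so the consumer can apply the whole transaction atomically."""
    pending: List[int] = []
    open_tx = False
    for chunk in chunks:
        pending.extend(chunk.rows)
        open_tx = True
        if chunk.end_of_transaction:
            yield pending
            pending, open_tx = [], False
    if open_tx:
        raise ValueError("stream ended inside an unterminated transaction")
```

Compared with a leading metadata message that announces batch counts, this framing needs no extra message per update, but the receiver cannot know up front how much data to expect.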
> > > >
> > > > > My biggest concern with this approach is that small updates are
> > > > > likely going to have significant overhead. Maybe it won't matter, but
> > > > > it is the first thing that jumps out. We do typically coalesce updates
> > > > > somewhere between 50ms and 1s depending on the sensitivity of the
> > > > > listener; so maybe that's enough to eliminate my concern. I might just
> > > > > need to get data/statistics to get a better feeling for this concern.
> > > >
> > > > I think this is definitely something to measure.  I wouldn't expect the
> > > > performance differential to be that large.
> > > >
> > > > Regarding the schema evolution idea:
> > > > > What can I do to get started? Does it make sense to target the
> > > > > feature as a new field in the protobuf so that it can be used in
> > > > > contexts with other header metadata types? Do you have time to riff on
> > > > > the format that will apply to the other contexts? I believe all I
> > > > > would need is a bitset identifying which columns are included, but if
> > > > > enabling/disabling features is a nice-to-have then a bitset is going
> > > > > to be a bit weak. I can also, for now, cheat and send empty field
> > > > > nodes and empty buffers for those columns (but I am, already,
> > > > > slightly concerned with overhead).
> > > >
> > > > I think David might be able to give more guidance.  My recollection of
> > > > the library specifics is hazy, but I think we could potentially just
> > > > interpret a new schema arriving as indicating that all record batches
> > > > after that schema follow the new schema.  Would that work for your use
> > > > case?  David would probably be able to give guidance on how feasible a
> > > > change like that would be.  Typically, before we officially alter the
> > > > specification we want to see working implementations in Java and C++
> > > > that pass an integration test.  But I think we can figure out the
> > > > specifics here if we can understand concrete requirements.
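A sketch of the "a new schema applies to every later batch" interpretation, assuming schemas and batches arrive as simple in-order messages. `SchemaMsg`, `BatchMsg`, and `read_with_schema_evolution` are hypothetical; as David notes above, the existing reader/writer APIs don't expose schema evolution, so this only illustrates the proposed semantics.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Tuple, Union


@dataclass
class SchemaMsg:
    columns: Tuple[str, ...]


@dataclass
class BatchMsg:
    data: Dict[str, List[object]]  # column name -> values


def read_with_schema_evolution(
        msgs: Iterable[Union[SchemaMsg, BatchMsg]]
) -> Iterator[Tuple[Tuple[str, ...], Dict[str, List[object]]]]:
    """Yield (schema, batch) pairs; each schema message governs every
    batch that follows it until the next schema message."""
    schema: Optional[Tuple[str, ...]] = None
    for msg in msgs:
        if isinstance(msg, SchemaMsg):
            schema = msg.columns
            continue
        if schema is None:
            raise ValueError("record batch arrived before any schema")
        if tuple(msg.data) != schema:
            raise ValueError(f"batch columns {tuple(msg.data)} != schema {schema}")
        yield schema, msg.data
```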
> > > >
> > > > On Mon, Mar 8, 2021 at 6:42 PM Nate Bauernfeind <
> > > > natebauernfeind@deephaven.io> wrote:
> > > >
> > > > > > note that FlightData already has a separate app_metadata field
> > > > >
> > > > > That is an interesting point; are there any conventions on how to use
> > > > > the app_metadata compatibly without stepping on other ideas/projects
> > > > > doing the same? It would be convenient for the server to verify that
> > > > > the client is making the request that the server interprets. Do
> > > > > projects use a magic number prefix? Or possibly is there some sort of
> > > > > common header? I suspect that other projects may benefit from having
> > > > > the ability to publish incremental updates, too. So, I'm just curious
> > > > > if there is any pre-existing domain-knowledge in this respect.
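Nate's magic-number question, sketched: a hypothetical 4-byte tag plus a little-endian version field prepended to whatever bytes a service puts in app_metadata. As David's reply above says, Flight defines no such convention; `BARRAGE_MAGIC`, `wrap_app_metadata`, and `unwrap_app_metadata` are made up for illustration.

```python
import struct
from typing import Optional, Tuple

# Hypothetical tag; nothing in Flight or Barrage defines this value.
BARRAGE_MAGIC = b"BRGE"
_HEADER_LEN = len(BARRAGE_MAGIC) + 2  # magic + uint16 version


def wrap_app_metadata(payload: bytes, version: int = 1) -> bytes:
    """Prefix a payload with the magic tag and a little-endian uint16 version."""
    return BARRAGE_MAGIC + struct.pack("<H", version) + payload


def unwrap_app_metadata(buf: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (version, payload), or None if the buffer isn't tagged as ours."""
    if len(buf) < _HEADER_LEN or not buf.startswith(BARRAGE_MAGIC):
        return None
    (version,) = struct.unpack_from("<H", buf, len(BARRAGE_MAGIC))
    return version, buf[_HEADER_LEN:]
```

A server could reject requests whose app_metadata fails to unwrap, which gives the "verify the client is making the request the server interprets" check at the cost of six bytes per message.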
> > > > >
> > > > > Nate
> > > > >
> > > > > On Mon, Mar 8, 2021 at 1:55 PM David Li <li...@apache.org> wrote:
> > > > >
> > > > > > Hey - pretty much, I think. I'd just like to note that FlightData
> > > > > > already has a separate app_metadata field, for metadata on top of
> > > > > > any Arrow-level data, so you could ship the Barrage metadata
> > > > > > alongside the first record batch, without having to modify anything
> > > > > > about the record batch itself, and without having to define a new
> > > > > > metadata header at the Arrow level - everything could be implemented
> > > > > > on top of the existing definitions.
> > > > > >
> > > > > > David
> > > > > >
> > > > > > On Sat, Mar 6, 2021, at 01:07, Nate Bauernfeind wrote:
> > > > > > > Eww. I didn't specify why I had two sets of record batches.
> > > > > > > Slightly revised:
> > > > > > >
> > > > > > > Are you suggesting this pattern of messages per incremental update?
> > > > > > > - FlightData with [the new] metadata header that includes
> > > > > > > added/removed/modified information, the number of add record
> > > > > > > batches, and the number of modified record batches. Noting that
> > > > > > > there could be more than one record batch per added or modified to
> > > > > > > enable serializing more than 2^31-1 rows in a single update. Also
> > > > > > > noting that it would have an empty body (similar to Schema).
> > > > > > > - A set of FlightData record batches using the normal RecordBatch
> > > > > > > flatbuffer for added rows.
> > > > > > > - A set of FlightData record batches also using the normal
> > > > > > > RecordBatch flatbuffer for modified rows.
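The row-count splitting described in this message can be sketched as follows, assuming the added and modified rows are already materialized in memory. `split_into_batches` and `update_plan` are hypothetical helpers; `update_plan` returns the two counts the leading metadata message would announce, plus the add and mod batches in sending order. Note that it caps rows only: as Micah points out, the real constraint is the byte length of each payload, so a practical splitter would also bound bytes per batch.

```python
from typing import Dict, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Java-compatible cap discussed in the thread: at most 2^31 - 1 rows per batch.
MAX_ROWS_PER_BATCH = 2**31 - 1


def split_into_batches(rows: Sequence[T],
                       max_rows: int = MAX_ROWS_PER_BATCH) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most max_rows rows each."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    for start in range(0, len(rows), max_rows):
        yield rows[start:start + max_rows]


def update_plan(added: Sequence[T], modified: Sequence[T],
                max_rows: int = MAX_ROWS_PER_BATCH
                ) -> Tuple[Dict[str, int], List[Sequence[T]], List[Sequence[T]]]:
    add_batches = list(split_into_batches(added, max_rows))
    mod_batches = list(split_into_batches(modified, max_rows))
    # Hypothetical header fields mirroring "the number of add record batches,
    # and the number of modified record batches".
    header = {"num_add_batches": len(add_batches),
              "num_mod_batches": len(mod_batches)}
    return header, add_batches, mod_batches
```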
> > > > > > >
> > > > > > > On Fri, Mar 5, 2021 at 11:00 PM Nate Bauernfeind <
> > > > > > > natebauernfeind@deephaven.io> wrote:
> > > > > > >
> > > > > > > > > It seems that atomic application could also be something
> > > > > > > > > controlled in metadata (i.e. this is batch 1 of X)?
> > > > > > > >
> > > > > > > > You know what? This is actually a nicer solution than I am giving
> > > > > > > > it credit for. I've been trying to think about how to handle the
> > > > > > > > Integer.MAX_VALUE limit that arrow strongly suggests to maintain
> > > > > > > > compatibility with Java, while still respecting the need to apply
> > > > > > > > an update atomically.
> > > > > > > >
> > > > > > > > Alright, yeah, I'm game with this approach.
> > > > > > > >
> > > > > > > > > Right - presumably this could go in the Flight metadata instead
> > > > > > > > > of having to be inlined into the batch's metadata.
> > > > > > > >
> > > > > > > > I'm not sure I follow. These fields (addedRows, addedRowsIncluded,
> > > > > > > > removedRows, modifiedRows, and modifiedRowsIncluded) apply only to
> > > > > > > > a specific atomic incremental update. For a given update these are
> > > > > > > > the indices for the rows that were added/removed/modified -- and
> > > > > > > > therefore cannot be part of the "global" Flight metadata.
> > > > > > > >
> > > > > > > > Are you suggesting this pattern of messages per incremental update?
> > > > > > > > - FlightData with [the new] metadata header that includes
> > > > > > > > added/removed/modified information, the number of add record
> > > > > > > > batches, and the number of modified record batches. Noting that
> > > > > > > > there could be more than one record batch per added or modified to
> > > > > > > > enable serializing more than 2^31-1 rows in a single update. Also
> > > > > > > > noting that it would have an empty body (similar to Schema).
> > > > > > > > - A set of FlightData record batches using the normal RecordBatch
> > > > > > > > flatbuffer.
> > > > > > > > - A set of FlightData record batches also using the normal
> > > > > > > > RecordBatch flatbuffer.
> > > > > > > >
> > > > > > > > My biggest concern with this approach is that small updates are
> > > > > > > > likely going to have significant overhead. Maybe it won't matter,
> > > > > > > > but it is the first thing that jumps out. We do typically coalesce
> > > > > > > > updates somewhere between 50ms and 1s depending on the sensitivity
> > > > > > > > of the listener; so maybe that's enough to eliminate my concern. I
> > > > > > > > might just need to get data/statistics to get a better feeling for
> > > > > > > > this concern.
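Coalescing over a 50ms to 1s window means folding several updates into one net change before sending. A sketch, assuming rows are identified by stable keys and that within a single update removals are applied before additions; `Delta` and `coalesce` are hypothetical names, not part of Barrage.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Delta:
    added: Set[int] = field(default_factory=set)
    removed: Set[int] = field(default_factory=set)
    modified: Set[int] = field(default_factory=set)


def coalesce(acc: Delta, nxt: Delta) -> Delta:
    """Fold `nxt` (which happened after `acc`) into one net change relative
    to the state before `acc`."""
    out = Delta(set(acc.added), set(acc.removed), set(acc.modified))
    for k in nxt.removed:
        if k in out.added:
            out.added.discard(k)      # added then removed: net no-op
        else:
            out.modified.discard(k)   # pre-existing row is now simply gone
            out.removed.add(k)
    for k in nxt.added:
        if k in out.removed:
            out.removed.discard(k)    # removed then re-added: net modify
            out.modified.add(k)
        else:
            out.added.add(k)
    for k in nxt.modified:
        if k not in out.added:        # modifying a fresh add is still an add
            out.modified.add(k)
    return out
```

Coalescing shrinks the per-window message count to one update, which is one way to amortize the per-update overhead raised above.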
> > > > > > > >
> > > > > > > > Regarding the schema evolution idea:
> > > > > > > > What can I do to get started? Does it make sense to target the
> > > > > > > > feature as a new field in the protobuf so that it can be used in
> > > > > > > > contexts with other header metadata types? Do you have time to riff
> > > > > > > > on the format that will apply to the other contexts? I believe all
> > > > > > > > I would need is a bitset identifying which columns are included,
> > > > > > > > but if enabling/disabling features is a nice-to-have then a bitset
> > > > > > > > is going to be a bit weak. I can also, for now, cheat and send
> > > > > > > > empty field nodes and empty buffers for those columns (but I am,
> > > > > > > > already, slightly concerned with overhead).
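The column bitset mentioned here could look like this: one bit per schema column, packed least-significant-bit first (the same bit order Arrow uses for validity bitmaps). `encode_columns` and `decode_columns` are hypothetical helpers, not part of any published Barrage format.

```python
from typing import List, Sequence


def encode_columns(schema: Sequence[str], subscribed: Sequence[str]) -> bytes:
    """Pack a bitset where bit i is set iff schema[i] is subscribed."""
    index = {name: i for i, name in enumerate(schema)}
    bits = bytearray((len(schema) + 7) // 8)
    for name in subscribed:
        i = index[name]  # raises KeyError for a column not in the schema
        bits[i // 8] |= 1 << (i % 8)
    return bytes(bits)


def decode_columns(schema: Sequence[str], bitset: bytes) -> List[str]:
    """Return the subscribed column names, in schema order."""
    return [name for i, name in enumerate(schema)
            if (bitset[i // 8] >> (i % 8)) & 1]
```

A bitset costs one bit per declared column regardless of how many are subscribed, which is compact but, as noted above, has no room for per-column options.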
> > > > > > > >
> > > > > > > > So, based on the feedback so far, I should be able to boil down
> > > > > > > > the way I integrate with Arrow to, more or less, a pair of
> > > > > > > > flatbuffers. I'm going to start riffing on these changes and see
> > > > > > > > where I end up. Feel free to jump up and down if I misunderstood
> > > > > > > > you.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Mar 5, 2021 at 9:23 PM Micah Kornfield <emkornfield@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> >
> > > > > > > >> > And then having two sets of buffers, is the same as having two
> > > > > > > >> > record batches, albeit you need both sets to be delivered
> > > > > > > >> > together, as noted.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> It seems that atomic application could also be something controlled
> > > > > > > >> in metadata (i.e. this is batch 1 of X)?
> > > > > > > >>
> > > > > > > >> The schema evolution question is interesting, it could be useful in
> > > > > > > >> other contexts as well.  (e.g. switching dictionary encoding on/off).
> > > > > > > >>
> > > > > > > >> -Micah
> > > > > > > >>
> > > > > > > >>
> > > > > > > > >> On Fri, Mar 5, 2021 at 11:42 AM David Li <lidavidm@apache.org>
> > > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > (responses inline)
> > > > > > > >> >
> > > > > > > >> > On Thu, Mar 4, 2021, at 17:26, Nate Bauernfeind wrote:
> > > > > > > >> > > Regarding the BarrageRecordBatch:
> > > > > > > >> > >
> > > > > > > > >> > > I have been concatenating them; it’s one batch with two sets
> > > > > > > > >> > > of arrow payloads. They don’t have separate metadata headers;
> > > > > > > > >> > > the update is to be applied atomically. I have only studied
> > > > > > > > >> > > the Java Arrow Flight implementation, and I believe it is
> > > > > > > > >> > > usable maybe with some minor changes. The piece of code in
> > > > > > > > >> > > Flight that does the deserialization takes two parallel
> > > > > > > > >> > > lists/iterators, a `Buffer` list (these describe the length
> > > > > > > > >> > > of a section of the body payload) and a `FieldNode` list
> > > > > > > > >> > > (these describe num rows and null_count). Each field node is
> > > > > > > > >> > > 2-3 buffers depending on schema type. Buffers are allowed to
> > > > > > > > >> > > have length of 0, to omit their payloads; this, for example,
> > > > > > > > >> > > is how you omit the validity buffer when null_count is zero.
> > > > > > > >> > >
> > > > > > > > >> > > The proposed barrage payload keeps this structural pattern
> > > > > > > > >> > > (list of buffer, list of field node) with the following
> > > > > > > > >> > > modifications:
> > > > > > > > >> > > - we only include field nodes / buffers for subscribed columns
> > > > > > > > >> > > - the first set of field nodes are for added rows; these may
> > > > > > > > >> > >   be omitted if there are no added rows included in the update
> > > > > > > > >> > > - the second set of field nodes are for modified rows; we omit
> > > > > > > > >> > >   columns that have no modifications included in the update
> > > > > > > >> > >
> > > > > > > > >> > > I believe the only thing that is missing is the ability to
> > > > > > > > >> > > control the field types to be deserialized (like a third
> > > > > > > > >> > > list/iterator parallel to field nodes and buffers).
> > > > > > > >> >
> > > > > > > > >> > Right. I think we're on the same page here, but looking at
> > > > > > > > >> > this from different angles. I think being able to control
> > > > > > > > >> > which columns to deserialize/being able to only include a
> > > > > > > > >> > subset of buffers, is essentially equivalent to having a
> > > > > > > > >> > stream with schema evolution. And then having two sets of
> > > > > > > > >> > buffers, is the same as having two record batches, albeit you
> > > > > > > > >> > need both sets to be delivered together, as noted.
> > > > > > > > >> > Regardless, we can work out how to handle this.
> > > > > > > >> >
> > > > > > > >> > >
> > > > > > > > >> > > Note that the BarrageRecordBatch.addedRowsIncluded,
> > > > > > > > >> > > BarrageFieldNode.addedRows, BarrageFieldNode.modifiedRows and
> > > > > > > > >> > > BarrageFieldNode.includedRows (all part of the flatbuffer
> > > > > > > > >> > > metadata) are intended to be used by code one layer of
> > > > > > > > >> > > abstraction higher than the actual wire-format parser. The
> > > > > > > > >> > > parser doesn't really need them except to know which columns
> > > > > > > > >> > > to expect in the payload. Technically, we could encode the
> > > > > > > > >> > > field nodes / buffers as empty, too (but why be wasteful if
> > > > > > > > >> > > this information is already encoded?).
> > > > > > > >> >
> > > > > > > > >> > Right - presumably this could go in the Flight metadata
> > > > > > > > >> > instead of having to be inlined into the batch's metadata.
> > > > > > > >> >
> > > > > > > >> > >
> > > > > > > >> > > Regarding Browser Flight Support:
> > > > > > > >> > >
> > > > > > > > >> > > Was this company FactSet by chance? (I saw they are mentioned
> > > > > > > > >> > > in the JS thread that recently was bumped on the dev list.)
> > > > > > > >> > >
> > > > > > > > >> > > I looked at the ticket and wanted to comment on how we are
> > > > > > > > >> > > handling bi-directional streams for our web-ui. We use Arrow
> > > > > > > > >> > > Flight's concept of Ticket to allow a client to create and
> > > > > > > > >> > > identify temporary state (new tables / views / REPL sessions
> > > > > > > > >> > > / etc). Any bidirectional stream we support also has a
> > > > > > > > >> > > server-streaming only variant with the ability for the client
> > > > > > > > >> > > to attach a Ticket to reference/identify that stream. The
> > > > > > > > >> > > client may then send a message, out-of-band, to the Ticket.
> > > > > > > > >> > > They are sequenced by the client (since gRPC doesn't
> > > > > > > > >> > > guarantee ordered delivery across separate calls) and
> > > > > > > > >> > > delivered to the piece of code controlling that
> > > > > > > > >> > > server-stream. It does require that the server be a bit
> > > > > > > > >> > > stateful; but it works =).
> > > > > > > >> >
> > > > > > > > >> > I still can't figure out who it was and now I wonder if it was
> > > > > > > > >> > all in my imagination. I'm hoping they'll see this and chime
> > > > > > > > >> > in, in the spirit of community participation :)
> > > > > > > >> >
> > > > > > > > >> > I agree bidirectionality will be a challenge. I think
> > > > > > > > >> > WebSockets has been proposed as well, but that is also
> > > > > > > > >> > stateful (well, as soon as you have bidirectionality, you're
> > > > > > > > >> > going to have statefulness).
> > > > > > > >> >
> > > > > > > >> > >
> > > > > > > > >> > > On Thu, Mar 4, 2021 at 6:58 AM David Li <lidavidm@apache.org>
> > > > > > > > >> > > wrote:
> > > > > > > >> > >
> > > > > > > > >> > > > Re: the multiple batches, that makes sense. In that case,
> > > > > > > > >> > > > depending on how exactly the two record batches are laid
> > > > > > > > >> > > > out, I'd suggest considering a Union of Struct columns
> > > > > > > > >> > > > (where a Struct is essentially interchangeable with a record
> > > > > > > > >> > > > batch or table) - that would let you encode two distinct
> > > > > > > > >> > > > record batches inside the same physical batch. Or if the two
> > > > > > > > >> > > > batches have identical schemas, you could just concatenate
> > > > > > > > >> > > > them and include indices in your metadata.
> > > > > > > >> > > >
> > > > > > > > >> > > > As for browser Flight support - there's an existing ticket:
> > > > > > > > >> > > > https://issues.apache.org/jira/browse/ARROW-9860
> > > > > > > >> > > >
> > > > > > > > >> > > > I was sure I had seen another organization talking about
> > > > > > > > >> > > > browser support recently, but now I can't find them. I'll
> > > > > > > > >> > > > update here if I do figure it out.
> > > > > > > >> > > >
> > > > > > > >> > > > Best,
> > > > > > > >> > > > David
> > > > > > > >> > > >
> > > > > > > >> > > > On Wed, Mar 3, 2021, at 21:00, Nate Bauernfeind wrote:
> > > > > > > > >> > > > > > if each payload has two batches with different purposes [...]
> > > > > > > >> > > > >
> > > > > > > > >> > > > > The purposes of the payloads are slightly different;
> > > > > > > > >> > > > > however, they are intended to be applied atomically. If
> > > > > > > > >> > > > > there are guarantees by the table operation generating the
> > > > > > > > >> > > > > updates, then those guarantees are only valid on each
> > > > > > > > >> > > > > boundary of applying the update to your local state. In a
> > > > > > > > >> > > > > sense, one is relatively useless without the other. Record
> > > > > > > > >> > > > > batches fit well in map-reduce paradigms / algorithms, but
> > > > > > > > >> > > > > what we have is stateful to enable/support incremental
> > > > > > > > >> > > > > updates. For example, sorting a flight of data is best done
> > > > > > > > >> > > > > map-reduce-style and requires one to re-sort the entire
> > > > > > > > >> > > > > data set when it changes. Our approach focuses on producing
> > > > > > > > >> > > > > incremental updates which are used to manipulate your
> > > > > > > > >> > > > > existing client state using a much smaller footprint (in
> > > > > > > > >> > > > > both time and space). You can imagine, in the sort
> > > > > > > > >> > > > > scenario, if you evaluate the table after adding rows but
> > > > > > > > >> > > > > before modifying existing rows, your table won’t be sorted
> > > > > > > > >> > > > > between the two updates. The client would then need to
> > > > > > > > >> > > > > wait until it receives the pair of RecordBatches anyway, so
> > > > > > > > >> > > > > it seems more natural to deliver them together.
> > > > > > > >> > > > >
> > > > > > > > >> > > > > > As a side note - is said UI browser-based? Another
> > > > > > > > >> > > > > > project recently was planning to look at JavaScript
> > > > > > > > >> > > > > > support for Flight (using WebSockets as the transport,
> > > > > > > > >> > > > > > IIRC) and it might make sense to join forces if that’s a
> > > > > > > > >> > > > > > path you were also going to pursue.
> > > > > > > >> > > > >
> > > > > > > > >> > > > > Yes, our UI runs in the browser, although table operations
> > > > > > > > >> > > > > themselves run on the server to keep the browser lean and
> > > > > > > > >> > > > > fast. That said, the browser isn’t the only target for the
> > > > > > > > >> > > > > API we’re iterating on. We’re engaged in a rewrite to unify
> > > > > > > > >> > > > > our “first-class” Java API for intra-engine (server,
> > > > > > > > >> > > > > heavyweight client) usage and our cross-language
> > > > > > > > >> > > > > (JavaScript/C++/C#/Python) “open” API. Our existing
> > > > > > > > >> > > > > customers use the engine to drive multi-process data
> > > > > > > > >> > > > > applications, REPL/notebook experiences, and dashboards. We
> > > > > > > > >> > > > > are preserving these capabilities as we make the engine
> > > > > > > > >> > > > > available as open source software. One goal of the OSS
> > > > > > > > >> > > > > effort is to produce a singular modern API that’s more
> > > > > > > > >> > > > > interoperable with the data science and development
> > > > > > > > >> > > > > community as a whole. In the interest of minimizing
> > > > > > > > >> > > > > entry/egress points, we are migrating to gRPC for
> > > > > > > > >> > > > > everything in addition to the data IPC layer, so not just
> > > > > > > > >> > > > > the barrage/arrow-flight piece.
> > > > > > > >> > > > >
> > > > > > > > >> > > > > The point of all this is to make the Deephaven engine as
> > > > > > > > >> > > > > accessible as possible for a broad user base, including
> > > > > > > > >> > > > > developers using the API from their language of choice or
> > > > > > > > >> > > > > scripts/code running co-located within an engine process.
> > > > > > > > >> > > > > Our software can be used to explore or build applications
> > > > > > > > >> > > > > and visualizations around static as well as real-time data
> > > > > > > > >> > > > > (imagine joins, aggregations, sorts, filters, time-series
> > > > > > > > >> > > > > joins, etc.), to perform table operations with code or with
> > > > > > > > >> > > > > a few clicks in a GUI, or as a building block in a
> > > > > > > > >> > > > > multi-stage data pipeline. We think making ourselves as
> > > > > > > > >> > > > > interoperable as possible with tools built on Arrow is an
> > > > > > > > >> > > > > important part of attaining this goal.
> > > > > > > >> > > > >
> > > > > > > > >> > > > > That said, we have run into quite a few pain points
> > > > > > > > >> > > > > migrating to gRPC, such as 1) no browser supports
> > > > > > > > >> > > > > client-side streaming, 2) today, server-side streams
> > > > > > > > >> > > > > require a proxy layer of some sort (such as Envoy), 3)
> > > > > > > > >> > > > > FlatBuffers’ JavaScript/TypeScript support is a little
> > > > > > > > >> > > > > weak, and I’m sure there are others that aren’t coming to
> > > > > > > > >> > > > > mind at the moment. We have some interesting solutions to
> > > > > > > > >> > > > > these problems, but, today, these issues are a decent chunk
> > > > > > > > >> > > > > of our focus. That said, the UI is usable today by our
> > > > > > > > >> > > > > enterprise clients, but it interacts with the server over
> > > > > > > > >> > > > > WebSockets and a protocol that is heavily influenced by 10
> > > > > > > > >> > > > > years of existing proprietary Java-to-Java IPC (which is
> > > > > > > > >> > > > > NOT friendly to being robust over intermittent failures).
> > > > > > > > >> > > > > Today, we’re just heads-down going the gRPC route and
> > > > > > > > >> > > > > hoping that eventually browsers get around to better
> > > > > > > > >> > > > > support for some of this stuff (so, maybe one day a proxy
> > > > > > > > >> > > > > isn’t required, etc.). Some of our RPCs make most sense as
> > > > > > > > >> > > > > bidirectional streams, but to support our web-ui we also
> > > > > > > > >> > > > > have a server-streaming variant that we can pass data to
> > > > > > > > >> > > > > “out-of-band” via a unary call referencing the particular
> > > > > > > > >> > > > > server stream. It’s fun stuff! I’m actually very excited
> > > > > > > > >> > > > > about it even if the text doesn’t sound that way =).
> > > > > > > >> > > > >
> > > > > > > > >> > > > > If you can point me to that project/person/post we’d love
> > > > > > > > >> > > > > to get in touch and are excited to share whatever can be
> > > > > > > > >> > > > > shared.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Nate
> > > > > > > >> > > > >
> > > > > > > > >> > > > > On Wed, Mar 3, 2021 at 4:22 PM David Li <lidavidm@apache.org>
> > > > > > > > >> > > > > wrote:
> > > > > > > >> > > > >
> > > > > > > > >> > > > > > Ah okay, thank you for clarifying! In that case, if each
> > > > > > > > >> > > > > > payload has two batches with different purposes - might
> > > > > > > > >> > > > > > it make sense to just make that two different payloads,
> > > > > > > > >> > > > > > and set a flag/enum in the metadata to indicate how to
> > > > > > > > >> > > > > > interpret the batch? Then you'd be officially the same as
> > > > > > > > >> > > > > > Arrow Flight :)
> > > > > > > >> > > > > >
> > > > > > > > >> > > > > > As a side note - is said UI browser-based? Another
> > > > > > > > >> > > > > > project recently was planning to look at JavaScript
> > > > > > > > >> > > > > > support for Flight (using WebSockets as the transport,
> > > > > > > > >> > > > > > IIRC) and it might make sense to join forces if that's a
> > > > > > > > >> > > > > > path you were also going to pursue.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Best,
> > > > > > > >> > > > > > David
> > > > > > > >> > > > > >
> > > > > > > > >> > > > > > On Wed, Mar 3, 2021, at 18:05, Nate Bauernfeind wrote:
> > > > > > > >> > > > > > > Thanks for the interest =).
> > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > > However, if I understand right, you're sending data
> > > > > > > > >> > > > > > > > without a fixed schema [...]
> > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > The dataset does have a known schema ahead of time,
> > > > > > > > >> > > > > > > which is similar to Flight. However, as you point out,
> > > > > > > > >> > > > > > > the subscription can change which columns it is
> > > > > > > > >> > > > > > > interested in without re-acquiring data for columns it
> > > > > > > > >> > > > > > > was already subscribed to. This is mostly for
> > > > > > > > >> > > > > > > convenience. We use it primarily to limit which columns
> > > > > > > > >> > > > > > > are sent to our user interface until the user scrolls
> > > > > > > > >> > > > > > > them into view.
> > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > The enhancement of the RecordBatch here, aside from the
> > > > > > > > >> > > > > > > additional metadata, is only in that the payload has
> > > > > > > > >> > > > > > > two sets of RecordBatch payloads. The first payload is
> > > > > > > > >> > > > > > > for added rows: every added row must send data for each
> > > > > > > > >> > > > > > > subscribed column, so, given the subscribed columns,
> > > > > > > > >> > > > > > > this payload is otherwise fixed width (in the number of
> > > > > > > > >> > > > > > > columns / buffers). The second payload is for modified
> > > > > > > > >> > > > > > > rows. Here we only send the columns that have rows that
> > > > > > > > >> > > > > > > are modified. Aside from this difference, I have been
> > > > > > > > >> > > > > > > aiming to be compatible enough to be able to reuse the
> > > > > > > > >> > > > > > > payload parsing that is already written for Arrow.
> > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > > I don't quite see why it couldn't be carried as
> > > > > > > > >> > > > > > > > metadata on the side of a record batch, instead of
> > > > > > > > >> > > > > > > > having to duplicate the record batch structure [...]
> > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > Whoa, this is a good point. I have iterated on this a
> > > > > > > > >> > > > > > > few times to get it closer to Arrow's setup and did not
> > > > > > > > >> > > > > > > realize that 'BarrageData' is now officially identical
> > > > > > > > >> > > > > > > to `FlightData`. This is an instance of being too close
> > > > > > > > >> > > > > > > to the project and forgetting to step back once in a
> > > > > > > > >> > > > > > > while.
> > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > > Flight already has a bidirectional streaming
> > > > > > > > >> > > > > > > > endpoint, DoExchange, that allows arbitrary payloads
> > > > > > > > >> > > > > > > > (with mixed metadata/data or only one of the two),
> > > > > > > > >> > > > > > > > which seems like it should be able to cover the
> > > > > > > > >> > > > > > > > SubscriptionRequest endpoint.
> > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > This is exactly the kind of feedback I'm looking for!
> > > > > > > > >> > > > > > > I wasn't seeing the solution where the client-side
> > > > > > > > >> > > > > > > stream doesn't actually need a payload and the
> > > > > > > > >> > > > > > > subscription changes can be described with another
> > > > > > > > >> > > > > > > flatbuffer metadata type. I like that.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > Thanks David!
> > > > > > > >> > > > > > > Nate
> > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > On Wed, Mar 3, 2021 at 3:28 PM David Li
> > > > > > > > >> > > > > > > <lidavidm@apache.org> wrote:
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > > Hey Nate,
> > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Thanks for sharing this & for the detailed docs and
> > > > > > > > >> > > > > > > > writeup. I think your use case is interesting, but
> > > > > > > > >> > > > > > > > I'd like to clarify a few things.
> > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > I would say Arrow Flight doesn't try to impose a
> > > > > > > > >> > > > > > > > particular model, but I agree that Barrage does
> > > > > > > > >> > > > > > > > things that aren't easily doable with Flight. Flight
> > > > > > > > >> > > > > > > > does name concepts in a way that suggests how to
> > > > > > > > >> > > > > > > > apply it to something that looks like a database, but
> > > > > > > > >> > > > > > > > you can mostly think of Flight as an efficient way to
> > > > > > > > >> > > > > > > > transfer Arrow data over the network upon which you
> > > > > > > > >> > > > > > > > can layer further semantics.
> > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > However, if I understand right, you're sending data
> > > > > > > > >> > > > > > > > without a fixed schema, in the sense that each
> > > > > > > > >> > > > > > > > BarrageRecordBatch may have only a subset of the
> > > > > > > > >> > > > > > > > columns declared up front, or may carry new columns?
> > > > > > > > >> > > > > > > > I think this is the main thing you can't easily do
> > > > > > > > >> > > > > > > > currently, as Flight (and Arrow IPC in general)
> > > > > > > > >> > > > > > > > assumes a fixed schema (and expects all columns in a
> > > > > > > > >> > > > > > > > batch to have the same length).
> > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Otherwise, the encoding for identifying rows and
> > > > > > > > >> > > > > > > > changes is interesting, but I don't quite see why it
> > > > > > > > >> > > > > > > > couldn't be carried as metadata on the side of a
> > > > > > > > >> > > > > > > > record batch, instead of having to duplicate the
> > > > > > > > >> > > > > > > > record batch structure, except for the aforementioned
> > > > > > > > >> > > > > > > > schema issue. And in that case it might be better to
> > > > > > > > >> > > > > > > > work out the schema evolution issue & any ergonomic
> > > > > > > > >> > > > > > > > issues with Flight's existing metadata fields/API that
> > > > > > > > >> > > > > > > > would prevent you from using them, as that way you
> > > > > > > > >> > > > > > > > (and we!) don't have to fully duplicate one of
> > > > > > > > >> > > > > > > > Arrow's format definitions. Similarly, Flight already
> > > > > > > > >> > > > > > > > has a bidirectional streaming endpoint, DoExchange,
> > > > > > > > >> > > > > > > > that allows arbitrary payloads (with mixed
> > > > > > > > >> > > > > > > > metadata/data or only one of the two), which seems
> > > > > > > > >> > > > > > > > like it should be able to cover the
> > > > > > > > >> > > > > > > > SubscriptionRequest endpoint.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Best,
> > > > > > > >> > > > > > > > David
> > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > On Wed, Mar 3, 2021, at 16:08, Nate Bauernfeind
> > > > > > > > >> > > > > > > > wrote:
> > > > > > > >> > > > > > > > > Hello,
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > My colleagues at Deephaven Data Labs and I have
> > > > > > > > >> > > > > > > > > been addressing problems at the intersection of
> > > > > > > > >> > > > > > > > > data-driven applications, data science, and
> > > > > > > > >> > > > > > > > > updating (/ticking) data for some years.
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Deephaven has a query engine that supports
> > > > > > > > >> > > > > > > > > updating tabular data via a protocol that
> > > > > > > > >> > > > > > > > > communicates precise changes about datasets, such
> > > > > > > > >> > > > > > > > > as 1) which rows were removed, 2) which rows were
> > > > > > > > >> > > > > > > > > added, 3) which rows were modified (and for which
> > > > > > > > >> > > > > > > > > columns). We are inspired by Arrow and would like
> > > > > > > > >> > > > > > > > > to adopt a version of this protocol that adheres to
> > > > > > > > >> > > > > > > > > goals similar to Arrow and Arrow Flight.
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Out of the box, Arrow Flight is insufficient to
> > > > > > > > >> > > > > > > > > represent such a stream of changes. For example,
> > > > > > > > >> > > > > > > > > because you cannot identify a particular row within
> > > > > > > > >> > > > > > > > > an Arrow Flight, you cannot indicate which rows
> > > > > > > > >> > > > > > > > > were removed or modified.
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > The project integrates with Arrow Flight at the
> > > > > > > > >> > > > > > > > > header-metadata level. We have preliminarily named
> > > > > > > > >> > > > > > > > > the project Barrage, as in a "barrage of arrows",
> > > > > > > > >> > > > > > > > > which plays in the same "namespace" as a "flight of
> > > > > > > > >> > > > > > > > > arrows."
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > We built this as part of an initiative to modernize
> > > > > > > > >> > > > > > > > > and open up our table IPC mechanisms. This is part
> > > > > > > > >> > > > > > > > > of a larger open source effort which will become
> > > > > > > > >> > > > > > > > > more visible in the next month or so once we've
> > > > > > > > >> > > > > > > > > finished the work necessary to share our core
> > > > > > > > >> > > > > > > > > software components, including a unified static and
> > > > > > > > >> > > > > > > > > real-time query engine complete with data
> > > > > > > > >> > > > > > > > > visualization tools, a REPL experience, Jupyter
> > > > > > > > >> > > > > > > > > integration, and more.
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > I would like to find out:
> > > > > > > > >> > > > > > > > > - if we have understood the primary goals of Arrow,
> > > > > > > > >> > > > > > > > >   and are honoring them as closely as possible
> > > > > > > > >> > > > > > > > > - if there are other projects that might benefit
> > > > > > > > >> > > > > > > > >   from sharing this extension of Arrow Flight
> > > > > > > > >> > > > > > > > > - if there are any gaps that are best addressed
> > > > > > > > >> > > > > > > > >   early on to maximize future compatibility
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > A great place to digest the concepts that differ
> > > > > > > > >> > > > > > > > > from Arrow Flight is here:
> > > > > > > > >> > > > > > > > > https://deephaven.github.io/barrage/Concepts.html
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > The proposed protocol can be perused here:
> > > > > > > >> > > > > > > > > https://github.com/deephaven/barrage
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Internally, we already have a Java server and Java
> > > > > > > > >> > > > > > > > > client implemented as a working proof of concept
> > > > > > > > >> > > > > > > > > for our use case.
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > I really look forward to your feedback; thank you!
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > Nate Bauernfeind
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > Deephaven Data Labs - https://deephaven.io/
> > > > > > > >> > > > > > > > > --
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > > > --
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > --
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > >
> > >
> >
>


--