Posted to dev@arrow.apache.org by Matt Topol <zo...@gmail.com> on 2024/03/01 22:27:19 UTC

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

> @pgwhalen: As a potential "end user developer," (and aspiring contributor)
> this immediately excited me when I first saw it.

Yay! Good to hear that!

> @pgwhalen: And it wasn't clear to me whether updating batches in place (and
> the producer/consumer coordination that comes with that) was supported or
> encouraged as part of the proposal.

So, updating batches in place was not a particular use case we were
targeting with this approach. Instead, the goal is to use shared memory to
produce and consume the buffers/batches without having to physically copy
the data. Trying to update a batch in place is a dangerous prospect for a
number of reasons, but as you've mentioned it can technically be made safe
if the shape stays the same and you're only modifying fixed-width data types
(i.e. not only is the *shape* unchanged, but the sizes of the underlying
data buffers also remain unchanged). The producer/consumer coordination
that would be needed for updating batches in place is not part of this
proposal, but it is definitely something we can look into as a follow-up
extension. There are a number of discussions that would need to be had
around that, so I don't want to add yet another layer of complexity to this
already complex proposal.
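For illustration only, here is a minimal sketch of the two ideas above: zero-copy sharing of a buffer, and the condition under which an in-place overwrite stays safe. It assumes Python's standard-library `multiprocessing.shared_memory` as a stand-in for a real transport (the proposal does not prescribe it) and a single `int64` column as a stand-in for a whole record batch. The names `FIXED_WIDTH_TYPES`, `can_update_in_place`, `write_int64s` and `read_int64s` are hypothetical helpers, not Arrow APIs, and the sketch deliberately does no producer/consumer coordination.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical set of fixed-width type names used only for this sketch.
# Variable-width types (strings, lists) are excluded because their buffer
# sizes depend on the values, so an update can change the layout.
FIXED_WIDTH_TYPES = {"int32", "int64", "float32", "float64"}


def can_update_in_place(old_type, old_nbytes, new_type, new_nbytes):
    """Hypothetical check: same fixed-width type and same buffer size."""
    return (
        old_type == new_type
        and old_type in FIXED_WIDTH_TYPES
        and old_nbytes == new_nbytes
    )


def write_int64s(buf, values):
    # Pack little-endian int64 values directly into the shared segment.
    struct.pack_into(f"<{len(values)}q", buf, 0, *values)


def read_int64s(buf, count):
    # Decode values straight from the shared mapping; the segment is never
    # copied into a separate bytes object first.
    return list(struct.unpack_from(f"<{count}q", buf, 0))


values = [1, 2, 3, 4]
nbytes = 8 * len(values)

# Producer: create a named segment and write the column buffer into it.
producer = shared_memory.SharedMemory(create=True, size=nbytes)
write_int64s(producer.buf, values)

# Consumer: attach to the same segment by name. It runs in the same process
# here for brevity; normally it would be a separate process.
consumer = shared_memory.SharedMemory(name=producer.name)
print(read_int64s(consumer.buf, len(values)))  # [1, 2, 3, 4]

# In-place update: allowed because the type and buffer size are unchanged.
# No coordination is done here; a real system would need it so that a reader
# never observes a half-written buffer.
new_values = [10, 20, 30, 40]
if can_update_in_place("int64", nbytes, "int64", 8 * len(new_values)):
    write_int64s(producer.buf, new_values)
print(read_int64s(consumer.buf, len(values)))  # [10, 20, 30, 40]

# A variable-width column, or a resized buffer, needs a new batch instead.
print(can_update_in_place("string", 16, "string", 16))  # False
print(can_update_in_place("int64", nbytes, "int64", nbytes + 8))  # False

consumer.close()
producer.close()
producer.unlink()
```

The consumer sees the overwritten values without any new message being sent, which is exactly why the coordination question (who may write, and when a reader may look) has to be settled before in-place updates could be part of the protocol.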

That said, if you or anyone else sees something in this proposal that would
hinder or prevent using it for your use case, please let me know so we can
address it. Even though the proposal as it currently exists doesn't fully
support in-place updating of batches, I don't want to make things harder
for us in a follow-up, where we'd end up requiring an entirely new protocol
to support that.

> @octalene.dev: I know of a third party that is interested in Arrow for
> HPC environments that could be interested in the proposal and I can see if
> they're interested in providing feedback.

Awesome! Thanks much!


For reference, for anyone who hasn't looked at the document in a while:
since the original discussion thread on this, I have added a full
"Background Context" page to the beginning of the proposal. It is meant to
help anyone who isn't already familiar with the issues this protocol is
trying to solve, or with ucx or libfabric transports, to better understand
*why* I'm proposing this and what it is trying to solve. The point of this
background information is to ensure that anyone who has thoughts on
protocols or APIs in general can still understand the base reasons and
goals that we're trying to achieve with this protocol proposal. You don't
need to already understand managing GPU/device memory or ucx to be able to
have meaningful input on the document.

Thanks again to all who have contributed so far, and please spread the word
to any contacts that you think might be interested in this for their
particular use cases.

--Matt

On Wed, Feb 28, 2024 at 1:39 AM Aldrin <oc...@pm.me.invalid> wrote:

> I am interested in this as well, but I haven't gotten to a point where I
> can have valuable input (I haven't tried other transports). I know of a
> third party that is interested in Arrow for HPC environments that could be
> interested in the proposal and I can see if they're interested in providing
> feedback.
>
> I glanced at the document before but I'll go through again to see if there
> is anything I can comment on.
>
>
>
> # ------------------------------
> # Aldrin
>
>
> https://github.com/drin/
> https://gitlab.com/octalene
> https://keybase.io/octalene
>
>
> On Tuesday, February 27th, 2024 at 17:43, Paul Whalen <pg...@gmail.com>
> wrote:
>
> > As a potential "end user developer," (and aspiring contributor) this
> > immediately excited me when I first saw it.
> >
> > I work at a trading firm, and my team has developed an IPC mechanism for
> > efficiently transmitting pandas dataframes both remotely via TCP and
> > locally via shared memory, where the interface for the application
> > developer is the same for both. The data in the dataframes may change
> > rapidly, so when communicating locally via shared memory, if the shape of
> > the dataframe doesn't change, we update the memory in place, coordinating
> > between the producer and consumer via TCP.
> >
> > We intend to move away from our remote TCP mechanism towards Arrow Flight,
> > or a lighter-weight version of Arrow IPC. For the local shared memory
> > mechanism, which we previously did not have a good answer for, it seems
> > like Dissociated Arrow IPC maps quite well to our problem.
> >
> > So some features that enable our use case are:
> > - Updating existing batches in place is supported
> > - The interface is pretty similar to Flight
> >
> > I'd imagine we're not the only financial firm to implement something like
> > this, given how widespread pandas usage is, so that could be a place to
> > seek feedback.
> >
> > As I was reading the proposal initially, I gleaned that the most important
> > audience was those writing interfaces to GPUs/remote memory/non-standard
> > transports/etc. And it wasn't clear to me whether updating batches in
> > place (and the producer/consumer coordination that comes with that) was
> > supported or encouraged as part of the proposal. But regardless, as an end
> > user, this seems like an easier and more efficient way to glue pieces in
> > the Arrow ecosystem together if it was adopted broadly.
> >
> > Paul
> >
> > On Tue, Feb 27, 2024 at 6:05 PM Matt Topol zotthewizard@gmail.com wrote:
> >
> > > I'll continue my efforts of trying to reach out to other interested
> > > parties, but if anyone else here has any contacts or connections that
> > > they think might be interested please forward them the link to the
> > > Google doc.
> > >
> > > I really do want to get as much engagement and feedback as possible on
> > > this.
> > >
> > > Thanks!
> > >
> > > On Tue, Feb 27, 2024, 6:38 PM Wes McKinney wesmckinn@gmail.com wrote:
> > >
> > > > Have there been efforts to proactively reach out to other third parties
> > > > that might have an interest in this or be a potential user at some
> > > > point? There are a lot of interested parties in Arrow that may not
> > > > actively follow the mailing list.
> > > >
> > > > Seems like folks from the Dask, Ray, RAPIDS (especially folks at NVIDIA
> > > > or working on UCX), or other communities like that might have
> > > > constructive thoughts about this. DLPack
> > > > (https://dmlc.github.io/dlpack/latest/) also seems adjacent and worth
> > > > reaching out to. Other ideas for projects or companies that could be
> > > > reached out to for feedback.
> > > >
> > > > On Tue, Feb 27, 2024 at 5:23 PM Antoine Pitrou antoine@python.org
> > > > wrote:
> > > >
> > > > > If there's no engagement, then I'm afraid it might mean that third
> > > > > parties have no interest in this. I don't really have any solution
> > > > > for generating engagement except nagging and pinging people
> > > > > explicitly :-)
> > > > >
> > > > > On 27/02/2024 at 19:09, Matt Topol wrote:
> > > > >
> > > > > > I would like to see the same, Antoine. Currently, given the lack of
> > > > > > engagement (both for OR against), I was going to take the silence as
> > > > > > assent and hope for non-Voltron Data PMC members to vote in this.
> > > > > >
> > > > > > If anyone has any suggestions on how we could potentially generate
> > > > > > more engagement and discussion on this, please let me know, as I want
> > > > > > as many parties in the community as possible to be part of this.
> > > > > >
> > > > > > Thanks everyone.
> > > > > >
> > > > > > --Matt
> > > > > >
> > > > > > On Tue, Feb 27, 2024 at 12:48 PM Antoine Pitrou antoine@python.org
> > > > > > wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I'd really like to see more engagement and criticism from
> > > > > > > non-Voltron Data parties before this is formally adopted as an
> > > > > > > Arrow spec.
> > > > > > >
> > > > > > > Regards
> > > > > > >
> > > > > > > Antoine.
> > > > > > >
> > > > > > > On 27/02/2024 at 18:35, Matt Topol wrote:
> > > > > > >
> > > > > > > > Hey all,
> > > > > > > >
> > > > > > > > I'd like to propose a vote for us to officially adopt the
> > > > > > > > protocol described in the Google doc[1] for Dissociated Arrow
> > > > > > > > IPC Transports. This proposal was originally discussed at [2].
> > > > > > > > Once this proposal is adopted, I will work on adding the
> > > > > > > > necessary documentation to the Arrow website along with
> > > > > > > > examples etc.
> > > > > > > >
> > > > > > > > The vote will be open for at least 72 hours.
> > > > > > > >
> > > > > > > > [ ] +1 Accept this Proposal
> > > > > > > > [ ] +0
> > > > > > > > [ ] -1 Do not accept this proposal because...
> > > > > > > >
> > > > > > > > Thank you everyone!
> > > > > > > >
> > > > > > > > --Matt
> > > > > > > >
> > > > > > > > [1]: https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit#heading=h.38515dnp2bdb

[RESULT] Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by Matt Topol <zo...@gmail.com>.
The vote carries with 3 binding +1 votes, 1 non-binding +1 vote, and no -1
votes.

I'll put together a PR for the Arrow docs laying out the spec and marking
it experimental.

Thanks everyone!

--Matt

On Tue, Apr 2, 2024 at 2:56 PM Weston Pace <we...@gmail.com> wrote:

> Forgot link:
>
> [1] https://developer.mozilla.org/en-US/docs/WebAssembly/JavaScript_interface/Memory
>
> On Tue, Apr 2, 2024 at 11:38 AM Weston Pace <we...@gmail.com> wrote:
>
> > Thanks for taking the time to address my concerns.
> >
> > > I've split the S3/HTTP URI flight pieces out into a separate document
> > > and separate thing to vote on at the request of several people who
> > > wanted to view these as two separate proposals to vote on. So this vote
> > > *only* covers adopting the protocol spec as an "Experimental Protocol"
> > > so we can start seeing real world usage to help refine and improve it.
> > > That said, I believe all clients currently would reject any non-grpc
> > > URI.
> >
> > Ah, I was confused and my comments were mostly about the s3/http
> > proposal.
> >
> > Regarding the proposal at hand, I went through it in more detail.  I
> > don't know much about ucx so I considered two different use cases:
> >
> >  * The previously mentioned shared memory approach.  I think this is
> >    compelling as people have asked about shared memory communication from
> >    time to time and I've always suggested flight over unix sockets though
> >    that forces a copy.
> >  * I think this could also form the basis for large transfers of arrow
> >    data over a wasm boundary.  Wasm has a concept of shared memory
> >    objects[1] and a wasm data library could use this to stream data into
> >    javascript without a copy.
> >
> > I've added a few more questions to the doc.  Either way, if we're only
> > talking about an experimental protocol / suggested recommendation then
> > I'm fine voting +1 on this (I'm not sure a formal vote is even needed).  I
> > would want to see at least 2 implementations if we wanted to remove the
> > experimental label.
> >
> > On Sun, Mar 31, 2024 at 2:43 PM Joel Lubinitsky <jo...@gmail.com>
> > wrote:
> >
> >> +1 to the dissociated transports proposal
> >>
> >> On Sun, Mar 31, 2024 at 11:14 AM David Li <li...@apache.org> wrote:
> >>
> >> > +1 from me as before
> >> >
> >> > On Thu, Mar 28, 2024, at 18:06, Matt Topol wrote:
> >> > >>  There is a word doc with no implementation or PR.  I think there could
> >> > >> be an implementation / PR.
> >> > >
> >> > > In the word doc there is a link to a POC implementation[1] showing
> >> > > this protocol working with a flight service, ucx and libcudf. The key
> >> > > piece here is that we're voting on adopting this protocol spec (i.e.
> >> > > I'll add it to the documentation website) rather than us explicitly
> >> > > providing full implementations or abstractions around it. We can
> >> > > provide reference implementations like the POC, but I don't think they
> >> > > should be in the Arrow monorepo, or else we run the risk of a lot of
> >> > > the same issues that Flight has: i.e. adding anything to Flight in C++
> >> > > requires fully wrapping the grpc/flight primitives with Arrow
> >> > > equivalents to export, which increases the maintenance burden on us
> >> > > and makes it more difficult for users to leverage the underlying knobs
> >> > > and dials.
> >> > >
> >> > >> For example, does any ADBC client respect this protocol today?  If a
> >> > >> flight server responds with an S3/HTTP URI will the ADBC client
> >> > >> download the files from the correct place?  Will it at least notice
> >> > >> that the URI is not a GRPC URI and give a "I don't have a connector for
> >> > >> downloading from HTTP/S3" error?
> >> > >
> >> > > I've split the S3/HTTP URI flight pieces out into a separate document
> >> > > and separate thing to vote on at the request of several people who
> >> > > wanted to view these as two separate proposals to vote on. So this vote
> >> > > *only* covers adopting the protocol spec as an "Experimental Protocol"
> >> > > so we can start seeing real world usage to help refine and improve it.
> >> > > That said, I believe all clients currently would reject any non-grpc
> >> > > URI.
> >> > >
> >> > >>   I was speaking with someone yesterday and they explained that
> >> > >> they ended up not choosing Flight for an internal project because
> >> > >> Flight didn't support something called "cloud fetch" which I have now
> >> > >> learned is
> >> > >
> >> > > I was reading through that link, and it seems like it's pretty much
> >> > > *identical* to Flight as it currently exists, except that it is using
> >> > > cloud storage (S3, GCS, etc.) URIs containing Arrow IPC *files*, rather
> >> > > than a service sitting in front of those serving up Arrow IPC
> >> > > *streams*. This has been requested by others in the community, hence
> >> > > the second proposal that was split out [2].
> >> > >
> >> > >>  So a big +1 for the idea of dissociated transports but I'm not sure
> >> > >> why we need a vote to start working on it (but I'm not opposed if a
> >> > >> vote helps)
> >> > >
> >> > > Mostly I found that the Google doc was easier for iterating on the
> >> > > protocol specification than a markdown PR for the Arrow documentation,
> >> > > as I could express things more visually without needing a preview of
> >> > > the rendered markdown. If it would make people more likely to vote on
> >> > > this, I can write up the documentation markdown now and create a PR
> >> > > rather than waiting until we decide whether we're even going to adopt
> >> > > this protocol as an "official" arrow protocol.
> >> > >
> >> > > Lemme know if there are any other unanswered questions!
> >> > >
> >> > > --Matt
> >> > >
> >> > > [1]: https://github.com/zeroshade/cudf-flight-ucx
> >> > > [2]: https://docs.google.com/document/d/1-x7tHWDzpbgmsjtTUnVXeEO4b7vMWDHTu-lzxlK9_hE/edit#heading=h.ub6lgn7s75tq
> >> > >
> >> > > On Thu, Mar 28, 2024 at 4:53 PM Weston Pace <we...@gmail.com> wrote:
> >> > >
> >> > >> I'm sorry for the very late reply.  Until yesterday I had no real
> >> > >> concept of what this was talking about and so I had stayed out.
> >> > >>
> >> > >> I'm +0 only because it isn't clear what we are voting on.  There is a
> >> > >> word doc with no implementation or PR.  I think there could be an
> >> > >> implementation / PR.  For example, does any ADBC client respect this
> >> > >> protocol today?  If a flight server responds with an S3/HTTP URI will
> >> > >> the ADBC client download the files from the correct place?  Will it at
> >> > >> least notice that the URI is not a GRPC URI and give a "I don't have a
> >> > >> connector for downloading from HTTP/S3" error?  In general, I think we
> >> > >> do want this in Flight (see comments below) and I am very supportive of
> >> > >> the idea.  However, if adopting this as an experimental proposal helps
> >> > >> move this forward then I think that's fine.
> >> > >>
> >> > >> That being said, I do want to express support for the proposal as a
> >> > >> concept, at least the "dissociated transports" portion (I can't speak
> >> > >> to UCX/etc.).  I was speaking with someone yesterday and they explained
> >> > >> that they ended up not choosing Flight for an internal project because
> >> > >> Flight didn't support something called "cloud fetch" which I have now
> >> > >> learned is [1].  I had recalled looking at this proposal before and
> >> > >> this person seemed interested and optimistic to know this was being
> >> > >> considered for Flight.  This proposal, as I understand it, should make
> >> > >> it possible for cloud servers to support a cloud fetch style API.  From
> >> > >> the discussion I got the impression that this cloud fetch approach is
> >> > >> useful and generally applicable.
> >> > >>
> >> > >> So a big +1 for the idea of dissociated transports but I'm not sure why
> >> > >> we need a vote to start working on it (but I'm not opposed if a vote
> >> > >> helps)
> >> > >>
> >> > >> [1] https://www.databricks.com/blog/2021/08/11/how-we-achieved-high-bandwidth-connectivity-with-bi-tools.html
> >> > >>
> >> > >> On Thu, Mar 28, 2024 at 1:04 PM Matt Topol <zotthewizard@gmail.com>
> >> > >> wrote:
> >> > >>
> >> > >> > I'll keep this new vote open for at least the next 72 hours. As
> >> > >> > before, please reply with:
> >> > >> >
> >> > >> > [ ] +1 Accept this Proposal
> >> > >> > [ ] +0
> >> > >> > [ ] -1 Do not accept this proposal because...
> >> > >> >
> >> > >> > Thanks everyone!
> >> > >> >
> >> > >> > On Wed, Mar 27, 2024 at 7:51 PM Benjamin Kietzman <bengilgit@gmail.com>
> >> > >> > wrote:
> >> > >> >
> >> > >> > > +1
> >> > >> > >
> >> > >> > > On Tue, Mar 26, 2024, 18:36 Matt Topol <zotthewizard@gmail.com>
> >> > >> > > wrote:
> >> > >> > >
> >> > >> > > > Should I start a new thread for a new vote? Or repeat the
> >> > >> > > > original vote email here?
> >> > >> > > >
> >> > >> > > > Just asking since there haven't been any responses so far.
> >> > >> > > >
> >> > >> > > > --Matt
> >> > >> > > >
> >> > >> > > > On Thu, Mar 21, 2024 at 11:46 AM Matt Topol <zotthewizard@gmail.com>
> >> > >> > > > wrote:
> >> > >> > > >
> >> > >> > > > > Absolutely, it will be marked experimental until we see some
> >> > >> > > > > people using it and can get more real-world feedback.
> >> > >> > > > >
> >> > >> > > > > There are also already a couple of things, discussed in the
> >> > >> > > > > comments, that will be followed up on after the initial
> >> > >> > > > > adoption for expansion.
> >> > >> > > > >
> >> > >> > > > > On Thu, Mar 21, 2024, 11:42 AM David Li <lidavidm@apache.org>
> >> > >> > > > > wrote:
> >> > >> > > > >
> >> > >> > > > >> I think let's try again. Would it be reasonable to declare this
> >> > >> > > > >> 'experimental' for the time being, just as we did with
> >> > >> > > > >> Flight/Flight SQL/etc?
> >> > >> > > > >>
> >> > >> > > > >> On Tue, Mar 19, 2024, at 15:24, Matt Topol wrote:
> >> > >> > > > >> > Hey All, It's been another month and we've gotten a whole
> >> > >> > > > >> > bunch of feedback and engagement on the document from a
> >> > >> > > > >> > variety of individuals. A few others and I have proactively
> >> > >> > > > >> > attempted to reach out to as many third parties as we could,
> >> > >> > > > >> > hoping to pull in more engagement as well. While it would be
> >> > >> > > > >> > great to get even more feedback, the comments have slowed down
> >> > >> > > > >> > and we haven't gotten anything in a few days at this point.
> >> > >> > > > >> >
> >> > >> > > > >> > If there are no objections, I'd like to open this up for
> >> > >> > > > >> > voting again to officially adopt this as a protocol to add to
> >> > >> > > > >> > our docs.
> >> > >> > > > >> >
> >> > >> > > > >> > Thanks all!
> >> > >> > > > >> >
> >> > >> > > > >> > --Matt
> >> > >> > > > >> >
> >> > >> > > > >> > On Sat, Mar 2, 2024 at 6:43 PM Paul Whalen <pgwhalen@gmail.com>
> >> > >> > > > >> > wrote:
> >> > >> > > > >> >
> >> > >> > > > >> >> Agreed that it makes sense not to focus on in-place updating
> >> > >> > > > >> >> for this proposal.  I’m not even sure it’s a great fit as a
> >> > >> > > > >> >> “general purpose” Arrow protocol, because of all the
> >> > >> > > > >> >> assumptions and restrictions required, as you noted.
> >> > >> > > > >> >>
> >> > >> > > > >> >> I took another look at the proposal and don’t think there’s
> >> > >> > > > >> >> anything preventing in-place updating in the future:
> >> > >> > > > >> >> ultimately the data body could just be in the same location
> >> > >> > > > >> >> for subsequent messages.
> >> > >> > > > >> >>
> >> > >> > > > >> >> Thanks!
> >> > >> > > > >> >> Paul
> >> > >> > > > >> >>
> >> > >> > > > >> >> On Fri, Mar 1, 2024 at 5:28 PM Matt Topol <
> >> > >> > zotthewizard@gmail.com>
> >> > >> > > > >> wrote:
> >> > >> > > > >> >>
> >> > >> > > > >> >> > > @pgwhalen: As a potential "end user developer,"
> (and
> >> > >> aspiring
> >> > >> > > > >> >> > contributor) this
> >> > >> > > > >> >> > immediately excited me when I first saw it.
> >> > >> > > > >> >> >
> >> > >> > > > >> >> > Yay! Good to hear that!
> >> > >> > > > >> >> >
> >> > >> > > > >> >> > > @pgwhalen: And it wasn't clear to me whether
> updating
> >> > >> batches
> >> > >> > > in
> >> > >> > > > >> >> > place (and the producer/consumer coordination that
> >> comes
> >> > with
> >> > >> > > that)
> >> > >> > > > >> was
> >> > >> > > > >> >> > supported or encouraged as part of the proposal.
> >> > >> > > > >> >> >
> >> > >> > > > >> >> > So, updating batches in place was not a particular
> >> > use-case
> >> > >> we
> >> > >> > > were
> >> > >> > > > >> >> > targeting with this approach. Instead using shared
> >> memory
> >> > to
> >> > >> > > > produce
> >> > >> > > > >> and
> >> > >> > > > >> >> > consume the buffers/batches without having to
> >> physically
> >> > copy
> >> > >> > the
> >> > >> > > > >> data.
> >> > >> > > > >> >> > Trying to update a batch in place is a dangerous
> >> prospect
> >> > >> for a
> >> > >> > > > >> number of
> >> > >> > > > >> >> > reasons, but as you've mentioned it can technically
> be
> >> > made
> >> > >> > safe
> >> > >> > > if
> >> > >> > > > >> the
> >> > >> > > > >> >> > shape is staying the same and you're only modifying
> >> > >> fixed-width
> >> > >> > > > data
> >> > >> > > > >> >> types
> >> > >> > > > >> >> > (i.e. not only is the *shape* unchanged but the sizes
> >> of
> >> > the
> >> > >> > > > >> underlying
> >> > >> > > > >> >> > data buffers are also remaining unchanged). The
> >> > >> > producer/consumer
> >> > >> > > > >> >> > coordination that would be needed for updating
> batches
> >> in
> >> > >> place
> >> > >> > > is
> >> > >> > > > >> not
> >> > >> > > > >> >> part
> >> > >> > > > >> >> > of this proposal but is definitely something we can
> >> look
> >> > into
> >> > >> > as
> >> > >> > > a
> >> > >> > > > >> >> > follow-up to this for extending it. There's a number
> of
> >> > >> > > discussions
> >> > >> > > > >> that
> >> > >> > > > >> >> > would need to be had around that so I don't want to
> >> add on
> >> > >> > > another
> >> > >> > > > >> >> > complexity to this already complex proposal.
> >> > >> > > > >> >> >
> >> > >> > > > >> >> > That said, if you or anyone see something in this
> >> proposal
> >> > >> that
> >> > >> > > > would
> >> > >> > > > >> >> > hinder or prevent being able to use it for your use
> >> case
> >> > >> please
> >> > >> > > let
> >> > >> > > > >> me
> >> > >> > > > >> >> know
> >> > >> > > > >> >> > so we can address it. Even though the proposal as it
> >> > >> currently
> >> > >> > > > exists
> >> > >> > > > >> >> > doesn't fully support the in-place updating of
> >> batches, I
> >> > >> don't
> >> > >> > > > want
> >> > >> > > > >> to
> >> > >> > > > >> >> > make things harder for us in such a follow-up where
> >> we'd
> >> > end
> >> > >> up
> >> > >> > > > >> requiring
> >> > >> > > > >> >> > an entirely new protocol to support that.
> >> > >> > > > >> >> >
> >> > >> > > > >> >> > > @octalene.dev: I know of a third party that is
> >> > interested
> >> > >> in
> >> > >> > > > >> Arrow for
> >> > >> > > > >> >> > HPC environments that could be interested in the
> >> proposal
> >> > >> and I
> >> > >> > > can
> >> > >> > > > >> see
> >> > >> > > > >> >> if
> >> > >> > > > >> >> > they're interested in providing feedback.
> >> > >> > > > >> >> >
> >> > >> > > > >> >> > Awesome! Thanks much!
> >> > >> > > > >> >> >
> >> > >> > > > >> >> >
> >> > >> > > > >> >> > For reference to anyone who hasn't looked at the
> >> document
> >> > in
> >> > >> a
> >> > >> > > > while,
> >> > >> > > > >> >> since
> >> > >> > > > >> >> > the original discussion thread on this I have added a
> >> full
> >> > >> > > > >> "Background
> >> > >> > > > >> >> > Context" page to the beginning of the proposal to
> help
> >> > anyone
> >> > >> > who
> >> > >> > > > >> isn't
> >> > >> > > > >> >> > already familiar with the issues this protocol is
> >> trying
> >> > to
> >> > >> > solve
> >> > >> > > > or
> >> > >> > > > >> >> isn't
> >> > >> > > > >> >> > already familiar with ucx or libfabric transports to
> >> > better
> >> > >> > > > >> understand
> >> > >> > > > >> >> > *why* I'm
> >> > >> > > > >> >> > proposing this and what it is trying to solve. The
> >> point
> >> > of
> >> > >> > this
> >> > >> > > > >> >> background
> >> > >> > > > >> >> > information is to help ensure that anyone who might
> >> have
> >> > >> > thoughts
> >> > >> > > > on
> >> > >> > > > >> >> > protocols in general or APIs should still be able to
> >> > >> understand
> >> > >> > > the
> >> > >> > > > >> base
> >> > >> > > > >> >> > reasons and goals that we're trying to achieve with
> >> this
> >> > >> > protocol
> >> > >> > > > >> >> proposal.
> >> > >> > > > >> >> > You don't need to already understand managing
> >> GPU/device
> >> > >> memory
> >> > >> > > or
> >> > >> > > > >> ucx to
> >> > >> > > > >> >> > be able to have meaningful input on the document.
> >> > >> > > > >> >> >
> >> > >> > > > >> >> > Thanks again to all who have contributed so far and
> >> please
> >> > >> > spread
> >> > >> > > > to
> >> > >> > > > >> any
> >> > >> > > > >> >> > contacts that you think might be interested in this
> for
> >> > their
> >> > >> > > > >> particular
> >> > >> > > > >> >> > use cases.
> >> > >> > > > >> >> >
> >> > >> > > > >> >> > --Matt
> >> > >> > > > >> >> >
> >> > >> > > > >> >> > On Wed, Feb 28, 2024 at 1:39 AM Aldrin
> >> > >> > > <octalene.dev@pm.me.invalid
> >> > >> > > > >
> >> > >> > > > >> >> wrote:
> >> > >> > > > >> >> >
> >> > >> > > > >> >> > > I am interested in this as well, but I haven't
> gotten
> >> > to a
> >> > >> > > point
> >> > >> > > > >> where
> >> > >> > > > >> >> I
> >> > >> > > > >> >> > > can have valuable input (I haven't tried other
> >> > >> transports). I
> >> > >> > > > know
> >> > >> > > > >> of a
> >> > >> > > > >> >> > > third party that is interested in Arrow for HPC
> >> > >> environments
> >> > >> > > that
> >> > >> > > > >> could
> >> > >> > > > >> >> > be
> >> > >> > > > >> >> > > interested in the proposal and I can see if they're
> >> > >> > interested
> >> > >> > > in
> >> > >> > > > >> >> > providing
> >> > >> > > > >> >> > > feedback.
> >> > >> > > > >> >> > >
> >> > >> > > > >> >> > > I glanced at the document before but I'll go
> through
> >> > again
> >> > >> to
> >> > >> > > see
> >> > >> > > > >> if
> >> > >> > > > >> >> > there
> >> > >> > > > >> >> > > is anything I can comment on.
> >> > >> > > > >> >> > >
> >> > >> > > > >> >> > >
> >> > >> > > > >> >> > >
> >> > >> > > > >> >> > > # ------------------------------
> >> > >> > > > >> >> > > # Aldrin
> >> > >> > > > >> >> > >
> >> > >> > > > >> >> > >
> >> > >> > > > >> >> > > https://github.com/drin/
> >> > >> > > > >> >> > > https://gitlab.com/octalene
> >> > >> > > > >> >> > > https://keybase.io/octalene
> >> > >> > > > >> >> > >
> >> > >> > > > >> >> > >
> >> > >> > > > >> >> > > On Tuesday, February 27th, 2024 at 17:43, Paul
> >> Whalen <
> >> > >> > > > >> >> > pgwhalen@gmail.com>
> >> > >> > > > >> >> > > wrote:
> >> > >> > > > >> >> > >
> >> > >> > > > >> >> > > > As a potential "end user developer," (and
> aspiring
> >> > >> > > contributor)
> >> > >> > > > >> this
> >> > >> > > > >> >> > > > immediately excited me when I first saw it.
> >> > >> > > > >> >> > > >
> >> > >> > > > >> >> > >
> >> > >> > > > >> >> > > > I work at a trading firm, and my team has
> >> developed an
> >> > >> IPC
> >> > >> > > > >> mechanism
> >> > >> > > > >> >> > for
> >> > >> > > > >> >> > > > efficiently transmitting pandas dataframes both
> >> > remotely
> >> > >> > via
> >> > >> > > > TCP
> >> > >> > > > >> and
> >> > >> > > > >> >> > > > locally via shared memory, where the interface for the application
> >> > >> > > > >> >> > > > developer is the same for both. The data in the dataframes may change
> >> > >> > > > >> >> > > > rapidly, so when communicating locally via shared memory, if the shape of
> >> > >> > > > >> >> > > > the dataframe doesn't change, we update the memory in place, coordinating
> >> > >> > > > >> >> > > > between the producer and consumer via TCP.
> >> > >> > > > >> >> > > >
> >> > >> > > > >> >> > > > We intend to move away from our remote TCP mechanism towards Arrow Flight,
> >> > >> > > > >> >> > > > or a lighter-weight version of Arrow IPC. For the local shared memory
> >> > >> > > > >> >> > > > mechanism, which we previously did not have a good answer for, it seems
> >> > >> > > > >> >> > > > like Dissociated Arrow IPC maps quite well to our problem.
> >> > >> > > > >> >> > > >
> >> > >> > > > >> >> > > > So some features that enable our use case are:
> >> > >> > > > >> >> > > > - Updating existing batches in place is supported
> >> > >> > > > >> >> > > > - The interface is pretty similar to Flight
> >> > >> > > > >> >> > > >
> >> > >> > > > >> >> > > > I'd imagine we're not the only financial firm to implement something like
> >> > >> > > > >> >> > > > this, given how widespread pandas usage is, so that could be a place to
> >> > >> > > > >> >> > > > seek feedback.
> >> > >> > > > >> >> > > >
> >> > >> > > > >> >> > > > As I was reading the proposal initially, I gleaned that the most important
> >> > >> > > > >> >> > > > audience was those writing interfaces to GPUs/remote memory/non-standard
> >> > >> > > > >> >> > > > transports/etc. And it wasn't clear to me whether updating batches in
> >> > >> > > > >> >> > > > place (and the producer/consumer coordination that comes with that) was
> >> > >> > > > >> >> > > > supported or encouraged as part of the proposal. But regardless, as an end
> >> > >> > > > >> >> > > > user, this seems like an easier and more efficient way to glue pieces in
> >> > >> > > > >> >> > > > the Arrow ecosystem together if it were adopted broadly.
> >> > >> > > > >> >> > > >
> >> > >> > > > >> >> > > > Paul
> >> > >> > > > >> >> > > >
> >> > >> > > > >> >> > > > On Tue, Feb 27, 2024 at 6:05 PM Matt Topol <zotthewizard@gmail.com> wrote:
> >> > >> > > > >> >> > > >
> >> > >> > > > >> >> > > > > I'll continue my efforts to reach out to other interested parties, but
> >> > >> > > > >> >> > > > > if anyone else here has any contacts or connections that they think
> >> > >> > > > >> >> > > > > might be interested, please forward them the link to the Google doc.
> >> > >> > > > >> >> > > > >
> >> > >> > > > >> >> > > > > I really do want to get as much engagement and feedback as possible on
> >> > >> > > > >> >> > > > > this.
> >> > >> > > > >> >> > > > >
> >> > >> > > > >> >> > > > > Thanks!
> >> > >> > > > >> >> > > > >
> >> > >> > > > >> >> > > > > On Tue, Feb 27, 2024, 6:38 PM Wes McKinney <wesmckinn@gmail.com> wrote:
> >> > >> > > > >> >> > > > >
> >> > >> > > > >> >> > > > > > Have there been efforts to proactively reach out to other third parties
> >> > >> > > > >> >> > > > > > that might have an interest in this or be a potential user at some
> >> > >> > > > >> >> > > > > > point? There are a lot of interested parties in Arrow that may not
> >> > >> > > > >> >> > > > > > actively follow the mailing list.
> >> > >> > > > >> >> > > > > >
> >> > >> > > > >> >> > > > > > Seems like folks from the Dask, Ray, RAPIDS (especially folks at NVIDIA
> >> > >> > > > >> >> > > > > > or working on UCX), or other communities like that might have
> >> > >> > > > >> >> > > > > > constructive thoughts about this. DLPack
> >> > >> > > > >> >> > > > > > (https://dmlc.github.io/dlpack/latest/) also seems adjacent and worth
> >> > >> > > > >> >> > > > > > reaching out to. Other ideas for projects or companies that could be
> >> > >> > > > >> >> > > > > > reached out to for feedback?
> >> > >> > > > >> >> > > > > >
> >> > >> > > > >> >> > > > > > On Tue, Feb 27, 2024 at 5:23 PM Antoine Pitrou <antoine@python.org> wrote:
> >> > >> > > > >> >> > > > > >
> >> > >> > > > >> >> > > > > > > If there's no engagement, then I'm afraid it might mean that third
> >> > >> > > > >> >> > > > > > > parties have no interest in this. I don't really have any solution for
> >> > >> > > > >> >> > > > > > > generating engagement except nagging and pinging people explicitly :-)
> >> > >> > > > >> >> > > > > > >
> >> > >> > > > >> >> > > > > > > On 27/02/2024 at 19:09, Matt Topol wrote:
> >> > >> > > > >> >> > > > > > >
> >> > >> > > > >> >> > > > > > > > I would like to see the same, Antoine. Currently, given the lack of
> >> > >> > > > >> >> > > > > > > > engagement (both for OR against), I was going to take the silence as
> >> > >> > > > >> >> > > > > > > > assent and hope for non-Voltron Data PMC members to vote in this.
> >> > >> > > > >> >> > > > > > > >
> >> > >> > > > >> >> > > > > > > > If anyone has any suggestions on how we could potentially generate more
> >> > >> > > > >> >> > > > > > > > engagement and discussion on this, please let me know, as I want as many
> >> > >> > > > >> >> > > > > > > > parties in the community as possible to be part of this.
> >> > >> > > > >> >> > > > > > > >
> >> > >> > > > >> >> > > > > > > > Thanks everyone.
> >> > >> > > > >> >> > > > > > > >
> >> > >> > > > >> >> > > > > > > > --Matt
> >> > >> > > > >> >> > > > > > > >
> >> > >> > > > >> >> > > > > > > > On Tue, Feb 27, 2024 at 12:48 PM Antoine Pitrou <antoine@python.org> wrote:
> >> > >> > > > >> >> > > > > > > >
> >> > >> > > > >> >> > > > > > > > > Hello,
> >> > >> > > > >> >> > > > > > > > >
> >> > >> > > > >> >> > > > > > > > > I'd really like to see more engagement and criticism from non-Voltron
> >> > >> > > > >> >> > > > > > > > > Data parties before this is formally adopted as an Arrow spec.
> >> > >> > > > >> >> > > > > > > > >
> >> > >> > > > >> >> > > > > > > > > Regards
> >> > >> > > > >> >> > > > > > > > >
> >> > >> > > > >> >> > > > > > > > > Antoine.
> >> > >> > > > >> >> > > > > > > > >
> >> > >> > > > >> >> > > > > > > > > On 27/02/2024 at 18:35, Matt Topol wrote:
> >> > >> > > > >> >> > > > > > > > >
> >> > >> > > > >> >> > > > > > > > > > Hey all,
> >> > >> > > > >> >> > > > > > > > > >
> >> > >> > > > >> >> > > > > > > > > > I'd like to propose a vote for us to officially adopt the protocol
> >> > >> > > > >> >> > > > > > > > > > described in the Google doc[1] for Dissociated Arrow IPC Transports.
> >> > >> > > > >> >> > > > > > > > > > This proposal was originally discussed at [2]. Once this proposal is
> >> > >> > > > >> >> > > > > > > > > > adopted, I will work on adding the necessary documentation to the
> >> > >> > > > >> >> > > > > > > > > > Arrow website along with examples etc.
> >> > >> > > > >> >> > > > > > > > > >
> >> > >> > > > >> >> > > > > > > > > > The vote will be open for at least 72 hours.
> >> > >> > > > >> >> > > > > > > > > >
> >> > >> > > > >> >> > > > > > > > > > [ ] +1 Accept this Proposal
> >> > >> > > > >> >> > > > > > > > > > [ ] +0
> >> > >> > > > >> >> > > > > > > > > > [ ] -1 Do not accept this proposal because...
> >> > >> > > > >> >> > > > > > > > > >
> >> > >> > > > >> >> > > > > > > > > > Thank you everyone!
> >> > >> > > > >> >> > > > > > > > > >
> >> > >> > > > >> >> > > > > > > > > > --Matt
> >> > >> > > > >> >> > > > > > > > > >
> >> > >> > > > >> >> > > > > > > > > > [1]: https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit#heading=h.38515dnp2bdb

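The in-place update condition discussed in this thread (the shape stays the same, only fixed-width columns are modified, so every underlying buffer keeps its size) can be expressed as a simple check. The following is a minimal Python sketch under stated assumptions: a batch is modeled as a hypothetical (schema, num_rows, cols_with_validity) tuple rather than a real Arrow object, only a small illustrative set of fixed-width types is recognized, and buffer sizes are compared unpadded. The function and variable names are invented for illustration; this is not part of the proposal or of any Arrow library, and it does not cover the producer/consumer coordination that the thread leaves for a follow-up.

```python
import math

# Hypothetical, illustrative subset of Arrow fixed-width primitive types and
# their byte widths. Variable-width types (e.g. "utf8", "binary") are
# deliberately absent: their data-buffer size depends on the values.
FIXED_WIDTH_BYTES = {
    "int8": 1, "int16": 2, "int32": 4, "int64": 8,
    "float32": 4, "float64": 8,
}


def buffer_sizes(schema, num_rows, cols_with_validity):
    """Return unpadded (validity, data) byte sizes for each column,
    or None if any column is not fixed-width."""
    sizes = []
    for name, type_name in schema:
        width = FIXED_WIDTH_BYTES.get(type_name)
        if width is None:
            return None
        # Arrow allows a column with no nulls to omit its validity bitmap,
        # so gaining or losing nulls can add or remove a whole buffer.
        validity = math.ceil(num_rows / 8) if name in cols_with_validity else 0
        sizes.append((validity, num_rows * width))
    return sizes


def can_update_in_place(old, new):
    """old and new are hypothetical (schema, num_rows, cols_with_validity)
    tuples. True only if the schema is identical, every column is
    fixed-width, and every buffer keeps exactly the same size."""
    if old[0] != new[0]:
        return False
    old_sizes = buffer_sizes(*old)
    return old_sizes is not None and old_sizes == buffer_sizes(*new)


schema = [("price", "float64"), ("qty", "int64")]
print(can_update_in_place((schema, 100, set()), (schema, 100, set())))
print(can_update_in_place((schema, 100, set()), (schema, 101, set())))
print(can_update_in_place((schema, 100, set()), (schema, 100, {"qty"})))
```

With this two-column schema, overwriting 100 rows with 100 rows passes, while changing the row count or adding a validity bitmap to a column fails, because either change alters a buffer size. A string column fails the check outright, matching the point that only fixed-width data can be safely modified in place.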
Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by Weston Pace <we...@gmail.com>.
Forgot link:

[1]
https://developer.mozilla.org/en-US/docs/WebAssembly/JavaScript_interface/Memory

On Tue, Apr 2, 2024 at 11:38 AM Weston Pace <we...@gmail.com> wrote:

> Thanks for taking the time to address my concerns.
>
> > I've split the S3/HTTP URI flight pieces out into a separate document and
> > separate thing to vote on at the request of several people who wanted to
> > view these as two separate proposals to vote on. So this vote *only* covers
> > adopting the protocol spec as an "Experimental Protocol" so we can start
> > seeing real world usage to help refine and improve it. That said, I believe
> > all clients currently would reject any non-grpc URI.
>
> Ah, I was confused and my comments were mostly about the s3/http proposal.
>
> Regarding the proposal at hand, I went through it in more detail.  I don't
> know much about ucx so I considered two different use cases:
>
>  * The previously mentioned shared memory approach.  I think this is
> compelling as people have asked about shared memory communication from time
> to time and I've always suggested flight over unix sockets though that
> forces a copy.
>  * I think this could also form the basis for large transfers of arrow
> data over a wasm boundary.  Wasm has a concept of shared memory objects[1]
> and a wasm data library could use this to stream data into javascript
> without a copy.
>
> I've added a few more questions to the doc.  Either way, if we're only
> talking about an experimental protocol / suggested recommendation then I'm
> fine voting +1 on this (I'm not sure a formal vote is even needed).  I
> would want to see at least 2 implementations if we wanted to remove the
> experimental label.
>
> On Sun, Mar 31, 2024 at 2:43 PM Joel Lubinitsky <jo...@gmail.com> wrote:
>
>> +1 to the dissociated transports proposal
>>
>> On Sun, Mar 31, 2024 at 11:14 AM David Li <li...@apache.org> wrote:
>>
>> > +1 from me as before
>> >
>> > On Thu, Mar 28, 2024, at 18:06, Matt Topol wrote:
>> > >> There is a word doc with no implementation or PR. I think there could
>> > >> be an implementation / PR.
>> > >
>> > > In the word doc there is a link to a POC implementation[1] showing this
>> > > protocol working with a flight service, ucx and libcudf. The key piece
>> > > here is that we're voting on adopting this protocol spec (i.e. I'll add
>> > > it to the documentation website) rather than us explicitly providing
>> > > full implementations or abstractions around it. We can provide reference
>> > > implementations like the POC, but I don't think they should be in the
>> > > Arrow monorepo, or else we run the risk of a lot of the same issues that
>> > > Flight has: e.g. adding anything to Flight in C++ requires fully
>> > > wrapping the grpc/flight primitives with Arrow equivalents to export,
>> > > which increases the maintenance burden on us and makes it more difficult
>> > > for users to leverage the underlying knobs and dials.
>> > >
>> > >> For example, does any ADBC client respect this protocol today? If a
>> > >> flight server responds with an S3/HTTP URI will the ADBC client download
>> > >> the files from the correct place? Will it at least notice that the URI is
>> > >> not a GRPC URI and give an "I don't have a connector for downloading from
>> > >> HTTP/S3" error?
>> > >
>> > > I've split the S3/HTTP URI flight pieces out into a separate document and
>> > > separate thing to vote on at the request of several people who wanted to
>> > > view these as two separate proposals to vote on. So this vote *only*
>> > > covers adopting the protocol spec as an "Experimental Protocol" so we can
>> > > start seeing real world usage to help refine and improve it. That said, I
>> > > believe all clients currently would reject any non-grpc URI.
>> > >
>> > >> I was speaking with someone yesterday and they explained that
>> > >> they ended up not choosing Flight for an internal project because Flight
>> > >> didn't support something called "cloud fetch" which I have now learned is
>> > >
>> > > I was reading through that link, and it seems like it's pretty much
>> > > *identical* to Flight as it currently exists, except that it is using
>> > > cloud storage (S3, GCS, etc.) URIs containing Arrow IPC *files*, rather
>> > > than a service sitting in front of those serving up Arrow IPC *streams*.
>> > > This has been requested by others in the community, hence the second
>> > > proposal that was split out [2].
>> > >
>> > >> So a big +1 for the idea of dissociated transports but I'm not sure why
>> > >> we need a vote to start working on it (but I'm not opposed if a vote
>> > >> helps)
>> > >
>> > > Mostly I found that the Google doc was easier for iterating on the
>> > > protocol specification than a markdown PR for the Arrow documentation, as
>> > > I could more visually express things without needing a preview of the
>> > > rendered markdown. If it would make people more likely to vote on this, I
>> > > can write up the documentation markdown now and create a PR rather than
>> > > waiting until we decide we're even going to adopt this protocol as an
>> > > "official" arrow protocol.
>> > >
>> > > Let me know if there are any other unanswered questions!
>> > >
>> > > --Matt
>> > >
>> > > [1]: https://github.com/zeroshade/cudf-flight-ucx
>> > > [2]: https://docs.google.com/document/d/1-x7tHWDzpbgmsjtTUnVXeEO4b7vMWDHTu-lzxlK9_hE/edit#heading=h.ub6lgn7s75tq
>> > >
>> > > On Thu, Mar 28, 2024 at 4:53 PM Weston Pace <we...@gmail.com> wrote:
>> > >
>> > >> I'm sorry for the very late reply. Until yesterday I had no real concept
>> > >> of what this was talking about and so I had stayed out.
>> > >>
>> > >> I'm +0 only because it isn't clear what we are voting on. There is a word
>> > >> doc with no implementation or PR. I think there could be an
>> > >> implementation / PR. For example, does any ADBC client respect this
>> > >> protocol today? If a flight server responds with an S3/HTTP URI will the
>> > >> ADBC client download the files from the correct place? Will it at least
>> > >> notice that the URI is not a GRPC URI and give an "I don't have a
>> > >> connector for downloading from HTTP/S3" error? In general, I think we do
>> > >> want this in Flight (see comments below) and I am very supportive of the
>> > >> idea. However, if adopting this as an experimental proposal helps move
>> > >> this forward then I think that's fine.
>> > >>
>> > >> That being said, I do want to express support for the proposal as a
>> > >> concept, at least the "dissociated transports" portion (I can't speak to
>> > >> UCX/etc.). I was speaking with someone yesterday and they explained that
>> > >> they ended up not choosing Flight for an internal project because Flight
>> > >> didn't support something called "cloud fetch" which I have now learned is
>> > >> [1]. I had recalled looking at this proposal before and this person
>> > >> seemed interested and optimistic to know this was being considered for
>> > >> Flight. This proposal, as I understand it, should make it possible for
>> > >> cloud servers to support a cloud fetch style API. From the discussion I
>> > >> got the impression that this cloud fetch approach is useful and generally
>> > >> applicable.
>> > >>
>> > >> So a big +1 for the idea of dissociated transports but I'm not sure why
>> > >> we need a vote to start working on it (but I'm not opposed if a vote
>> > >> helps)
>> > >>
>> > >> [1] https://www.databricks.com/blog/2021/08/11/how-we-achieved-high-bandwidth-connectivity-with-bi-tools.html
>> > >>
>> > >> On Thu, Mar 28, 2024 at 1:04 PM Matt Topol <zo...@gmail.com> wrote:
>> > >>
>> > >> > I'll keep this new vote open for at least the next 72 hours. As before,
>> > >> > please reply with:
>> > >> >
>> > >> > [ ] +1 Accept this Proposal
>> > >> > [ ] +0
>> > >> > [ ] -1 Do not accept this proposal because...
>> > >> >
>> > >> > Thanks everyone!
>> > >> >
>> > >> > On Wed, Mar 27, 2024 at 7:51 PM Benjamin Kietzman <bengilgit@gmail.com> wrote:
>> > >> >
>> > >> > > +1
>> > >> > >
>> > >> > > On Tue, Mar 26, 2024, 18:36 Matt Topol <zo...@gmail.com> wrote:
>> > >> > >
>> > >> > > > Should I start a new thread for a new vote? Or repeat the original vote
>> > >> > > > email here?
>> > >> > > >
>> > >> > > > Just asking since there haven't been any responses so far.
>> > >> > > >
>> > >> > > > --Matt
>> > >> > > >
>> > >> > > > On Thu, Mar 21, 2024 at 11:46 AM Matt Topol <zotthewizard@gmail.com> wrote:
>> > >> > > >
>> > >> > > > > Absolutely, it will be marked experimental until we see some people
>> > >> > > > > using it and can get more real-world feedback.
>> > >> > > > >
>> > >> > > > > There are also already a couple of things, discussed in the comments,
>> > >> > > > > that will be followed up on for expansion after the initial adoption.
>> > >> > > > >
>> > >> > > > > On Thu, Mar 21, 2024, 11:42 AM David Li <lidavidm@apache.org> wrote:
>> > >> > > > >
>> > >> > > > >> I think let's try again. Would it be reasonable to declare this
>> > >> > > > >> 'experimental' for the time being, just as we did with
>> > >> > > > >> Flight/Flight SQL/etc.?
>> > >> > > > >>
>> > >> > > > >> On Tue, Mar 19, 2024, at 15:24, Matt Topol wrote:
>> > >> > > > >> > Hey All, it's been another month and we've gotten a whole bunch of
>> > >> > > > >> > feedback and engagement on the document from a variety of
>> > >> > > > >> > individuals. A few others and I have proactively attempted to reach
>> > >> > > > >> > out to as many third parties as we could, hoping to pull in more
>> > >> > > > >> > engagement as well. While it would be great to get even more
>> > >> > > > >> > feedback, the comments have slowed down and we haven't gotten
>> > >> > > > >> > anything in a few days at this point.
>> > >> > > > >> >
>> > >> > > > >> > If there are no objections, I'd like to try to open up voting again
>> > >> > > > >> > to officially adopt this as a protocol to add to our docs.
>> > >> > > > >> >
>> > >> > > > >> > Thanks all!
>> > >> > > > >> >
>> > >> > > > >> > --Matt
>> > >> > > > >> >
>> > >> > > > >> > On Sat, Mar 2, 2024 at 6:43 PM Paul Whalen <pgwhalen@gmail.com> wrote:
>> > >> > > > >> >
>> > >> > > > >> >> Agreed that it makes sense not to focus on in-place updating for
>> > >> > > > >> >> this proposal. I’m not even sure it’s a great fit as a “general
>> > >> > > > >> >> purpose” Arrow protocol, because of all the assumptions and
>> > >> > > > >> >> restrictions required, as you noted.
>> > >> > > > >> >>
>> > >> > > > >> >> I took another look at the proposal and don’t think there’s
>> > >> > > > >> >> anything preventing in-place updating in the future: ultimately the
>> > >> > > > >> >> data body could just be in the same location for subsequent
>> > >> > > > >> >> messages.
>> > >> > > > >> >>
>> > >> > > > >> >> Thanks!
>> > >> > > > >> >> Paul
>> > >> > > > >> >>
>> > >> > > > >> >> On Fri, Mar 1, 2024 at 5:28 PM Matt Topol <zotthewizard@gmail.com> wrote:
>> > >> > > > >> >>
>> > >> > > > >> >> > > @pgwhalen: As a potential "end user developer," (and aspiring
>> > >> > > > >> >> > > contributor) this immediately excited me when I first saw it.
>> > >> > > > >> >> >
>> > >> > > > >> >> > Yay! Good to hear that!
>> > >> > > > >> >> >
>> > >> > > > >> >> > > @pgwhalen: And it wasn't clear to me whether updating batches in
>> > >> > > > >> >> > > place (and the producer/consumer coordination that comes with
>> > >> > > > >> >> > > that) was supported or encouraged as part of the proposal.
>> > >> > > > >> >> >
>> > >> > > > >> >> > So, updating batches in place was not a particular use case we were
>> > >> > > > >> >> > targeting with this approach. Instead, we were targeting the use of
>> > >> > > > >> >> > shared memory to produce and consume the buffers/batches without
>> > >> > > > >> >> > having to physically copy the data. Trying to update a batch in
>> > >> > > > >> >> > place is a dangerous prospect for a number of reasons, but as you've
>> > >> > > > >> >> > mentioned it can technically be made safe if the shape is staying
>> > >> > > > >> >> > the same and you're only modifying fixed-width data types (i.e. not
>> > >> > > > >> >> > only is the *shape* unchanged but the sizes of the underlying data
>> > >> > > > >> >> > buffers are also remaining unchanged). The producer/consumer
>> > >> > > > >> >> > coordination that would be needed for updating batches in place is
>> > >> > > > >> >> > not part of this proposal but is definitely something we can look
>> > >> > > > >> >> > into as a follow-up to extend it. There are a number of discussions
>> > >> > > > >> >> > that would need to be had around that, so I don't want to add
>> > >> > > > >> >> > another layer of complexity to this already complex proposal.
>> > >> > > > >> >> >
>> > >> > > > >> >> > That said, if you or anyone else sees something in this proposal
>> > >> > > > >> >> > that would hinder or prevent being able to use it for your use
>> > >> > > > >> >> > case, please let me know so we can address it. Even though the
>> > >> > > > >> >> > proposal as it currently exists doesn't fully support the in-place
>> > >> > > > >> >> > updating of batches, I don't want to make things harder for us in
>> > >> > > > >> >> > such a follow-up, where we'd end up requiring an entirely new
>> > >> > > > >> >> > protocol to support that.
>> > >> > > > >> >> >
>> > >> > > > >> >> > > @octalene.dev: I know of a third party that is interested in Arrow
>> > >> > > > >> >> > > for HPC environments that could be interested in the proposal and I
>> > >> > > > >> >> > > can see if they're interested in providing feedback.
>> > >> > > > >> >> >
>> > >> > > > >> >> > Awesome! Thanks much!
>> > >> > > > >> >> >
>> > >> > > > >> >> >
>> > >> > > > >> >> > For reference to anyone who hasn't looked at the document in a
>> > >> > > > >> >> > while: since the original discussion thread on this, I have added a
>> > >> > > > >> >> > full "Background Context" page to the beginning of the proposal to
>> > >> > > > >> >> > help anyone who isn't already familiar with the issues this protocol
>> > >> > > > >> >> > is trying to solve, or with ucx or libfabric transports, to better
>> > >> > > > >> >> > understand *why* I'm proposing this and what it is trying to solve.
>> > >> > > > >> >> > The point of this background information is to help ensure that
>> > >> > > > >> >> > anyone who might have thoughts on protocols or APIs in general can
>> > >> > > > >> >> > still understand the base reasons and goals that we're trying to
>> > >> > > > >> >> > achieve with this protocol proposal. You don't need to already
>> > >> > > > >> >> > understand managing GPU/device memory or ucx to be able to have
>> > >> > > > >> >> > meaningful input on the document.
>> > >> > > > >> >> >
>> > >> > > > >> >> > Thanks again to all who have contributed so far, and please spread
>> > >> > > > >> >> > the word to any contacts that you think might be interested in this
>> > >> > > > >> >> > for their particular use cases.
>> > >> > > > >> >> >
>> > >> > > > >> >> > --Matt
>> > >> > > > >> >> >
>> > >> > > > >> >> > On Wed, Feb 28, 2024 at 1:39 AM Aldrin <octalene.dev@pm.me.invalid> wrote:
>> > >> > > > >> >> >
>> > >> > > > >> >> > > I am interested in this as well, but I haven't gotten to a point
>> > >> > > > >> >> > > where I can have valuable input (I haven't tried other transports).
>> > >> > > > >> >> > > I know of a third party that is interested in Arrow for HPC
>> > >> > > > >> >> > > environments that could be interested in the proposal and I can
>> > >> > > > >> >> > > see if they're interested in providing feedback.
>> > >> > > > >> >> > >
>> > >> > > > >> >> > > I glanced at the document before but I'll go through it again to
>> > >> > > > >> >> > > see if there is anything I can comment on.
>> > >> > > > >> >> > >
>> > >> > > > >> >> > > # ------------------------------
>> > >> > > > >> >> > > # Aldrin
>> > >> > > > >> >> > >
>> > >> > > > >> >> > > https://github.com/drin/
>> > >> > > > >> >> > > https://gitlab.com/octalene
>> > >> > > > >> >> > > https://keybase.io/octalene
>> > >> > > > >> >> > >
>> > >> > > > >> >> > > On Tuesday, February 27th, 2024 at 17:43, Paul Whalen <pgwhalen@gmail.com> wrote:
>> > >> > > > >> >> > >
>> > >> > > > >> >> > > > As a potential "end user developer," (and aspiring contributor) this
>> > >> > > > >> >> > > > immediately excited me when I first saw it.
>> > >> > > > >> >> > > >
>> > >> > > > >> >> > > > I work at a trading firm, and my team has developed an IPC mechanism
>> > >> > > > >> >> > > > for efficiently transmitting pandas dataframes both remotely via TCP
>> > >> > > > >> >> > > > and locally via shared memory, where the interface for the application
>> > >> > > > >> >> > > > developer is the same for both. The data in the dataframes may change
>> > >> > > > >> >> > > > rapidly, so when communicating locally via shared memory, if the shape
>> > >> > > > >> >> > > > of the dataframe doesn't change, we update the memory in place,
>> > >> > > > >> >> > > > coordinating between the producer and consumer via TCP.
>> > >> > > > >> >> > > >
>> > >> > > > >> >> > > > We intend to move away from our remote TCP mechanism towards Arrow
>> > >> > > > >> >> > > > Flight, or a lighter-weight version of Arrow IPC. For the local shared
>> > >> > > > >> >> > > > memory mechanism, which we previously did not have a good answer for,
>> > >> > > > >> >> > > > it seems like Dissociated Arrow IPC maps quite well to our problem.
>> > >> > > > >> >> > > >
>> > >> > > > >> >> > > > So some features that enable our use case are:
>> > >> > > > >> >> > > > - Updating existing batches in place is supported
>> > >> > > > >> >> > > > - The interface is pretty similar to Flight
>> > >> > > > >> >> > > >
>> > >> > > > >> >> > > > I'd imagine we're not the only financial firm to implement something
>> > >> > > > >> >> > > > like this, given how widespread pandas usage is, so that could be a
>> > >> > > > >> >> > > > place to seek feedback.
>> > >> > > > >> >> > > >
>> > >> > > > >> >> > > > As I was reading the proposal initially, I gleaned that the most
>> > >> > > > >> >> > > > important audience was those writing interfaces to GPUs/remote
>> > >> > > > >> >> > > > memory/non-standard transports/etc. And it wasn't clear to me whether
>> > >> > > > >> >> > > > updating batches in place (and the producer/consumer coordination that
>> > >> > > > >> >> > > > comes with that) was supported or encouraged as part of the proposal.
>> > >> > > > >> >> > > > But regardless, as an end user, this seems like an easier and more
>> > >> > > > >> >> > > > efficient way to glue pieces in the Arrow ecosystem together if it
>> > >> > > > >> >> > > > were adopted broadly.
>> > >> > > > >> >> > > >
>> > >> > > > >> >> > > > Paul
>> > >> > > > >> >> > > >
>> > >> > > > >> >> > > > On Tue, Feb 27, 2024 at 6:05 PM Matt Topol zotthewizard@gmail.com wrote:
>> > >> > > > >> >> > > >
>> > >> > > > >> >> > > > > I'll continue my efforts of trying to reach out to other interested
>> > >> > > > >> >> > > > > parties, but if anyone else here has any contacts or connections that
>> > >> > > > >> >> > > > > they think might be interested please forward them the link to the
>> > >> > > > >> >> > > > > Google doc.
>> > >> > > > >> >> > > > >
>> > >> > > > >> >> > > > > I really do want to get as much engagement and feedback as possible
>> > >> > > > >> >> > > > > on this.
>> > >> > > > >> >> > > > >
>> > >> > > > >> >> > > > > Thanks!
>> > >> > > > >> >> > > > >
>> > >> > > > >> >> > > > > On Tue, Feb 27, 2024, 6:38 PM Wes McKinney wesmckinn@gmail.com wrote:
>> > >> > > > >> >> > > > >
>> > >> > > > >> >> > > > > > Have there been efforts to proactively reach out to other third
>> > >> > > > >> >> > > > > > parties that might have an interest in this or be a potential user
>> > >> > > > >> >> > > > > > at some point? There are a lot of interested parties in Arrow that
>> > >> > > > >> >> > > > > > may not actively follow the mailing list.
>> > >> > > > >> >> > > > > >
>> > >> > > > >> >> > > > > > Seems like folks from the Dask, Ray, RAPIDS (especially folks at
>> > >> > > > >> >> > > > > > NVIDIA or working on UCX), or other communities like that might have
>> > >> > > > >> >> > > > > > constructive thoughts about this. DLPack
>> > >> > > > >> >> > > > > > (https://dmlc.github.io/dlpack/latest/) also seems adjacent and worth
>> > >> > > > >> >> > > > > > reaching out to. Other ideas for projects or companies that could be
>> > >> > > > >> >> > > > > > reached out to for feedback.
>> > >> > > > >> >> > > > > >
>> > >> > > > >> >> > > > > > On Tue, Feb 27, 2024 at 5:23 PM Antoine Pitrou antoine@python.org wrote:
>> > >> > > > >> >> > > > > >
>> > >> > > > >> >> > > > > > > If there's no engagement, then I'm afraid it might mean that third
>> > >> > > > >> >> > > > > > > parties have no interest in this. I don't really have any solution
>> > >> > > > >> >> > > > > > > for generating engagement except nagging and pinging people
>> > >> > > > >> >> > > > > > > explicitly :-)
>> > >> > > > >> >> > > > > > >
>> > >> > > > >> >> > > > > > > On 27/02/2024 at 19:09, Matt Topol wrote:
>> > >> > > > >> >> > > > > > >
>> > >> > > > >> >> > > > > > > > I would like to see the same Antoine, currently given the lack of
>> > >> > > > >> >> > > > > > > > engagement (both for OR against) I was going to take the silence
>> > >> > > > >> >> > > > > > > > as assent and hope for non-Voltron Data PMC members to vote in
>> > >> > > > >> >> > > > > > > > this.
>> > >> > > > >> >> > > > > > > >
>> > >> > > > >> >> > > > > > > > If anyone has any suggestions on how we could potentially
>> > >> > > > >> >> > > > > > > > generate more engagement and discussion on this, please let me
>> > >> > > > >> >> > > > > > > > know as I want as many parties in the community as possible to be
>> > >> > > > >> >> > > > > > > > part of this.
>> > >> > > > >> >> > > > > > > >
>> > >> > > > >> >> > > > > > > > Thanks everyone.
>> > >> > > > >> >> > > > > > > >
>> > >> > > > >> >> > > > > > > > --Matt
>> > >> > > > >> >> > > > > > > >
>> > >> > > > >> >> > > > > > > > On Tue, Feb 27, 2024 at 12:48 PM Antoine Pitrou antoine@python.org wrote:
>> > >> > > > >> >> > > > > > > >
>> > >> > > > >> >> > > > > > > > > Hello,
>> > >> > > > >> >> > > > > > > > >
>> > >> > > > >> >> > > > > > > > > I'd really like to see more engagement and criticism from
>> > >> > > > >> >> > > > > > > > > non-Voltron Data parties before this is formally adopted as an
>> > >> > > > >> >> > > > > > > > > Arrow spec.
>> > >> > > > >> >> > > > > > > > >
>> > >> > > > >> >> > > > > > > > > Regards
>> > >> > > > >> >> > > > > > > > >
>> > >> > > > >> >> > > > > > > > > Antoine.
>> > >> > > > >> >> > > > > > > > >
>> > >> > > > >> >> > > > > > > > > On 27/02/2024 at 18:35, Matt Topol wrote:
>> > >> > > > >> >> > > > > > > > >
>> > >> > > > >> >> > > > > > > > > > Hey all,
>> > >> > > > >> >> > > > > > > > > >
>> > >> > > > >> >> > > > > > > > > > I'd like to propose a vote for us to officially adopt the
>> > >> > > > >> >> > > > > > > > > > protocol described in the google doc[1] for Dissociated Arrow
>> > >> > > > >> >> > > > > > > > > > IPC Transports. This proposal was originally discussed at 2.
>> > >> > > > >> >> > > > > > > > > > Once this proposal is adopted, I will work on adding the
>> > >> > > > >> >> > > > > > > > > > necessary documentation to the Arrow website along with
>> > >> > > > >> >> > > > > > > > > > examples etc.
>> > >> > > > >> >> > > > > > > > > >
>> > >> > > > >> >> > > > > > > > > > The vote will be open for at least 72 hours.
>> > >> > > > >> >> > > > > > > > > >
>> > >> > > > >> >> > > > > > > > > > [ ] +1 Accept this Proposal
>> > >> > > > >> >> > > > > > > > > > [ ] +0
>> > >> > > > >> >> > > > > > > > > > [ ] -1 Do not accept this proposal because...
>> > >> > > > >> >> > > > > > > > > >
>> > >> > > > >> >> > > > > > > > > > Thank you everyone!
>> > >> > > > >> >> > > > > > > > > >
>> > >> > > > >> >> > > > > > > > > > --Matt
>> > >> > > > >> >> > > > > > > > > >
>> > >> > > > >> >> > > > > > > > > > [1]: https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit#heading=h.38515dnp2bdb
>

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by Weston Pace <we...@gmail.com>.
Thanks for taking the time to address my concerns.

> I've split the S3/HTTP URI flight pieces out into a separate document and
> separate thing to vote on at the request of several people who wanted to
> view these as two separate proposals to vote on. So this vote *only* covers
> adopting the protocol spec as an "Experimental Protocol" so we can start
> seeing real world usage to help refine and improve it. That said, I believe
> all clients currently would reject any non-grpc URI.

Ah, I was confused and my comments were mostly about the s3/http proposal.

Regarding the proposal at hand, I went through it in more detail.  I don't
know much about ucx so I considered two different use cases:

 * The previously mentioned shared memory approach.  I think this is
compelling as people have asked about shared memory communication from time
to time and I've always suggested flight over unix sockets though that
forces a copy.
 * I think this could also form the basis for large transfers of arrow data
over a wasm boundary.  Wasm has a concept of shared memory objects[1] and a
wasm data library could use this to stream data into javascript without a
copy.
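A minimal sketch of the shared memory case in Python, assuming a single fixed-width int64 column and using the standard library's `multiprocessing.shared_memory` rather than any Arrow or Flight API. The producer writes the values once; the consumer attaches to the segment by name and reads through a `memoryview`, so the bytes are never copied (the copy that sending them over a Unix socket would force). The `produce`/`consume` helpers and the one-column layout are illustrative assumptions, not part of the proposal.

```python
import struct
from multiprocessing import shared_memory


def produce(values):
    """Producer: write int64 values into a new shared memory segment."""
    nbytes = 8 * len(values)
    shm = shared_memory.SharedMemory(create=True, size=max(nbytes, 1))
    # Pack the values directly into the shared buffer (native byte order).
    struct.pack_into(f"{len(values)}q", shm.buf, 0, *values)
    return shm, nbytes


def consume(name, nbytes):
    """Consumer: attach by name and view the column without copying it."""
    shm = shared_memory.SharedMemory(name=name)
    # The segment may be rounded up to a page size, so slice to the real length.
    column = shm.buf[:nbytes].cast("q")
    return shm, column


producer_shm, nbytes = produce([1, 2, 3])
consumer_shm, column = consume(producer_shm.name, nbytes)
print(list(column))  # [1, 2, 3]

# Both sides map the same memory: a later write by the producer is visible
# through the consumer's existing view.
struct.pack_into("q", producer_shm.buf, 0, 42)
print(column[0])  # 42

# Views must be released before the segments can be closed.
column.release()
consumer_shm.close()
producer_shm.close()
producer_shm.unlink()
```

Across processes, the consumer would only need the segment name and length, delivered over whatever side channel carries the metadata.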

I've added a few more questions to the doc.  Either way, if we're only
talking about an experimental protocol / suggested recommendation then I'm
fine voting +1 on this (I'm not sure a formal vote is even needed).  I
would want to see at least 2 implementations if we wanted to remove the
experimental label.

On Sun, Mar 31, 2024 at 2:43 PM Joel Lubinitsky <jo...@gmail.com> wrote:

> +1 to the dissociated transports proposal
>
> On Sun, Mar 31, 2024 at 11:14 AM David Li <li...@apache.org> wrote:
>
> > +1 from me as before
> >
> > On Thu, Mar 28, 2024, at 18:06, Matt Topol wrote:
> > >>  There is a word doc with no implementation or PR.  I think there could
> > >> be an implementation / PR.
> > >
> > > In the word doc there is a link to a POC implementation[1] showing this
> > > protocol working with a flight service, ucx and libcudf. The key piece
> > > here is that we're voting on adopting this protocol spec (i.e. I'll add it
> > > to the documentation website) rather than us explicitly providing full
> > > implementations or abstractions around it. We can provide reference
> > > implementations like the POC, but I don't think they should be in the
> > > Arrow monorepo or else we run the risk of a lot of the same issues that
> > > Flight has: i.e. Adding anything to Flight in C++ requires fully wrapping
> > > the grpc/flight primitives with Arrow equivalents to export which
> > > increases the maintenance burden on us and makes it more difficult for
> > > users to leverage the underlying knobs and dials.
> > >
> > >> For example, does any ADBC client respect this protocol today?  If a
> > >> flight server responds with an S3/HTTP URI will the ADBC client download
> > >> the files from the correct place?  Will it at least notice that the URI
> > >> is not a GRPC URI and give a "I don't have a connector for downloading
> > >> from HTTP/S3" error?
> > >
> > > I've split the S3/HTTP URI flight pieces out into a separate document and
> > > separate thing to vote on at the request of several people who wanted to
> > > view these as two separate proposals to vote on. So this vote *only*
> > > covers adopting the protocol spec as an "Experimental Protocol" so we can
> > > start seeing real world usage to help refine and improve it. That said, I
> > > believe all clients currently would reject any non-grpc URI.
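The client behavior asked about here (noticing a non-gRPC URI and failing with a clear "no connector" error) can be sketched as a scheme-based dispatch. Everything below is hypothetical: `CONNECTORS`, `fetch_endpoint`, `fetch_via_grpc` and `UnsupportedLocationError` are illustrative names, not ADBC or Flight APIs; only the `grpc`, `grpc+tcp`, `grpc+tls` and `grpc+unix` scheme names come from Flight's location URIs.

```python
from urllib.parse import urlparse


class UnsupportedLocationError(Exception):
    """Raised when a client has no connector for an endpoint's URI scheme."""


def fetch_via_grpc(uri):
    # Placeholder for an existing Flight DoGet over gRPC.
    return f"DoGet from {uri}"


# Hypothetical registry mapping URI schemes to download functions. Only gRPC
# schemes are registered, so S3/HTTP locations are rejected explicitly.
CONNECTORS = {
    "grpc": fetch_via_grpc,
    "grpc+tcp": fetch_via_grpc,
    "grpc+tls": fetch_via_grpc,
    "grpc+unix": fetch_via_grpc,
}


def fetch_endpoint(uri):
    scheme = urlparse(uri).scheme
    connector = CONNECTORS.get(scheme)
    if connector is None:
        raise UnsupportedLocationError(
            f"no connector for downloading from {scheme!r} location: {uri}"
        )
    return connector(uri)


print(fetch_endpoint("grpc+tcp://localhost:8815"))
try:
    fetch_endpoint("s3://bucket/part-0.arrow")
except UnsupportedLocationError as err:
    print(err)
```

Supporting cloud-fetch style S3 or HTTP locations would then mean registering another connector rather than changing the dispatch logic.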
> > >
> > >>   I was speaking with someone yesterday and they explained that
> > >> they ended up not choosing Flight for an internal project because Flight
> > >> didn't support something called "cloud fetch" which I have now learned is
> > >
> > > I was reading through that link, and it seems like it's pretty much
> > > *identical* to Flight as it currently exists, except that it is using
> > > cloud storage (S3, GCS, etc.) URIs containing Arrow IPC *files*, rather
> > > than a service sitting in front of those serving up Arrow IPC *streams*.
> > > Which has been requested by others in the community, hence the second
> > > proposal that was split out [2].
> > >
> > >>  So a big +1 for the idea of disassociated transports but I'm not sure why
> > >> we need a vote to start working on it (but I'm not opposed if a vote helps)
> > >
> > > Mostly I found that the google doc was easier for iterating on the
> > > protocol specification than a markdown PR for the Arrow documentation as I
> > > could more visually express things without a preview of the rendered
> > > markdown. If it would get people to be more likely to vote on this, I can
> > > write up the documentation markdown now and create a PR rather than
> > > waiting until we decide we're even going to adopt this protocol as an
> > > "official" arrow protocol.
> > >
> > > Lemme know if there's any other unanswered questions!
> > >
> > > --Matt
> > >
> > > [1]: https://github.com/zeroshade/cudf-flight-ucx
> > > [2]: https://docs.google.com/document/d/1-x7tHWDzpbgmsjtTUnVXeEO4b7vMWDHTu-lzxlK9_hE/edit#heading=h.ub6lgn7s75tq
> > >
> > > On Thu, Mar 28, 2024 at 4:53 PM Weston Pace <we...@gmail.com> wrote:
> > >
> > >> I'm sorry for the very late reply.  Until yesterday I had no real concept
> > >> of what this was talking about and so I had stayed out.
> > >>
> > >> I'm +0 only because it isn't clear what we are voting on.  There is a word
> > >> doc with no implementation or PR.  I think there could be an
> > >> implementation / PR.  For example, does any ADBC client respect this
> > >> protocol today?  If a flight server responds with an S3/HTTP URI will the
> > >> ADBC client download the files from the correct place?  Will it at least
> > >> notice that the URI is not a GRPC URI and give a "I don't have a connector
> > >> for downloading from HTTP/S3" error?  In general, I think we do want this
> > >> in Flight (see comments below) and I am very supportive of the idea.
> > >> However, if adopting this as an experimental proposal helps move this
> > >> forward then I think that's fine.
> > >>
> > >> That being said, I do want to express support for the proposal as a
> > >> concept, at least the "disassociated transports" portion (I can't speak
> > >> to UCX/etc.).  I was speaking with someone yesterday and they explained
> > >> that they ended up not choosing Flight for an internal project because
> > >> Flight didn't support something called "cloud fetch" which I have now
> > >> learned is [1].  I had recalled looking at this proposal before and this
> > >> person seemed interested and optimistic to know this was being considered
> > >> for Flight.  This proposal, as I understand it, should make it possible
> > >> for cloud servers to support a cloud fetch style API.  From the discussion
> > >> I got the impression that this cloud fetch approach is useful and
> > >> generally applicable.
> > >>
> > >> So a big +1 for the idea of disassociated transports but I'm not sure why
> > >> we need a vote to start working on it (but I'm not opposed if a vote helps)
> > >>
> > >> [1] https://www.databricks.com/blog/2021/08/11/how-we-achieved-high-bandwidth-connectivity-with-bi-tools.html
> > >>
> > >> On Thu, Mar 28, 2024 at 1:04 PM Matt Topol <zo...@gmail.com> wrote:
> > >>
> > >> > I'll keep this new vote open for at least the next 72 hours. As before
> > >> > please reply with:
> > >> >
> > >> > [ ] +1 Accept this Proposal
> > >> > [ ] +0
> > >> > [ ] -1 Do not accept this proposal because...
> > >> >
> > >> > Thanks everyone!
> > >> >
> > >> > On Wed, Mar 27, 2024 at 7:51 PM Benjamin Kietzman <bengilgit@gmail.com> wrote:
> > >> >
> > >> > > +1
> > >> > >
> > >> > > On Tue, Mar 26, 2024, 18:36 Matt Topol <zo...@gmail.com> wrote:
> > >> > >
> > >> > > > Should I start a new thread for a new vote? Or repeat the original vote
> > >> > > > email here?
> > >> > > >
> > >> > > > Just asking since there hasn't been any responses so far.
> > >> > > >
> > >> > > > --Matt
> > >> > > >
> > >> > > > On Thu, Mar 21, 2024 at 11:46 AM Matt Topol <zotthewizard@gmail.com> wrote:
> > >> > > >
> > >> > > > > Absolutely, it will be marked experimental until we see some people
> > >> > > > > using it and can get more real-world feedback.
> > >> > > > >
> > >> > > > > There's also already a couple things that will be followed-up on after
> > >> > > > > the initial adoption for expansion which were discussed in the comments.
> > >> > > > >
> > >> > > > > On Thu, Mar 21, 2024, 11:42 AM David Li <li...@apache.org> wrote:
> > >> > > > >
> > >> > > > >> I think let's try again. Would it be reasonable to declare this
> > >> > > > >> 'experimental' for the time being, just as we did with Flight/Flight
> > >> > > > >> SQL/etc?
> > >> > > > >>
> > >> > > > >> On Tue, Mar 19, 2024, at 15:24, Matt Topol wrote:
> > >> > > > >> > Hey All, It's been another month and we've gotten a whole bunch of
> > >> > > > >> > feedback and engagement on the document from a variety of
> > >> > > > >> > individuals. Myself and a few others have proactively attempted to
> > >> > > > >> > reach out to as many third parties as we could, hoping to pull more
> > >> > > > >> > engagement also. While it would be great to get even more feedback,
> > >> > > > >> > the comments have slowed down and we haven't gotten anything in a few
> > >> > > > >> > days at this point.
> > >> > > > >> >
> > >> > > > >> > If there's no objections, I'd like to try to open up for voting again
> > >> > > > >> > to officially adopt this as a protocol to add to our docs.
> > >> > > > >> >
> > >> > > > >> > Thanks all!
> > >> > > > >> >
> > >> > > > >> > --Matt
> > >> > > > >> >
> > >> > > > >> > On Sat, Mar 2, 2024 at 6:43 PM Paul Whalen <pgwhalen@gmail.com> wrote:
> > >> > > > >> >
> > >> > > > >> >> Agreed that it makes sense not to focus on in-place updating for this
> > >> > > > >> >> proposal.  I’m not even sure it’s a great fit as a “general purpose”
> > >> > > > >> >> Arrow protocol, because of all the assumptions and restrictions
> > >> > > > >> >> required as you noted.
> > >> > > > >> >>
> > >> > > > >> >> I took another look at the proposal and don’t think there’s anything
> > >> > > > >> >> preventing in-place updating in the future - ultimately the data body
> > >> > > > >> >> could just be in the same location for subsequent messages.
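Keeping the body at the same location means a consumer might read a batch while the producer is overwriting it. One well-known way to detect that is a sequence lock, sketched here in Python purely as an assumption: neither the proposal nor this message specifies such a mechanism. The header layout, the `SeqLockedColumn` class and its methods are hypothetical, and a `bytearray` stands in for the shared segment.

```python
import struct

SEQ = struct.Struct("q")  # hypothetical header: an int64 sequence counter


class SeqLockedColumn:
    """An int64 column overwritten in place, guarded by a sequence counter.

    A bytearray stands in for the shared segment. Across real processes the
    counter updates would need atomic operations and memory barriers, which
    plain Python does not provide; this only shows the protocol.
    """

    def __init__(self, length):
        self.body = struct.Struct(f"{length}q")
        self.buf = bytearray(SEQ.size + self.body.size)

    def seq(self):
        return SEQ.unpack_from(self.buf, 0)[0]

    def begin_write(self):
        SEQ.pack_into(self.buf, 0, self.seq() + 1)  # odd: update in progress

    def end_write(self):
        SEQ.pack_into(self.buf, 0, self.seq() + 1)  # even: body is consistent

    def write(self, values):
        self.begin_write()
        self.body.pack_into(self.buf, SEQ.size, *values)
        self.end_write()

    def try_read(self):
        """Return (seq, values) for a consistent snapshot, or None to retry."""
        before = self.seq()
        if before % 2 == 1:
            return None  # producer is mid-update
        values = list(self.body.unpack_from(self.buf, SEQ.size))
        if self.seq() != before:
            return None  # body changed while it was being read
        return before, values


col = SeqLockedColumn(3)
col.write([1, 2, 3])
print(col.try_read())  # (2, [1, 2, 3])
col.begin_write()
print(col.try_read())  # None
col.end_write()
print(col.try_read())  # (4, [1, 2, 3])
```

The reader here copies the values out; a zero-copy reader would instead re-check the counter after it has finished with its view and discard any result computed over a changed body.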
> > >> > > > >> >>
> > >> > > > >> >> Thanks!
> > >> > > > >> >> Paul
> > >> > > > >> >>
> > >> > > > >> >> On Fri, Mar 1, 2024 at 5:28 PM Matt Topol <zotthewizard@gmail.com> wrote:
> > >> > > > >> >>
> > >> > > > >> >> > > @pgwhalen: As a potential "end user developer," (and aspiring
> > >> > > > >> >> > > contributor) this immediately excited me when I first saw it.
> > >> > > > >> >> >
> > >> > > > >> >> > Yay! Good to hear that!
> > >> > > > >> >> >
> > >> > > > >> >> > > @pgwhalen: And it wasn't clear to me whether updating batches in
> > >> > > > >> >> > > place (and the producer/consumer coordination that comes with that)
> > >> > > > >> >> > > was supported or encouraged as part of the proposal.
> > >> > > > >> >> >
> > >> > > > >> >> > So, updating batches in place was not a particular use-case we were
> > >> > > > >> >> > targeting with this approach. Instead using shared memory to produce
> > >> > > > >> >> > and consume the buffers/batches without having to physically copy the
> > >> > > > >> >> > data. Trying to update a batch in place is a dangerous prospect for a
> > >> > > > >> >> > number of reasons, but as you've mentioned it can technically be made
> > >> > > > >> >> > safe if the shape is staying the same and you're only modifying
> > >> > > > >> >> > fixed-width data types (i.e. not only is the *shape* unchanged but the
> > >> > > > >> >> > sizes of the underlying data buffers are also remaining unchanged). The
> > >> > > > >> >> > producer/consumer coordination that would be needed for updating
> > >> > > > >> >> > batches in place is not part of this proposal but is definitely
> > >> > > > >> >> > something we can look into as a follow-up to this for extending it.
> > >> > > > >> >> > There's a number of discussions that would need to be had around that
> > >> > > > >> >> > so I don't want to add on another complexity to this already complex
> > >> > > > >> >> > proposal.
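The condition for a safe in-place update described here (unchanged shape, fixed-width types, unchanged buffer sizes) can be expressed as a small check. This is a hypothetical Python helper, not part of the proposal or of any Arrow library: a batch is described as a list of `(name, type_name)` pairs plus a row count, the type names are illustrative, and validity bitmaps as well as buffer padding and alignment are ignored.

```python
# Illustrative byte widths for some fixed-width Arrow types (not exhaustive).
FIXED_WIDTH_BYTES = {
    "int8": 1, "int16": 2, "int32": 4, "int64": 8,
    "uint8": 1, "uint16": 2, "uint32": 4, "uint64": 8,
    "float32": 4, "float64": 8,
}


def data_buffer_sizes(schema, num_rows):
    """Unpadded data-buffer size per column, or None if any column is variable-width."""
    sizes = []
    for _name, type_name in schema:
        width = FIXED_WIDTH_BYTES.get(type_name)
        if width is None:
            return None  # e.g. utf8 or list: sizes depend on the values themselves
        sizes.append(width * num_rows)
    return sizes


def can_update_in_place(old_schema, old_rows, new_schema, new_rows):
    """True if the new batch could overwrite the old one's buffers in place."""
    if old_schema != new_schema or old_rows != new_rows:
        return False  # the shape changed, so buffer sizes would change
    # Same shape: only safe if every buffer size is independent of the values.
    return data_buffer_sizes(new_schema, new_rows) is not None


quotes = [("price", "float64"), ("qty", "int64")]
print(can_update_in_place(quotes, 100, quotes, 100))  # True
print(can_update_in_place(quotes, 100, quotes, 101))  # False
print(can_update_in_place([("sym", "utf8")], 5, [("sym", "utf8")], 5))  # False
```

Changing which values are null would also change the null count carried in the batch metadata, so a real check would need to account for that as well.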
> > >> > > > >> >> >
> > >> > > > >> >> > That said, if you or anyone see something in this proposal that would
> > >> > > > >> >> > hinder or prevent being able to use it for your use case please let me
> > >> > > > >> >> > know so we can address it. Even though the proposal as it currently
> > >> > > > >> >> > exists doesn't fully support the in-place updating of batches, I don't
> > >> > > > >> >> > want to make things harder for us in such a follow-up where we'd end up
> > >> > > > >> >> > requiring an entirely new protocol to support that.
> > >> > > > >> >> >
> > >> > > > >> >> > > @octalene.dev: I know of a third party that is interested in Arrow
> > >> > > > >> >> > > for HPC environments that could be interested in the proposal and I
> > >> > > > >> >> > > can see if they're interested in providing feedback.
> > >> > > > >> >> >
> > >> > > > >> >> > Awesome! Thanks much!
> > >> > > > >> >> >
> > >> > > > >> >> > For reference to anyone who hasn't looked at the document in a while,
> > >> > > > >> >> > since the original discussion thread on this I have added a full
> > >> > > > >> >> > "Background Context" page to the beginning of the proposal to help
> > >> > > > >> >> > anyone who isn't already familiar with the issues this protocol is
> > >> > > > >> >> > trying to solve or isn't already familiar with ucx or libfabric
> > >> > > > >> >> > transports to better understand *why* I'm proposing this and what it is
> > >> > > > >> >> > trying to solve. The point of this background information is to help
> > >> > > > >> >> > ensure that anyone who might have thoughts on protocols in general or
> > >> > > > >> >> > APIs should still be able to understand the base reasons and goals that
> > >> > > > >> >> > we're trying to achieve with this protocol proposal. You don't need to
> > >> > > > >> >> > already understand managing GPU/device memory or ucx to be able to have
> > >> > > > >> >> > meaningful input on the document.
> > >> > > > >> >> >
> > >> > > > >> >> > Thanks again to all who have contributed so far and please spread to
> > >> > > > >> >> > any contacts that you think might be interested in this for their
> > >> > > > >> >> > particular use cases.
> > >> > > > >> >> >
> > >> > > > >> >> > --Matt
> > >> > > > >> >> >
> > >> > > > >> >> > On Wed, Feb 28, 2024 at 1:39 AM Aldrin <octalene.dev@pm.me.invalid> wrote:
> > >> > > > >> >> >
> > >> > > > >> >> > > I am interested in this as well, but I haven't gotten to a point where
> > >> > > > >> >> > > I can have valuable input (I haven't tried other transports). I know
> > >> > > > >> >> > > of a third party that is interested in Arrow for HPC environments that
> > >> > > > >> >> > > could be interested in the proposal and I can see if they're
> > >> > > > >> >> > > interested in providing feedback.
> > >> > > > >> >> > >
> > >> > > > >> >> > > I glanced at the document before but I'll go through again to see if
> > >> > > > >> >> > > there is anything I can comment on.
> > >> > > > >> >> > >
> > >> > > > >> >> > > # ------------------------------
> > >> > > > >> >> > > # Aldrin
> > >> > > > >> >> > >
> > >> > > > >> >> > > https://github.com/drin/
> > >> > > > >> >> > > https://gitlab.com/octalene
> > >> > > > >> >> > > https://keybase.io/octalene
> > >> > > > >> >> > >
> > >> > > > >> >> > > On Tuesday, February 27th, 2024 at 17:43, Paul Whalen <pgwhalen@gmail.com> wrote:
> > >> > > > >> >> > >
> > >> > > > >> >> > > > As a potential "end user developer," (and aspiring contributor) this
> > >> > > > >> >> > > > immediately excited me when I first saw it.
> > >> > > > >> >> > > >
> > >> > > > >> >> > > > I work at a trading firm, and my team has developed an IPC mechanism
> > >> > > > >> >> > > > for efficiently transmitting pandas dataframes both remotely via TCP
> > >> > > > >> >> > > > and locally via shared memory, where the interface for the application
> > >> > > > >> >> > > > developer is the same for both. The data in the dataframes may change
> > >> > > > >> >> > > > rapidly, so when communicating locally via shared memory, if the shape
> > >> > > > >> >> > > > of the dataframe doesn't change, we update the memory in place,
> > >> > > > >> >> > > > coordinating between the producer and consumer via TCP.
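The coordination described here (overwriting shared memory in place and signalling over TCP) can be sketched as a handshake in which the producer announces each new version and waits for the consumer's acknowledgement before writing again. This Python sketch assumes a `socket.socketpair()` in place of the TCP connection, a `bytearray` in place of the shared segment, and an 8-byte version message; the helper names are hypothetical and this is not the firm's actual implementation.

```python
import socket
import struct
import threading

VERSION = struct.Struct("!q")  # hypothetical message: one big-endian int64


def send_version(sock, version):
    sock.sendall(VERSION.pack(version))


def recv_version(sock):
    data = b""
    while len(data) < VERSION.size:  # stream sockets may return partial reads
        chunk = sock.recv(VERSION.size - len(data))
        if not chunk:
            raise ConnectionError("coordination channel closed")
        data += chunk
    return VERSION.unpack(data)[0]


def publish_update(shared, sock, version, values):
    """Producer: overwrite the buffer in place, announce it, wait for the ack."""
    struct.pack_into(f"{len(values)}q", shared, 0, *values)
    send_version(sock, version)
    if recv_version(sock) != version:
        raise RuntimeError("consumer acknowledged the wrong version")


def consume_update(shared, sock, count):
    """Consumer: wait for an announcement, read in place, then acknowledge."""
    version = recv_version(sock)
    view = memoryview(shared)[: 8 * count].cast("q")  # view the shared bytes directly
    values = list(view)  # copied only to report the result
    view.release()
    send_version(sock, version)  # producer may now overwrite the buffer
    return version, values


producer_sock, consumer_sock = socket.socketpair()
shared = bytearray(8 * 3)  # stands in for the shared memory segment
results = []


def consumer():
    for _ in range(2):
        results.append(consume_update(shared, consumer_sock, 3))


thread = threading.Thread(target=consumer)
thread.start()
publish_update(shared, producer_sock, 1, [1, 2, 3])
publish_update(shared, producer_sock, 2, [4, 5, 6])
thread.join()
producer_sock.close()
consumer_sock.close()
print(results)  # [(1, [1, 2, 3]), (2, [4, 5, 6])]
```

Because the producer blocks until the acknowledgement arrives, the consumer never reads a half-written buffer, at the cost of one round trip per update.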
> > >> > > > >> >> > > >
> > >> > > > >> >> > > > We intend to move away from our remote TCP mechanism towards Arrow
> > >> > > > >> >> > > > Flight, or a lighter-weight version of Arrow IPC. For the local shared
> > >> > > > >> >> > > > memory mechanism which we previously did not have a good answer for, it
> > >> > > > >> >> > > > seems like Disassociated Arrow IPC maps quite well to our problem.
> > >> > > > >> >> > > >
> > >> > > > >> >> > > > So some features that enable our use case are:
> > >> > > > >> >> > > > - Updating existing batches in place is supported
> > >> > > > >> >> > > > - The interface is pretty similar to Flight
> > >> > > > >> >> > > >
> > >> > > > >> >> > > > I'd imagine we're not the only financial firm to implement something
> > >> > > > >> >> > > > like this, given how widespread pandas usage is, so that could be a
> > >> > > > >> >> > > > place to seek feedback.
> > >> > > > >> >> > > >
> > >> > > > >> >> > > > As I was reading the proposal initially, I gleaned that the most
> > >> > > > >> >> > > > important audience was those writing interfaces to GPUs/remote
> > >> > > > >> >> > > > memory/non-standard transports/etc. And it wasn't clear to me whether
> > >> > > > >> >> > > > updating batches in place (and the producer/consumer coordination that
> > >> > > > >> >> > > > comes with that) was supported or encouraged as part of the proposal.
> > >> > > > >> >> > > > But regardless, as an end user, this seems like an easier and more
> > >> > > > >> >> > > > efficient way to glue pieces in the Arrow ecosystem together if it was
> > >> > > > >> >> > > > adopted broadly.
> > >> > > > >> >> > > >
> > >> > > > >> >> > > > Paul
> > >> > > > >> >> > > >
> > >> > > > >> >> > > > On Tue, Feb 27, 2024 at 6:05 PM Matt Topol zotthewizard@gmail.com wrote:
> > >> > > > >> >> > > >
> > >> > > > >> >> > > > > I'll continue my efforts of trying to reach out to other interested
> > >> > > > >> >> > > > > parties, but if anyone else here has any contacts or connections that
> > >> > > > >> >> > > > > they think might be interested please forward them the link to the
> > >> > > > >> >> > > > > Google doc.
> > >> > > > >> >> > > > >
> > >> > > > >> >> > > > > I really do want to get as much engagement and feedback as possible
> > >> > > > >> >> > > > > on this.
> > >> > > > >> >> > > > >
> > >> > > > >> >> > > > > Thanks!
> > >> > > > >> >> > > > >
> > >> > > > >> >> > > > > On Tue, Feb 27, 2024, 6:38 PM Wes McKinney wesmckinn@gmail.com wrote:
> > >> > > > >> >> > > > >
> > >> > > > >> >> > > > > > Have there been efforts to proactively reach out to other third
> > >> > > > >> >> > > > > > parties that might have an interest in this or be a potential user
> > >> > > > >> >> > > > > > at some point? There are a lot of interested parties in Arrow that
> > >> > > > >> >> > > > > > may not actively follow the mailing list.
> > >> > > > >> >> > > > > >
> > >> > > > >> >> > > > > > Seems like folks from the Dask, Ray, RAPIDS (especially folks at
> > >> > > > >> >> > > > > > NVIDIA or working on UCX), or other communities like that might have
> > >> > > > >> >> > > > > > constructive thoughts about this. DLPack
> > >> > > > >> >> > > > > > (https://dmlc.github.io/dlpack/latest/) also seems adjacent and worth
> > >> > > > >> >> > > > > > reaching out to. Other ideas for projects or companies that could be
> > >> > > > >> >> > > > > > reached out to for feedback.
> > >> > > > >> >> > > > > >
> > >> > > > >> >> > > > > > On Tue, Feb 27, 2024 at 5:23 PM Antoine Pitrou antoine@python.org wrote:
> > >> > > > >> >> > > > > >
> > >> > > > >> >> > > > > > > If there's no engagement, then I'm afraid it might mean that third
> > >> > > > >> >> > > > > > > parties have no interest in this. I don't really have any solution
> > >> > > > >> >> > > > > > > for generating engagement except nagging and pinging people
> > >> > > > >> >> > > > > > > explicitly :-)
> > >> > > > >> >> > > > > > >
> > >> > > > >> >> > > > > > > On 27/02/2024 at 19:09, Matt Topol wrote:
> > >> > > > >> >> > > > > > >
> > >> > > > >> >> > > > > > > > I would like to see the same Antoine, currently given the lack of
> > >> > > > >> >> > > > > > > > engagement (both for OR against) I was going to take the silence
> > >> > > > >> >> > > > > > > > as assent and hope for non-Voltron Data PMC members to vote in
> > >> > > > >> >> > > > > > > > this.
> > >> > > > >> >> > > > > > > >
> > >> > > > >> >> > > > > > > > If anyone has any suggestions on how we could potentially
> > >> > > > >> >> > > > > > > > generate more engagement and discussion on this, please let me
> > >> > > > >> >> > > > > > > > know as I want as many parties in the community as possible to be
> > part
> > >> of
> > >> > > > this.
> > >> > > > >> >> > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > Thanks everyone.
> > >> > > > >> >> > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > --Matt
> > >> > > > >> >> > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > On Tue, Feb 27, 2024 at 12:48 PM Antoine
> > Pitrou
> > >> > > > >> >> > > antoine@python.org
> > >> > > > >> >> > > > > > > > wrote:
> > >> > > > >> >> > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > > Hello,
> > >> > > > >> >> > > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > > I'd really like to see more engagement and
> > >> > > criticism
> > >> > > > >> from
> > >> > > > >> >> > > > > > > > > non-Voltron
> > >> > > > >> >> > > > > > > > > Data parties before this is formally
> > adopted as
> > >> > an
> > >> > > > >> Arrow
> > >> > > > >> >> > spec.
> > >> > > > >> >> > > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > > Regards
> > >> > > > >> >> > > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > > Antoine.
> > >> > > > >> >> > > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > > Le 27/02/2024 à 18:35, Matt Topol a écrit
> :
> > >> > > > >> >> > > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > > > Hey all,
> > >> > > > >> >> > > > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > > > I'd like to propose a vote for us to
> > >> officially
> > >> > > > >> adopt the
> > >> > > > >> >> > > protocol
> > >> > > > >> >> > > > > > > > > > described in the google doc[1] for
> > >> Dissociated
> > >> > > > Arrow
> > >> > > > >> IPC
> > >> > > > >> >> > > > > > > > > > Transports.
> > >> > > > >> >> > > > > > > > > > This
> > >> > > > >> >> > > > > > > > > > proposal was originally discussed at 2.
> > Once
> > >> > this
> > >> > > > >> >> proposal
> > >> > > > >> >> > is
> > >> > > > >> >> > > > > > > > > > adopted,
> > >> > > > >> >> > > > > > > > > > I
> > >> > > > >> >> > > > > > > > > > will work on adding the necessary
> > >> documentation
> > >> > > to
> > >> > > > >> the
> > >> > > > >> >> > Arrow
> > >> > > > >> >> > > > > > > > > > website
> > >> > > > >> >> > > > > > > > > > along
> > >> > > > >> >> > > > > > > > > > with examples etc.
> > >> > > > >> >> > > > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > > > The vote will be open for at least 72
> > hours.
> > >> > > > >> >> > > > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > > > [ ] +1 Accept this Proposal
> > >> > > > >> >> > > > > > > > > > [ ] +0
> > >> > > > >> >> > > > > > > > > > [ ] -1 Do not accept this proposal
> > because...
> > >> > > > >> >> > > > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > > > Thank you everyone!
> > >> > > > >> >> > > > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > > > --Matt
> > >> > > > >> >> > > > > > > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > > > > > > > [1]:
> > >> > > > >> >> > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> > > > >
> > >> > > > >> >> > >
> > >> > > > >> >> >
> > >> > > > >> >>
> > >> > > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit#heading=h.38515dnp2bdb
> > >> > > > >> >> >
> > >> > > > >> >>
> > >> > > > >>
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
>

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by Joel Lubinitsky <jo...@gmail.com>.
+1 to the dissociated transports proposal

On Sun, Mar 31, 2024 at 11:14 AM David Li <li...@apache.org> wrote:

> +1 from me as before

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by David Li <li...@apache.org>.
+1 from me as before

On Thu, Mar 28, 2024, at 18:06, Matt Topol wrote:
>> There is a word doc with no implementation or PR. I think there could
>> be an implementation / PR.
>
> The doc links to a POC implementation[1] showing this protocol working
> with a Flight service, UCX, and libcudf. The key point is that we're voting
> on adopting this protocol spec (i.e. I'll add it to the documentation
> website), not on providing full implementations or abstractions around it.
> We can provide reference implementations like the POC, but I don't think
> they should live in the Arrow monorepo; otherwise we risk many of the same
> issues Flight has. For example, adding anything to Flight in C++ requires
> fully wrapping the gRPC/Flight primitives with exported Arrow equivalents,
> which increases our maintenance burden and makes it harder for users to
> reach the underlying knobs and dials.

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by Matt Topol <zo...@gmail.com>.
>  There is a word doc with no implementation or PR.  I think there could
> be an implementation / PR.

In the Google doc there is a link to a POC implementation[1] showing this
protocol working with a Flight service, UCX, and libcudf. The key point here
is that we're voting on adopting this protocol spec (i.e., I'll add it to
the documentation website), not on providing full implementations or
abstractions around it. We can provide reference implementations like the
POC, but I don't think they should live in the Arrow monorepo; otherwise we
risk many of the same issues Flight has. For example, adding anything to
Flight in C++ requires fully wrapping the gRPC/Flight primitives with Arrow
equivalents to export, which increases our maintenance burden and makes it
harder for users to reach the underlying knobs and dials.
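For readers who haven't opened the doc, a rough sketch of the receiving side may help. As I understand the idea, an IPC message's metadata and its body travel separately (the POC combines a Flight service, UCX, and libcudf), so the consumer has to re-pair them. This is a hypothetical illustration only: `MetadataMessage`, `DissociatedReceiver`, and pairing by a shared sequence number are assumptions of this sketch, not the wire format defined in the proposal.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MetadataMessage:
    # Hypothetical stand-in for an IPC message header: only the fields
    # this pairing logic needs, not the real flatbuffer Message.
    seq: int
    body_length: int


Paired = Tuple[MetadataMessage, bytes]


class DissociatedReceiver:
    """Re-pairs metadata and bodies that arrive on separate channels.

    Assumes (hypothetically) that both halves carry the same sequence
    number; either half may arrive first.
    """

    def __init__(self) -> None:
        self._waiting_meta: Dict[int, MetadataMessage] = {}
        self._waiting_body: Dict[int, bytes] = {}

    def on_metadata(self, msg: MetadataMessage) -> Optional[Paired]:
        body = self._waiting_body.pop(msg.seq, None)
        if body is None:
            # Body not here yet: remember the header and wait.
            self._waiting_meta[msg.seq] = msg
            return None
        return self._pair(msg, body)

    def on_body(self, seq: int, body: bytes) -> Optional[Paired]:
        msg = self._waiting_meta.pop(seq, None)
        if msg is None:
            # Header not here yet: buffer the body and wait.
            self._waiting_body[seq] = body
            return None
        return self._pair(msg, body)

    @staticmethod
    def _pair(msg: MetadataMessage, body: bytes) -> Paired:
        # Sanity check: the header says how large the body must be.
        if len(body) != msg.body_length:
            raise ValueError(
                f"seq {msg.seq}: expected {msg.body_length} body bytes, got {len(body)}"
            )
        return msg, body
```

Because either half may arrive first, both sides are buffered; a real transport would also need timeouts and flow control, which this sketch leaves out.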

> For example, does any ADBC client respect this protocol today?  If a
> flight server responds with an S3/HTTP URI will the ADBC client download
> the files from the correct place?  Will it at least notice that the URI is
> not a GRPC URI and give a "I don't have a connector for downloading from
> HTTP/S3" error?

I've split the S3/HTTP URI Flight pieces out into a separate document and a
separate vote, at the request of several people who wanted to treat them as
two separate proposals. So this vote *only* covers adopting the protocol
spec as an "Experimental Protocol", so that we can start seeing real-world
usage to help refine and improve it. That said, I believe all clients would
currently reject any non-gRPC URI.
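Weston's question is essentially whether a client fails loudly on a location it cannot handle. Below is a minimal sketch of that check, assuming a client that only speaks gRPC. `check_location`, `UnsupportedLocationError`, and the scheme list are hypothetical and are not taken from ADBC or any existing Flight client.

```python
from urllib.parse import urlparse

# Hypothetical: schemes a gRPC-only Flight client might accept.
GRPC_SCHEMES = frozenset({"grpc", "grpc+tcp", "grpc+tls", "grpc+unix"})


class UnsupportedLocationError(Exception):
    """Raised when no connector exists for a location's URI scheme."""


def check_location(uri: str, supported: frozenset = GRPC_SCHEMES) -> str:
    """Return the URI scheme if supported; otherwise fail with a clear message."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in supported:
        # Name the unsupported scheme instead of failing obscurely later.
        raise UnsupportedLocationError(
            f"no connector for {scheme or 'scheme-less'} location {uri!r}"
        )
    return scheme
```

A client that does support HTTP or S3 would widen `supported` and dispatch on the returned scheme instead of rejecting it.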

>   I was speaking with someone yesterday and they explained that
> they ended up not choosing Flight for an internal project because Flight
> didn't support something called "cloud fetch" which I have now learned is

I read through that link, and it seems pretty much *identical* to Flight as
it currently exists, except that it uses cloud storage (S3, GCS, etc.) URIs
pointing at Arrow IPC *files* rather than a service in front of them serving
up Arrow IPC *streams*. Others in the community have requested this too,
hence the second proposal that was split out [2].
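The file/stream distinction is visible in the bytes themselves: the Arrow IPC file format begins and ends with the magic string `ARROW1` and carries a footer for random access to record batches, while the stream format is a plain sequence of messages, each normally starting with the `0xFFFFFFFF` continuation marker. Here is a small classifier as a sketch; it inspects framing bytes only and does not validate the footer or the messages, and the name `ipc_format` is hypothetical.

```python
ARROW_MAGIC = b"ARROW1"
CONTINUATION = b"\xff\xff\xff\xff"
# Magic + 2 padding bytes at the start; int32 footer length + magic at the end.
MIN_FILE_SIZE = 8 + 4 + len(ARROW_MAGIC)


def ipc_format(data: bytes) -> str:
    """Classify raw bytes as an Arrow IPC 'file', 'stream', or 'unknown'."""
    if (
        len(data) >= MIN_FILE_SIZE
        and data.startswith(ARROW_MAGIC)
        and data.endswith(ARROW_MAGIC)
    ):
        return "file"
    if data.startswith(CONTINUATION):
        return "stream"
    # Streams written before Arrow 0.15 had no continuation marker; they land here.
    return "unknown"
```

A cloud-fetch-style client could run a check like this before handing each downloaded object to a file reader (in pyarrow, `pyarrow.ipc.open_file`) or a stream reader (`pyarrow.ipc.open_stream`).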

>  So a big +1 for the idea of disassociated transports but I'm not sure why
> we need a vote to start working on it (but I'm not opposed if a vote helps)

Mostly I found the Google doc easier for iterating on the protocol
specification than a Markdown PR against the Arrow documentation, since I
could express things more visually without needing a preview of the rendered
Markdown. If it would make people more likely to vote on this, I can write
up the documentation Markdown now and create a PR rather than waiting until
we decide whether to adopt this as an "official" Arrow protocol.

Lemme know if there are any other unanswered questions!

--Matt

[1]: https://github.com/zeroshade/cudf-flight-ucx
[2]:
https://docs.google.com/document/d/1-x7tHWDzpbgmsjtTUnVXeEO4b7vMWDHTu-lzxlK9_hE/edit#heading=h.ub6lgn7s75tq

On Thu, Mar 28, 2024 at 4:53 PM Weston Pace <we...@gmail.com> wrote:

> I'm sorry for the very late reply.  Until yesterday I had no real concept
> of what this was talking about and so I had stayed out.
>
> I'm +0 only because it isn't clear what we are voting on.  There is a word
> doc with no implementation or PR.  I think there could be an implementation
> / PR.  For example, does any ADBC client respect this protocol today?  If a
> flight server responds with an S3/HTTP URI will the ADBC client download
> the files from the correct place?  Will it at least notice that the URI is
> not a GRPC URI and give a "I don't have a connector for downloading from
> HTTP/S3" error?  In general, I think we do want this in Flight (see
> comments below) and I am very supportive of the idea.  However, if adopting
> this as an experimental proposal helps move this forward then I think
> that's fine.
>
> That being said, I do want to express support for the proposal as a
> concept, at least the "disassociated transports" portion (I can't speak to
> UCX/etc.).  I was speaking with someone yesterday and they explained that
> they ended up not choosing Flight for an internal project because Flight
> didn't support something called "cloud fetch" which I have now learned is
> [1].  I had recalled looking at this proposal before and this person seemed
> interested and optimistic to know this was being considered for Flight.
> This proposal, as I understand it, should make it possible for cloud
> servers to support a cloud fetch style API.  From the discussion I got the
> impression that this cloud fetch approach is useful and generally
> applicable.
>
> So a big +1 for the idea of disassociated transports but I'm not sure why
> we need a vote to start working on it (but I'm not opposed if a vote helps)
>
> [1]
>
> https://www.databricks.com/blog/2021/08/11/how-we-achieved-high-bandwidth-connectivity-with-bi-tools.html
>
> On Thu, Mar 28, 2024 at 1:04 PM Matt Topol <zo...@gmail.com> wrote:
>
> > I'll keep this new vote open for at least the next 72 hours. As before
> > please reply with:
> >
> > [ ] +1 Accept this Proposal
> > [ ] +0
> > [ ] -1 Do not accept this proposal because...
> >
> > Thanks everyone!
> >
> > On Wed, Mar 27, 2024 at 7:51 PM Benjamin Kietzman <be...@gmail.com>
> > wrote:
> >
> > > +1
> > >
> > > On Tue, Mar 26, 2024, 18:36 Matt Topol <zo...@gmail.com> wrote:
> > >
> > > > Should I start a new thread for a new vote? Or repeat the original vote
> > > > email here?
> > > >
> > > > Just asking since there hasn't been any responses so far.
> > > >
> > > > --Matt
> > > >
> > > > On Thu, Mar 21, 2024 at 11:46 AM Matt Topol <zo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Absolutely, it will be marked experimental until we see some people using
> > > > > it and can get more real-world feedback.
> > > > >
> > > > > There's also already a couple things that will be followed-up on after the
> > > > > initial adoption for expansion which were discussed in the comments.
> > > > >
> > > > > On Thu, Mar 21, 2024, 11:42 AM David Li <li...@apache.org> wrote:
> > > > >
> > > > >> I think let's try again. Would it be reasonable to declare this
> > > > >> 'experimental' for the time being, just as we did with Flight/Flight
> > > > >> SQL/etc?
> > > > >>
> > > > >> On Tue, Mar 19, 2024, at 15:24, Matt Topol wrote:
> > > > >> > Hey All, It's been another month and we've gotten a whole bunch of feedback
> > > > >> > and engagement on the document from a variety of individuals. Myself and a
> > > > >> > few others have proactively attempted to reach out to as many third parties
> > > > >> > as we could, hoping to pull more engagement also. While it would be great
> > > > >> > to get even more feedback, the comments have slowed down and we haven't
> > > > >> > gotten anything in a few days at this point.
> > > > >> >
> > > > >> > If there's no objections, I'd like to try to open up for voting again to
> > > > >> > officially adopt this as a protocol to add to our docs.
> > > > >> >
> > > > >> > Thanks all!
> > > > >> >
> > > > >> > --Matt
> > > > >> >
> > > > >> > On Sat, Mar 2, 2024 at 6:43 PM Paul Whalen <pg...@gmail.com> wrote:
> > > > >> >
> > > > >> >> Agreed that it makes sense not to focus on in-place updating for this
> > > > >> >> proposal.  I’m not even sure it’s a great fit as a “general purpose” Arrow
> > > > >> >> protocol, because of all the assumptions and restrictions required as you
> > > > >> >> noted.
> > > > >> >>
> > > > >> >> I took another look at the proposal and don’t think there’s anything
> > > > >> >> preventing in-place updating in the future - ultimately the data body could
> > > > >> >> just be in the same location for subsequent messages.
> > > > >> >>
> > > > >> >> Thanks!
> > > > >> >> Paul
> > > > >> >>
> > > > >> >> On Fri, Mar 1, 2024 at 5:28 PM Matt Topol <zotthewizard@gmail.com> wrote:
> > > > >> >>
> > > > >> >> > > @pgwhalen: As a potential "end user developer," (and
> aspiring
> > > > >> >> > contributor) this
> > > > >> >> > immediately excited me when I first saw it.
> > > > >> >> >
> > > > >> >> > Yay! Good to hear that!
> > > > >> >> >
> > > > >> >> > > @pgwhalen: And it wasn't clear to me whether updating
> batches
> > > in
> > > > >> >> > place (and the producer/consumer coordination that comes with
> > > that)
> > > > >> was
> > > > >> >> > supported or encouraged as part of the proposal.
> > > > >> >> >
> > > > >> >> > So, updating batches in place was not a particular use-case
> we
> > > were
> > > > >> >> > targeting with this approach. Instead using shared memory to
> > > > produce
> > > > >> and
> > > > >> >> > consume the buffers/batches without having to physically copy
> > the
> > > > >> data.
> > > > >> >> > Trying to update a batch in place is a dangerous prospect
> for a
> > > > >> number of
> > > > >> >> > reasons, but as you've mentioned it can technically be made
> > safe
> > > if
> > > > >> the
> > > > >> >> > shape is staying the same and you're only modifying
> fixed-width
> > > > data
> > > > >> >> types
> > > > >> >> > (i.e. not only is the *shape* unchanged but the sizes of the
> > > > >> underlying
> > > > >> >> > data buffers are also remaining unchanged). The
> > producer/consumer
> > > > >> >> > coordination that would be needed for updating batches in
> place
> > > is
> > > > >> not
> > > > >> >> part
> > > > >> >> > of this proposal but is definitely something we can look into
> > as
> > > a
> > > > >> >> > follow-up to this for extending it. There's a number of
> > > discussions
> > > > >> that
> > > > >> >> > would need to be had around that so I don't want to add on
> > > another
> > > > >> >> > complexity to this already complex proposal.
> > > > >> >> >
> > > > >> >> > That said, if you or anyone see something in this proposal
> that
> > > > would
> > > > >> >> > hinder or prevent being able to use it for your use case
> please
> > > let
> > > > >> me
> > > > >> >> know
> > > > >> >> > so we can address it. Even though the proposal as it
> currently
> > > > exists
> > > > >> >> > doesn't fully support the in-place updating of batches, I
> don't
> > > > want
> > > > >> to
> > > > >> >> > make things harder for us in such a follow-up where we'd end
> up
> > > > >> requiring
> > > > >> >> > an entirely new protocol to support that.
> > > > >> >> >
> > > > >> >> > > @octalene.dev: I know of a third party that is interested
> in
> > > > >> Arrow for
> > > > >> >> > HPC environments that could be interested in the proposal
> and I
> > > can
> > > > >> see
> > > > >> >> if
> > > > >> >> > they're interested in providing feedback.
> > > > >> >> >
> > > > >> >> > Awesome! Thanks much!
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > For reference to anyone who hasn't looked at the document in
> a
> > > > while,
> > > > >> >> since
> > > > >> >> > the original discussion thread on this I have added a full
> > > > >> "Background
> > > > >> >> > Context" page to the beginning of the proposal to help anyone
> > who
> > > > >> isn't
> > > > >> >> > already familiar with the issues this protocol is trying to
> > solve
> > > > or
> > > > >> >> isn't
> > > > >> >> > already familiar with ucx or libfabric transports to better
> > > > >> understand
> > > > >> >> > *why* I'm
> > > > >> >> > proposing this and what it is trying to solve. The point of
> > this
> > > > >> >> background
> > > > >> >> > information is to help ensure that anyone who might have
> > thoughts
> > > > on
> > > > >> >> > protocols in general or APIs should still be able to
> understand
> > > the
> > > > >> base
> > > > >> >> > reasons and goals that we're trying to achieve with this
> > protocol
> > > > >> >> proposal.
> > > > >> >> > You don't need to already understand managing GPU/device
> memory
> > > or
> > > > >> ucx to
> > > > >> >> > be able to have meaningful input on the document.
> > > > >> >> >
> > > > >> >> > Thanks again to all who have contributed so far and please
> > spread
> > > > to
> > > > >> any
> > > > >> >> > contacts that you think might be interested in this for their
> > > > >> particular
> > > > >> >> > use cases.
> > > > >> >> >
> > > > >> >> > --Matt
> > > > >> >> >
> > > > >> >> > On Wed, Feb 28, 2024 at 1:39 AM Aldrin
> > > <octalene.dev@pm.me.invalid
> > > > >
> > > > >> >> wrote:
> > > > >> >> >
> > > > >> >> > > I am interested in this as well, but I haven't gotten to a
> > > point
> > > > >> where
> > > > >> >> I
> > > > >> >> > > can have valuable input (I haven't tried other
> transports). I
> > > > know
> > > > >> of a
> > > > >> >> > > third party that is interested in Arrow for HPC
> environments
> > > that
> > > > >> could
> > > > >> >> > be
> > > > >> >> > > interested in the proposal and I can see if they're
> > interested
> > > in
> > > > >> >> > providing
> > > > >> >> > > feedback.
> > > > >> >> > >
> > > > >> >> > > I glanced at the document before but I'll go through again
> to
> > > see
> > > > >> if
> > > > >> >> > there
> > > > >> >> > > is anything I can comment on.
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > > # ------------------------------
> > > > >> >> > > # Aldrin
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > > https://github.com/drin/
> > > > >> >> > > https://gitlab.com/octalene
> > > > >> >> > > https://keybase.io/octalene
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > > On Tuesday, February 27th, 2024 at 17:43, Paul Whalen <
> > > > >> >> > pgwhalen@gmail.com>
> > > > >> >> > > wrote:
> > > > >> >> > >
> > > > >> >> > > > As a potential "end user developer," (and aspiring
> > > contributor)
> > > > >> this
> > > > >> >> > > > immediately excited me when I first saw it.
> > > > >> >> > > >
> > > > >> >> > >
> > > > >> >> > > > I work at a trading firm, and my team has developed an
> IPC
> > > > >> mechanism
> > > > >> >> > for
> > > > >> >> > > > efficiently transmitting pandas dataframes both remotely
> > via
> > > > TCP
> > > > >> and
> > > > >> >> > > > locally via shared memory, where the interface for the
> > > > >> application
> > > > >> >> > > > developer is the same for both. The data in the
> dataframes
> > > may
> > > > >> change
> > > > >> >> > > > rapidly, so when communicating locally via shared memory,
> > if
> > > > the
> > > > >> >> shape
> > > > >> >> > of
> > > > >> >> > > > the dataframe doesn't change, we update the memory in
> > place,
> > > > >> >> > coordinating
> > > > >> >> > > > between the producer and consumer via TCP.
> > > > >> >> > > >
> > > > >> >> > >
> > > > >> >> > > > We intend to move away from our remote TCP mechanism
> > towards
> > > > >> Arrow
> > > > >> >> > > Flight,
> > > > >> >> > > > or a lighter-weight version of Arrow IPC. For the local
> > > shared
> > > > >> memory
> > > > >> >> > > > mechanism which we previously did not have a good answer
> > for,
> > > > it
> > > > >> >> seems
> > > > >> >> > > like
> > > > >> >> > > > Disassociated Arrow IPC maps quite well to our problem.
> > > > >> >> > > >
> > > > >> >> > >
> > > > >> >> > > > So some features that enable our use case are:
> > > > >> >> > > > - Updating existing batches in place is supported
> > > > >> >> > > > - The interface is pretty similar to Flight
> > > > >> >> > > >
> > > > >> >> > >
> > > > >> >> > > > I'd imagine we're not the only financial firm to
> implement
> > > > >> something
> > > > >> >> > like
> > > > >> >> > > > this, given how widespread pandas usage is, so that could
> > be
> > > a
> > > > >> place
> > > > >> >> to
> > > > >> >> > > > seek feedback.
> > > > >> >> > > >
> > > > >> >> > >
> > > > >> >> > > > As I was reading the proposal initially, I gleaned that
> the
> > > > most
> > > > >> >> > > important
> > > > >> >> > > > audience was those writing interfaces to GPUs/remote
> > > > >> >> > memory/non-standard
> > > > >> >> > > > transports/etc. And it wasn't clear to me whether
> updating
> > > > >> batches in
> > > > >> >> > > > place (and the producer/consumer coordination that comes
> > with
> > > > >> that)
> > > > >> >> was
> > > > >> >> > > > supported or encouraged as part of the proposal. But
> > > > regardless,
> > > > >> as
> > > > >> >> an
> > > > >> >> > > end
> > > > >> >> > > > user, this seems like an easier and more efficient way to
> > > glue
> > > > >> pieces
> > > > >> >> > in
> > > > >> >> > > > the Arrow ecosystem together if it was adopted broadly.
> > > > >> >> > > >
> > > > >> >> > >
> > > > >> >> > > > Paul
> > > > >> >> > > >
> > > > >> >> > >
> > > > >> >> > > > On Tue, Feb 27, 2024 at 6:05 PM Matt Topol
> > > > >> zotthewizard@gmail.com
> > > > >> >> > wrote:
> > > > >> >> > > >
> > > > >> >> > >
> > > > >> >> > > > > I'll continue my efforts of trying to reach out to
> other
> > > > >> interested
> > > > >> >> > > > > parties, but if anyone else here has any contacts or
> > > > >> connections
> > > > >> >> that
> > > > >> >> > > they
> > > > >> >> > > > > think might be interested please forward them the link
> to
> > > the
> > > > >> >> Google
> > > > >> >> > > doc.
> > > > >> >> > > > >
> > > > >> >> > >
> > > > >> >> > > > > I really do want to get as much engagement and feedback
> > as
> > > > >> possible
> > > > >> >> > on
> > > > >> >> > > > > this.
> > > > >> >> > > > >
> > > > >> >> > >
> > > > >> >> > > > > Thanks!
> > > > >> >> > > > >
> > > > >> >> > >
> > > > >> >> > > > > On Tue, Feb 27, 2024, 6:38 PM Wes McKinney
> > > > wesmckinn@gmail.com
> > > > >> >> > wrote:
> > > > >> >> > > > >
> > > > >> >> > >
> > > > >> >> > > > > > Have there been efforts to proactively reach out to
> > other
> > > > >> third
> > > > >> >> > > parties
> > > > >> >> > > > > > that might have an interest in this or be a potential
> > > user
> > > > at
> > > > >> >> some
> > > > >> >> > > point?
> > > > >> >> > > > > > There are a lot of interested parties in Arrow that
> may
> > > not
> > > > >> >> > actively
> > > > >> >> > > > > > follow
> > > > >> >> > > > > > the mailing list.
> > > > >> >> > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > Seems like folks from the Dask, Ray, RAPIDS
> (especially
> > > > >> folks at
> > > > >> >> > > NVIDIA
> > > > >> >> > > > > > or
> > > > >> >> > > > > > working on UCX), or other communities like that might
> > > have
> > > > >> >> > > constructive
> > > > >> >> > > > > > thoughts about this. DLPack (
> > > > >> >> https://dmlc.github.io/dlpack/latest/
> > > > >> >> > )
> > > > >> >> > > also
> > > > >> >> > > > > > seems adjacent and worth reaching out to. Other ideas
> > for
> > > > >> >> projects
> > > > >> >> > or
> > > > >> >> > > > > > companies that could be reached out to for feedback.
> > > > >> >> > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > On Tue, Feb 27, 2024 at 5:23 PM Antoine Pitrou
> > > > >> >> antoine@python.org
> > > > >> >> > > > > > wrote:
> > > > >> >> > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > If there's no engagement, then I'm afraid it might
> > mean
> > > > >> that
> > > > >> >> > third
> > > > >> >> > > > > > > parties have no interest in this. I don't really
> have
> > > any
> > > > >> >> > solution
> > > > >> >> > > for
> > > > >> >> > > > > > > generating engagement except nagging and pinging
> > people
> > > > >> >> > explicitly
> > > > >> >> > > :-)
> > > > >> >> > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > Le 27/02/2024 à 19:09, Matt Topol a écrit :
> > > > >> >> > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > I would like to see the same Antoine, currently
> > given
> > > > the
> > > > >> >> lack
> > > > >> >> > of
> > > > >> >> > > > > > > > engagement (both for OR against) I was going to
> > take
> > > > the
> > > > >> >> > silence
> > > > >> >> > > as
> > > > >> >> > > > > > > > assent
> > > > >> >> > > > > > > > and hope for non-Voltron Data PMC members to vote
> > in
> > > > >> this.
> > > > >> >> > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > If anyone has any suggestions on how we could
> > > > potentially
> > > > >> >> > > generate
> > > > >> >> > > > > > > > more
> > > > >> >> > > > > > > > engagement and discussion on this, please let me
> > know
> > > > as
> > > > >> I
> > > > >> >> want
> > > > >> >> > > as
> > > > >> >> > > > > > > > many
> > > > >> >> > > > > > > > parties in the community as possible to be part
> of
> > > > this.
> > > > >> >> > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > Thanks everyone.
> > > > >> >> > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > --Matt
> > > > >> >> > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > On Tue, Feb 27, 2024 at 12:48 PM Antoine Pitrou
> > > > >> >> > > antoine@python.org
> > > > >> >> > > > > > > > wrote:
> > > > >> >> > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > > Hello,
> > > > >> >> > > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > > I'd really like to see more engagement and
> > > criticism
> > > > >> from
> > > > >> >> > > > > > > > > non-Voltron
> > > > >> >> > > > > > > > > Data parties before this is formally adopted as
> > an
> > > > >> Arrow
> > > > >> >> > spec.
> > > > >> >> > > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > > Regards
> > > > >> >> > > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > > Antoine.
> > > > >> >> > > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > > Le 27/02/2024 à 18:35, Matt Topol a écrit :
> > > > >> >> > > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > > > Hey all,
> > > > >> >> > > > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > > > I'd like to propose a vote for us to
> officially
> > > > >> adopt the
> > > > >> >> > > protocol
> > > > >> >> > > > > > > > > > described in the google doc[1] for
> Dissociated
> > > > Arrow
> > > > >> IPC
> > > > >> >> > > > > > > > > > Transports.
> > > > >> >> > > > > > > > > > This
> > > > >> >> > > > > > > > > > proposal was originally discussed at 2. Once
> > this
> > > > >> >> proposal
> > > > >> >> > is
> > > > >> >> > > > > > > > > > adopted,
> > > > >> >> > > > > > > > > > I
> > > > >> >> > > > > > > > > > will work on adding the necessary
> documentation
> > > to
> > > > >> the
> > > > >> >> > Arrow
> > > > >> >> > > > > > > > > > website
> > > > >> >> > > > > > > > > > along
> > > > >> >> > > > > > > > > > with examples etc.
> > > > >> >> > > > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > > > The vote will be open for at least 72 hours.
> > > > >> >> > > > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > > > [ ] +1 Accept this Proposal
> > > > >> >> > > > > > > > > > [ ] +0
> > > > >> >> > > > > > > > > > [ ] -1 Do not accept this proposal because...
> > > > >> >> > > > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > > > Thank you everyone!
> > > > >> >> > > > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > > > --Matt
> > > > >> >> > > > > > > > > >
> > > > >> >> > >
> > > > >> >> > > > > > > > > > [1]:
> > > > >> >> > > > >
> > > > >> >> > >
> > > > >> >> > > > >
> > > > >> >> > >
> > > > >> >> >
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit#heading=h.38515dnp2bdb
> > > > >> >> >
> > > > >> >>
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by Weston Pace <we...@gmail.com>.
I'm sorry for the very late reply. Until yesterday I had no real idea
what this was about, so I had stayed out.

I'm +0 only because it isn't clear what we are voting on. There is a
written document but no implementation or PR, although I think there
could be one. For example, does any ADBC client respect this protocol
today? If a Flight server responds with an S3/HTTP URI, will the ADBC
client download the files from the correct place? Will it at least notice
that the URI is not a gRPC URI and give an "I don't have a connector for
downloading from HTTP/S3" error? In general, I think we do want this in
Flight (see comments below) and I am very supportive of the idea.
However, if adopting this as an experimental proposal helps move it
forward, then I think that's fine.
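To make the client-side question above concrete, here is a minimal Python sketch of how a client might dispatch on the scheme of an endpoint location URI and fail with a clear error when it has no connector for that scheme. The `grpc`, `grpc+tls`, `grpc+tcp` and `grpc+unix` schemes are ones Flight already uses for locations; everything else (`FETCHERS`, `fetch_location`, `NoConnectorError` and the placeholder fetch functions) is a hypothetical name for illustration, not an existing ADBC or Flight API.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Location schemes a plain Flight (gRPC) client already understands.
GRPC_SCHEMES = {"grpc", "grpc+tls", "grpc+tcp", "grpc+unix"}


class NoConnectorError(Exception):
    """Raised when no transport is registered for a location's URI scheme."""


def fetch_via_flight(uri: str) -> str:
    # Hypothetical placeholder: a real client would issue a DoGet here.
    return f"flight:{uri}"


def fetch_via_http(uri: str) -> str:
    # Hypothetical placeholder: a real client would download the data here.
    return f"http:{uri}"


# Hypothetical registry mapping a URI scheme to the function that fetches it.
FETCHERS: Dict[str, Callable[[str], str]] = {s: fetch_via_flight for s in GRPC_SCHEMES}
FETCHERS.update({"http": fetch_via_http, "https": fetch_via_http})


def fetch_location(uri: str) -> str:
    """Fetch data from an endpoint location, or fail loudly if unsupported."""
    scheme = urlparse(uri).scheme  # urlparse already lower-cases the scheme
    fetcher = FETCHERS.get(scheme)
    if fetcher is None:
        # Better than silently treating an s3:// URI as a gRPC address.
        raise NoConnectorError(
            f"I don't have a connector for downloading from {scheme!r}: {uri}"
        )
    return fetcher(uri)
```

With this sketch, `fetch_location("s3://bucket/key")` raises `NoConnectorError` instead of attempting a gRPC call. Keeping the transports in a registry means new ones (S3, UCX, and so on) could be added without touching the gRPC path.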

That being said, I do want to express support for the proposal as a
concept, at least the "dissociated transports" portion (I can't speak to
UCX/etc.). I was speaking with someone yesterday who explained that they
ended up not choosing Flight for an internal project because Flight
didn't support something called "cloud fetch", which I have since learned
is described in [1]. I remembered looking at this proposal before, and
this person seemed interested and optimistic to hear that this was being
considered for Flight. This proposal, as I understand it, should make it
possible for cloud servers to support a cloud-fetch-style API. From the
discussion I got the impression that this cloud fetch approach is useful
and generally applicable.
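Roughly, as [1] describes it, cloud fetch has the server hand back URLs for result chunks stored in cloud storage, which the client downloads in parallel and then reassembles in order. A minimal, standard-library-only Python sketch of that client-side step follows. `cloud_fetch` and the caller-supplied `fetch` function are hypothetical names, and each returned chunk is assumed to be an independently readable Arrow IPC payload that is not decoded here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def cloud_fetch(
    urls: Sequence[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 4,
) -> List[bytes]:
    """Download result chunks concurrently and return them in endpoint order.

    `fetch` is a caller-supplied (hypothetical) function, e.g. an HTTP GET of a
    pre-signed URL; each returned chunk would then be decoded as Arrow IPC.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even when later
        # chunks finish downloading first.
        return list(pool.map(fetch, urls))
```

Ordering matters because a result set split across several URLs has to be reassembled in the order the server listed them. `Executor.map` provides that ordering directly, at the cost of holding every chunk in memory until the last one arrives.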

So a big +1 for the idea of dissociated transports, but I'm not sure why
we need a vote to start working on it (though I'm not opposed if a vote
helps).

[1]
https://www.databricks.com/blog/2021/08/11/how-we-achieved-high-bandwidth-connectivity-with-bi-tools.html

On Thu, Mar 28, 2024 at 1:04 PM Matt Topol <zo...@gmail.com> wrote:

> I'll keep this new vote open for at least the next 72 hours. As before
> please reply with:
>
> [ ] +1 Accept this Proposal
> [ ] +0
> [ ] -1 Do not accept this proposal because...
>
> Thanks everyone!
>
> On Wed, Mar 27, 2024 at 7:51 PM Benjamin Kietzman <be...@gmail.com> wrote:
>
> > +1
> >
> > On Tue, Mar 26, 2024, 18:36 Matt Topol <zo...@gmail.com> wrote:
> >
> > > Should I start a new thread for a new vote? Or repeat the original vote
> > > email here?
> > >
> > > Just asking since there hasn't been any responses so far.
> > >
> > > --Matt
> > >
> > > On Thu, Mar 21, 2024 at 11:46 AM Matt Topol <zo...@gmail.com> wrote:
> > >
> > > > Absolutely, it will be marked experimental until we see some people
> > > > using it and can get more real-world feedback.
> > > >
> > > > There's also already a couple things that will be followed-up on after
> > > > the initial adoption for expansion which were discussed in the comments.
> > > >
> > > > On Thu, Mar 21, 2024, 11:42 AM David Li <li...@apache.org> wrote:
> > > >
> > > >> I think let's try again. Would it be reasonable to declare this
> > > >> 'experimental' for the time being, just as we did with Flight/Flight
> > > >> SQL/etc?
> > > >>
> > > >> On Tue, Mar 19, 2024, at 15:24, Matt Topol wrote:
> > > >> > Hey All, It's been another month and we've gotten a whole bunch of
> > > >> > feedback and engagement on the document from a variety of
> > > >> > individuals. Myself and a few others have proactively attempted to
> > > >> > reach out to as many third parties as we could, hoping to pull more
> > > >> > engagement also. While it would be great to get even more feedback,
> > > >> > the comments have slowed down and we haven't gotten anything in a few
> > > >> > days at this point.
> > > >> >
> > > >> > If there's no objections, I'd like to try to open up for voting again
> > > >> > to officially adopt this as a protocol to add to our docs.
> > > >> >
> > > >> > Thanks all!
> > > >> >
> > > >> > --Matt
> > > >> >
> > > >> > On Sat, Mar 2, 2024 at 6:43 PM Paul Whalen <pg...@gmail.com> wrote:
> > > >> >
> > > >> >> Agreed that it makes sense not to focus on in-place updating for this
> > > >> >> proposal.  I’m not even sure it’s a great fit as a “general purpose”
> > > >> >> Arrow protocol, because of all the assumptions and restrictions
> > > >> >> required as you noted.
> > > >> >>
> > > >> >> I took another look at the proposal and don’t think there’s anything
> > > >> >> preventing in-place updating in the future - ultimately the data body
> > > >> >> could just be in the same location for subsequent messages.
> > > >> >>
> > > >> >> Thanks!
> > > >> >> Paul
> > > >> >>
> > > >> >> On Fri, Mar 1, 2024 at 5:28 PM Matt Topol <zotthewizard@gmail.com> wrote:
> > > >> >>
> > > >> >> > > @pgwhalen: As a potential "end user developer," (and aspiring
> > > >> >> > > contributor) this immediately excited me when I first saw it.
> > > >> >> >
> > > >> >> > Yay! Good to hear that!
> > > >> >> >
> > > >> >> > > @pgwhalen: And it wasn't clear to me whether updating batches in
> > > >> >> > > place (and the producer/consumer coordination that comes with that)
> > > >> >> > > was supported or encouraged as part of the proposal.
> > > >> >> >
> > > >> >> > So, updating batches in place was not a particular use-case we were
> > > >> >> > targeting with this approach. Instead using shared memory to produce
> > > >> >> > and consume the buffers/batches without having to physically copy
> > > >> >> > the data. Trying to update a batch in place is a dangerous prospect
> > > >> >> > for a number of reasons, but as you've mentioned it can technically
> > > >> >> > be made safe if the shape is staying the same and you're only
> > > >> >> > modifying fixed-width data types (i.e. not only is the *shape*
> > > >> >> > unchanged but the sizes of the underlying data buffers are also
> > > >> >> > remaining unchanged). The producer/consumer coordination that would
> > > >> >> > be needed for updating batches in place is not part of this proposal
> > > >> >> > but is definitely something we can look into as a follow-up to this
> > > >> >> > for extending it. There's a number of discussions that would need to
> > > >> >> > be had around that so I don't want to add on another complexity to
> > > >> >> > this already complex proposal.
> > > >> >> >
> > > >> >> > That said, if you or anyone see something in this proposal that
> > > >> >> > would hinder or prevent being able to use it for your use case
> > > >> >> > please let me know so we can address it. Even though the proposal as
> > > >> >> > it currently exists doesn't fully support the in-place updating of
> > > >> >> > batches, I don't want to make things harder for us in such a
> > > >> >> > follow-up where we'd end up requiring an entirely new protocol to
> > > >> >> > support that.
> > > >> >> >
> > > >> >> > > @octalene.dev: I know of a third party that is interested in Arrow
> > > >> >> > > for HPC environments that could be interested in the proposal and
> > > >> >> > > I can see if they're interested in providing feedback.
> > > >> >> >
> > > >> >> > Awesome! Thanks much!
> > > >> >> >
> > > >> >> > For reference to anyone who hasn't looked at the document in a
> > > >> >> > while, since the original discussion thread on this I have added a
> > > >> >> > full "Background Context" page to the beginning of the proposal to
> > > >> >> > help anyone who isn't already familiar with the issues this protocol
> > > >> >> > is trying to solve or isn't already familiar with ucx or libfabric
> > > >> >> > transports to better understand *why* I'm proposing this and what it
> > > >> >> > is trying to solve. The point of this background information is to
> > > >> >> > help ensure that anyone who might have thoughts on protocols in
> > > >> >> > general or APIs should still be able to understand the base reasons
> > > >> >> > and goals that we're trying to achieve with this protocol proposal.
> > > >> >> > You don't need to already understand managing GPU/device memory or
> > > >> >> > ucx to be able to have meaningful input on the document.
> > > >> >> >
> > > >> >> > Thanks again to all who have contributed so far and please spread to
> > > >> >> > any contacts that you think might be interested in this for their
> > > >> >> > particular use cases.
> > > >> >> >
> > > >> >> > --Matt
> > > >> >> >
> > > >> >> > On Wed, Feb 28, 2024 at 1:39 AM Aldrin <octalene.dev@pm.me.invalid> wrote:
> > > >> >> >
> > > >> >> > > I am interested in this as well, but I haven't gotten to a point
> > > >> >> > > where I can have valuable input (I haven't tried other
> > > >> >> > > transports). I know of a third party that is interested in Arrow
> > > >> >> > > for HPC environments that could be interested in the proposal and
> > > >> >> > > I can see if they're interested in providing feedback.
> > > >> >> > >
> > > >> >> > > I glanced at the document before but I'll go through again to see
> > > >> >> > > if there is anything I can comment on.
> > > >> >> > >
> > > >> >> > > # ------------------------------
> > > >> >> > > # Aldrin
> > > >> >> > >
> > > >> >> > > https://github.com/drin/
> > > >> >> > > https://gitlab.com/octalene
> > > >> >> > > https://keybase.io/octalene
> > > >> >> > >
> > > >> >> > > On Tuesday, February 27th, 2024 at 17:43, Paul Whalen <pgwhalen@gmail.com> wrote:
> > > >> >> > >
> > > >> >> > > > As a potential "end user developer," (and aspiring contributor)
> > > >> >> > > > this immediately excited me when I first saw it.
> > > >> >> > > >
> > > >> >> > > > I work at a trading firm, and my team has developed an IPC
> > > >> >> > > > mechanism for efficiently transmitting pandas dataframes both
> > > >> >> > > > remotely via TCP and locally via shared memory, where the
> > > >> >> > > > interface for the application developer is the same for both.
> > > >> >> > > > The data in the dataframes may change rapidly, so when
> > > >> >> > > > communicating locally via shared memory, if the shape of the
> > > >> >> > > > dataframe doesn't change, we update the memory in place,
> > > >> >> > > > coordinating between the producer and consumer via TCP.
> > > >> >> > > >
> > > >> >> > > > We intend to move away from our remote TCP mechanism towards
> > > >> >> > > > Arrow Flight, or a lighter-weight version of Arrow IPC. For the
> > > >> >> > > > local shared memory mechanism which we previously did not have a
> > > >> >> > > > good answer for, it seems like Disassociated Arrow IPC maps
> > > >> >> > > > quite well to our problem.
> > > >> >> > > >
> > > >> >> > > > So some features that enable our use case are:
> > > >> >> > > > - Updating existing batches in place is supported
> > > >> >> > > > - The interface is pretty similar to Flight
> > > >> >> > > >
> > > >> >> > > > I'd imagine we're not the only financial firm to implement
> > > >> >> > > > something like this, given how widespread pandas usage is, so
> > > >> >> > > > that could be a place to seek feedback.
> > > >> >> > > >
> > > >> >> > > > As I was reading the proposal initially, I gleaned that the most
> > > >> >> > > > important audience was those writing interfaces to GPUs/remote
> > > >> >> > > > memory/non-standard transports/etc. And it wasn't clear to me
> > > >> >> > > > whether updating batches in place (and the producer/consumer
> > > >> >> > > > coordination that comes with that) was supported or encouraged
> > > >> >> > > > as part of the proposal. But regardless, as an end user, this
> > > >> >> > > > seems like an easier and more efficient way to glue pieces in
> > > >> >> > > > the Arrow ecosystem together if it was adopted broadly.
> > > >> >> > > >
> > > >> >> > > > Paul
> > > >> >> > > >
> > > >> >> > > > On Tue, Feb 27, 2024 at 6:05 PM Matt Topol zotthewizard@gmail.com wrote:
> > > >> >> > > >
> > > >> >> > > > > I'll continue my efforts of trying to reach out to other
> > > >> >> > > > > interested parties, but if anyone else here has any contacts
> > > >> >> > > > > or connections that they think might be interested please
> > > >> >> > > > > forward them the link to the Google doc.
> > > >> >> > > > >
> > > >> >> > > > > I really do want to get as much engagement and feedback as
> > > >> >> > > > > possible on this.
> > > >> >> > > > >
> > > >> >> > > > > Thanks!
> > > >> >> > > > >
> > > >> >> > > > > On Tue, Feb 27, 2024, 6:38 PM Wes McKinney wesmckinn@gmail.com wrote:
> > > >> >> > > > >
> > > >> >> > > > > > Have there been efforts to proactively reach out to other
> > > >> >> > > > > > third parties that might have an interest in this or be a
> > > >> >> > > > > > potential user at some point? There are a lot of interested
> > > >> >> > > > > > parties in Arrow that may not actively follow the mailing
> > > >> >> > > > > > list.
> > > >> >> > > > > >
> > > >> >> > > > > > Seems like folks from the Dask, Ray, RAPIDS (especially folks
> > > >> >> > > > > > at NVIDIA or working on UCX), or other communities like that
> > > >> >> > > > > > might have constructive thoughts about this. DLPack
> > > >> >> > > > > > (https://dmlc.github.io/dlpack/latest/) also seems adjacent
> > > >> >> > > > > > and worth reaching out to. Other ideas for projects or
> > > >> >> > > > > > companies that could be reached out to for feedback.
> > > >> >> > > > > >
> > > >> >> > > > > > On Tue, Feb 27, 2024 at 5:23 PM Antoine Pitrou antoine@python.org wrote:
> > > >> >> > > > > >
> > > >> >> > > > > > > If there's no engagement, then I'm afraid it might mean that
> > > >> >> > > > > > > third parties have no interest in this. I don't really have
> > > >> >> > > > > > > any solution for generating engagement except nagging and
> > > >> >> > > > > > > pinging people explicitly :-)
> > > >> >> > > > > > >
> > > >> >> > > > > > > On 27/02/2024 at 19:09, Matt Topol wrote:
> > > >> >> > > > > > >
> > > >> >> > > > > > > > I would like to see the same Antoine, currently given the
> > > >> >> > > > > > > > lack of engagement (both for OR against) I was going to
> > > >> >> > > > > > > > take the silence as assent and hope for non-Voltron Data
> > > >> >> > > > > > > > PMC members to vote in this.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > If anyone has any suggestions on how we could potentially
> > > >> >> > > > > > > > generate more engagement and discussion on this, please
> > > >> >> > > > > > > > let me know as I want as many parties in the community as
> > > >> >> > > > > > > > possible to be part of this.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Thanks everyone.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > --Matt
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > On Tue, Feb 27, 2024 at 12:48 PM Antoine Pitrou
> > > >> >> > > antoine@python.org
> > > >> >> > > > > > > > wrote:
> > > >> >> > > > > > > >
> > > >> >> > >
> > > >> >> > > > > > > > > Hello,
> > > >> >> > > > > > > > >
> > > >> >> > >
> > > >> >> > > > > > > > > I'd really like to see more engagement and
> > criticism
> > > >> from
> > > >> >> > > > > > > > > non-Voltron
> > > >> >> > > > > > > > > Data parties before this is formally adopted as
> an
> > > >> Arrow
> > > >> >> > spec.
> > > >> >> > > > > > > > >
> > > >> >> > >
> > > >> >> > > > > > > > > Regards
> > > >> >> > > > > > > > >
> > > >> >> > >
> > > >> >> > > > > > > > > Antoine.
> > > >> >> > > > > > > > >
> > > >> >> > >
> > > >> >> > > > > > > > > Le 27/02/2024 à 18:35, Matt Topol a écrit :
> > > >> >> > > > > > > > >
> > > >> >> > >
> > > >> >> > > > > > > > > > Hey all,
> > > >> >> > > > > > > > > >
> > > >> >> > >
> > > >> >> > > > > > > > > > I'd like to propose a vote for us to officially
> > > >> adopt the
> > > >> >> > > protocol
> > > >> >> > > > > > > > > > described in the google doc[1] for Dissociated
> > > Arrow
> > > >> IPC
> > > >> >> > > > > > > > > > Transports.
> > > >> >> > > > > > > > > > This
> > > >> >> > > > > > > > > > proposal was originally discussed at 2. Once
> this
> > > >> >> proposal
> > > >> >> > is
> > > >> >> > > > > > > > > > adopted,
> > > >> >> > > > > > > > > > I
> > > >> >> > > > > > > > > > will work on adding the necessary documentation
> > to
> > > >> the
> > > >> >> > Arrow
> > > >> >> > > > > > > > > > website
> > > >> >> > > > > > > > > > along
> > > >> >> > > > > > > > > > with examples etc.
> > > >> >> > > > > > > > > >
> > > >> >> > >
> > > >> >> > > > > > > > > > The vote will be open for at least 72 hours.
> > > >> >> > > > > > > > > >
> > > >> >> > >
> > > >> >> > > > > > > > > > [ ] +1 Accept this Proposal
> > > >> >> > > > > > > > > > [ ] +0
> > > >> >> > > > > > > > > > [ ] -1 Do not accept this proposal because...
> > > >> >> > > > > > > > > >
> > > >> >> > >
> > > >> >> > > > > > > > > > Thank you everyone!
> > > >> >> > > > > > > > > >
> > > >> >> > >
> > > >> >> > > > > > > > > > --Matt
> > > >> >> > > > > > > > > >
> > > >> >> > >
> > > >> >> > > > > > > > > > [1]:
> > > >> >> > > > >
> > > >> >> > >
> > > >> >> > > > >
> > > >> >> > >
> > > >> >> >
> > > >> >>
> > > >>
> > >
> >
> https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit#heading=h.38515dnp2bdb
> > > >> >> >
> > > >> >>
> > > >>
> > > >
> > >
> >
>

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by Matt Topol <zo...@gmail.com>.
I'll keep this new vote open for at least the next 72 hours. As before,
please reply with:

[ ] +1 Accept this Proposal
[ ] +0
[ ] -1 Do not accept this proposal because...

Thanks everyone!

On Wed, Mar 27, 2024 at 7:51 PM Benjamin Kietzman <be...@gmail.com>
wrote:

> +1

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by Benjamin Kietzman <be...@gmail.com>.
+1

On Tue, Mar 26, 2024, 18:36 Matt Topol <zo...@gmail.com> wrote:

> Should I start a new thread for a new vote? Or repeat the original vote
> email here?
>
> Just asking since there hasn't been any responses so far.
>
> --Matt

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by Matt Topol <zo...@gmail.com>.
Should I start a new thread for a new vote? Or repeat the original vote
email here?

I'm asking since there haven't been any responses so far.

--Matt

On Thu, Mar 21, 2024 at 11:46 AM Matt Topol <zo...@gmail.com> wrote:

> Absolutely, it will be marked experimental until we see some people using
> it and can get more real-world feedback.
>
> There are also already a couple of things, discussed in the comments, that
> will be followed up on after the initial adoption to expand the protocol.

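Matt's reply to Paul Whalen in this thread gives the condition under which updating a batch in place can be made safe: the shape stays the same, only fixed-width data types are modified, and the sizes of the underlying data buffers do not change. The Python sketch below expresses that condition as a pre-check a producer might run before overwriting a previous batch body in shared memory. It is a hypothetical illustration, not part of the proposal or of any Arrow library: `ColumnLayout`, `can_update_in_place` and `FIXED_WIDTH_TYPES` are invented names, buffer sizes are supplied by hand, and the producer/consumer coordination that the thread defers to a follow-up is not modeled.

```python
from dataclasses import dataclass

# Hypothetical, non-exhaustive set of fixed-width type names for this sketch;
# it is not taken from any Arrow library.
FIXED_WIDTH_TYPES = {
    "bool", "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float32", "float64",
}


@dataclass(frozen=True)
class ColumnLayout:
    """Hypothetical description of one column of a record batch."""
    name: str
    type_name: str
    buffer_sizes: tuple  # byte length of each buffer, e.g. (validity, data)


def can_update_in_place(old, new):
    """Return True only if `new` fits exactly into the buffers of `old`."""
    if len(old) != len(new):
        return False  # different number of columns: the shape changed
    for before, after in zip(old, new):
        if (before.name, before.type_name) != (after.name, after.type_name):
            return False  # the schema changed
        if before.type_name not in FIXED_WIDTH_TYPES:
            return False  # variable-width data (strings, lists) may resize
        if before.buffer_sizes != after.buffer_sizes:
            return False  # e.g. a different row count or a new validity bitmap
    return True


# Usage: 1,000 rows of float64 and int64 (8,000 data bytes each, no nulls).
old = [ColumnLayout("price", "float64", (0, 8000)),
       ColumnLayout("qty", "int64", (0, 8000))]
same_shape = [ColumnLayout("price", "float64", (0, 8000)),
              ColumnLayout("qty", "int64", (0, 8000))]
more_rows = [ColumnLayout("price", "float64", (0, 8008)),
             ColumnLayout("qty", "int64", (0, 8008))]

print(can_update_in_place(old, same_shape))  # True
print(can_update_in_place(old, more_rows))   # False
```

The check is deliberately conservative. Even when it passes, a real implementation would still need producer/consumer coordination (Paul Whalen describes his team doing this over TCP) so that a consumer never reads a body while it is being overwritten.
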
Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by Matt Topol <zo...@gmail.com>.
Absolutely, it will be marked experimental until we see some people using
it and can get more real-world feedback.

There are also already a couple of things, discussed in the comments, that
will be followed up on after the initial adoption to expand the protocol.

On Thu, Mar 21, 2024, 11:42 AM David Li <li...@apache.org> wrote:

> I think let's try again. Would it be reasonable to declare this
> 'experimental' for the time being, just as we did with Flight/Flight
> SQL/etc?

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by David Li <li...@apache.org>.
I think let's try again. Would it be reasonable to declare this 'experimental' for the time being, just as we did with Flight/Flight SQL/etc?

On Tue, Mar 19, 2024, at 15:24, Matt Topol wrote:
> Hey all, it's been another month and we've gotten a whole bunch of feedback
> and engagement on the document from a variety of individuals. A few others
> and I have proactively reached out to as many third parties as we could,
> hoping to pull in more engagement. While it would be great to get even more
> feedback, the comments have slowed down and we haven't received any new ones
> in a few days.
>
> If there are no objections, I'd like to open voting again to officially
> adopt this as a protocol to add to our docs.
>
> Thanks all!
>
> --Matt
>
> On Sat, Mar 2, 2024 at 6:43 PM Paul Whalen <pg...@gmail.com> wrote:
>
>> Agreed that it makes sense not to focus on in-place updating for this
>> proposal.  I’m not even sure it’s a great fit as a “general purpose” Arrow
>> protocol, because of all the assumptions and restrictions required as you
>> noted.
>>
>> I took another look at the proposal and don’t think there’s anything
>> preventing in-place updating in the future - ultimately the data body could
>> just be in the same location for subsequent messages.
>>
>> Thanks!
>> Paul
>>
>> On Fri, Mar 1, 2024 at 5:28 PM Matt Topol <zo...@gmail.com> wrote:
>>
>> > > @pgwhalen: As a potential "end user developer," (and aspiring
>> > > contributor) this immediately excited me when I first saw it.
>> >
>> > Yay! Good to hear that!
>> >
>> > > @pgwhalen: And it wasn't clear to me whether updating batches in
>> > > place (and the producer/consumer coordination that comes with that) was
>> > > supported or encouraged as part of the proposal.
>> >
>> > So, updating batches in place was not a particular use case we were
>> > targeting with this approach. Instead, the goal was to use shared memory
>> > to produce and consume the buffers/batches without physically copying
>> > the data. Trying to update a batch in place is dangerous for a number of
>> > reasons, but as you've mentioned it can technically be made safe if the
>> > shape stays the same and you're only modifying fixed-width data types
>> > (i.e. not only is the *shape* unchanged, but the sizes of the underlying
>> > data buffers also remain unchanged). The producer/consumer coordination
>> > needed for updating batches in place is not part of this proposal, but
>> > it is definitely something we can look into as a follow-up extension.
>> > There are a number of discussions that would need to happen around that,
>> > so I don't want to add another layer of complexity to this already
>> > complex proposal.
>> >
>> > That said, if you or anyone else sees something in this proposal that
>> > would hinder or prevent using it for your use case, please let me know so
>> > we can address it. Even though the proposal as it currently exists
>> > doesn't fully support in-place updating of batches, I don't want to make
>> > things harder for us in such a follow-up, where we'd end up requiring an
>> > entirely new protocol to support that.
>> >
>> > > @octalene.dev: I know of a third party that is interested in Arrow for
>> > > HPC environments that could be interested in the proposal and I can see
>> > > if they're interested in providing feedback.
>> >
>> > Awesome! Thanks much!
>> >
>> >
>> > For reference, for anyone who hasn't looked at the document in a while:
>> > since the original discussion thread, I have added a full "Background
>> > Context" page to the beginning of the proposal. It is meant to help anyone
>> > who isn't already familiar with the issues this protocol is trying to
>> > solve, or with UCX or libfabric transports, to better understand *why* I'm
>> > proposing this and what it is trying to solve. The point of this
>> > background information is to ensure that anyone who might have thoughts
>> > on protocols or APIs in general should still be able to understand the
>> > underlying reasons and goals that we're trying to achieve with this
>> > protocol proposal. You don't need to already understand managing
>> > GPU/device memory or UCX to be able to have meaningful input on the
>> > document.
>> >
>> > Thanks again to all who have contributed so far, and please spread the
>> > word to any contacts who you think might be interested in this for their
>> > particular use cases.
>> >
>> > --Matt
>> >
>> > On Wed, Feb 28, 2024 at 1:39 AM Aldrin <oc...@pm.me.invalid> wrote:
>> >
>> > > I am interested in this as well, but I haven't gotten to a point where I
>> > > can have valuable input (I haven't tried other transports). I know of a
>> > > third party that is interested in Arrow for HPC environments that could
>> > > be interested in the proposal and I can see if they're interested in
>> > > providing feedback.
>> > >
>> > > I glanced at the document before but I'll go through again to see if
>> > > there is anything I can comment on.
>> > >
>> > >
>> > >
>> > > # ------------------------------
>> > > # Aldrin
>> > >
>> > >
>> > > https://github.com/drin/
>> > > https://gitlab.com/octalene
>> > > https://keybase.io/octalene
>> > >
>> > >
>> > > On Tuesday, February 27th, 2024 at 17:43, Paul Whalen <pgwhalen@gmail.com> wrote:
>> > >
>> > > > As a potential "end user developer," (and aspiring contributor) this
>> > > > immediately excited me when I first saw it.
>> > > >
>> > > > I work at a trading firm, and my team has developed an IPC mechanism
>> > > > for efficiently transmitting pandas dataframes both remotely via TCP
>> > > > and locally via shared memory, where the interface for the application
>> > > > developer is the same for both. The data in the dataframes may change
>> > > > rapidly, so when communicating locally via shared memory, if the shape
>> > > > of the dataframe doesn't change, we update the memory in place,
>> > > > coordinating between the producer and consumer via TCP.
>> > > >
>> > > > We intend to move away from our remote TCP mechanism towards Arrow
>> > > > Flight, or a lighter-weight version of Arrow IPC. For the local shared
>> > > > memory mechanism, which we previously did not have a good answer for,
>> > > > it seems like Dissociated Arrow IPC maps quite well to our problem.
>> > > >
>> > > > So some features that enable our use case are:
>> > > > - Updating existing batches in place is supported
>> > > > - The interface is pretty similar to Flight
>> > > >
>> > > > I'd imagine we're not the only financial firm to implement something
>> > > > like this, given how widespread pandas usage is, so that could be a
>> > > > place to seek feedback.
>> > > >
>> > > > As I was reading the proposal initially, I gleaned that the most
>> > > > important audience was those writing interfaces to GPUs/remote
>> > > > memory/non-standard transports/etc. And it wasn't clear to me whether
>> > > > updating batches in place (and the producer/consumer coordination that
>> > > > comes with that) was supported or encouraged as part of the proposal.
>> > > > But regardless, as an end user, this seems like an easier and more
>> > > > efficient way to glue pieces in the Arrow ecosystem together if it
>> > > > were adopted broadly.
>> > > >
>> > > > Paul
>> > > >
>> > > > On Tue, Feb 27, 2024 at 6:05 PM Matt Topol <zotthewizard@gmail.com> wrote:
>> > > >
>> > > > > I'll continue my efforts to reach out to other interested parties,
>> > > > > but if anyone else here has any contacts or connections that they
>> > > > > think might be interested, please forward them the link to the
>> > > > > Google doc.
>> > > > >
>> > > > > I really do want to get as much engagement and feedback as possible
>> > > > > on this.
>> > > > >
>> > > > > Thanks!
>> > > > >
>> > > > > On Tue, Feb 27, 2024, 6:38 PM Wes McKinney <wesmckinn@gmail.com> wrote:
>> > > > >
>> > > > > > Have there been efforts to proactively reach out to other third
>> > > > > > parties that might have an interest in this or be a potential user
>> > > > > > at some point? There are a lot of interested parties in Arrow that
>> > > > > > may not actively follow the mailing list.
>> > > > > >
>> > > > > > Seems like folks from the Dask, Ray, RAPIDS (especially folks at
>> > > > > > NVIDIA or working on UCX), or other communities like that might
>> > > > > > have constructive thoughts about this. DLPack
>> > > > > > (https://dmlc.github.io/dlpack/latest/) also seems adjacent and
>> > > > > > worth reaching out to. Other ideas for projects or companies that
>> > > > > > could be reached out to for feedback.
>> > > > > >
>> > > > > > On Tue, Feb 27, 2024 at 5:23 PM Antoine Pitrou <antoine@python.org> wrote:
>> > > > > >
>> > > > > > > If there's no engagement, then I'm afraid it might mean that
>> > > > > > > third parties have no interest in this. I don't really have any
>> > > > > > > solution for generating engagement except nagging and pinging
>> > > > > > > people explicitly :-)
>> > > > > > >
>> > > > > > > On 27/02/2024 at 19:09, Matt Topol wrote:
>> > > > > > >
>> > > > > > > > I would like to see the same, Antoine. Given the current lack
>> > > > > > > > of engagement (both for OR against), I was going to take the
>> > > > > > > > silence as assent and hope for non-Voltron Data PMC members to
>> > > > > > > > vote on this.
>> > > > > > > >
>> > > > > > > > If anyone has any suggestions on how we could potentially
>> > > > > > > > generate more engagement and discussion on this, please let me
>> > > > > > > > know, as I want as many parties in the community as possible to
>> > > > > > > > be part of this.
>> > > > > > > >
>> > > > > > > > Thanks everyone.
>> > > > > > > >
>> > > > > > > > --Matt
>> > > > > > > >
>> > > > > > > > On Tue, Feb 27, 2024 at 12:48 PM Antoine Pitrou <antoine@python.org> wrote:
>> > > > > > > >
>> > > > > > > > > Hello,
>> > > > > > > > >
>> > > > > > > > > I'd really like to see more engagement and criticism from
>> > > > > > > > > non-Voltron Data parties before this is formally adopted as
>> > > > > > > > > an Arrow spec.
>> > > > > > > > >
>> > > > > > > > > Regards
>> > > > > > > > >
>> > > > > > > > > Antoine.
>> > > > > > > > >
>> > > > > > > > > On 27/02/2024 at 18:35, Matt Topol wrote:
>> > > > > > > > >
>> > > > > > > > > > Hey all,
>> > > > > > > > > >
>> > > > > > > > > > I'd like to propose a vote for us to officially adopt the
>> > > > > > > > > > protocol described in the Google doc[1] for Dissociated
>> > > > > > > > > > Arrow IPC Transports. This proposal was originally discussed
>> > > > > > > > > > at [2]. Once this proposal is adopted, I will work on adding
>> > > > > > > > > > the necessary documentation to the Arrow website along with
>> > > > > > > > > > examples etc.
>> > > > > > > > > >
>> > > > > > > > > > The vote will be open for at least 72 hours.
>> > > > > > > > > >
>> > > > > > > > > > [ ] +1 Accept this Proposal
>> > > > > > > > > > [ ] +0
>> > > > > > > > > > [ ] -1 Do not accept this proposal because...
>> > > > > > > > > >
>> > > > > > > > > > Thank you everyone!
>> > > > > > > > > >
>> > > > > > > > > > --Matt
>> > > > > > > > > >
>> > > > > > > > > > [1]: https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit#heading=h.38515dnp2bdb
>> >
>>

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by Matt Topol <zo...@gmail.com>.
Hey all, it's been another month and we've gotten a whole bunch of feedback
and engagement on the document from a variety of individuals. A few others
and I have proactively reached out to as many third parties as we could,
hoping to draw in more engagement. While it would be great to get even more
feedback, the comments have slowed down and we haven't received any new ones
in a few days.

If there are no objections, I'd like to open voting again to officially
adopt this as a protocol to add to our docs.

Thanks all!

--Matt

On Sat, Mar 2, 2024 at 6:43 PM Paul Whalen <pg...@gmail.com> wrote:

> Agreed that it makes sense not to focus on in-place updating for this
> proposal.  I’m not even sure it’s a great fit as a “general purpose” Arrow
> protocol, because of all the assumptions and restrictions required as you
> noted.
>
> I took another look at the proposal and don’t think there’s anything
> preventing in-place updating in the future - ultimately the data body could
> just be in the same location for subsequent messages.
>
> Thanks!
> Paul
>

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

Posted by Paul Whalen <pg...@gmail.com>.
Agreed that it makes sense not to focus on in-place updating for this
proposal.  I’m not even sure it’s a great fit as a “general purpose” Arrow
protocol, because of all the assumptions and restrictions required as you
noted.

I took another look at the proposal and don’t think there’s anything
preventing in-place updating in the future - ultimately the data body could
just be in the same location for subsequent messages.

Thanks!
Paul
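
Paul's note above, that the data body could simply stay at the same location for subsequent messages, still leaves the producer/consumer coordination open. The proposal does not define it, and Paul's team currently coordinates over TCP. One well-known single-writer alternative is a sequence lock (seqlock): the producer bumps a version counter to an odd value before overwriting the body and to an even value afterwards, and the consumer retries if it sees an odd or changed version. The sketch below is a single-process Python illustration under stated assumptions. `InPlaceRegion` and its layout (an 8-byte version followed by an int64 body) are hypothetical. A `bytearray` stands in for a shared-memory mapping, so the sketch does not address the memory-ordering guarantees that a real cross-process implementation needs.

```python
import struct
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: an 8-byte little-endian version counter, then the data body.
VERSION = struct.Struct("<Q")


class InPlaceRegion:
    """A bytearray standing in for a shared-memory region whose int64 data
    body stays at the same offset for every message.

    Single-writer, seqlock-style protocol: an odd version means a write is in
    progress. Memory barriers and atomic loads/stores are NOT modeled here.
    """

    def __init__(self, num_values: int) -> None:
        self.body = struct.Struct(f"<{num_values}q")
        self.buf = bytearray(VERSION.size + self.body.size)

    def _version(self) -> int:
        return VERSION.unpack_from(self.buf, 0)[0]

    def write(self, values: Sequence[int]) -> None:
        """Producer: mark the body as being written, overwrite it, mark it stable."""
        v = self._version()
        VERSION.pack_into(self.buf, 0, v + 1)                 # odd: write in progress
        self.body.pack_into(self.buf, VERSION.size, *values)  # same offset every time
        VERSION.pack_into(self.buf, 0, v + 2)                 # even: stable

    def try_read(self) -> Optional[Tuple[int, List[int]]]:
        """Consumer: return (version, values), or None if the caller should retry."""
        before = self._version()
        if before % 2 == 1:
            return None  # producer is mid-write
        values = list(self.body.unpack_from(self.buf, VERSION.size))
        after = self._version()
        return (after, values) if after == before else None
```

A consumer would call `try_read()` in a loop until it returns a result, and could also compare the returned version with the last one it processed to detect that a new batch has been written in place. Seqlocks suit a single producer with frequent small updates; with multiple writers or readers that must never spin, a lock or an explicit message over a side channel (as Paul's team does with TCP) may be simpler.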
