Posted to dev@arrow.apache.org by Dominik Moritz <do...@cmu.edu> on 2021/02/25 22:39:39 UTC

[Rust] Arrow in WebAssembly

Hello Rust Arrow Devs,

I have been working on a wasm version of Arrow using the Rust library (
https://github.com/domoritz/arrow-wasm). I was wondering whether you would
be interested in having me demo it in the Arrow Rust sync call. If so, when
would be the next one and how much time would you want to allocate for it?
Also, is there anything in particular you would like me to dive into?

Cheers,
Dominik

Re: [Rust] Arrow in WebAssembly

Posted by Jorge Cardoso Leitão <jo...@gmail.com>.
Dominik's point is that the IPC reader currently first writes the whole
thing into a Vec<u8>, and then copies all of that to buffers using
IPC::Buffer offsets and lengths. Thus, it performs 2 memcopies of the whole
data and needs to hold 2x the required memory (the Vec<u8> and the
arrow::Buffers).

I noticed this while going through it on my proposal repo, and I rewrote it
using `Reader::Seek`
<https://github.com/jorgecarleitao/arrow2/blob/main/src/io/ipc/read/deserialize.rs#L66>
to write directly to typed buffers. Coincidentally, this also enabled
reading from big endian, as we know what is on each buffer, and thus know
how to handle endianness using to_le and from_be implemented on Rust's
native types.
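
To make the idea concrete, here is a minimal sketch using only the standard library. It is not the arrow2 code linked above: `read_i32_buffer` and its parameters are hypothetical names, and it uses `i32::from_le_bytes` / `i32::from_be_bytes` as the byte-array counterparts of the integer methods mentioned. It seeks to a buffer's offset and decodes values straight into a typed `Vec<i32>`, so no intermediate `Vec<u8>` holding the whole payload is needed.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper (not an arrow or arrow2 API): reads `len` i32 values
/// starting at byte `offset` directly into a typed buffer, converting from
/// the byte order of the source as it goes.
fn read_i32_buffer<R: Read + Seek>(
    reader: &mut R,
    offset: u64,
    len: usize,
    is_little_endian: bool,
) -> io::Result<Vec<i32>> {
    reader.seek(SeekFrom::Start(offset))?;
    let mut values = Vec::with_capacity(len);
    let mut chunk = [0u8; 4];
    for _ in 0..len {
        // Only four bytes are staged at a time, never the whole message.
        reader.read_exact(&mut chunk)?;
        let value = if is_little_endian {
            i32::from_le_bytes(chunk)
        } else {
            i32::from_be_bytes(chunk)
        };
        values.push(value);
    }
    Ok(values)
}

fn main() -> io::Result<()> {
    // Eight bytes of padding, then the values 1, 2, 3 encoded big endian.
    let mut bytes = vec![0u8; 8];
    for v in &[1i32, 2, 3] {
        bytes.extend_from_slice(&v.to_be_bytes());
    }
    let mut reader = Cursor::new(bytes);
    let values = read_i32_buffer(&mut reader, 8, 3, false)?;
    println!("{:?}", values);
    Ok(())
}
```

Because the element type is known at the point of reading, the endianness conversion can happen per value. A real reader would presumably read larger chunks at once rather than one value at a time.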

Best,
Jorge


On Mon, Mar 8, 2021 at 11:12 PM Andrew Lamb <al...@influxdata.com> wrote:

> Thank you for filing the ticket.
>
> I wonder if you mean this reader:
>
> https://docs.rs/arrow/3.0.0/arrow/ipc/reader/struct.FileReader.html#method.try_new
>
> If so, while it is called a `FileReader` I think that is somewhat
> misleading. It requires something that implements `std::io::Read` -- which
> `&[u8]` does.
>
> https://doc.rust-lang.org/std/io/trait.Read.html#impl-Read-2
>
> So you should be able to read directly from the `[u8]` without having to do
> any copies
>
> I may perhaps be missing something

Re: [Rust] Arrow in WebAssembly

Posted by Andrew Lamb <al...@influxdata.com>.
Thank you for filing the ticket.

I wonder if you mean this reader:
https://docs.rs/arrow/3.0.0/arrow/ipc/reader/struct.FileReader.html#method.try_new

If so, while it is called a `FileReader` I think that is somewhat
misleading. It requires something that implements `std::io::Read` -- which
`&[u8]` does.

https://doc.rust-lang.org/std/io/trait.Read.html#impl-Read-2

So you should be able to read directly from the `[u8]` without having to do
any copies

I may perhaps be missing something
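
A small standard-library sketch of the point about `&[u8]`, with no arrow crate involved: `read_prefix` and `read_suffix` are hypothetical helpers, and the byte string is a stand-in rather than a valid Arrow file (it only carries the `ARROW1` magic at both ends). A byte slice can be passed wherever `Read` is expected. If the reader also needs `Seek`, which I believe the file reader does because the footer sits at the end, wrapping the slice in `std::io::Cursor` provides that without copying the bytes.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper: reads the first six bytes from any `Read` source.
fn read_prefix<R: Read>(mut reader: R) -> io::Result<[u8; 6]> {
    let mut prefix = [0u8; 6];
    reader.read_exact(&mut prefix)?;
    Ok(prefix)
}

/// Hypothetical helper: reads the last six bytes from a `Read + Seek` source.
fn read_suffix<R: Read + Seek>(mut reader: R) -> io::Result<[u8; 6]> {
    reader.seek(SeekFrom::End(-6))?;
    let mut suffix = [0u8; 6];
    reader.read_exact(&mut suffix)?;
    Ok(suffix)
}

fn main() -> io::Result<()> {
    // Stand-in bytes, not a valid Arrow file: only the magic markers matter.
    let data: Vec<u8> = b"ARROW1\0\0....ARROW1".to_vec();
    let slice: &[u8] = &data;

    // `&[u8]` implements `Read`, so it can be passed as the reader directly.
    assert_eq!(&read_prefix(slice)?, b"ARROW1");

    // `Cursor<&[u8]>` adds `Seek` while still borrowing the same bytes.
    assert_eq!(&read_suffix(Cursor::new(slice))?, b"ARROW1");
    Ok(())
}
```

This only shows that the input side needs no copy. Whether the reader copies data into its own buffers afterwards is a separate question.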

On Thu, Mar 4, 2021 at 10:53 AM Dominik Moritz <do...@cmu.edu> wrote:

>  I just remembered a bigger issue I ran into. I wanted to read from IPC but
> I don’t have a file. I do have the data as [u8] already. The current API
> incurs more copies than necessary (I think) and therefore the performance
> of reading IPC is worse than in JS. (
> https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11696).

Re: [Rust] Arrow in WebAssembly

Posted by Dominik Moritz <do...@cmu.edu>.
 I just remembered a bigger issue I ran into. I wanted to read from IPC but
I don’t have a file. I do have the data as [u8] already. The current API
incurs more copies than necessary (I think) and therefore the performance
of reading IPC is worse than in JS. (
https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11696).

On Mar 1, 2021 at 23:29:18, Dominik Moritz <do...@cmu.edu> wrote:

> I am looking forward to speaking with you then. I’ll talk about the
> motivation.
>
> My experience with the library has been good. I ran into a few limitations
> that I filed Jiras for. I struggled a bit with some of the error handling
> and Arc types but that’s probably because I am not very experienced with
> Rust and wasm-bindgen doesn’t support all Rust features.
>
> I had some bigger issues with the DataFusion and Parquet libraries as they
> don’t support wasm right now (also filed Jiras for those).

Re: [Rust] Arrow in WebAssembly

Posted by Dominik Moritz <do...@cmu.edu>.
 I am looking forward to speaking with you then. I’ll talk about the
motivation.

My experience with the library has been good. I ran into a few limitations
that I filed Jiras for. I struggled a bit with some of the error handling
and Arc types but that’s probably because I am not very experienced with
Rust and wasm-bindgen doesn’t support all Rust features.

I had some bigger issues with the DataFusion and Parquet libraries as they
don’t support wasm right now (also filed Jiras for those).

On Feb 27, 2021 at 11:14:27, Andrew Lamb <al...@influxdata.com> wrote:

> Hi  Dominik,
>
> That sounds really interesting -- thank you for the offer
>
> I for one would enjoy seeing a demo and suggest that 10 minutes might be a
> good length. The next call (details are also on the announcement [1]) is
> scheduled for Wednesday March 10, 2021 at 09:00 PST / 12:00 EST / 17:00
> UTC. The link is https://meet.google.com/ctp-yujs-aee
>
> I would personally be interested in hearing about your experience as a user
> of the Rust library (what was good, what was challenging, how can we
> improve).
>
> Thanks!
> Andrew
>
> [1]
>
> https://lists.apache.org/thread.html/raa72e1a8a3ad5dbb8366e9609a041eccca87f85545c3bc3d85170cfc%40%3Cdev.arrow.apache.org%3E

Re: [Rust] Arrow in WebAssembly

Posted by Andrew Lamb <al...@influxdata.com>.
Hi  Dominik,

That sounds really interesting -- thank you for the offer

I for one would enjoy seeing a demo and suggest that 10 minutes might be a
good length. The next call (details are also on the announcement [1]) is
scheduled for Wednesday March 10, 2021 at 09:00 PST / 12:00 EST / 17:00
UTC. The link is https://meet.google.com/ctp-yujs-aee

I would personally be interested in hearing about your experience as a user
of the Rust library (what was good, what was challenging, how can we
improve).

Thanks!
Andrew

[1]
https://lists.apache.org/thread.html/raa72e1a8a3ad5dbb8366e9609a041eccca87f85545c3bc3d85170cfc%40%3Cdev.arrow.apache.org%3E

On Fri, Feb 26, 2021 at 4:17 AM Fernando Herrera <
fernando.j.herrera@gmail.com> wrote:

> Hi Dominik,
>
> I would be interested in a demo. I'm curious to see your implementation and
> what advantages you have seen over JavaScript.
>
> thanks
> Fernando

Re: [Rust] Arrow in WebAssembly

Posted by Fernando Herrera <fe...@gmail.com>.
Hi Dominik,

I would be interested in a demo. I'm curious to see your implementation and
what advantages you have seen over JavaScript.

thanks
Fernando
