Posted to user@arrow.apache.org by Jacopo Gobbi <ja...@orchest.io> on 2021/05/04 14:56:39 UTC

State of plasma

Hi everybody,

My name is Jacopo and I am a software engineer at Orchest (orchest.io). We
are currently making use of the plasma store to do in-memory data
passing, and use pyarrow for some serialization.
Some months ago, there was talk on the dev mailing list of the
plasma store being deprecated. I see that Arrow 4.0.0 has been
released and that the plasma store is still there, but I could not find
comprehensive information in the docs or the README about the current
state of the plasma store and what to expect in the future.

Could someone provide some insight as to what the current plans are for
plasma in pyarrow?

Regards,
Jacopo

Re: State of plasma

Posted by Wes McKinney <we...@gmail.com>.
Plasma was used by Ray in production, as is, prior to the fork that
took place last year. We have no concrete plans to remove it at this
time, but if you run into a bug (bugs have tended to crop up mostly
in stress-testing scenarios, and there are some unfixed bug reports in
the issue tracker), you would be on your own right now. It's
conceivable that its development might be sponsored by some
corporation in the future, and I still think that doing shared memory
IPC in this fashion is a good idea.

Re: State of plasma

Posted by Simon Fischer <si...@ipp.mpg.de>.
Thanks a lot for clarifying, Wes!

You said earlier that plasma is effectively deprecated for lack of 
maintenance, and you also say it works well. Will it be removed from 
Arrow? Can it be considered stable (in terms of "production ready 
runtime stability", not API)?

Thanks again
Simon

Re: State of plasma

Posted by Wes McKinney <we...@gmail.com>.
On Tue, Jul 27, 2021 at 1:29 PM Simon Fischer <si...@ipp.mpg.de> wrote:
>
> Hi all,
>
> could someone (Wes?) elaborate on that a bit more? I ask because the Ray project still lists Arrow Plasma as part of its infrastructure, e.g. here [1].

I'm pretty certain this is simply a case of outdated documentation --
you could file an issue with Ray and ask them to fix it.

> Also, from my understanding plasma is C++ and the client exists for C++, and plasma is (still) part of Arrow. But the only documentation on how to use plasma with Arrow is for Python [2]. There are some docs for plasma C++ here [3], but they do not seem to cover the interaction between Arrow and plasma. Can you point me somewhere where I can find something in that regard?
>

At this point reading the C++ headers would be the way to go. Plasma
is one aspect of building a distributed data caching solution, so it's
quite low level but it works well.

> Thirdly out of curiosity: are there any data about performance (throughput, latency) of plasma (both inter and intra process)?

I don't recall anyone having done too many systematic performance
studies / benchmarks.

> Thanks a lot and best regards
> Simon
>
>
> [1] https://docs.ray.io/en/master/serialization.html
> [2] https://arrow.apache.org/docs/python/plasma.html
> [3] https://github.com/apache/arrow/blob/master/cpp/apidoc/tutorials/plasma.md

Re: State of plasma

Posted by Simon Fischer <si...@ipp.mpg.de>.
Hi all,

could someone (Wes?) elaborate on that a bit more? I ask because the Ray
project still lists Arrow Plasma as part of its infrastructure, e.g.
here [1].

Also, from my understanding plasma is C++ and the client exists for C++, 
and plasma is (still) part of Arrow. But the only documentation on how 
to use plasma with Arrow is for Python [2]. There are some docs for 
plasma C++ here [3], but they do not seem to cover the interaction 
between Arrow and plasma. Can you point me somewhere where I can find 
something in that regard?

Thirdly out of curiosity: are there any data about performance 
(throughput, latency) of plasma (both inter and intra process)?

Thanks a lot and best regards
Simon


[1] https://docs.ray.io/en/master/serialization.html
[2] https://arrow.apache.org/docs/python/plasma.html
[3] https://github.com/apache/arrow/blob/master/cpp/apidoc/tutorials/plasma.md

On 04.05.2021 at 17:12, Wes McKinney wrote:
> hi Jacopo — absent developer-maintainers, it is de facto deprecated
> since the previous developer-maintainers (who work on the Ray project)
> have forked away and abandoned it. If someone wants to resume
> development and maintenance of the codebase (or fund work on it,
> please contact me if you want to fund it!), that would be great.


-- 
Please note: name change! From now on, please use simon.fischer@ipp.mpg.de.

Simon Fischer

Developer - CoDaC
Department Operation

Max Planck Institute for Plasma Physics
Wendelsteinstrasse 1
17491 Greifswald, Germany

Phone: +49(0)3834 88 1215


Re: State of plasma

Posted by Jacopo Gobbi <ja...@orchest.io>.
Hi Wes,

Thank you for the quick and informative response, much appreciated.

Regards,
Jacopo


Re: State of plasma

Posted by Wes McKinney <we...@gmail.com>.
hi Jacopo — absent developer-maintainers, it is de facto deprecated
since the previous developer-maintainers (who work on the Ray project)
have forked away and abandoned it. If someone wants to resume
development and maintenance of the codebase (or fund work on it,
please contact me if you want to fund it!), that would be great.