Posted to dev@arrow.apache.org by Dustin Long <du...@qri.io> on 2018/12/19 19:57:41 UTC

plan for Go implementation of Plasma

Hi all!

I am a developer on qri <https://qri.io/>, a data-science tool built on
IPFS and written in Go. We're interested in integrating Arrow, and especially
Plasma, in order to be able to share datasets with other apps like Jupyter
Notebook. Having this functionality is going to be key for how we plan to
integrate with existing frameworks.

I've been investigating possible approaches for how to use Plasma in our
codebase. I realize that Plasma is still a work in progress and doesn't
have a stable API yet, but we're also a ways off from being ready to fully
integrate it on our side. I just figured it would be good to start this
conversation early in order to plan ahead for how development should
proceed.

So, the prototypes I've been hacking on have revealed a few choices for how
to make our Go codebase call Plasma's C++, and I wanted to see what the
Plasma devs think about these approaches, or if they have any preference for
how the Go bindings should behave.

Here are the options in order of what seems to be least to most usable:

1. cgo
  Use Go's built-in cgo facility to call the Plasma C++ implementation. cgo
is relatively easy to use, but it can only call C functions. So this
would require writing and maintaining a pure C wrapper around the
C++ functionality we want to expose. A lot would be lost in translation, and
the resulting Go code would look nothing like the original library.

2. dlopen
  Install Plasma as a library on the user's system, then load the library
at run time, looking up functions and data structures as needed. This
removes the need for a static dependency, but still requires a lot of shim
code to be written to load the shared library calls. C++'s name mangling
gets in the way a lot.

3. SWIG
  Write a SWIG interface file that exposes whatever functionality we want to
Go. The standard go tool has built-in SWIG support and knows when to
invoke the SWIG generator in order to create Go bindings that resemble the
C++ original. The build process is relatively uninterrupted.

I noticed there doesn't seem to be any SWIG in use currently in the Arrow
codebase, which made me think there might be a reason that it has
been avoided for other languages. I'm interested to hear any thoughts, or
to see if there are other suggestions on how to proceed.

Regards,
Dustin

Re: plan for Go implementation of Plasma

Posted by Kouhei Sutou <ko...@clear-code.com>.
Hi,

The GObject Plasma bindings mentioned by Philipp are the official
C bindings for Plasma (Plasma GLib):

  https://github.com/apache/arrow/tree/master/c_glib/plasma-glib

The Ruby bindings use Plasma GLib, so we'll keep maintaining and improving it.


There used to be examples of automatically generating Go bindings for
Arrow from Arrow GLib:

  https://github.com/apache/arrow/tree/apache-arrow-0.9.0/c_glib/example/go

I removed the examples because we started native Go bindings. The same
mechanism could be used for Plasma, but I don't think we should
use it.

We'll be able to implement Go bindings with cgo and Plasma
GLib. I think that Go bindings for Plasma shouldn't expose
any GLib-related API when we use Plasma GLib.

I can help if we use Plasma GLib.


Thanks,
--
kou

In <CA...@mail.gmail.com>
  "Re: plan for Go implementation of Plasma" on Wed, 19 Dec 2018 17:23:58 -0500,
  Dustin Long <du...@qri.io> wrote:

> Neat! Thank you for the suggestions, I'll take a look into these other
> approaches. Sticking with cgo does sound promising; I had dismissed it due
> to needing to maintain a C interface, but if there's already some bindings
> that might become official, that negates that issue.

Re: plan for Go implementation of Plasma

Posted by Dustin Long <du...@qri.io>.
Neat! Thank you for the suggestions, I'll take a look into these other
approaches. Sticking with cgo does sound promising; I had dismissed it due
to needing to maintain a C interface, but if there's already some bindings
that might become official, that negates that issue.

On Wed, Dec 19, 2018 at 3:26 PM Philipp Moritz <pc...@gmail.com> wrote:

> Hey Dustin,
>
> Thanks for getting in touch! Here are two additional ways to do it:
>
> 5. Native go client library: If Go has support to ship file descriptors
> over unix domain sockets (which I think it has, see
> https://github.com/opencontainers/runc/blob/master/libcontainer/utils/cmsg.go)
> and interact with memory mapped files it might also be possible to make a
> version of
> https://github.com/apache/arrow/blob/master/cpp/src/plasma/client.cc that
> is native go. The advantage is that it wouldn't need any additional
> compilation steps on the go side, the disadvantage is that it would need to
> be updated if the plasma client internals change (like they did recently
> with the removal of the release buffer).
>
> 6. GObject wrapper: Possibly one could use the GObject plasma bindings
> that kou and his team are managing to build a wrapper (not sure how
> feasible that is or if there is a mature GObject go implementation).
>
> I would encourage you to start by writing down the ideal Go API for
> the client and then see how it can be implemented after that (to make sure
> the API, which is the most important piece, is not influenced by the
> implementation choice).
>
> Then, going the cgo route seems the most promising to me, since I think
> that's the route by which most Go code interfaces with native libraries. There
> are some C bindings that have been written:
> https://github.com/plures/pxnd/tree/master/libplasma. If they are useful to
> you, we can make a plan to integrate them into the repo.
>
> Best,
> Philipp.

Re: plan for Go implementation of Plasma

Posted by Philipp Moritz <pc...@gmail.com>.
Hey Dustin,

Thanks for getting in touch! Here are two additional ways to do it:

5. Native Go client library: If Go has support for shipping file descriptors
over Unix domain sockets (which I think it has, see
https://github.com/opencontainers/runc/blob/master/libcontainer/utils/cmsg.go)
and for interacting with memory-mapped files, it might also be possible to
make a version of
https://github.com/apache/arrow/blob/master/cpp/src/plasma/client.cc that
is native Go. The advantage is that it wouldn't need any additional
compilation steps on the Go side; the disadvantage is that it would need to
be updated if the Plasma client internals change (like they did recently
with the removal of the release buffer).

6. GObject wrapper: Possibly one could use the GObject Plasma bindings
that kou and his team are managing to build a wrapper (not sure how
feasible that is or if there is a mature GObject Go implementation).
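
To make option 5 a bit more concrete, here is a minimal sketch of the two
building blocks it relies on, using only Go's standard syscall package:
passing a file descriptor over a Unix domain socket with an SCM_RIGHTS
control message, then memory-mapping the received descriptor. It does not
speak the Plasma protocol; the function names (sendFD, recvFD,
shareThroughFD) are made up for illustration, a socket pair stands in for the
connection to the store, and a temp file stands in for the store's shared
memory. It assumes Linux or another Unix where these syscalls are available.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// sendFD sends one file descriptor over a connected Unix domain socket
// using an SCM_RIGHTS control message.
func sendFD(sock, fd int) error {
	rights := syscall.UnixRights(fd)
	// One byte of ordinary payload accompanies the control message.
	return syscall.Sendmsg(sock, []byte{0}, rights, nil, 0)
}

// recvFD receives one file descriptor from a connected Unix domain socket.
func recvFD(sock int) (int, error) {
	buf := make([]byte, 1)
	oob := make([]byte, syscall.CmsgSpace(4)) // room for one 32-bit descriptor
	_, oobn, _, _, err := syscall.Recvmsg(sock, buf, oob, 0)
	if err != nil {
		return -1, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return -1, err
	}
	if len(msgs) != 1 {
		return -1, fmt.Errorf("expected 1 control message, got %d", len(msgs))
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return -1, err
	}
	if len(fds) != 1 {
		return -1, fmt.Errorf("expected 1 descriptor, got %d", len(fds))
	}
	return fds[0], nil
}

// shareThroughFD writes payload to a temp file, passes the file's descriptor
// across a socket pair, maps the received descriptor, and returns a copy of
// the mapped bytes. The payload must be non-empty because a zero-length
// mapping is not allowed.
func shareThroughFD(payload []byte) ([]byte, error) {
	if len(payload) == 0 {
		return nil, fmt.Errorf("payload must not be empty")
	}
	f, err := os.CreateTemp("", "fd-passing-sketch-*")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if _, err := f.Write(payload); err != nil {
		return nil, err
	}

	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return nil, err
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])

	if err := sendFD(pair[0], int(f.Fd())); err != nil {
		return nil, err
	}
	fd, err := recvFD(pair[1])
	if err != nil {
		return nil, err
	}
	defer syscall.Close(fd)

	mapped, err := syscall.Mmap(fd, 0, len(payload), syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return nil, err
	}
	defer syscall.Munmap(mapped)

	out := make([]byte, len(mapped))
	copy(out, mapped)
	return out, nil
}

func main() {
	got, err := shareThroughFD([]byte("hello from a mapped file"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(got))
}
```

A real native client would also have to implement the store's wire messages
on top of this, which is the part that would need updating whenever the
client internals change.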

I would encourage you to start by writing down the ideal Go API for
the client and then see how it can be implemented after that (to make sure
the API, which is the most important piece, is not influenced by the
implementation choice).
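
One way to follow this advice is to write the API as a Go interface first and
exercise it against a throwaway in-memory implementation, before choosing
cgo, GLib, or a native client. The sketch below is only an assumption about
what such an API could look like: the method names loosely follow the
create/seal/get flow of the Plasma client, while the ObjectID length, the
error values, and the memClient type are invented for illustration and are
not taken from any existing binding.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectID identifies an object in the store. The 20-byte length is an
// assumption made for this sketch.
type ObjectID [20]byte

// Client is a hypothetical Go-side API. Nothing in it mentions C, C++, or
// GLib, so any of the implementation routes could sit behind it.
type Client interface {
	// Create reserves a writable buffer of the given size for id.
	Create(id ObjectID, size int) ([]byte, error)
	// Seal marks the object as complete and readable.
	Seal(id ObjectID) error
	// Get returns the bytes of a sealed object.
	Get(id ObjectID) ([]byte, error)
	// Contains reports whether a sealed object with this id exists.
	Contains(id ObjectID) bool
}

var (
	ErrExists    = errors.New("object already exists")
	ErrNotFound  = errors.New("object not found")
	ErrNotSealed = errors.New("object not sealed")
)

// memClient is an in-memory stand-in used only to try out the API shape.
type memClient struct {
	objects map[ObjectID][]byte
	sealed  map[ObjectID]bool
}

// NewMemClient returns the in-memory stand-in as a Client.
func NewMemClient() Client {
	return &memClient{
		objects: make(map[ObjectID][]byte),
		sealed:  make(map[ObjectID]bool),
	}
}

func (m *memClient) Create(id ObjectID, size int) ([]byte, error) {
	if size < 0 {
		return nil, errors.New("negative size")
	}
	if _, ok := m.objects[id]; ok {
		return nil, ErrExists
	}
	buf := make([]byte, size)
	m.objects[id] = buf
	return buf, nil
}

func (m *memClient) Seal(id ObjectID) error {
	if _, ok := m.objects[id]; !ok {
		return ErrNotFound
	}
	m.sealed[id] = true
	return nil
}

func (m *memClient) Get(id ObjectID) ([]byte, error) {
	buf, ok := m.objects[id]
	if !ok {
		return nil, ErrNotFound
	}
	if !m.sealed[id] {
		return nil, ErrNotSealed
	}
	return buf, nil
}

func (m *memClient) Contains(id ObjectID) bool {
	return m.sealed[id]
}

func main() {
	c := NewMemClient()
	id := ObjectID{1}

	buf, err := c.Create(id, 5)
	if err != nil {
		fmt.Println("create:", err)
		return
	}
	copy(buf, "arrow")
	if err := c.Seal(id); err != nil {
		fmt.Println("seal:", err)
		return
	}
	data, err := c.Get(id)
	if err != nil {
		fmt.Println("get:", err)
		return
	}
	fmt.Println(string(data), c.Contains(id))
}
```

Because calling code depends only on the Client interface, the in-memory
stand-in could later be swapped for a cgo-backed or native implementation
without changing callers.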

Then, going the cgo route seems the most promising to me, since I think
that's the route by which most Go code interfaces with native libraries. There
are some C bindings that have been written:
https://github.com/plures/pxnd/tree/master/libplasma. If they are useful to
you, we can make a plan to integrate them into the repo.

Best,
Philipp.



On Wed, Dec 19, 2018 at 12:04 PM Dustin Long <du...@qri.io> wrote:

> Hi all!
>
> I am a developer on qri <https://qri.io/>, a data-science tool built on
> IPFS written in go. We're interested in integrating Arrow and especially
> Plasma, in order to be able to share datasets with other apps like Jupyter
> Notebook. Having this functionality is going to be key for how we plan to
> integrate with existing frameworks.
>
> I've been investigating possible approaches for how to use Plasma in our
> codebase. I realize that Plasma is still a work in progress and doesn't
> have a stable API yet, but we're also a ways off from being ready to fully
> integrate it on our side. Just figured it would be good to start this
> conversation early in order to plan ahead for how development should
> proceed.
>
> So, the prototypes I've been hacking on have revealed a few choices of how
> to make our golang codebase call Plasma's C++, and I wanted to see what the
> Plasma devs think about these approaches or if they have any preference for
> how the go bindings should behave.
>
> Here are the options in order of what seems to be least to most usable:
>
> 1. cgo
>   Use go's builtin cgo facility to call the Plasma C++ implementation. cgo
> is relatively easy to use, but it can only call C functions. So this
> would require writing and maintaining a pure C language wrapper around the
> C++ functionality we want to expose. A lot would be lost in translation and
> the resulting go code would look nothing like the original library.
>
> 2. dlopen
>   Install Plasma as a library on the user's system, then load the library
> at run-time, looking up function calls and data structures as needed.
> Removes the need for a static dependency, but still requires a lot of shim
> code to be written to load the shared library calls. C++'s name mangling
> gets in the way a lot.
>
> 3. Swig
>   Wrap a swig interface file that exposes whatever functionality we want to
> golang. The standard go tool has builtin swig support, which knows when to
> invoke the swig generator, in order to create go bindings that resemble the
> C++ original. The build process is relatively uninterrupted.
>
> I noticed there doesn't seem to be any swig in use currently in the arrow
> codebase, which made me think there might have been a reason that it has
> been avoided for other languages. I'm interested to hear any thoughts, or
> see if there are other suggestions on how to proceed?
>
> Regards,
> Dustin
>