Posted to user@arrow.apache.org by Clive Cox <cc...@seldon.io> on 2019/07/08 05:55:16 UTC

Go / Python Sharing

Hi,

I'd like to understand the high-level design for a system where a Go
process can communicate an Arrow data structure to a Python process on the
same CPU, and where the Python process can gain zero-copy access to that
data, change it and inform the Go process.  This needs to be low latency,
so I don't want to save to a file.

Would this need Plasma as a zero-copy store for the data shared between
the two processes, or do I need to use IPC? As I understand it, IPC
transfers the data, which is not needed in this case.
Any pointers to examples would be appreciated.

 Thanks,

   Clive


-- 


<https://www.seldon.io>
Seldon Technologies Ltd, Rise London, 41 Luke Street, Shoreditch, EC2A 4DP (
map <https://goo.gl/maps/BbJgCdNso5Q2>). Registered in England & Wales, No.
9188032. VAT GB 258424587. Privacy Policy <https://www.seldon.io/privacy/>.

Re: Go / Python Sharing

Posted by black <bl...@qq.com>.
I think you may have sent this email to the wrong address; I did not ask the questions in this thread.




------------------ Original message ------------------
From: "Miki Tebeka"<mi...@353solutions.com>;
Sent: Thursday, July 11, 2019, 3:11 PM
To: "user"<us...@arrow.apache.org>;

Subject: Re: Go / Python Sharing



Hi,
 

> but I think implementing a shared memory Go allocator to be easier (as in less human hours to implement).
That depends on a lot of aspects. There might be slight differences in memory layout between Go and Python/C++ (byte alignment... ). Not saying it can't be done, just a lot of testing required :)


All the best,
Miki

Re: Go / Python Sharing

Posted by Miki Tebeka <mi...@353solutions.com>.
I stand corrected!

On Thu, Jul 11, 2019, 11:01 Uwe L. Korn <uw...@xhochy.com> wrote:

> Hello Miki,
>
> actually having the same byte alignment is something that we have written
> into the spec. So when there is a problem in the shared memory usage, we
> actually would have found a bug in one of the two implementations.
>
> Uwe

Re: Go / Python Sharing

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Hello Miki,

Actually, having the same byte alignment is something that we have written into the spec. So if there is a problem in the shared memory usage, we would actually have found a bug in one of the two implementations.

Uwe
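
To make the alignment point concrete, here is a small sketch of the arithmetic involved. As far as I understand the columnar format spec, buffers are expected to start on 8-byte boundaries, and padding to a multiple of 64 bytes is recommended; treat those numbers as my reading rather than a quote. The helper names `padded_length` and `is_aligned` are made up for illustration and are not part of any Arrow library.

```python
def padded_length(nbytes, alignment=64):
    """Round a buffer length up to the next multiple of `alignment`."""
    return (nbytes + alignment - 1) // alignment * alignment


def is_aligned(address, alignment=8):
    """Check that a buffer's start address sits on an `alignment` boundary."""
    return address % alignment == 0


print(padded_length(100))     # length a 100-byte buffer would be padded to
print(padded_length(5, 8))    # same idea with 8-byte padding
print(is_aligned(4096), is_aligned(4100))
```

If two implementations agree on these rules, a buffer laid out by one can be read in place by the other, which is why a mismatch would count as a bug rather than an expected difference.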

On Thu, Jul 11, 2019, at 9:11 AM, Miki Tebeka wrote:
> Hi,
> 
>> but I think implementing a shared memory Go allocator to be easier (as in less human hours to implement).
> That depends on a lot of aspects. There might be slight differences in memory layout between Go and Python/C++ (byte alignment... ). Not saying it can't be done, just a lot of testing required :)
> 
> All the best,
> Miki

Re: Go / Python Sharing

Posted by Miki Tebeka <mi...@353solutions.com>.
Hi,


> but I think implementing a shared memory Go allocator to be easier (as in
> less human hours to implement).
>
That depends on a lot of aspects. There might be slight differences in
memory layout between Go and Python/C++ (byte alignment... ). Not saying it
can't be done, just a lot of testing required :)

All the best,
Miki

Re: Go / Python Sharing

Posted by Sebastien Binet <se...@gmail.com>.
I haven't yet looked at how much work implementing Plasma in Go would be,
so you may just ignore me :) but I think implementing a shared memory Go
allocator would be easier (as in fewer human hours to implement).

Another option could be to have a CGo package exposing a set of functions
(compiled as a C shlib) that call into the Go based arrow package to do
what you need.

-s


Re: Go / Python Sharing

Posted by Clive Cox <cc...@seldon.io>.
Thanks for all the informative replies.

 In our case Python and Go would be in separate processes. So, as I
understand the conversation so far, the options are:

   - Use Plasma. This requires pending updates to the current Go
   implementation? (happy to help here)
   - Use IPC, but this will require sending the data over the wire?

Thanks,

 Clive







Re: Go / Python Sharing

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Hello all,

I've been using the in-process sharing method for quite some time for the Python<->Java interaction, and I really like the ease of doing it all in the same process, especially as this avoids any memory copy or shared memory handling. This is really useful for the case where you only want to call a single routine in another language.

Thus I would really like to see this also implemented for Go (and Rust) so that one can build custom UDFs in them and use them from Python code. The preconditions for this are:

- we have IPC tests that verify that both libraries use the exact same memory layout;
- we can pull the memory pointer out of the Go Arrow structures into the C++ memory structures;
- we keep a reference between both so that memory tracking doesn't deallocate the underlying memory.

For that we have in Python the pyarrow.foreign_buffer function:
https://github.com/apache/arrow/blob/1b798a317df719d32312ca2c3253a2e399e949b8/python/pyarrow/io.pxi#L1276-L1292

For the Go<->Python case, though, I would recommend solving this as a Go<->C++ interface, as this would make interaction possible for all the libraries based on the C++ one (like R, Ruby, ...).

Uwe
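
The "keep a reference so the memory is not deallocated" idea can be sketched without Arrow at all. The following is only a standard-library analogue of what I understand pyarrow.foreign_buffer(address, size, base) to do, not its implementation: the class name `ForeignBytes` is hypothetical, and a bytearray stands in for memory owned by another runtime such as Go.

```python
import ctypes


class ForeignBytes:
    """Zero-copy window onto `size` bytes at `address`.

    `base` is whatever object owns the memory; holding it here keeps the
    owner alive for as long as this wrapper exists.
    """

    def __init__(self, address, size, base=None):
        self._base = base
        # from_address does not copy; it reinterprets the existing memory.
        self.data = (ctypes.c_ubyte * size).from_address(address)


# Stand-in for a buffer allocated elsewhere (hypothetical example data).
owner = bytearray(b"arrow")
address = ctypes.addressof((ctypes.c_ubyte * len(owner)).from_buffer(owner))

wrapped = ForeignBytes(address, len(owner), base=owner)
del owner                      # the wrapper's reference keeps the bytes valid
wrapped.data[0] = ord("A")     # writes go straight to the original memory
print(bytes(wrapped.data))
```

Without the `base` reference, dropping `owner` would leave `wrapped.data` pointing at freed memory, which is the failure mode the reference tracking is meant to rule out.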


Re: Go / Python Sharing

Posted by Miki Tebeka <mi...@353solutions.com>.
My bad, IPC in Go seems to be implemented -
https://issues.apache.org/jira/browse/ARROW-3679

On Mon, Jul 8, 2019 at 10:18 AM Sebastien Binet <se...@gmail.com> wrote:

> As far as i know, Go does support IPC (as in the arrow IPC format)
>
> Another option which has been discussed at some point was to have a shared
> memory allocator so the arrow arrays could be shared between processes.
>
> I haven't looked in details what implementing plasma support for Go would
> need on the Go side...
>
> -s

Re: Go / Python Sharing

Posted by Sebastien Binet <se...@gmail.com>.
As far as i know, Go does support IPC (as in the arrow IPC format)

Another option which has been discussed at some point was to have a shared
memory allocator so the arrow arrays could be shared between processes.

I haven't looked in details what implementing plasma support for Go would
need on the Go side...

-s


sent from my droid


Re: Go / Python Sharing

Posted by Miki Tebeka <mi...@353solutions.com>.
Hi Clive,

> I'd like to understand the high level design for a system where a Go
> process can communicate an Arrow data structure to a python process on the
> same CPU
>
I see two options:
- Different processes with shared memory, probably using Plasma
- Same process. Either Go uses the Python shared library, or Python uses Go
compiled to a shared library (-buildmode=c-shared)
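
A rough sketch of the first option (separate processes, shared memory), using only the Python standard library rather than Plasma or Arrow itself. The helper names `publish` and `read_total` are made up for illustration, and the second handle opened by name stands in for what the other process would do after being told the segment name:

```python
import struct
from multiprocessing import shared_memory

ITEM = 8  # bytes per int64 value


def publish(values):
    """Producer: copy int64 values into a new named shared memory segment."""
    shm = shared_memory.SharedMemory(create=True, size=ITEM * len(values))
    struct.pack_into(f"={len(values)}q", shm.buf, 0, *values)
    return shm  # caller keeps it open and unlinks it when done


def read_total(name, count):
    """Consumer: attach by name and sum the values through a typed view."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # The segment may be rounded up to a page, so slice to the real size.
        with shm.buf[: ITEM * count] as raw, raw.cast("q") as view:
            return sum(view)  # reads the shared bytes in place, no copy
    finally:
        shm.close()


segment = publish([1, 2, 3, 4])
try:
    total = read_total(segment.name, 4)
finally:
    segment.close()
    segment.unlink()
print(total)
```

Only the segment name and the element count need to travel between the processes; the data itself stays where the producer wrote it.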


> - and for the python process to zero-copy gain access to that data, change
> it and inform the Go process.  This is low latency so I don't want to save
> to file.
>
IIRC arrow is not built for mutation. You build an Array/Table once and
then use it.
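
To illustrate the build-once style in plain Python (this is an analogy, not Arrow code; the function name `with_value_replaced` is made up): instead of changing a column in place, the consumer builds a new read-only column and hands that back.

```python
from array import array


def with_value_replaced(column, index, value):
    """Return a new read-only column; the input column is left untouched."""
    updated = array("q", column.tolist())  # explicit copy of the values
    updated[index] = value
    return memoryview(updated).toreadonly()


original = memoryview(array("q", [1, 2, 3])).toreadonly()
changed = with_value_replaced(original, 1, 20)

print(original.tolist())
print(changed.tolist())
```

In a two-process setup this would mean the Python side publishes a new buffer and tells the Go side where to find it, rather than editing the shared one.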

> Would this need the use of Plasma as a zero-copy store for the data between
> the two processes or do I need to use IPC? But with IPC you are
> transferring the data which is not needed in this case as I understand it.
> Any pointers to examples would be appreciated.
>
See above about options. Note that currently the Go arrow implementation
doesn't support IPC or plasma (though it's in the works).

Yoni & I are working on another option, which is using the C++ Arrow library
from Go. It does support Plasma, and since it uses the same underlying C++
library that Python does, you'll be able to pass a pointer around without
copying data. It's in a very alpha-ish state, but you're more than welcome to
give it a try - https://github.com/353solutions/carrow

Happy hacking,
Miki