Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2018/09/01 18:46:31 UTC

Re: IPC Example

+ dev@

There are several examples of sending record batches from Java to Python
(and vice versa) over a network socket -- e.g. Jacques and I are
working on a prototype of a general purpose Arrow-native RPC framework
in Java and C++ respectively. Where there's some R&D needed is in Java
interactions with shared memory. So if you want to do a zero copy read
from a memory mapped file, then some development in the Arrow Java
libraries is required.

I'm not an expert, but it seems that Netty has a mechanism for interacting
with ByteBuffer, which should include MappedByteBuffer:

https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ReadOnlyUnsafeDirectByteBuf.java#L25

Correspondingly, an interface could be developed to enable the Java
IPC code path to write to a shared memory region.
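
As a rough illustration of the zero-copy idea discussed above, here is a
minimal Python sketch that uses only the standard library. It is not the
Arrow Java or Python API. One side writes a column of int32 values to a
file; the other side memory-maps that file and views the values in place
instead of copying them. The file layout (an 8-byte count followed by the
values) and the names write_int32_column and read_int32_column are
assumptions made up for this example; a real Arrow file carries its own
metadata and layout.

```python
import mmap
import os
import struct
import tempfile

HEADER_SIZE = 8  # hypothetical header: one little-endian int64 value count


def write_int32_column(path, values):
    """Write the count header followed by the values as little-endian int32."""
    with open(path, "wb") as f:
        f.write(struct.pack("<q", len(values)))
        f.write(struct.pack(f"<{len(values)}i", *values))


def read_int32_column(path):
    """Map the file read-only and return (mapping, view).

    The view aliases the mapped pages, so no value is copied until it is
    accessed. cast("i") uses the native byte order, so this sketch assumes
    a little-endian host.
    """
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (count,) = struct.unpack_from("<q", mapped, 0)
    data = memoryview(mapped)[HEADER_SIZE:HEADER_SIZE + 4 * count]
    return mapped, data.cast("i")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "column.bin")
        write_int32_column(path, [10, 20, 30])
        mapped, view = read_int32_column(path)
        print(view.tolist())
        # Release the view before closing the mapping; closing a mapping
        # that still has exported views raises BufferError.
        view.release()
        mapped.close()
```

A second process could call read_int32_column on the same path to read
the values without a serialization step, which is the property a Java
shared memory code path would need to provide for Arrow buffers.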

To the Java developers: could we create some JIRA issues (if there
aren't any already) around Java shared memory IPC?

- Wes
On Sat, Sep 1, 2018 at 2:32 PM ALBERTO Bocchinfuso
<al...@hotmail.it> wrote:
>
> I want to reinforce this request. I am interested in the same topic.
>
> I’d like an example specifically focused on creating a RecordBatch and passing it from a Java program to a Python one, and vice versa.
>
>
>
> Thanks,
>
> Alberto
>
>
>
> ________________________________
> From: Clive Cox <cc...@seldon.io>
> Sent: Saturday, September 1, 2018 6:12:29 PM
> To: user@arrow.apache.org
> Subject: IPC Example
>
> Hi,
>
>  Is there any example of how to do, say, Java-Python IPC? I'm not sure how to get started.
>
>  I'm thinking of using Arrow IPC to replace REST/gRPC APIs for communication when everything can run on a single compute node and low latency is the goal, hoping to remove the cost of serialization/deserialization and the network costs. Would this make sense?
>
>  Thanks,
>
>  Clive
>

Re: IPC Example

Posted by Wes McKinney <we...@gmail.com>.
hi Pearu,

On Sat, Sep 1, 2018 at 3:15 PM Pearu Peterson
<pe...@quansight.com> wrote:
>
> Hi,
>
> I'd also like to reinforce the question raised; in particular, it would be very useful to have basic examples of IPC between the same or different languages, including C/C++, Python, Java, etc.
>
> Whatever combination of languages is used, the principles of IPC should be the same. For instance, in the Python-Java or Python-Python IPC cases, the Python code should not depend on the language in which the code running in the other process is written. Is this understanding correct?

Right, the IPC protocol is not language dependent; this is one of the
raisons d'être of this project.
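
To make the point concrete, here is a toy Python sketch (standard library
only) of why the wire format, rather than the implementation language, is
what the two processes have to agree on. It is not the Arrow IPC format:
the framing (a 4-byte little-endian length followed by little-endian int32
values) and all function names are assumptions invented for this example.
Any language that can read bytes from a socket and decode little-endian
integers could sit on either end.

```python
import socket
import struct


def encode_int32s(values):
    """Pack values as little-endian int32, independent of the host's byte order."""
    return struct.pack(f"<{len(values)}i", *values)


def decode_int32s(payload):
    """Inverse of encode_int32s."""
    return list(struct.unpack(f"<{len(payload) // 4}i", payload))


def send_frame(sock, payload):
    """Send one frame: a 4-byte little-endian length, then the payload."""
    sock.sendall(struct.pack("<I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; recv() may return fewer than requested."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before the frame was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    """Receive one frame written by send_frame and return its payload."""
    (length,) = struct.unpack("<I", recv_exact(sock, 4))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # socketpair() stands in for two separate processes in this sketch.
    writer, reader = socket.socketpair()
    send_frame(writer, encode_int32s([1, 2, 3]))
    print(decode_int32s(recv_frame(reader)))
    writer.close()
    reader.close()
```

The reader never needs to know what produced the bytes, only how they are
laid out, which is the same property the Arrow IPC protocol is designed to
provide for record batches.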

>
> Btw, while reading the arrow tests, I noticed a comment
>
>   IPC only supported on Linux
>
> in https://github.com/apache/arrow/blob/master/cpp/src/arrow/gpu/cuda-test.cc#L126
>
> Does this restriction apply only for CUDA IPC or is the comment more general?
> What would it take to add IPC support for Windows or OSX?

This applies only to CUDA IPC. I tried to find a definitive reference;
this one comes straight from NVIDIA:

https://github.com/NVIDIA/cuda-samples#cuda-interprocess-communication

We test shared memory IPC on all three platforms in

https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/ipc-read-write-test.cc

- Wes

>
> Best regards,
> Pearu
>

Re: IPC Example

Posted by Pearu Peterson <pe...@quansight.com>.
Hi,

I'd also like to reinforce the question raised; in particular, it would be
very useful to have basic examples of IPC between the same or different
languages, including C/C++, Python, Java, etc.

Whatever combination of languages is used, the principles of IPC should be
the same. For instance, in the Python-Java or Python-Python IPC cases, the
Python code should not depend on the language in which the code running in
the other process is written. Is this understanding correct?

Btw, while reading the arrow tests, I noticed a comment

  IPC only supported on Linux

in
https://github.com/apache/arrow/blob/master/cpp/src/arrow/gpu/cuda-test.cc#L126

Does this restriction apply only for CUDA IPC or is the comment more
general?
What would it take to add IPC support for Windows or OSX?

Best regards,
Pearu

