Posted to dev@arrow.apache.org by "周宇睿(闻拙)" <yu...@alibaba-inc.com> on 2018/07/16 11:14:57 UTC

Passing Arrow object across language

Hi guys:

I might be missing something quite obvious, but how does Arrow pass objects across languages? Let’s say I have a Java program that invokes a C++ function via JNI; how does the C++ function pass an Arrow RecordBatch object back to the Java runtime without a memory copy?

Any advice would be appreciated.
Thanks
Yurui 

from Alimail macOS

Re: Re: Passing Arrow object across language

Posted by Masayuki Takahashi <ma...@gmail.com>.
Hi Yurui,

> Let’s say I passed memory addresses from C++ to the JVM and constructed the data structure in Java.

I think that, in this case, you should add a method on the native side
that releases the off-heap memory area, and call it from the Java side.
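
A minimal sketch of that release pattern, written in Python with ctypes
standing in for the JNI boundary. NativeSide and NativeBuffer are
hypothetical names used only for illustration; they are not part of Arrow
or Gandiva. The "native" side owns the allocation, and the wrapper on the
other side must call release() explicitly (the role AutoCloseable would
play in Java):

```python
import ctypes


class NativeSide:
    """Hypothetical stand-in for the native library: it owns every allocation."""

    def __init__(self):
        self._live = {}

    def allocate(self, n_bytes):
        buf = ctypes.create_string_buffer(n_bytes)
        addr = ctypes.addressof(buf)
        self._live[addr] = buf  # kept alive until release() is called
        return addr

    def release(self, addr):
        # A double release raises KeyError, which surfaces ownership bugs early.
        del self._live[addr]

    def live_count(self):
        return len(self._live)


class NativeBuffer:
    """Hypothetical caller-side wrapper around a native address."""

    def __init__(self, native, n_bytes):
        self._native = native
        self.address = native.allocate(n_bytes)
        self.size = n_bytes
        self._released = False

    def view(self):
        if self._released:
            raise ValueError("buffer already released")
        # Zero-copy view over the native memory, built from the raw address.
        return (ctypes.c_ubyte * self.size).from_address(self.address)

    def release(self):
        if not self._released:  # idempotent, like a well-behaved close()
            self._native.release(self.address)
            self._released = True

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.release()
        return False


native = NativeSide()
with NativeBuffer(native, 16) as buf:
    buf.view()[0] = 42
# Leaving the with-block has called release(); nothing is left on the native side.
```

The point of the design is that the lifetime is tied to an explicit scope
rather than to garbage collection, which knows nothing about memory
allocated outside the managed heap.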

Gandiva seems to allocate the memory area for return values on the
Java side and pass its pointer to the native side.
https://github.com/dremio/gandiva/blob/master/java/src/test/java/org/apache/arrow/gandiva/evaluator/NativeEvaluatorTest.java#L142
https://github.com/dremio/gandiva/blob/master/java/src/main/java/org/apache/arrow/gandiva/evaluator/NativeEvaluator.java#L152

The memory is then released on the Java side.
https://github.com/dremio/gandiva/blob/master/java/src/test/java/org/apache/arrow/gandiva/evaluator/NativeEvaluatorTest.java#L158
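
The caller-allocates approach can be sketched roughly as follows, again in
Python with ctypes as a stand-in for JNI. native_add_one and evaluate are
hypothetical names, not Gandiva functions. The caller allocates both the
input and the output buffer, the "native" kernel sees only addresses and a
length, and ownership never leaves the caller:

```python
import ctypes


def native_add_one(in_addr, out_addr, length):
    """Hypothetical native kernel: it receives raw addresses, not objects."""
    src = (ctypes.c_int32 * length).from_address(in_addr)
    dst = (ctypes.c_int32 * length).from_address(out_addr)
    for i in range(length):
        dst[i] = src[i] + 1  # results are written straight into caller memory


def evaluate(values):
    n = len(values)
    inp = (ctypes.c_int32 * n)(*values)  # caller-owned input buffer
    out = (ctypes.c_int32 * n)()         # caller-owned output buffer, zeroed
    native_add_one(ctypes.addressof(inp), ctypes.addressof(out), n)
    # Both buffers are freed by the caller's own memory management once
    # they go out of scope; the native side never has to free anything.
    return list(out)


result = evaluate([1, 2, 3])
```

Because the side that allocates is also the side that releases, no
ownership transfer across the language boundary is needed.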

> how could I make sure the memory will be released when necessary?

For that, I think you need to use a memory diagnostic tool:
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks005.html

thanks.
On Fri, Jul 20, 2018 at 9:11 Wes McKinney <we...@gmail.com> wrote:
>
> hi Yurui,
>
> I don't know if anyone has worked out a way to permanently transfer
> ownership from memory allocated by the Java or C++ libraries. This is
> partially what Plasma is for. I am not sure how Gandiva is currently
> dealing with memory management between Java and C++ in-process. If
> someone wants to comment that would be helpful.
>
> - Wes
>
> On Tue, Jul 17, 2018 at 12:19 AM, 周宇睿(闻拙) <yu...@alibaba-inc.com> wrote:
> > Hi Wes:
> >
> > Thank you for the response. Yes the examples you provided are very helpful.
> >
> > But I still have a question regarding memory management. Let’s say passed
> > memory addresses from c++ to JVM and constructed the data structure in Java.
> > Since this is an off heap memory, how could I make sure the memory will be
> > released when necessary?
> >
> > thanks
> > Yurui
> >
> > from Alimail macOS
> >
> > ------------------Original Mail ------------------
> > Sender:Wes McKinney <we...@gmail.com>
> > Send Date:Tue Jul 17 02:09:51 2018
> > Recipients: <de...@arrow.apache.org>
> > Subject:Re: Passing Arrow object across language
> >>
> >> I discussed some of these things at a high level in my talk at SciPy
> >> 2018 last week
> >>
> >>
> >> https://www.slideshare.net/wesm/apache-arrow-crosslanguage-development-platform-for-inmemory-data-105427919
> >>
> >> On Mon, Jul 16, 2018 at 2:08 PM, Wes McKinney <we...@gmail.com> wrote:
> >> > hi Yurui,
> >> >
> >> > You can also share data structures through JNI without using the IPC
> >> > tools at all, which could require memory copying to produce the IPC
> >> > messages.
> >> >
> >> > What you can do is obtain the memory addresses for the component
> >> > buffers of an array (or vector, as called in Java) and construct the
> >> > data structure from the memory addresses on the other side. We are
> >> > doing exactly this already in Python using JPype (which is JNI-based):
> >> >
> >> > https://github.com/apache/arrow/blob/master/python/pyarrow/jvm.py
> >> >
> >> > The Gandiva project uses JNI to pass Java Netty buffer memory
> >> > addresses to C++, you can see the code for creating the arrays from
> >> > the memory addresses and then constructing a RecordBatch:
> >> >
> >> >
> >> > https://github.com/dremio/gandiva/blob/master/cpp/src/jni/native_builder.cc#L602
> >> >
> >> > I believe as time goes on we will have better and more standardized
> >> > APIs to deal with JNI<->C++ zero-copy passing, these implementations
> >> > have only been done relatively recently. Your contributions to the
> >> > Arrow project around this would be most welcomed!
> >> >
> >> > Thanks,
> >> > Wes
> >> >
> >> > On Mon, Jul 16, 2018 at 2:00 PM, Philipp Moritz <pc...@gmail.com> wrote:
> >> >> Hey Yurui,
> >> >>
> >> >> you can use the Arrow IPC mechanism to do this:
> >> >>
> >> >> - https://github.com/apache/arrow/blob/master/format/IPC.md
> >> >> - Python: https://arrow.apache.org/docs/python/ipc.html
> >> >> - C++: https://arrow.apache.org/docs/cpp/namespacearrow_1_1ipc.html
> >> >> - For Java, see the org.apache.arrow.vector.ipc namespace
> >> >>
> >> >> On the C++ side, you can for example use a RecordBatchStreamWriter to
> >> >> write the IPC message, and then on the Java side you could use the
> >> >> ArrowStreamReader to read it.
> >> >>
> >> >> There are some tests here:
> >> >>
> >> >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/ipc-read-write-test.cc
> >> >>
> >> >> https://github.com/apache/arrow/tree/master/java/vector/src/test/java/org/apache/arrow/vector/ipc
> >> >>
> >> >> There are also integration tests here, although I'm not really familiar
> >> >> with them:
> >> >>
> >> >> https://github.com/apache/arrow/tree/master/integration
> >> >>
> >> >> If you could write a little tutorial/intro on how to do this (maybe using
> >> >> Plasma for exchanging the data) and contribute it to the documentation,
> >> >> that would be amazing!
> >> >>
> >> >> Best,
> >> >> Philipp.
> >> >>
> >> >> On Mon, Jul 16, 2018 at 4:14 AM, 周宇睿(闻拙) <yu...@alibaba-inc.com> wrote:
> >> >>
> >> >>> Hi guys:
> >> >>>
> >> >>> I might be missing something quite obvious, but how does Arrow pass
> >> >>> objects across languages? Let’s say I have a Java program that invokes
> >> >>> a C++ function via JNI; how does the C++ function pass an Arrow
> >> >>> RecordBatch object back to the Java runtime without a memory copy?
> >> >>>
> >> >>> Any advice would be appreciated.
> >> >>> Thanks
> >> >>> Yurui
> >> >>>
> >> >>> from Alimail macOS



-- 
高橋 真之

Re: Re: Passing Arrow object across language

Posted by Wes McKinney <we...@gmail.com>.
hi Yurui,

I don't know if anyone has worked out a way to permanently transfer
ownership of memory allocated by the Java or C++ libraries. This is
partially what Plasma is for. I am not sure how Gandiva is currently
dealing with memory management between Java and C++ in-process. If
someone wants to comment that would be helpful.

- Wes


Re: Re: Passing Arrow object across language

Posted by "周宇睿(闻拙)" <yu...@alibaba-inc.com>.
Hi Wes:

Thank you for the response. Yes, the examples you provided are very helpful.

But I still have a question regarding memory management. Let’s say I passed memory addresses from C++ to the JVM and constructed the data structure in Java. Since this is off-heap memory, how can I make sure the memory is released when necessary?

thanks
Yurui


Re: Passing Arrow object across language

Posted by Wes McKinney <we...@gmail.com>.
I discussed some of these things at a high level in my talk at SciPy
2018 last week:

https://www.slideshare.net/wesm/apache-arrow-crosslanguage-development-platform-for-inmemory-data-105427919


Re: Passing Arrow object across language

Posted by Wes McKinney <we...@gmail.com>.
hi Yurui,

You can also share data structures through JNI without using the IPC
tools at all. The IPC route could require memory copying to produce the
IPC messages.

What you can do is obtain the memory addresses for the component
buffers of an array (or vector, as called in Java) and construct the
data structure from the memory addresses on the other side. We are
doing exactly this already in Python using JPype (which is JNI-based):

https://github.com/apache/arrow/blob/master/python/pyarrow/jvm.py
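
As a rough illustration of the address-passing idea (this is not the code
in jvm.py or Gandiva), the sketch below uses Python's ctypes on both
sides. The producer builds a validity bitmap and a values buffer for an
int32 array and describes them by address; the consumer rebuilds a view
from those addresses without copying. export_int32_array,
import_int32_array and to_list are hypothetical names chosen for this
example:

```python
import ctypes


def export_int32_array(values):
    """Producer side: build the component buffers and describe them by address."""
    n = len(values)
    validity = (ctypes.c_uint8 * ((n + 7) // 8))()  # one bit per slot
    data = (ctypes.c_int32 * n)()
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)  # least-significant-bit numbering
            data[i] = v
    desc = {
        "length": n,
        "validity_addr": ctypes.addressof(validity),
        "data_addr": ctypes.addressof(data),
    }
    # The producer still owns the memory and must keep these objects alive
    # for as long as the consumer uses the addresses.
    return desc, (validity, data)


def import_int32_array(desc):
    """Consumer side: rebuild views from raw addresses; nothing is copied."""
    n = desc["length"]
    validity = (ctypes.c_uint8 * ((n + 7) // 8)).from_address(desc["validity_addr"])
    data = (ctypes.c_int32 * n).from_address(desc["data_addr"])
    return validity, data


def to_list(validity, data, n):
    """Decode the two buffers into Python values, honouring the validity bits."""
    return [data[i] if (validity[i // 8] >> (i % 8)) & 1 else None for i in range(n)]


desc, owner = export_int32_array([1, None, 3])
validity, data = import_int32_array(desc)
decoded = to_list(validity, data, desc["length"])
```

Writing through the producer's buffer (for example owner[1][0] = 99) is
immediately visible through the consumer's view, which is what zero-copy
means here. It also shows why lifetime matters: the consumer's view is
only valid while the producer keeps the buffers alive.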

The Gandiva project uses JNI to pass Java Netty buffer memory
addresses to C++. You can see the code for creating the arrays from
the memory addresses and then constructing a RecordBatch:

https://github.com/dremio/gandiva/blob/master/cpp/src/jni/native_builder.cc#L602

I believe as time goes on we will have better and more standardized
APIs to deal with JNI<->C++ zero-copy passing; these implementations
have only been done relatively recently. Your contributions to the
Arrow project around this would be most welcome!

Thanks,
Wes


Re: Passing Arrow object across language

Posted by Philipp Moritz <pc...@gmail.com>.
Hey Yurui,

You can use the Arrow IPC mechanism to do this:

- https://github.com/apache/arrow/blob/master/format/IPC.md
- Python: https://arrow.apache.org/docs/python/ipc.html
- C++: https://arrow.apache.org/docs/cpp/namespacearrow_1_1ipc.html
- For Java, see the org.apache.arrow.vector.ipc namespace

On the C++ side, you can for example use a RecordBatchStreamWriter to write
the IPC message, and then on the Java side you could use the
ArrowStreamReader to read it.

There are some tests here:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/ipc-read-write-test.cc
https://github.com/apache/arrow/tree/master/java/vector/src/test/java/org/apache/arrow/vector/ipc

There are also integration tests here, although I'm not really familiar with
them:

https://github.com/apache/arrow/tree/master/integration

If you could write a little tutorial/intro on how to do this (maybe using
Plasma for exchanging the data) and contribute it to the documentation,
that would be amazing!

Best,
Philipp.
