Posted to issues@arrow.apache.org by "Lu Qi (JIRA)" <ji...@apache.org> on 2017/11/07 15:31:01 UTC

[jira] [Commented] (ARROW-1163) [Plasma] Java client for Plasma

    [ https://issues.apache.org/jira/browse/ARROW-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242190#comment-16242190 ] 

Lu Qi  commented on ARROW-1163:
-------------------------------

Hi Philipp Moritz,
I've been working on reading and writing Tensors in Java for several weeks. My Tensor structure looks like this:
class Tensor { private float[] storage; private int[] shape; }
I used JNI to wrap the Plasma C++ client. One good thing is that, when writing a tensor, the JNI
function GetPrimitiveArrayCritical can return the address of the array in the Java heap (depending on
the VM implementation), so I can construct the Tensor in C++ easily without copying. This stops GC
while the array is held, but the Plasma write is non-blocking. On the other hand, when reading a
tensor, I need to copy the shared memory into the Java heap, which costs time. So, in order to save
reading time, a pure Java client may be a good choice.
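
To illustrate why a pure Java client could avoid the read-side copy, here is a minimal sketch that memory-maps a file and views it as a FloatBuffer instead of copying into a float[] on the heap. It assumes the tensor data is reachable through a file path and stored as raw floats in native byte order; the class name MappedTensorSketch, its methods, and the temp file are made up for the example and are not part of Plasma.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not the Plasma API.
class MappedTensorSketch {

    // Stand-in for the store side: writes raw floats into a file through a mapping.
    static void writeFloats(Path path, float[] values) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long size = (long) values.length * Float.BYTES;
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buf.order(ByteOrder.nativeOrder()).asFloatBuffer().put(values);
            buf.force(); // flush the mapped region to the file
        }
    }

    // Client side: maps the file read-only and returns a view over the mapped memory.
    // No float[] is allocated on the Java heap; the mapping stays valid after the
    // channel is closed.
    static FloatBuffer mapFloats(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.order(ByteOrder.nativeOrder()).asFloatBuffer();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("tensor", ".bin");
        path.toFile().deleteOnExit();
        writeFloats(path, new float[] {1.0f, 2.5f, -3.0f});
        FloatBuffer storage = mapFloats(path);
        System.out.println("elements: " + storage.remaining());
        System.out.println("storage[1]: " + storage.get(1));
    }
}
```

With this approach the Tensor would hold a FloatBuffer rather than a float[], so code that needs a plain array would still have to copy.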

As for a pure Java client, maybe we can use JNI to get the fd first and construct a FileDescriptor from it:
https://stackoverflow.com/questions/4845122/using-a-numbered-file-descriptor-from-java 


> [Plasma] Java client for Plasma
> -------------------------------
>
>                 Key: ARROW-1163
>                 URL: https://issues.apache.org/jira/browse/ARROW-1163
>             Project: Apache Arrow
>          Issue Type: New Feature
>            Reporter: Philipp Moritz
>
> We should start thinking about what a Java client for Plasma would look like. Given the focus of Arrow on supporting Python, C++ and Java really well, it is the next important target after Python and C++.
> My preliminary thoughts on it are the following ones: We can either go with JNI and wrap the C++ client or (in my opinion preferable) write a pure Java client. It would communicate with the Plasma store via Java flatbuffers over sockets.
> It seems that the only thing blocking a pure Java client at the moment is the way we ship file descriptors for the memory mapped files between store and client (see the file fling.cc in the Plasma repo). We would need to get rid of that because there is no pure Java API that allows transferring file descriptors over a process boundary. So the way to transfer memory mapped files over process boundaries then is probably to use the file system and keep the memory mapped files in the file system instead of unlinking them immediately (as we do at the moment), so they can be opened by the client process via their path.
> The challenge in this case is how to clean the files up and make sure they are not lying around if the plasma store crashes. One option is to store the plasma store PID with the file (i.e. as part of the file name) and let the plasma store clean them up the next time it is started; maybe there is OS level support for temporary files we can reuse.
> I probably won't get to this for a while, so if anybody needs this or has free cycles, they should feel free to chime in. Also opinions on the design are appreciated!
> -- Philipp.
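
The cleanup idea in the quoted description (put the store PID in the file name and remove leftovers at the next start) could look roughly like the sketch below. Everything here is an assumption made for illustration: the "plasma-<pid>-<objectId>.mmap" naming scheme, the class PlasmaFileCleanupSketch, and its methods do not exist in Plasma. The liveness check is passed in as a predicate so the logic can be exercised without real processes; the convenience overload uses ProcessHandle (Java 9+).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.LongPredicate;

// Hypothetical sketch, not the Plasma API.
class PlasmaFileCleanupSketch {
    static final String PREFIX = "plasma-"; // assumed naming scheme

    // Builds the path a store with the given PID would use for an object.
    static Path fileFor(Path dir, long pid, String objectId) {
        return dir.resolve(PREFIX + pid + "-" + objectId + ".mmap");
    }

    // Extracts the PID from a file name, or returns -1 if the name does not match.
    static long pidOf(Path file) {
        String name = file.getFileName().toString();
        if (!name.startsWith(PREFIX)) return -1;
        int end = name.indexOf('-', PREFIX.length());
        if (end < 0) return -1;
        try {
            return Long.parseLong(name.substring(PREFIX.length(), end));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Deletes files whose owning PID is no longer alive; returns how many were removed.
    static int cleanStale(Path dir, LongPredicate isAlive) throws IOException {
        int removed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, PREFIX + "*")) {
            for (Path f : files) {
                long pid = pidOf(f);
                if (pid >= 0 && !isAlive.test(pid)) {
                    Files.deleteIfExists(f);
                    removed++;
                }
            }
        }
        return removed;
    }

    // Same as above, asking the OS whether each PID is alive.
    static int cleanStale(Path dir) throws IOException {
        return cleanStale(dir,
                pid -> ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false));
    }
}
```

One caveat with this scheme is that the OS can reuse PIDs, so a file left by a crashed store might be attributed to an unrelated live process and kept longer than necessary.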



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)