Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/23 12:24:48 UTC

[GitHub] [arrow] amol- commented on a change in pull request #12635: ARROW-14672: [Docs] Document how to exchange data between Python and Java

amol- commented on a change in pull request #12635:
URL: https://github.com/apache/arrow/pull/12635#discussion_r833197301



##########
File path: docs/source/python/integration/python_java.rst
##########
@@ -0,0 +1,429 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+Integrating PyArrow with Java
+=============================
+
+Arrow supports exchanging data within the same process through the
+:ref:`c-data-interface`.
+
+This can be used to exchange data between Python and Java functions and
+methods so that the two languages can interact without any cost of
+marshaling and unmarshaling data.
+
+.. note::
+
+    This article assumes that you have a ``Python`` environment
+    with ``pyarrow`` correctly installed and a ``Java`` environment with
+    the ``arrow`` library correctly installed.
+    The ``Java`` version must have been compiled with ``mvn -Parrow-c-data`` to
+    ensure CData exchange support is enabled.
+    See `Python Install Instructions <https://arrow.apache.org/docs/python/install.html>`_
+    and `Java Documentation <https://arrow.apache.org/docs/java/>`_
+    for further details.
+
+Invoking Java methods from Python
+---------------------------------
+
+Suppose we have a simple Java class providing a number as its output:
+
+.. code-block:: java
+
+    public class Simple {
+        public static int getNumber() {
+            return 4;
+        }
+    }
+
+We would save this class in the ``Simple.java`` file and compile it
+to ``Simple.class`` using ``javac Simple.java``.
+
+Once the ``Simple.class`` file is created we can use the class
+from Python using the 
+`JPype <https://jpype.readthedocs.io/>`_ library which
+enables a Java runtime within the Python interpreter.
+
+``jpype1`` can be installed using ``pip`` like most Python libraries:
+
+.. code-block:: bash
+
+    $ pip install jpype1
+
+The most basic thing we can do with our ``Simple`` class is to
+call the ``Simple.getNumber`` method from Python and check
+that it returns the expected result.
+
+To do so, we can create a ``simple.py`` file which uses ``jpype`` to
+import the ``Simple`` class from the ``Simple.class`` file and invoke
+the ``Simple.getNumber`` method:
+
+.. code-block:: python
+
+    import jpype
+    from jpype.types import *
+
+    jpype.startJVM(classpath=["./"])
+
+    Simple = JClass('Simple')
+
+    print(Simple.getNumber())
+
+Running the ``simple.py`` file shows that our Python code can
+access the ``Java`` method and print the expected result:
+
+.. code-block:: bash
+
+    $ python simple.py 
+    4
+
+Java to Python using pyarrow.jvm
+--------------------------------
+
+PyArrow provides a ``pyarrow.jvm`` module that makes it easier to
+interact with Java classes and convert the Java objects to actual
+Python objects.
+
+To showcase ``pyarrow.jvm`` we can create a more complex
+class, saved as ``FillTen.java``:
+
+.. code-block:: java
+
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.BigIntVector;
+    import org.apache.arrow.vector.ValueVector;
+
+
+    public class FillTen {
+        static RootAllocator allocator = new RootAllocator();
+
+        public static ValueVector createArray() {
+            BigIntVector intVector = new BigIntVector("ints", allocator);
+            intVector.allocateNew(10);
+            intVector.setValueCount(10);
+            FillTen.fillValueVector(intVector);
+            return intVector;
+        }
+
+        private static void fillValueVector(ValueVector v) {
+            BigIntVector iv = (BigIntVector)v;
+            iv.setSafe(0, 1);
+            iv.setSafe(1, 2);
+            iv.setSafe(2, 3);
+            iv.setSafe(3, 4);
+            iv.setSafe(4, 5);
+            iv.setSafe(5, 6);
+            iv.setSafe(6, 7);
+            iv.setSafe(7, 8);
+            iv.setSafe(8, 9);
+            iv.setSafe(9, 10);
+        }
+    }
+
+This class provides a public ``createArray`` method that anyone can invoke
+to get back an array containing the numbers from 1 to 10.
+
+Given that this class now depends on several packages,
+compiling it with ``javac`` alone is no longer enough. We need to create
+a dedicated ``pom.xml`` file where we can collect the dependencies:
+
+.. code-block:: xml
+
+    <project>
+        <modelVersion>4.0.0</modelVersion>
+        
+        <groupId>org.apache.arrow.py2java</groupId>
+        <artifactId>FillTen</artifactId>
+        <version>1</version>
+
+        <properties>
+            <maven.compiler.source>1.7</maven.compiler.source>
+            <maven.compiler.target>1.7</maven.compiler.target>
+        </properties> 
+
+        <dependencies>
+            <dependency>
+                <groupId>org.apache.arrow</groupId>
+                <artifactId>arrow-memory</artifactId>
+                <version>8.0.0-SNAPSHOT</version>
+                <type>pom</type>
+            </dependency>
+            <dependency>
+                <groupId>org.apache.arrow</groupId>
+                <artifactId>arrow-memory-netty</artifactId>
+                <version>8.0.0-SNAPSHOT</version>
+                <type>jar</type>
+            </dependency>
+            <dependency>
+                <groupId>org.apache.arrow</groupId>
+                <artifactId>arrow-vector</artifactId>
+                <version>8.0.0-SNAPSHOT</version>
+                <type>pom</type>
+            </dependency>
+            <dependency>
+                <groupId>org.apache.arrow</groupId>
+                <artifactId>arrow-c-data</artifactId>
+                <version>8.0.0-SNAPSHOT</version>
+                <type>jar</type>
+            </dependency>
+        </dependencies>

Review comment:
       Addressed. I used version 8.0.0, since the docs will be published when 8.0.0 is released.
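
As an aside that is not part of the PR: a minimal Python sketch of how one might check that no `-SNAPSHOT` versions remain in a `pom.xml` after a change like this. The helper name `snapshot_dependencies` and the `EXAMPLE_POM` text are hypothetical, and the sketch assumes a pom without an XML namespace, as in the hunk above.

```python
import xml.etree.ElementTree as ET


def snapshot_dependencies(pom_text):
    """Return (artifactId, version) pairs whose version ends in -SNAPSHOT.

    Assumption: the pom declares no XML namespace on <project>,
    so plain tag names can be used for lookups.
    """
    root = ET.fromstring(pom_text)
    flagged = []
    for dep in root.iter("dependency"):
        artifact = dep.findtext("artifactId", default="").strip()
        version = dep.findtext("version", default="").strip()
        if version.endswith("-SNAPSHOT"):
            flagged.append((artifact, version))
    return flagged


# Trimmed-down, illustrative pom: one dependency still on a snapshot,
# one already moved to the release version.
EXAMPLE_POM = """
<project>
    <modelVersion>4.0.0</modelVersion>
    <dependencies>
        <dependency>
            <groupId>org.apache.arrow</groupId>
            <artifactId>arrow-vector</artifactId>
            <version>8.0.0-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.apache.arrow</groupId>
            <artifactId>arrow-c-data</artifactId>
            <version>8.0.0</version>
        </dependency>
    </dependencies>
</project>
"""

if __name__ == "__main__":
    for artifact, version in snapshot_dependencies(EXAMPLE_POM):
        print(f"{artifact} still uses {version}")
```

A real Maven pom usually declares the `http://maven.apache.org/POM/4.0.0` namespace on `<project>`; in that case the tag lookups would need namespace-qualified names.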



