Posted to commits@arrow.apache.org by li...@apache.org on 2022/04/07 18:59:09 UTC

[arrow] branch master updated: ARROW-15578: [Java][Doc] Document C Data Interface and how to interface with other languages

This is an automated email from the ASF dual-hosted git repository.

lidavidm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new f0b5c49a0d ARROW-15578: [Java][Doc] Document C Data Interface and how to interface with other languages
f0b5c49a0d is described below

commit f0b5c49a0d60b5240b584ea27773a210aa7cead2
Author: david dali susanibar arce <da...@gmail.com>
AuthorDate: Thu Apr 7 14:58:48 2022 -0400

    ARROW-15578: [Java][Doc] Document C Data Interface and how to interface with other languages
    
    Document C Data and how to interface with other languages
    - Java - Python
    - Java - C++
    
    Closes #12794 from davisusanibar/java-tutorial-ARROW-15578
    
    Lead-authored-by: david dali susanibar arce <da...@gmail.com>
    Co-authored-by: David Li <li...@gmail.com>
    Signed-off-by: David Li <li...@gmail.com>
---
 docs/source/java/cdata.rst | 223 +++++++++++++++++++++++++++++++++++++++++++++
 docs/source/java/index.rst |   1 +
 2 files changed, 224 insertions(+)

diff --git a/docs/source/java/cdata.rst b/docs/source/java/cdata.rst
new file mode 100644
index 0000000000..e5ba387750
--- /dev/null
+++ b/docs/source/java/cdata.rst
@@ -0,0 +1,223 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+================
+C Data Interface
+================
+
+.. contents::
+
+Arrow supports exchanging data without copying or serialization within the same process
+through the :ref:`c-data-interface`, even between different language runtimes.
+
+Java to Python
+--------------
+
+See :doc:`Java to Python <../python/integration/python_java>` for a guide to
+implementing Java-Python communication using the C Data Interface.
+
+Java to C++
+-----------
+
+Example: Share an Int64 array from C++ to Java:
+
+**C++ Side**
+
+Follow the :doc:`C++ build guide <../developers/cpp/building>` to compile the Arrow C++ library:
+
+.. code-block:: shell
+
+    $ git clone https://github.com/apache/arrow.git
+    $ cd arrow/cpp
+    $ mkdir build   # from inside the `cpp` subdirectory
+    $ cd build
+    $ cmake .. --preset ninja-debug-minimal
+    $ cmake --build .
+    $ tree debug/
+    debug/
+    ├── libarrow.800.0.0.dylib
+    ├── libarrow.800.dylib -> libarrow.800.0.0.dylib
+    └── libarrow.dylib -> libarrow.800.dylib
+
+Implement a function in CDataCppBridge.h that exports an array via the C Data Interface:
+
+.. code-block:: cpp
+
+    #include <cstdint>
+    #include <arrow/api.h>
+    #include <arrow/c/bridge.h>
+
+    void FillInt64Array(const uintptr_t c_schema_ptr, const uintptr_t c_array_ptr) {
+        arrow::Int64Builder builder;
+        builder.Append(1);
+        builder.Append(2);
+        builder.Append(3);
+        builder.AppendNull();
+        builder.Append(5);
+        builder.Append(6);
+        builder.Append(7);
+        builder.Append(8);
+        builder.Append(9);
+        builder.Append(10);
+        std::shared_ptr<arrow::Array> array = *builder.Finish();
+
+        struct ArrowSchema* c_schema = reinterpret_cast<struct ArrowSchema*>(c_schema_ptr);
+        auto c_schema_status = arrow::ExportType(*array->type(), c_schema);
+        if (!c_schema_status.ok()) c_schema_status.Abort();
+
+        struct ArrowArray* c_array = reinterpret_cast<struct ArrowArray*>(c_array_ptr);
+        auto c_array_status = arrow::ExportArray(*array, c_array);
+        if (!c_array_status.ok()) c_array_status.Abort();
+    }
+
+**Java Side**
+
+For this example, we will use `JavaCPP`_ to call our C++ function from Java,
+without writing JNI bindings ourselves.
+
+.. code-block:: xml
+
+    <?xml version="1.0" encoding="UTF-8"?>
+    <project xmlns="http://maven.apache.org/POM/4.0.0"
+             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+        <modelVersion>4.0.0</modelVersion>
+
+        <groupId>org.example</groupId>
+        <artifactId>java-cdata-example</artifactId>
+        <version>1.0-SNAPSHOT</version>
+
+        <properties>
+            <maven.compiler.source>8</maven.compiler.source>
+            <maven.compiler.target>8</maven.compiler.target>
+            <arrow.version>8.0.0</arrow.version>
+        </properties>
+        <dependencies>
+            <dependency>
+                <groupId>org.bytedeco</groupId>
+                <artifactId>javacpp</artifactId>
+                <version>1.5.7</version>
+            </dependency>
+            <dependency>
+                <groupId>org.apache.arrow</groupId>
+                <artifactId>arrow-c-data</artifactId>
+                <version>${arrow.version}</version>
+            </dependency>
+            <dependency>
+                <groupId>org.apache.arrow</groupId>
+                <artifactId>arrow-vector</artifactId>
+                <version>${arrow.version}</version>
+            </dependency>
+            <dependency>
+                <groupId>org.apache.arrow</groupId>
+                <artifactId>arrow-memory-core</artifactId>
+                <version>${arrow.version}</version>
+            </dependency>
+            <dependency>
+                <groupId>org.apache.arrow</groupId>
+                <artifactId>arrow-memory-netty</artifactId>
+                <version>${arrow.version}</version>
+            </dependency>
+            <dependency>
+                <groupId>org.apache.arrow</groupId>
+                <artifactId>arrow-format</artifactId>
+                <version>${arrow.version}</version>
+            </dependency>
+        </dependencies>
+    </project>
+
+.. code-block:: java
+
+    import org.bytedeco.javacpp.annotation.Platform;
+    import org.bytedeco.javacpp.annotation.Properties;
+    import org.bytedeco.javacpp.tools.InfoMap;
+    import org.bytedeco.javacpp.tools.InfoMapper;
+
+    @Properties(
+            target = "CDataJavaToCppExample",
+            value = @Platform(
+                    include = {
+                            "CDataCppBridge.h"
+                    },
+                    compiler = {"cpp11"},
+                    linkpath = {"/arrow/cpp/build/debug/"},
+                    link = {"arrow"}
+            )
+    )
+    public class CDataJavaConfig implements InfoMapper {
+
+        @Override
+        public void map(InfoMap infoMap) {
+        }
+    }
+
+.. code-block:: shell
+
+    # Compile our Java code
+    $ javac -cp javacpp-1.5.7.jar CDataJavaConfig.java
+
+    # Generate CDataJavaToCppExample.java
+    $ java -jar javacpp-1.5.7.jar CDataJavaConfig.java
+
+    # Generate libjniCDataJavaToCppExample.dylib
+    $ java -jar javacpp-1.5.7.jar CDataJavaToCppExample.java
+
+    # Verify that libjniCDataJavaToCppExample.dylib was created and links against libarrow
+    $ otool -L macosx-x86_64/libjniCDataJavaToCppExample.dylib
+    macosx-x86_64/libjniCDataJavaToCppExample.dylib:
+        libjniCDataJavaToCppExample.dylib (compatibility version 0.0.0, current version 0.0.0)
+        @rpath/libarrow.800.dylib (compatibility version 800.0.0, current version 800.0.0)
+        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1200.3.0)
+        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
+
+**Java Test**
+
+Let's create a Java class to test our bridge:
+
+.. code-block:: java
+
+    import org.apache.arrow.c.ArrowArray;
+    import org.apache.arrow.c.ArrowSchema;
+    import org.apache.arrow.c.Data;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.BigIntVector;
+
+    public class TestCDataInterface {
+        public static void main(String[] args) {
+            try(
+                BufferAllocator allocator = new RootAllocator();
+                ArrowSchema arrowSchema = ArrowSchema.allocateNew(allocator);
+                ArrowArray arrowArray = ArrowArray.allocateNew(allocator)
+            ){
+                CDataJavaToCppExample.FillInt64Array(
+                        arrowSchema.memoryAddress(), arrowArray.memoryAddress());
+                try(
+                    BigIntVector bigIntVector = (BigIntVector) Data.importVector(
+                            allocator, arrowArray, arrowSchema, null)
+                ){
+                    System.out.println("C++-allocated array: " + bigIntVector);
+                }
+            }
+        }
+    }
+
+.. code-block:: shell
+
+    C++-allocated array: [1, 2, 3, null, 5, 6, 7, 8, 9, 10]
+
+.. _`JavaCPP`: https://github.com/bytedeco/javacpp
\ No newline at end of file
diff --git a/docs/source/java/index.rst b/docs/source/java/index.rst
index 94afd7b82d..5b5265f327 100644
--- a/docs/source/java/index.rst
+++ b/docs/source/java/index.rst
@@ -33,4 +33,5 @@ on the Arrow format and other language bindings see the :doc:`parent documentati
    ipc
    algorithm
    dataset
+   cdata
    Reference (javadoc) <reference/index>