Posted to github@arrow.apache.org by "vibhatha (via GitHub)" <gi...@apache.org> on 2023/09/05 18:18:33 UTC

[GitHub] [arrow-cookbook] vibhatha opened a new pull request, #327: [Java] How dictionaries work - roundtrip Java-Python

vibhatha opened a new pull request, #327:
URL: https://github.com/apache/arrow-cookbook/pull/327

     **INITIAL DRAFT PLEASE DO NOT REVIEW**
   
   This PR closes https://github.com/apache/arrow-cookbook/issues/213


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-cookbook] davisusanibar commented on pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#issuecomment-1707577628

   > > Hi @vibhatha, I was reviewing the error logs, and the error mentions that arrow-memory-netty is not available in the nightly packages. That is true (see https://nightlies.apache.org/arrow/java/org/apache/arrow/arrow-memory-netty/). This is very unusual and needs to be reviewed.
   > > Alternatively, you could start using the 14.0.0-SNAPSHOT version which can be configured in the arrow-cookbook/java/source/conf.py file as follows:
   > > ```
   > > if arrow_nightly and arrow_nightly != '0':
   > >     version = "14.0.0-SNAPSHOT"
   > > else:
   > >     version = "13.0.0"
   > > ```
   > 
   > If I understand correctly, should I make that change in this PR itself? Or should we create a separate PR for that?
   
   I am also adding this change in this PR: https://github.com/apache/arrow-cookbook/pull/320. But you are right, it should be made in an independent PR.
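
A minimal sketch of the selection logic quoted above, for anyone who wants to check the branch outside of Sphinx. It assumes the flag arrives through an environment variable named `ARROW_NIGHTLY`, and it wraps the branch in a hypothetical helper, `select_arrow_version`; neither detail is confirmed in this thread.

```python
import os


# Hypothetical helper mirroring the conf.py branch quoted above.
def select_arrow_version(arrow_nightly):
    """Return the snapshot version when the nightly flag is set and is not '0'."""
    if arrow_nightly and arrow_nightly != '0':
        return "14.0.0-SNAPSHOT"
    return "13.0.0"


# Assumption: the flag is read from an environment variable named ARROW_NIGHTLY.
version = select_arrow_version(os.getenv("ARROW_NIGHTLY"))
print(version)
```

An unset variable, an empty string, and `'0'` all fall through to the release version; any other value selects the snapshot.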




[GitHub] [arrow-cookbook] vibhatha commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1325705401


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,254 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data. 
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. testcode::
+
+    import org.apache.arrow.c.ArrowArray;
+    import org.apache.arrow.c.ArrowSchema;
+    import org.apache.arrow.c.Data;
+    import org.apache.arrow.c.CDataDictionaryProvider;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.FieldVector;
+    import org.apache.arrow.vector.BigIntVector;
+
+
+    public class MapValuesConsumer {
+        private final static BufferAllocator allocator = new RootAllocator();
+        private final CDataDictionaryProvider provider;
+        private FieldVector vector;
+        private final static BigIntVector intVector = new BigIntVector("internal_test_vector", allocator);
+
+
+        public MapValuesConsumer(CDataDictionaryProvider provider) {
+            this.provider = provider;
+        }
+
+        public static BufferAllocator getAllocatorForJavaConsumer() {
+            return allocator;
+        }
+
+        public FieldVector getVector() {
+            return this.vector;
+        }
+
+        public void update(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            this.vector = Data.importVector(allocator, arrow_array, arrow_schema, this.provider);
+            this.doWorkInJava(vector);
+        }
+
+        public FieldVector updateFromJava(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            vector = Data.importVector(allocator, arrow_array, arrow_schema, null);
+            this.doWorkInJava(vector);
+            return vector;
+        }
+
+        private void doWorkInJava(FieldVector vector) {
+            System.out.println("Doing work in Java");
+            BigIntVector bigIntVector = (BigIntVector)vector;
+            bigIntVector.setSafe(0, 2);
+        }
+
+        private static BigIntVector getIntVectorForJavaConsumers() {
+            intVector.allocateNew(3);
+            intVector.set(0, 1);
+            intVector.set(1, 7);
+            intVector.set(2, 93);
+            intVector.setValueCount(3);
+            return intVector;
+        }
+
+        public static void simulateAsAJavaConsumers() {
+            CDataDictionaryProvider provider = new CDataDictionaryProvider();
+            MapValueConsumerV2 mvc = new MapValueConsumerV2(provider);//FIXME! Use constructor with dictionary provider

Review Comment:
   This has been fixed; it was a typo. Thanks for catching this one.





[GitHub] [arrow-cookbook] vibhatha commented on pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#issuecomment-1707570208

   > Hi @vibhatha, I was reviewing the error logs, and the error mentions that arrow-memory-netty is not available in the nightly packages. That is true (see https://nightlies.apache.org/arrow/java/org/apache/arrow/arrow-memory-netty/). This is very unusual and needs to be reviewed.
   > 
   > Alternatively, you could start using the 14.0.0-SNAPSHOT version which can be configured in the arrow-cookbook/java/source/conf.py file as follows:
   > 
   > ```
   > if arrow_nightly and arrow_nightly != '0':
   >     version = "14.0.0-SNAPSHOT"
   > else:
   >     version = "13.0.0"
   > ```
   
   If I understand correctly, should I make that change in this PR itself? Or should we create a separate PR for that?




[GitHub] [arrow-cookbook] danepitkin commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1323188292


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,201 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.

Review Comment:
   ```suggestion
       Data is generated in PyArrow and exported through the C Data Interface to Java.
   ```



##########
java/source/index.rst:
##########
@@ -43,6 +43,7 @@ This cookbook is tested with Apache Arrow |version|.
    data
    avro
    jdbc
+   python_java

Review Comment:
   Would "c data interface" potentially be a better name? The section below uses pyarrow/jpype as a python example, but in the future we could also add other language interfaces.



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,201 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+

Review Comment:
   ```suggestion
   ```





[GitHub] [arrow-cookbook] davisusanibar commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1323385717


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,200 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through the C Data Interface to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data. 
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. code-block:: java
+
+    import org.apache.arrow.c.ArrowArray;
+    import org.apache.arrow.c.ArrowSchema;
+    import org.apache.arrow.c.Data;
+    import org.apache.arrow.c.CDataDictionaryProvider;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.FieldVector;
+    import org.apache.arrow.vector.BigIntVector;
+
+
+    public class MapValuesConsumer {
+        private final static BufferAllocator allocator = new RootAllocator();
+        private final CDataDictionaryProvider provider;
+        private FieldVector vector;
+
+        public MapValuesConsumer(CDataDictionaryProvider provider) {
+            this.provider = provider;
+        }
+
+        public static BufferAllocator getAllocatorForJavaConsumer() {
+            return allocator;
+        }
+
+        public FieldVector getVector() {
+            return this.vector;
+        }
+
+        public void update(long c_array_ptr, long c_schema_ptr) {

Review Comment:
   Could it be an option to also validate the Python call as part of the Java testing, using the `.. testcode::` and `.. testoutput::` directives?
   
   Something like this:
   
   ```
   import org.apache.arrow.c.ArrowArray;
   import org.apache.arrow.c.ArrowSchema;
   import org.apache.arrow.c.CDataDictionaryProvider;
   import org.apache.arrow.c.Data;
   import org.apache.arrow.memory.BufferAllocator;
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.BigIntVector;
   import org.apache.arrow.vector.FieldVector;
   
   
   public class MapValuesConsumer {
       private final static BufferAllocator allocator = new RootAllocator();
       private final CDataDictionaryProvider provider;
       private FieldVector vector;
   
       public MapValuesConsumer(CDataDictionaryProvider provider) {
           this.provider = provider;
       }
   
       public MapValuesConsumer() {
           this.provider = null;
       }
   
       public static BufferAllocator getAllocatorForJavaConsumer() {
           return allocator;
       }
   
       public FieldVector getVector() {
           return this.vector;
       }
   
       public void update(long c_array_ptr, long c_schema_ptr) {
           ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
           ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
           this.vector = Data.importVector(allocator, arrow_array, arrow_schema, this.provider);
           this.doWorkInJava(vector);
       }
   
       public void update2(long c_array_ptr, long c_schema_ptr) {
           ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
           ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
           this.vector = Data.importVector(allocator, arrow_array, arrow_schema, null);
           this.doWorkInJava(vector);
       }
   
       private void doWorkInJava(FieldVector vector) {
           System.out.println("Doing work in Java");
           BigIntVector bigIntVector = (BigIntVector)vector;
           bigIntVector.setSafe(0, 2);
       }
   
       public static void main(String[] args) {
           simulateAsAJavaConsumers();
       }
   
       final static BigIntVector intVector =
               new BigIntVector("internal_test", allocator);
   
       public static BigIntVector getIntVectorForJavaConsumers() {
           intVector.allocateNew(3);
           intVector.set(0, 1);
           intVector.set(1, 7);
           intVector.set(2, 93);
           intVector.setValueCount(3);
           return intVector;
       }
   
       public static void simulateAsAJavaConsumers() {
           MapValuesConsumer mvc = new MapValuesConsumer();//FIXME! Use constructor with dictionary provider
           try (
               ArrowArray arrowArray = ArrowArray.allocateNew(allocator);
               ArrowSchema arrowSchema = ArrowSchema.allocateNew(allocator)
           ) {
               //FIXME! Add custom logic to emulate adding a dictionary provider
               Data.exportVector(allocator, getIntVectorForJavaConsumers(), null, arrowArray, arrowSchema);
               mvc.update2(arrowArray.memoryAddress(), arrowSchema.memoryAddress());
               try (FieldVector valueVectors = Data.importVector(allocator, arrowArray, arrowSchema, null);) {
                   System.out.print(valueVectors); //FIXME! Validate on .. testoutput::
               }
           }
           intVector.close(); //FIXME! Expose this method also to be called by end python program
           allocator.close();
       }
   }
   
   ```
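
For readers following the thread, the effect of `setSafe(0, 2)` in `doWorkInJava` can be pictured with a plain-Python model of a dictionary-encoded array. This is only a sketch: `dictionary_encode` and `decode` are hypothetical helpers, not PyArrow or Arrow Java APIs, and the real code mutates an Arrow index buffer rather than a Python list.

```python
def dictionary_encode(values):
    """Split values into (dictionary, indices), keeping first-seen order."""
    dictionary = []
    positions = {}
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def decode(dictionary, indices):
    """Map each index back to its dictionary value."""
    return [dictionary[i] for i in indices]


dictionary, indices = dictionary_encode(["A", "B", "C", "A", "D"])
print(dictionary)  # ['A', 'B', 'C', 'D']
print(indices)     # [0, 1, 2, 0, 3]

# Same effect as bigIntVector.setSafe(0, 2): only the first index changes,
# the dictionary itself is untouched.
indices[0] = 2
print(decode(dictionary, indices))  # ['C', 'B', 'C', 'A', 'D']
```

The indices after the update, `[2, 1, 2, 0, 3]`, match the "Updated Array" output shown in the diff, which is why the dictionary values in that output are unchanged.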





[GitHub] [arrow-cookbook] lidavidm commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1332015411


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data. 
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array

Review Comment:
   In that case document why it is necessary.





[GitHub] [arrow-cookbook] lidavidm commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1331815121


##########
java/source/c_data.rst:
##########
@@ -0,0 +1,12 @@
+.. _c-data-java:
+
+==================
+C Data Integration
+==================
+
+C Data interface is an important aspect of supporting multiple languages in Apache Arrow. 
+A Java programme can seamlessly work with C++ and Python programmes. The following examples

Review Comment:
   I believe we've stuck to American English (sorry)
   
   ```suggestion
   A Java program can seamlessly work with C++ and Python programs. The following examples
   ```



##########
java/source/c_data.rst:
##########
@@ -0,0 +1,12 @@
+.. _c-data-java:
+
+==================
+C Data Integration
+==================
+
+C Data interface is an important aspect of supporting multiple languages in Apache Arrow. 

Review Comment:
   ```suggestion
   ================
   C Data Interface
   ================
   
   The Arrow C Data Interface enables zero-copy sharing of Arrow data between language runtimes. 
   ```
   
   Can you link to the Arrow docs for C Data Interface?
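
The zero-copy idea can be pictured with the standard-library `ctypes` module: one side hands over the integer address of a struct, and the other side builds a view on the same memory. This is a loose analogy under stated assumptions. `FakeArray` is a hypothetical two-field struct, not the real `ArrowArray` layout, and real consumers must also honor the release callback defined by the C Data Interface.

```python
import ctypes


# Simplified stand-in for illustration only; NOT the real ArrowArray layout.
class FakeArray(ctypes.Structure):
    _fields_ = [("length", ctypes.c_int64), ("null_count", ctypes.c_int64)]


# "Producer": allocate the struct and expose its address as a plain integer,
# much like the uintptr_t values passed to Java in the recipe.
producer_view = FakeArray(length=5, null_count=0)
address = ctypes.addressof(producer_view)

# "Consumer": rebuild a view from the integer address; no bytes are copied.
consumer_view = FakeArray.from_address(address)
consumer_view.length = 4

# The producer sees the consumer's write because both views share memory.
print(producer_view.length)
```

Because both views alias the same memory, the producer must keep the struct alive for as long as the consumer uses it; that lifetime question is the part the real interface handles through its release callback.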



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.

Review Comment:
   ```suggestion
   This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
   and finally re-accessed and validated in Python for data consistency.
   ```



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------

Review Comment:
   ```suggestion
   Python Component
   ----------------
   ```



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data. 
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------

Review Comment:
   ```suggestion
   Java Component
   --------------
   ```



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data. 
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. testcode::

Review Comment:
   Format the code here as well.



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data. 
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. testcode::
+
+    import org.apache.arrow.c.ArrowArray;
+    import org.apache.arrow.c.ArrowSchema;
+    import org.apache.arrow.c.Data;
+    import org.apache.arrow.c.CDataDictionaryProvider;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.FieldVector;
+    import org.apache.arrow.vector.BigIntVector;
+    import org.apache.arrow.util.AutoCloseables;
+
+
+    class MapValuesConsumer implements AutoCloseable {
+        private final BufferAllocator allocator;
+        private final CDataDictionaryProvider provider;
+        private FieldVector vector;
+        private final BigIntVector intVector;
+
+
+        public MapValuesConsumer(CDataDictionaryProvider provider, BufferAllocator allocator) {
+            this.provider = provider;
+            this.allocator = allocator;
+            this.intVector = new BigIntVector("internal_test_vector", allocator);
+        }
+
+        public BufferAllocator getAllocatorForJavaConsumer() {
+            return allocator;
+        }
+
+        public FieldVector getVector() {
+            return this.vector;
+        }
+
+        public void update(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            this.vector = Data.importVector(allocator, arrow_array, arrow_schema, this.provider);
+            this.doWorkInJava(vector);
+        }
+
+        public FieldVector updateFromJava(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            this.vector = Data.importVector(allocator, arrow_array, arrow_schema, this.provider);
+            this.doWorkInJava(vector);
+            return vector;
+        }
+
+        private void doWorkInJava(FieldVector vector) {
+            System.out.println("Doing work in Java");
+            BigIntVector bigIntVector = (BigIntVector)vector;
+            bigIntVector.setSafe(0, 2);
+        }
+
+        public BigIntVector getIntVectorForJavaConsumer() {
+            intVector.allocateNew(3);
+            intVector.set(0, 1);
+            intVector.set(1, 7);
+            intVector.set(2, 93);
+            intVector.setValueCount(3);
+            return intVector;
+        }
+
+        @Override
+        public void close() throws Exception {
+            AutoCloseables.close(intVector);
+        }
+    }
+    try (BufferAllocator allocator = new RootAllocator()) {
+        CDataDictionaryProvider provider = new CDataDictionaryProvider();
+        try (final MapValuesConsumer mvc = new MapValuesConsumer(provider, allocator)) {
+            try (
+            ArrowArray arrowArray = ArrowArray.allocateNew(allocator);
+            ArrowSchema arrowSchema = ArrowSchema.allocateNew(allocator)
+            )  {
+                    Data.exportVector(allocator, mvc.getIntVectorForJavaConsumer(), provider, arrowArray, arrowSchema);
+                    FieldVector updatedVector = mvc.updateFromJava(arrowArray.memoryAddress(), arrowSchema.memoryAddress());
+                    try (ArrowArray usedArray = ArrowArray.allocateNew(allocator);
+                        ArrowSchema usedSchema = ArrowSchema.allocateNew(allocator)) {
+                        Data.exportVector(allocator, updatedVector, provider, usedArray, usedSchema);
+                        try(FieldVector valueVectors = Data.importVector(allocator, usedArray, usedSchema, provider)) {
+                            System.out.println(valueVectors);
+                        }
+                    }
+                    updatedVector.close();
+                } catch (Exception ex) {
+                    ex.printStackTrace();
+                }
+        } catch (Exception ex) {
+            ex.printStackTrace();
+        }
+    } catch (Exception ex) {
+        ex.printStackTrace();
+    }
+
+
+.. testoutput::
+
+    Doing work in Java
+    [2, 7, 93]
+
+
+The Java component performs the following actions:
+
+1. Receives data from the Python component.
+2. Updates the data.
+3. Exports the updated data back to Python.
+
+By integrating PyArrow in Python and Java components, this example demonstrates that 
+a system can be created where data is shared and updated across both languages seamlessly.

Review Comment:
   Ditto



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *

Review Comment:
   Avoid wildcard imports.



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)

Review Comment:
   misleading wording/naming
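
One possible reading of this comment (an assumption, since the reviewer does not elaborate): the expected output in the PR shows the dictionary values unchanged and only the first index changing from 0 to 2, so "update the dictionary" describes something the example does not do. The sketch below is a plain-Python model of dictionary encoding, not the Arrow API, and `decode` is a hypothetical helper.

```python
# Plain-Python model of a dictionary-encoded array (not the Arrow API).
dictionary = ["A", "B", "C", "D"]  # distinct values
indices = [0, 1, 2, 0, 3]          # one entry per element, pointing into `dictionary`


def decode(dictionary, indices):
    """Expand the indices back into the values they refer to."""
    return [dictionary[i] for i in indices]


before = decode(dictionary, indices)

# The change the example makes: overwrite the first index, leave the values alone.
indices[0] = 2

after = decode(dictionary, indices)

print(before)      # ['A', 'B', 'C', 'A', 'D']
print(after)       # ['C', 'B', 'C', 'A', 'D']
print(dictionary)  # ['A', 'B', 'C', 'D']
```

Under this reading, wording such as "update the indices" would match what the code does.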



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)

Review Comment:
   ```suggestion
       print("Dictionary Created:", array)
   ```
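
The likely motivation for this suggestion (the reviewer does not state it) is that `print` already separates its arguments with a single space, so a trailing space in the label produces two. A small sketch using only the standard library, with `"[...]"` standing in for the array:

```python
import io

buf = io.StringIO()

# Original form: trailing space in the label plus print's default separator.
print("Dictionary Created: ", "[...]", file=buf)

# Suggested form: the default separator alone supplies the space.
print("Dictionary Created:", "[...]", file=buf)

original, suggested = buf.getvalue().splitlines()

print(repr(original))   # 'Dictionary Created:  [...]'
print(repr(suggested))  # 'Dictionary Created: [...]'
```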



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data. 
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency

Review Comment:
   Move this to the top of the section and re-edit the prose.



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.

Review Comment:
   ```suggestion
   The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
   Data is generated in PyArrow and exported through the C Data Interface to Java.
   ```
   Link `jpype` to its website/GitHub repo.



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python

Review Comment:
   Run your code through a formatter so that it's consistent. (To make this easier, you can use Sphinx directives that include code from files, such as `literalinclude`, instead of inlining the code here.)



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data. 
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)

Review Comment:
   ```suggestion
       print("Updated Array:", updated_array)
   ```



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data. 
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array

Review Comment:
   An explicit `del` should be unnecessary.



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data. 
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.

Review Comment:
   ```suggestion
   In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
   It then updates the data and sends it back to the Python component.
   ```



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.

Review Comment:
   This description is misleading. You cannot mutate data in the C Data Interface.
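
To make the point about immutability concrete, here is a minimal sketch in plain Python. It is not the Arrow API: `encode`, `decode`, and `with_index` are hypothetical helper names, and tuples stand in for exported buffers. It models a dictionary-encoded array as a dictionary plus indices, and shows an "update" expressed as building a new array while the original is left untouched.

```python
from typing import Sequence, Tuple

# Hypothetical plain-Python model of a dictionary-encoded array:
# a tuple of distinct values plus a tuple of indices into it.
# Tuples are immutable, which mirrors how exported data should be treated.
DictArray = Tuple[Tuple[str, ...], Tuple[int, ...]]


def encode(values: Sequence[str]) -> DictArray:
    """Build (dictionary, indices), keeping first-seen order of values."""
    positions = {}
    indices = []
    for value in values:
        # setdefault assigns the next free slot to values not seen before.
        indices.append(positions.setdefault(value, len(positions)))
    return tuple(positions), tuple(indices)


def decode(array: DictArray) -> list:
    """Expand the indices back into the values they refer to."""
    dictionary, indices = array
    return [dictionary[i] for i in indices]


def with_index(array: DictArray, position: int, new_index: int) -> DictArray:
    """Return a new array with one index replaced; the input is untouched."""
    dictionary, indices = array
    if not 0 <= new_index < len(dictionary):
        raise IndexError("new_index must refer to an existing dictionary entry")
    new_indices = indices[:position] + (new_index,) + indices[position + 1:]
    return dictionary, new_indices


original = encode(["A", "B", "C", "A", "D"])
updated = with_index(original, position=0, new_index=2)

print(decode(original))  # the original array still decodes to its input
print(decode(updated))   # the new array reflects the replaced index
```

Under this model the original and the updated array are two distinct values, so a check that they are equal would be expected to fail after a real update.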



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data. 
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.

Review Comment:
   Also, misleading wording





[GitHub] [arrow-cookbook] vibhatha commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1332860457


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data. 
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array

Review Comment:
   Got it. I have one question, since this API is pretty new to me.
   
   What happens here is that we call Java from Python: the Python VM starts first, and from it we start a JVM. We then access the memory from Java and create a Python object from it, so the Python object and the Java object point to the same memory. Is this statement correct?
   
   What could then happen is that Python shuts down its VM and, in the process, tries to shut down the JVM first. The `exportVector` call into Java would end up calling a function named `release_exported`, and this is where we see that warning.
   
   Further, according to a comment in `release_exported` in `jni_wrapper.cc`:
   
   ```c++
   // It is possible for the JVM to be shut down when this is called;
   // guard against that.  Example: Python code using JPype may shut
   // down the JVM before releasing the stream.
   ```
   I believe the warning above could be caused by the attempt to delete global references?
   Please correct me if I am wrong. If there is a better, more accurate explanation, I would appreciate learning about it.
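
For reference, the release contract under discussion can be modelled in plain Python. This is only a sketch under stated assumptions: `ExportedArray`, `export`, and `consume` are hypothetical names, and `None` stands in for a NULL function pointer. It reflects the C Data Interface rule that the producer supplies a release callback, the consumer calls it once when done, and the callback marks the structure as released. It does not model the JVM-shutdown guard quoted above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ExportedArray:
    """Hypothetical stand-in for a C `struct ArrowArray`."""
    values: List[int]
    # In the real struct this is a function pointer; None models NULL.
    release: Optional[Callable[["ExportedArray"], None]] = None


def export(values: List[int], log: List[str]) -> ExportedArray:
    """Producer side: attach a release callback that frees producer state."""
    def _release(exported: ExportedArray) -> None:
        log.append("producer resources freed")
        exported.values = []
        # Marking the struct as released is part of the contract.
        exported.release = None

    return ExportedArray(values=list(values), release=_release)


def consume(exported: ExportedArray) -> int:
    """Consumer side: use the data, then release it exactly once."""
    if exported.release is None:
        raise ValueError("array was already released")
    try:
        return sum(exported.values)
    finally:
        exported.release(exported)


log: List[str] = []
exported = export([1, 2, 3], log)
total = consume(exported)
print(total, exported.release, log)
```

In this model, a second `consume(exported)` raises `ValueError`, because the callback has already cleared `release`. The question in this comment is about what happens when the side that owns the callback has been torn down before that call is made.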
   





[GitHub] [arrow-cookbook] lidavidm commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1332015081


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)

Review Comment:
   It is about both. See above.





[GitHub] [arrow-cookbook] lidavidm commented on pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#issuecomment-1728220472

   Given we have no roundtrip examples at all, I don't see why we are starting with dictionaries vs a simpler type




[GitHub] [arrow-cookbook] vibhatha commented on pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#issuecomment-1717047314

   @danepitkin I am addressing the reviews and thank you for reviewing this PR.




[GitHub] [arrow-cookbook] vibhatha commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1332793635


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.

Review Comment:
   I changed the wording. 
   
   ```txt
   This section demonstrates a data roundtrip where C Data interface is being used to provide
   the seamless access to data across language boundaries.
   ```





[GitHub] [arrow-cookbook] vibhatha commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1323237733


##########
java/source/index.rst:
##########
@@ -43,6 +43,7 @@ This cookbook is tested with Apache Arrow |version|.
    data
    avro
    jdbc
+   python_java

Review Comment:
   @danepitkin 
   Thank you for the quick review and suggestions. I agree with your comment and will make the change.





[GitHub] [arrow-cookbook] vibhatha commented on pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#issuecomment-1727997553

   @lidavidm thanks a lot for your feedback. I will address these. 




[GitHub] [arrow-cookbook] vibhatha commented on pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#issuecomment-1728674698

   > Given we have no roundtrip examples at all, I don't see why we are starting with dictionaries vs a simpler type
   
   Good point, although I am merely picking up an existing issue.




[GitHub] [arrow-cookbook] vibhatha commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1332830590


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Allocate an empty ArrowArray struct for Java to export the updated array into
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Allocate an empty ArrowSchema struct for Java to export the schema into
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # Java and Python access the same memory through the C Data Interface,
+    # so the array returned from Java and the array created in Python hold the same data.
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.

Review Comment:
   Changed the wording. Thanks for catching this.
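
For readers following the quoted diff, the roundtrip output can be reproduced by hand. The sketch below is a pure-Python illustration of dictionary encoding, not PyArrow's or Arrow Java's implementation; the helper names `dictionary_encode` and `decode` are hypothetical. It assumes the Java side does what `doWorkInJava` in the diff shows, namely overwriting index 0 with the value 2 in the shared indices buffer.

```python
def dictionary_encode(values):
    """Split values into (dictionary, indices), keeping first-seen order."""
    dictionary = []   # unique values, in order of first appearance
    positions = {}    # value -> its position in the dictionary
    indices = []      # one dictionary position per input value
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def decode(dictionary, indices):
    """Rebuild the logical values from the dictionary and the indices."""
    return [dictionary[i] for i in indices]


dictionary, indices = dictionary_encode(["A", "B", "C", "A", "D"])
print(dictionary)  # ['A', 'B', 'C', 'D']
print(indices)     # [0, 1, 2, 0, 3]

# Mirror the in-place update from the diff: index 0 now points at entry 2.
# The list is mutated rather than copied, standing in for the shared buffer.
indices[0] = 2
print(indices)                      # [2, 1, 2, 0, 3]
print(decode(dictionary, indices))  # ['C', 'B', 'C', 'A', 'D']
```

Only the indices change; the dictionary values stay the same, which matches the "Updated Array" output quoted in the diff.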





[GitHub] [arrow-cookbook] vibhatha commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1323876445


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,200 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through the C Data Interface to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Allocate an empty ArrowArray struct for Java to export the updated array into
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Allocate an empty ArrowSchema struct for Java to export the schema into
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # Java and Python access the same memory through the C Data Interface,
+    # so the array returned from Java and the array created in Python hold the same data.
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. code-block:: java
+
+    import org.apache.arrow.c.ArrowArray;
+    import org.apache.arrow.c.ArrowSchema;
+    import org.apache.arrow.c.Data;
+    import org.apache.arrow.c.CDataDictionaryProvider;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.FieldVector;
+    import org.apache.arrow.vector.BigIntVector;
+
+
+    public class MapValuesConsumer {
+        private final static BufferAllocator allocator = new RootAllocator();
+        private final CDataDictionaryProvider provider;
+        private FieldVector vector;
+
+        public MapValuesConsumer(CDataDictionaryProvider provider) {
+            this.provider = provider;
+        }
+
+        public static BufferAllocator getAllocatorForJavaConsumer() {
+            return allocator;
+        }
+
+        public FieldVector getVector() {
+            return this.vector;
+        }
+
+        public void update(long c_array_ptr, long c_schema_ptr) {

Review Comment:
   @davisusanibar this code doesn't seem to be working, but I get your idea. 





[GitHub] [arrow-cookbook] davisusanibar commented on pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#issuecomment-1707536667

   Hi @vibhatha, I reviewed the error logs: they report that arrow-memory-netty is not available in the nightly packages, and that is true, as you can see at https://nightlies.apache.org/arrow/java/org/apache/arrow/arrow-memory-netty/. That is unusual and needs to be looked into.
   
   Alternatively, you could start using the 14.0.0-SNAPSHOT version which can be configured in the arrow-cookbook/java/source/conf.py file as follows:
   ```
   if arrow_nightly and arrow_nightly != '0':
       version = "14.0.0-SNAPSHOT"
   else:
       version = "13.0.0"
   ```
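
To make the branch above easier to check in isolation, here is a minimal sketch of the same selection logic as a function. It assumes the `arrow_nightly` flag is read from an environment variable named `ARROW_NIGHTLY`; that name and the helper `resolve_arrow_version` are assumptions for illustration, not something stated in this thread.

```python
import os


def resolve_arrow_version(env=None):
    """Return the Arrow version string chosen by the quoted conf.py branch."""
    env = os.environ if env is None else env
    # Assumption: the flag comes from an environment variable called ARROW_NIGHTLY.
    arrow_nightly = env.get("ARROW_NIGHTLY")
    # Any non-empty value other than '0' selects the nightly snapshot.
    if arrow_nightly and arrow_nightly != '0':
        return "14.0.0-SNAPSHOT"
    return "13.0.0"


print(resolve_arrow_version({"ARROW_NIGHTLY": "1"}))  # 14.0.0-SNAPSHOT
print(resolve_arrow_version({"ARROW_NIGHTLY": "0"}))  # 13.0.0
print(resolve_arrow_version({}))                      # 13.0.0
```

An unset or empty flag falls through to the release version, so the snapshot is only used when nightly builds are requested explicitly.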




[GitHub] [arrow-cookbook] vibhatha commented on pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#issuecomment-1707578491

   @davisusanibar sounds good and I will rebase once this PR is merged. Thank you :+1: 




[GitHub] [arrow-cookbook] danepitkin commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1325063779


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,254 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Allocate an empty ArrowArray struct for Java to export the updated array into
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Allocate an empty ArrowSchema struct for Java to export the schema into
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # Java and Python access the same memory through the C Data Interface,
+    # so the array returned from Java and the array created in Python hold the same data.
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. testcode::
+
+    import org.apache.arrow.c.ArrowArray;
+    import org.apache.arrow.c.ArrowSchema;
+    import org.apache.arrow.c.Data;
+    import org.apache.arrow.c.CDataDictionaryProvider;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.FieldVector;
+    import org.apache.arrow.vector.BigIntVector;
+
+
+    public class MapValuesConsumer {
+        private final static BufferAllocator allocator = new RootAllocator();
+        private final CDataDictionaryProvider provider;
+        private FieldVector vector;
+        private final static BigIntVector intVector = new BigIntVector("internal_test_vector", allocator);
+
+
+        public MapValuesConsumer(CDataDictionaryProvider provider) {
+            this.provider = provider;
+        }
+
+        public static BufferAllocator getAllocatorForJavaConsumer() {
+            return allocator;
+        }
+
+        public FieldVector getVector() {
+            return this.vector;
+        }
+
+        public void update(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            this.vector = Data.importVector(allocator, arrow_array, arrow_schema, this.provider);
+            this.doWorkInJava(vector);
+        }
+
+        public FieldVector updateFromJava(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            vector = Data.importVector(allocator, arrow_array, arrow_schema, null);
+            this.doWorkInJava(vector);
+            return vector;
+        }
+
+        private void doWorkInJava(FieldVector vector) {
+            System.out.println("Doing work in Java");
+            BigIntVector bigIntVector = (BigIntVector)vector;
+            bigIntVector.setSafe(0, 2);
+        }
+
+        private static BigIntVector getIntVectorForJavaConsumers() {
+            intVector.allocateNew(3);
+            intVector.set(0, 1);
+            intVector.set(1, 7);
+            intVector.set(2, 93);
+            intVector.setValueCount(3);
+            return intVector;
+        }
+
+        public static void simulateAsAJavaConsumers() {

Review Comment:
   ```suggestion
           public static void simulateJavaConsumer() {
   ```



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,254 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Allocate an empty ArrowArray struct for Java to export the updated array into
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Allocate an empty ArrowSchema struct for Java to export the schema into
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # Java and Python access the same memory through the C Data Interface,
+    # so the array returned from Java and the array created in Python hold the same data.
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. testcode::
+
+    import org.apache.arrow.c.ArrowArray;
+    import org.apache.arrow.c.ArrowSchema;
+    import org.apache.arrow.c.Data;
+    import org.apache.arrow.c.CDataDictionaryProvider;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.FieldVector;
+    import org.apache.arrow.vector.BigIntVector;
+
+
+    public class MapValuesConsumer {
+        private final static BufferAllocator allocator = new RootAllocator();
+        private final CDataDictionaryProvider provider;
+        private FieldVector vector;
+        private final static BigIntVector intVector = new BigIntVector("internal_test_vector", allocator);
+
+
+        public MapValuesConsumer(CDataDictionaryProvider provider) {
+            this.provider = provider;
+        }
+
+        public static BufferAllocator getAllocatorForJavaConsumer() {
+            return allocator;
+        }
+
+        public FieldVector getVector() {
+            return this.vector;
+        }
+
+        public void update(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            this.vector = Data.importVector(allocator, arrow_array, arrow_schema, this.provider);
+            this.doWorkInJava(vector);
+        }
+
+        public FieldVector updateFromJava(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            vector = Data.importVector(allocator, arrow_array, arrow_schema, null);
+            this.doWorkInJava(vector);
+            return vector;
+        }
+
+        private void doWorkInJava(FieldVector vector) {
+            System.out.println("Doing work in Java");
+            BigIntVector bigIntVector = (BigIntVector)vector;
+            bigIntVector.setSafe(0, 2);
+        }
+
+        private static BigIntVector getIntVectorForJavaConsumers() {
+            intVector.allocateNew(3);
+            intVector.set(0, 1);
+            intVector.set(1, 7);
+            intVector.set(2, 93);
+            intVector.setValueCount(3);
+            return intVector;
+        }
+
+        public static void simulateAsAJavaConsumers() {
+            CDataDictionaryProvider provider = new CDataDictionaryProvider();
+            MapValuesConsumer mvc = new MapValuesConsumer(provider);
+            try (
+                ArrowArray arrowArray = ArrowArray.allocateNew(allocator);
+                ArrowSchema arrowSchema = ArrowSchema.allocateNew(allocator)
+            ) {
+                Data.exportVector(allocator, getIntVectorForJavaConsumers(), provider, arrowArray, arrowSchema);
+                FieldVector updatedVector = mvc.updateFromJava(arrowArray.memoryAddress(), arrowSchema.memoryAddress());
+                try (ArrowArray usedArray = ArrowArray.allocateNew(allocator);
+                    ArrowSchema usedSchema = ArrowSchema.allocateNew(allocator)) {
+                    Data.exportVector(allocator, updatedVector, provider, usedArray, usedSchema);
+                    try(FieldVector valueVectors = Data.importVector(allocator, usedArray, usedSchema, provider)) {
+                        System.out.println(valueVectors);
+                    }
+                }
+            }
+        }
+
+        public static void close() {
+            intVector.close();
+        }
+
+        public static void main(String[] args) {
+            simulateAsAJavaConsumers();

Review Comment:
   ```suggestion
               simulateJavaConsumer();
   ```



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,254 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Allocate an empty ArrowArray struct for Java to export the updated array into
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Allocate an empty ArrowSchema struct for Java to export the schema into
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # Java and Python access the same memory through the C Data Interface,
+    # so the array returned from Java and the array created in Python hold the same data.
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. testcode::
+
+    import org.apache.arrow.c.ArrowArray;
+    import org.apache.arrow.c.ArrowSchema;
+    import org.apache.arrow.c.Data;
+    import org.apache.arrow.c.CDataDictionaryProvider;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.FieldVector;
+    import org.apache.arrow.vector.BigIntVector;
+
+
+    public class MapValuesConsumer {
+        private final static BufferAllocator allocator = new RootAllocator();
+        private final CDataDictionaryProvider provider;
+        private FieldVector vector;
+        private final static BigIntVector intVector = new BigIntVector("internal_test_vector", allocator);
+
+
+        public MapValuesConsumer(CDataDictionaryProvider provider) {
+            this.provider = provider;
+        }
+
+        public static BufferAllocator getAllocatorForJavaConsumer() {
+            return allocator;
+        }
+
+        public FieldVector getVector() {
+            return this.vector;
+        }
+
+        public void update(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            this.vector = Data.importVector(allocator, arrow_array, arrow_schema, this.provider);
+            this.doWorkInJava(vector);
+        }
+
+        public FieldVector updateFromJava(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            vector = Data.importVector(allocator, arrow_array, arrow_schema, null);
+            this.doWorkInJava(vector);
+            return vector;
+        }
+
+        private void doWorkInJava(FieldVector vector) {
+            System.out.println("Doing work in Java");
+            BigIntVector bigIntVector = (BigIntVector)vector;
+            bigIntVector.setSafe(0, 2);
+        }
+
+        private static BigIntVector getIntVectorForJavaConsumers() {

Review Comment:
   ```suggestion
           private static BigIntVector getIntVectorForJavaConsumer() {
   ```



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,254 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Allocate an empty C Data array struct for Java to export into
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Allocate an empty C Data schema struct for Java to export into
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # Java and Python access the same memory through the C Data interface,
+    # so the array returned from Java should hold the same data as the array created in Python.
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. testcode::
+
+    import org.apache.arrow.c.ArrowArray;
+    import org.apache.arrow.c.ArrowSchema;
+    import org.apache.arrow.c.Data;
+    import org.apache.arrow.c.CDataDictionaryProvider;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.FieldVector;
+    import org.apache.arrow.vector.BigIntVector;
+
+
+    public class MapValuesConsumer {
+        private final static BufferAllocator allocator = new RootAllocator();
+        private final CDataDictionaryProvider provider;
+        private FieldVector vector;
+        private final static BigIntVector intVector = new BigIntVector("internal_test_vector", allocator);
+
+
+        public MapValuesConsumer(CDataDictionaryProvider provider) {
+            this.provider = provider;
+        }
+
+        public static BufferAllocator getAllocatorForJavaConsumer() {
+            return allocator;
+        }
+
+        public FieldVector getVector() {
+            return this.vector;
+        }
+
+        public void update(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            this.vector = Data.importVector(allocator, arrow_array, arrow_schema, this.provider);
+            this.doWorkInJava(vector);
+        }
+
+        public FieldVector updateFromJava(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            vector = Data.importVector(allocator, arrow_array, arrow_schema, null);
+            this.doWorkInJava(vector);
+            return vector;
+        }
+
+        private void doWorkInJava(FieldVector vector) {
+            System.out.println("Doing work in Java");
+            BigIntVector bigIntVector = (BigIntVector)vector;
+            bigIntVector.setSafe(0, 2);
+        }
+
+        private static BigIntVector getIntVectorForJavaConsumers() {
+            intVector.allocateNew(3);
+            intVector.set(0, 1);
+            intVector.set(1, 7);
+            intVector.set(2, 93);
+            intVector.setValueCount(3);
+            return intVector;
+        }
+
+        public static void simulateAsAJavaConsumers() {
+            CDataDictionaryProvider provider = new CDataDictionaryProvider();
+            MapValueConsumerV2 mvc = new MapValueConsumerV2(provider);//FIXME! Use constructor with dictionary provider
+            try (
+                ArrowArray arrowArray = ArrowArray.allocateNew(allocator);
+                ArrowSchema arrowSchema = ArrowSchema.allocateNew(allocator)
+            ) {
+                Data.exportVector(allocator, getIntVectorForJavaConsumers(), provider, arrowArray, arrowSchema);

Review Comment:
   ```suggestion
                   Data.exportVector(allocator, getIntVectorForJavaConsumer(), provider, arrowArray, arrowSchema);
   ```



##########
java/source/python_java.rst:
##########
@@ -0,0 +1,254 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Allocate an empty C Data array struct for Java to export into
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Allocate an empty C Data schema struct for Java to export into
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # Java and Python access the same memory through the C Data interface,
+    # so the array returned from Java should hold the same data as the array created in Python.
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. testcode::
+
+    import org.apache.arrow.c.ArrowArray;
+    import org.apache.arrow.c.ArrowSchema;
+    import org.apache.arrow.c.Data;
+    import org.apache.arrow.c.CDataDictionaryProvider;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.FieldVector;
+    import org.apache.arrow.vector.BigIntVector;
+
+
+    public class MapValuesConsumer {
+        private final static BufferAllocator allocator = new RootAllocator();
+        private final CDataDictionaryProvider provider;
+        private FieldVector vector;
+        private final static BigIntVector intVector = new BigIntVector("internal_test_vector", allocator);
+
+
+        public MapValuesConsumer(CDataDictionaryProvider provider) {
+            this.provider = provider;
+        }
+
+        public static BufferAllocator getAllocatorForJavaConsumer() {
+            return allocator;
+        }
+
+        public FieldVector getVector() {
+            return this.vector;
+        }
+
+        public void update(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            this.vector = Data.importVector(allocator, arrow_array, arrow_schema, this.provider);
+            this.doWorkInJava(vector);
+        }
+
+        public FieldVector updateFromJava(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            vector = Data.importVector(allocator, arrow_array, arrow_schema, null);
+            this.doWorkInJava(vector);
+            return vector;
+        }
+
+        private void doWorkInJava(FieldVector vector) {
+            System.out.println("Doing work in Java");
+            BigIntVector bigIntVector = (BigIntVector)vector;
+            bigIntVector.setSafe(0, 2);
+        }
+
+        private static BigIntVector getIntVectorForJavaConsumers() {
+            intVector.allocateNew(3);
+            intVector.set(0, 1);
+            intVector.set(1, 7);
+            intVector.set(2, 93);
+            intVector.setValueCount(3);
+            return intVector;
+        }
+
+        public static void simulateAsAJavaConsumers() {
+            CDataDictionaryProvider provider = new CDataDictionaryProvider();
+            MapValueConsumerV2 mvc = new MapValueConsumerV2(provider);//FIXME! Use constructor with dictionary provider

Review Comment:
   Is the `//FIXME` still needed?





[GitHub] [arrow-cookbook] vibhatha commented on pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#issuecomment-1714294814

   @danepitkin @davisusanibar This PR is ready for review, please take a look. 




[GitHub] [arrow-cookbook] vibhatha commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1331960670


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)

Review Comment:
   I assume this refers to the code comment. How about:
   ```python
   # update values in Java
   ```
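
For context, here is a minimal sketch in plain Python (no PyArrow or JPype) of why "update values" seems the more accurate wording. The helpers `dictionary_encode` and `decode` are hypothetical names used only for illustration, not Arrow APIs. The sketch assumes the Java side does the equivalent of `setSafe(0, 2)` on the index vector, which rewrites an index and leaves the dictionary itself untouched.

```python
def dictionary_encode(values):
    """Split values into (dictionary, indices), keeping first-seen order."""
    dictionary = []   # unique values, in order of first appearance
    positions = {}    # value -> its position in `dictionary`
    indices = []      # one dictionary position per input value
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def decode(dictionary, indices):
    """Rebuild the logical values from the dictionary and the indices."""
    return [dictionary[i] for i in indices]


dictionary, indices = dictionary_encode(["A", "B", "C", "A", "D"])

# Stand-in for the Java-side update: overwrite the first index with 2.
indices[0] = 2

decoded = decode(dictionary, indices)
print(dictionary)
print(indices)
print(decoded)
```

Only the first entry of `indices` changes here; `dictionary` stays as it was, and the first decoded value simply points at a different dictionary entry.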





[GitHub] [arrow-cookbook] vibhatha commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1332781091


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)

Review Comment:
   Got it. 





[GitHub] [arrow-cookbook] vibhatha commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1325705717


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,254 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Allocate an empty C Data array struct for Java to export into
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Allocate an empty C Data schema struct for Java to export into
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # Java and Python access the same memory through the C Data interface,
+    # so the array returned from Java should hold the same data as the array created in Python.
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. testcode::
+
+    import org.apache.arrow.c.ArrowArray;
+    import org.apache.arrow.c.ArrowSchema;
+    import org.apache.arrow.c.Data;
+    import org.apache.arrow.c.CDataDictionaryProvider;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.FieldVector;
+    import org.apache.arrow.vector.BigIntVector;
+
+
+    public class MapValuesConsumer {
+        private final static BufferAllocator allocator = new RootAllocator();
+        private final CDataDictionaryProvider provider;
+        private FieldVector vector;
+        private final static BigIntVector intVector = new BigIntVector("internal_test_vector", allocator);
+
+
+        public MapValuesConsumer(CDataDictionaryProvider provider) {
+            this.provider = provider;
+        }
+
+        public static BufferAllocator getAllocatorForJavaConsumer() {
+            return allocator;
+        }
+
+        public FieldVector getVector() {
+            return this.vector;
+        }
+
+        public void update(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            this.vector = Data.importVector(allocator, arrow_array, arrow_schema, this.provider);
+            this.doWorkInJava(vector);
+        }
+
+        public FieldVector updateFromJava(long c_array_ptr, long c_schema_ptr) {
+            ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
+            ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
+            vector = Data.importVector(allocator, arrow_array, arrow_schema, null);
+            this.doWorkInJava(vector);
+            return vector;
+        }
+
+        private void doWorkInJava(FieldVector vector) {
+            System.out.println("Doing work in Java");
+            BigIntVector bigIntVector = (BigIntVector)vector;
+            bigIntVector.setSafe(0, 2);
+        }
+
+        private static BigIntVector getIntVectorForJavaConsumers() {
+            intVector.allocateNew(3);
+            intVector.set(0, 1);
+            intVector.set(1, 7);
+            intVector.set(2, 93);
+            intVector.setValueCount(3);
+            return intVector;
+        }
+
+        public static void simulateAsAJavaConsumers() {
+            CDataDictionaryProvider provider = new CDataDictionaryProvider();
+            MapValueConsumerV2 mvc = new MapValueConsumerV2(provider);//FIXME! Use constructor with dictionary provider

Review Comment:
   It was an intermediate commit to test an idea :) 





[GitHub] [arrow-cookbook] davisusanibar commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1323382429


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,200 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through the C Data Interface to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # Create the CDataDictionaryProvider instance, which Java uses
+    # to hold the dictionaries of the imported dictionary array
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Allocate an empty ArrowArray struct for Java to export the updated vector into
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Allocate an empty ArrowSchema struct for Java to export the matching schema into
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # Java and Python access the same memory through the C Data interface,
+    # so the array returned from Java and the array created in Python should hold the same data.
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. code-block:: java

Review Comment:
   The Java component is tested with custom [directives](https://github.com/davisusanibar/arrow-cookbook/blob/d4e3b349f37a2b8ff01547b57fd99a523c1bb684/java/ext/javadoctest.py#L104:L105) built on `testcode` and `testoutput`.
   
   Internally, the Java code is executed by JShell, and whatever the `testcode` block prints with `System.out.print` is compared with the value given in the `testoutput` block, for example:
   
   ```
   .. testcode::
   
       System.out.print("testme");
   
   .. testoutput::
   
       testme
   ```
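   The comparison step described above can be sketched roughly as follows. This is only a Python analogy under stated assumptions: the real directive runs Java through JShell, and `run_and_compare` is a hypothetical name, not a function from `javadoctest.py`.

```python
import contextlib
import io


def run_and_compare(snippet, expected_output):
    """Hypothetical helper: run a snippet, capture stdout, compare to expected.

    Returns a (matches, actual_output) tuple.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(snippet, {})  # stand-in for handing the testcode block to JShell
    actual = buffer.getvalue()
    # Surrounding whitespace is ignored here; the real directive may differ.
    return actual.strip() == expected_output.strip(), actual


ok, actual = run_and_compare('print("testme", end="")', "testme")
mismatch, _ = run_and_compare('print("something else")', "testme")
print(ok, mismatch)
```

   The point is only that the test passes when the printed text equals the `testoutput` body, so anything the example prints must be deterministic.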





[GitHub] [arrow-cookbook] vibhatha commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1323812447


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,200 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through the C Data Interface to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # Create the CDataDictionaryProvider instance, which Java uses
+    # to hold the dictionaries of the imported dictionary array
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Allocate an empty ArrowArray struct for Java to export the updated vector into
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Allocate an empty ArrowSchema struct for Java to export the matching schema into
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # Java and Python access the same memory through the C Data interface,
+    # so the array returned from Java and the array created in Python should hold the same data.
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array
+
+.. code-block:: shell
+
+    From Python
+    Dictionary Created:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        0,
+        1,
+        2,
+        0,
+        3
+    ]
+    Doing work in Java
+    From Java back to Python
+    Updated Array:
+    -- dictionary:
+    [
+        "A",
+        "B",
+        "C",
+        "D"
+    ]
+    -- indices:
+    [
+        2,
+        1,
+        2,
+        0,
+        3
+    ]
+
+In the Python component, the following steps are executed to demonstrate the data roundtrip:
+
+1. Create data in Python 
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
+
+
+Java Component:
+---------------
+
+    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. 
+    It then updates the data and sends it back to the Python component.
+
+.. code-block:: java
+
+    import org.apache.arrow.c.ArrowArray;
+    import org.apache.arrow.c.ArrowSchema;
+    import org.apache.arrow.c.Data;
+    import org.apache.arrow.c.CDataDictionaryProvider;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.FieldVector;
+    import org.apache.arrow.vector.BigIntVector;
+
+
+    public class MapValuesConsumer {
+        private final static BufferAllocator allocator = new RootAllocator();
+        private final CDataDictionaryProvider provider;
+        private FieldVector vector;
+
+        public MapValuesConsumer(CDataDictionaryProvider provider) {
+            this.provider = provider;
+        }
+
+        public static BufferAllocator getAllocatorForJavaConsumer() {
+            return allocator;
+        }
+
+        public FieldVector getVector() {
+            return this.vector;
+        }
+
+        public void update(long c_array_ptr, long c_schema_ptr) {

Review Comment:
   Thank you for the insight, @davisusanibar. I will work on this. 





[GitHub] [arrow-cookbook] vibhatha commented on pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#issuecomment-1718668273

   @danepitkin I am working on updating this PR for Java doctests. I will address the reviews as well. 




[GitHub] [arrow-cookbook] vibhatha commented on a diff in pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1331962637


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # Create the CDataDictionaryProvider instance, which Java uses
+    # to hold the dictionaries of the imported dictionary array
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Allocate an empty ArrowArray struct for Java to export the updated vector into
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Allocate an empty ArrowSchema struct for Java to export the matching schema into
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # Java and Python access the same memory through the C Data interface,
+    # so the array returned from Java and the array created in Python should hold the same data.
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array

Review Comment:
   I get the following warning when I remove that line (I added it for this reason, but I may be missing something on the Java end).
   
   ```bash
   WARNING: Failed to release Java C Data resource: Failed to attach the current thread to a Java VM
   ```
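   One possible explanation, which is an assumption and not confirmed in this thread: the imported array's release callback calls back into Java, so if it only runs at interpreter exit, the JVM may already be unavailable. The toy model below illustrates that ordering problem in plain Python. `FakeRuntime`, `ImportedResource`, and `scenario` are hypothetical stand-ins and do not use jpype or pyarrow.

```python
class FakeRuntime:
    """Stands in for the JVM; purely illustrative."""

    def __init__(self):
        self.alive = True

    def shutdown(self):
        self.alive = False


class ImportedResource:
    """Stands in for an array whose release callback needs the runtime."""

    def __init__(self, runtime, log):
        self._runtime = runtime
        self._log = log
        self._released = False

    def release(self):
        if self._released:
            return  # releasing twice is a no-op
        self._released = True
        if self._runtime.alive:
            self._log.append("released cleanly")
        else:
            self._log.append("WARNING: failed to release, runtime is gone")


def scenario(release_early):
    runtime = FakeRuntime()
    log = []
    resource = ImportedResource(runtime, log)
    if release_early:
        resource.release()  # analogous to `del updated_array`
    runtime.shutdown()      # runtime goes away at program exit
    resource.release()      # late cleanup, after the runtime is gone
    return log


print(scenario(release_early=True))
print(scenario(release_early=False))
```

   If this assumption holds, releasing the array explicitly before the JVM goes away is the expected pattern, which would be consistent with `del updated_array` silencing the warning.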





[GitHub] [arrow-cookbook] davisusanibar commented on pull request #327: [Java] How dictionaries work - roundtrip Java-Python

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#issuecomment-1728290171

   > Given we have no roundtrip examples at all, I don't see why we are starting with dictionaries vs a simpler type
   
   Could https://github.com/apache/arrow-cookbook/pull/325 be considered a simpler type?

