You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/16 15:15:37 UTC

[GitHub] [arrow-cookbook] davisusanibar opened a new pull request #113: [Java]: WIP Java cookbook recipes

davisusanibar opened a new pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113


   1. Initial java cookboom recipes for:
   - Reading and Writing Data
   - Creating Arrow Objects
   - Working with Schema
   - Data Manipulation
   
   2. Pending task:
   - Define a way how to validate java recipe documentation. Planning to use [java-sphinx](https://github.com/bronto/javasphinx) but it is out off scope. Probably implement as a java unit test and test sourcecode before the documentation creation but this validate the source not the code in the documentation
   
   - Other pending task is to review github workflow and align java recipe on that
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781628920



##########
File path: java/source/usecase.rst
##########
@@ -0,0 +1,277 @@
+========
+Use Case

Review comment:
       Changed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#issuecomment-1016516045


   Hi @lidavidm / @amol- I am going to close this PR
   
   There are good points need to be consider and I decided to send PR by recipe to we could map feedback in a better way


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] lidavidm commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781195350



##########
File path: Makefile
##########
@@ -1,7 +1,7 @@
 all: html
 
 
-html: py r
+html: py r j

Review comment:
       Ah, really? Hmm. (Oh, it might be getting confused if java is both a directory and a target…in that case, I think `j` is OK as a target? So long as paths use `java`.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781622409



##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.

Review comment:
       Changed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] amol- commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781362659



##########
File path: Makefile
##########
@@ -1,7 +1,7 @@
 all: html
 
 
-html: py r
+html: py r j

Review comment:
       You need too mark the target as `PHONY` to avoid having Make confuse it with a file or directory.
   See https://makefiletutorial.com/#phony

##########
File path: Makefile
##########
@@ -1,7 +1,7 @@
 all: html
 
 
-html: py r
+html: py r j

Review comment:
       You need to mark the target as `PHONY` to avoid having Make confuse it with a file or directory.
   See https://makefiletutorial.com/#phony




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781194359



##########
File path: Makefile
##########
@@ -1,7 +1,7 @@
 all: html
 
 
-html: py r
+html: py r j

Review comment:
       When change to java I see ouput: "make: `java' is up to date.", changed to javas




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782129280



##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());
+   writer.start();
+   writer.writeBatch();
+   writer.end();
+
+Write - random access to buffer
+-------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+   import java.nio.channels.Channels;
+
+   // write - random access to buffer
+   ByteArrayOutputStream out = new ByteArrayOutputStream();
+   ArrowFileWriter writerBuffer = new ArrowFileWriter(vectorSchemaRoot, null, Channels.newChannel(out));
+   writerBuffer.start();
+   writerBuffer.writeBatch();
+   writerBuffer.end();
+
+
+Writing arrays with the IPC streamed format
+*******************************************
+
+Write - Streaming to file
+-------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // streaming format
+   // write - streaming to file
+   File fileStream = new File("streaming.arrow");
+   FileOutputStream fileOutputStreamforStream = new FileOutputStream(fileStream);
+   ArrowStreamWriter writerStream = new ArrowStreamWriter(vectorSchemaRoot, null, fileOutputStreamforStream);
+   writerStream.start();
+   writerStream.writeBatch();
+   writerStream.end();
+
+Write - Streaming to buffer
+---------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // write - streaming to buffer
+   ByteArrayOutputStream outBuffer = new ByteArrayOutputStream();
+   ArrowStreamWriter writerStreamBuffer = new ArrowStreamWriter(vectorSchemaRoot, null, outBuffer);
+   writerStreamBuffer.start();
+   writerStreamBuffer.writeBatch();
+   writerStreamBuffer.end();
+
+Read array
+==========
+
+Arrow vectors that have been written to disk in the Arrow IPC
+format can be memory mapped back directly from the disk. There 
+are two option: Random access format & Streaming format

Review comment:
       Deleted

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.

Review comment:
       Changed

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right); // equal or unequal
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1); visitor.equals(left2);
+
+   true
+   false
+
+Compare values on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 15-17
+
+   import org.apache.arrow.algorithm.sort.StableVectorComparator;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.VarCharVector;
+
+   // compare two values at the given indices in the vectors.
+   // comparing org.apache.arrow.algorithm.sort.VectorValueComparator on algorithm
+   VarCharVector vec = new VarCharVector("valueindexcomparator", rootAllocator);
+   vec.allocateNew(100, 5);
+   vec.setValueCount(10);
+   vec.set(0, "ba".getBytes());
+   vec.set(1, "abc".getBytes());
+   vec.set(2, "aa".getBytes());
+   vec.set(3, "abc".getBytes());
+   vec.set(4, "a".getBytes());
+   VectorValueComparator<VarCharVector> comparatorValues = new TestVarCharSorter(); // less than, equal to, greater than
+   VectorValueComparator<VarCharVector> stableComparator = new StableVectorComparator<>(comparatorValues);//Stable comparator only supports comparing values from the same vector
+   stableComparator.attachVector(vec);
+
+Comparing two values at the given indices in the vectors:
+
+.. code-block:: java
+   :emphasize-lines: 1-8
+
+   jshell> stableComparator.compare(0, 1) > 0; stableComparator.compare(1, 2) < 0; stableComparator.compare(2, 3) < 0; stableComparator.compare(1, 3) < 0; stableComparator.compare(3, 1) > 0; stableComparator.compare(3, 3) == 0;
+
+   true
+   true
+   true
+   true
+   true
+   true
+
+Search values on the array
+==========================
+
+Linear search - O(n)
+********************
+
+Algorithm: org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+
+.. code-block:: java
+   :emphasize-lines: 27
+
+   import org.apache.arrow.algorithm.search.VectorSearcher;
+   import org.apache.arrow.algorithm.sort.DefaultVectorComparators;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.IntVector;
+
+   // search values on the array
+   // linear search org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+   IntVector rawVector = new IntVector("", rootAllocator);
+   IntVector negVector = new IntVector("", rootAllocator);
+   rawVector.allocateNew(10);
+   rawVector.setValueCount(10);
+   negVector.allocateNew(1);
+   negVector.setValueCount(1);
+   for (int i = 0; i < 10; i++) { // prepare data in sorted order
+    if (i == 0) {

Review comment:
       Changed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] amol- commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r783202899



##########
File path: Makefile
##########
@@ -13,6 +13,7 @@ help:
 	@echo "make test        Test cookbook for all platforms."
 	@echo "make py          Build the Cookbook for Python only."
 	@echo "make r           Build the Cookbook for R only."
+	@echo "make java        Build the Cookbook for Java only."

Review comment:
       FYI, I think that the original idea was to have the code in `.java` files that we could run to verify the examples and then use `:literalinclude:` to embed the function that should go into the recipe.
   
   I don't know if `jshell` opened for use additional possibilities, but we need an answer on how the testing part should be done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] amol- commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r783202899



##########
File path: Makefile
##########
@@ -13,6 +13,7 @@ help:
 	@echo "make test        Test cookbook for all platforms."
 	@echo "make py          Build the Cookbook for Python only."
 	@echo "make r           Build the Cookbook for R only."
+	@echo "make java        Build the Cookbook for Java only."

Review comment:
       FYI, I think that the original idea was to have the code in `.java` files that we could run to verify the examples and then use `:literalinclude:` to embed the function that should go into the recipe.
   
   I don't know if `jshell` opened for us additional possibilities, but we need an answer on how the testing part should be done. @lidavidm @davisusanibar




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782127914



##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.
+
+.. code-block:: java
+   :emphasize-lines: 5
+
+   import org.apache.arrow.vector.types.pojo.Schema;
+   import static java.util.Arrays.asList;
+
+   // create a definition
+   Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> schemaPerson
+
+   schemaPerson ==> Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
+
+Populate data
+=============
+
+.. code-block:: java
+   :emphasize-lines: 3,12-15
+
+   import org.apache.arrow.vector.*;
+
+   VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+   // getting field vectors
+   VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+   VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+   IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+   ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+   // add values to the field vectors
+   setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+   setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+   setVector(ageVectorOption1, 10,20,30);
+   setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+
+   vectorSchemaRoot.setRowCount(3);
+
+Render data & metadata:
+
+.. code-block:: java
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString());
+
+   name    document    age  points
+   david   A            10  [1,3,5,7,9]
+   gladis  B            20  [2,4,6,8,10]
+   juan    C            30  [1,2,3,5,8]
+
+   jshell> System.out.println(documentVectorOption1.getField().getMetadata());
+
+   {A=Id card, B=Passport, C=Visa}
+
+Create the schema from json
+===========================
+
+For this json definition:

Review comment:
       The only purpose is to show how a schema could be created directly from a json definition instead of use Field / FieldType




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782252259



##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);

Review comment:
       Added a link to java reference and arrow documentation




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] amol- commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r783202899



##########
File path: Makefile
##########
@@ -13,6 +13,7 @@ help:
 	@echo "make test        Test cookbook for all platforms."
 	@echo "make py          Build the Cookbook for Python only."
 	@echo "make r           Build the Cookbook for R only."
+	@echo "make java        Build the Cookbook for Java only."

Review comment:
       FYI, I think that the original idea was to have the code in `.java` files that we could run to verify the examples and then use `:literalinclude:` to embed the function that should go into the recipe.
   
   I don't know if `jshell` opened for us additional possibilities, but we need an answer on how the testing part should be done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] lidavidm commented on pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#issuecomment-1005708795


   Ah, I submitted early, sorry. I'm still going through this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar closed pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar closed pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781622276



##########
File path: java/source/conf.py
##########
@@ -0,0 +1,55 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+
+
+# -- Project information -----------------------------------------------------
+
+project = 'java-cookbook'
+copyright = '2021, apache arrow'
+author = 'apache arrow'
+
+# The full version, including alpha/beta/rc tags
+release = 'arrow cookbook'

Review comment:
       Changed

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects

Review comment:
       Changed

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.

Review comment:
       Changed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781605327



##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array

Review comment:
       > Should we try to be consistent about calling it a "vector" instead of an "array"?
   
   For this java cookbook we are decided to reuse "array" word that is used on python instead of mention "vector" that is only used on java 
   
   Array is more common word than Vector because at the end both terms are working close same
   
   How do you see that?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] amol- commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r783120749



##########
File path: Makefile
##########
@@ -13,6 +13,7 @@ help:
 	@echo "make test        Test cookbook for all platforms."
 	@echo "make py          Build the Cookbook for Python only."
 	@echo "make r           Build the Cookbook for R only."
+	@echo "make java        Build the Cookbook for Java only."

Review comment:
       `make java` builds the HTML output, but how do we test that the examples still work?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781701557



##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.
+
+.. code-block:: java
+   :emphasize-lines: 5
+
+   import org.apache.arrow.vector.types.pojo.Schema;
+   import static java.util.Arrays.asList;
+
+   // create a definition
+   Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> schemaPerson
+
+   schemaPerson ==> Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
+
+Populate data

Review comment:
       We explain that at "Creating Arrow Objects - Creating VectorSchemaRoot"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782258223



##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type

Review comment:
       Related to cookbook is to show case how java arrow is working. Related to teach our audience we could define something like this comments on that part
   
   > 
   
   All the comments was removed because the title of cookbook also resume very well the objective of the code




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r784012184



##########
File path: Makefile
##########
@@ -13,6 +13,7 @@ help:
 	@echo "make test        Test cookbook for all platforms."
 	@echo "make py          Build the Cookbook for Python only."
 	@echo "make r           Build the Cookbook for R only."
+	@echo "make java        Build the Cookbook for Java only."

Review comment:
       In our case to cover testing code documentation we have two options:
   1. Create java cookbook inside of java unit test and run "mvn test" before to excute "make java"
   2. Continue with jshell and create a custom tester output validator (extend SphinxDirective) as the same c++ do with [c-extension](https://github.com/apache/arrow-cookbook/blob/main/cpp/ext/recipeext.py)
   
   I could suggest that jshell offer fast initiation/testing arrow java code immediately but it needed to create a custom sphinx directive
   
   Java unit test code is the more easy way but we could lose jshell power
   
   My proposal is to continue with jshell and create another ticket to create a custom sphinx directive (time estimation: not idea at this moment)
   
   Please let me know if this make sense




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] lidavidm commented on pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#issuecomment-1016517549


   Thanks. FWIW, if it's easier for you and if you do decide to go the route of a custom extension or something more involved, I think you could send a PR containing just the extension and one or two recipes - then we can iterate on that before writing out all the rest of the recipes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] lidavidm commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782147857



##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+
+| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.

Review comment:
       What is the conceptual distinction between FieldVector and ValueVector?

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+
+| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Creating Vectors (arrays)
+=========================
+
+Array of Int (32-bit integer value)
+-----------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 4

Review comment:
       After looking at it rendered, I think the highlighting is not helpful here. It's only helpful once the example gets long enough.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+
+| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Creating Vectors (arrays)
+=========================
+
+Array of Int (32-bit integer value)
+-----------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   IntVector intVector = new IntVector("intVector", rootAllocator);
+   setVector(intVector, 1,2,3);
+
+.. code-block:: java
+   :emphasize-lines: 1-3

Review comment:
       We especially shouldn't do something like this where the entire code block is highlighted.

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,608 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   class TestVectorValueComparator extends VectorValueComparator<VarCharVector> {
+       @Override
+       public int compareNotNull(int index1, int index2) {
+           byte b1 = vector1.get(index1)[0];
+           byte b2 = vector2.get(index2)[0];
+           return b1 - b2;
+       }
+
+       @Override
+       public VectorValueComparator<VarCharVector> createNew() {
+           return new TestVectorValueComparator();
+       }
+   }
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare Vectors for Field Equality
+==================================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right);
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1);
+   true
+   jshell> visitor.equals(left2);
+   false
+
+Compare Values on the Array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 15-17
+
+   import org.apache.arrow.algorithm.sort.StableVectorComparator;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.VarCharVector;
+
+   // compare two values at the given indices in the vectors.
+   // comparing org.apache.arrow.algorithm.sort.VectorValueComparator on algorithm
+   VarCharVector vec = new VarCharVector("valueindexcomparator", rootAllocator);
+   vec.allocateNew(100, 5);
+   vec.setValueCount(10);
+   vec.set(0, "ba".getBytes());
+   vec.set(1, "abc".getBytes());
+   vec.set(2, "aa".getBytes());
+   vec.set(3, "abc".getBytes());
+   vec.set(4, "a".getBytes());
+   VectorValueComparator<VarCharVector> comparatorValues = new TestVectorValueComparator(); // less than, equal to, greater than
+   VectorValueComparator<VarCharVector> stableComparator = new StableVectorComparator<>(comparatorValues);//Stable comparator only supports comparing values from the same vector
+   stableComparator.attachVector(vec);
+
+Comparing two values at the given indices in the vectors:
+
+.. code-block:: java
+   :emphasize-lines: 1-12
+
+   jshell> stableComparator.compare(0, 1) > 0;
+   true 
+   jshell> stableComparator.compare(1, 2) < 0;
+   true 
+   jshell> stableComparator.compare(2, 3) < 0;
+   true 
+   jshell> stableComparator.compare(1, 3) < 0;
+   true 
+   jshell> stableComparator.compare(3, 1) > 0;
+   true 
+   jshell> stableComparator.compare(3, 3) == 0;
+   true
+
+Search Values on the Array
+==========================
+
+Linear Search - O(n)
+********************
+
+Algorithm: org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)

Review comment:
       Instead of writing out the full name here, maybe just write `VectorSearcher#linearSearch` and link to Javadocs?

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,608 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   class TestVectorValueComparator extends VectorValueComparator<VarCharVector> {
+       @Override
+       public int compareNotNull(int index1, int index2) {
+           byte b1 = vector1.get(index1)[0];
+           byte b2 = vector2.get(index2)[0];
+           return b1 - b2;
+       }
+
+       @Override
+       public VectorValueComparator<VarCharVector> createNew() {
+           return new TestVectorValueComparator();
+       }
+   }
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare Vectors for Field Equality

Review comment:
       I realize I suggested this title but we need to be consistent with Array vs Vector. I still think we should only use Vector since that is what the library calls it.

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,329 @@
+===================
+Working with schema
+===================

Review comment:
       ```suggestion
   ===================
   Working with Schema
   ===================
   ```

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());

Review comment:
       Even if we will have a separate tutorial, I don't think the cookbook should demonstrate incorrect code.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+
+| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Creating Vectors (arrays)
+=========================
+
+Array of Int (32-bit integer value)
+-----------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   IntVector intVector = new IntVector("intVector", rootAllocator);
+   setVector(intVector, 1,2,3);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+
+   jshell> intVector
+
+   intVector ==> [1, 2, 3]
+
+Array of Varchar
+----------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.VarCharVector;
+
+   VarCharVector varcharVector = new VarCharVector("varcharVector", rootAllocator);
+   setVector(varcharVector, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> varcharVector
+
+   varcharVector ==> [david, gladis, juan]
+
+Array of List
+-------------
+
+.. code-block:: java
+   :emphasize-lines: 6
+
+   import org.apache.arrow.vector.complex.ListVector;
+
+   import static java.util.Arrays.asList;
+
+   ListVector listVector = ListVector.empty("listVector", rootAllocator);
+   setVector(listVector, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> listVector
+
+   listVector ==> [[1,3,5,7,9], [2,4,6,8,10], [1,2,3,5,8]]
+
+Creating VectorSchemaRoot (Table)
+=================================
+
+A `VectorSchemaRoot <https://arrow.apache.org/docs/java/vector_schema_root.html>`_ is a container that can hold batches, batches flow through VectorSchemaRoot as part of a pipeline.
+
+.. code-block:: java
+   :emphasize-lines: 21
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation

Review comment:
       The example is demonstrating a misuse of the Arrow Java library. This allocator needs to be created somewhere else - perhaps it should be an argument of this function.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.

Review comment:
       Why is this using a line block (`| `)?

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+
+| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Creating Vectors (arrays)
+=========================
+
+Array of Int (32-bit integer value)
+-----------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   IntVector intVector = new IntVector("intVector", rootAllocator);
+   setVector(intVector, 1,2,3);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+
+   jshell> intVector
+
+   intVector ==> [1, 2, 3]
+
+Array of Varchar
+----------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.VarCharVector;
+
+   VarCharVector varcharVector = new VarCharVector("varcharVector", rootAllocator);
+   setVector(varcharVector, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> varcharVector
+
+   varcharVector ==> [david, gladis, juan]
+
+Array of List
+-------------
+
+.. code-block:: java
+   :emphasize-lines: 6
+
+   import org.apache.arrow.vector.complex.ListVector;
+
+   import static java.util.Arrays.asList;
+
+   ListVector listVector = ListVector.empty("listVector", rootAllocator);
+   setVector(listVector, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> listVector
+
+   listVector ==> [[1,3,5,7,9], [2,4,6,8,10], [1,2,3,5,8]]
+
+Creating VectorSchemaRoot (Table)
+=================================
+
+A `VectorSchemaRoot <https://arrow.apache.org/docs/java/vector_schema_root.html>`_ is a container that can hold batches, batches flow through VectorSchemaRoot as part of a pipeline.

Review comment:
       This means that a VectorSchemaRoot really isn't anything like a Table at all.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+
+| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Creating Vectors (arrays)
+=========================
+
+Array of Int (32-bit integer value)
+-----------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   IntVector intVector = new IntVector("intVector", rootAllocator);
+   setVector(intVector, 1,2,3);

Review comment:
       Frankly, the helpers make these examples rather pointless. The example isn't copy-pastable anymore and it lacks context; people have to scroll up and find the right overload. We should inline the helpers into the examples, and we need to provide more commentary on the steps. (For instance, setSafe will reallocate the vector, and we need to update the vector length manually.)

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+
+| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Creating Vectors (arrays)
+=========================
+
+Array of Int (32-bit integer value)
+-----------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   IntVector intVector = new IntVector("intVector", rootAllocator);
+   setVector(intVector, 1,2,3);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+
+   jshell> intVector
+
+   intVector ==> [1, 2, 3]
+
+Array of Varchar
+----------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.VarCharVector;
+
+   VarCharVector varcharVector = new VarCharVector("varcharVector", rootAllocator);
+   setVector(varcharVector, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> varcharVector
+
+   varcharVector ==> [david, gladis, juan]
+
+Array of List
+-------------
+
+.. code-block:: java
+   :emphasize-lines: 6
+
+   import org.apache.arrow.vector.complex.ListVector;
+
+   import static java.util.Arrays.asList;
+
+   ListVector listVector = ListVector.empty("listVector", rootAllocator);
+   setVector(listVector, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> listVector
+
+   listVector ==> [[1,3,5,7,9], [2,4,6,8,10], [1,2,3,5,8]]
+
+Creating VectorSchemaRoot (Table)
+=================================
+
+A `VectorSchemaRoot <https://arrow.apache.org/docs/java/vector_schema_root.html>`_ is a container that can hold batches, batches flow through VectorSchemaRoot as part of a pipeline.
+
+.. code-block:: java
+   :emphasize-lines: 21

Review comment:
       Highlighting the function declaration doesn't really help.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+
+| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Creating Vectors (arrays)
+=========================
+
+Array of Int (32-bit integer value)
+-----------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   IntVector intVector = new IntVector("intVector", rootAllocator);
+   setVector(intVector, 1,2,3);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+
+   jshell> intVector
+
+   intVector ==> [1, 2, 3]
+
+Array of Varchar
+----------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.VarCharVector;
+
+   VarCharVector varcharVector = new VarCharVector("varcharVector", rootAllocator);
+   setVector(varcharVector, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> varcharVector
+
+   varcharVector ==> [david, gladis, juan]
+
+Array of List
+-------------
+
+.. code-block:: java
+   :emphasize-lines: 6
+
+   import org.apache.arrow.vector.complex.ListVector;
+
+   import static java.util.Arrays.asList;
+
+   ListVector listVector = ListVector.empty("listVector", rootAllocator);
+   setVector(listVector, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> listVector
+
+   listVector ==> [[1,3,5,7,9], [2,4,6,8,10], [1,2,3,5,8]]
+
+Creating VectorSchemaRoot (Table)
+=================================
+
+A `VectorSchemaRoot <https://arrow.apache.org/docs/java/vector_schema_root.html>`_ is a container that can hold batches, batches flow through VectorSchemaRoot as part of a pipeline.
+
+.. code-block:: java
+   :emphasize-lines: 21
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);

Review comment:
       The example is rather long. I think we should omit the metadata since it's not relevant here. We can put that in a separate recipe.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+
+| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Creating Vectors (arrays)
+=========================
+
+Array of Int (32-bit integer value)
+-----------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   IntVector intVector = new IntVector("intVector", rootAllocator);
+   setVector(intVector, 1,2,3);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+
+   jshell> intVector
+
+   intVector ==> [1, 2, 3]
+
+Array of Varchar
+----------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.VarCharVector;
+
+   VarCharVector varcharVector = new VarCharVector("varcharVector", rootAllocator);
+   setVector(varcharVector, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> varcharVector
+
+   varcharVector ==> [david, gladis, juan]
+
+Array of List
+-------------
+
+.. code-block:: java
+   :emphasize-lines: 6
+
+   import org.apache.arrow.vector.complex.ListVector;
+
+   import static java.util.Arrays.asList;
+
+   ListVector listVector = ListVector.empty("listVector", rootAllocator);
+   setVector(listVector, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> listVector
+
+   listVector ==> [[1,3,5,7,9], [2,4,6,8,10], [1,2,3,5,8]]
+
+Creating VectorSchemaRoot (Table)
+=================================
+
+A `VectorSchemaRoot <https://arrow.apache.org/docs/java/vector_schema_root.html>`_ is a container that can hold batches, batches flow through VectorSchemaRoot as part of a pipeline.
+
+.. code-block:: java
+   :emphasize-lines: 21
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);

Review comment:
       Missing comment on the literal parameter.

##########
File path: java/source/demo/pom.xml
##########
@@ -0,0 +1,51 @@
+<?xml version="1.0" encoding="UTF-8"?>

Review comment:
       Why do we have this Maven project? I don't see it referenced anywhere in the actual cookbook, and if it's for testing, then manually copy-pasting code between Maven and Sphinx will quickly get error-prone/forgotten.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+
+| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Creating Vectors (arrays)
+=========================
+
+Array of Int (32-bit integer value)
+-----------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   IntVector intVector = new IntVector("intVector", rootAllocator);
+   setVector(intVector, 1,2,3);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+
+   jshell> intVector
+
+   intVector ==> [1, 2, 3]
+
+Array of Varchar
+----------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.VarCharVector;
+
+   VarCharVector varcharVector = new VarCharVector("varcharVector", rootAllocator);
+   setVector(varcharVector, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> varcharVector
+
+   varcharVector ==> [david, gladis, juan]
+
+Array of List
+-------------
+
+.. code-block:: java
+   :emphasize-lines: 6
+
+   import org.apache.arrow.vector.complex.ListVector;
+
+   import static java.util.Arrays.asList;
+
+   ListVector listVector = ListVector.empty("listVector", rootAllocator);
+   setVector(listVector, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> listVector
+
+   listVector ==> [[1,3,5,7,9], [2,4,6,8,10], [1,2,3,5,8]]
+
+Creating VectorSchemaRoot (Table)
+=================================
+
+A `VectorSchemaRoot <https://arrow.apache.org/docs/java/vector_schema_root.html>`_ is a container that can hold batches, batches flow through VectorSchemaRoot as part of a pipeline.
+
+.. code-block:: java
+   :emphasize-lines: 21
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));

Review comment:
       If it's too much code to inline the helpers, IMO we should just remove the list array from the example. The focus should be on VectorSchemaRoot anyways.

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,608 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   class TestVectorValueComparator extends VectorValueComparator<VarCharVector> {
+       @Override
+       public int compareNotNull(int index1, int index2) {
+           byte b1 = vector1.get(index1)[0];
+           byte b2 = vector2.get(index2)[0];
+           return b1 - b2;
+       }
+
+       @Override
+       public VectorValueComparator<VarCharVector> createNew() {
+           return new TestVectorValueComparator();
+       }
+   }
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare Vectors for Field Equality
+==================================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right);
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1);
+   true
+   jshell> visitor.equals(left2);
+   false
+
+Compare Values on the Array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 15-17
+
+   import org.apache.arrow.algorithm.sort.StableVectorComparator;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.VarCharVector;
+
+   // compare two values at the given indices in the vectors.
+   // comparing org.apache.arrow.algorithm.sort.VectorValueComparator on algorithm
+   VarCharVector vec = new VarCharVector("valueindexcomparator", rootAllocator);
+   vec.allocateNew(100, 5);
+   vec.setValueCount(10);
+   vec.set(0, "ba".getBytes());
+   vec.set(1, "abc".getBytes());
+   vec.set(2, "aa".getBytes());
+   vec.set(3, "abc".getBytes());
+   vec.set(4, "a".getBytes());
+   VectorValueComparator<VarCharVector> comparatorValues = new TestVectorValueComparator(); // less than, equal to, greater than
+   VectorValueComparator<VarCharVector> stableComparator = new StableVectorComparator<>(comparatorValues);//Stable comparator only supports comparing values from the same vector
+   stableComparator.attachVector(vec);
+
+Comparing two values at the given indices in the vectors:
+
+.. code-block:: java
+   :emphasize-lines: 1-12
+
+   jshell> stableComparator.compare(0, 1) > 0;
+   true 
+   jshell> stableComparator.compare(1, 2) < 0;
+   true 
+   jshell> stableComparator.compare(2, 3) < 0;
+   true 
+   jshell> stableComparator.compare(1, 3) < 0;
+   true 
+   jshell> stableComparator.compare(3, 1) > 0;
+   true 
+   jshell> stableComparator.compare(3, 3) == 0;
+   true
+
+Search Values on the Array
+==========================
+
+Linear Search - O(n)
+********************
+
+Algorithm: org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+
+.. code-block:: java
+   :emphasize-lines: 27
+
+   import org.apache.arrow.algorithm.search.VectorSearcher;
+   import org.apache.arrow.algorithm.sort.DefaultVectorComparators;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.IntVector;
+
+   // search values on the array
+   // linear search org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)

Review comment:
       The comment just repeats the prose, I don't think it's useful.

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,608 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   class TestVectorValueComparator extends VectorValueComparator<VarCharVector> {
+       @Override
+       public int compareNotNull(int index1, int index2) {
+           byte b1 = vector1.get(index1)[0];
+           byte b2 = vector2.get(index2)[0];
+           return b1 - b2;
+       }
+
+       @Override
+       public VectorValueComparator<VarCharVector> createNew() {
+           return new TestVectorValueComparator();
+       }
+   }
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare Vectors for Field Equality
+==================================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right);
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1);
+   true
+   jshell> visitor.equals(left2);
+   false
+
+Compare Values on the Array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 15-17
+
+   import org.apache.arrow.algorithm.sort.StableVectorComparator;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.VarCharVector;
+
+   // compare two values at the given indices in the vectors.
+   // comparing org.apache.arrow.algorithm.sort.VectorValueComparator on algorithm
+   VarCharVector vec = new VarCharVector("valueindexcomparator", rootAllocator);
+   vec.allocateNew(100, 5);
+   vec.setValueCount(10);
+   vec.set(0, "ba".getBytes());
+   vec.set(1, "abc".getBytes());
+   vec.set(2, "aa".getBytes());
+   vec.set(3, "abc".getBytes());
+   vec.set(4, "a".getBytes());
+   VectorValueComparator<VarCharVector> comparatorValues = new TestVectorValueComparator(); // less than, equal to, greater than
+   VectorValueComparator<VarCharVector> stableComparator = new StableVectorComparator<>(comparatorValues);//Stable comparator only supports comparing values from the same vector
+   stableComparator.attachVector(vec);
+
+Comparing two values at the given indices in the vectors:
+
+.. code-block:: java
+   :emphasize-lines: 1-12
+
+   jshell> stableComparator.compare(0, 1) > 0;
+   true 
+   jshell> stableComparator.compare(1, 2) < 0;
+   true 
+   jshell> stableComparator.compare(2, 3) < 0;
+   true 
+   jshell> stableComparator.compare(1, 3) < 0;
+   true 
+   jshell> stableComparator.compare(3, 1) > 0;
+   true 
+   jshell> stableComparator.compare(3, 3) == 0;
+   true
+
+Search Values on the Array
+==========================
+
+Linear Search - O(n)
+********************
+
+Algorithm: org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+
+.. code-block:: java
+   :emphasize-lines: 27
+
+   import org.apache.arrow.algorithm.search.VectorSearcher;
+   import org.apache.arrow.algorithm.sort.DefaultVectorComparators;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.IntVector;
+
+   // search values on the array
+   // linear search org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+   IntVector rawVector = new IntVector("", rootAllocator);
+   IntVector negVector = new IntVector("", rootAllocator);
+   rawVector.allocateNew(10);
+   rawVector.setValueCount(10);
+   negVector.allocateNew(1);
+   negVector.setValueCount(1);
+   for (int i = 0; i < 10; i++) { // prepare data in sorted order
+       if (i == 0) {
+           rawVector.setNull(i);
+       } else {
+           rawVector.set(i, i);
+       }
+   }
+   negVector.set(0, -333);
+   VectorValueComparator<IntVector> comparatorInt = DefaultVectorComparators.createDefaultComparator(rawVector);
+
+   // do search
+   List<Integer> listResultLinearSearch = new ArrayList<Integer>();
+   for (int i = 0; i < 10; i++) {
+       int result = VectorSearcher.linearSearch(rawVector, comparatorInt, rawVector, i);
+       listResultLinearSearch.add(result);
+   }
+
+Verify results:
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+   
+   jshell> listResultLinearSearch
+
+   listResultLinearSearch ==> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+
+Binary Search - O(log(n))
+*************************
+
+Algorithm: org.apache.arrow.algorithm.search.VectorSearcher#binarySearch - O(log(n))
+
+.. code-block:: java
+   :emphasize-lines: 27
+
+   import org.apache.arrow.algorithm.search.VectorSearcher;
+   import org.apache.arrow.algorithm.sort.DefaultVectorComparators;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.IntVector;
+
+   // search values on the array
+   // linear search org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+   IntVector rawVector = new IntVector("", rootAllocator);
+   IntVector negVector = new IntVector("", rootAllocator);
+   rawVector.allocateNew(10);
+   rawVector.setValueCount(10);
+   negVector.allocateNew(1);
+   negVector.setValueCount(1);
+   for (int i = 0; i < 10; i++) { // prepare data in sorted order
+       if (i == 0) {

Review comment:
       FWIW I think our Java code uses a 2-space indent convention, maybe we should try to be consistent? (It would be easier if the Java code could be linted/formatted separately from the reST.)

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,348 @@
+.. _arrow-io:
+
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using Apache Arrow.
+
+Arrow defines two types of binary formats for serializing record batches `IPC <https://arrow.apache.org/docs/java/ipc.html>`_: Streaming format / File or Random Access format
+
+.. contents::
+
+Writing Array
+=============
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing Arrays with the IPC File Format
+***************************************
+
+Write - Random Access to File
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());
+   writer.start();
+   writer.writeBatch();
+   writer.end();
+
+Write - Random Access to Buffer
+-------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+   import java.nio.channels.Channels;
+
+   // write - random access to buffer
+   ByteArrayOutputStream out = new ByteArrayOutputStream();
+   ArrowFileWriter writerBuffer = new ArrowFileWriter(vectorSchemaRoot, null, Channels.newChannel(out));
+   writerBuffer.start();
+   writerBuffer.writeBatch();
+   writerBuffer.end();
+
+
+Writing Arrays with the IPC Streamed Format

Review comment:
       ```suggestion
   Writing Arrays with the IPC Stream Format
   ```

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,348 @@
+.. _arrow-io:
+
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using Apache Arrow.
+
+Arrow defines two types of binary formats for serializing record batches `IPC <https://arrow.apache.org/docs/java/ipc.html>`_: Streaming format / File or Random Access format
+
+.. contents::
+
+Writing Array
+=============
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing Arrays with the IPC File Format
+***************************************
+
+Write - Random Access to File
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());
+   writer.start();
+   writer.writeBatch();

Review comment:
       It should be explicitly explained that `writeBatch` writes the current contents of the `vectorSchemaRoot`.

##########
File path: java/source/flight.rst
##########
@@ -0,0 +1,472 @@
+.. _arrow-flight:
+
+============
+Arrow Flight
+============
+
+Recipes related to leveraging Arrow Flight protocol
+
+.. contents::
+
+Simple Service with Arrow Flight
+================================
+
+Common Classes
+**************
+
+We are going to use this util for data manipulation:
+
+* InMemoryStore: A FlightProducer that hosts an in memory store of Arrow buffers. Used for integration testing.

Review comment:
       I would rather we not just pull the integration test code 1:1. It's not relevant to people. I think we should omit the Flight example for now and we can add it later (I can work on that, and Tom and I have been working on some Java/Flight examples too).

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,608 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   class TestVectorValueComparator extends VectorValueComparator<VarCharVector> {
+       @Override
+       public int compareNotNull(int index1, int index2) {
+           byte b1 = vector1.get(index1)[0];
+           byte b2 = vector2.get(index2)[0];
+           return b1 - b2;
+       }
+
+       @Override
+       public VectorValueComparator<VarCharVector> createNew() {
+           return new TestVectorValueComparator();
+       }
+   }
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare Vectors for Field Equality
+==================================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right);
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1);
+   true
+   jshell> visitor.equals(left2);
+   false
+
+Compare Values on the Array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 15-17
+
+   import org.apache.arrow.algorithm.sort.StableVectorComparator;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.VarCharVector;
+
+   // compare two values at the given indices in the vectors.
+   // comparing org.apache.arrow.algorithm.sort.VectorValueComparator on algorithm
+   VarCharVector vec = new VarCharVector("valueindexcomparator", rootAllocator);
+   vec.allocateNew(100, 5);
+   vec.setValueCount(10);
+   vec.set(0, "ba".getBytes());
+   vec.set(1, "abc".getBytes());
+   vec.set(2, "aa".getBytes());
+   vec.set(3, "abc".getBytes());
+   vec.set(4, "a".getBytes());
+   VectorValueComparator<VarCharVector> comparatorValues = new TestVectorValueComparator(); // less than, equal to, greater than
+   VectorValueComparator<VarCharVector> stableComparator = new StableVectorComparator<>(comparatorValues);//Stable comparator only supports comparing values from the same vector
+   stableComparator.attachVector(vec);
+
+Comparing two values at the given indices in the vectors:
+
+.. code-block:: java
+   :emphasize-lines: 1-12
+
+   jshell> stableComparator.compare(0, 1) > 0;
+   true 
+   jshell> stableComparator.compare(1, 2) < 0;
+   true 
+   jshell> stableComparator.compare(2, 3) < 0;
+   true 
+   jshell> stableComparator.compare(1, 3) < 0;
+   true 
+   jshell> stableComparator.compare(3, 1) > 0;
+   true 
+   jshell> stableComparator.compare(3, 3) == 0;
+   true
+
+Search Values on the Array
+==========================
+
+Linear Search - O(n)
+********************
+
+Algorithm: org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+
+.. code-block:: java
+   :emphasize-lines: 27
+
+   import org.apache.arrow.algorithm.search.VectorSearcher;
+   import org.apache.arrow.algorithm.sort.DefaultVectorComparators;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.IntVector;
+
+   // search values on the array
+   // linear search org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+   IntVector rawVector = new IntVector("", rootAllocator);
+   IntVector negVector = new IntVector("", rootAllocator);
+   rawVector.allocateNew(10);
+   rawVector.setValueCount(10);
+   negVector.allocateNew(1);
+   negVector.setValueCount(1);
+   for (int i = 0; i < 10; i++) { // prepare data in sorted order
+       if (i == 0) {
+           rawVector.setNull(i);
+       } else {
+           rawVector.set(i, i);
+       }
+   }
+   negVector.set(0, -333);
+   VectorValueComparator<IntVector> comparatorInt = DefaultVectorComparators.createDefaultComparator(rawVector);
+
+   // do search
+   List<Integer> listResultLinearSearch = new ArrayList<Integer>();
+   for (int i = 0; i < 10; i++) {
+       int result = VectorSearcher.linearSearch(rawVector, comparatorInt, rawVector, i);
+       listResultLinearSearch.add(result);
+   }
+
+Verify results:
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+   
+   jshell> listResultLinearSearch
+
+   listResultLinearSearch ==> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+
+Binary Search - O(log(n))
+*************************
+
+Algorithm: org.apache.arrow.algorithm.search.VectorSearcher#binarySearch - O(log(n))
+
+.. code-block:: java
+   :emphasize-lines: 27
+
+   import org.apache.arrow.algorithm.search.VectorSearcher;
+   import org.apache.arrow.algorithm.sort.DefaultVectorComparators;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.IntVector;
+
+   // search values on the array
+   // linear search org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+   IntVector rawVector = new IntVector("", rootAllocator);
+   IntVector negVector = new IntVector("", rootAllocator);
+   rawVector.allocateNew(10);
+   rawVector.setValueCount(10);

Review comment:
       We never explain this but effectively this zero-initializes the vector, right? Can we explain that in the vectors section of the cookbook?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,329 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type

Review comment:
       > Generally, tables have an associated schema. The Arrow Java library has classes for defining schemas. A schema consists of a list of Fields, where each Field has a name and a type for a particular column.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.

Review comment:
       The mutability description is a little redundant. We can say something like `Vector in the Java library is a mutable container, unlike Arrays in many other Arrow implementations which are mutable.`

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,608 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   class TestVectorValueComparator extends VectorValueComparator<VarCharVector> {
+       @Override
+       public int compareNotNull(int index1, int index2) {
+           byte b1 = vector1.get(index1)[0];
+           byte b2 = vector2.get(index2)[0];
+           return b1 - b2;
+       }
+
+       @Override
+       public VectorValueComparator<VarCharVector> createNew() {
+           return new TestVectorValueComparator();
+       }
+   }
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare Vectors for Field Equality
+==================================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right);
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1);
+   true
+   jshell> visitor.equals(left2);
+   false
+
+Compare Values on the Array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 15-17
+
+   import org.apache.arrow.algorithm.sort.StableVectorComparator;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.VarCharVector;
+
+   // compare two values at the given indices in the vectors.
+   // comparing org.apache.arrow.algorithm.sort.VectorValueComparator on algorithm
+   VarCharVector vec = new VarCharVector("valueindexcomparator", rootAllocator);
+   vec.allocateNew(100, 5);
+   vec.setValueCount(10);
+   vec.set(0, "ba".getBytes());
+   vec.set(1, "abc".getBytes());
+   vec.set(2, "aa".getBytes());
+   vec.set(3, "abc".getBytes());
+   vec.set(4, "a".getBytes());
+   VectorValueComparator<VarCharVector> comparatorValues = new TestVectorValueComparator(); // less than, equal to, greater than
+   VectorValueComparator<VarCharVector> stableComparator = new StableVectorComparator<>(comparatorValues);//Stable comparator only supports comparing values from the same vector
+   stableComparator.attachVector(vec);
+
+Comparing two values at the given indices in the vectors:
+
+.. code-block:: java
+   :emphasize-lines: 1-12
+
+   jshell> stableComparator.compare(0, 1) > 0;
+   true 
+   jshell> stableComparator.compare(1, 2) < 0;
+   true 
+   jshell> stableComparator.compare(2, 3) < 0;
+   true 
+   jshell> stableComparator.compare(1, 3) < 0;
+   true 
+   jshell> stableComparator.compare(3, 1) > 0;
+   true 
+   jshell> stableComparator.compare(3, 3) == 0;
+   true
+
+Search Values on the Array
+==========================
+
+Linear Search - O(n)
+********************
+
+Algorithm: org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+
+.. code-block:: java
+   :emphasize-lines: 27
+
+   import org.apache.arrow.algorithm.search.VectorSearcher;
+   import org.apache.arrow.algorithm.sort.DefaultVectorComparators;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.IntVector;
+
+   // search values on the array
+   // linear search org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+   IntVector rawVector = new IntVector("", rootAllocator);
+   IntVector negVector = new IntVector("", rootAllocator);
+   rawVector.allocateNew(10);
+   rawVector.setValueCount(10);
+   negVector.allocateNew(1);
+   negVector.setValueCount(1);
+   for (int i = 0; i < 10; i++) { // prepare data in sorted order
+       if (i == 0) {
+           rawVector.setNull(i);
+       } else {
+           rawVector.set(i, i);
+       }
+   }
+   negVector.set(0, -333);
+   VectorValueComparator<IntVector> comparatorInt = DefaultVectorComparators.createDefaultComparator(rawVector);
+
+   // do search
+   List<Integer> listResultLinearSearch = new ArrayList<Integer>();
+   for (int i = 0; i < 10; i++) {
+       int result = VectorSearcher.linearSearch(rawVector, comparatorInt, rawVector, i);
+       listResultLinearSearch.add(result);
+   }
+
+Verify results:
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+   
+   jshell> listResultLinearSearch
+
+   listResultLinearSearch ==> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+
+Binary Search - O(log(n))
+*************************
+
+Algorithm: org.apache.arrow.algorithm.search.VectorSearcher#binarySearch - O(log(n))
+
+.. code-block:: java
+   :emphasize-lines: 27
+
+   import org.apache.arrow.algorithm.search.VectorSearcher;
+   import org.apache.arrow.algorithm.sort.DefaultVectorComparators;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.IntVector;
+
+   // search values on the array
+   // linear search org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+   IntVector rawVector = new IntVector("", rootAllocator);
+   IntVector negVector = new IntVector("", rootAllocator);
+   rawVector.allocateNew(10);
+   rawVector.setValueCount(10);
+   negVector.allocateNew(1);
+   negVector.setValueCount(1);
+   for (int i = 0; i < 10; i++) { // prepare data in sorted order
+       if (i == 0) {
+           rawVector.setNull(i);
+       } else {
+           rawVector.set(i, i);
+       }
+   }
+   negVector.set(0, -333);
+   VectorValueComparator<IntVector> comparatorInt = DefaultVectorComparators.createDefaultComparator(rawVector);
+
+   // do search
+   List<Integer> listResultBinarySearch = new ArrayList<Integer>();
+   for (int i = 0; i < 10; i++) {
+       int result = VectorSearcher.binarySearch(rawVector, comparatorInt, rawVector, i);
+       listResultBinarySearch.add(result);
+   }
+
+Verify results:
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> listResultBinarySearch
+
+   listResultBinarySearch ==> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+
+Sort Values on the Array
+========================
+
+In-place Sorter - O(nlog(n))
+****************************
+
+Sorting by manipulating the original vector.
+Algorithm: org.apache.arrow.algorithm.sort.FixedWidthInPlaceVectorSorter - O(nlog(n))
+
+.. code-block:: java
+   :emphasize-lines: 22-24
+
+   import org.apache.arrow.algorithm.sort.DefaultVectorComparators;
+   import org.apache.arrow.algorithm.sort.FixedWidthInPlaceVectorSorter;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.IntVector;
+
+   // Sort the vector - In-place sorter
+   IntVector vecToSort = new IntVector("in-place-sorter", rootAllocator);
+   vecToSort.allocateNew(10);
+   vecToSort.setValueCount(10);
+   // fill data to sort
+   vecToSort.set(0, 10);
+   vecToSort.set(1, 8);
+   vecToSort.setNull(2);
+   vecToSort.set(3, 10);
+   vecToSort.set(4, 12);
+   vecToSort.set(5, 17);
+   vecToSort.setNull(6);
+   vecToSort.set(7, 23);
+   vecToSort.set(8, 35);
+   vecToSort.set(9, 2);
+   // sort the vector
+   FixedWidthInPlaceVectorSorter sorter = new FixedWidthInPlaceVectorSorter();
+   VectorValueComparator<IntVector> comparator = DefaultVectorComparators.createDefaultComparator(vecToSort);
+   sorter.sortInPlace(vecToSort, comparator);
+
+Verify results:
+
+.. code-block:: java
+   :emphasize-lines: 1-22
+
+   jshell> vecToSort.getValueCount()==10;
+   true 
+   jshell> vecToSort.isNull(0);
+   true 
+   jshell> vecToSort.isNull(1);
+   true 
+   jshell> 2==vecToSort.get(2);
+   true 
+   jshell> 8==vecToSort.get(3);
+   true 
+   jshell> 10==vecToSort.get(4);
+   true 
+   jshell> 10==vecToSort.get(5);
+   true 
+   jshell> 12==vecToSort.get(6);
+   true 
+   jshell> 17==vecToSort.get(7);
+   true 
+   jshell> 23==vecToSort.get(8);
+   true 
+   jshell> 35==vecToSort.get(9);
+   true
+
+Out-place Sorter - O(nlog(n))
+*****************************
+
+Sorting by copies vector elements to a new vector in sorted order - O(nlog(n))
+Algorithm: : org.apache.arrow.algorithm.sort.FixedWidthInPlaceVectorSorter.
+FixedWidthOutOfPlaceVectorSorter & VariableWidthOutOfPlaceVectorSor

Review comment:
       This looks truncated.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Array of int
+============
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   // create int vector
+   IntVector intVector = new IntVector("intVector", rootAllocator);

Review comment:
       Or really, this goes to the question above: a FieldVector combines a schema field with a vector?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.
+
+.. code-block:: java
+   :emphasize-lines: 5
+
+   import org.apache.arrow.vector.types.pojo.Schema;
+   import static java.util.Arrays.asList;
+
+   // create a definition
+   Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> schemaPerson
+
+   schemaPerson ==> Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
+
+Populate data
+=============
+
+.. code-block:: java
+   :emphasize-lines: 3,12-15
+
+   import org.apache.arrow.vector.*;
+
+   VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+   // getting field vectors
+   VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+   VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+   IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+   ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+   // add values to the field vectors
+   setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+   setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+   setVector(ageVectorOption1, 10,20,30);
+   setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+
+   vectorSchemaRoot.setRowCount(3);
+
+Render data & metadata:
+
+.. code-block:: java
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString());
+
+   name    document    age  points
+   david   A            10  [1,3,5,7,9]
+   gladis  B            20  [2,4,6,8,10]
+   juan    C            30  [1,2,3,5,8]
+
+   jshell> System.out.println(documentVectorOption1.getField().getMetadata());
+
+   {A=Id card, B=Passport, C=Visa}
+
+Create the schema from json
+===========================
+
+For this json definition:

Review comment:
       Right, but this is not a "standard" Arrow representation. The Java library is just applying Jackson to the Java classes. It's not interoperable with Python or any of the other implementations, nor does it appear to have any compatibility or stability guarantees. I'm not sure if we want to tell people to use this, and potentially mislead them, vs. demonstrating how to serialize a schema to the IPC format instead.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.

Review comment:
       If we are going to mix the `Vector` and `Array` terminology as described below, then can we say something like
   
   > A Vector is the basic unit in the Arrow Java library. It's similar to Arrays in other Arrow implementations.
   
   And link the second sentence to the Terminology section of the Arrow docs: https://arrow.apache.org/docs/format/Columnar.html#terminology

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,225 @@
+.. _arrow-create:
+
+======================
+Creating Arrow Objects
+======================
+
+| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+
+| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Creating Vectors (arrays)
+=========================
+
+Array of Int (32-bit integer value)
+-----------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   IntVector intVector = new IntVector("intVector", rootAllocator);
+   setVector(intVector, 1,2,3);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+
+   jshell> intVector
+
+   intVector ==> [1, 2, 3]
+
+Array of Varchar
+----------------
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.VarCharVector;
+
+   VarCharVector varcharVector = new VarCharVector("varcharVector", rootAllocator);
+   setVector(varcharVector, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> varcharVector
+
+   varcharVector ==> [david, gladis, juan]
+
+Array of List
+-------------
+
+.. code-block:: java
+   :emphasize-lines: 6
+
+   import org.apache.arrow.vector.complex.ListVector;
+
+   import static java.util.Arrays.asList;
+
+   ListVector listVector = ListVector.empty("listVector", rootAllocator);
+   setVector(listVector, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> listVector
+
+   listVector ==> [[1,3,5,7,9], [2,4,6,8,10], [1,2,3,5,8]]
+
+Creating VectorSchemaRoot (Table)
+=================================
+
+A `VectorSchemaRoot <https://arrow.apache.org/docs/java/vector_schema_root.html>`_ is a container that can hold batches, batches flow through VectorSchemaRoot as part of a pipeline.

Review comment:
       I still don't think Java has something analogous to Table and it's confusing to treat them as such.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] amol- commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781367199



##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:

Review comment:
       We should avoid as much as possible utility functions in the recipes, users should be able to copy/paste the recipe, only change the values to match with their own variables/constants and have it work. So requiring users to lookup for utilities that they need to embed in their code goes against the principle of the Cookbook.
   
   If those utilities really add value, they should just be implemented in Arrow itself so that the cookbook and the users can rely on them.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] lidavidm commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r778846417



##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.

Review comment:
       Can we explain or link to docs about the difference between the two?
   
   Also I think we usually call the "random access format" the "file" format (vs the "stream" format), especially since it's called the file format below.

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right); // equal or unequal
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1); visitor.equals(left2);
+
+   true
+   false
+
+Compare values on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 15-17
+
+   import org.apache.arrow.algorithm.sort.StableVectorComparator;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.VarCharVector;
+
+   // compare two values at the given indices in the vectors.
+   // comparing org.apache.arrow.algorithm.sort.VectorValueComparator on algorithm
+   VarCharVector vec = new VarCharVector("valueindexcomparator", rootAllocator);
+   vec.allocateNew(100, 5);
+   vec.setValueCount(10);
+   vec.set(0, "ba".getBytes());
+   vec.set(1, "abc".getBytes());
+   vec.set(2, "aa".getBytes());
+   vec.set(3, "abc".getBytes());
+   vec.set(4, "a".getBytes());
+   VectorValueComparator<VarCharVector> comparatorValues = new TestVarCharSorter(); // less than, equal to, greater than
+   VectorValueComparator<VarCharVector> stableComparator = new StableVectorComparator<>(comparatorValues);//Stable comparator only supports comparing values from the same vector
+   stableComparator.attachVector(vec);
+
+Comparing two values at the given indices in the vectors:
+
+.. code-block:: java
+   :emphasize-lines: 1-8
+
+   jshell> stableComparator.compare(0, 1) > 0; stableComparator.compare(1, 2) < 0; stableComparator.compare(2, 3) < 0; stableComparator.compare(1, 3) < 0; stableComparator.compare(3, 1) > 0; stableComparator.compare(3, 3) == 0;

Review comment:
       and here as well, we should definitely use separate statements.

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format

Review comment:
       > Arrow vectors can be serialized to disk as the Arrow IPC format. Such files can be directly memory-mapped when read.

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file

Review comment:
       This is a little confusing, this makes it sound like we can write batches in random order (we cannot). Also, we've already stated that this is for the IPC file format in the section title. Maybe this can just be "Write to File" (and then "Write to In-Memory Buffer" below)?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:

Review comment:
       add metadata to a field, right?
   
   Also, how do we add metadata to a schema?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.
+
+.. code-block:: java
+   :emphasize-lines: 5
+
+   import org.apache.arrow.vector.types.pojo.Schema;
+   import static java.util.Arrays.asList;
+
+   // create a definition
+   Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> schemaPerson
+
+   schemaPerson ==> Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
+
+Populate data

Review comment:
       Did we ever explain what a VectorSchemaRoot is?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:

Review comment:
       They take many lines of code, don't save all that many lines (especially since we have only a few values anyways), and force people to scroll back and forth.

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type

Review comment:
       It might help here, for instance, to note below that we're creating a nested type, or that the `Int` definition accepts a bit width and a flag indicating signed/unsigned.

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.
+
+.. code-block:: java
+   :emphasize-lines: 5
+
+   import org.apache.arrow.vector.types.pojo.Schema;
+   import static java.util.Arrays.asList;
+
+   // create a definition
+   Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> schemaPerson
+
+   schemaPerson ==> Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
+
+Populate data
+=============
+
+.. code-block:: java
+   :emphasize-lines: 3,12-15
+
+   import org.apache.arrow.vector.*;
+
+   VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+   // getting field vectors
+   VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+   VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+   IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+   ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+   // add values to the field vectors
+   setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+   setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+   setVector(ageVectorOption1, 10,20,30);
+   setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+
+   vectorSchemaRoot.setRowCount(3);
+
+Render data & metadata:
+
+.. code-block:: java
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString());
+
+   name    document    age  points
+   david   A            10  [1,3,5,7,9]
+   gladis  B            20  [2,4,6,8,10]
+   juan    C            30  [1,2,3,5,8]
+
+   jshell> System.out.println(documentVectorOption1.getField().getMetadata());
+
+   {A=Id card, B=Passport, C=Visa}
+
+Create the schema from json
+===========================
+
+For this json definition:

Review comment:
       Also, I'm not sure about promoting a separate serialization format for schemas. What about instead demonstrating how to serialize/deserialize a schema? (Though note that Schema.serialize is _not_ compatible with Python/C++.)

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;

Review comment:
       Hmm. Can we avoid wildcard imports? At least, we've avoided them so far and it obscures what comes from where.

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right); // equal or unequal
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1); visitor.equals(left2);
+
+   true
+   false
+
+Compare values on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 15-17
+
+   import org.apache.arrow.algorithm.sort.StableVectorComparator;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.VarCharVector;
+
+   // compare two values at the given indices in the vectors.
+   // comparing org.apache.arrow.algorithm.sort.VectorValueComparator on algorithm
+   VarCharVector vec = new VarCharVector("valueindexcomparator", rootAllocator);
+   vec.allocateNew(100, 5);
+   vec.setValueCount(10);
+   vec.set(0, "ba".getBytes());
+   vec.set(1, "abc".getBytes());
+   vec.set(2, "aa".getBytes());
+   vec.set(3, "abc".getBytes());
+   vec.set(4, "a".getBytes());
+   VectorValueComparator<VarCharVector> comparatorValues = new TestVarCharSorter(); // less than, equal to, greater than
+   VectorValueComparator<VarCharVector> stableComparator = new StableVectorComparator<>(comparatorValues);//Stable comparator only supports comparing values from the same vector
+   stableComparator.attachVector(vec);
+
+Comparing two values at the given indices in the vectors:
+
+.. code-block:: java
+   :emphasize-lines: 1-8
+
+   jshell> stableComparator.compare(0, 1) > 0; stableComparator.compare(1, 2) < 0; stableComparator.compare(2, 3) < 0; stableComparator.compare(1, 3) < 0; stableComparator.compare(3, 1) > 0; stableComparator.compare(3, 3) == 0;
+
+   true
+   true
+   true
+   true
+   true
+   true
+
+Search values on the array
+==========================
+
+Linear search - O(n)
+********************
+
+Algorithm: org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+
+.. code-block:: java
+   :emphasize-lines: 27
+
+   import org.apache.arrow.algorithm.search.VectorSearcher;
+   import org.apache.arrow.algorithm.sort.DefaultVectorComparators;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.IntVector;
+
+   // search values on the array
+   // linear search org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+   IntVector rawVector = new IntVector("", rootAllocator);
+   IntVector negVector = new IntVector("", rootAllocator);
+   rawVector.allocateNew(10);
+   rawVector.setValueCount(10);
+   negVector.allocateNew(1);
+   negVector.setValueCount(1);
+   for (int i = 0; i < 10; i++) { // prepare data in sorted order
+    if (i == 0) {

Review comment:
       Can we try to be consistent about indent spacing?

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file

Review comment:
       These comments just restate the section title.

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6

Review comment:
       If we're emphasizing all lines, I don't think there's a point.

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());
+   writer.start();
+   writer.writeBatch();
+   writer.end();
+
+Write - random access to buffer
+-------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+   import java.nio.channels.Channels;
+
+   // write - random access to buffer
+   ByteArrayOutputStream out = new ByteArrayOutputStream();
+   ArrowFileWriter writerBuffer = new ArrowFileWriter(vectorSchemaRoot, null, Channels.newChannel(out));
+   writerBuffer.start();
+   writerBuffer.writeBatch();
+   writerBuffer.end();
+
+
+Writing arrays with the IPC streamed format
+*******************************************
+
+Write - Streaming to file
+-------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // streaming format
+   // write - streaming to file
+   File fileStream = new File("streaming.arrow");
+   FileOutputStream fileOutputStreamforStream = new FileOutputStream(fileStream);
+   ArrowStreamWriter writerStream = new ArrowStreamWriter(vectorSchemaRoot, null, fileOutputStreamforStream);
+   writerStream.start();
+   writerStream.writeBatch();
+   writerStream.end();
+
+Write - Streaming to buffer
+---------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // write - streaming to buffer
+   ByteArrayOutputStream outBuffer = new ByteArrayOutputStream();
+   ArrowStreamWriter writerStreamBuffer = new ArrowStreamWriter(vectorSchemaRoot, null, outBuffer);
+   writerStreamBuffer.start();
+   writerStreamBuffer.writeBatch();
+   writerStreamBuffer.end();
+
+Read array
+==========
+
+Arrow vectors that have been written to disk in the Arrow IPC
+format can be memory mapped back directly from the disk. There 
+are two option: Random access format & Streaming format
+
+Read arrays with the IPC file format
+************************************
+
+Read - random access to file
+----------------------------
+
+Consider: Before to run next code you need to write array to file with `Write - random access to file`_.
+
+.. code-block:: java
+   :emphasize-lines: 7
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // read - random access to file
+   FileInputStream fileInputStream = new FileInputStream(file);
+   ArrowFileReader reader = new ArrowFileReader(fileInputStream.getChannel(), rootAllocator);

Review comment:
       Does this guarantee memory mapping?

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());
+   writer.start();
+   writer.writeBatch();
+   writer.end();
+
+Write - random access to buffer
+-------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+   import java.nio.channels.Channels;
+
+   // write - random access to buffer
+   ByteArrayOutputStream out = new ByteArrayOutputStream();
+   ArrowFileWriter writerBuffer = new ArrowFileWriter(vectorSchemaRoot, null, Channels.newChannel(out));
+   writerBuffer.start();
+   writerBuffer.writeBatch();
+   writerBuffer.end();
+
+
+Writing arrays with the IPC streamed format
+*******************************************
+
+Write - Streaming to file
+-------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // streaming format
+   // write - streaming to file
+   File fileStream = new File("streaming.arrow");
+   FileOutputStream fileOutputStreamforStream = new FileOutputStream(fileStream);
+   ArrowStreamWriter writerStream = new ArrowStreamWriter(vectorSchemaRoot, null, fileOutputStreamforStream);
+   writerStream.start();
+   writerStream.writeBatch();
+   writerStream.end();
+
+Write - Streaming to buffer
+---------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // write - streaming to buffer
+   ByteArrayOutputStream outBuffer = new ByteArrayOutputStream();
+   ArrowStreamWriter writerStreamBuffer = new ArrowStreamWriter(vectorSchemaRoot, null, outBuffer);
+   writerStreamBuffer.start();
+   writerStreamBuffer.writeBatch();
+   writerStreamBuffer.end();
+
+Read array
+==========
+
+Arrow vectors that have been written to disk in the Arrow IPC
+format can be memory mapped back directly from the disk. There 
+are two option: Random access format & Streaming format

Review comment:
       Do we need to repeat this?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:

Review comment:
       So at this point, I'm not sure if these utilities are actually helpful, vs. just manually calling `vector.setSafe(0, 1);` inline in the examples. 

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());

Review comment:
       Don't we need to close the writer, file, etc.?

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);

Review comment:
       It's unexplained what allocateNew, set, setSafe, setValueCount, etc. actually do and why you might need them.

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type

Review comment:
       As a general point, I don't think most of the code comments here have clarified things.

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.
+
+.. code-block:: java
+   :emphasize-lines: 5
+
+   import org.apache.arrow.vector.types.pojo.Schema;
+   import static java.util.Arrays.asList;
+
+   // create a definition
+   Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> schemaPerson
+
+   schemaPerson ==> Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
+
+Populate data
+=============
+
+.. code-block:: java
+   :emphasize-lines: 3,12-15
+
+   import org.apache.arrow.vector.*;
+
+   VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+   // getting field vectors
+   VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+   VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+   IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+   ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+   // add values to the field vectors
+   setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+   setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+   setVector(ageVectorOption1, 10,20,30);
+   setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+
+   vectorSchemaRoot.setRowCount(3);
+
+Render data & metadata:
+
+.. code-block:: java
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString());
+
+   name    document    age  points
+   david   A            10  [1,3,5,7,9]
+   gladis  B            20  [2,4,6,8,10]
+   juan    C            30  [1,2,3,5,8]
+
+   jshell> System.out.println(documentVectorOption1.getField().getMetadata());
+
+   {A=Id card, B=Passport, C=Visa}
+
+Create the schema from json
+===========================
+
+For this json definition:

Review comment:
       Hmm. Is this JSON format defined somewhere/stable?

##########
File path: java/source/usecase.rst
##########
@@ -0,0 +1,277 @@
+========
+Use Case

Review comment:
       It seems these could all go under data manipulation.

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.

Review comment:
       `union` is an overloaded word since it's also a type. Maybe `A Schema is a list of Fields, where each Field is a name and a type.`

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array

Review comment:
       Also the title is a little unclear to me…"Compare Vectors for Field Equality"?

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);

Review comment:
       There's existing documentation about this: https://arrow.apache.org/docs/java/vector.html
   
   Is it possible to adapt that, or link to it? (If we link to it, we should set up the intersphinx plugin for this cookbook.)

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());
+   writer.start();
+   writer.writeBatch();
+   writer.end();
+
+Write - random access to buffer
+-------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+   import java.nio.channels.Channels;
+
+   // write - random access to buffer
+   ByteArrayOutputStream out = new ByteArrayOutputStream();
+   ArrowFileWriter writerBuffer = new ArrowFileWriter(vectorSchemaRoot, null, Channels.newChannel(out));
+   writerBuffer.start();
+   writerBuffer.writeBatch();
+   writerBuffer.end();
+
+
+Writing arrays with the IPC streamed format
+*******************************************
+
+Write - Streaming to file
+-------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // streaming format
+   // write - streaming to file
+   File fileStream = new File("streaming.arrow");
+   FileOutputStream fileOutputStreamforStream = new FileOutputStream(fileStream);
+   ArrowStreamWriter writerStream = new ArrowStreamWriter(vectorSchemaRoot, null, fileOutputStreamforStream);
+   writerStream.start();
+   writerStream.writeBatch();
+   writerStream.end();
+
+Write - Streaming to buffer
+---------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // write - streaming to buffer
+   ByteArrayOutputStream outBuffer = new ByteArrayOutputStream();
+   ArrowStreamWriter writerStreamBuffer = new ArrowStreamWriter(vectorSchemaRoot, null, outBuffer);
+   writerStreamBuffer.start();
+   writerStreamBuffer.writeBatch();
+   writerStreamBuffer.end();
+
+Read array
+==========
+
+Arrow vectors that have been written to disk in the Arrow IPC
+format can be memory mapped back directly from the disk. There 
+are two option: Random access format & Streaming format
+
+Read arrays with the IPC file format
+************************************
+
+Read - random access to file
+----------------------------
+
+Consider: Before to run next code you need to write array to file with `Write - random access to file`_.
+
+.. code-block:: java
+   :emphasize-lines: 7
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // read - random access to file
+   FileInputStream fileInputStream = new FileInputStream(file);
+   ArrowFileReader reader = new ArrowFileReader(fileInputStream.getChannel(), rootAllocator);

Review comment:
       If not, how do we enable it?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name

Review comment:
       contain?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name

Review comment:
       Also, "Tables" aren't a concept in the Java library.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] amol- commented on pull request #113: [Java]: WIP Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
amol- commented on pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#issuecomment-996814152


   Just a reminder, once the Cookbook is merged, make sure to open a PR to add a reference to it from the Arrow Docs: https://github.com/apache/arrow/blob/master/docs/source/index.rst?plain=1#L58-L66


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] lidavidm commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r778826343



##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.

Review comment:
       Can we link to Javadocs? Is there an extension that can automate this in the same way intersphinx does for Sphinx docs?

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {

Review comment:
       Sorter -> Comparator?

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.

Review comment:
       Maybe we should explain that vectors in Java are intended to be mutable, since that will also be foreign to users of other Arrow libraries.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.

Review comment:
       ```suggestion
   A vector is the basic unit in the Arrow Java library.
   ```

##########
File path: java/source/conf.py
##########
@@ -0,0 +1,55 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+
+
+# -- Project information -----------------------------------------------------
+
+project = 'java-cookbook'
+copyright = '2021, apache arrow'
+author = 'apache arrow'
+
+# The full version, including alpha/beta/rc tags
+release = 'arrow cookbook'

Review comment:
       It seems other cookbooks don't have 'release' and have different values for copyright/author, can we be consistent? (Also, it's 2022 now.)

##########
File path: java/source/demo/.cp.txt
##########
@@ -0,0 +1 @@
+/Users/dsusanibar/.m2/repository/com/google/code/findbugs/jsr305/3.0.2/jsr305-3.0.2.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-memory-netty/6.0.0/arrow-memory-netty-6.0.0.jar:/Users/dsusanibar/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.11.4/jackson-core-2.11.4.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-algorithm/6.0.0/arrow-algorithm-6.0.0.jar:/Users/dsusanibar/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.11.4/jackson-annotations-2.11.4.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-vector/6.0.0/arrow-vector-6.0.0.jar:/Users/dsusanibar/.m2/repository/org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-memory-core/6.0.0/arrow-memory-core-6.0.0.jar:/Users/dsusanibar/.m2/repository/com/google/flatbuffers/flatbuffers-java/1.12.0/flatbuffers-java-1.12.0.jar:/Users/dsusanibar/.m2/repository/io/netty/netty-common/4.1.68.Final/netty-common-4.1.68.Final.jar:/Use
 rs/dsusanibar/.m2/repository/io/netty/netty-buffer/4.1.68.Final/netty-buffer-4.1.68.Final.jar:/Users/dsusanibar/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.11.4/jackson-databind-2.11.4.jar:/Users/dsusanibar/.m2/repository/commons-codec/commons-codec/1.10/commons-codec-1.10.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-format/6.0.0/arrow-format-6.0.0.jar

Review comment:
       Was this meant to be committed?

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Array of int
+============
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   // create int vector
+   IntVector intVector = new IntVector("intVector", rootAllocator);

Review comment:
       It might be helpful to explain that this gets used as the field name? (Coming off of Python/C++, this will be rather foreign.)

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);

Review comment:
       Wow, I didn't realize ListVector required you to manipulate the offsets directly.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Array of int
+============
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   // create int vector

Review comment:
       Is the comment helpful here? The example itself is pretty minimal

##########
File path: Makefile
##########
@@ -1,7 +1,7 @@
 all: html
 
 
-html: py r
+html: py r j

Review comment:
       nit: why not write out `java` instead of `j`? (though I realize Python is abbreviated `py`)

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects

Review comment:
       ```suggestion
   Creating Arrow Objects
   ```

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right); // equal or unequal

Review comment:
       I'm not sure the comment helps here.

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array

Review comment:
       Should we try to be consistent about calling it a "vector" instead of an "array"?

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects

Review comment:
       If this is meant to be analogous to the pages for C++/Python, can we add a subheading for Vectors, and then possibly a subheading for VectorSchemaRoot?

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right); // equal or unequal
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1); visitor.equals(left2);

Review comment:
       nit: for clarity, why not separate lines?

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Array of int

Review comment:
       maybe ```Array of ``Int`` (32-bit integer)``` to assume less familiarity with the Java API?

##########
File path: Makefile
##########
@@ -1,7 +1,7 @@
 all: html
 
 
-html: py r
+html: py r j

Review comment:
       The same goes below for paths.

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.

Review comment:
       What is the distinction between FieldVector and ValueVector?

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.

Review comment:
       I'm not going to go through and mark all of these, but let's please make sure to capitalize names properly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782249041



##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6

Review comment:
       Changed to only consider jshell commands

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file

Review comment:
       Deleted




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781610506



##########
File path: Makefile
##########
@@ -1,7 +1,7 @@
 all: html
 
 
-html: py r
+html: py r j

Review comment:
       Updated
   
   > It might be worth writing a custom Sphinx extension like we did for C++ for these examples, especially since we're apparently duplicating code between examples and actual Java code.
   > 
   > For instance, maybe we could have something that accepts a file containing Java code, and the block is a series of JShell statements. Then for a docs build, it would just render the Java code and the JShell session. For a test build, it would run the Java code, then run the JShell statements, and check that the output matches what's expected.
   
   In this case >jshell offer a "/edit" command when we could paste all our util code to define methods that then could be used for cookbook recipes or run all the cookbook code and then only validate output for /edit code and also /reset to do it again




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] amol- commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r783120749



##########
File path: Makefile
##########
@@ -13,6 +13,7 @@ help:
 	@echo "make test        Test cookbook for all platforms."
 	@echo "make py          Build the Cookbook for Python only."
 	@echo "make r           Build the Cookbook for R only."
+	@echo "make java        Build the Cookbook for Java only."

Review comment:
       `make java` builds the HTML output, but how do we test that the examples still work?
   
   All the other cookbooks have a `make pytest` or `make rtest` that verifies the examples agains the current release of arrow.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782250471



##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:

Review comment:
       Updated to consider run every recipe independently without needed to run other code before




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782248914



##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format

Review comment:
       Updated




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781622584



##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);

Review comment:
       We are using [java unit test code](https://github.com/apache/arrow/blob/16d5554ad2010bc7d224c7e3cad9b87188c92054/java/vector/src/test/java/org/apache/arrow/vector/testing/ValueVectorDataPopulator.java#L598) as a util methods in our cookbook

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Array of int

Review comment:
       Changed

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Array of int
+============
+
+.. code-block:: java
+   :emphasize-lines: 4
+
+   import org.apache.arrow.vector.IntVector;
+
+   // create int vector

Review comment:
       Changed

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {

Review comment:
       Changed

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right); // equal or unequal

Review comment:
       Changed

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right); // equal or unequal
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1); visitor.equals(left2);

Review comment:
       Changed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781622487



##########
File path: java/source/demo/.cp.txt
##########
@@ -0,0 +1 @@
+/Users/dsusanibar/.m2/repository/com/google/code/findbugs/jsr305/3.0.2/jsr305-3.0.2.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-memory-netty/6.0.0/arrow-memory-netty-6.0.0.jar:/Users/dsusanibar/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.11.4/jackson-core-2.11.4.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-algorithm/6.0.0/arrow-algorithm-6.0.0.jar:/Users/dsusanibar/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.11.4/jackson-annotations-2.11.4.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-vector/6.0.0/arrow-vector-6.0.0.jar:/Users/dsusanibar/.m2/repository/org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-memory-core/6.0.0/arrow-memory-core-6.0.0.jar:/Users/dsusanibar/.m2/repository/com/google/flatbuffers/flatbuffers-java/1.12.0/flatbuffers-java-1.12.0.jar:/Users/dsusanibar/.m2/repository/io/netty/netty-common/4.1.68.Final/netty-common-4.1.68.Final.jar:/Use
 rs/dsusanibar/.m2/repository/io/netty/netty-buffer/4.1.68.Final/netty-buffer-4.1.68.Final.jar:/Users/dsusanibar/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.11.4/jackson-databind-2.11.4.jar:/Users/dsusanibar/.m2/repository/commons-codec/commons-codec/1.10/commons-codec-1.10.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-format/6.0.0/arrow-format-6.0.0.jar

Review comment:
       Deleted




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r781625568



##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right); // equal or unequal
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1); visitor.equals(left2);
+
+   true
+   false
+
+Compare values on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 15-17
+
+   import org.apache.arrow.algorithm.sort.StableVectorComparator;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.VarCharVector;
+
+   // compare two values at the given indices in the vectors.
+   // comparing org.apache.arrow.algorithm.sort.VectorValueComparator on algorithm
+   VarCharVector vec = new VarCharVector("valueindexcomparator", rootAllocator);
+   vec.allocateNew(100, 5);
+   vec.setValueCount(10);
+   vec.set(0, "ba".getBytes());
+   vec.set(1, "abc".getBytes());
+   vec.set(2, "aa".getBytes());
+   vec.set(3, "abc".getBytes());
+   vec.set(4, "a".getBytes());
+   VectorValueComparator<VarCharVector> comparatorValues = new TestVarCharSorter(); // less than, equal to, greater than
+   VectorValueComparator<VarCharVector> stableComparator = new StableVectorComparator<>(comparatorValues);//Stable comparator only supports comparing values from the same vector
+   stableComparator.attachVector(vec);
+
+Comparing two values at the given indices in the vectors:
+
+.. code-block:: java
+   :emphasize-lines: 1-8
+
+   jshell> stableComparator.compare(0, 1) > 0; stableComparator.compare(1, 2) < 0; stableComparator.compare(2, 3) < 0; stableComparator.compare(1, 3) < 0; stableComparator.compare(3, 1) > 0; stableComparator.compare(3, 3) == 0;

Review comment:
       Changed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782129089



##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name

Review comment:
       Changed

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.

Review comment:
       Thanks, changed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782132581



##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());

Review comment:
       We decided to move all of this best practices to [java tutorial ticket](https://issues.apache.org/jira/browse/ARROW-15156) 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org