Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/12 12:27:09 UTC

[GitHub] [arrow-cookbook] lidavidm commented on a diff in pull request #183: ARROW-16170: [Java][Docs] Synch java code tutorial with java cookbook

lidavidm commented on code in PR #183:
URL: https://github.com/apache/arrow-cookbook/pull/183#discussion_r848369016


##########
java/source/data.rst:
##########
@@ -15,7 +15,7 @@ Compare Vectors for Field Equality
     import org.apache.arrow.vector.compare.TypeEqualsVisitor;
     import org.apache.arrow.memory.RootAllocator;
 
-    RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE);
+    RootAllocator rootAllocator = new RootAllocator();

Review Comment:
   Can we fix all of these to be `BufferAllocator allocator = ...`?
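
   A minimal sketch of the declaration style being asked for, assuming the
   Arrow Java `arrow-vector` artifact and an `arrow-memory` implementation are
   on the classpath. The class name `AllocatorDeclaration`, the method
   `lastValue`, and the vector name "ints" are illustrative, not taken from
   the PR.

   ```java
   import org.apache.arrow.memory.BufferAllocator;
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.IntVector;

   public class AllocatorDeclaration {
       // Writes three ints into a vector and returns the last one.
       static int lastValue() {
           // Declare the allocator by its interface type; RootAllocator is
           // only the concrete implementation chosen here.
           try (BufferAllocator allocator = new RootAllocator();
                IntVector vector = new IntVector("ints", allocator)) {
               vector.allocateNew(3);
               vector.set(0, 1);
               vector.set(1, 2);
               vector.set(2, 3);
               vector.setValueCount(3);
               return vector.get(2);
           }
       }

       public static void main(String[] args) {
           System.out.println(lastValue());
       }
   }
   ```

   Resources in a try-with-resources statement are closed in reverse order,
   so the vector is released before the allocator that backs it.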



##########
java/source/io.rst:
##########
@@ -33,30 +34,34 @@ Write - Out to File
     import org.apache.arrow.vector.VectorSchemaRoot;
     import static java.util.Arrays.asList;
     import org.apache.arrow.vector.ipc.ArrowFileWriter;
-
     import java.io.File;
     import java.io.FileOutputStream;
     import java.io.IOException;
 
-    try (RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE)) {
+    try (
+        BufferAllocator allocator = new RootAllocator()

Review Comment:
   Why the extra lines here?



##########
java/source/schema.rst:
##########
@@ -58,7 +58,7 @@ Definition of columnar fields for string (name), integer (age) and array (points
 
     points: List<intCol: Int(32, true)>
 
-Define Metadata for Field
+Adding Metadata for Field
 =========================

Review Comment:
   ```suggestion
   Adding Metadata to Fields
   =========================
   ```



##########
java/source/schema.rst:
##########
@@ -82,10 +81,10 @@ In case we need to add metadata to our definition we could use:
 
     {A=Id card, B=Passport, C=Visa}
 
-Create the Schema
-=================
+Creating the Schema
+===================
 
-A schema is a list of Fields, where each Field is defined by name and type.
+A Schema describe a sequence of columns in tabular data.

Review Comment:
   ```suggestion
   A schema describes a sequence of columns in tabular data, and consists
   of a list of fields.
   ```



##########
java/source/create.rst:
##########
@@ -4,33 +4,41 @@
 Creating Arrow Objects
 ======================
 
-| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+A vector is the basic unit in the Arrow Java library. Data types
+describe the types of values; ValueVectors are sequences of typed
+values. Vector by definition is intended to be mutable, a Vector
+can be changed it is mutable. Vector represent a one-dimensional
+sequence of homogeneous values.

Review Comment:
   ```suggestion
   values. Vectors represent a one-dimensional sequence of values of
   the same type. They are mutable containers.
   ```



##########
java/source/create.rst:
##########
@@ -65,81 +76,38 @@ Array of List
 
 .. testcode::
 
+    import org.apache.arrow.memory.BufferAllocator;
     import org.apache.arrow.memory.RootAllocator;
     import org.apache.arrow.vector.complex.impl.UnionListWriter;
     import org.apache.arrow.vector.complex.ListVector;
 
-    RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE);
-    ListVector listVector = ListVector.empty("listVector", rootAllocator);
-    UnionListWriter listWriter = listVector.getWriter();
-    int[] data = new int[] { 1, 2, 3, 10, 20, 30, 100, 200, 300, 1000, 2000, 3000 };
-    int tmp_index = 0;
-    for(int i = 0; i < 4; i++) {
-        listWriter.setPosition(i);
-        listWriter.startList();
-        for(int j = 0; j < 3; j++) {
-            listWriter.writeInt(data[tmp_index]);
-            tmp_index = tmp_index + 1;
+    try(
+        BufferAllocator allocator = new RootAllocator();
+        ListVector listVector = ListVector.empty("listVector", allocator);
+        UnionListWriter listWriter = listVector.getWriter()
+    ) {
+        int[] data = new int[] { 1, 2, 3, 10, 20, 30, 100, 200, 300, 1000, 2000, 3000 };
+        int tmp_index = 0;
+        for(int i = 0; i < 4; i++) {
+            listWriter.setPosition(i);
+            listWriter.startList();
+            for(int j = 0; j < 3; j++) {
+                listWriter.writeInt(data[tmp_index]);
+                tmp_index = tmp_index + 1;
+            }
+            listWriter.setValueCount(3);
+            listWriter.endList();
         }
-        listWriter.setValueCount(3);
-        listWriter.endList();
-    }
-    listVector.setValueCount(4);
+        listVector.setValueCount(4);
 
-    System.out.print(listVector);
+        System.out.print(listVector);
+    } catch (Exception e) {
+        e.printStackTrace();
+    }
 
 .. testoutput::
 
     [[1,2,3], [10,20,30], [100,200,300], [1000,2000,3000]]
 
-Creating VectorSchemaRoot (Table)

Review Comment:
   IMO we should keep the VectorSchemaRoot example rather than removing this section.



##########
java/source/schema.rst:
##########
@@ -94,122 +93,135 @@ A schema is a list of Fields, where each Field is defined by name and type.
     import org.apache.arrow.vector.types.pojo.ArrowType;
     import org.apache.arrow.vector.types.pojo.Field;
     import org.apache.arrow.vector.types.pojo.FieldType;
+    import java.util.ArrayList;
+    import java.util.List;
 
     Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
-    Map<String, String> metadata = new HashMap<>();
-    metadata.put("A", "Id card");
-    metadata.put("B", "Passport");
-    metadata.put("C", "Visa");
-    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null), null);
     Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
     FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
     FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
     Field childField = new Field("intCol", intType, null);
     List<Field> childFields = new ArrayList<>();
     childFields.add(childField);
     Field points = new Field("points", listType, childFields);
-
-    // create a definition
     Schema schemaPerson = new Schema(asList(name, document, age, points));
 
-    System.out.print(schemaPerson)
+    System.out.print(schemaPerson);
 
 .. testoutput::
 
     Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
 
+Adding Metadata for Schema

Review Comment:
   IMO, we can just have one section for both field and schema metadata instead of separating them.
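
   As a rough sketch of what a single combined example could look like,
   reusing the metadata keys from the diff: the class name
   `FieldAndSchemaMetadata` and the method `buildSchema` are hypothetical, and
   the fields are trimmed to two for brevity.

   ```java
   import org.apache.arrow.vector.types.pojo.ArrowType;
   import org.apache.arrow.vector.types.pojo.Field;
   import org.apache.arrow.vector.types.pojo.FieldType;
   import org.apache.arrow.vector.types.pojo.Schema;

   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.Map;

   public class FieldAndSchemaMetadata {
       // Builds a schema that carries metadata on one field and on the schema itself.
       static Schema buildSchema() {
           // Field-level metadata is attached through the FieldType.
           Map<String, String> fieldMetadata = new HashMap<>();
           fieldMetadata.put("A", "Id card");
           Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
           Field document = new Field("document",
               new FieldType(true, new ArrowType.Utf8(), /*dictionary=*/null, fieldMetadata), null);

           // Schema-level metadata is passed to the Schema constructor.
           Map<String, String> schemaMetadata = new HashMap<>();
           schemaMetadata.put("Key-1", "Value-1");
           return new Schema(Arrays.asList(name, document), schemaMetadata);
       }

       public static void main(String[] args) {
           Schema schema = buildSchema();
           System.out.println(schema.findField("document").getMetadata());
           System.out.println(schema.getCustomMetadata());
       }
   }
   ```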



##########
java/source/io.rst:
##########
@@ -87,33 +93,39 @@ Write - Out to Buffer
     import org.apache.arrow.vector.VectorSchemaRoot;
     import static java.util.Arrays.asList;
     import org.apache.arrow.vector.ipc.ArrowFileWriter;
-
     import java.io.ByteArrayOutputStream;
     import java.io.IOException;
     import java.nio.channels.Channels;
 
-    try (RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE)) {
+    try (
+        BufferAllocator allocator = new RootAllocator()
+    ) {

Review Comment:
   Same here, why the extra lines?



##########
java/source/schema.rst:
##########
@@ -94,122 +93,135 @@ A schema is a list of Fields, where each Field is defined by name and type.
     import org.apache.arrow.vector.types.pojo.ArrowType;
     import org.apache.arrow.vector.types.pojo.Field;
     import org.apache.arrow.vector.types.pojo.FieldType;
+    import java.util.ArrayList;
+    import java.util.List;
 
     Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
-    Map<String, String> metadata = new HashMap<>();
-    metadata.put("A", "Id card");
-    metadata.put("B", "Passport");
-    metadata.put("C", "Visa");
-    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null), null);
     Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
     FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
     FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
     Field childField = new Field("intCol", intType, null);
     List<Field> childFields = new ArrayList<>();
     childFields.add(childField);
     Field points = new Field("points", listType, childFields);
-
-    // create a definition
     Schema schemaPerson = new Schema(asList(name, document, age, points));
 
-    System.out.print(schemaPerson)
+    System.out.print(schemaPerson);
 
 .. testoutput::
 
     Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
 
+Adding Metadata for Schema

Review Comment:
   ```suggestion
   Adding Metadata to Schemas
   ```



##########
java/source/create.rst:
##########
@@ -4,33 +4,41 @@
 Creating Arrow Objects
 ======================
 
-| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+A vector is the basic unit in the Arrow Java library. Data types
+describe the types of values; ValueVectors are sequences of typed
+values. Vector by definition is intended to be mutable, a Vector
+can be changed it is mutable. Vector represent a one-dimensional
+sequence of homogeneous values.
 
-| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.
+Vectors are provided by java arrow for the interface `FieldVector`_
+that extends `ValueVector`_.

Review Comment:
   ```suggestion
   Vectors implement the interface `ValueVector`_. The Arrow libraries provide
   implementations of vectors for various data types.
   ```



##########
java/source/schema.rst:
##########
@@ -94,122 +93,135 @@ A schema is a list of Fields, where each Field is defined by name and type.
     import org.apache.arrow.vector.types.pojo.ArrowType;
     import org.apache.arrow.vector.types.pojo.Field;
     import org.apache.arrow.vector.types.pojo.FieldType;
+    import java.util.ArrayList;
+    import java.util.List;
 
     Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
-    Map<String, String> metadata = new HashMap<>();
-    metadata.put("A", "Id card");
-    metadata.put("B", "Passport");
-    metadata.put("C", "Visa");
-    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null), null);
     Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
     FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
     FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
     Field childField = new Field("intCol", intType, null);
     List<Field> childFields = new ArrayList<>();
     childFields.add(childField);
     Field points = new Field("points", listType, childFields);
-
-    // create a definition
     Schema schemaPerson = new Schema(asList(name, document, age, points));
 
-    System.out.print(schemaPerson)
+    System.out.print(schemaPerson);
 
 .. testoutput::
 
     Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
 
+Adding Metadata for Schema
+==========================
+
+In case we need to add metadata to our definition we could use:
+
+.. testcode::
+
+    import org.apache.arrow.vector.types.pojo.Schema;
+    import static java.util.Arrays.asList;
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+
+    import java.util.ArrayList;
+    import java.util.HashMap;
+    import java.util.List;
+    import java.util.Map;
+
+    Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null), null);
+    Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+    FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+    FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+    Field childField = new Field("intCol", intType, null);
+    List<Field> childFields = new ArrayList<>();
+    childFields.add(childField);
+    Field points = new Field("points", listType, childFields);
+    Map<String, String> metadataSchema = new HashMap<>();
+    metadataSchema.put("Key-1", "Value-1");
+    Schema schemaPerson = new Schema(asList(name, document, age, points), metadataSchema);
+
+    System.out.print(schemaPerson);
+
+.. testoutput::
+
+    Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>(metadata: {Key-1=Value-1})
+
 Populate Data
 =============
 
+Let's populate a `VectorSchemaRoot` with a small batch of records:

Review Comment:
   ```suggestion
   Let's populate a ``VectorSchemaRoot`` with a small batch of records:
   ```



##########
java/source/schema.rst:
##########
@@ -94,122 +93,135 @@ A schema is a list of Fields, where each Field is defined by name and type.
     import org.apache.arrow.vector.types.pojo.ArrowType;
     import org.apache.arrow.vector.types.pojo.Field;
     import org.apache.arrow.vector.types.pojo.FieldType;
+    import java.util.ArrayList;
+    import java.util.List;
 
     Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
-    Map<String, String> metadata = new HashMap<>();
-    metadata.put("A", "Id card");
-    metadata.put("B", "Passport");
-    metadata.put("C", "Visa");
-    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null), null);
     Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
     FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
     FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
     Field childField = new Field("intCol", intType, null);
     List<Field> childFields = new ArrayList<>();
     childFields.add(childField);
     Field points = new Field("points", listType, childFields);
-
-    // create a definition
     Schema schemaPerson = new Schema(asList(name, document, age, points));
 
-    System.out.print(schemaPerson)
+    System.out.print(schemaPerson);
 
 .. testoutput::
 
     Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
 
+Adding Metadata for Schema
+==========================
+
+In case we need to add metadata to our definition we could use:
+
+.. testcode::
+
+    import org.apache.arrow.vector.types.pojo.Schema;
+    import static java.util.Arrays.asList;

Review Comment:
   Can we sort the imports?



##########
java/source/schema.rst:
##########
@@ -2,16 +2,16 @@
 Working with Schema
 ===================
 
-Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation.
-Consider that each name on the schema maps to a columns for a predefined data type
-
+Let's start talk about tabular data. Data often comes in the form of two-dimensional
+sets of heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
 
 .. contents::
 
-Define Data Type
-================
+Creating Field
+==============

Review Comment:
   ```suggestion
   Creating Fields
   ===============
   ```



##########
java/source/schema.rst:
##########
@@ -2,16 +2,16 @@
 Working with Schema
 ===================
 
-Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation.
-Consider that each name on the schema maps to a columns for a predefined data type
-
+Let's start talk about tabular data. Data often comes in the form of two-dimensional
+sets of heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.

Review Comment:
   ```suggestion
   Let's start talking about tabular data. Data often comes in the form of two-dimensional
   sets of heterogeneous data (such as database tables, CSV files...). Arrow provides
   several abstractions to handle such data conveniently and efficiently.
   ```



##########
java/source/io.rst:
##########
@@ -33,30 +34,34 @@ Write - Out to File
     import org.apache.arrow.vector.VectorSchemaRoot;
     import static java.util.Arrays.asList;
     import org.apache.arrow.vector.ipc.ArrowFileWriter;
-
     import java.io.File;
     import java.io.FileOutputStream;
     import java.io.IOException;
 
-    try (RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE)) {
+    try (
+        BufferAllocator allocator = new RootAllocator()
+    ) {
         Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
         Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
         Schema schemaPerson = new Schema(asList(name, age));
-        try(VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator)){
+        try(
+            VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, allocator);
             VarCharVector nameVector = (VarCharVector) vectorSchemaRoot.getVector("name");
+            IntVector ageVector = (IntVector) vectorSchemaRoot.getVector("age")

Review Comment:
   You shouldn't need to close the vectors if they're part of a root; the root will close them.
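
   A sketch of that ownership, assuming the same schema as the diff: only the
   root is declared as a resource, and the vectors obtained from it are used
   without being closed. The class name `RootOwnsVectors`, the method
   `allocatedAfterRootClosed`, and the sample values are illustrative.

   ```java
   import org.apache.arrow.memory.BufferAllocator;
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.IntVector;
   import org.apache.arrow.vector.VarCharVector;
   import org.apache.arrow.vector.VectorSchemaRoot;
   import org.apache.arrow.vector.types.pojo.ArrowType;
   import org.apache.arrow.vector.types.pojo.Field;
   import org.apache.arrow.vector.types.pojo.FieldType;
   import org.apache.arrow.vector.types.pojo.Schema;

   import java.nio.charset.StandardCharsets;
   import java.util.Arrays;

   public class RootOwnsVectors {
       // Returns the bytes still allocated once the root has been closed.
       static long allocatedAfterRootClosed() {
           Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
           Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
           Schema schemaPerson = new Schema(Arrays.asList(name, age));

           try (BufferAllocator allocator = new RootAllocator()) {
               try (VectorSchemaRoot root = VectorSchemaRoot.create(schemaPerson, allocator)) {
                   // These vectors belong to the root, so they are not listed as resources.
                   VarCharVector nameVector = (VarCharVector) root.getVector("name");
                   IntVector ageVector = (IntVector) root.getVector("age");

                   nameVector.allocateNew(2);
                   nameVector.setSafe(0, "Alice".getBytes(StandardCharsets.UTF_8));
                   nameVector.setSafe(1, "Bob".getBytes(StandardCharsets.UTF_8));
                   ageVector.allocateNew(2);
                   ageVector.set(0, 10);
                   ageVector.set(1, 20);
                   root.setRowCount(2);
               }
               // The root has been closed here, which also released its vectors' buffers.
               return allocator.getAllocatedMemory();
           }
       }

       public static void main(String[] args) {
           System.out.println(allocatedAfterRootClosed());
       }
   }
   ```

   If the vectors had not been released, closing the allocator would be
   expected to report a leak, so the method returning normally is itself a
   check on the ownership claim.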


