Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/12 12:27:09 UTC
[GitHub] [arrow-cookbook] lidavidm commented on a diff in pull request #183: ARROW-16170: [Java][Docs] Synch java code tutorial with java cookbook
lidavidm commented on code in PR #183:
URL: https://github.com/apache/arrow-cookbook/pull/183#discussion_r848369016
##########
java/source/data.rst:
##########
@@ -15,7 +15,7 @@ Compare Vectors for Field Equality
import org.apache.arrow.vector.compare.TypeEqualsVisitor;
import org.apache.arrow.memory.RootAllocator;
- RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE);
+ RootAllocator rootAllocator = new RootAllocator();
Review Comment:
Can we fix all of these to be `BufferAllocator allocator = ...`?
##########
java/source/io.rst:
##########
@@ -33,30 +34,34 @@ Write - Out to File
import org.apache.arrow.vector.VectorSchemaRoot;
import static java.util.Arrays.asList;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
-
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
- try (RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE)) {
+ try (
+ BufferAllocator allocator = new RootAllocator()
Review Comment:
Why the extra lines here?
##########
java/source/schema.rst:
##########
@@ -58,7 +58,7 @@ Definition of columnar fields for string (name), integer (age) and array (points
points: List<intCol: Int(32, true)>
-Define Metadata for Field
+Adding Metadata for Field
=========================
Review Comment:
```suggestion
Adding Metadata to Fields
=========================
```
##########
java/source/schema.rst:
##########
@@ -82,10 +81,10 @@ In case we need to add metadata to our definition we could use:
{A=Id card, B=Passport, C=Visa}
-Create the Schema
-=================
+Creating the Schema
+===================
-A schema is a list of Fields, where each Field is defined by name and type.
+A Schema describe a sequence of columns in tabular data.
Review Comment:
```suggestion
A schema describes a sequence of columns in tabular data, and consists
of a list of fields.
```
##########
java/source/create.rst:
##########
@@ -4,33 +4,41 @@
Creating Arrow Objects
======================
-| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+A vector is the basic unit in the Arrow Java library. Data types
+describe the types of values; ValueVectors are sequences of typed
+values. Vector by definition is intended to be mutable, a Vector
+can be changed it is mutable. Vector represent a one-dimensional
+sequence of homogeneous values.
Review Comment:
```suggestion
values. Vectors represent a one-dimensional sequence of values of
the same type. They are mutable containers.
```
##########
java/source/create.rst:
##########
@@ -65,81 +76,38 @@ Array of List
.. testcode::
+ import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.impl.UnionListWriter;
import org.apache.arrow.vector.complex.ListVector;
- RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE);
- ListVector listVector = ListVector.empty("listVector", rootAllocator);
- UnionListWriter listWriter = listVector.getWriter();
- int[] data = new int[] { 1, 2, 3, 10, 20, 30, 100, 200, 300, 1000, 2000, 3000 };
- int tmp_index = 0;
- for(int i = 0; i < 4; i++) {
- listWriter.setPosition(i);
- listWriter.startList();
- for(int j = 0; j < 3; j++) {
- listWriter.writeInt(data[tmp_index]);
- tmp_index = tmp_index + 1;
+ try(
+ BufferAllocator allocator = new RootAllocator();
+ ListVector listVector = ListVector.empty("listVector", allocator);
+ UnionListWriter listWriter = listVector.getWriter()
+ ) {
+ int[] data = new int[] { 1, 2, 3, 10, 20, 30, 100, 200, 300, 1000, 2000, 3000 };
+ int tmp_index = 0;
+ for(int i = 0; i < 4; i++) {
+ listWriter.setPosition(i);
+ listWriter.startList();
+ for(int j = 0; j < 3; j++) {
+ listWriter.writeInt(data[tmp_index]);
+ tmp_index = tmp_index + 1;
+ }
+ listWriter.setValueCount(3);
+ listWriter.endList();
}
- listWriter.setValueCount(3);
- listWriter.endList();
- }
- listVector.setValueCount(4);
+ listVector.setValueCount(4);
- System.out.print(listVector);
+ System.out.print(listVector);
+ } catch (Exception e) {
+ e.printStackTrace();
+ }
.. testoutput::
[[1,2,3], [10,20,30], [100,200,300], [1000,2000,3000]]
-Creating VectorSchemaRoot (Table)
Review Comment:
IMO we should keep the VectorSchemaRoot example
##########
java/source/schema.rst:
##########
@@ -94,122 +93,135 @@ A schema is a list of Fields, where each Field is defined by name and type.
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
+ import java.util.ArrayList;
+ import java.util.List;
Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
- Map<String, String> metadata = new HashMap<>();
- metadata.put("A", "Id card");
- metadata.put("B", "Passport");
- metadata.put("C", "Visa");
- Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+ Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null), null);
Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
Field childField = new Field("intCol", intType, null);
List<Field> childFields = new ArrayList<>();
childFields.add(childField);
Field points = new Field("points", listType, childFields);
-
- // create a definition
Schema schemaPerson = new Schema(asList(name, document, age, points));
- System.out.print(schemaPerson)
+ System.out.print(schemaPerson);
.. testoutput::
Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
+Adding Metadata for Schema
Review Comment:
IMO, we can just have one section covering both field and schema metadata instead of separating them.
##########
java/source/io.rst:
##########
@@ -87,33 +93,39 @@ Write - Out to Buffer
import org.apache.arrow.vector.VectorSchemaRoot;
import static java.util.Arrays.asList;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
-
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
- try (RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE)) {
+ try (
+ BufferAllocator allocator = new RootAllocator()
+ ) {
Review Comment:
Same here, why the extra lines?
##########
java/source/schema.rst:
##########
@@ -94,122 +93,135 @@ A schema is a list of Fields, where each Field is defined by name and type.
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
+ import java.util.ArrayList;
+ import java.util.List;
Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
- Map<String, String> metadata = new HashMap<>();
- metadata.put("A", "Id card");
- metadata.put("B", "Passport");
- metadata.put("C", "Visa");
- Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+ Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null), null);
Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
Field childField = new Field("intCol", intType, null);
List<Field> childFields = new ArrayList<>();
childFields.add(childField);
Field points = new Field("points", listType, childFields);
-
- // create a definition
Schema schemaPerson = new Schema(asList(name, document, age, points));
- System.out.print(schemaPerson)
+ System.out.print(schemaPerson);
.. testoutput::
Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
+Adding Metadata for Schema
Review Comment:
```suggestion
Adding Metadata to Schemas
```
##########
java/source/create.rst:
##########
@@ -4,33 +4,41 @@
Creating Arrow Objects
======================
-| A vector is the basic unit in the Arrow Java library. Vector by definition is intended to be mutable, a Vector can be changed it is mutable.
+A vector is the basic unit in the Arrow Java library. Data types
+describe the types of values; ValueVectors are sequences of typed
+values. Vector by definition is intended to be mutable, a Vector
+can be changed it is mutable. Vector represent a one-dimensional
+sequence of homogeneous values.
-| Vectors are provided by java arrow for the interface `FieldVector <https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/FieldVector.html>`_ that extends `ValueVector <https://arrow.apache.org/docs/java/vector.html>`_.
+Vectors are provided by java arrow for the interface `FieldVector`_
+that extends `ValueVector`_.
Review Comment:
```suggestion
Vectors implement the interface `ValueVector`_. The Arrow libraries provide
implementations of vectors for various data types.
```
##########
java/source/schema.rst:
##########
@@ -94,122 +93,135 @@ A schema is a list of Fields, where each Field is defined by name and type.
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
+ import java.util.ArrayList;
+ import java.util.List;
Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
- Map<String, String> metadata = new HashMap<>();
- metadata.put("A", "Id card");
- metadata.put("B", "Passport");
- metadata.put("C", "Visa");
- Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+ Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null), null);
Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
Field childField = new Field("intCol", intType, null);
List<Field> childFields = new ArrayList<>();
childFields.add(childField);
Field points = new Field("points", listType, childFields);
-
- // create a definition
Schema schemaPerson = new Schema(asList(name, document, age, points));
- System.out.print(schemaPerson)
+ System.out.print(schemaPerson);
.. testoutput::
Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
+Adding Metadata for Schema
+==========================
+
+In case we need to add metadata to our definition we could use:
+
+.. testcode::
+
+ import org.apache.arrow.vector.types.pojo.Schema;
+ import static java.util.Arrays.asList;
+ import org.apache.arrow.vector.types.pojo.ArrowType;
+ import org.apache.arrow.vector.types.pojo.Field;
+ import org.apache.arrow.vector.types.pojo.FieldType;
+
+ import java.util.ArrayList;
+ import java.util.HashMap;
+ import java.util.List;
+ import java.util.Map;
+
+ Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+ Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null), null);
+ Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+ FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+ FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+ Field childField = new Field("intCol", intType, null);
+ List<Field> childFields = new ArrayList<>();
+ childFields.add(childField);
+ Field points = new Field("points", listType, childFields);
+ Map<String, String> metadataSchema = new HashMap<>();
+ metadataSchema.put("Key-1", "Value-1");
+ Schema schemaPerson = new Schema(asList(name, document, age, points), metadataSchema);
+
+ System.out.print(schemaPerson);
+
+.. testoutput::
+
+ Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>(metadata: {Key-1=Value-1})
+
Populate Data
=============
+Let's populate a `VectorSchemaRoot` with a small batch of records:
Review Comment:
```suggestion
Let's populate a ``VectorSchemaRoot`` with a small batch of records:
```
##########
java/source/schema.rst:
##########
@@ -94,122 +93,135 @@ A schema is a list of Fields, where each Field is defined by name and type.
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
+ import java.util.ArrayList;
+ import java.util.List;
Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
- Map<String, String> metadata = new HashMap<>();
- metadata.put("A", "Id card");
- metadata.put("B", "Passport");
- metadata.put("C", "Visa");
- Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+ Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null), null);
Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
Field childField = new Field("intCol", intType, null);
List<Field> childFields = new ArrayList<>();
childFields.add(childField);
Field points = new Field("points", listType, childFields);
-
- // create a definition
Schema schemaPerson = new Schema(asList(name, document, age, points));
- System.out.print(schemaPerson)
+ System.out.print(schemaPerson);
.. testoutput::
Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
+Adding Metadata for Schema
+==========================
+
+In case we need to add metadata to our definition we could use:
+
+.. testcode::
+
+ import org.apache.arrow.vector.types.pojo.Schema;
+ import static java.util.Arrays.asList;
Review Comment:
Can we sort the imports?
##########
java/source/schema.rst:
##########
@@ -2,16 +2,16 @@
Working with Schema
===================
-Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation.
-Consider that each name on the schema maps to a columns for a predefined data type
-
+Let's start talk about tabular data. Data often comes in the form of two-dimensional
+sets of heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
.. contents::
-Define Data Type
-================
+Creating Field
+==============
Review Comment:
```suggestion
Creating Fields
===============
```
##########
java/source/schema.rst:
##########
@@ -2,16 +2,16 @@
Working with Schema
===================
-Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation.
-Consider that each name on the schema maps to a columns for a predefined data type
-
+Let's start talk about tabular data. Data often comes in the form of two-dimensional
+sets of heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
Review Comment:
```suggestion
Let's start talking about tabular data. Data often comes in the form of two-dimensional
sets of heterogeneous data (such as database tables, CSV files...). Arrow provides
several abstractions to handle such data conveniently and efficiently.
```
##########
java/source/io.rst:
##########
@@ -33,30 +34,34 @@ Write - Out to File
import org.apache.arrow.vector.VectorSchemaRoot;
import static java.util.Arrays.asList;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
-
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
- try (RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE)) {
+ try (
+ BufferAllocator allocator = new RootAllocator()
+ ) {
Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
Schema schemaPerson = new Schema(asList(name, age));
- try(VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator)){
+ try(
+ VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, allocator);
VarCharVector nameVector = (VarCharVector) vectorSchemaRoot.getVector("name");
+ IntVector ageVector = (IntVector) vectorSchemaRoot.getVector("age")
Review Comment:
You shouldn't need to close the vectors if they're part of a root; the root will close them.
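To illustrate the ownership point, here is a minimal stdlib-only sketch. `SketchRoot` and `SketchVector` are hypothetical stand-ins, not Arrow classes; the sketch assumes only that a root owns the vectors it hands out and closes them in its own `close()`. It counts `close()` calls to show that listing a vector as an extra try-with-resources resource adds a redundant second close.

```java
import java.util.ArrayList;
import java.util.List;

class RootOwnershipSketch {

    // Hypothetical stand-in for a vector: counts how often close() is called.
    static final class SketchVector implements AutoCloseable {
        final String name;
        int closeCalls = 0;

        SketchVector(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closeCalls++;
        }
    }

    // Hypothetical stand-in for a root: it owns its vectors and closes them itself.
    static final class SketchRoot implements AutoCloseable {
        private final List<SketchVector> vectors = new ArrayList<>();

        SketchVector addVector(String name) {
            SketchVector vector = new SketchVector(name);
            vectors.add(vector);
            return vector;
        }

        @Override
        public void close() {
            for (SketchVector vector : vectors) {
                vector.close();
            }
        }
    }

    // Only the root is a managed resource; the vector is a plain local variable.
    static int closeCallsWhenOnlyRootIsManaged() {
        SketchVector name;
        try (SketchRoot root = new SketchRoot()) {
            name = root.addVector("name");
        }
        return name.closeCalls;
    }

    // The vector is also declared as a resource, so it is closed once
    // directly and once more when the root closes.
    static int closeCallsWhenVectorIsAlsoManaged() {
        SketchVector seen;
        try (SketchRoot root = new SketchRoot();
             SketchVector name = root.addVector("name")) {
            seen = name;
        }
        return seen.closeCalls;
    }

    public static void main(String[] args) {
        System.out.println(closeCallsWhenOnlyRootIsManaged());
        System.out.println(closeCallsWhenVectorIsAlsoManaged());
    }
}
```

Whether a second `close()` is harmless depends on the real classes; the sketch only shows that it is unnecessary once the owner is already a managed resource.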