Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/15 15:02:21 UTC

[GitHub] [arrow] lidavidm commented on a change in pull request #12634: ARROW-15576: [Java][Doc] WIP Apache Arrow VectorSchemaRoots for 2D data

lidavidm commented on a change in pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#discussion_r827076543



##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata
+
+.. code-block:: Java
+
+    // Create a column A with utf8 string column and metadata

Review comment:
       ```suggestion
       // Create a column "document" of string type with metadata
       ```

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata
+
+.. code-block:: Java
+
+    // Create a column A with utf8 string column and metadata
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("A", "Id card");
+    metadata.put("B", "Passport");
+    metadata.put("C", "Visa");
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), /*children*/ null);
+
+Schemas
+=======
+
+A `Schema`_ describes the overall structure of a two-dimensional dataset such
+as a table.  It holds a sequence of fields together with some optional

Review comment:
       The Java implementation has no Table class. This should talk about VectorSchemaRoot instead.

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata
+
+.. code-block:: Java
+
+    // Create a column A with utf8 string column and metadata
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("A", "Id card");
+    metadata.put("B", "Passport");
+    metadata.put("C", "Visa");
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), /*children*/ null);
+
+Schemas
+=======
+
+A `Schema`_ describes the overall structure of a two-dimensional dataset such
+as a table.  It holds a sequence of fields together with some optional
+schema-wide metadata (in addition to per-field metadata).
+
+.. code-block:: Java
+
+    // Create a schema describing datasets with two columns:
+    // a int32 column "A" and a utf8-encoded string column "B"
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+    import org.apache.arrow.vector.types.pojo.Schema;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("K1", "V1");
+    metadata.put("K2", "V2");
+    Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), null);
+    Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), null);
+    Schema schema = new Schema(java.util.Arrays.asList(a, b), metadata);
+
+Tables
+======
+
+There is not a object or implementation of Table on java side. More close definition
+like a table could be VectorSchemaRoot (see the next section).

Review comment:
       IMO we don't need a whole section for this. In the next section we can say something like "VectorSchemaRoot is somewhat analogous to tables and record batches in the other Arrow implementations in that they are all 2D datasets, but the usage is different."

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -72,3 +131,6 @@ A new :class:`VectorSchemaRoot` could be sliced from an existing instance with z
     // 0 indicates start index (inclusive) and 5 indicated length (exclusive).
     VectorSchemaRoot newRoot = vectorSchemaRoot.slice(0, 5);
 
+.. _`Field`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/types/pojo/Field.html
+.. _`Schema`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/types/pojo/Schema.html
+.. _`Flight`: https://github.com/apache/arrow/tree/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight

Review comment:
       Link to the API docs instead?

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata
+
+.. code-block:: Java
+
+    // Create a column A with utf8 string column and metadata
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("A", "Id card");
+    metadata.put("B", "Passport");
+    metadata.put("C", "Visa");
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), /*children*/ null);
+
+Schemas
+=======
+
+A `Schema`_ describes the overall structure of a two-dimensional dataset such
+as a table.  It holds a sequence of fields together with some optional
+schema-wide metadata (in addition to per-field metadata).
+
+.. code-block:: Java
+
+    // Create a schema describing datasets with two columns:
+    // a int32 column "A" and a utf8-encoded string column "B"
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+    import org.apache.arrow.vector.types.pojo.Schema;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("K1", "V1");
+    metadata.put("K2", "V2");
+    Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), null);
+    Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), null);
+    Schema schema = new Schema(java.util.Arrays.asList(a, b), metadata);

Review comment:
       import java.util.Arrays?
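
       A minimal sketch of what this comment appears to ask for: import java.util.Arrays at the top of the snippet and call Arrays.asList unqualified. The class name ArraysImportSketch and both helper methods are hypothetical, and plain strings stand in for the Field objects so the sketch runs without Arrow on the classpath.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ArraysImportSketch {
    // Hypothetical stand-in for the two Field objects in the snippet under review;
    // plain strings keep this runnable without the Arrow library.
    static List<String> columnNames() {
        // With the import above, the call no longer needs the java.util prefix.
        return Arrays.asList("A", "B");
    }

    // Same schema-wide metadata entries as in the reviewed snippet.
    static Map<String, String> metadata() {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("K1", "V1");
        metadata.put("K2", "V2");
        return metadata;
    }

    public static void main(String[] args) {
        System.out.println(columnNames() + " " + metadata());
    }
}
```

       In the documentation snippet itself, the same change would presumably mean adding the import next to the org.apache.arrow ones and passing Arrays.asList(a, b) to the Schema constructor.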

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata

Review comment:
       ```suggestion
       type, and some optional key-value metadata.
       ```

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -50,8 +109,8 @@ Here is the example of building a :class:`VectorSchemaRoot`
     VectorSchemaRoot vectorSchemaRoot = new VectorSchemaRoot(fields, vectors);
 
 The vectors within a :class:`VectorSchemaRoot` could be loaded/unloaded via :class:`VectorLoader` and :class:`VectorUnloader`.
-:class:`VectorLoader` and :class:`VectorUnloader` handles converting between :class:`VectorSchemaRoot` and :class:`ArrowRecordBatch`(
-representation of a RecordBatch :doc:`IPC <../format/IPC.rst>` message). Examples as below
+:class:`VectorLoader` and :class:`VectorUnloader` handles converting between :class:`VectorSchemaRoot` and :class:`ArrowRecordBatch` (
+representation of a RecordBatch :doc:`IPC <../format/IPC.rst>` message ). Examples as below

Review comment:
       ```suggestion
       representation of a RecordBatch :doc:`IPC <../format/IPC.rst>` message). Examples as below
       ```



