Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/15 13:57:49 UTC

[GitHub] [arrow] davisusanibar opened a new pull request #12634: ARROW-15576: [Java][Doc] WIP Apache Arrow VectorSchemaRoots for 2D data

davisusanibar opened a new pull request #12634:
URL: https://github.com/apache/arrow/pull/12634


   Update the current VectorSchemaRoot documentation to be more generic and change its title to "Tabular Data". Add definitions of:
   
   - Field
   - Schema
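
For readers of this thread, the two definitions listed above can be illustrated with a minimal sketch. It assumes the `arrow-vector` artifact is on the classpath; the class name `FieldSchemaSketch`, the method `buildSchema`, and the second column `age` are hypothetical names chosen for illustration and are not part of the PR.

```java
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of the Arrow code base.
public class FieldSchemaSketch {

    // Build a schema with two nullable columns and schema-wide metadata.
    static Schema buildSchema() {
        // Per-field metadata is attached through the FieldType.
        Map<String, String> fieldMetadata = new HashMap<>();
        fieldMetadata.put("A", "Id card");
        Field document = new Field(
            "document",
            new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, fieldMetadata),
            /*children*/ null);

        // "age" is an assumed second column, added only for the example.
        Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);

        // Schema-wide metadata is separate from per-field metadata.
        Map<String, String> schemaMetadata = new HashMap<>();
        schemaMetadata.put("K1", "V1");
        return new Schema(Arrays.asList(document, age), schemaMetadata);
    }

    public static void main(String[] args) {
        Schema schema = buildSchema();
        Field document = schema.findField("document");
        System.out.println(document.getName());
        System.out.println(document.isNullable());
        System.out.println(document.getMetadata().get("A"));
        System.out.println(schema.getFields().size());
        System.out.println(schema.getCustomMetadata().get("K1"));
    }
}
```

A `Field` only describes a column (name, type, nullability, metadata) and a `Schema` only describes an ordered set of fields; neither holds any values.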


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12634: ARROW-15576: [Java][Doc] WIP Apache Arrow VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#issuecomment-1068058227









[GitHub] [arrow] davisusanibar commented on pull request #12634: ARROW-15576: [Java][Doc] Apache Arrow VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#issuecomment-1070897803


   > Can you rebase to kick off the pipelines?
   
   Added





[GitHub] [arrow] lidavidm commented on a change in pull request #12634: ARROW-15576: [Java][Doc] Apache Arrow VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#discussion_r829123862



##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,79 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type, and some optional key-value metadata.
+
+.. code-block:: Java
+
+    // Create a column "document" of string type with metadata
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("A", "Id card");
+    metadata.put("B", "Passport");
+    metadata.put("C", "Visa");
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), /*children*/ null);
+
+Schemas
+=======
+
+A `Schema`_ describes the overall structure consisting of any number of columns. It holds a sequence of fields together
+with some optional schema-wide metadata (in addition to per-field metadata).
+
+.. code-block:: Java
+
+    // Create a schema describing datasets with two columns:
+    // a int32 column "A" and a utf8-encoded string column "B"
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+    import org.apache.arrow.vector.types.pojo.Schema;
+    import static java.util.Arrays.asList;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("K1", "V1");
+    metadata.put("K2", "V2");
+    Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), null);
+    Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), null);
+    Schema schema = new Schema(asList(a, b), metadata);
+
 VectorSchemaRoot
 ================
+
+.. note::
+
+    VectorSchemaRoot is somewhat analogous to tables and record batches in the other Arrow implementations
+    in that they all are 2D datasets, but the usage is different.
+
 A :class:`VectorSchemaRoot` is a container that can hold batches, batches flow through :class:`VectorSchemaRoot`

Review comment:
       Not so sure about this since we haven't introduced batches yet at this point…but we can revisit this later.

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,79 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.

Review comment:
       Wait, sorry I missed this. We should not talk about tables in Java. Since we haven't introduced VectorSchemaRoot yet, we can talk about "tabular data" abstractly.







[GitHub] [arrow] davisusanibar commented on a change in pull request #12634: ARROW-15576: [Java][Doc] WIP Apache Arrow VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#discussion_r827478781



##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata
+
+.. code-block:: Java
+
+    // Create a column A with utf8 string column and metadata
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("A", "Id card");
+    metadata.put("B", "Passport");
+    metadata.put("C", "Visa");
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), /*children*/ null);
+
+Schemas
+=======
+
+A `Schema`_ describes the overall structure of a two-dimensional dataset such
+as a table.  It holds a sequence of fields together with some optional
+schema-wide metadata (in addition to per-field metadata).
+
+.. code-block:: Java
+
+    // Create a schema describing datasets with two columns:
+    // a int32 column "A" and a utf8-encoded string column "B"
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+    import org.apache.arrow.vector.types.pojo.Schema;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("K1", "V1");
+    metadata.put("K2", "V2");
+    Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), null);
+    Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), null);
+    Schema schema = new Schema(java.util.Arrays.asList(a, b), metadata);

Review comment:
       Updated







[GitHub] [arrow] davisusanibar commented on a change in pull request #12634: ARROW-15576: [Java][Doc] WIP Apache Arrow VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#discussion_r827478575



##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata
+
+.. code-block:: Java
+
+    // Create a column A with utf8 string column and metadata
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("A", "Id card");
+    metadata.put("B", "Passport");
+    metadata.put("C", "Visa");
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), /*children*/ null);
+
+Schemas
+=======
+
+A `Schema`_ describes the overall structure of a two-dimensional dataset such
+as a table.  It holds a sequence of fields together with some optional

Review comment:
       Updated







[GitHub] [arrow] davisusanibar commented on pull request #12634: ARROW-15576: [Java][Doc] Apache Arrow VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#issuecomment-1069477644


   > Looks good to me. Is this ready for review?
   
   Yes, it is ready. I forgot to change the title of the PR. Thanks in advance.





[GitHub] [arrow] davisusanibar commented on a change in pull request #12634: ARROW-15576: [Java][Doc] WIP Apache Arrow VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#discussion_r827478384



##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -72,3 +131,6 @@ A new :class:`VectorSchemaRoot` could be sliced from an existing instance with z
     // 0 indicates start index (inclusive) and 5 indicated length (exclusive).
     VectorSchemaRoot newRoot = vectorSchemaRoot.slice(0, 5);
 
+.. _`Field`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/types/pojo/Field.html
+.. _`Schema`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/types/pojo/Schema.html
+.. _`Flight`: https://github.com/apache/arrow/tree/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight

Review comment:
       Updated







[GitHub] [arrow] amol- commented on a change in pull request #12634: ARROW-15576: [Java][Doc] Apache Arrow VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#discussion_r836198571



##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,79 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.

Review comment:
       ```suggestion
   Fields are used to denote the particular columns of tabular data.
   ```







[GitHub] [arrow] davisusanibar commented on a change in pull request #12634: ARROW-15576: [Java][Doc] WIP Apache Arrow VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
davisusanibar commented on a change in pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#discussion_r827478534



##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata
+
+.. code-block:: Java
+
+    // Create a column A with utf8 string column and metadata
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("A", "Id card");
+    metadata.put("B", "Passport");
+    metadata.put("C", "Visa");
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), /*children*/ null);
+
+Schemas
+=======
+
+A `Schema`_ describes the overall structure of a two-dimensional dataset such
+as a table.  It holds a sequence of fields together with some optional
+schema-wide metadata (in addition to per-field metadata).
+
+.. code-block:: Java
+
+    // Create a schema describing datasets with two columns:
+    // a int32 column "A" and a utf8-encoded string column "B"
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+    import org.apache.arrow.vector.types.pojo.Schema;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("K1", "V1");
+    metadata.put("K2", "V2");
+    Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), null);
+    Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), null);
+    Schema schema = new Schema(java.util.Arrays.asList(a, b), metadata);
+
+Tables
+======
+
+There is not a object or implementation of Table on java side. More close definition
+like a table could be VectorSchemaRoot (see the next section).

Review comment:
       Deleted and added as a note on VectorSchemaRoot.







[GitHub] [arrow] ursabot commented on pull request #12634: ARROW-15576: [Java][Doc] Document VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#issuecomment-1080681651


   Benchmark runs are scheduled for baseline = 495eb168e869b210d3c055811d30dec7abc4e30d and contender = 68164c8df941c97b8fabaa4d0fd417e1905895a7. 68164c8df941c97b8fabaa4d0fd417e1905895a7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/07062a47567148dbb6959ac5126d265c...dad7c884cd334f2da9712dd5a122869b/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/3e221e58ade24c2790b1c9dc8357347e...32be7f52132a4885a6f42adb11adea8d/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/1d6e0b89bb304fb4bcd84657f8ea363b...f7c68b0e0bc84847a396dccdf7ffc304/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/a673237cfca5474183c66614b3666ec3...a7cea01cb1834f2cb5d8262c50ccf3e9/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] ursabot edited a comment on pull request #12634: ARROW-15576: [Java][Doc] Document VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#issuecomment-1080681651


   Benchmark runs are scheduled for baseline = 495eb168e869b210d3c055811d30dec7abc4e30d and contender = 68164c8df941c97b8fabaa4d0fd417e1905895a7. 68164c8df941c97b8fabaa4d0fd417e1905895a7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/07062a47567148dbb6959ac5126d265c...dad7c884cd334f2da9712dd5a122869b/)
   [Finished :arrow_down:0.04% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/3e221e58ade24c2790b1c9dc8357347e...32be7f52132a4885a6f42adb11adea8d/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/1d6e0b89bb304fb4bcd84657f8ea363b...f7c68b0e0bc84847a396dccdf7ffc304/)
   [Finished :arrow_down:5.49% :arrow_up:16.76% :warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/a673237cfca5474183c66614b3666ec3...a7cea01cb1834f2cb5d8262c50ccf3e9/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] ursabot edited a comment on pull request #12634: ARROW-15576: [Java][Doc] Document VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#issuecomment-1080681651


   Benchmark runs are scheduled for baseline = 495eb168e869b210d3c055811d30dec7abc4e30d and contender = 68164c8df941c97b8fabaa4d0fd417e1905895a7. 68164c8df941c97b8fabaa4d0fd417e1905895a7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/07062a47567148dbb6959ac5126d265c...dad7c884cd334f2da9712dd5a122869b/)
   [Finished :arrow_down:0.04% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/3e221e58ade24c2790b1c9dc8357347e...32be7f52132a4885a6f42adb11adea8d/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/1d6e0b89bb304fb4bcd84657f8ea363b...f7c68b0e0bc84847a396dccdf7ffc304/)
   [Finished :arrow_down:5.49% :arrow_up:16.76% :warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/a673237cfca5474183c66614b3666ec3...a7cea01cb1834f2cb5d8262c50ccf3e9/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] lidavidm closed pull request #12634: ARROW-15576: [Java][Doc] Document VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
lidavidm closed pull request #12634:
URL: https://github.com/apache/arrow/pull/12634


   





[GitHub] [arrow] lidavidm commented on a change in pull request #12634: ARROW-15576: [Java][Doc] WIP Apache Arrow VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#discussion_r827076543



##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata
+
+.. code-block:: Java
+
+    // Create a column A with utf8 string column and metadata

Review comment:
       ```suggestion
       // Create a column "document" of string type with metadata
   ```

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata
+
+.. code-block:: Java
+
+    // Create a column A with utf8 string column and metadata
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("A", "Id card");
+    metadata.put("B", "Passport");
+    metadata.put("C", "Visa");
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), /*children*/ null);
+
+Schemas
+=======
+
+A `Schema`_ describes the overall structure of a two-dimensional dataset such
+as a table.  It holds a sequence of fields together with some optional

Review comment:
       Tables don't exist in Java. This should talk about VectorSchemaRoot.

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata
+
+.. code-block:: Java
+
+    // Create a column A with utf8 string column and metadata
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("A", "Id card");
+    metadata.put("B", "Passport");
+    metadata.put("C", "Visa");
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), /*children*/ null);
+
+Schemas
+=======
+
+A `Schema`_ describes the overall structure of a two-dimensional dataset such
+as a table.  It holds a sequence of fields together with some optional
+schema-wide metadata (in addition to per-field metadata).
+
+.. code-block:: Java
+
+    // Create a schema describing datasets with two columns:
+    // a int32 column "A" and a utf8-encoded string column "B"
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+    import org.apache.arrow.vector.types.pojo.Schema;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("K1", "V1");
+    metadata.put("K2", "V2");
+    Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), null);
+    Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), null);
+    Schema schema = new Schema(java.util.Arrays.asList(a, b), metadata);
+
+Tables
+======
+
+There is not a object or implementation of Table on java side. More close definition
+like a table could be VectorSchemaRoot (see the next section).

Review comment:
       IMO we don't need a whole section for this. In the next section we can say something like "VectorSchemaRoot is somewhat analogous to tables and record batches in the other Arrow implementations in that they all are 2D datasets, but the usage is different."
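
To make the proposed wording concrete, here is a minimal sketch of a `VectorSchemaRoot` acting as the 2D container: one schema, one vector per field, and a shared row count. The class name `RootSketch` and the helper `buildRoot` are hypothetical, and the sketch assumes `arrow-vector` plus an allocator implementation (for example `arrow-memory-netty`) are on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;

public class RootSketch {
    // Hypothetical helper: builds a two-column, two-row root.
    // The caller owns the returned root and must close it.
    static VectorSchemaRoot buildRoot(BufferAllocator allocator) {
        Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), null);
        Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), null);
        Schema schema = new Schema(Arrays.asList(a, b));

        // create() allocates one empty vector per field in the schema.
        VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
        IntVector ints = (IntVector) root.getVector("A");
        VarCharVector strings = (VarCharVector) root.getVector("B");

        // setSafe() grows the underlying buffers as needed.
        ints.setSafe(0, 1);
        ints.setSafe(1, 2);
        strings.setSafe(0, "one".getBytes(StandardCharsets.UTF_8));
        strings.setSafe(1, "two".getBytes(StandardCharsets.UTF_8));

        // All vectors in a root share a single row count.
        root.setRowCount(2);
        return root;
    }

    public static void main(String[] args) {
        try (BufferAllocator allocator = new RootAllocator();
             VectorSchemaRoot root = buildRoot(allocator)) {
            System.out.println(root.getRowCount());
            System.out.println(root.getSchema());
        }
    }
}
```

The try-with-resources block closes the root before the allocator, which matters because the allocator checks for outstanding buffers when it is closed.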

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -72,3 +131,6 @@ A new :class:`VectorSchemaRoot` could be sliced from an existing instance with z
     // 0 indicates start index (inclusive) and 5 indicated length (exclusive).
     VectorSchemaRoot newRoot = vectorSchemaRoot.slice(0, 5);
 
+.. _`Field`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/types/pojo/Field.html
+.. _`Schema`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/types/pojo/Schema.html
+.. _`Flight`: https://github.com/apache/arrow/tree/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight

Review comment:
       Link to the API docs instead?

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata
+
+.. code-block:: Java
+
+    // Create a column A with utf8 string column and metadata
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("A", "Id card");
+    metadata.put("B", "Passport");
+    metadata.put("C", "Visa");
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), /*children*/ null);
+
+Schemas
+=======
+
+A `Schema`_ describes the overall structure of a two-dimensional dataset such
+as a table.  It holds a sequence of fields together with some optional
+schema-wide metadata (in addition to per-field metadata).
+
+.. code-block:: Java
+
+    // Create a schema describing datasets with two columns:
+    // a int32 column "A" and a utf8-encoded string column "B"
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+    import org.apache.arrow.vector.types.pojo.Schema;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("K1", "V1");
+    metadata.put("K2", "V2");
+    Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), null);
+    Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), null);
+    Schema schema = new Schema(java.util.Arrays.asList(a, b), metadata);

Review comment:
       import java.util.Arrays?
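
For reference, a sketch of the same snippet with the `java.util` imports spelled out, so the example compiles as pasted. The wrapper class `SchemaSketch` and the method `buildSchema` are hypothetical names added only to make the block self-contained; the Arrow calls are the ones already used in the diff.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;

public class SchemaSketch {
    // Hypothetical wrapper around the documented snippet.
    static Schema buildSchema() {
        // Schema-wide key-value metadata (separate from per-field metadata).
        Map<String, String> metadata = new HashMap<>();
        metadata.put("K1", "V1");
        metadata.put("K2", "V2");

        // A nullable int32 column "A" and a nullable utf8 string column "B".
        Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), null);
        Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), null);
        return new Schema(Arrays.asList(a, b), metadata);
    }

    public static void main(String[] args) {
        System.out.println(buildSchema());
    }
}
```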

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,80 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type and some optional metadata

Review comment:
       ```suggestion
   type, and some optional key-value metadata.
   ```
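
A short sketch of what "key-value metadata" means in practice for the `document` field from the diff: the metadata is a plain `Map<String, String>` attached through `FieldType`. The class `FieldSketch` and the method `buildDocumentField` are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;

public class FieldSketch {
    // Hypothetical wrapper: a nullable string column "document" with metadata.
    static Field buildDocumentField() {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("A", "Id card");
        metadata.put("B", "Passport");
        metadata.put("C", "Visa");

        FieldType type = new FieldType(
            /*nullable*/ true, new ArrowType.Utf8(), /*dictionary*/ null, metadata);
        return new Field("document", type, /*children*/ null);
    }

    public static void main(String[] args) {
        Field document = buildDocumentField();
        System.out.println(document.getName());
        System.out.println(document.getMetadata());
    }
}
```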

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -50,8 +109,8 @@ Here is the example of building a :class:`VectorSchemaRoot`
     VectorSchemaRoot vectorSchemaRoot = new VectorSchemaRoot(fields, vectors);
 
 The vectors within a :class:`VectorSchemaRoot` could be loaded/unloaded via :class:`VectorLoader` and :class:`VectorUnloader`.
-:class:`VectorLoader` and :class:`VectorUnloader` handles converting between :class:`VectorSchemaRoot` and :class:`ArrowRecordBatch`(
-representation of a RecordBatch :doc:`IPC <../format/IPC.rst>` message). Examples as below
+:class:`VectorLoader` and :class:`VectorUnloader` handles converting between :class:`VectorSchemaRoot` and :class:`ArrowRecordBatch` (
+representation of a RecordBatch :doc:`IPC <../format/IPC.rst>` message ). Examples as below

Review comment:
       ```suggestion
   representation of a RecordBatch :doc:`IPC <../format/IPC.rst>` message). Examples as below
   ```







[GitHub] [arrow] lidavidm commented on pull request #12634: ARROW-15576: [Java][Doc] Apache Arrow VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#issuecomment-1069532109


   Can you rebase to kick off the pipelines?





[GitHub] [arrow] ursabot edited a comment on pull request #12634: ARROW-15576: [Java][Doc] Document VectorSchemaRoots for 2D data

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#issuecomment-1080681651


   Benchmark runs are scheduled for baseline = 495eb168e869b210d3c055811d30dec7abc4e30d and contender = 68164c8df941c97b8fabaa4d0fd417e1905895a7. 68164c8df941c97b8fabaa4d0fd417e1905895a7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/07062a47567148dbb6959ac5126d265c...dad7c884cd334f2da9712dd5a122869b/)
   [Finished :arrow_down:0.04% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/3e221e58ade24c2790b1c9dc8357347e...32be7f52132a4885a6f42adb11adea8d/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/1d6e0b89bb304fb4bcd84657f8ea363b...f7c68b0e0bc84847a396dccdf7ffc304/)
   [Finished :arrow_down:5.49% :arrow_up:16.76% :warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/a673237cfca5474183c66614b3666ec3...a7cea01cb1834f2cb5d8262c50ccf3e9/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   

