Posted to github@arrow.apache.org by "pitrou (via GitHub)" <gi...@apache.org> on 2023/09/28 14:01:40 UTC

[GitHub] [arrow] pitrou opened a new pull request, #37935: GH-37934: [Doc][Integration] Document C Data Interface testing

pitrou opened a new pull request, #37935:
URL: https://github.com/apache/arrow/pull/37935

   ### Rationale for this change
   
   gh-37537 added integration testing for the C Data Interface, but the documentation was not updated.
   
   ### What changes are included in this PR?
   
   Add documentation for C Data Interface integration testing.
   
   ### Are these changes tested?
   
   N/A, only doc changes.
   
   ### Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37935: GH-37934: [Doc][Integration] Document C Data Interface testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37935:
URL: https://github.com/apache/arrow/pull/37935#discussion_r1340228725


##########
docs/source/format/Integration.rst:
##########
@@ -20,32 +20,97 @@
 Integration Testing
 ===================
 
+To ensure Arrow implementations are interoperable between each other,
+the Arrow project includes cross-language integration tests which are
+regularly run as Continuous Integration tasks.
+
+The integration tests exercise compliance with several Arrow specifications:
+the :ref:`IPC format <format-ipc>`, the :ref:`Flight RPC <flight-rpc>` protocol,
+and the :ref:`C Data Interface <c-data-interface>`.
+
+Strategy
+--------
+
 Our strategy for integration testing between Arrow implementations is:
 
 * Test datasets are specified in a custom human-readable, JSON-based format
-  designed exclusively for Arrow's integration tests
-* Each implementation provides a testing executable capable of converting
-  between the JSON and the binary Arrow file representation
-* Each testing executable is used to generate binary Arrow file representations
-  from the JSON-based test datasets. These results are then used to call the
-  testing executable of each other implementation to validate the contents
-  against the corresponding JSON file.
-  - *ie.* the C++ testing executable generates binary arrow files from JSON
-  specified datasets. The resulting files are then used as input to the Java
-  testing executable for validation, confirming that the Java implementation 
-  can correctly read what the C++ implementation wrote.
+  designed exclusively for Arrow's integration tests.
+
+* Each implementation provides entry points capable of converting
+  between the JSON and the Arrow in-memory representation, and of exposing
+  Arrow in-memory data using the desired format.
+
+* Each format (whether Arrow IPC, Flight or the C Data Interface) is tested for
+  all supported pairs of (producer, consumer) implementations. The producer
+  typically reads a JSON file, converts it to in-memory Arrow data, and exposes
+  this data using the format under test. The consumer reads the data in the
+  said format and converts it back to Arrow in-memory data; it also reads
+  the same JSON file as the producer, and validates that both datasets are
+  identical.
+
+* Each (producer, consumer) pair is tested over a range of JSON files
+  representing different data type categories, such as numerics, lists, etc.
+  This makes it easier to pinpoint incompatibilities than if all data types
+  were represented in a single file.
+
+Example: IPC format
+~~~~~~~~~~~~~~~~~~~
+
+Let's say we are testing Arrow C++ as a producer and Arrow Java as a consumer
+of the Arrow IPC format. Testing a JSON file would go as follows:

Review Comment:
   I'll mention data generation, though a couple of the JSON files are checked in as well (the "gold files").





[GitHub] [arrow] pitrou commented on pull request #37935: GH-37934: [Doc][Integration] Document C Data Interface testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #37935:
URL: https://github.com/apache/arrow/pull/37935#issuecomment-1739283546

   @github-actions crossbow submit preview-docs




[GitHub] [arrow] github-actions[bot] commented on pull request #37935: GH-37934: [Doc][Integration] Document C Data Interface testing

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #37935:
URL: https://github.com/apache/arrow/pull/37935#issuecomment-1739290836

   Revision: 48fdc8ad8df45af0ec481c9ee485aba97ba98f83
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-53333988cf](https://github.com/ursacomputing/crossbow/branches/all?query=actions-53333988cf)
   
   |Task|Status|
   |----|------|
   |preview-docs|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-53333988cf-github-preview-docs)](https://github.com/ursacomputing/crossbow/actions/runs/6339997695/job/17220432540)|




[GitHub] [arrow] bkietz commented on a diff in pull request #37935: GH-37934: [Doc][Integration] Document C Data Interface testing

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on code in PR #37935:
URL: https://github.com/apache/arrow/pull/37935#discussion_r1340235939


##########
docs/source/format/Integration.rst:
##########
@@ -20,32 +20,97 @@
 Integration Testing
 ===================
 
+To ensure Arrow implementations are interoperable between each other,
+the Arrow project includes cross-language integration tests which are
+regularly run as Continuous Integration tasks.
+
+The integration tests exercise compliance with several Arrow specifications:
+the :ref:`IPC format <format-ipc>`, the :ref:`Flight RPC <flight-rpc>` protocol,
+and the :ref:`C Data Interface <c-data-interface>`.
+
+Strategy
+--------
+
 Our strategy for integration testing between Arrow implementations is:
 
 * Test datasets are specified in a custom human-readable, JSON-based format
-  designed exclusively for Arrow's integration tests
-* Each implementation provides a testing executable capable of converting
-  between the JSON and the binary Arrow file representation
-* Each testing executable is used to generate binary Arrow file representations
-  from the JSON-based test datasets. These results are then used to call the
-  testing executable of each other implementation to validate the contents
-  against the corresponding JSON file.
-  - *ie.* the C++ testing executable generates binary arrow files from JSON
-  specified datasets. The resulting files are then used as input to the Java
-  testing executable for validation, confirming that the Java implementation 
-  can correctly read what the C++ implementation wrote.
+  designed exclusively for Arrow's integration tests.
+
+* Each implementation provides entry points capable of converting
+  between the JSON and the Arrow in-memory representation, and of exposing
+  Arrow in-memory data using the desired format.
+
+* Each format (whether Arrow IPC, Flight or the C Data Interface) is tested for
+  all supported pairs of (producer, consumer) implementations. The producer
+  typically reads a JSON file, converts it to in-memory Arrow data, and exposes
+  this data using the format under test. The consumer reads the data in the
+  said format and converts it back to Arrow in-memory data; it also reads
+  the same JSON file as the producer, and validates that both datasets are
+  identical.
+
+* Each (producer, consumer) pair is tested over a range of JSON files
+  representing different data type categories, such as numerics, lists, etc.
+  This makes it easier to pinpoint incompatibilities than if all data types
+  were represented in a single file.
+
+Example: IPC format
+~~~~~~~~~~~~~~~~~~~
+
+Let's say we are testing Arrow C++ as a producer and Arrow Java as a consumer
+of the Arrow IPC format. Testing a JSON file would go as follows:
+
+#. A C++ executable reads the JSON file, converts it into Arrow in-memory data
+   and writes a Arrow IPC file (the file paths are typically given on the command

Review Comment:
   yep, oops





[GitHub] [arrow] pitrou merged pull request #37935: GH-37934: [Doc][Integration] Document C Data Interface testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou merged PR #37935:
URL: https://github.com/apache/arrow/pull/37935




[GitHub] [arrow] pitrou commented on pull request #37935: GH-37934: [Doc][Integration] Document C Data Interface testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #37935:
URL: https://github.com/apache/arrow/pull/37935#issuecomment-1739387845

   @github-actions crossbow submit preview-docs




[GitHub] [arrow] pitrou commented on a diff in pull request #37935: GH-37934: [Doc][Integration] Document C Data Interface testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37935:
URL: https://github.com/apache/arrow/pull/37935#discussion_r1340228329


##########
docs/source/format/Integration.rst:
##########
@@ -20,32 +20,97 @@
 Integration Testing
 ===================
 
+To ensure Arrow implementations are interoperable between each other,
+the Arrow project includes cross-language integration tests which are
+regularly run as Continuous Integration tasks.
+
+The integration tests exercise compliance with several Arrow specifications:
+the :ref:`IPC format <format-ipc>`, the :ref:`Flight RPC <flight-rpc>` protocol,
+and the :ref:`C Data Interface <c-data-interface>`.
+
+Strategy
+--------
+
 Our strategy for integration testing between Arrow implementations is:
 
 * Test datasets are specified in a custom human-readable, JSON-based format
-  designed exclusively for Arrow's integration tests
-* Each implementation provides a testing executable capable of converting
-  between the JSON and the binary Arrow file representation
-* Each testing executable is used to generate binary Arrow file representations
-  from the JSON-based test datasets. These results are then used to call the
-  testing executable of each other implementation to validate the contents
-  against the corresponding JSON file.
-  - *ie.* the C++ testing executable generates binary arrow files from JSON
-  specified datasets. The resulting files are then used as input to the Java
-  testing executable for validation, confirming that the Java implementation 
-  can correctly read what the C++ implementation wrote.
+  designed exclusively for Arrow's integration tests.
+
+* Each implementation provides entry points capable of converting
+  between the JSON and the Arrow in-memory representation, and of exposing
+  Arrow in-memory data using the desired format.
+
+* Each format (whether Arrow IPC, Flight or the C Data Interface) is tested for
+  all supported pairs of (producer, consumer) implementations. The producer
+  typically reads a JSON file, converts it to in-memory Arrow data, and exposes
+  this data using the format under test. The consumer reads the data in the
+  said format and converts it back to Arrow in-memory data; it also reads
+  the same JSON file as the producer, and validates that both datasets are
+  identical.
+
+* Each (producer, consumer) pair is tested over a range of JSON files
+  representing different data type categories, such as numerics, lists, etc.
+  This makes it easier to pinpoint incompatibilities than if all data types
+  were represented in a single file.
+
+Example: IPC format
+~~~~~~~~~~~~~~~~~~~
+
+Let's say we are testing Arrow C++ as a producer and Arrow Java as a consumer
+of the Arrow IPC format. Testing a JSON file would go as follows:
+
+#. A C++ executable reads the JSON file, converts it into Arrow in-memory data
+   and writes a Arrow IPC file (the file paths are typically given on the command

Review Comment:
   Hmm, is your suggestion a typo? Did you mean "an"?





[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #37935: GH-37934: [Doc][Integration] Document C Data Interface testing

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #37935:
URL: https://github.com/apache/arrow/pull/37935#issuecomment-1740283062

   After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit e9730f5971480b942c7394846162c4dfa9145aa9.
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/17241633662) has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.




[GitHub] [arrow] github-actions[bot] commented on pull request #37935: GH-37934: [Doc][Integration] Document C Data Interface testing

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #37935:
URL: https://github.com/apache/arrow/pull/37935#issuecomment-1739395782

   Revision: 1368d28ee2e1771ab9e2e39548394ab543d0572b
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-ec09d2921f](https://github.com/ursacomputing/crossbow/branches/all?query=actions-ec09d2921f)
   
   |Task|Status|
   |----|------|
   |preview-docs|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-ec09d2921f-github-preview-docs)](https://github.com/ursacomputing/crossbow/actions/runs/6340443888/job/17221824724)|




[GitHub] [arrow] bkietz commented on a diff in pull request #37935: GH-37934: [Doc][Integration] Document C Data Interface testing

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on code in PR #37935:
URL: https://github.com/apache/arrow/pull/37935#discussion_r1340220584


##########
docs/source/format/Integration.rst:
##########
@@ -20,32 +20,97 @@
 Integration Testing
 ===================
 
+To ensure Arrow implementations are interoperable between each other,
+the Arrow project includes cross-language integration tests which are
+regularly run as Continuous Integration tasks.
+
+The integration tests exercise compliance with several Arrow specifications:
+the :ref:`IPC format <format-ipc>`, the :ref:`Flight RPC <flight-rpc>` protocol,
+and the :ref:`C Data Interface <c-data-interface>`.
+
+Strategy
+--------
+
 Our strategy for integration testing between Arrow implementations is:
 
 * Test datasets are specified in a custom human-readable, JSON-based format
-  designed exclusively for Arrow's integration tests
-* Each implementation provides a testing executable capable of converting
-  between the JSON and the binary Arrow file representation
-* Each testing executable is used to generate binary Arrow file representations
-  from the JSON-based test datasets. These results are then used to call the
-  testing executable of each other implementation to validate the contents
-  against the corresponding JSON file.
-  - *ie.* the C++ testing executable generates binary arrow files from JSON
-  specified datasets. The resulting files are then used as input to the Java
-  testing executable for validation, confirming that the Java implementation 
-  can correctly read what the C++ implementation wrote.
+  designed exclusively for Arrow's integration tests.
+
+* Each implementation provides entry points capable of converting
+  between the JSON and the Arrow in-memory representation, and of exposing
+  Arrow in-memory data using the desired format.
+
+* Each format (whether Arrow IPC, Flight or the C Data Interface) is tested for
+  all supported pairs of (producer, consumer) implementations. The producer
+  typically reads a JSON file, converts it to in-memory Arrow data, and exposes
+  this data using the format under test. The consumer reads the data in the
+  said format and converts it back to Arrow in-memory data; it also reads
+  the same JSON file as the producer, and validates that both datasets are
+  identical.
+
+* Each (producer, consumer) pair is tested over a range of JSON files
+  representing different data type categories, such as numerics, lists, etc.
+  This makes it easier to pinpoint incompatibilities than if all data types
+  were represented in a single file.
+
+Example: IPC format
+~~~~~~~~~~~~~~~~~~~
+
+Let's say we are testing Arrow C++ as a producer and Arrow Java as a consumer
+of the Arrow IPC format. Testing a JSON file would go as follows:
+
+#. A C++ executable reads the JSON file, converts it into Arrow in-memory data
+   and writes a Arrow IPC file (the file paths are typically given on the command

Review Comment:
   ```suggestion
      and writes aa Arrow IPC file (the file paths are typically given on the command
   ```



##########
docs/source/format/Integration.rst:
##########
@@ -20,32 +20,97 @@
 Integration Testing
 ===================
 
+To ensure Arrow implementations are interoperable between each other,
+the Arrow project includes cross-language integration tests which are
+regularly run as Continuous Integration tasks.
+
+The integration tests exercise compliance with several Arrow specifications:
+the :ref:`IPC format <format-ipc>`, the :ref:`Flight RPC <flight-rpc>` protocol,
+and the :ref:`C Data Interface <c-data-interface>`.
+
+Strategy
+--------
+
 Our strategy for integration testing between Arrow implementations is:
 
 * Test datasets are specified in a custom human-readable, JSON-based format
-  designed exclusively for Arrow's integration tests
-* Each implementation provides a testing executable capable of converting
-  between the JSON and the binary Arrow file representation
-* Each testing executable is used to generate binary Arrow file representations
-  from the JSON-based test datasets. These results are then used to call the
-  testing executable of each other implementation to validate the contents
-  against the corresponding JSON file.
-  - *ie.* the C++ testing executable generates binary arrow files from JSON
-  specified datasets. The resulting files are then used as input to the Java
-  testing executable for validation, confirming that the Java implementation 
-  can correctly read what the C++ implementation wrote.
+  designed exclusively for Arrow's integration tests.
+
+* Each implementation provides entry points capable of converting
+  between the JSON and the Arrow in-memory representation, and of exposing
+  Arrow in-memory data using the desired format.
+
+* Each format (whether Arrow IPC, Flight or the C Data Interface) is tested for
+  all supported pairs of (producer, consumer) implementations. The producer
+  typically reads a JSON file, converts it to in-memory Arrow data, and exposes
+  this data using the format under test. The consumer reads the data in the
+  said format and converts it back to Arrow in-memory data; it also reads
+  the same JSON file as the producer, and validates that both datasets are
+  identical.
+
+* Each (producer, consumer) pair is tested over a range of JSON files
+  representing different data type categories, such as numerics, lists, etc.
+  This makes it easier to pinpoint incompatibilities than if all data types
+  were represented in a single file.
+
+Example: IPC format
+~~~~~~~~~~~~~~~~~~~
+
+Let's say we are testing Arrow C++ as a producer and Arrow Java as a consumer
+of the Arrow IPC format. Testing a JSON file would go as follows:
+
+#. A C++ executable reads the JSON file, converts it into Arrow in-memory data
+   and writes a Arrow IPC file (the file paths are typically given on the command
+   line).
+
+#. A Java executable reads the JSON file, converts it into Arrow in-memory data;
+   it also reads the Arrow IPC file. Finally, it validates that both Arrow

Review Comment:
   ```suggestion
      it also reads the Arrow IPC file generated by C++. Finally, it validates that both Arrow
   ```



##########
docs/source/format/Integration.rst:
##########
@@ -20,32 +20,97 @@
 Integration Testing
 ===================
 
+To ensure Arrow implementations are interoperable between each other,
+the Arrow project includes cross-language integration tests which are
+regularly run as Continuous Integration tasks.
+
+The integration tests exercise compliance with several Arrow specifications:
+the :ref:`IPC format <format-ipc>`, the :ref:`Flight RPC <flight-rpc>` protocol,
+and the :ref:`C Data Interface <c-data-interface>`.
+
+Strategy
+--------
+
 Our strategy for integration testing between Arrow implementations is:
 
 * Test datasets are specified in a custom human-readable, JSON-based format
-  designed exclusively for Arrow's integration tests
-* Each implementation provides a testing executable capable of converting
-  between the JSON and the binary Arrow file representation
-* Each testing executable is used to generate binary Arrow file representations
-  from the JSON-based test datasets. These results are then used to call the
-  testing executable of each other implementation to validate the contents
-  against the corresponding JSON file.
-  - *ie.* the C++ testing executable generates binary arrow files from JSON
-  specified datasets. The resulting files are then used as input to the Java
-  testing executable for validation, confirming that the Java implementation 
-  can correctly read what the C++ implementation wrote.
+  designed exclusively for Arrow's integration tests.
+
+* Each implementation provides entry points capable of converting
+  between the JSON and the Arrow in-memory representation, and of exposing
+  Arrow in-memory data using the desired format.
+
+* Each format (whether Arrow IPC, Flight or the C Data Interface) is tested for
+  all supported pairs of (producer, consumer) implementations. The producer
+  typically reads a JSON file, converts it to in-memory Arrow data, and exposes
+  this data using the format under test. The consumer reads the data in the
+  said format and converts it back to Arrow in-memory data; it also reads
+  the same JSON file as the producer, and validates that both datasets are
+  identical.
+
+* Each (producer, consumer) pair is tested over a range of JSON files
+  representing different data type categories, such as numerics, lists, etc.
+  This makes it easier to pinpoint incompatibilities than if all data types
+  were represented in a single file.
+
+Example: IPC format
+~~~~~~~~~~~~~~~~~~~
+
+Let's say we are testing Arrow C++ as a producer and Arrow Java as a consumer
+of the Arrow IPC format. Testing a JSON file would go as follows:

Review Comment:
   Should we mention `datagen.py` here or in the strategy section? Without that mention, a reader might expect the JSON files to be checked in somewhere.


