Posted to github@arrow.apache.org by "pitrou (via GitHub)" <gi...@apache.org> on 2023/09/18 14:57:42 UTC

[GitHub] [arrow] pitrou opened a new pull request, #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

pitrou opened a new pull request, #37769:
URL: https://github.com/apache/arrow/pull/37769

   ### Rationale for this change
   
   Currently there are no systematic integration tests between implementations of the C Data Interface, only a couple of ad-hoc tests.
   
   ### What changes are included in this PR?
   
   1. Add Archery infrastructure for integration testing of the C Data Interface
   2. Add implementation of this interface for Arrow C++
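   
   To orient readers of the review thread below, here is a minimal sketch of the flow that item 1 sets up, pieced together from the `runner.py` and `tester.py` excerpts quoted in the review comments. Only the method calls shown in those excerpts are taken from the PR; the module path and the wrapper function are illustrative assumptions.
   
   ```python
   # Sketch only: drive one exporter/importer pair over a JSON integration file.
   # `cdata.ffi()` and `cdata.check_memory_released()` appear in the runner.py
   # excerpts quoted below; the import path is assumed, not verified.
   from archery.integration import cdata
   
   def check_schema_roundtrip(producer, consumer, json_path):
       exporter = producer.make_c_data_exporter()
       importer = consumer.make_c_data_importer()
       ffi = cdata.ffi()
       # Allocate an empty ArrowSchema struct for the exporter to fill in.
       c_schema_ptr = ffi.new("struct ArrowSchema*")
       # Also check that memory allocated for the export is released afterwards.
       with cdata.check_memory_released(exporter, importer):
           exporter.export_schema_from_json(json_path, c_schema_ptr)
           importer.import_schema_and_compare_to_json(json_path, c_schema_ptr)
   ```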
   
   ### Are these changes tested?
   
   Yes, by construction.
   
   ### Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #37769:
URL: https://github.com/apache/arrow/pull/37769#issuecomment-1726228339

   After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 3b646ad4c2b826fe08b31d19e6435f73650bcb5e.
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/16936555638) has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.




[GitHub] [arrow] bkietz commented on a diff in pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on code in PR #37769:
URL: https://github.com/apache/arrow/pull/37769#discussion_r1330233148


##########
dev/archery/archery/integration/tester.py:
##########
@@ -17,12 +17,180 @@
 
 # Base class for language-specific integration test harnesses
 
+from abc import ABC, abstractmethod
+import os
 import subprocess
+import typing
 
 from .util import log
 
 
-class Tester(object):
+_Predicate = typing.Callable[[], bool]
+
+
+class CDataExporter(ABC):
+
+    @abstractmethod
+    def export_schema_from_json(self, json_path: os.PathLike,
+                                c_schema_ptr: object):
+        """
+        Read a JSON integration file and export its schema.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to export to.
+        """
+
+    @abstractmethod
+    def export_batch_from_json(self, json_path: os.PathLike,
+                               num_batch: int,
+                               c_array_ptr: object):
+        """
+        Read a JSON integration file and export one of its batches.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_array_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to export to.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means that, after the `release` callback of
+        a C Data Interface export is called, `compare_allocation_state` is
+        able to trigger the deallocation of the memory underlying the export
+        (for example buffer data).
+
+        If false, then `record_allocation_state` and `compare_allocation_state`
+        are allowed to raise NotImplementedError.
+        """
+
+    def record_allocation_state(self) -> object:
+        """
+        Record the current memory allocation state.
+
+        Returns
+        -------
+        state : object
+            Opaque object representing the allocation state,
+            for example the number of allocated bytes.
+        """
+        raise NotImplementedError
+
+    def compare_allocation_state(self, recorded: object,
+                                 gc_until: typing.Callable[[_Predicate], bool]

Review Comment:
   I prefer mine since it involves less inversion of control: the exporter doesn't need to manage the importer's garbage collection.
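   
   For context, a minimal sketch of the shape being preferred here (illustrative only; note that `compare_allocation_state` is shown with the reduced one-argument signature under discussion, not the one in the diff above): the runner drives the importer's garbage collection itself, and the exporter only ever inspects its own allocator.
   
   ```python
   # Sketch only: the runner, not the exporter, owns the GC loop.
   def runner_checks_memory(exporter, importer):
       before = exporter.record_allocation_state()
       # ... the export/import round-trip happens here ...
       # Assumes gc_until returns the predicate's final value, as described
       # in the docstrings quoted above.
       released = importer.gc_until(
           lambda: exporter.compare_allocation_state(before))
       assert released, "exported memory was not released"
   ```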





[GitHub] [arrow] pitrou commented on a diff in pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37769:
URL: https://github.com/apache/arrow/pull/37769#discussion_r1329621605


##########
dev/archery/archery/integration/tester.py:
##########
@@ -17,12 +17,180 @@
 
 # Base class for language-specific integration test harnesses
 
+from abc import ABC, abstractmethod
+import os
 import subprocess
+import typing
 
 from .util import log
 
 
-class Tester(object):
+_Predicate = typing.Callable[[], bool]
+
+
+class CDataExporter(ABC):
+
+    @abstractmethod
+    def export_schema_from_json(self, json_path: os.PathLike,
+                                c_schema_ptr: object):
+        """
+        Read a JSON integration file and export its schema.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to export to.
+        """
+
+    @abstractmethod
+    def export_batch_from_json(self, json_path: os.PathLike,
+                               num_batch: int,
+                               c_array_ptr: object):
+        """
+        Read a JSON integration file and export one of its batches.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_array_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to export to.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means that, after the `release` callback of
+        a C Data Interface export is called, `compare_allocation_state` is
+        able to trigger the deallocation of the memory underlying the export
+        (for example buffer data).
+
+        If false, then `record_allocation_state` and `compare_allocation_state`
+        are allowed to raise NotImplementedError.
+        """
+
+    def record_allocation_state(self) -> object:
+        """
+        Record the current memory allocation state.
+
+        Returns
+        -------
+        state : object
+            Opaque object representing the allocation state,
+            for example the number of allocated bytes.
+        """
+        raise NotImplementedError
+
+    def compare_allocation_state(self, recorded: object,
+                                 gc_until: typing.Callable[[_Predicate], bool]
+                                 ) -> bool:
+        """
+        Compare the current memory allocation state with the recorded one.
+
+        Parameters
+        ----------
+        recorded : object
+            The previous allocation state returned by
+            `record_allocation_state()`
+        gc_until : callable
+            A callable itself accepting a callable predicate, and
+            returning a boolean.
+            `gc_until` should try to release memory until the predicate
+            becomes true, or until it decides to give up. The final value
+            of the predicate should be returned.
+            `gc_until` is typically provided by the C Data Interface importer.
+
+        Returns
+        -------
+        success : bool
+            Whether memory allocation state finally reached its previously
+            recorded value.
+        """
+        raise NotImplementedError
+
+
+class CDataImporter(ABC):
+
+    @abstractmethod
+    def import_schema_and_compare_to_json(self, json_path: os.PathLike,
+                                          c_schema_ptr: object):
+        """
+        Import schema and compare it to the schema of a JSON integration file.
+
+        An error is raised if importing fails or the schemas differ.
+
+        Parameters
+        ----------
+        json_path : Path
+            The path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to import from.
+        """
+
+    @abstractmethod
+    def import_batch_and_compare_to_json(self, json_path: os.PathLike,
+                                         num_batch: int,
+                                         c_array_ptr: object):
+        """
+        Import record batch and compare it to one of the batches
+        from a JSON integration file.
+
+        The schema used for importing the record batch is the one from
+        the JSON file.
+
+        An error is raised if importing fails or the batches differ.
+
+        Parameters
+        ----------
+        json_path : Path
+            The path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_array_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to import from.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means calling the `release` callback of
+        a C Data Interface export (which should then trigger a deallocation
+        mechanism on the exporter).
+
+        If false, then `gc_until` is allowed to raise NotImplementedError.
+        """
+
+    def gc_until(self, predicate: _Predicate):

Review Comment:
   Actually, it should be an abstract method. Thanks for noticing.



##########
dev/archery/archery/integration/tester.py:
##########
@@ -17,12 +17,180 @@
 
 # Base class for language-specific integration test harnesses
 
+from abc import ABC, abstractmethod
+import os
 import subprocess
+import typing
 
 from .util import log
 
 
-class Tester(object):
+_Predicate = typing.Callable[[], bool]
+
+
+class CDataExporter(ABC):
+
+    @abstractmethod
+    def export_schema_from_json(self, json_path: os.PathLike,
+                                c_schema_ptr: object):
+        """
+        Read a JSON integration file and export its schema.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to export to.
+        """
+
+    @abstractmethod
+    def export_batch_from_json(self, json_path: os.PathLike,
+                               num_batch: int,
+                               c_array_ptr: object):
+        """
+        Read a JSON integration file and export one of its batches.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_array_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to export to.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means that, after the `release` callback of
+        a C Data Interface export is called, `compare_allocation_state` is
+        able to trigger the deallocation of the memory underlying the export
+        (for example buffer data).
+
+        If false, then `record_allocation_state` and `compare_allocation_state`
+        are allowed to raise NotImplementedError.
+        """
+
+    def record_allocation_state(self) -> object:
+        """
+        Record the current memory allocation state.
+
+        Returns
+        -------
+        state : object
+            Opaque object representing the allocation state,
+            for example the number of allocated bytes.
+        """
+        raise NotImplementedError
+
+    def compare_allocation_state(self, recorded: object,
+                                 gc_until: typing.Callable[[_Predicate], bool]
+                                 ) -> bool:
+        """
+        Compare the current memory allocation state with the recorded one.
+
+        Parameters
+        ----------
+        recorded : object
+            The previous allocation state returned by
+            `record_allocation_state()`
+        gc_until : callable
+            A callable itself accepting a callable predicate, and
+            returning a boolean.
+            `gc_until` should try to release memory until the predicate
+            becomes true, or until it decides to give up. The final value
+            of the predicate should be returned.
+            `gc_until` is typically provided by the C Data Interface importer.
+
+        Returns
+        -------
+        success : bool
+            Whether memory allocation state finally reached its previously
+            recorded value.
+        """
+        raise NotImplementedError
+
+
+class CDataImporter(ABC):
+
+    @abstractmethod
+    def import_schema_and_compare_to_json(self, json_path: os.PathLike,
+                                          c_schema_ptr: object):
+        """
+        Import schema and compare it to the schema of a JSON integration file.
+
+        An error is raised if importing fails or the schemas differ.
+
+        Parameters
+        ----------
+        json_path : Path
+            The path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to import from.
+        """
+
+    @abstractmethod
+    def import_batch_and_compare_to_json(self, json_path: os.PathLike,
+                                         num_batch: int,
+                                         c_array_ptr: object):
+        """
+        Import record batch and compare it to one of the batches
+        from a JSON integration file.
+
+        The schema used for importing the record batch is the one from
+        the JSON file.
+
+        An error is raised if importing fails or the batches differ.
+
+        Parameters
+        ----------
+        json_path : Path
+            The path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_array_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to import from.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means calling the `release` callback of
+        a C Data Interface export (which should then trigger a deallocation
+        mechanism on the exporter).
+
+        If false, then `gc_until` is allowed to raise NotImplementedError.
+        """
+
+    def gc_until(self, predicate: _Predicate):

Review Comment:
   Yes, you're right. Let me do this.





[GitHub] [arrow] pitrou commented on a diff in pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37769:
URL: https://github.com/apache/arrow/pull/37769#discussion_r1329622514


##########
dev/archery/archery/integration/tester_cpp.py:
##########
@@ -133,3 +142,104 @@ def flight_request(self, port, json_path=None, scenario_name=None):
         if self.debug:
             log(' '.join(cmd))
         run_cmd(cmd)
+
+    def make_c_data_exporter(self):
+        return CppCDataExporter(self.debug, self.args)
+
+    def make_c_data_importer(self):
+        return CppCDataImporter(self.debug, self.args)
+
+
+_cpp_c_data_entrypoints = """
+    const char* ArrowCpp_CDataIntegration_ExportSchemaFromJson(
+        const char* json_path, struct ArrowSchema* out);
+    const char* ArrowCpp_CDataIntegration_ImportSchemaAndCompareToJson(
+        const char* json_path, struct ArrowSchema* schema);
+
+    const char* ArrowCpp_CDataIntegration_ExportBatchFromJson(
+        const char* json_path, int num_batch, struct ArrowArray* out);
+    const char* ArrowCpp_CDataIntegration_ImportBatchAndCompareToJson(
+        const char* json_path, int num_batch, struct ArrowArray* batch);
+
+    int64_t ArrowCpp_BytesAllocated();
+    """
+
+
+@functools.lru_cache
+def _load_ffi(ffi, lib_path=_ARROW_DLL):
+    ffi.cdef(_cpp_c_data_entrypoints)
+    dll = ffi.dlopen(lib_path)
+    dll.ArrowCpp_CDataIntegration_ExportSchemaFromJson
+    return dll
+
+
+class _CDataBase:

Review Comment:
   Hmm, I could do the change, but it wouldn't be more concise. Is there a concern with the top-level helper function?





[GitHub] [arrow] pitrou commented on a diff in pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37769:
URL: https://github.com/apache/arrow/pull/37769#discussion_r1330156007


##########
dev/archery/archery/integration/tester.py:
##########
@@ -17,12 +17,180 @@
 
 # Base class for language-specific integration test harnesses
 
+from abc import ABC, abstractmethod
+import os
 import subprocess
+import typing
 
 from .util import log
 
 
-class Tester(object):
+_Predicate = typing.Callable[[], bool]
+
+
+class CDataExporter(ABC):
+
+    @abstractmethod
+    def export_schema_from_json(self, json_path: os.PathLike,
+                                c_schema_ptr: object):
+        """
+        Read a JSON integration file and export its schema.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to export to.
+        """
+
+    @abstractmethod
+    def export_batch_from_json(self, json_path: os.PathLike,
+                               num_batch: int,
+                               c_array_ptr: object):
+        """
+        Read a JSON integration file and export one of its batches.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_array_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to export to.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means that, after the `release` callback of
+        a C Data Interface export is called, `compare_allocation_state` is
+        able to trigger the deallocation of the memory underlying the export
+        (for example buffer data).
+
+        If false, then `record_allocation_state` and `compare_allocation_state`
+        are allowed to raise NotImplementedError.
+        """
+
+    def record_allocation_state(self) -> object:
+        """
+        Record the current memory allocation state.
+
+        Returns
+        -------
+        state : object
+            Opaque object representing the allocation state,
+            for example the number of allocated bytes.
+        """
+        raise NotImplementedError
+
+    def compare_allocation_state(self, recorded: object,
+                                 gc_until: typing.Callable[[_Predicate], bool]

Review Comment:
   It probably could, though I'm not sure it's simpler :-). Both solutions (yours and mine) are IMHO not very elegant, and we may have to revisit if making two GCs coexist ends up more complicated...





[GitHub] [arrow] bkietz commented on a diff in pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on code in PR #37769:
URL: https://github.com/apache/arrow/pull/37769#discussion_r1330083582


##########
dev/archery/archery/integration/tester_cpp.py:
##########
@@ -133,3 +142,104 @@ def flight_request(self, port, json_path=None, scenario_name=None):
         if self.debug:
             log(' '.join(cmd))
         run_cmd(cmd)
+
+    def make_c_data_exporter(self):
+        return CppCDataExporter(self.debug, self.args)
+
+    def make_c_data_importer(self):
+        return CppCDataImporter(self.debug, self.args)
+
+
+_cpp_c_data_entrypoints = """
+    const char* ArrowCpp_CDataIntegration_ExportSchemaFromJson(
+        const char* json_path, struct ArrowSchema* out);
+    const char* ArrowCpp_CDataIntegration_ImportSchemaAndCompareToJson(
+        const char* json_path, struct ArrowSchema* schema);
+
+    const char* ArrowCpp_CDataIntegration_ExportBatchFromJson(
+        const char* json_path, int num_batch, struct ArrowArray* out);
+    const char* ArrowCpp_CDataIntegration_ImportBatchAndCompareToJson(
+        const char* json_path, int num_batch, struct ArrowArray* batch);
+
+    int64_t ArrowCpp_BytesAllocated();
+    """
+
+
+@functools.lru_cache
+def _load_ffi(ffi, lib_path=_ARROW_DLL):
+    ffi.cdef(_cpp_c_data_entrypoints)
+    dll = ffi.dlopen(lib_path)
+    dll.ArrowCpp_CDataIntegration_ExportSchemaFromJson
+    return dll
+
+
+class _CDataBase:

Review Comment:
   It's a nit: a cache implies multiplicity to me, which isn't the intent here.





[GitHub] [arrow] pitrou commented on pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #37769:
URL: https://github.com/apache/arrow/pull/37769#issuecomment-1723725043

   @wjones127 @tustvold, FYI, following this PR you'll need to add the `--run-ipc` option in these two places:
   * https://github.com/apache/arrow-rs/blob/master/arrow-integration-testing/README.md
   * https://github.com/apache/arrow-rs/blob/master/.github/workflows/integration.yml#L107
   






[GitHub] [arrow] bkietz commented on a diff in pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on code in PR #37769:
URL: https://github.com/apache/arrow/pull/37769#discussion_r1329064897


##########
dev/archery/archery/integration/tester.py:
##########
@@ -17,12 +17,180 @@
 
 # Base class for language-specific integration test harnesses
 
+from abc import ABC, abstractmethod
+import os
 import subprocess
+import typing
 
 from .util import log
 
 
-class Tester(object):
+_Predicate = typing.Callable[[], bool]
+
+
+class CDataExporter(ABC):
+
+    @abstractmethod
+    def export_schema_from_json(self, json_path: os.PathLike,
+                                c_schema_ptr: object):
+        """
+        Read a JSON integration file and export its schema.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to export to.
+        """
+
+    @abstractmethod
+    def export_batch_from_json(self, json_path: os.PathLike,
+                               num_batch: int,
+                               c_array_ptr: object):
+        """
+        Read a JSON integration file and export one of its batches.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_array_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to export to.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means that, after the `release` callback of
+        a C Data Interface export is called, `compare_allocation_state` is
+        able to trigger the deallocation of the memory underlying the export
+        (for example buffer data).
+
+        If false, then `record_allocation_state` and `compare_allocation_state`
+        are allowed to raise NotImplementedError.
+        """
+
+    def record_allocation_state(self) -> object:
+        """
+        Record the current memory allocation state.
+
+        Returns
+        -------
+        state : object
+            Opaque object representing the allocation state,
+            for example the number of allocated bytes.
+        """
+        raise NotImplementedError
+
+    def compare_allocation_state(self, recorded: object,
+                                 gc_until: typing.Callable[[_Predicate], bool]

Review Comment:
   I find this parameter's type quite confusing. IIUC it's a cancel token to cut off long-running GC? I'm not sure why this control is given to an exporter or importer. Instead, I think it would make more sense for the runner's `check_memory_released` to construct this token. It could be a `Callable[[], bool]` that returns true if GC is taking too long and the runner is aborting the current case.
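   
   A minimal sketch of that token (illustrative, not code from the PR): the runner builds a zero-argument callable that starts returning true once GC has been given too much time, and whichever side runs the collection loop checks it between attempts.
   
   ```python
   import time
   
   # Sketch only: a runner-built "give up" token for the GC loop.
   def make_gc_deadline_token(timeout_s: float = 1.0):
       deadline = time.monotonic() + timeout_s
       def giving_up() -> bool:
           return time.monotonic() >= deadline
       return giving_up
   ```
   
   The collection loop would then be something like `while not predicate() and not giving_up(): collect()`, with the runner free to tune the timeout per test case.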



##########
dev/archery/archery/integration/tester_cpp.py:
##########
@@ -133,3 +142,104 @@ def flight_request(self, port, json_path=None, scenario_name=None):
         if self.debug:
             log(' '.join(cmd))
         run_cmd(cmd)
+
+    def make_c_data_exporter(self):
+        return CppCDataExporter(self.debug, self.args)
+
+    def make_c_data_importer(self):
+        return CppCDataImporter(self.debug, self.args)
+
+
+_cpp_c_data_entrypoints = """
+    const char* ArrowCpp_CDataIntegration_ExportSchemaFromJson(
+        const char* json_path, struct ArrowSchema* out);
+    const char* ArrowCpp_CDataIntegration_ImportSchemaAndCompareToJson(
+        const char* json_path, struct ArrowSchema* schema);
+
+    const char* ArrowCpp_CDataIntegration_ExportBatchFromJson(
+        const char* json_path, int num_batch, struct ArrowArray* out);
+    const char* ArrowCpp_CDataIntegration_ImportBatchAndCompareToJson(
+        const char* json_path, int num_batch, struct ArrowArray* batch);
+
+    int64_t ArrowCpp_BytesAllocated();
+    """
+
+
+@functools.lru_cache
+def _load_ffi(ffi, lib_path=_ARROW_DLL):
+    ffi.cdef(_cpp_c_data_entrypoints)
+    dll = ffi.dlopen(lib_path)
+    dll.ArrowCpp_CDataIntegration_ExportSchemaFromJson
+    return dll
+
+
+class _CDataBase:

Review Comment:
   ```suggestion
   class _CDataBase:
       _dll = None
       @property
       def dll(self):
           if _CDataBase._dll is None:
               ...
               _CDataBase._dll = dll
           return _CDataBase._dll
   ```
   ?





[GitHub] [arrow] pitrou commented on a diff in pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37769:
URL: https://github.com/apache/arrow/pull/37769#discussion_r1330105368


##########
dev/archery/archery/integration/tester_cpp.py:
##########
@@ -133,3 +142,104 @@ def flight_request(self, port, json_path=None, scenario_name=None):
         if self.debug:
             log(' '.join(cmd))
         run_cmd(cmd)
+
+    def make_c_data_exporter(self):
+        return CppCDataExporter(self.debug, self.args)
+
+    def make_c_data_importer(self):
+        return CppCDataImporter(self.debug, self.args)
+
+
+_cpp_c_data_entrypoints = """
+    const char* ArrowCpp_CDataIntegration_ExportSchemaFromJson(
+        const char* json_path, struct ArrowSchema* out);
+    const char* ArrowCpp_CDataIntegration_ImportSchemaAndCompareToJson(
+        const char* json_path, struct ArrowSchema* schema);
+
+    const char* ArrowCpp_CDataIntegration_ExportBatchFromJson(
+        const char* json_path, int num_batch, struct ArrowArray* out);
+    const char* ArrowCpp_CDataIntegration_ImportBatchAndCompareToJson(
+        const char* json_path, int num_batch, struct ArrowArray* batch);
+
+    int64_t ArrowCpp_BytesAllocated();
+    """
+
+
+@functools.lru_cache
+def _load_ffi(ffi, lib_path=_ARROW_DLL):
+    ffi.cdef(_cpp_c_data_entrypoints)
+    dll = ffi.dlopen(lib_path)
+    dll.ArrowCpp_CDataIntegration_ExportSchemaFromJson
+    return dll
+
+
+class _CDataBase:

Review Comment:
   Right, using `lru_cache` is a common idiom for memoization with the Python stdlib.
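   
   A toy illustration of the idiom (not the PR's code): with stable arguments, an `lru_cache`'d loader executes its body once and every later call returns the same cached object, which gives the "load the DLL only once" behaviour wanted here.
   
   ```python
   from functools import lru_cache
   
   @lru_cache
   def load_library():
       print("loading...")   # runs only on the first call
       return object()       # stand-in for ffi.dlopen(_ARROW_DLL)
   
   lib1 = load_library()
   lib2 = load_library()     # cache hit: no second "loading..." message
   assert lib1 is lib2
   ```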





[GitHub] [arrow] pitrou commented on a diff in pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37769:
URL: https://github.com/apache/arrow/pull/37769#discussion_r1329621354


##########
dev/archery/archery/integration/tester.py:
##########
@@ -17,12 +17,180 @@
 
 # Base class for language-specific integration test harnesses
 
+from abc import ABC, abstractmethod
+import os
 import subprocess
+import typing
 
 from .util import log
 
 
-class Tester(object):
+_Predicate = typing.Callable[[], bool]
+
+
+class CDataExporter(ABC):
+
+    @abstractmethod
+    def export_schema_from_json(self, json_path: os.PathLike,
+                                c_schema_ptr: object):
+        """
+        Read a JSON integration file and export its schema.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to export to.
+        """
+
+    @abstractmethod
+    def export_batch_from_json(self, json_path: os.PathLike,
+                               num_batch: int,
+                               c_array_ptr: object):
+        """
+        Read a JSON integration file and export one of its batches.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_array_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to export to.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means that, after the `release` callback of
+        a C Data Interface export is called, `compare_allocation_state` is
+        able to trigger the deallocation of the memory underlying the export
+        (for example buffer data).
+
+        If false, then `record_allocation_state` and `compare_allocation_state`
+        are allowed to raise NotImplementedError.
+        """
+
+    def record_allocation_state(self) -> object:
+        """
+        Record the current memory allocation state.
+
+        Returns
+        -------
+        state : object
+            Opaque object representing the allocation state,
+            for example the number of allocated bytes.
+        """
+        raise NotImplementedError
+
+    def compare_allocation_state(self, recorded: object,
+                                 gc_until: typing.Callable[[_Predicate], bool]

Review Comment:
   No, really, it's as the doc states. Some runtimes may need several GC calls to properly release memory, hence the name.
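   
   As an illustration (hypothetical, not from the PR), a `gc_until` for a Python-based importer could look like the sketch below: keep collecting until the caller's predicate reports that memory is back to its recorded level, or give up after a few passes.
   
   ```python
   import gc
   
   def gc_until(predicate, max_attempts=5):
       # Some finalizers only run after repeated collection passes.
       for _ in range(max_attempts):
           if predicate():
               return True
           gc.collect()
       return predicate()
   ```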





[GitHub] [arrow] wjones127 commented on a diff in pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #37769:
URL: https://github.com/apache/arrow/pull/37769#discussion_r1329198105


##########
dev/archery/archery/integration/runner.py:
##########
@@ -380,28 +399,148 @@ def _run_flight_test_case(self,
                 outcome.failure = Failure(test_case, producer, consumer,
                                           sys.exc_info())
 
+        log('=' * 70)
+
+        return outcome
+
+    def _compare_c_data_implementations(
+        self,
+        producer: Tester,
+        consumer: Tester
+    ):
+        log('##########################################################')
+        log(f'C Data Interface: '
+            f'{producer.name} exporting, {consumer.name} importing')
+        log('##########################################################')
+
+        # Serial execution is required for proper memory accounting
+        serial = True
+
+        exporter = producer.make_c_data_exporter()
+        importer = consumer.make_c_data_importer()
+
+        case_runner = partial(self._run_c_schema_test_case, producer, consumer,
+                              exporter, importer)
+        self._run_test_cases(case_runner, self.json_files, serial=serial)
+
+        case_runner = partial(self._run_c_array_test_cases, producer, consumer,
+                              exporter, importer)
+        self._run_test_cases(case_runner, self.json_files, serial=serial)
+
+    def _run_c_schema_test_case(self,
+                                producer: Tester, consumer: Tester,
+                                exporter: CDataExporter,
+                                importer: CDataImporter,
+                                test_case: datagen.File) -> Outcome:
+        """
+        Run one C ArrowSchema test case.
+        """
+        outcome = Outcome()
+
+        def do_run():
+            json_path = test_case.path
+            ffi = cdata.ffi()
+            c_schema_ptr = ffi.new("struct ArrowSchema*")
+            with cdata.check_memory_released(exporter, importer):
+                exporter.export_schema_from_json(json_path, c_schema_ptr)
+                importer.import_schema_and_compare_to_json(json_path, c_schema_ptr)
+
+        log('=' * 70)
+        log(f'Testing C ArrowSchema from file {test_case.name!r}')
+
+        if test_case.should_skip(producer.name, SKIP_C_SCHEMA):
+            log(f'-- Skipping test because producer {producer.name} does '
+                f'not support C ArrowSchema')
+            outcome.skipped = True
+
+        elif test_case.should_skip(consumer.name, SKIP_C_SCHEMA):
+            log(f'-- Skipping test because consumer {consumer.name} does '
+                f'not support C ArrowSchema')
+            outcome.skipped = True
+
+        else:
+            try:
+                do_run()
+            except Exception:
+                traceback.print_exc(file=printer.stdout)
+                outcome.failure = Failure(test_case, producer, consumer,
+                                          sys.exc_info())
+
+        log('=' * 70)
+
+        return outcome
+
+    def _run_c_array_test_cases(self,
+                                producer: Tester, consumer: Tester,
+                                exporter: CDataExporter,
+                                importer: CDataImporter,
+                                test_case: datagen.File) -> Outcome:
+        """
+        Run one set of C ArrowArray test cases.
+        """
+        outcome = Outcome()
+
+        def do_run():
+            json_path = test_case.path
+            ffi = cdata.ffi()
+            c_array_ptr = ffi.new("struct ArrowArray*")
+            for num_batch in range(test_case.num_batches):
+                log(f'... with record batch #{num_batch}')
+                with cdata.check_memory_released(exporter, importer):
+                    exporter.export_batch_from_json(json_path,
+                                                    num_batch,
+                                                    c_array_ptr)
+                    importer.import_batch_and_compare_to_json(json_path,
+                                                              num_batch,
+                                                              c_array_ptr)
+
+        log('=' * 70)
+        log(f'Testing C ArrowArray '
+            f'from file {test_case.name!r}')
+
+        if test_case.should_skip(producer.name, SKIP_C_ARRAY):
+            log(f'-- Skipping test because producer {producer.name} does '
+                f'not support C ArrowSchema')

Review Comment:
   ```suggestion
                   f'not support C ArrowArray')
   ```



##########
dev/archery/archery/integration/tester.py:
##########
@@ -17,12 +17,180 @@
 
 # Base class for language-specific integration test harnesses
 
+from abc import ABC, abstractmethod
+import os
 import subprocess
+import typing
 
 from .util import log
 
 
-class Tester(object):
+_Predicate = typing.Callable[[], bool]
+
+
+class CDataExporter(ABC):
+
+    @abstractmethod
+    def export_schema_from_json(self, json_path: os.PathLike,
+                                c_schema_ptr: object):
+        """
+        Read a JSON integration file and export its schema.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to export to.
+        """
+
+    @abstractmethod
+    def export_batch_from_json(self, json_path: os.PathLike,
+                               num_batch: int,
+                               c_array_ptr: object):
+        """
+        Read a JSON integration file and export one of its batches.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_array_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to export to.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means that, after the `release` callback of
+        a C Data Interface export is called, `compare_allocation_state` is
+        able to trigger the deallocation of the memory underlying the export
+        (for example buffer data).
+
+        If false, then `record_allocation_state` and `compare_allocation_state`
+        are allowed to raise NotImplementedError.
+        """
+
+    def record_allocation_state(self) -> object:
+        """
+        Record the current memory allocation state.
+
+        Returns
+        -------
+        state : object
+            Opaque object representing the allocation state,
+            for example the number of allocated bytes.
+        """
+        raise NotImplementedError
+
+    def compare_allocation_state(self, recorded: object,
+                                 gc_until: typing.Callable[[_Predicate], bool]
+                                 ) -> bool:
+        """
+        Compare the current memory allocation state with the recorded one.
+
+        Parameters
+        ----------
+        recorded : object
+            The previous allocation state returned by
+            `record_allocation_state()`
+        gc_until : callable
+            A callable itself accepting a callable predicate, and
+            returning a boolean.
+            `gc_until` should try to release memory until the predicate
+            becomes true, or until it decides to give up. The final value
+            of the predicate should be returned.
+            `gc_until` is typically provided by the C Data Interface importer.
+
+        Returns
+        -------
+        success : bool
+            Whether memory allocation state finally reached its previously
+            recorded value.
+        """
+        raise NotImplementedError
+
+
+class CDataImporter(ABC):
+
+    @abstractmethod
+    def import_schema_and_compare_to_json(self, json_path: os.PathLike,
+                                          c_schema_ptr: object):
+        """
+        Import schema and compare it to the schema of a JSON integration file.
+
+        An error is raised if importing fails or the schemas differ.
+
+        Parameters
+        ----------
+        json_path : Path
+            The path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to import from.
+        """
+
+    @abstractmethod
+    def import_batch_and_compare_to_json(self, json_path: os.PathLike,
+                                         num_batch: int,
+                                         c_array_ptr: object):
+        """
+        Import record batch and compare it to one of the batches
+        from a JSON integration file.
+
+        The schema used for importing the record batch is the one from
+        the JSON file.
+
+        An error is raised if importing fails or the batches differ.
+
+        Parameters
+        ----------
+        json_path : Path
+            The path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_array_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to import from.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means calling the `release` callback of
+        a C Data Interface export (which should then trigger a deallocation
+        mechanism on the exporter).
+
+        If false, then `gc_until` is allowed to raise NotImplementedError.
+        """
+
+    def gc_until(self, predicate: _Predicate):

Review Comment:
   Should this have a default implementation of `raise NotImplementedError`?
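   
   For reference, the two shapes being weighed (sketch only; other replies in this thread settle on the abstract method):
   
   ```python
   from abc import ABC, abstractmethod
   
   class WithDefault(ABC):
       # Concrete default: importers that cannot release memory
       # deterministically simply inherit the NotImplementedError.
       def gc_until(self, predicate):
           raise NotImplementedError
   
   class AbstractOnly(ABC):
       # Abstract method: every importer must provide an implementation.
       @abstractmethod
       def gc_until(self, predicate):
           ...
   ```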



##########
dev/archery/archery/integration/runner.py:
##########
@@ -380,28 +399,148 @@ def _run_flight_test_case(self,
                 outcome.failure = Failure(test_case, producer, consumer,
                                           sys.exc_info())
 
+        log('=' * 70)
+
+        return outcome
+
+    def _compare_c_data_implementations(
+        self,
+        producer: Tester,
+        consumer: Tester
+    ):
+        log('##########################################################')
+        log(f'C Data Interface: '
+            f'{producer.name} exporting, {consumer.name} importing')
+        log('##########################################################')
+
+        # Serial execution is required for proper memory accounting
+        serial = True
+
+        exporter = producer.make_c_data_exporter()
+        importer = consumer.make_c_data_importer()
+
+        case_runner = partial(self._run_c_schema_test_case, producer, consumer,
+                              exporter, importer)
+        self._run_test_cases(case_runner, self.json_files, serial=serial)
+
+        case_runner = partial(self._run_c_array_test_cases, producer, consumer,
+                              exporter, importer)
+        self._run_test_cases(case_runner, self.json_files, serial=serial)
+
+    def _run_c_schema_test_case(self,
+                                producer: Tester, consumer: Tester,
+                                exporter: CDataExporter,
+                                importer: CDataImporter,
+                                test_case: datagen.File) -> Outcome:
+        """
+        Run one C ArrowSchema test case.
+        """
+        outcome = Outcome()
+
+        def do_run():
+            json_path = test_case.path
+            ffi = cdata.ffi()
+            c_schema_ptr = ffi.new("struct ArrowSchema*")
+            with cdata.check_memory_released(exporter, importer):
+                exporter.export_schema_from_json(json_path, c_schema_ptr)
+                importer.import_schema_and_compare_to_json(json_path, c_schema_ptr)
+
+        log('=' * 70)
+        log(f'Testing C ArrowSchema from file {test_case.name!r}')
+
+        if test_case.should_skip(producer.name, SKIP_C_SCHEMA):
+            log(f'-- Skipping test because producer {producer.name} does '
+                f'not support C ArrowSchema')
+            outcome.skipped = True
+
+        elif test_case.should_skip(consumer.name, SKIP_C_SCHEMA):
+            log(f'-- Skipping test because consumer {consumer.name} does '
+                f'not support C ArrowSchema')
+            outcome.skipped = True
+
+        else:
+            try:
+                do_run()
+            except Exception:
+                traceback.print_exc(file=printer.stdout)
+                outcome.failure = Failure(test_case, producer, consumer,
+                                          sys.exc_info())
+
+        log('=' * 70)
+
+        return outcome
+
+    def _run_c_array_test_cases(self,
+                                producer: Tester, consumer: Tester,
+                                exporter: CDataExporter,
+                                importer: CDataImporter,
+                                test_case: datagen.File) -> Outcome:
+        """
+        Run one set of C ArrowArray test cases.
+        """
+        outcome = Outcome()
+
+        def do_run():
+            json_path = test_case.path
+            ffi = cdata.ffi()
+            c_array_ptr = ffi.new("struct ArrowArray*")
+            for num_batch in range(test_case.num_batches):
+                log(f'... with record batch #{num_batch}')
+                with cdata.check_memory_released(exporter, importer):
+                    exporter.export_batch_from_json(json_path,
+                                                    num_batch,
+                                                    c_array_ptr)
+                    importer.import_batch_and_compare_to_json(json_path,
+                                                              num_batch,
+                                                              c_array_ptr)
+
+        log('=' * 70)
+        log(f'Testing C ArrowArray '
+            f'from file {test_case.name!r}')
+
+        if test_case.should_skip(producer.name, SKIP_C_ARRAY):
+            log(f'-- Skipping test because producer {producer.name} does '
+                f'not support C ArrowSchema')
+            outcome.skipped = True
+
+        elif test_case.should_skip(consumer.name, SKIP_C_ARRAY):
+            log(f'-- Skipping test because consumer {consumer.name} does '
+                f'not support C ArrowSchema')

Review Comment:
   ```suggestion
                   f'not support C ArrowArray')
   ```





[GitHub] [arrow] tustvold commented on pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #37769:
URL: https://github.com/apache/arrow/pull/37769#issuecomment-1723770064

   Thank you for the heads up - created a tracking issue so this doesn't get missed - https://github.com/apache/arrow-rs/issues/4828




[GitHub] [arrow] pitrou commented on pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #37769:
URL: https://github.com/apache/arrow/pull/37769#issuecomment-1725782617

   I'll merge this PR now, thanks for the reviews!




[GitHub] [arrow] pitrou commented on a diff in pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37769:
URL: https://github.com/apache/arrow/pull/37769#discussion_r1330250291


##########
dev/archery/archery/integration/tester.py:
##########
@@ -17,12 +17,180 @@
 
 # Base class for language-specific integration test harnesses
 
+from abc import ABC, abstractmethod
+import os
 import subprocess
+import typing
 
 from .util import log
 
 
-class Tester(object):
+_Predicate = typing.Callable[[], bool]
+
+
+class CDataExporter(ABC):
+
+    @abstractmethod
+    def export_schema_from_json(self, json_path: os.PathLike,
+                                c_schema_ptr: object):
+        """
+        Read a JSON integration file and export its schema.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to export to.
+        """
+
+    @abstractmethod
+    def export_batch_from_json(self, json_path: os.PathLike,
+                               num_batch: int,
+                               c_array_ptr: object):
+        """
+        Read a JSON integration file and export one of its batches.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_array_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to export to.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means that, after the `release` callback of
+        a C Data Interface export is called, `compare_allocation_state` is
+        able to trigger the deallocation of the memory underlying the export
+        (for example buffer data).
+
+        If false, then `record_allocation_state` and `compare_allocation_state`
+        are allowed to raise NotImplementedError.
+        """
+
+    def record_allocation_state(self) -> object:
+        """
+        Record the current memory allocation state.
+
+        Returns
+        -------
+        state : object
+            Opaque object representing the allocation state,
+            for example the number of allocated bytes.
+        """
+        raise NotImplementedError
+
+    def compare_allocation_state(self, recorded: object,
+                                 gc_until: typing.Callable[[_Predicate], bool]

Review Comment:
   I'll go with the current version. This can evolve quite easily as this is internal tooling, so no API guarantees.





[GitHub] [arrow] bkietz commented on a diff in pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on code in PR #37769:
URL: https://github.com/apache/arrow/pull/37769#discussion_r1330099588


##########
dev/archery/archery/integration/tester.py:
##########
@@ -17,12 +17,180 @@
 
 # Base class for language-specific integration test harnesses
 
+from abc import ABC, abstractmethod
+import os
 import subprocess
+import typing
 
 from .util import log
 
 
-class Tester(object):
+_Predicate = typing.Callable[[], bool]
+
+
+class CDataExporter(ABC):
+
+    @abstractmethod
+    def export_schema_from_json(self, json_path: os.PathLike,
+                                c_schema_ptr: object):
+        """
+        Read a JSON integration file and export its schema.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to export to.
+        """
+
+    @abstractmethod
+    def export_batch_from_json(self, json_path: os.PathLike,
+                               num_batch: int,
+                               c_array_ptr: object):
+        """
+        Read a JSON integration file and export one of its batches.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to export to.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means that, after the `release` callback of
+        a C Data Interface export is called, `compare_allocation_state` is
+        able to trigger the deallocation of the memory underlying the export
+        (for example buffer data).
+
+        If false, then `record_allocation_state` and `compare_allocation_state`
+        are allowed to raise NotImplementedError.
+        """
+
+    def record_allocation_state(self) -> object:
+        """
+        Record the current memory allocation state.
+
+        Returns
+        -------
+        state : object
+            Opaque object representing the allocation state,
+            for example the number of allocated bytes.
+        """
+        raise NotImplementedError
+
+    def compare_allocation_state(self, recorded: object,
+                                 gc_until: typing.Callable[[_Predicate], bool]

Review Comment:
   I'm probably missing something, but it seems like this could be done more simply with something like:
   ```python
   @contextmanager
   def check_memory_released(exporter: CDataExporter, importer: CDataImporter,
                             gc_timeout: Callable[[], bool] = _default_timeout):
       do_check = (exporter.supports_releasing_memory and
                   importer.supports_releasing_memory)
       if not do_check:
           yield; return
       before = exporter.record_allocation_state()
       yield
       while exporter.record_allocation_state() != before:
           if gc_timeout():
               raise ...
           importer.gc_once()
   ```
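
   For what it's worth, the `_default_timeout` assumed in the sketch above could be as simple as a monotonic-deadline closure. Purely as an illustration (the helper name `make_gc_timeout` and the 10-second default are hypothetical, not something defined in this PR):
   ```python
   import time
   from typing import Callable

   def make_gc_timeout(timeout_sec: float = 10.0) -> Callable[[], bool]:
       # Zero-argument predicate: returns True once the deadline has passed.
       # check_memory_released would poll it between importer GC attempts.
       deadline = time.monotonic() + timeout_sec
       return lambda: time.monotonic() > deadline
   ```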





[GitHub] [arrow] pitrou merged pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou merged PR #37769:
URL: https://github.com/apache/arrow/pull/37769

