You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "bkietz (via GitHub)" <gi...@apache.org> on 2023/09/18 17:42:06 UTC

[GitHub] [arrow] bkietz commented on a diff in pull request #37769: GH-37537: [Integration][C++] Add C Data Interface integration testing

bkietz commented on code in PR #37769:
URL: https://github.com/apache/arrow/pull/37769#discussion_r1329064897


##########
dev/archery/archery/integration/tester.py:
##########
@@ -17,12 +17,180 @@
 
 # Base class for language-specific integration test harnesses
 
+from abc import ABC, abstractmethod
+import os
 import subprocess
+import typing
 
 from .util import log
 
 
-class Tester(object):
+_Predicate = typing.Callable[[], bool]
+
+
+class CDataExporter(ABC):
+
+    @abstractmethod
+    def export_schema_from_json(self, json_path: os.PathLike,
+                                c_schema_ptr: object):
+        """
+        Read a JSON integration file and export its schema.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowSchema`` struct to export to.
+        """
+
+    @abstractmethod
+    def export_batch_from_json(self, json_path: os.PathLike,
+                               num_batch: int,
+                               c_array_ptr: object):
+        """
+        Read a JSON integration file and export one of its batches.
+
+        Parameters
+        ----------
+        json_path : Path
+            Path to the JSON file
+        num_batch : int
+            Number of the record batch in the JSON file
+        c_schema_ptr : cffi pointer value
+            Pointer to the ``ArrowArray`` struct to export to.
+        """
+
+    @property
+    @abstractmethod
+    def supports_releasing_memory(self) -> bool:
+        """
+        Whether the implementation is able to release memory deterministically.
+
+        Here, "release memory" means that, after the `release` callback of
+        a C Data Interface export is called, `compare_allocation_state` is
+        able to trigger the deallocation of the memory underlying the export
+        (for example buffer data).
+
+        If false, then `record_allocation_state` and `compare_allocation_state`
+        are allowed to raise NotImplementedError.
+        """
+
+    def record_allocation_state(self) -> object:
+        """
+        Record the current memory allocation state.
+
+        Returns
+        -------
+        state : object
+            Opaque object representing the allocation state,
+            for example the number of allocated bytes.
+        """
+        raise NotImplementedError
+
+    def compare_allocation_state(self, recorded: object,
+                                 gc_until: typing.Callable[[_Predicate], bool]

Review Comment:
   I find this parameter's type quite confusing. IIUC it's a cancel token to cut off long-running gc? I'm not sure why this control is given to an exporter or importer. Instead, I would think it would make more sense for the runner `check_memory_released` to construct this token. It can be `Callable[[], bool]` and returns true if GC is taking too long and the runner is aborting the current case



##########
dev/archery/archery/integration/tester_cpp.py:
##########
@@ -133,3 +142,104 @@ def flight_request(self, port, json_path=None, scenario_name=None):
         if self.debug:
             log(' '.join(cmd))
         run_cmd(cmd)
+
+    def make_c_data_exporter(self):
+        return CppCDataExporter(self.debug, self.args)
+
+    def make_c_data_importer(self):
+        return CppCDataImporter(self.debug, self.args)
+
+
+_cpp_c_data_entrypoints = """
+    const char* ArrowCpp_CDataIntegration_ExportSchemaFromJson(
+        const char* json_path, struct ArrowSchema* out);
+    const char* ArrowCpp_CDataIntegration_ImportSchemaAndCompareToJson(
+        const char* json_path, struct ArrowSchema* schema);
+
+    const char* ArrowCpp_CDataIntegration_ExportBatchFromJson(
+        const char* json_path, int num_batch, struct ArrowArray* out);
+    const char* ArrowCpp_CDataIntegration_ImportBatchAndCompareToJson(
+        const char* json_path, int num_batch, struct ArrowArray* batch);
+
+    int64_t ArrowCpp_BytesAllocated();
+    """
+
+
+@functools.lru_cache
+def _load_ffi(ffi, lib_path=_ARROW_DLL):
+    ffi.cdef(_cpp_c_data_entrypoints)
+    dll = ffi.dlopen(lib_path)
+    dll.ArrowCpp_CDataIntegration_ExportSchemaFromJson
+    return dll
+
+
+class _CDataBase:

Review Comment:
   ```suggestion
   class _CDataBase:
       _dll = None
       @property
       def dll(self):
           if _CDataBase._dll is None:
               ...
               _CDataBase._dll = dll
           return _CDataBase._dll
   ```
   ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org