You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "paleolimbot (via GitHub)" <gi...@apache.org> on 2024/03/22 15:36:30 UTC

[PR] feat(python): Clarify interaction between the CDeviceArray, the CArrayView, and the CArray [arrow-nanoarrow]

paleolimbot opened a new pull request, #409:
URL: https://github.com/apache/arrow-nanoarrow/pull/409

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat(python): Clarify interaction between the CDeviceArray, the CArrayView, and the CArray [arrow-nanoarrow]

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #409:
URL: https://github.com/apache/arrow-nanoarrow/pull/409#discussion_r1550782191


##########
python/tests/test_device.py:
##########
@@ -31,12 +30,51 @@ def test_cpu_device():
     cpu = device.resolve(1, 0)
     assert cpu.device_type == 1
 
-    pa_array = pa.array([1, 2, 3])
 
-    darray = device.c_device_array(pa_array)
+def test_c_device_array():
+    # Unrecognized arguments should be passed to c_array() to generate CPU array
+    darray = device.c_device_array([1, 2, 3], na.int32())
+
     assert darray.device_type == 1
     assert darray.device_id == 0
-    assert darray.array.length == 3
     assert "device_type: 1" in repr(darray)

Review Comment:
   This was a bit of a rabbit hole but a very good rabbit hole. There's no enum, but there is an ABI-stable set of defines that I turned into one and the result is much better!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat(python): Clarify interaction between the CDeviceArray, the CArrayView, and the CArray [arrow-nanoarrow]

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #409:
URL: https://github.com/apache/arrow-nanoarrow/pull/409#discussion_r1550781472


##########
python/src/nanoarrow/_lib.pyx:
##########
@@ -1038,6 +1064,12 @@ cdef class CArray:
         self._base = base
         self._ptr = <ArrowArray*>addr
         self._schema = schema
+        self._device_type = ARROW_DEVICE_CPU
+        self._device_id = 0
+
+    cdef _set_device(self, ArrowDeviceType device_type, int64_t device_id):
+        self._device_type = device_type
+        self._device_id = device_id

Review Comment:
   I would like to move away from *any* arguments in `__cinit__` in most cases because a user could theoretically call `nanoarrow.CArray(...)` and get very strange errors. They should really all be `ClassName._construct()` or something (but maybe in a future PR).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat(python): Clarify interaction between the CDeviceArray, the CArrayView, and the CArray [arrow-nanoarrow]

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on code in PR #409:
URL: https://github.com/apache/arrow-nanoarrow/pull/409#discussion_r1550030752


##########
python/src/nanoarrow/_lib.pyx:
##########
@@ -1115,6 +1149,11 @@ cdef class CArray:
         """
         self._assert_valid()
 
+        if self._device_type != ARROW_DEVICE_CPU:
+            raise ValueError(
+                "Can't invoke __arrow_c_aray__ on non-CPU array "

Review Comment:
   ```suggestion
                   "Can't invoke __arrow_c_array__ on non-CPU array "
   ```



##########
python/src/nanoarrow/_lib.pyx:
##########
@@ -2340,7 +2404,53 @@ cdef class CDeviceArray:
 
     @property
     def array(self):
-        return CArray(self, <uintptr_t>&self._ptr.array, self._schema)
+        # TODO: We loose access to the sync_event here, so we probably need to
+        # synchronize (or propatate it, or somehow prevent data access downstream)
+        cdef CArray array = CArray(self, <uintptr_t>&self._ptr.array, self._schema)
+        array._set_device(self._ptr.device_type, self._ptr.device_id)
+        return array
+
+    def view(self):
+        return self.array.view()
+
+    def __arrow_c_array__(self, requested_schema=None):
+        return self.array.__arrow_c_array__(requested_schema=requested_schema)
+
+    def __arrow_c_device_array__(self, requested_schema=None):
+        if requested_schema is not None:
+            raise NotImplementedError("requested_schema")
+
+        # TODO: evaluate whether we need to synchronize here or whether we should
+        # move device arrays instead of shallow-copying them
+        device_array_capsule = alloc_c_device_array_shallow_copy(self._base, self._ptr)
+        return self._schema.__arrow_c_schema__(), device_array_capsule
+
+    @staticmethod
+    def _import_from_c_capsule(schema_capsule, device_array_capsule):
+        """
+        Import from a ArrowSchema and ArrowArray PyCapsule tuple.

Review Comment:
   ```suggestion
           Import from an ArrowSchema and ArrowArray PyCapsule tuple.
   ```



##########
python/src/nanoarrow/_lib.pyx:
##########
@@ -2340,7 +2404,53 @@ cdef class CDeviceArray:
 
     @property
     def array(self):
-        return CArray(self, <uintptr_t>&self._ptr.array, self._schema)
+        # TODO: We loose access to the sync_event here, so we probably need to

Review Comment:
   ```suggestion
           # TODO: We lose access to the sync_event here, so we probably need to
   ```



##########
python/src/nanoarrow/_lib.pyx:
##########
@@ -2340,7 +2404,53 @@ cdef class CDeviceArray:
 
     @property
     def array(self):
-        return CArray(self, <uintptr_t>&self._ptr.array, self._schema)
+        # TODO: We loose access to the sync_event here, so we probably need to
+        # synchronize (or propatate it, or somehow prevent data access downstream)

Review Comment:
   ```suggestion
           # synchronize (or propagate it, or somehow prevent data access downstream)
   ```



##########
python/tests/test_device.py:
##########
@@ -31,12 +30,51 @@ def test_cpu_device():
     cpu = device.resolve(1, 0)
     assert cpu.device_type == 1
 
-    pa_array = pa.array([1, 2, 3])
 
-    darray = device.c_device_array(pa_array)
+def test_c_device_array():
+    # Unrecognized arguments should be passed to c_array() to generate CPU array
+    darray = device.c_device_array([1, 2, 3], na.int32())
+
     assert darray.device_type == 1
     assert darray.device_id == 0

Review Comment:
   Out of curiosity, are there enums that can be used here instead of values 1, 0? Or do we need to wait for DLPack support.



##########
python/src/nanoarrow/c_lib.py:
##########
@@ -427,7 +427,7 @@ def c_array_view(obj, schema=None) -> CArrayView:
     if isinstance(obj, CArrayView) and schema is None:
         return obj
 
-    return CArrayView.from_array(c_array(obj, schema))
+    return c_array(obj, schema).view()

Review Comment:
   Very cool!



##########
python/src/nanoarrow/_lib.pyx:
##########
@@ -1038,6 +1064,12 @@ cdef class CArray:
         self._base = base
         self._ptr = <ArrowArray*>addr
         self._schema = schema
+        self._device_type = ARROW_DEVICE_CPU
+        self._device_id = 0
+
+    cdef _set_device(self, ArrowDeviceType device_type, int64_t device_id):
+        self._device_type = device_type
+        self._device_id = device_id

Review Comment:
   This function is called frequently after initialization. Is it worth allowing `__cinit__` to set device type/id?



##########
python/tests/test_device.py:
##########
@@ -31,12 +30,51 @@ def test_cpu_device():
     cpu = device.resolve(1, 0)
     assert cpu.device_type == 1
 
-    pa_array = pa.array([1, 2, 3])
 
-    darray = device.c_device_array(pa_array)
+def test_c_device_array():
+    # Unrecognized arguments should be passed to c_array() to generate CPU array
+    darray = device.c_device_array([1, 2, 3], na.int32())
+
     assert darray.device_type == 1
     assert darray.device_id == 0
-    assert darray.array.length == 3
     assert "device_type: 1" in repr(darray)

Review Comment:
   If there are enums, it would be nice to print the name (e.g. `device_type: 1 (CPU)`)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat(python): Clarify interaction between the CDeviceArray, the CArrayView, and the CArray [arrow-nanoarrow]

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot merged PR #409:
URL: https://github.com/apache/arrow-nanoarrow/pull/409


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat(python): Clarify interaction between the CDeviceArray, the CArrayView, and the CArray [arrow-nanoarrow]

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on PR #409:
URL: https://github.com/apache/arrow-nanoarrow/pull/409#issuecomment-2036079003

   Apologies for the two additional changes, but:
   
   - After adding the `DeviceType`, it was pretty clear that the `CDevice` was never going to get a `Device` wrapper in not-Cython. Interacting with the device is something that mostly happens in the classes that live in Cython, and the only thing that anybody needs to know about it otherwise is to print it or know if it's the CPU or not. For the other classes there's some payoff to wrapping them in Python (better IDE documentation + completion, iteration time), but the `Device` doesn't benefit from any of those.
   - The discussion in https://github.com/apache/arrow/issues/40801 suggested that the device_id for the CPU device should be -1 instead of 0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org