Posted to github@arrow.apache.org by "kkraus14 (via GitHub)" <gi...@apache.org> on 2023/04/10 15:49:52 UTC

[GitHub] [arrow] kkraus14 commented on a diff in pull request #34972: GH-34971: [Format] Enhance C-Data API to support non-cpu cases

kkraus14 commented on code in PR #34972:
URL: https://github.com/apache/arrow/pull/34972#discussion_r1161837328


##########
cpp/src/arrow/c/abi.h:
##########
@@ -106,6 +169,77 @@ struct ArrowArrayStream {
 
 #endif  // ARROW_C_STREAM_INTERFACE
 
+#ifndef ARROW_C_DEVICE_STREAM_INTERFACE
+#define ARROW_C_DEVICE_STREAM_INTERFACE
+
+struct ArrowDeviceArrayStream {
+  // The device that this stream produces data on.
+  // All ArrowDeviceArrays that are produced by this
+  // stream should have the same device_type as set
+  // here. The device_type needs to be provided here
+  // so that consumers can provide the correct type
+  // of stream_ptr when calling get_next.
+  ArrowDeviceType device_type;
+
+  // Callback to get the stream schema
+  // (will be the same for all arrays in the stream).
+  //
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
+  //
+  // If successful, the ArrowSchema must be released independently from the stream.
+  int (*get_schema)(struct ArrowDeviceArrayStream*, struct ArrowSchema* out);
+
+  // Callback to get the device id for the next array.
+  // This is necessary so that the proper/correct stream pointer can be provided
+  // to get_next. The parameter provided must not be null.
+  //
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
+  //
+  // The next call to `get_next` should provide an ArrowDeviceArray whose
+  // device_id matches what is provided here, and whose device_type is the
+  // same as the device_type member of this stream.
+  int (*get_next_device_id)(struct ArrowDeviceArrayStream*, int* out_device_id);
+
+  // Callback to get the next array
+  // (if no error and the array is released, the stream has ended)
+  //
+  // the provided stream_ptr should be the appropriate stream, or
+  // equivalent object, for the device that the data is allocated on
+  // to indicate where the consumer wants the data to be accessible.

Review Comment:
   I drove a lot of the development of `__cuda_array_interface__` with @seibert and @sklam. The stream was added later, and I haven't seen it used. The challenge is that most frameworks manage their streams internally and have no good way to share a stream and control its lifetime.
   
   The idea behind a consumer of the interface passing a stream to the producer is that the producer can then guarantee the data is safe to consume on the passed-in stream. A number of array library maintainers, including folks from TensorFlow, PyTorch, JAX, CuPy, etc., all agreed on these semantics: https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html
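
   To make the handshake concrete, here is a rough host-only sketch in C. It is a toy model, not CUDA and not anything in this PR: `ToyStream`, `ToyEvent`, `toy_export`, and the other `toy_*` names are hypothetical stand-ins I made up for illustration. On a real CUDA device the record and wait steps would typically be an event recorded on the producer's stream and a stream-wait on the consumer's stream. The point is only the ordering: the producer makes the consumer's stream wait for the producer's pending work, so the consumer never needs access to the producer's stream.

```c
#include <stddef.h>

typedef struct ToyStream ToyStream;

/* Hypothetical event: a marker at a position in a stream's work queue. */
typedef struct {
  const ToyStream* stream; /* stream the event was recorded on */
  int position;            /* work items submitted when it was recorded */
} ToyEvent;

/* Hypothetical stream: an in-order queue, tracked only by counters. */
struct ToyStream {
  int submitted; /* work items enqueued so far */
  int completed; /* work items finished so far */
  int has_wait;  /* nonzero if the stream is gated on an event */
  ToyEvent wait; /* event that must complete before new work may run */
};

/* Enqueue one unit of work (e.g. a kernel that writes the buffers). */
static void toy_submit(ToyStream* s) { s->submitted++; }

/* Record an event covering everything submitted to `s` so far. */
static ToyEvent toy_record(const ToyStream* s) {
  ToyEvent e = {s, s->submitted};
  return e;
}

/* An event is done once its stream has finished all covered work. */
static int toy_event_done(ToyEvent e) {
  return e.stream->completed >= e.position;
}

/* Make later work on `s` wait for `e`; the host does not block. */
static void toy_stream_wait(ToyStream* s, ToyEvent e) {
  s->wait = e;
  s->has_wait = 1;
}

/* May the stream run its next work item yet? */
static int toy_can_run(const ToyStream* s) {
  return !s->has_wait || toy_event_done(s->wait);
}

/* Producer side of the handshake: the consumer hands over ITS stream,
 * and the producer orders that stream after its own pending work. */
static void toy_export(const ToyStream* producer_stream,
                       ToyStream* consumer_stream) {
  ToyEvent ready = toy_record(producer_stream);
  toy_stream_wait(consumer_stream, ready);
}

/* Walk through the handshake; returns 0 if the ordering holds. */
int toy_handshake_demo(void) {
  ToyStream producer_stream = {0};
  ToyStream consumer_stream = {0};

  /* Producer has two writes in flight when the consumer asks for data. */
  toy_submit(&producer_stream);
  toy_submit(&producer_stream);
  toy_export(&producer_stream, &consumer_stream);

  if (toy_can_run(&consumer_stream)) return 1; /* must be gated */
  producer_stream.completed = 1;
  if (toy_can_run(&consumer_stream)) return 2; /* one write still pending */
  producer_stream.completed = 2;
  if (!toy_can_run(&consumer_stream)) return 3; /* now safe to consume */
  return 0;
}
```

   Note that the producer only touches its own stream plus the one it was handed, which is why this direction works even when frameworks can't share their internal streams.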


