You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "wjones127 (via GitHub)" <gi...@apache.org> on 2023/05/17 03:12:17 UTC

[GitHub] [arrow-rs] wjones127 opened a new pull request, #4232: feat(api!): make ArrowArrayStreamReader Send

wjones127 opened a new pull request, #4232:
URL: https://github.com/apache/arrow-rs/pull/4232

   # Which issue does this PR close?
   
   Closes #4222
   
   # Rationale for this change
    
   The underlying implementation is arguably Send, and making it not Send makes it extremely hard to work with. This drops support for Clone, but if someone wants to sacrifice Send for Clone, they can wrap it in an Arc themselves.
   
   # What changes are included in this PR?
   
   * Alters implementation of ArrowArrayStreamReader to use Box instead of Arc internally.
   * Refactors PyArrowConvert convert into a few different traits, so that we can differentiate between conversions that borrow vs those that take ownership.
   
   # Are there any user-facing changes?
   
   This is a breaking change to `PyArrowConvert`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4232: feat(api!): make ArrowArrayStreamReader Send

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4232:
URL: https://github.com/apache/arrow-rs/pull/4232#issuecomment-1560978249

   The integration test failure looks legitimate and would suggest something is off with this PR somewhere...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4232: feat(api!): make ArrowArrayStreamReader Send

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4232:
URL: https://github.com/apache/arrow-rs/pull/4232#discussion_r1200161791


##########
arrow/src/ffi_stream.rs:
##########
@@ -261,24 +256,21 @@ fn get_error_code(err: &ArrowError) -> i32 {
 /// Struct used to fetch `RecordBatch` from the C Stream Interface.
 /// Its main responsibility is to expose `RecordBatchReader` functionality
 /// that requires [FFI_ArrowArrayStream].
-#[derive(Debug, Clone)]
+#[derive(Debug)]
 pub struct ArrowArrayStreamReader {
-    stream: Arc<FFI_ArrowArrayStream>,
+    stream: Box<FFI_ArrowArrayStream>,

Review Comment:
   Do we even need a box here?



##########
arrow/src/ffi_stream.rs:
##########
@@ -37,25 +37,19 @@
 //! let reader = Box::new(FileReader::try_new(file).unwrap());
 //!
 //! // export it
-//! let stream = Box::new(FFI_ArrowArrayStream::empty());
-//! let stream_ptr = Box::into_raw(stream) as *mut FFI_ArrowArrayStream;
-//! unsafe { export_reader_into_raw(reader, stream_ptr) };
+//! let mut stream = FFI_ArrowArrayStream::empty();
+//! unsafe { export_reader_into_raw(reader, &mut stream) };
 //!
 //! // consumed and used by something else...
 //!
 //! // import it
-//! let stream_reader = unsafe { ArrowArrayStreamReader::from_raw(stream_ptr).unwrap() };
+//! let stream_reader = unsafe { ArrowArrayStreamReader::from_raw(&mut stream).unwrap() };

Review Comment:
   ```suggestion
   //! let stream_reader = ArrowArrayStreamReader::try_new(stream).unwrap();
   ```
   
   I think we could deprecate from_raw



##########
arrow/src/pyarrow.rs:
##########
@@ -224,39 +253,42 @@ impl PyArrowConvert for ArrowArrayStreamReader {
 
         Ok(stream_reader)
     }
+}
 
-    fn to_pyarrow(&self, py: Python) -> PyResult<PyObject> {
+impl IntoPyArrow for ArrowArrayStreamReader {
+    fn into_pyarrow(self, py: Python) -> PyResult<PyObject> {
         let stream = Box::new(FFI_ArrowArrayStream::empty());
         let stream_ptr = Box::into_raw(stream) as *mut FFI_ArrowArrayStream;
 
-        unsafe { export_reader_into_raw(Box::new(self.clone()), stream_ptr) };

Review Comment:
   I see why IntoPyArrow is needed, this needs to "consume" the stream



##########
arrow/src/pyarrow.rs:
##########
@@ -224,39 +253,42 @@ impl PyArrowConvert for ArrowArrayStreamReader {
 
         Ok(stream_reader)
     }
+}
 
-    fn to_pyarrow(&self, py: Python) -> PyResult<PyObject> {
+impl IntoPyArrow for ArrowArrayStreamReader {
+    fn into_pyarrow(self, py: Python) -> PyResult<PyObject> {
         let stream = Box::new(FFI_ArrowArrayStream::empty());

Review Comment:
   This boxing is unnecessary, we could take the opportunity to clean it up



##########
arrow/src/pyarrow.rs:
##########
@@ -224,39 +253,42 @@ impl PyArrowConvert for ArrowArrayStreamReader {
 
         Ok(stream_reader)
     }
+}
 
-    fn to_pyarrow(&self, py: Python) -> PyResult<PyObject> {
+impl IntoPyArrow for ArrowArrayStreamReader {
+    fn into_pyarrow(self, py: Python) -> PyResult<PyObject> {
         let stream = Box::new(FFI_ArrowArrayStream::empty());
         let stream_ptr = Box::into_raw(stream) as *mut FFI_ArrowArrayStream;
 
-        unsafe { export_reader_into_raw(Box::new(self.clone()), stream_ptr) };
+        unsafe { export_reader_into_raw(Box::new(self), stream_ptr) };
 
         let module = py.import("pyarrow")?;
         let class = module.getattr("RecordBatchReader")?;
-        let args = PyTuple::new(py, &[stream_ptr as Py_uintptr_t]);
+        let args = PyTuple::new(py, [stream_ptr as Py_uintptr_t]);
         let reader = class.call_method1("_import_from_c", args)?;
+
         Ok(PyObject::from(reader))
     }
 }
 
 /// A newtype wrapper around a `T: PyArrowConvert` that implements
 /// [`FromPyObject`] and [`IntoPy`] allowing usage with pyo3 macros
 #[derive(Debug)]
-pub struct PyArrowType<T: PyArrowConvert>(pub T);
+pub struct PyArrowType<T: FromPyArrow + IntoPyArrow>(pub T);
 
-impl<'source, T: PyArrowConvert> FromPyObject<'source> for PyArrowType<T> {
+impl<'source, T: FromPyArrow + IntoPyArrow> FromPyObject<'source> for PyArrowType<T> {
     fn extract(value: &'source PyAny) -> PyResult<Self> {
         Ok(Self(T::from_pyarrow(value)?))
     }
 }
 
-impl<'a, T: PyArrowConvert> IntoPy<PyObject> for PyArrowType<T> {
+impl<T: FromPyArrow + IntoPyArrow> IntoPy<PyObject> for PyArrowType<T> {
     fn into_py(self, py: Python) -> PyObject {
-        self.0.to_pyarrow(py).unwrap()
+        self.0.into_pyarrow(py).unwrap()
     }
 }
 
-impl<T: PyArrowConvert> From<T> for PyArrowType<T> {
+impl<T: FromPyArrow + IntoPyArrow> From<T> for PyArrowType<T> {

Review Comment:
   ```suggestion
   impl<T> From<T> for PyArrowType<T> {
   ```



##########
arrow/src/pyarrow.rs:
##########
@@ -203,7 +232,7 @@ impl PyArrowConvert for RecordBatch {
     }
 }
 
-impl PyArrowConvert for ArrowArrayStreamReader {
+impl FromPyArrow for ArrowArrayStreamReader {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
         // prepare a pointer to receive the stream struct
         let stream = Box::new(FFI_ArrowArrayStream::empty());

Review Comment:
   I think we could also take the opportunity to remove the unnecessary box here



##########
arrow/src/pyarrow.rs:
##########
@@ -224,39 +253,42 @@ impl PyArrowConvert for ArrowArrayStreamReader {
 
         Ok(stream_reader)
     }
+}
 
-    fn to_pyarrow(&self, py: Python) -> PyResult<PyObject> {
+impl IntoPyArrow for ArrowArrayStreamReader {
+    fn into_pyarrow(self, py: Python) -> PyResult<PyObject> {
         let stream = Box::new(FFI_ArrowArrayStream::empty());
         let stream_ptr = Box::into_raw(stream) as *mut FFI_ArrowArrayStream;
 
-        unsafe { export_reader_into_raw(Box::new(self.clone()), stream_ptr) };
+        unsafe { export_reader_into_raw(Box::new(self), stream_ptr) };
 
         let module = py.import("pyarrow")?;
         let class = module.getattr("RecordBatchReader")?;
-        let args = PyTuple::new(py, &[stream_ptr as Py_uintptr_t]);
+        let args = PyTuple::new(py, [stream_ptr as Py_uintptr_t]);
         let reader = class.call_method1("_import_from_c", args)?;
+
         Ok(PyObject::from(reader))
     }
 }
 
 /// A newtype wrapper around a `T: PyArrowConvert` that implements
 /// [`FromPyObject`] and [`IntoPy`] allowing usage with pyo3 macros
 #[derive(Debug)]
-pub struct PyArrowType<T: PyArrowConvert>(pub T);
+pub struct PyArrowType<T: FromPyArrow + IntoPyArrow>(pub T);
 
-impl<'source, T: PyArrowConvert> FromPyObject<'source> for PyArrowType<T> {
+impl<'source, T: FromPyArrow + IntoPyArrow> FromPyObject<'source> for PyArrowType<T> {
     fn extract(value: &'source PyAny) -> PyResult<Self> {
         Ok(Self(T::from_pyarrow(value)?))
     }
 }
 
-impl<'a, T: PyArrowConvert> IntoPy<PyObject> for PyArrowType<T> {
+impl<T: FromPyArrow + IntoPyArrow> IntoPy<PyObject> for PyArrowType<T> {

Review Comment:
   ```suggestion
   impl<T: IntoPyArrow> IntoPy<PyObject> for PyArrowType<T> {
   ```



##########
arrow/src/pyarrow.rs:
##########
@@ -224,39 +253,42 @@ impl PyArrowConvert for ArrowArrayStreamReader {
 
         Ok(stream_reader)
     }
+}
 
-    fn to_pyarrow(&self, py: Python) -> PyResult<PyObject> {
+impl IntoPyArrow for ArrowArrayStreamReader {
+    fn into_pyarrow(self, py: Python) -> PyResult<PyObject> {
         let stream = Box::new(FFI_ArrowArrayStream::empty());
         let stream_ptr = Box::into_raw(stream) as *mut FFI_ArrowArrayStream;
 
-        unsafe { export_reader_into_raw(Box::new(self.clone()), stream_ptr) };
+        unsafe { export_reader_into_raw(Box::new(self), stream_ptr) };
 
         let module = py.import("pyarrow")?;
         let class = module.getattr("RecordBatchReader")?;
-        let args = PyTuple::new(py, &[stream_ptr as Py_uintptr_t]);
+        let args = PyTuple::new(py, [stream_ptr as Py_uintptr_t]);
         let reader = class.call_method1("_import_from_c", args)?;
+
         Ok(PyObject::from(reader))
     }
 }
 
 /// A newtype wrapper around a `T: PyArrowConvert` that implements
 /// [`FromPyObject`] and [`IntoPy`] allowing usage with pyo3 macros
 #[derive(Debug)]
-pub struct PyArrowType<T: PyArrowConvert>(pub T);
+pub struct PyArrowType<T: FromPyArrow + IntoPyArrow>(pub T);
 
-impl<'source, T: PyArrowConvert> FromPyObject<'source> for PyArrowType<T> {
+impl<'source, T: FromPyArrow + IntoPyArrow> FromPyObject<'source> for PyArrowType<T> {

Review Comment:
   ```suggestion
   impl<'source, T: FromPyArrow> FromPyObject<'source> for PyArrowType<T> {
   ```



##########
arrow/src/ffi_stream.rs:
##########
@@ -512,10 +487,9 @@ mod tests {
         let reader = TestRecordBatchReader::new(schema.clone(), iter);
 
         // Import through `FFI_ArrowArrayStream` as `ArrowArrayStreamReader`
-        let stream = Arc::new(FFI_ArrowArrayStream::new(reader));
-        let stream_ptr = Arc::into_raw(stream) as *mut FFI_ArrowArrayStream;
+        let mut stream = FFI_ArrowArrayStream::new(reader);
         let stream_reader =
-            unsafe { ArrowArrayStreamReader::from_raw(stream_ptr).unwrap() };
+            unsafe { ArrowArrayStreamReader::from_raw(&mut stream).unwrap() };

Review Comment:
   ```suggestion
               ArrowArrayStreamReader::try_new(stream).unwrap();
   ```



##########
arrow/src/ffi_stream.rs:
##########
@@ -231,8 +227,7 @@ impl ExportedArrayStream {
                     let struct_array = StructArray::from(batch);
                     let array = FFI_ArrowArray::new(&struct_array.to_data());
 
-                    unsafe { std::ptr::copy(addr_of!(array), out, 1) };
-                    std::mem::forget(array);
+                    unsafe { std::ptr::write_unaligned(out, array) };

Review Comment:
   I think this will cause the array to be released when `array` goes out of scope, i.e. immediately. I think eventually this will result in a double-free



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #4232: feat(api!): make ArrowArrayStreamReader Send

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #4232:
URL: https://github.com/apache/arrow-rs/pull/4232#discussion_r1201317777


##########
arrow/src/ffi_stream.rs:
##########
@@ -231,8 +227,7 @@ impl ExportedArrayStream {
                     let struct_array = StructArray::from(batch);
                     let array = FFI_ArrowArray::new(&struct_array.to_data());
 
-                    unsafe { std::ptr::copy(addr_of!(array), out, 1) };
-                    std::mem::forget(array);
+                    unsafe { std::ptr::write_unaligned(out, array) };

Review Comment:
   This is a good question. However, the docs for `write_unaligned` say that `src` is guaranteed not to be dropped:
   
   https://doc.rust-lang.org/std/ptr/fn.write_unaligned.html
   
   Internally, it's calling `intrinsics::forget()`. So I think this is sound.
   https://doc.rust-lang.org/src/core/ptr/mod.rs.html#1447



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #4232: feat(api!): make ArrowArrayStreamReader Send

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #4232:
URL: https://github.com/apache/arrow-rs/pull/4232#discussion_r1201346721


##########
arrow/src/pyarrow.rs:
##########
@@ -224,39 +253,42 @@ impl PyArrowConvert for ArrowArrayStreamReader {
 
         Ok(stream_reader)
     }
+}
 
-    fn to_pyarrow(&self, py: Python) -> PyResult<PyObject> {
+impl IntoPyArrow for ArrowArrayStreamReader {
+    fn into_pyarrow(self, py: Python) -> PyResult<PyObject> {
         let stream = Box::new(FFI_ArrowArrayStream::empty());
         let stream_ptr = Box::into_raw(stream) as *mut FFI_ArrowArrayStream;
 
-        unsafe { export_reader_into_raw(Box::new(self.clone()), stream_ptr) };
+        unsafe { export_reader_into_raw(Box::new(self), stream_ptr) };
 
         let module = py.import("pyarrow")?;
         let class = module.getattr("RecordBatchReader")?;
-        let args = PyTuple::new(py, &[stream_ptr as Py_uintptr_t]);
+        let args = PyTuple::new(py, [stream_ptr as Py_uintptr_t]);
         let reader = class.call_method1("_import_from_c", args)?;
+
         Ok(PyObject::from(reader))
     }
 }
 
 /// A newtype wrapper around a `T: PyArrowConvert` that implements
 /// [`FromPyObject`] and [`IntoPy`] allowing usage with pyo3 macros
 #[derive(Debug)]
-pub struct PyArrowType<T: PyArrowConvert>(pub T);
+pub struct PyArrowType<T: FromPyArrow + IntoPyArrow>(pub T);
 
-impl<'source, T: PyArrowConvert> FromPyObject<'source> for PyArrowType<T> {
+impl<'source, T: FromPyArrow + IntoPyArrow> FromPyObject<'source> for PyArrowType<T> {

Review Comment:
   The reason both traits are there is because `PyArrowType<T>` requires. If I remove one we get
   
   ```
   the trait bound `T: FromPyArrow` is not satisfied
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #4232: feat(api!): make ArrowArrayStreamReader Send

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #4232:
URL: https://github.com/apache/arrow-rs/pull/4232#discussion_r1201344255


##########
arrow/src/ffi_stream.rs:
##########
@@ -37,25 +37,19 @@
 //! let reader = Box::new(FileReader::try_new(file).unwrap());
 //!
 //! // export it
-//! let stream = Box::new(FFI_ArrowArrayStream::empty());
-//! let stream_ptr = Box::into_raw(stream) as *mut FFI_ArrowArrayStream;
-//! unsafe { export_reader_into_raw(reader, stream_ptr) };
+//! let mut stream = FFI_ArrowArrayStream::empty();
+//! unsafe { export_reader_into_raw(reader, &mut stream) };
 //!
 //! // consumed and used by something else...
 //!
 //! // import it
-//! let stream_reader = unsafe { ArrowArrayStreamReader::from_raw(stream_ptr).unwrap() };
+//! let stream_reader = unsafe { ArrowArrayStreamReader::from_raw(&mut stream).unwrap() };

Review Comment:
   I think it's still relevant. For example, there are cases where we need to implement a C API where we will be passed a pointer to an ArrowArrayStreamReader. For example, this is done in the "bind stream" function in ADBC:
   
   https://github.com/apache/arrow-adbc/blob/75ba2cef9080b19f4d242d0a03c17db304609d85/rust/src/implement/internal.rs#L647-L664



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4232: feat(api!): make ArrowArrayStreamReader Send

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4232:
URL: https://github.com/apache/arrow-rs/pull/4232#issuecomment-1556652400

   I'm not sure about the changes to the python interface, we've run into challenges in the past because python can't transfer ownership. I'll have a play later today and see if we can't make it Send without requiring a breaking change


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4232: feat(api!): make ArrowArrayStreamReader Send

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4232:
URL: https://github.com/apache/arrow-rs/pull/4232#discussion_r1201884244


##########
arrow/src/ffi_stream.rs:
##########
@@ -231,8 +227,7 @@ impl ExportedArrayStream {
                     let struct_array = StructArray::from(batch);
                     let array = FFI_ArrowArray::new(&struct_array.to_data());
 
-                    unsafe { std::ptr::copy(addr_of!(array), out, 1) };
-                    std::mem::forget(array);
+                    unsafe { std::ptr::write_unaligned(out, array) };

Review Comment:
   TIL, I had no idea that was the semantic - makes sense



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #4232: feat(api!): make ArrowArrayStreamReader Send

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #4232:
URL: https://github.com/apache/arrow-rs/pull/4232#discussion_r1201344679


##########
arrow/src/pyarrow.rs:
##########
@@ -224,39 +253,42 @@ impl PyArrowConvert for ArrowArrayStreamReader {
 
         Ok(stream_reader)
     }
+}
 
-    fn to_pyarrow(&self, py: Python) -> PyResult<PyObject> {
+impl IntoPyArrow for ArrowArrayStreamReader {
+    fn into_pyarrow(self, py: Python) -> PyResult<PyObject> {
         let stream = Box::new(FFI_ArrowArrayStream::empty());
         let stream_ptr = Box::into_raw(stream) as *mut FFI_ArrowArrayStream;
 
-        unsafe { export_reader_into_raw(Box::new(self.clone()), stream_ptr) };

Review Comment:
   Exactly. It's a little messy, but I did want to convey we are moving ownership, while all the other conversion are either creating entirely new objects (data types) or sharing ownership (arrays).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on pull request #4232: feat(api!): make ArrowArrayStreamReader Send

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on PR #4232:
URL: https://github.com/apache/arrow-rs/pull/4232#issuecomment-1561315706

   Yes, thanks for the reminder. I'll look into this soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #4232: feat(api!): make ArrowArrayStreamReader Send

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #4232:
URL: https://github.com/apache/arrow-rs/pull/4232


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org