You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/02/23 14:32:54 UTC

[GitHub] [arrow-rs] tustvold opened a new pull request, #3756: Zero-copy Vec conversion (#3516) (#1176)

tustvold opened a new pull request, #3756:
URL: https://github.com/apache/arrow-rs/pull/3756

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #3516
   Relates to #1176
   
   # Rationale for this change
    
   <!--
   Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   This starts the process of migrating away from custom mutable buffer abstractions, in favour of `Vec`. This will allow eventually deprecating and removing `MutableBuffer` and `BufferBuilder`.
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1115827290


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,83 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {

Review Comment:
   Adding Vec conversion at the `Buffer` level does allow for transmuting types provided they have the same layout, this is perhaps not ideal but isn't harmful. 
   
   Long-term I hope to deprecate and remove `Buffer` and `MutableBuffer`, just leaving `ScalarBuffer` which statically prevents this sort of type-conversion, but doing it this way allows for a gradual migration.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1115827290


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,83 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {

Review Comment:
   Adding Vec conversion at the `Buffer` level does allow for transmuting types provided they have the same layout, this is perhaps not ideal but isn't harmful. Long-term I hope to deprecate and remove `Buffer` and `MutableBuffer`, just leaving `ScalarBuffer` which statically prevents this sort of type-conversion, but this allows for a gradual migration.



##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,83 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {

Review Comment:
   Adding Vec conversion at the `Buffer` level does allow for transmuting types provided they have the same layout, this is perhaps not ideal but isn't harmful. Long-term I hope to deprecate and remove `Buffer` and `MutableBuffer`, just leaving `ScalarBuffer` which statically prevents this sort of type-conversion, but doing it this way allows for a gradual migration.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118883572


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };

Review Comment:
   > unsafe layout related code and do something similar to what we do here
   
   What would be the advantage of this? Does polars depend on the foreign vec interface? It doesn't seem to be doing anything materially different?
   
   Edit: In fact I'm fairly sure this line is technically unsound - https://github.com/DataEngineeringLabs/foreign_vec/blob/0d38968facee8a81748ec380fad78379d806fe1d/src/lib.rs#L49, there are a lot of safety constraints on [from_raw_parts](https://doc.rust-lang.org/beta/std/vec/struct.Vec.html#safety) that appear to be not being checked or documented?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ursabot commented on pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#issuecomment-1449780932

   Benchmark runs are scheduled for baseline = 5edc954a939c129e1386c2ac19add45e6fcdc9cb and contender = d440c244bde4d1e99afcc72ac5b2c049a62f9225. d440c244bde4d1e99afcc72ac5b2c049a62f9225 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/e8489f1a4bb04972b9678f535bbf4c2c...62691bd899b04c6db4bb22a31dcecccd/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/d510ddf2302d429c89c775fb246e25d1...aded177f66c24ca1acb4ed5842c4dea6/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/cdd4c890b35c4d4c8c0bb31721fd3529...4eba8cadc7f44cf3be4df0e262ffd23b/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/24c0a0165c3b4b5cbe197a82862a247b...283b0f8cdeec47b0adb281e6520ee011/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1115780349


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {

Review Comment:
   We can't implement `From` as this would conflict with the blanket `From<AsRef<[u8]>>`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#issuecomment-1446614562

   Integration failure is unrelated - https://github.com/apache/arrow/issues/34367


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ritchie46 commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "ritchie46 (via GitHub)" <gi...@apache.org>.
ritchie46 commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118869920


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };

Review Comment:
   Do we plan to store/create from layouts other than `Vec`? Otherwise we can just create from `Vec`, forget the owned Vector.
   
   On `drop` we recreate the `Vec` so that the drop sequence is executed.
   
   This would defer all this logic to the implementation in `std`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#issuecomment-1446368365

   It might be worth it to have @viirya  give this PR a once over before merge. 
   
   Not sure if @ritchie46  or @jorgecarleitao  are interested or want to comment either. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118871560


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };

Review Comment:
   > Do we plan to store/create from layouts other than Vec? Otherwise we can just create from Vec, forget the owned Vector.
   
   Eventually we may deprecate and remove support for other layouts, but at least for a period we need to support aligned layouts such as those created by MutableBuffer



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118804193


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,115 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {
+        // Test empty vec
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        b.into_vec::<i128>().unwrap();
+
+        // Test vec with capacity
+        let a: Vec<i128> = Vec::with_capacity(20);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 0);
+        assert_eq!(back.capacity(), 20);
+
+        // Test vec with values
+        let mut a: Vec<i128> = Vec::with_capacity(3);
+        a.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 3);
+        assert_eq!(back.capacity(), 3);
+
+        // Test vec with values and spare capacity
+        let mut a: Vec<i128> = Vec::with_capacity(20);
+        a.extend_from_slice(&[1, 4, 7, 8, 9, 3, 6]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 7);
+        assert_eq!(back.capacity(), 20);
+
+        // Test incorrect alignment
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        let b = b.into_vec::<i32>().unwrap_err();
+        b.into_vec::<i8>().unwrap_err();
+
+        // Test convert between types with same alignment
+        // This is an implementation quirk, but isn't harmful
+        // as ArrowNativeType are trivially transmutable
+        let a: Vec<i64> = vec![1, 2, 3, 4];
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<u64>().unwrap();
+        assert_eq!(back.len(), 4);
+        assert_eq!(back.capacity(), 4);
+
+        // i256 has the same layout as i128 so this is valid
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();
+        assert_eq!(back.len(), 2);
+        assert_eq!(back.capacity(), 2);
+
+        // Invalid layout
+        let b: Vec<i128> = vec![1, 2, 3];
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Invalid layout
+        let mut b: Vec<i128> = Vec::with_capacity(5);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Truncates length
+        // This is an implementation quirk, but isn't harmful
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();
+        assert_eq!(back.len(), 1);
+        assert_eq!(back.capacity(), 2);
+
+        // Cannot use aligned allocation
+        let b = Buffer::from(MutableBuffer::new(10));
+        let b = b.into_vec::<u8>().unwrap_err();
+        b.into_vec::<u64>().unwrap_err();
+
+        // Test slicing
+        let mut a: Vec<i128> = Vec::with_capacity(20);
+        a.extend_from_slice(&[1, 4, 7, 8, 9, 3, 6]);
+        let b = Buffer::from_vec(a);
+        let slice = b.slice_with_length(0, 64);
+
+        // Shared reference fails
+        let slice = slice.into_vec::<i128>().unwrap_err();
+        drop(b);
+
+        // Succeeds as no outstanding shared reference
+        let back = slice.into_vec::<i128>().unwrap();
+        assert_eq!(&back, &[1, 4, 7, 8]);
+        assert_eq!(back.capacity(), 20);
+
+        // Slicing by non-multiple length truncates
+        let mut a: Vec<i128> = Vec::with_capacity(8);
+        a.extend_from_slice(&[1, 4, 7, 3]);
+
+        let b = Buffer::from_vec(a);
+        let slice = b.slice_with_length(0, 34);
+        drop(b);
+
+        let back = slice.into_vec::<i128>().unwrap();

Review Comment:
   Yeah `slice(2)` covers this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118793512


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };

Review Comment:
   The docs for [`dealloc`](https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html#safety-2) are authoritative here
   
   > ptr must denote a block of memory currently allocated via this allocator
   > layout must be the same layout that was used to allocate that block of memory
   
   The layout of arrays, vecs, etc... is fixed and defined by [`Layout::array`](https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.array). As such we are fine provided we obey the same logic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118800573


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };
+        let len = vec.len() * std::mem::size_of::<T>();
+        // Safety
+        // Layout guaranteed to be valid
+        let layout = unsafe { Layout::array::<T>(vec.capacity()).unwrap_unchecked() };

Review Comment:
   > Is there a reason to use unwrap_unchecked here
   
   This is a clone of [`Vec::current_memory`](https://github.com/rust-lang/rust/blob/ccce98535bd51687554be8a209b3a529ef84f7e1/library/alloc/src/raw_vec.rs#L246), I will add a link.
   
   The TLDR is that if the Vec is valid, it must have a valid layout (otherwise among other issues `Vec` wouldn't be able to deallocate itself).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118883572


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };

Review Comment:
   > unsafe layout related code and do something similar to what we do here
   
   What would be the advantage of this? Does polars depend on the foreign vec interface?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118805416


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,115 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {
+        // Test empty vec
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        b.into_vec::<i128>().unwrap();
+
+        // Test vec with capacity
+        let a: Vec<i128> = Vec::with_capacity(20);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 0);
+        assert_eq!(back.capacity(), 20);
+
+        // Test vec with values
+        let mut a: Vec<i128> = Vec::with_capacity(3);
+        a.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 3);
+        assert_eq!(back.capacity(), 3);
+
+        // Test vec with values and spare capacity
+        let mut a: Vec<i128> = Vec::with_capacity(20);
+        a.extend_from_slice(&[1, 4, 7, 8, 9, 3, 6]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 7);
+        assert_eq!(back.capacity(), 20);
+
+        // Test incorrect alignment
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        let b = b.into_vec::<i32>().unwrap_err();
+        b.into_vec::<i8>().unwrap_err();
+
+        // Test convert between types with same alignment
+        // This is an implementation quirk, but isn't harmful
+        // as ArrowNativeType are trivially transmutable
+        let a: Vec<i64> = vec![1, 2, 3, 4];
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<u64>().unwrap();
+        assert_eq!(back.len(), 4);
+        assert_eq!(back.capacity(), 4);
+
+        // i256 has the same layout as i128 so this is valid
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();
+        assert_eq!(back.len(), 2);
+        assert_eq!(back.capacity(), 2);
+
+        // Invalid layout
+        let b: Vec<i128> = vec![1, 2, 3];
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Invalid layout
+        let mut b: Vec<i128> = Vec::with_capacity(5);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Truncates length
+        // This is an implementation quirk, but isn't harmful
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();

Review Comment:
   A capacity of 5 is tested above and results in an error, as the layout of the vec is invalid



##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,115 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {
+        // Test empty vec
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        b.into_vec::<i128>().unwrap();
+
+        // Test vec with capacity
+        let a: Vec<i128> = Vec::with_capacity(20);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 0);
+        assert_eq!(back.capacity(), 20);
+
+        // Test vec with values
+        let mut a: Vec<i128> = Vec::with_capacity(3);
+        a.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 3);
+        assert_eq!(back.capacity(), 3);
+
+        // Test vec with values and spare capacity
+        let mut a: Vec<i128> = Vec::with_capacity(20);
+        a.extend_from_slice(&[1, 4, 7, 8, 9, 3, 6]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 7);
+        assert_eq!(back.capacity(), 20);
+
+        // Test incorrect alignment
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        let b = b.into_vec::<i32>().unwrap_err();
+        b.into_vec::<i8>().unwrap_err();
+
+        // Test convert between types with same alignment
+        // This is an implementation quirk, but isn't harmful
+        // as ArrowNativeType are trivially transmutable
+        let a: Vec<i64> = vec![1, 2, 3, 4];
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<u64>().unwrap();
+        assert_eq!(back.len(), 4);
+        assert_eq!(back.capacity(), 4);
+
+        // i256 has the same layout as i128 so this is valid
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();
+        assert_eq!(back.len(), 2);
+        assert_eq!(back.capacity(), 2);
+
+        // Invalid layout
+        let b: Vec<i128> = vec![1, 2, 3];
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Invalid layout
+        let mut b: Vec<i128> = Vec::with_capacity(5);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Truncates length
+        // This is an implementation quirk, but isn't harmful
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();

Review Comment:
   A capacity of 5 is tested above and results in an error, as the layout of the underlying allocation is invalid



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118871560


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };

Review Comment:
   > Do we plan to store/create from layouts other than Vec? Otherwise we can just create from Vec, forget the owned Vector.
   
   Eventually we may deprecate and remove support for other layouts, but at least for a period we need to support aligned layouts such as those created by MutableBuffer so that we can avoid stop-the-world changes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118756567


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };

Review Comment:
   It seems like `into_raw_parts` would be ideal here, but sadly it appears to not yet be stabalized
   
   https://doc.rust-lang.org/beta/std/vec/struct.Vec.html#method.into_raw_parts
   
   I note that the docs say 
   
   > After calling this function, the caller is responsible for the memory previously managed by the Vec. The only way to do this is to convert the raw pointer, length, and capacity back into a Vec with the [from_raw_parts](https://github.com/apache/arrow-rs/pull/3756/struct.Vec.html#method.from_raw_parts) function, allowing the destructor to perform the cleanup.
   
   However, this code uses `Deallocation::Standard(layout)` to deallocate memory, which seems ok, I just wanted to point out it doesn't match the docs (though maybe this is fine)



##########
arrow-buffer/src/alloc/mod.rs:
##########
@@ -132,9 +137,8 @@ impl<T: RefUnwindSafe + Send + Sync> Allocation for T {}
 
 /// Mode of deallocating memory regions
 pub(crate) enum Deallocation {
-    /// An allocation of the given capacity that needs to be deallocated using arrows's cache aligned allocator.
-    /// See [allocate_aligned] and [free_aligned].
-    Arrow(usize),
+    /// An allocation using [`std::alloc`]
+    Standard(Layout),
     /// An allocation from an external source like the FFI interface or a Rust Vec.

Review Comment:
   I wonder if the comment for `Custom` should be updated to say like "external Rust Vec" or maybe remove the reference to Rust Vec entirely



##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,115 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {
+        // Test empty vec
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        b.into_vec::<i128>().unwrap();
+
+        // Test vec with capacity
+        let a: Vec<i128> = Vec::with_capacity(20);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 0);
+        assert_eq!(back.capacity(), 20);
+
+        // Test vec with values
+        let mut a: Vec<i128> = Vec::with_capacity(3);
+        a.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 3);
+        assert_eq!(back.capacity(), 3);
+
+        // Test vec with values and spare capacity
+        let mut a: Vec<i128> = Vec::with_capacity(20);
+        a.extend_from_slice(&[1, 4, 7, 8, 9, 3, 6]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 7);
+        assert_eq!(back.capacity(), 20);
+
+        // Test incorrect alignment
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        let b = b.into_vec::<i32>().unwrap_err();
+        b.into_vec::<i8>().unwrap_err();
+
+        // Test convert between types with same alignment
+        // This is an implementation quirk, but isn't harmful
+        // as ArrowNativeType are trivially transmutable
+        let a: Vec<i64> = vec![1, 2, 3, 4];
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<u64>().unwrap();
+        assert_eq!(back.len(), 4);
+        assert_eq!(back.capacity(), 4);
+
+        // i256 has the same layout as i128 so this is valid
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();
+        assert_eq!(back.len(), 2);
+        assert_eq!(back.capacity(), 2);
+
+        // Invalid layout
+        let b: Vec<i128> = vec![1, 2, 3];
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Invalid layout
+        let mut b: Vec<i128> = Vec::with_capacity(5);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Truncates length
+        // This is an implementation quirk, but isn't harmful
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();

Review Comment:
   I think it is also worth a test when the capacity doesn't equally divide equally into the transmuted size -- like maybe change this test to have capacity `5` with `i128` and verify the `i256` output Vec only has capacity 2 



##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };
+        let len = vec.len() * std::mem::size_of::<T>();
+        // Safety
+        // Layout guaranteed to be valid
+        let layout = unsafe { Layout::array::<T>(vec.capacity()).unwrap_unchecked() };

Review Comment:
   Is there a reason to use unwrap_unchecked here? Maybe it would be better to panic if the layout was messed up somehow? 
   
   Looks like it errors on overflow https://doc.rust-lang.org/beta/std/alloc/struct.Layout.html#method.array
   
   Like maybe the size overflows because someone passed in a giant Vec or something?
   
   🤔 



##########
arrow-buffer/src/buffer/scalar.rs:
##########
@@ -90,6 +90,15 @@ impl<T: ArrowNativeType> From<Buffer> for ScalarBuffer<T> {
     }
 }
 
+impl<T: ArrowNativeType> From<Vec<T>> for ScalarBuffer<T> {

Review Comment:
   😍 



##########
arrow-buffer/src/buffer/mutable.rs:
##########
@@ -95,12 +102,14 @@ impl MutableBuffer {
 
     /// Allocates a new [MutableBuffer] from given `Bytes`.

Review Comment:
   maybe it is worth updating these docs to explain when an Err is returned



##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,115 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {
+        // Test empty vec
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        b.into_vec::<i128>().unwrap();
+
+        // Test vec with capacity
+        let a: Vec<i128> = Vec::with_capacity(20);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 0);
+        assert_eq!(back.capacity(), 20);
+
+        // Test vec with values
+        let mut a: Vec<i128> = Vec::with_capacity(3);
+        a.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 3);
+        assert_eq!(back.capacity(), 3);
+
+        // Test vec with values and spare capacity
+        let mut a: Vec<i128> = Vec::with_capacity(20);
+        a.extend_from_slice(&[1, 4, 7, 8, 9, 3, 6]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 7);
+        assert_eq!(back.capacity(), 20);
+
+        // Test incorrect alignment
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        let b = b.into_vec::<i32>().unwrap_err();
+        b.into_vec::<i8>().unwrap_err();
+
+        // Test convert between types with same alignment
+        // This is an implementation quirk, but isn't harmful
+        // as ArrowNativeType are trivially transmutable
+        let a: Vec<i64> = vec![1, 2, 3, 4];
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<u64>().unwrap();
+        assert_eq!(back.len(), 4);
+        assert_eq!(back.capacity(), 4);
+
+        // i256 has the same layout as i128 so this is valid
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();
+        assert_eq!(back.len(), 2);
+        assert_eq!(back.capacity(), 2);
+
+        // Invalid layout
+        let b: Vec<i128> = vec![1, 2, 3];
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Invalid layout
+        let mut b: Vec<i128> = Vec::with_capacity(5);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Truncates length
+        // This is an implementation quirk, but isn't harmful
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();
+        assert_eq!(back.len(), 1);
+        assert_eq!(back.capacity(), 2);
+
+        // Cannot use aligned allocation
+        let b = Buffer::from(MutableBuffer::new(10));
+        let b = b.into_vec::<u8>().unwrap_err();
+        b.into_vec::<u64>().unwrap_err();
+
+        // Test slicing
+        let mut a: Vec<i128> = Vec::with_capacity(20);
+        a.extend_from_slice(&[1, 4, 7, 8, 9, 3, 6]);
+        let b = Buffer::from_vec(a);
+        let slice = b.slice_with_length(0, 64);
+
+        // Shared reference fails
+        let slice = slice.into_vec::<i128>().unwrap_err();
+        drop(b);
+
+        // Succeeds as no outstanding shared reference
+        let back = slice.into_vec::<i128>().unwrap();
+        assert_eq!(&back, &[1, 4, 7, 8]);
+        assert_eq!(back.capacity(), 20);
+
+        // Slicing by non-multiple length truncates
+        let mut a: Vec<i128> = Vec::with_capacity(8);
+        a.extend_from_slice(&[1, 4, 7, 3]);
+
+        let b = Buffer::from_vec(a);
+        let slice = b.slice_with_length(0, 34);
+        drop(b);
+
+        let back = slice.into_vec::<i128>().unwrap();

Review Comment:
   Is slicing with a non zero offset also covered? Maybe `slice(2)` below covers that 🤔 



##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,83 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {

Review Comment:
   > this is perhaps not ideal but isn't harmful.
   
   Is it a problem for something like converting integers to (invalid) floats 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118778172


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,83 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {

Review Comment:
   Nope, there is a good justification for why this is perfectly fine [here](https://doc.rust-lang.org/std/primitive.f32.html#method.from_bits). The only place where this becomes problematic is where types have bit sequences that illegal, e.g. NonZeroU32 or bool, all `ArrowNativeType` are valid for all bit sequences.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#issuecomment-1448908233

   Unless I hear anything further, I plan to merge this tomorrow morning. Please let me know if you need more time to review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118800573


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };
+        let len = vec.len() * std::mem::size_of::<T>();
+        // Safety
+        // Layout guaranteed to be valid
+        let layout = unsafe { Layout::array::<T>(vec.capacity()).unwrap_unchecked() };

Review Comment:
   > Is there a reason to use unwrap_unchecked here
   
   This is a clone of [`Vec::current_memory`](https://github.com/rust-lang/rust/blob/ccce98535bd51687554be8a209b3a529ef84f7e1/library/alloc/src/raw_vec.rs#L246), I will add a link



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118800573


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };
+        let len = vec.len() * std::mem::size_of::<T>();
+        // Safety
+        // Layout guaranteed to be valid
+        let layout = unsafe { Layout::array::<T>(vec.capacity()).unwrap_unchecked() };

Review Comment:
   > Is there a reason to use unwrap_unchecked here
   
   This is a clone of `Vec::current_memory`, I will add a link



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ritchie46 commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "ritchie46 (via GitHub)" <gi...@apache.org>.
ritchie46 commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118879907


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };

Review Comment:
   Right, maybe we could add a comment that once we remove that support we could remove the unsafe layout related code and do something similar to what we do here: https://github.com/DataEngineeringLabs/foreign_vec/blob/0d38968facee8a81748ec380fad78379d806fe1d/src/lib.rs#L25



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118883572


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };

Review Comment:
   > unsafe layout related code and do something similar to what we do here
   
   What would be the advantage of this? Does polars depend on the foreign vec interface? It doesn't seem to be doing anything materially different?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #3756: Zero-copy Vec conversion (#3516) (#1176)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1115795348


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -269,6 +289,43 @@ impl Buffer {
                 length,
             })
     }
+
+    /// Returns `Vec` for mutating the buffer if this buffer is not offset and was
+    /// allocated with the correct layout for `Vec<T>`

Review Comment:
   I think it would be good to explicitly say here an error is returned if the buffer can't be converted to a Vec (and ideally hint how to get a Vec out of it anyways (perhaps by copying)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org