You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/16 02:09:42 UTC

[GitHub] [arrow-rs] viirya opened a new pull request, #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

viirya opened a new pull request, #2883:
URL: https://github.com/apache/arrow-rs/pull/2883

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #2882.
   
   # Rationale for this change
    
   <!--
   Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996471929


##########
arrow/src/ipc/reader.rs:
##########
@@ -499,6 +500,24 @@ fn create_primitive_array(
     make_array(array_data)
 }
 
+/// Checks if given `Buffer` is properly aligned with `i128`.
+/// If not, copying the data and padded it for alignment.
+fn get_aligned_buffer(buffer: &Buffer, length: usize) -> Buffer {
+    let ptr = buffer.as_ptr();
+    let align_req = std::mem::align_of::<i128>();
+    let align_offset = ptr.align_offset(align_req);
+    // The buffer is not aligned properly. The writer might use a smaller alignment
+    // e.g. 8 bytes, but on some platform (e.g. ARM) i128 requires 16 bytes alignment.
+    // We need to copy the buffer as fallback.
+    if align_offset != 0 {
+        let len_in_bytes = length * std::mem::size_of::<i128>();

Review Comment:
   It was actually the length I was concerned about



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1279883368

   > As written this PR is unsound, as `PrimitiveArray::values` requires the values pointer to be correctly aligned.
   
   The alignment soundness in `PrimitiveArray::values` is guaranteed by `is_aligned_and_not_null`. This is how it is implemented:
   
   ```
   pub(crate) fn is_aligned_and_not_null<T>(ptr: *const T) -> bool {
       !ptr.is_null() && ptr.addr() % mem::align_of::<T>() == 0
   }
   ```
   
   It only checks if the ptr address is aligned with type `T`. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1280038974

   Thanks @tustvold !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996405766


##########
arrow/src/ipc/reader.rs:
##########
@@ -499,6 +500,24 @@ fn create_primitive_array(
     make_array(array_data)
 }
 
+/// Checks if given `Buffer` is properly aligned with `i128`.
+/// If not, copying the data and padded it for alignment.
+fn get_aligned_buffer(buffer: &Buffer, length: usize) -> Buffer {
+    let ptr = buffer.as_ptr();
+    let align_req = std::mem::align_of::<i128>();
+    let align_offset = ptr.align_offset(align_req);
+    // The buffer is not aligned properly. The writer might use a smaller alignment
+    // e.g. 8 bytes, but on some platform (e.g. ARM) i128 requires 16 bytes alignment.
+    // We need to copy the buffer as fallback.
+    if align_offset != 0 {
+        let len_in_bytes = length * std::mem::size_of::<i128>();

Review Comment:
   I think this is incorrect for the Decimal256 case? Perhaps we could make this method generic on the native type, to ensure the correct size and alignment is used?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1279884619

   > And it is obviously not guaranteed to be.
   
   It is supposed to be - https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding by ensuring the buffers are padded correctly, we can ensure that correctly aligning the entire memory allocation is sufficient to align the child allocations. When the writer has not done this correctly, we will have to copy the buffers
   
   > The alignment soundness in PrimitiveArray::values is guaranteed by is_aligned_and_not_null.
   
   Where is this called, I think I am being blind. It looks like this PR removes the alignment check in PrimitiveArray?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1279883442

   > The alignment check applies on the entire memory allocation not slices.
   
   The alignment of the entire memory allocation is irrelevant, we only care that that the buffers within it are correctly aligned. My memory is a bit fuzzy, but I seem to remember that the arrow specification goes to great lengths to document how data should be padded to guarantee alignment. This should mean that if the original buffer is aligned, and the padding is correct, data can be zero-copy sliced - otherwise we have to copy.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996384572


##########
arrow-array/src/array/list_array.rs:
##########
@@ -861,16 +861,42 @@ mod tests {
     }
 
     #[test]
-    #[should_panic(expected = "memory is not aligned")]
+    #[should_panic(expected = "Need at least 8 bytes in buffers[0]")]

Review Comment:
   This is why MIRI isn't failing and the debug assertion isn't firing, the test panics before it does anything interesting... :facepalm: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996397077


##########
arrow/src/ipc/reader.rs:
##########
@@ -499,6 +500,24 @@ fn create_primitive_array(
     make_array(array_data)
 }
 
+/// Checks if given `Buffer` is properly aligned with `i128`.
+/// If not, copying the data and padded it for alignment.
+fn get_aligned_buffer(buffer: &Buffer, length: usize) -> Buffer {
+    let ptr = buffer.as_ptr();
+    let align_req = std::mem::align_of::<i128>();
+    let align_offset = ptr.align_offset(align_req);
+    // The buffer is not aligned properly. The writer might use a smaller alignment
+    // e.g. 8 bytes, but on some platform (e.g. ARM) i128 requires 16 bytes alignment.
+    // We need to copy the buffer as fallback.
+    if align_offset != 0 {
+        let len_in_bytes = length * std::mem::size_of::<i128>();
+        let slice = &buffer.as_slice()[0..len_in_bytes];

Review Comment:
   We only need copying the range of current buffer (i.e., `length`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya merged pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
viirya merged PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1279882095

   I don't think this is correct, we rely on the memory within PrimitiveArray being correctly aligned?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996472593


##########
arrow/src/ipc/reader.rs:
##########
@@ -477,18 +478,23 @@ fn create_primitive_array(
         | Timestamp(_, _)
         | Date64
         | Duration(_)
-        | Interval(IntervalUnit::DayTime)
-        | Interval(IntervalUnit::MonthDayNano) => ArrayData::builder(data_type.clone())
+        | Interval(IntervalUnit::DayTime) => ArrayData::builder(data_type.clone())
             .len(length)
             .add_buffer(buffers[1].clone())
             .null_bit_buffer(null_buffer)
             .build()
             .unwrap(),
-        Decimal128(_, _) | Decimal256(_, _) => {
+        Interval(IntervalUnit::MonthDayNano) | Decimal128(_, _) | Decimal256(_, _) => {
+            let buffer = if matches!(data_type, &DataType::Decimal256(_, _)) {

Review Comment:
   Save a few lines? :) I lifted it now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996472083


##########
arrow/src/ipc/reader.rs:
##########
@@ -477,18 +478,23 @@ fn create_primitive_array(
         | Timestamp(_, _)
         | Date64
         | Duration(_)
-        | Interval(IntervalUnit::DayTime)
-        | Interval(IntervalUnit::MonthDayNano) => ArrayData::builder(data_type.clone())
+        | Interval(IntervalUnit::DayTime) => ArrayData::builder(data_type.clone())
             .len(length)
             .add_buffer(buffers[1].clone())
             .null_bit_buffer(null_buffer)
             .build()
             .unwrap(),
-        Decimal128(_, _) | Decimal256(_, _) => {
+        Interval(IntervalUnit::MonthDayNano) | Decimal128(_, _) | Decimal256(_, _) => {
+            let buffer = if matches!(data_type, &DataType::Decimal256(_, _)) {

Review Comment:
   Why not lift into parent match?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996383031


##########
arrow-array/src/array/list_array.rs:
##########
@@ -861,16 +861,42 @@ mod tests {
     }
 
     #[test]
-    #[should_panic(expected = "memory is not aligned")]
+    #[should_panic(expected = "Need at least 8 bytes in buffers[0]")]
     fn test_primitive_array_alignment() {
-        let ptr = arrow_buffer::alloc::allocate_aligned(8);
+        let ptr = arrow_buffer::alloc::allocate_aligned_zeroed(8);
         let buf = unsafe { Buffer::from_raw_parts(ptr, 8, 8) };
+        let buf1 = buf.slice(1);
         let buf2 = buf.slice(1);
+
+        // Aligned
         let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf)
+            .len(2)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 2);
+        assert_eq!(array.value(0), 0);
+        assert_eq!(array.value(1), 0);
+        assert_eq!(array.values(), &[0, 0]);
+
+        // Not aligned.
+        // `ArrayData::build` checks buffer length.
+        let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf1)
+            .len(1)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 1);
+        assert_eq!(array.value(0), 0);
+        assert_eq!(array.values(), &[0]);

Review Comment:
   For https://github.com/apache/arrow-rs/pull/2883/files#r996382807, I think you mean here? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1279884064

   > The alignment of the entire memory allocation is irrelevant, we only care that that the buffers within it are correctly aligned.
   
   `align_offset` actually checks the entire memory allocation. That said we allocate a memory allocation in IPC reader for all buffers data. Then we slice it for offset/length for each buffers. When we construct PrimitiveArray from one (sliced) buffer, `align_offset` checks if the entire memory allocation aligns with type `T`. And it is obviously not guaranteed to be.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996472323


##########
arrow/src/ipc/reader.rs:
##########
@@ -499,6 +505,24 @@ fn create_primitive_array(
     make_array(array_data)
 }
 
+/// Checks if given `Buffer` is properly aligned with `T`.
+/// If not, copying the data and padded it for alignment.
+fn get_aligned_buffer<T>(buffer: &Buffer, length: usize) -> Buffer {
+    let ptr = buffer.as_ptr();
+    let align_req = std::mem::align_of::<T>();
+    let align_offset = ptr.align_offset(align_req);
+    // The buffer is not aligned properly. The writer might use a smaller alignment
+    // e.g. 8 bytes, but on some platform (e.g. ARM) i128 requires 16 bytes alignment.
+    // We need to copy the buffer as fallback.
+    if align_offset != 0 {
+        let len_in_bytes = length * std::mem::size_of::<T>();
+        let slice = &buffer.as_slice()[0..len_in_bytes];

Review Comment:
   Minor nit, this will panic for invalid data where previously we would error. Should just be a case of taking the minimum of expected and actual length



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996469970


##########
arrow/src/ipc/reader.rs:
##########
@@ -499,6 +500,24 @@ fn create_primitive_array(
     make_array(array_data)
 }
 
+/// Checks if given `Buffer` is properly aligned with `i128`.
+/// If not, copying the data and padded it for alignment.
+fn get_aligned_buffer(buffer: &Buffer, length: usize) -> Buffer {
+    let ptr = buffer.as_ptr();
+    let align_req = std::mem::align_of::<i128>();
+    let align_offset = ptr.align_offset(align_req);
+    // The buffer is not aligned properly. The writer might use a smaller alignment
+    // e.g. 8 bytes, but on some platform (e.g. ARM) i128 requires 16 bytes alignment.
+    // We need to copy the buffer as fallback.
+    if align_offset != 0 {
+        let len_in_bytes = length * std::mem::size_of::<i128>();

Review Comment:
   The alignment req for `i256` is also 16. But sounds better to make it generic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996382149


##########
arrow-array/src/array/list_array.rs:
##########
@@ -861,16 +861,40 @@ mod tests {
     }
 
     #[test]
-    #[should_panic(expected = "memory is not aligned")]
+    #[should_panic(expected = "Need at least 8 bytes in buffers[0]")]
     fn test_primitive_array_alignment() {
-        let ptr = arrow_buffer::alloc::allocate_aligned(8);
+        let ptr = arrow_buffer::alloc::allocate_aligned_zeroed(8);
         let buf = unsafe { Buffer::from_raw_parts(ptr, 8, 8) };
+        let buf1 = buf.slice(1);
         let buf2 = buf.slice(1);
+
+        // Aligned
+        let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf)
+            .len(2)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 2);
+        assert_eq!(array.value(0), 0);
+        assert_eq!(array.value(1), 0);
+
+        // Not aligned.
+        // `ArrayData::build` checks buffer length.
         let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf1)
+            .len(1)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 1);
+        assert_eq!(array.value(0), 0);

Review Comment:
   This is now unsound, as this violates the safety requirement in PrimitiveArray::values.
   
   I'm not sure why MIRI isn't catching this...
   
   Edit: `array.value` doesn't call `array.values` so this test isn't unsound. If you add a call to `array.values()` in this test, MIRI will fail



##########
arrow-array/src/array/list_array.rs:
##########
@@ -861,16 +861,40 @@ mod tests {
     }
 
     #[test]
-    #[should_panic(expected = "memory is not aligned")]
+    #[should_panic(expected = "Need at least 8 bytes in buffers[0]")]
     fn test_primitive_array_alignment() {
-        let ptr = arrow_buffer::alloc::allocate_aligned(8);
+        let ptr = arrow_buffer::alloc::allocate_aligned_zeroed(8);
         let buf = unsafe { Buffer::from_raw_parts(ptr, 8, 8) };
+        let buf1 = buf.slice(1);
         let buf2 = buf.slice(1);
+
+        // Aligned
+        let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf)
+            .len(2)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 2);
+        assert_eq!(array.value(0), 0);
+        assert_eq!(array.value(1), 0);
+
+        // Not aligned.
+        // `ArrayData::build` checks buffer length.
         let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf1)
+            .len(1)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 1);
+        assert_eq!(array.value(0), 0);

Review Comment:
   This is now UB, as this violates the safety requirement in PrimitiveArray::values.
   
   I'm not sure why MIRI isn't catching this...
   
   Edit: `array.value` doesn't call `array.values` so this test isn't unsound. If you add a call to `array.values()` in this test, MIRI will fail



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996382149


##########
arrow-array/src/array/list_array.rs:
##########
@@ -861,16 +861,40 @@ mod tests {
     }
 
     #[test]
-    #[should_panic(expected = "memory is not aligned")]
+    #[should_panic(expected = "Need at least 8 bytes in buffers[0]")]
     fn test_primitive_array_alignment() {
-        let ptr = arrow_buffer::alloc::allocate_aligned(8);
+        let ptr = arrow_buffer::alloc::allocate_aligned_zeroed(8);
         let buf = unsafe { Buffer::from_raw_parts(ptr, 8, 8) };
+        let buf1 = buf.slice(1);
         let buf2 = buf.slice(1);
+
+        // Aligned
+        let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf)
+            .len(2)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 2);
+        assert_eq!(array.value(0), 0);
+        assert_eq!(array.value(1), 0);
+
+        // Not aligned.
+        // `ArrayData::build` checks buffer length.
         let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf1)
+            .len(1)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 1);
+        assert_eq!(array.value(0), 0);

Review Comment:
   This is now unsound, as this violates the safety requirement in PrimitiveArray::values.
   
   I'm not sure why MIRI isn't catching this...
   
   Edit: It is `array.values()` that is unsound now, not `array.value`. If you add a call to `array.values()` in this test, MIRI will fail



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996382149


##########
arrow-array/src/array/list_array.rs:
##########
@@ -861,16 +861,40 @@ mod tests {
     }
 
     #[test]
-    #[should_panic(expected = "memory is not aligned")]
+    #[should_panic(expected = "Need at least 8 bytes in buffers[0]")]
     fn test_primitive_array_alignment() {
-        let ptr = arrow_buffer::alloc::allocate_aligned(8);
+        let ptr = arrow_buffer::alloc::allocate_aligned_zeroed(8);
         let buf = unsafe { Buffer::from_raw_parts(ptr, 8, 8) };
+        let buf1 = buf.slice(1);
         let buf2 = buf.slice(1);
+
+        // Aligned
+        let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf)
+            .len(2)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 2);
+        assert_eq!(array.value(0), 0);
+        assert_eq!(array.value(1), 0);
+
+        // Not aligned.
+        // `ArrayData::build` checks buffer length.
         let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf1)
+            .len(1)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 1);
+        assert_eq!(array.value(0), 0);

Review Comment:
   This is now unsound, as this violates the safety requirement in PrimitiveArray::values



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996473145


##########
arrow/src/ipc/reader.rs:
##########
@@ -499,6 +512,24 @@ fn create_primitive_array(
     make_array(array_data)
 }
 
+/// Checks if given `Buffer` is properly aligned with `T`.
+/// If not, copying the data and padded it for alignment.
+fn get_aligned_buffer<T>(buffer: &Buffer, length: usize) -> Buffer {
+    let ptr = buffer.as_ptr();
+    let align_req = std::mem::align_of::<T>();
+    let align_offset = ptr.align_offset(align_req);
+    // The buffer is not aligned properly. The writer might use a smaller alignment
+    // e.g. 8 bytes, but on some platform (e.g. ARM) i128 requires 16 bytes alignment.
+    // We need to copy the buffer as fallback.
+    if align_offset != 0 {
+        let len_in_bytes = (length * std::mem::size_of::<T>()).min(buffer.len());

Review Comment:
   Hm? Why? `buffer.len()` is the actual length in bytes, no?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1279885507

   `assert_unsafe_precondition` is only checked in debug builds, although something is off as I would expect this to fire with your test.
   
   My understanding of the test is you are explicitly creating a raw_values pointer that is not aligned to T, which should then trigger the assert and MIRI to fail?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1279885113

   > Where is this called, I think I am being blind. It looks like this PR removes the alignment check in PrimitiveArray?
   
   Hmm? No, this PR doesn't remove it.
   
   Let me quote what I saw for now:
   
   ```rust
   pub fn values(&self) -> &[T::Native] {
           // Soundness
           //     raw_values alignment & location is ensured by fn from(ArrayDataRef)
           //     buffer bounds/offset is ensured by the ArrayData instance.
           unsafe {
               std::slice::from_raw_parts(
                   self.raw_values.as_ptr().add(self.data.offset()),
                   self.len(),
               )
           }
       }
   ```
   
   ```rust
   pub const unsafe fn from_raw_parts<'a, T>(data: *const T, len: usize) -> &'a [T] {
       // SAFETY: the caller must uphold the safety contract for `from_raw_parts`.
       unsafe {
           assert_unsafe_precondition!(
               is_aligned_and_not_null(data)
                   && crate::mem::size_of::<T>().saturating_mul(len) <= isize::MAX as usize
           );
           &*ptr::slice_from_raw_parts(data, len)
       }
   }
   ```
   
   ```rust
   pub(crate) fn is_aligned_and_not_null<T>(ptr: *const T) -> bool {
       !ptr.is_null() && ptr.addr() % mem::align_of::<T>() == 0
   }
   ```
   
   BTW, I'm on a M1 Macbook so the toolchain is stable-aarch64-apple-darwin. Maybe you will see something different?
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996382149


##########
arrow-array/src/array/list_array.rs:
##########
@@ -861,16 +861,40 @@ mod tests {
     }
 
     #[test]
-    #[should_panic(expected = "memory is not aligned")]
+    #[should_panic(expected = "Need at least 8 bytes in buffers[0]")]
     fn test_primitive_array_alignment() {
-        let ptr = arrow_buffer::alloc::allocate_aligned(8);
+        let ptr = arrow_buffer::alloc::allocate_aligned_zeroed(8);
         let buf = unsafe { Buffer::from_raw_parts(ptr, 8, 8) };
+        let buf1 = buf.slice(1);
         let buf2 = buf.slice(1);
+
+        // Aligned
+        let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf)
+            .len(2)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 2);
+        assert_eq!(array.value(0), 0);
+        assert_eq!(array.value(1), 0);
+
+        // Not aligned.
+        // `ArrayData::build` checks buffer length.
         let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf1)
+            .len(1)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 1);
+        assert_eq!(array.value(0), 0);

Review Comment:
   This is now unsound, as this violates the safety requirement in PrimitiveArray::values.
   
   Hopefully MIRI should fail for this PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996382149


##########
arrow-array/src/array/list_array.rs:
##########
@@ -861,16 +861,40 @@ mod tests {
     }
 
     #[test]
-    #[should_panic(expected = "memory is not aligned")]
+    #[should_panic(expected = "Need at least 8 bytes in buffers[0]")]
     fn test_primitive_array_alignment() {
-        let ptr = arrow_buffer::alloc::allocate_aligned(8);
+        let ptr = arrow_buffer::alloc::allocate_aligned_zeroed(8);
         let buf = unsafe { Buffer::from_raw_parts(ptr, 8, 8) };
+        let buf1 = buf.slice(1);
         let buf2 = buf.slice(1);
+
+        // Aligned
+        let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf)
+            .len(2)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 2);
+        assert_eq!(array.value(0), 0);
+        assert_eq!(array.value(1), 0);
+
+        // Not aligned.
+        // `ArrayData::build` checks buffer length.
         let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf1)
+            .len(1)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 1);
+        assert_eq!(array.value(0), 0);

Review Comment:
   This is now unsound, as this violates the safety requirement in PrimitiveArray::values.
   
   I'm not sure why MIRI isn't catching this...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1279889842

   > Because the allocation containing all the buffers is aligned to at least 8 bytes, and all the contained buffers are padded to a multiple of 8 bytes in length, each buffer starts and ends at an 8 byte boundary.
   
   Currently padding only guarantees 8 bytes alignment. Any larger alignment requirement can just fail the check.
   
   I'm not sure if the official doc explicitly asks for a 8 bytes alignment. For example,
   
   > Implementations are recommended to allocate memory on aligned addresses (multiple of 8- or 64-bytes) and pad (overallocate) to a length that is a multiple of 8 or 64 bytes.
   
   and for IPC,
   
   > The body, a flat sequence of memory buffers written end-to-end with appropriate padding to ensure a minimum of 8-byte alignment
   
   It sounds like it can be any alignment larger than 8 bytes. Maybe we can change to 16 bytes alignment?
   
   > As mentioned on the ticket, the issue appears to be that arm requires 16-byte alignment for i128 types, which isn't guaranteed by the standard which only mandates padding up to 8 bytes. As such we will need to copy the buffer to a new correctly aligned allocation in such a case. We could/should probably do this in general where the allocation is not sufficiently aligned.
   
   Hmm, for now this sounds like a special case (DecimalArray + arm). I'm okay for the copying approach. I will modify this.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1279890454

   > It sounds like it can be any alignment larger than 8 bytes. 
   
   Correct
   
   > Maybe we can change to 16 bytes alignment?
   
   For buffers we allocate we use larger alignments (32 bytes on arm, 128 bytes on x86) - see https://github.com/apache/arrow-rs/blob/master/arrow-buffer/src/alloc/alignment.rs
   
   The issue is whatever wrote the test file in the ticket was only using the minimum 8 byte padding, and so we need to copy in such cases
   
   > Hmm, for now this sounds like a special case (DecimalArray + arm)
   
   I think it will also impact IntervalMonthDayNanoType which also uses i128
   
   > I'm okay for the copying approach
   
   TBC we should only copy as a fallback for when the buffer is not sufficiently aligned. We could probably do this in the general case, it is better than panicking.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1279882752

   > I don't think this is correct, we rely on the memory within PrimitiveArray being correctly aligned?
   
   As described in the ticket, for IPC reader where we share same memory and slice with offset and length for individual arrays. The alignment check applies on the entire memory allocation not slices. There will be unexpected alignment check error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996382807


##########
arrow-array/src/array/list_array.rs:
##########
@@ -878,6 +878,7 @@ mod tests {
         assert_eq!(array.len(), 2);
         assert_eq!(array.value(0), 0);
         assert_eq!(array.value(1), 0);
+        assert_eq!(array.values(), &[0, 0]);

Review Comment:
   You need to call array.values() on the unaligned PrimitiveArray



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996395378


##########
arrow/src/ipc/reader.rs:
##########
@@ -499,6 +500,34 @@ fn create_primitive_array(
     make_array(array_data)
 }
 
+/// Checks if given `Buffer` is properly aligned with `i128`.
+/// If not, copying the data and padded it for alignment.
+fn get_aligned_buffer(buffer: &Buffer, length: usize) -> Buffer {
+    let ptr = buffer.as_ptr();
+    let align_req = std::mem::align_of::<i128>();
+    let align_offset = ptr.align_offset(align_req);
+    // The buffer is not aligned properly. The writer might use a smaller alignment
+    // e.g. 8 bytes, but on some platform (e.g. ARM) i128 requires 16 bytes alignment.
+    // We need to copy the buffer as fallback.
+    if align_offset != 0 {
+        let len_in_bytes = length * std::mem::size_of::<i128>();
+        let pad_len = padding(len_in_bytes, align_req);
+        let mut aligned_buffer = MutableBuffer::with_capacity(len_in_bytes + pad_len);
+        aligned_buffer.extend_from_slice(&buffer.as_slice()[0..len_in_bytes]);
+        aligned_buffer.extend_from_slice(&vec![0u8; pad_len]);
+        aligned_buffer.into()
+    } else {
+        buffer.clone()
+    }
+}
+
+/// Calculate byte boundary and return the number of bytes needed to pad to `align_req` bytes
+#[inline]
+fn padding(len: usize, align_req: usize) -> usize {

Review Comment:
   This is already handled for you by MutableBuffer, you shouldn't need to add explicit padding



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996395551


##########
arrow/src/ipc/reader.rs:
##########
@@ -499,6 +500,34 @@ fn create_primitive_array(
     make_array(array_data)
 }
 
+/// Checks if given `Buffer` is properly aligned with `i128`.
+/// If not, copying the data and padded it for alignment.
+fn get_aligned_buffer(buffer: &Buffer, length: usize) -> Buffer {
+    let ptr = buffer.as_ptr();
+    let align_req = std::mem::align_of::<i128>();
+    let align_offset = ptr.align_offset(align_req);
+    // The buffer is not aligned properly. The writer might use a smaller alignment
+    // e.g. 8 bytes, but on some platform (e.g. ARM) i128 requires 16 bytes alignment.
+    // We need to copy the buffer as fallback.
+    if align_offset != 0 {
+        let len_in_bytes = length * std::mem::size_of::<i128>();
+        let pad_len = padding(len_in_bytes, align_req);
+        let mut aligned_buffer = MutableBuffer::with_capacity(len_in_bytes + pad_len);
+        aligned_buffer.extend_from_slice(&buffer.as_slice()[0..len_in_bytes]);
+        aligned_buffer.extend_from_slice(&vec![0u8; pad_len]);
+        aligned_buffer.into()

Review Comment:
   ```suggestion
           Buffer::from_slice_ref(buffer.as_slice)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996473340


##########
arrow/src/ipc/reader.rs:
##########
@@ -499,6 +512,24 @@ fn create_primitive_array(
     make_array(array_data)
 }
 
+/// Checks if given `Buffer` is properly aligned with `T`.
+/// If not, copying the data and padded it for alignment.
+fn get_aligned_buffer<T>(buffer: &Buffer, length: usize) -> Buffer {
+    let ptr = buffer.as_ptr();
+    let align_req = std::mem::align_of::<T>();
+    let align_offset = ptr.align_offset(align_req);
+    // The buffer is not aligned properly. The writer might use a smaller alignment
+    // e.g. 8 bytes, but on some platform (e.g. ARM) i128 requires 16 bytes alignment.
+    // We need to copy the buffer as fallback.
+    if align_offset != 0 {
+        let len_in_bytes = (length * std::mem::size_of::<T>()).min(buffer.len());

Review Comment:
   Nvm, still waking up 😂



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996382149


##########
arrow-array/src/array/list_array.rs:
##########
@@ -861,16 +861,40 @@ mod tests {
     }
 
     #[test]
-    #[should_panic(expected = "memory is not aligned")]
+    #[should_panic(expected = "Need at least 8 bytes in buffers[0]")]
     fn test_primitive_array_alignment() {
-        let ptr = arrow_buffer::alloc::allocate_aligned(8);
+        let ptr = arrow_buffer::alloc::allocate_aligned_zeroed(8);
         let buf = unsafe { Buffer::from_raw_parts(ptr, 8, 8) };
+        let buf1 = buf.slice(1);
         let buf2 = buf.slice(1);
+
+        // Aligned
+        let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf)
+            .len(2)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 2);
+        assert_eq!(array.value(0), 0);
+        assert_eq!(array.value(1), 0);
+
+        // Not aligned.
+        // `ArrayData::build` checks buffer length.
         let array_data = ArrayData::builder(DataType::Int32)
+            .add_buffer(buf1)
+            .len(1)
+            .build()
+            .unwrap();
+        let array = Int32Array::from(array_data);
+        assert_eq!(array.len(), 1);
+        assert_eq!(array.value(0), 0);

Review Comment:
   This is now UB, as this violates the safety requirement in PrimitiveArray::values.
   
   I'm not sure why MIRI isn't catching this...
   
   Edit: `array.value` doesn't call `array.values` so this test isn't UB. If you add a call to `array.values()` in this test, MIRI will fail



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ursabot commented on pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1280040572

   Benchmark runs are scheduled for baseline = a3effc19cc13d5612ffcca5c04c44dee0995dd46 and contender = bfd87bd8d26d3d3d4851ea0e93035de28e59681e. bfd87bd8d26d3d3d4851ea0e93035de28e59681e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/b81207139b80470098b715a833c6fc4d...81fb7696bd864815b51cbf0c42c4d7f7/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/e6bd0658cfc447fda7ed3bcac8f749eb...95c2c43b7478493984e8d2a2b3c4a816/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/3d06b676a4ef4e17b4127eb3a295bc68...4ab43c041eba4cb892e9f53f651d79d2/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4800e4c6093c4d15a12cc4f55de53ca5...67870d12a42c4716b01bab96927a8f18/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1279886787

   > It is supposed to be - https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding by ensuring the buffers are padded correctly, we can ensure that correctly aligning the entire memory allocation is sufficient to align the child allocations. When the writer has not done this correctly, we will have to copy the buffers or return an error.
   > 
   > >
   
   How it can be?
   
   That's said we allocate a 80 bytes memory allocation for all buffers containing 1 Int32Array (40 bytes) and  and 1 Decimal128Array (32 bytes) and 1 Int32Array (8 bytes).
   
   When we try to create array for the Decimal128Array, it takes the slice buffer offset by 40 to 80. Then it will fails at `align_offset` as it is not aligned with i128 (16 bytes). 
   
   
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#issuecomment-1279887333

   > How it can be?
   
   Because the allocation containing all the buffers is aligned to 8 bytes, and all the contained buffers are padded to a multiple of 8 bytes, consequently each buffer is at an 8 byte boundary.
   
   As mentioned on the ticket, the issue appears to be that arm requires 16-byte alignment for i128 types, which isn't guaranteed by the standard which only mandates padding up to 8 bytes. As such we will need to copy the buffer to a new correctly aligned allocation in such a case.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Skip memory alignment check when constructing PrimitiveArray from an array data reference

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996384572


##########
arrow-array/src/array/list_array.rs:
##########
@@ -861,16 +861,42 @@ mod tests {
     }
 
     #[test]
-    #[should_panic(expected = "memory is not aligned")]
+    #[should_panic(expected = "Need at least 8 bytes in buffers[0]")]

Review Comment:
   This is why MIRI isn't failing, the test panics before it does anything interesting... :facepalm: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2883: Copying inappropriately aligned buffer in ipc reader

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2883:
URL: https://github.com/apache/arrow-rs/pull/2883#discussion_r996472775


##########
arrow/src/ipc/reader.rs:
##########
@@ -499,6 +512,24 @@ fn create_primitive_array(
     make_array(array_data)
 }
 
+/// Checks if given `Buffer` is properly aligned with `T`.
+/// If not, copying the data and padded it for alignment.
+fn get_aligned_buffer<T>(buffer: &Buffer, length: usize) -> Buffer {
+    let ptr = buffer.as_ptr();
+    let align_req = std::mem::align_of::<T>();
+    let align_offset = ptr.align_offset(align_req);
+    // The buffer is not aligned properly. The writer might use a smaller alignment
+    // e.g. 8 bytes, but on some platform (e.g. ARM) i128 requires 16 bytes alignment.
+    // We need to copy the buffer as fallback.
+    if align_offset != 0 {
+        let len_in_bytes = (length * std::mem::size_of::<T>()).min(buffer.len());

Review Comment:
   I think the brackets on this are wrong



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org