You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/12 19:12:25 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #8901: [Rust] [Proposal] Add guidelines about usage of `unsafe`

jorgecarleitao opened a new pull request #8901:
URL: https://github.com/apache/arrow/pull/8901


   I would like to propose that we outline and enforce guidelines on the arrow crate implementation with respect to the usage of `unsafe`.
   
   The background of this proposal are PRs #8645 and #8829. In both cases, while addressing an unrelated issue, they hit undefined behavior (UB) due to an incorrect usage of `unsafe` in the code base. This UB was very time-consuming to identify and debug: combined, they accounted for more than 12hs of my time.
   
   Safety against undefined behavior is the core premise of the Rust language. In many cases, the maintenance burden (time to find and fix bugs) does not justify the performance improvements and the decrease in motivation in handling them (they are just painful due to how difficult they are to debug). In particular, IMO those 12 hours would have been better spent in other parts of the code if `unsafe` would have not been used in the first place, which would have been likely translated in faster code or more features.
   
   There are situations where `unsafe` is necessary, and the guidelines outline these cases. However, I also see many uses of `unsafe` that are not necessary nor properly documented.
   
   The goal of these guidelines is to motivate contributors of the Rust implementation to be conscious about the maintenance cost of `unsafe`, and outline specific necessary conditions for any new `unsafe` to be introduced in the code base.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-745383359


   > I saw a tweet yesterday from @timClicks showing an example of how the imageproc project documents their unsafe code and I really like it. I think we should do something similiar. Maybe we could add this to the guidelines @jorgecarleitao ?
   > 
   > ```rust
   >             // JUSTIFICATION
   >             //  Benefit
   >             //      Using checked indexing here makes bench_integral_image_rgb take 1.05x as long
   >             //      (The results are noisy, but this seems to be reproducible. I've not checked the generated assembly.)
   >             //  Correctness
   >             //      x and y are within bounds by definition of in_width and in_height
   >             let input = unsafe { image.unsafe_get_pixel(x, y) };
   > ```
   
   Saw that too :) I think it's an interesting approach. In this PR https://github.com/apache/arrow/pull/8900 I added a comment with "SAFETY", but that misses the benefit part.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#discussion_r541866432



##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+Usage of `unsafe` for the purposes of supporting allocations aligned with cache lines is allowed.
+
+#### Interpreting bytes
+
+The arrow format is specified in bytes (`u8`), which can be logically represented as certain types
+depending on the DataType.
+For many operations, such as access, representation, numerical computation and string manipulation,
+it is often necessary to interpret bytes as other physical types (e.g. `i32`).
+
+Usage of `unsafe` for the purpose of interpreting bytes in their corresponding type (according to the arrow specification) is allowed. Specifically, the pointer to the byte slice must be aligned to the type that it intends to represent and the length of the slice is a multiple of the size of the target type of the transmutation.
+
+#### FFI
+
+The arrow format declares an ABI for zero-copy from and to libraries that implement the specification
+(foreign interfaces). In Rust, receiving and sending pointers via FFI requires usage of `unsafe` due to 
+the impossibility of the compiler to derive the invariants (such as lifetime, null pointers, and pointer alignment), from the source code alone as they are part of the FFI contract.
+
+Usage of `unsafe` for the purposes of supporting FFI is allowed.
+
+#### IPC
+
+The arrow format declares a IPC protocol, which this crate supports. IPC is equivalent to a FFI in that the rust compiler can't reason about the contract's invariants.
+
+Usage of `unsafe` for the purposes of supporting IPC is allowed.
+
+#### SIMD
+
+The API provided by the `packed_simd` library is currently `unsafe`. However, SIMD offers a significant performance improvement over non-SIMD operations.
+
+Usage of `unsafe` for the purposes of supporting SIMD is allowed.
+
+#### Performance
+
+Some operations are significantly faster when `unsafe` is used.
+
+A common usage of `unsafe` is to offer an API to access the `i`th element of an array (e.g. `UInt32Array`).
+This requires accessing the values buffer e.g. `array.buffers()[0]`, picking the slice 
+`[i * size_of<i32>(), (i + 1) * size_of<i32>()]`, and then transmuting it to `i32`. In safe Rust, 
+this operation requires boundary checks that are detrimental to performance.
+
+Usage of `unsafe` for performance reasons is justified only when the performance difference of a publicly available API is estatistically significantly larger than 10%, as demonstrated by a `bench`.
+
+### Considerations when introducing `unsafe`
+
+Usage of `unsafe` in this crate *must*:
+
+* not expose a public API as `safe` when there are necessary invariants for that API to be defined behavior.
+* have code documentation for why a `safe` is not used / possible (e.g. `// 30% performance degradation if the safe counterpart is used, see bench X`)
+* have code documentation about which invariant the user needs to enforce to ensure no undefined behavior (e.g. `// this buffer must be constructed according to the arrow specification`)
+* if applicable, have the necessary `assert`s (not `debug_assert`!) of invariants.
+

Review comment:
       I think places that do use unsafe, should have their invariants in a debug assert (as it's often something you want to skill for release builds).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#discussion_r542389842



##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+Usage of `unsafe` for the purposes of supporting allocations aligned with cache lines is allowed.
+
+#### Interpreting bytes
+
+The arrow format is specified in bytes (`u8`), which can be logically represented as certain types
+depending on the DataType.
+For many operations, such as access, representation, numerical computation and string manipulation,
+it is often necessary to interpret bytes as other physical types (e.g. `i32`).
+
+Usage of `unsafe` for the purpose of interpreting bytes in their corresponding type (according to the arrow specification) is allowed. Specifically, the pointer to the byte slice must be aligned to the type that it intends to represent and the length of the slice is a multiple of the size of the target type of the transmutation.
+
+#### FFI
+
+The arrow format declares an ABI for zero-copy from and to libraries that implement the specification
+(foreign interfaces). In Rust, receiving and sending pointers via FFI requires usage of `unsafe` due to 
+the impossibility of the compiler to derive the invariants (such as lifetime, null pointers, and pointer alignment), from the source code alone as they are part of the FFI contract.
+
+Usage of `unsafe` for the purposes of supporting FFI is allowed.
+
+#### IPC
+
+The arrow format declares a IPC protocol, which this crate supports. IPC is equivalent to a FFI in that the rust compiler can't reason about the contract's invariants.
+
+Usage of `unsafe` for the purposes of supporting IPC is allowed.
+
+#### SIMD
+
+The API provided by the `packed_simd` library is currently `unsafe`. However, SIMD offers a significant performance improvement over non-SIMD operations.
+
+Usage of `unsafe` for the purposes of supporting SIMD is allowed.
+
+#### Performance
+
+Some operations are significantly faster when `unsafe` is used.
+
+A common usage of `unsafe` is to offer an API to access the `i`th element of an array (e.g. `UInt32Array`).
+This requires accessing the values buffer e.g. `array.buffers()[0]`, picking the slice 
+`[i * size_of<i32>(), (i + 1) * size_of<i32>()]`, and then transmuting it to `i32`. In safe Rust, 
+this operation requires boundary checks that are detrimental to performance.
+
+Usage of `unsafe` for performance reasons is justified only when the performance difference of a publicly available API is estatistically significantly larger than 10%, as demonstrated by a `bench`.

Review comment:
       I understand the sentiment.
   
   My concern is that I am not sure I would like to maintain a complex `unsafe` code for a 6% improvement _at this particular phase_ of the project.
   
   The rational for a concrete number is to impose a bound that we consider to not have the capacity to handle the maintenance cost for "small" improvements, and thus people should not try to focus on those types of improvements (again, _at this particular phase_ of the project).
   
   My concern with "performance benefits are sufficiently large" is that anything bigger than zero is always "sufficiently large" when compared to zero.
   
   What if we write something like
   
   > usage of unsafe for performance reasons is justified only when all other alternatives have been exhausted and the performance benefits are sufficiently large (>~10%)
   
   so that we allow other values, but we offer a number for reference?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-745791950


   Thanks all for the suggestions. Really good stuff.
   
   I have incorporated all the changes:
   
   * the change of the requirement for performance was re-written as suggested by @alamb and @Dandandan 
   * added a code snipped with the idea from @andygrove
   * fixed typos identified by @paddyhoran 
   
   I have used the term `soundness` instead of `correctness` because the former is already defined on an official guide: https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library and is based on existing literature, while the latter seems to be used only by imageproc.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#discussion_r542656133



##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+Usage of `unsafe` for the purposes of supporting allocations aligned with cache lines is allowed.
+
+#### Interpreting bytes
+
+The arrow format is specified in bytes (`u8`), which can be logically represented as certain types
+depending on the DataType.
+For many operations, such as access, representation, numerical computation and string manipulation,
+it is often necessary to interpret bytes as other physical types (e.g. `i32`).
+
+Usage of `unsafe` for the purpose of interpreting bytes in their corresponding type (according to the arrow specification) is allowed. Specifically, the pointer to the byte slice must be aligned to the type that it intends to represent and the length of the slice is a multiple of the size of the target type of the transmutation.
+
+#### FFI
+
+The arrow format declares an ABI for zero-copy from and to libraries that implement the specification
+(foreign interfaces). In Rust, receiving and sending pointers via FFI requires usage of `unsafe` due to 
+the impossibility of the compiler to derive the invariants (such as lifetime, null pointers, and pointer alignment), from the source code alone as they are part of the FFI contract.
+
+Usage of `unsafe` for the purposes of supporting FFI is allowed.
+
+#### IPC
+
+The arrow format declares a IPC protocol, which this crate supports. IPC is equivalent to a FFI in that the rust compiler can't reason about the contract's invariants.
+
+Usage of `unsafe` for the purposes of supporting IPC is allowed.
+
+#### SIMD
+
+The API provided by the `packed_simd` library is currently `unsafe`. However, SIMD offers a significant performance improvement over non-SIMD operations.
+
+Usage of `unsafe` for the purposes of supporting SIMD is allowed.
+
+#### Performance
+
+Some operations are significantly faster when `unsafe` is used.
+
+A common usage of `unsafe` is to offer an API to access the `i`th element of an array (e.g. `UInt32Array`).
+This requires accessing the values buffer e.g. `array.buffers()[0]`, picking the slice 
+`[i * size_of<i32>(), (i + 1) * size_of<i32>()]`, and then transmuting it to `i32`. In safe Rust, 
+this operation requires boundary checks that are detrimental to performance.
+
+Usage of `unsafe` for performance reasons is justified only when the performance difference of a publicly available API is estatistically significantly larger than 10%, as demonstrated by a `bench`.

Review comment:
       Yes, I think it would be preferable to have some room to trade safety for performance and vice versa. A change that makes everything a few percent faster might be acceptable, while a 50% improvement on a micro benchmark might not in some cases. And in some cases you maybe want to reduce the amount of safety, but might want to accept a certain negative performance impact.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ritchie46 commented on a change in pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
ritchie46 commented on a change in pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#discussion_r542505339



##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+Usage of `unsafe` for the purposes of supporting allocations aligned with cache lines is allowed.
+
+#### Interpreting bytes
+
+The arrow format is specified in bytes (`u8`), which can be logically represented as certain types
+depending on the DataType.
+For many operations, such as access, representation, numerical computation and string manipulation,
+it is often necessary to interpret bytes as other physical types (e.g. `i32`).
+
+Usage of `unsafe` for the purpose of interpreting bytes in their corresponding type (according to the arrow specification) is allowed. Specifically, the pointer to the byte slice must be aligned to the type that it intends to represent and the length of the slice is a multiple of the size of the target type of the transmutation.
+
+#### FFI
+
+The arrow format declares an ABI for zero-copy from and to libraries that implement the specification
+(foreign interfaces). In Rust, receiving and sending pointers via FFI requires usage of `unsafe` due to 
+the impossibility of the compiler to derive the invariants (such as lifetime, null pointers, and pointer alignment), from the source code alone as they are part of the FFI contract.
+
+Usage of `unsafe` for the purposes of supporting FFI is allowed.
+
+#### IPC
+
+The arrow format declares a IPC protocol, which this crate supports. IPC is equivalent to a FFI in that the rust compiler can't reason about the contract's invariants.
+
+Usage of `unsafe` for the purposes of supporting IPC is allowed.
+
+#### SIMD
+
+The API provided by the `packed_simd` library is currently `unsafe`. However, SIMD offers a significant performance improvement over non-SIMD operations.
+
+Usage of `unsafe` for the purposes of supporting SIMD is allowed.
+
+#### Performance
+
+Some operations are significantly faster when `unsafe` is used.
+
+A common usage of `unsafe` is to offer an API to access the `i`th element of an array (e.g. `UInt32Array`).
+This requires accessing the values buffer e.g. `array.buffers()[0]`, picking the slice 
+`[i * size_of<i32>(), (i + 1) * size_of<i32>()]`, and then transmuting it to `i32`. In safe Rust, 
+this operation requires boundary checks that are detrimental to performance.
+
+Usage of `unsafe` for performance reasons is justified only when the performance difference of a publicly available API is estatistically significantly larger than 10%, as demonstrated by a `bench`.

Review comment:
       Could the improvement be expressed by the **"hotness"** of the code? I can imagine that a 6% improvement in the `Buffer`s is much more valuable than a 11% percentage improvement in a specific kernel.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-743803763


   https://issues.apache.org/jira/browse/ARROW-10889


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-743827047


   Very cool proposal. Will take some time soon to read it in detail. I think currently there is a lot of code violating what is proposed, so that's clearly needed πŸ‘ 
   
   I think it also makes sense to keep track of the amount of `unsafe` that is used, and try to reduce it as much as possible by introducing safe APIs that are as performant. An example might be: introducing a safe iterator to avoid bounds checking. 
   
   Also I think the unsafe code can /should be more tested, e.g. by having more `debug_asserts` etc. checking for invariants and checking for undefined behavior in the CI.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #8901:
URL: https://github.com/apache/arrow/pull/8901


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#discussion_r541866432



##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+Usage of `unsafe` for the purposes of supporting allocations aligned with cache lines is allowed.
+
+#### Interpreting bytes
+
+The arrow format is specified in bytes (`u8`), which can be logically represented as certain types
+depending on the DataType.
+For many operations, such as access, representation, numerical computation and string manipulation,
+it is often necessary to interpret bytes as other physical types (e.g. `i32`).
+
+Usage of `unsafe` for the purpose of interpreting bytes in their corresponding type (according to the arrow specification) is allowed. Specifically, the pointer to the byte slice must be aligned to the type that it intends to represent and the length of the slice is a multiple of the size of the target type of the transmutation.
+
+#### FFI
+
+The arrow format declares an ABI for zero-copy from and to libraries that implement the specification
+(foreign interfaces). In Rust, receiving and sending pointers via FFI requires usage of `unsafe` due to 
+the impossibility of the compiler to derive the invariants (such as lifetime, null pointers, and pointer alignment), from the source code alone as they are part of the FFI contract.
+
+Usage of `unsafe` for the purposes of supporting FFI is allowed.
+
+#### IPC
+
+The arrow format declares a IPC protocol, which this crate supports. IPC is equivalent to a FFI in that the rust compiler can't reason about the contract's invariants.
+
+Usage of `unsafe` for the purposes of supporting IPC is allowed.
+
+#### SIMD
+
+The API provided by the `packed_simd` library is currently `unsafe`. However, SIMD offers a significant performance improvement over non-SIMD operations.
+
+Usage of `unsafe` for the purposes of supporting SIMD is allowed.
+
+#### Performance
+
+Some operations are significantly faster when `unsafe` is used.
+
+A common usage of `unsafe` is to offer an API to access the `i`th element of an array (e.g. `UInt32Array`).
+This requires accessing the values buffer e.g. `array.buffers()[0]`, picking the slice 
+`[i * size_of<i32>(), (i + 1) * size_of<i32>()]`, and then transmuting it to `i32`. In safe Rust, 
+this operation requires boundary checks that are detrimental to performance.
+
+Usage of `unsafe` for performance reasons is justified only when the performance difference of a publicly available API is estatistically significantly larger than 10%, as demonstrated by a `bench`.
+
+### Considerations when introducing `unsafe`
+
+Usage of `unsafe` in this crate *must*:
+
+* not expose a public API as `safe` when there are necessary invariants for that API to be defined behavior.
+* have code documentation for why a `safe` is not used / possible (e.g. `// 30% performance degradation if the safe counterpart is used, see bench X`)
+* have code documentation about which invariant the user needs to enforce to ensure no undefined behavior (e.g. `// this buffer must be constructed according to the arrow specification`)
+* if applicable, have the necessary `assert`s (not `debug_assert`!) of invariants.
+

Review comment:
       I think places that do use unsafe, should have their invariants in a debug assert (as it's often something you want to skip for release builds).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-743803119


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=h1) Report
   > Merging [#8901](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=desc) (6251d00) into [master](https://codecov.io/gh/apache/arrow/commit/2816f37ff01cfd31101b3ee6cc8574cc9246dd1b?el=desc) (2816f37) will **decrease** coverage by `0.18%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8901/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8901      +/-   ##
   ==========================================
   - Coverage   76.99%   76.81%   -0.19%     
   ==========================================
     Files         174      181       +7     
     Lines       40392    40987     +595     
   ==========================================
   + Hits        31099    31483     +384     
   - Misses       9293     9504     +211     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=tree) | Coverage Ξ” | |
   |---|---|---|
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/datasource/memory.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL21lbW9yeS5ycw==) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vYnVpbGRlci5ycw==) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/logical\_plan/dfschema.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZGZzY2hlbWEucnM=) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/parquet/src/bin/parquet-rowcount.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9iaW4vcGFycXVldC1yb3djb3VudC5ycw==) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/parquet/src/bin/parquet-schema.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9iaW4vcGFycXVldC1zY2hlbWEucnM=) | `0.00% <0.00%> (ΓΈ)` | |
   | [...ion-testing/src/bin/arrow-json-integration-test.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9pbnRlZ3JhdGlvbi10ZXN0aW5nL3NyYy9iaW4vYXJyb3ctanNvbi1pbnRlZ3JhdGlvbi10ZXN0LnJz) | `0.00% <0.00%> (ΓΈ)` | |
   | ... and [9 more](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Ξ” = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=footer). Last update [2816f37...6251d00](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#discussion_r541865867



##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+Usage of `unsafe` for the purposes of supporting allocations aligned with cache lines is allowed.
+
+#### Interpreting bytes
+
+The arrow format is specified in bytes (`u8`), which can be logically represented as certain types
+depending on the DataType.
+For many operations, such as access, representation, numerical computation and string manipulation,
+it is often necessary to interpret bytes as other physical types (e.g. `i32`).
+
+Usage of `unsafe` for the purpose of interpreting bytes in their corresponding type (according to the arrow specification) is allowed. Specifically, the pointer to the byte slice must be aligned to the type that it intends to represent and the length of the slice is a multiple of the size of the target type of the transmutation.
+
+#### FFI
+
+The arrow format declares an ABI for zero-copy from and to libraries that implement the specification
+(foreign interfaces). In Rust, receiving and sending pointers via FFI requires usage of `unsafe` due to 
+the impossibility of the compiler to derive the invariants (such as lifetime, null pointers, and pointer alignment), from the source code alone as they are part of the FFI contract.
+
+Usage of `unsafe` for the purposes of supporting FFI is allowed.
+
+#### IPC
+
+The arrow format declares a IPC protocol, which this crate supports. IPC is equivalent to a FFI in that the rust compiler can't reason about the contract's invariants.
+
+Usage of `unsafe` for the purposes of supporting IPC is allowed.
+
+#### SIMD
+
+The API provided by the `packed_simd` library is currently `unsafe`. However, SIMD offers a significant performance improvement over non-SIMD operations.
+
+Usage of `unsafe` for the purposes of supporting SIMD is allowed.
+
+#### Performance
+
+Some operations are significantly faster when `unsafe` is used.
+
+A common usage of `unsafe` is to offer an API to access the `i`th element of an array (e.g. `UInt32Array`).
+This requires accessing the values buffer e.g. `array.buffers()[0]`, picking the slice 
+`[i * size_of<i32>(), (i + 1) * size_of<i32>()]`, and then transmuting it to `i32`. In safe Rust, 
+this operation requires boundary checks that are detrimental to performance.
+
+Usage of `unsafe` for performance reasons is justified only when the performance difference of a publicly available API is estatistically significantly larger than 10%, as demonstrated by a `bench`.

Review comment:
       Maybe 10% is a bit arbitrary and depends on context? Maybe it should.bd demonstrated / be likely that it has an impact on real world usage of arrow?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] timClicks commented on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
timClicks commented on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-747059687


   Thanks all for working hard to keep Arrow as robust as possible


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#discussion_r541902004



##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+

Review comment:
       ```suggestion
   For two real world examples of where `unsafe` has consumed time in the past in this project see [#8545](https://github.com/apache/arrow/pull/8645) and [8829](https://github.com/apache/arrow/pull/8829)
   ```

##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.

Review comment:
       https://doc.rust-lang.org/reference/behavior-considered-undefined.html might be a reasonable citation, though it doesn't quantify the sources of unsafetly.

##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+

Review comment:
       ```suggestion
   Generally, `unsafe` should only be used when a `safe` counterpart is not available and there is no `safe` way to achieve additional performance in that area. The following is a summary of the current components of the crate that require `unsafe`:
   
   ```

##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+Usage of `unsafe` for the purposes of supporting allocations aligned with cache lines is allowed.
+
+#### Interpreting bytes
+
+The arrow format is specified in bytes (`u8`), which can be logically represented as certain types
+depending on the DataType.
+For many operations, such as access, representation, numerical computation and string manipulation,
+it is often necessary to interpret bytes as other physical types (e.g. `i32`).
+
+Usage of `unsafe` for the purpose of interpreting bytes in their corresponding type (according to the arrow specification) is allowed. Specifically, the pointer to the byte slice must be aligned to the type that it intends to represent and the length of the slice is a multiple of the size of the target type of the transmutation.
+
+#### FFI
+
+The arrow format declares an ABI for zero-copy from and to libraries that implement the specification
+(foreign interfaces). In Rust, receiving and sending pointers via FFI requires usage of `unsafe` due to 
+the impossibility of the compiler to derive the invariants (such as lifetime, null pointers, and pointer alignment), from the source code alone as they are part of the FFI contract.
+
+Usage of `unsafe` for the purposes of supporting FFI is allowed.
+
+#### IPC
+
+The arrow format declares a IPC protocol, which this crate supports. IPC is equivalent to a FFI in that the rust compiler can't reason about the contract's invariants.
+
+Usage of `unsafe` for the purposes of supporting IPC is allowed.
+
+#### SIMD
+
+The API provided by the `packed_simd` library is currently `unsafe`. However, SIMD offers a significant performance improvement over non-SIMD operations.
+
+Usage of `unsafe` for the purposes of supporting SIMD is allowed.
+
+#### Performance
+
+Some operations are significantly faster when `unsafe` is used.
+
+A common usage of `unsafe` is to offer an API to access the `i`th element of an array (e.g. `UInt32Array`).
+This requires accessing the values buffer e.g. `array.buffers()[0]`, picking the slice 
+`[i * size_of<i32>(), (i + 1) * size_of<i32>()]`, and then transmuting it to `i32`. In safe Rust, 
+this operation requires boundary checks that are detrimental to performance.
+
+Usage of `unsafe` for performance reasons is justified only when the performance difference of a publicly available API is estatistically significantly larger than 10%, as demonstrated by a `bench`.
+
+### Considerations when introducing `unsafe`
+
+Usage of `unsafe` in this crate *must*:
+
+* not expose a public API as `safe` when there are necessary invariants for that API to be defined behavior.
+* have code documentation for why a `safe` is not used / possible (e.g. `// 30% performance degradation if the safe counterpart is used, see bench X`)
+* have code documentation about which invariant the user needs to enforce to ensure no undefined behavior (e.g. `// this buffer must be constructed according to the arrow specification`)
+* if applicable, have the necessary `assert`s (not `debug_assert`!) of invariants.
+

Review comment:
       I also don't fully understand the mandate about `assert` -- I suggest removing the last bullet point as I think it may be more confusing than helpful

##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance

Review comment:
       I suggest being more explicit here -- maybe "Omitting bounds checking for performance" or something

##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+Usage of `unsafe` for the purposes of supporting allocations aligned with cache lines is allowed.

Review comment:
       stylistically, I think we could remove the 'Usage of `unsafe` for ...' sentences. The rationale for use of unsafe is already explained above so this repetition seems redundant to me.
   
   I also think such wording may give the impression that `unsafe` is always allowed for these operations, when I think the intent is "unsafe can be used as a last resort"

##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+Usage of `unsafe` for the purposes of supporting allocations aligned with cache lines is allowed.
+
+#### Interpreting bytes
+
+The arrow format is specified in bytes (`u8`), which can be logically represented as certain types
+depending on the DataType.
+For many operations, such as access, representation, numerical computation and string manipulation,
+it is often necessary to interpret bytes as other physical types (e.g. `i32`).
+
+Usage of `unsafe` for the purpose of interpreting bytes in their corresponding type (according to the arrow specification) is allowed. Specifically, the pointer to the byte slice must be aligned to the type that it intends to represent and the length of the slice is a multiple of the size of the target type of the transmutation.
+
+#### FFI
+
+The arrow format declares an ABI for zero-copy from and to libraries that implement the specification
+(foreign interfaces). In Rust, receiving and sending pointers via FFI requires usage of `unsafe` due to 
+the impossibility of the compiler to derive the invariants (such as lifetime, null pointers, and pointer alignment), from the source code alone as they are part of the FFI contract.
+
+Usage of `unsafe` for the purposes of supporting FFI is allowed.
+
+#### IPC
+
+The arrow format declares a IPC protocol, which this crate supports. IPC is equivalent to a FFI in that the rust compiler can't reason about the contract's invariants.
+
+Usage of `unsafe` for the purposes of supporting IPC is allowed.
+
+#### SIMD
+
+The API provided by the `packed_simd` library is currently `unsafe`. However, SIMD offers a significant performance improvement over non-SIMD operations.
+
+Usage of `unsafe` for the purposes of supporting SIMD is allowed.
+
+#### Performance
+
+Some operations are significantly faster when `unsafe` is used.
+
+A common usage of `unsafe` is to offer an API to access the `i`th element of an array (e.g. `UInt32Array`).
+This requires accessing the values buffer e.g. `array.buffers()[0]`, picking the slice 
+`[i * size_of<i32>(), (i + 1) * size_of<i32>()]`, and then transmuting it to `i32`. In safe Rust, 
+this operation requires boundary checks that are detrimental to performance.
+
+Usage of `unsafe` for performance reasons is justified only when the performance difference of a publicly available API is estatistically significantly larger than 10%, as demonstrated by a `bench`.
+
+### Considerations when introducing `unsafe`
+
+Usage of `unsafe` in this crate *must*:
+
+* not expose a public API as `safe` when there are necessary invariants for that API to be defined behavior.
+* have code documentation for why a `safe` is not used / possible (e.g. `// 30% performance degradation if the safe counterpart is used, see bench X`)

Review comment:
       ```suggestion
   * have code documentation for why `safe` is not used / possible (e.g. `// 30% performance degradation if the safe counterpart is used, see bench X`)
   ```

##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+Usage of `unsafe` for the purposes of supporting allocations aligned with cache lines is allowed.
+
+#### Interpreting bytes
+
+The arrow format is specified in bytes (`u8`), which can be logically represented as certain types
+depending on the DataType.
+For many operations, such as access, representation, numerical computation and string manipulation,
+it is often necessary to interpret bytes as other physical types (e.g. `i32`).
+
+Usage of `unsafe` for the purpose of interpreting bytes in their corresponding type (according to the arrow specification) is allowed. Specifically, the pointer to the byte slice must be aligned to the type that it intends to represent and the length of the slice is a multiple of the size of the target type of the transmutation.
+
+#### FFI
+
+The arrow format declares an ABI for zero-copy from and to libraries that implement the specification
+(foreign interfaces). In Rust, receiving and sending pointers via FFI requires usage of `unsafe` due to 
+the impossibility of the compiler to derive the invariants (such as lifetime, null pointers, and pointer alignment), from the source code alone as they are part of the FFI contract.
+
+Usage of `unsafe` for the purposes of supporting FFI is allowed.
+
+#### IPC
+
+The arrow format declares a IPC protocol, which this crate supports. IPC is equivalent to a FFI in that the rust compiler can't reason about the contract's invariants.
+
+Usage of `unsafe` for the purposes of supporting IPC is allowed.
+
+#### SIMD
+
+The API provided by the `packed_simd` library is currently `unsafe`. However, SIMD offers a significant performance improvement over non-SIMD operations.
+
+Usage of `unsafe` for the purposes of supporting SIMD is allowed.
+
+#### Performance
+
+Some operations are significantly faster when `unsafe` is used.
+
+A common usage of `unsafe` is to offer an API to access the `i`th element of an array (e.g. `UInt32Array`).
+This requires accessing the values buffer e.g. `array.buffers()[0]`, picking the slice 
+`[i * size_of<i32>(), (i + 1) * size_of<i32>()]`, and then transmuting it to `i32`. In safe Rust, 
+this operation requires boundary checks that are detrimental to performance.
+
+Usage of `unsafe` for performance reasons is justified only when the performance difference of a publicly available API is estatistically significantly larger than 10%, as demonstrated by a `bench`.

Review comment:
       I agree that this wording would be stronger without a specific bound. I would rather say something like "usage of unsafe for performance reasons is justified only when all other alternatives have been exhausted and the performance benefits are sufficiently large. Performance benefits should be quantified with a `bench`".




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#discussion_r542627788



##########
File path: rust/arrow/README.md
##########
@@ -93,6 +93,79 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the primary cause of undefined behavior in a program written in Rust `[citation needed]`.
+
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` can only be used when a `safe` counterpart is not available or has performance implications. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+Usage of `unsafe` for the purposes of supporting allocations aligned with cache lines is allowed.
+
+#### Interpreting bytes
+
+The arrow format is specified in bytes (`u8`), which can be logically represented as certain types
+depending on the DataType.
+For many operations, such as access, representation, numerical computation and string manipulation,
+it is often necessary to interpret bytes as other physical types (e.g. `i32`).
+
+Usage of `unsafe` for the purpose of interpreting bytes in their corresponding type (according to the arrow specification) is allowed. Specifically, the pointer to the byte slice must be aligned to the type that it intends to represent and the length of the slice is a multiple of the size of the target type of the transmutation.
+
+#### FFI
+
+The arrow format declares an ABI for zero-copy from and to libraries that implement the specification
+(foreign interfaces). In Rust, receiving and sending pointers via FFI requires usage of `unsafe` due to 
+the impossibility of the compiler to derive the invariants (such as lifetime, null pointers, and pointer alignment), from the source code alone as they are part of the FFI contract.
+
+Usage of `unsafe` for the purposes of supporting FFI is allowed.
+
+#### IPC
+
+The arrow format declares a IPC protocol, which this crate supports. IPC is equivalent to a FFI in that the rust compiler can't reason about the contract's invariants.
+
+Usage of `unsafe` for the purposes of supporting IPC is allowed.
+
+#### SIMD
+
+The API provided by the `packed_simd` library is currently `unsafe`. However, SIMD offers a significant performance improvement over non-SIMD operations.
+
+Usage of `unsafe` for the purposes of supporting SIMD is allowed.
+
+#### Performance
+
+Some operations are significantly faster when `unsafe` is used.
+
+A common usage of `unsafe` is to offer an API to access the `i`th element of an array (e.g. `UInt32Array`).
+This requires accessing the values buffer e.g. `array.buffers()[0]`, picking the slice 
+`[i * size_of<i32>(), (i + 1) * size_of<i32>()]`, and then transmuting it to `i32`. In safe Rust, 
+this operation requires boundary checks that are detrimental to performance.
+
+Usage of `unsafe` for performance reasons is justified only when the performance difference of a publicly available API is estatistically significantly larger than 10%, as demonstrated by a `bench`.

Review comment:
       I like
   
   > usage of unsafe for performance reasons is justified only when all other alternatives have been exhausted and the performance benefits are sufficiently large (e.g. >~10%)
   
   @ritchie46  I agree with the sentiments that some performance improvements are more important than others, which is why I was suggesting leaving room for interpretation in the writeup




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-743802577


   This is funny timing. I just filed ARROW-10889 for this and then saw this PR!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] paddyhoran commented on a change in pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
paddyhoran commented on a change in pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#discussion_r543591924



##########
File path: rust/arrow/README.md
##########
@@ -96,6 +96,71 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the [primary cause of undefined behavior](https://doc.rust-lang.org/reference/behavior-considered-undefined.html) in a program written in Rust.
+For two real world examples of where `unsafe` has consumed time in the past in this project see [#8545](https://github.com/apache/arrow/pull/8645) and [8829](https://github.com/apache/arrow/pull/8829)
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` should only be used when a `safe` counterpart is not available and there is no `safe` way to achieve additional performance in that area. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance (e.g. omit bounds checks, use of pointers to avoid bound checks)
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.

Review comment:
       ```suggestion
   However, Rust's global allocator does not allocate memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
   ```

##########
File path: rust/arrow/README.md
##########
@@ -96,6 +96,71 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the [primary cause of undefined behavior](https://doc.rust-lang.org/reference/behavior-considered-undefined.html) in a program written in Rust.
+For two real world examples of where `unsafe` has consumed time in the past in this project see [#8545](https://github.com/apache/arrow/pull/8645) and [8829](https://github.com/apache/arrow/pull/8829)
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` should only be used when a `safe` counterpart is not available and there is no `safe` way to achieve additional performance in that area. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance (e.g. omit bounds checks, use of pointers to avoid bound checks)
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+#### Interpreting bytes
+
+The arrow format is specified in bytes (`u8`), which can be logically represented as certain types
+depending on the DataType.

Review comment:
       ```suggestion
   depending on the `DataType`.
   ```

##########
File path: rust/arrow/README.md
##########
@@ -96,6 +96,71 @@ Arrow uses the following features:
 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
 
+## Guidelines in usage of `unsafe`
+
+[`unsafe`](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html) has a high maintenance cost because debugging and testing it is difficult, time consuming, often requires external tools (e.g. `valgrind`), and requires a higher-than-usual attention to details. Undefined behavior is particularly difficult to identify and test, and usage of `unsafe` is the [primary cause of undefined behavior](https://doc.rust-lang.org/reference/behavior-considered-undefined.html) in a program written in Rust.
+For two real world examples of where `unsafe` has consumed time in the past in this project see [#8545](https://github.com/apache/arrow/pull/8645) and [8829](https://github.com/apache/arrow/pull/8829)
+This crate only accepts the usage of `unsafe` code upon careful consideration, and strives to avoid it to the largest possible extent.
+
+### When can `unsafe` be used?
+
+Generally, `unsafe` should only be used when a `safe` counterpart is not available and there is no `safe` way to achieve additional performance in that area. The following is a summary of the current components of the crate that require `unsafe`:
+
+* alloc, dealloc and realloc of buffers along cache lines
+* Interpreting bytes as certain rust types, for access, representation and compute
+* Foreign interfaces (C data interface)
+* Inter-process communication (IPC)
+* SIMD
+* Performance (e.g. omit bounds checks, use of pointers to avoid bound checks)
+
+#### cache-line aligned memory management
+
+The arrow format recommends storing buffers aligned with cache lines, and this crate adopts this behavior.
+However, Rust's global allocator does not allocates memory aligned with cache-lines. As such, many of the low-level operations related to memory management require `unsafe`.
+
+#### Interpreting bytes
+
+The arrow format is specified in bytes (`u8`), which can be logically represented as certain types
+depending on the DataType.
+For many operations, such as access, representation, numerical computation and string manipulation,
+it is often necessary to interpret bytes as other physical types (e.g. `i32`).
+
+Usage of `unsafe` for the purpose of interpreting bytes in their corresponding type (according to the arrow specification) is allowed. Specifically, the pointer to the byte slice must be aligned to the type that it intends to represent and the length of the slice is a multiple of the size of the target type of the transmutation.
+
+#### FFI
+
+The arrow format declares an ABI for zero-copy from and to libraries that implement the specification
+(foreign interfaces). In Rust, receiving and sending pointers via FFI requires usage of `unsafe` due to 
+the impossibility of the compiler to derive the invariants (such as lifetime, null pointers, and pointer alignment), from the source code alone as they are part of the FFI contract.

Review comment:
       ```suggestion
   the impossibility of the compiler to derive the invariants (such as lifetime, null pointers, and pointer alignment) from the source code alone as they are part of the FFI contract.
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-743803119


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=h1) Report
   > Merging [#8901](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=desc) (58d9327) into [master](https://codecov.io/gh/apache/arrow/commit/2816f37ff01cfd31101b3ee6cc8574cc9246dd1b?el=desc) (2816f37) will **decrease** coverage by `0.18%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8901/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8901      +/-   ##
   ==========================================
   - Coverage   76.99%   76.81%   -0.19%     
   ==========================================
     Files         174      181       +7     
     Lines       40392    40987     +595     
   ==========================================
   + Hits        31099    31483     +384     
   - Misses       9293     9504     +211     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=tree) | Coverage Ξ” | |
   |---|---|---|
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/datasource/memory.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL21lbW9yeS5ycw==) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vYnVpbGRlci5ycw==) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/logical\_plan/dfschema.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZGZzY2hlbWEucnM=) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `0.00% <0.00%> (ΓΈ)` | |
   | [...ntegration-testing/src/bin/arrow-stream-to-file.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9pbnRlZ3JhdGlvbi10ZXN0aW5nL3NyYy9iaW4vYXJyb3ctc3RyZWFtLXRvLWZpbGUucnM=) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/parquet\_derive/src/lib.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0X2Rlcml2ZS9zcmMvbGliLnJz) | `0.00% <0.00%> (ΓΈ)` | |
   | [...ntegration-testing/src/bin/arrow-file-to-stream.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9pbnRlZ3JhdGlvbi10ZXN0aW5nL3NyYy9iaW4vYXJyb3ctZmlsZS10by1zdHJlYW0ucnM=) | `0.00% <0.00%> (ΓΈ)` | |
   | ... and [9 more](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Ξ” = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=footer). Last update [2816f37...58d9327](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-745379959


   I saw a tweet yesterday from @timClicks showing an example of how the imageproc project documents their unsafe code and I really like it. I think we should do something similiar. Maybe we could add this to the guidelines @jorgecarleitao ?
   
   ```rust
               // JUSTIFICATION
               //  Benefit
               //      Using checked indexing here makes bench_integral_image_rgb take 1.05x as long
               //      (The results are noisy, but this seems to be reproducible. I've not checked the generated assembly.)
               //  Correctness
               //      x and y are within bounds by definition of in_width and in_height
               let input = unsafe { image.unsafe_get_pixel(x, y) };
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-744626848


   Some mistake on my part. Fixed


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-747047037


   I think this PR has enough approvals / πŸ‘  so I merged it in. We can continue to iterate on it in the future but this is a great base to work from. Thank you @jorgecarleitao and everyone else who contributed. A true team effort πŸ‘ 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-743803119


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=h1) Report
   > Merging [#8901](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=desc) (ff35ce8) into [master](https://codecov.io/gh/apache/arrow/commit/5353c285c6dfb3381ac0f1c9e7cd63d7fcb8da4a?el=desc) (5353c28) will **increase** coverage by `7.78%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8901/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8901      +/-   ##
   ==========================================
   + Coverage   75.48%   83.26%   +7.78%     
   ==========================================
     Files         181      195      +14     
     Lines       41649    48066    +6417     
   ==========================================
   + Hits        31439    40023    +8584     
   + Misses      10210     8043    -2167     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=tree) | Coverage Ξ” | |
   |---|---|---|
   | [rust/parquet/src/data\_type.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9kYXRhX3R5cGUucnM=) | `78.72% <0.00%> (-5.89%)` | :arrow_down: |
   | [rust/parquet/src/util/memory.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy91dGlsL21lbW9yeS5ycw==) | `89.57% <0.00%> (-1.04%)` | :arrow_down: |
   | [rust/parquet/src/file/statistics.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9maWxlL3N0YXRpc3RpY3MucnM=) | `93.80% <0.00%> (-0.60%)` | :arrow_down: |
   | [rust/arrow/src/datatypes.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvZGF0YXR5cGVzLnJz) | `76.45% <0.00%> (-0.41%)` | :arrow_down: |
   | [rust/parquet/src/encodings/rle.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvcmxlLnJz) | `92.68% <0.00%> (-0.35%)` | :arrow_down: |
   | [rust/parquet/src/schema/parser.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9zY2hlbWEvcGFyc2VyLnJz) | `89.86% <0.00%> (-0.34%)` | :arrow_down: |
   | [rust/arrow/src/ffi.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvZmZpLnJz) | `70.28% <0.00%> (-0.34%)` | :arrow_down: |
   | [rust/parquet/src/record/api.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9yZWNvcmQvYXBpLnJz) | `98.01% <0.00%> (-0.14%)` | :arrow_down: |
   | [rust/arrow/src/json/reader.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvanNvbi9yZWFkZXIucnM=) | `83.09% <0.00%> (-0.09%)` | :arrow_down: |
   | [rust/arrow/src/csv/writer.rs](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY3N2L3dyaXRlci5ycw==) | `78.82% <0.00%> (ΓΈ)` | |
   | ... and [98 more](https://codecov.io/gh/apache/arrow/pull/8901/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Ξ” = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=footer). Last update [48fee66...ff35ce8](https://codecov.io/gh/apache/arrow/pull/8901?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-743806817


   It would be really cool to have a GitHub Action that detected `unsafe` in a PR and auto-posted a link to this documentation, as well as tagging the PR with an `unsafe` label. I do not know how feasible that is though.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8901: ARROW-10889: [Rust] [Proposal] Add guidelines about usage of `unsafe`

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8901:
URL: https://github.com/apache/arrow/pull/8901#issuecomment-744624330


   @jorgecarleitao  I am not sure what has happened to this PR but it now seemingly includes many changes that are unrelated to the readme improvements:
   <img width="1384" alt="Screen Shot 2020-12-14 at 1 24 21 PM" src="https://user-images.githubusercontent.com/490673/102119836-bc543980-3e0f-11eb-93dd-6a8d84c215ed.png">
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org