You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/16 06:13:38 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #9212: ARROW-11265: [Rust] Made bool not ArrowNativeType

jorgecarleitao opened a new pull request #9212:
URL: https://github.com/apache/arrow/pull/9212


   This PR removes any risk of boolean values to be converted to bytes via `ToByteSlice` by explicitly making `ArrowNativeType` be only used in types whose in-memory representation in Rust equates to the in-memory representation in Arrow. `bool` in Rust is a byte and in Arrow it is a bit.
   
   Overall, the direction of this PR is to have the traits represent one aspect of the type. In this case, `ArrowNativeType` is currently 
   * a type that has the same in memory representation (ToByteSlice is implemented for it)
   * a json serializable type
   * something that can be casted to/from `usize`.
   
   This poses a problem because:
   
   1. bools are serializable, not castable to usize, have different memory representation
   2. fixed size (iX, uX) are serializable, castable to usize, have the same memory representation
   3. fixed floating (f32, f64) are serializable, not castable to usize, have the same memory representation
   
   however, they all implement `ArrowNativeType`.
   
   This PR focus on splitting the json-serializable part of it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9212: ARROW-11265: [Rust] Made bool not ArrowNativeType

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9212:
URL: https://github.com/apache/arrow/pull/9212#issuecomment-761773213


   I apologize for the delay in merging Rust PRs -- the 3.0 release is being finalized now and are planning to minimize entropy by postponing merging  changes not critical for the release until the process was complete. I hope the process is complete in the next few days. There is more [discussion](https://lists.apache.org/list.html?dev@arrow.apache.org) in the mailing list 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on pull request #9212: ARROW-11265: [Rust] Made bool not ArrowNativeType

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on pull request #9212:
URL: https://github.com/apache/arrow/pull/9212#issuecomment-761894775


   I did not see this, but had the same idea at around the same time :) Another motivation for this change is that this could make `Buffer::typed_data` safe and allow us to remove the `num::Num` trait bound from that method.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9212: ARROW-11265: [Rust] Made bool not ArrowNativeType

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9212:
URL: https://github.com/apache/arrow/pull/9212#issuecomment-761516510


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9212?src=pr&el=h1) Report
   > Merging [#9212](https://codecov.io/gh/apache/arrow/pull/9212?src=pr&el=desc) (1cd33d0) into [master](https://codecov.io/gh/apache/arrow/commit/1393188e1aa1b3d59993ce7d4ade7f7ac8570959?el=desc) (1393188) will **increase** coverage by `0.00%`.
   > The diff coverage is `80.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9212/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9212?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #9212   +/-   ##
   =======================================
     Coverage   81.61%   81.61%           
   =======================================
     Files         215      215           
     Lines       51867    51866    -1     
   =======================================
     Hits        42329    42329           
   + Misses       9538     9537    -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9212?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/util/integration\_util.rs](https://codecov.io/gh/apache/arrow/pull/9212/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC9pbnRlZ3JhdGlvbl91dGlsLnJz) | `66.87% <0.00%> (+0.14%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/9212/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.56% <100.00%> (ø)` | |
   | [rust/arrow/src/datatypes.rs](https://codecov.io/gh/apache/arrow/pull/9212/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvZGF0YXR5cGVzLnJz) | `78.47% <100.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9212?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9212?src=pr&el=footer). Last update [eaa7b7a...1cd33d0](https://codecov.io/gh/apache/arrow/pull/9212?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9212: ARROW-11265: [Rust] Made bool not ArrowNativeType

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9212:
URL: https://github.com/apache/arrow/pull/9212#discussion_r559493254



##########
File path: rust/arrow/src/datatypes.rs
##########
@@ -199,11 +199,18 @@ pub struct Field {
     metadata: Option<BTreeMap<String, String>>,
 }
 
-pub trait ArrowNativeType:
-    fmt::Debug + Send + Sync + Copy + PartialOrd + FromStr + Default + 'static
-{
+/// Trait declaring any type that is serializable to JSON. This includes all primitive types (bool, i32, etc.).
+pub trait JsonSerializable: 'static {
     fn into_json_value(self) -> Option<Value>;
+}
 
+/// Trait expressing a Rust type that has the same in-memory representation
+/// as Arrow. This includes `i16`, `f32`, but excludes `bool` (which in arrow is represented in bits).
+/// In little endian machines, types that implement [`ArrowNativeType`] can be memcopied to arrow buffers
+/// as is.
+pub trait ArrowNativeType:
+    fmt::Debug + Send + Sync + Copy + PartialOrd + FromStr + Default + JsonSerializable

Review comment:
       This is a nice approach (breaking out the `JsonSerializable` part) 👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann edited a comment on pull request #9212: ARROW-11265: [Rust] Made bool not ArrowNativeType

Posted by GitBox <gi...@apache.org>.
jhorstmann edited a comment on pull request #9212:
URL: https://github.com/apache/arrow/pull/9212#issuecomment-761894775


   I did not see this, but had the same idea at around the same time :) This matches now also the distinction between PrimitiveArray and BooleanArray.
   
   Another motivation for this change is that this could make `Buffer::typed_data` safe and allow us to remove the `num::Num` trait bound from that method.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #9212: ARROW-11265: [Rust] Made bool not ArrowNativeType

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9212:
URL: https://github.com/apache/arrow/pull/9212


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9212: ARROW-11265: [Rust] Made bool not ArrowNativeType

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9212:
URL: https://github.com/apache/arrow/pull/9212#issuecomment-761498541


   https://issues.apache.org/jira/browse/ARROW-11265


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9212: ARROW-11265: [Rust] Made bool not ArrowNativeType

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9212:
URL: https://github.com/apache/arrow/pull/9212#issuecomment-763555968


   I merged this branch locally into master and re-ran the tests. Things looked good. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #9212: ARROW-11265: [Rust] Made bool not ArrowNativeType

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #9212:
URL: https://github.com/apache/arrow/pull/9212#discussion_r558882578



##########
File path: rust/arrow/src/util/integration_util.rs
##########
@@ -449,12 +449,10 @@ impl ArrowJsonBatch {
                     for i in 0..col.len() {
                         if col.is_null(i) {
                             validity.push(1);

Review comment:
       This is unrelated, but unless I'm reading this incorrectly (likely am), I would have thought that we'd `validity.push(0)` here for null values.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #9212: ARROW-11265: [Rust] Made bool not ArrowNativeType

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #9212:
URL: https://github.com/apache/arrow/pull/9212#discussion_r559251445



##########
File path: rust/arrow/src/datatypes.rs
##########
@@ -199,11 +199,18 @@ pub struct Field {
     metadata: Option<BTreeMap<String, String>>,
 }
 
-pub trait ArrowNativeType:
-    fmt::Debug + Send + Sync + Copy + PartialOrd + FromStr + Default + 'static
-{
+/// Trait declaring any type that is serializable to JSON. This includes all primitive types (bool, i32, etc.).
+pub trait JsonSerializable: 'static {
     fn into_json_value(self) -> Option<Value>;
+}
 
+/// Trait expressing a Rust type that has the same in-memory representation
+/// as Arrow. This includes `i16`, `f32`, but excludes `bool` (which in arrow is represented in bits).
+/// In little endian machines, types that implement [`ArrowNativeType`] can be memcopied to arrow buffers

Review comment:
       Does the in-memory format require little-endian? I'd expect conversion to happen when reading data into memory, which should make memcpy work on bot LE and BE machines. Assuming absence of other endianness bugs of course.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] paddyhoran commented on a change in pull request #9212: ARROW-11265: [Rust] Made bool not ArrowNativeType

Posted by GitBox <gi...@apache.org>.
paddyhoran commented on a change in pull request #9212:
URL: https://github.com/apache/arrow/pull/9212#discussion_r559286943



##########
File path: rust/arrow/src/datatypes.rs
##########
@@ -199,11 +199,18 @@ pub struct Field {
     metadata: Option<BTreeMap<String, String>>,
 }
 
-pub trait ArrowNativeType:
-    fmt::Debug + Send + Sync + Copy + PartialOrd + FromStr + Default + 'static
-{
+/// Trait declaring any type that is serializable to JSON. This includes all primitive types (bool, i32, etc.).
+pub trait JsonSerializable: 'static {
     fn into_json_value(self) -> Option<Value>;
+}
 
+/// Trait expressing a Rust type that has the same in-memory representation
+/// as Arrow. This includes `i16`, `f32`, but excludes `bool` (which in arrow is represented in bits).
+/// In little endian machines, types that implement [`ArrowNativeType`] can be memcopied to arrow buffers

Review comment:
       It does not require it but is [little-endian by default](https://arrow.apache.org/docs/format/Columnar.html#byte-order-endianness).
   
   I recall someone wanting to add big-endian support to the C++ impl.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org