You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/22 07:18:05 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #9291: ARROW-11345: Made most ops not rely on `value(i)`

jorgecarleitao opened a new pull request #9291:
URL: https://github.com/apache/arrow/pull/9291


   # Problem
   
   Currently, a lot of our code relies on `PrimitiveArray::value(usize)`, however, that method is highly `unsafe` as any `usize` larger than the length of the array allows to read arbitrary memory regions due to the lack of protections.
   
   # This PR:
   
   This PR changes many of our code to not rely on it for this operation, replacing by a safe alternative. This PR is expected to improve the performance of the touched kernels as we already have alternatives to efficiently create an array out of an iterator.
   
   todo: benchmark


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9291: ARROW-11345: Made most ops not rely on `value(i)`

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9291:
URL: https://github.com/apache/arrow/pull/9291#discussion_r562432512



##########
File path: rust/arrow/src/array/ord.rs
##########
@@ -66,7 +68,9 @@ where
 {
     let left = left.as_any().downcast_ref::<PrimitiveArray<T>>().unwrap();
     let right = right.as_any().downcast_ref::<PrimitiveArray<T>>().unwrap();
-    Box::new(move |i, j| cmp_nans_last(&left.value(i), &right.value(j)))
+    let left = left.values();
+    let right = right.values();
+    Box::new(move |i, j| cmp_nans_last(&left[i], &right[j]))

Review comment:
       this will be slower, but at this point we have no guarantee that `i` and `j` are within bounds.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9291: ARROW-11345: Made most ops not rely on `value(i)`

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9291:
URL: https://github.com/apache/arrow/pull/9291#issuecomment-765189910


   https://issues.apache.org/jira/browse/ARROW-11345


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9291: ARROW-11345: [Rust] Made most ops not rely on `value(i)`

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9291:
URL: https://github.com/apache/arrow/pull/9291#issuecomment-765203292


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=h1) Report
   > Merging [#9291](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=desc) (032b2ba) into [master](https://codecov.io/gh/apache/arrow/commit/437c8c944acb3479b76804f041f5f8cbce309fa7?el=desc) (437c8c9) will **increase** coverage by `0.00%`.
   > The diff coverage is `97.67%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9291/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #9291   +/-   ##
   =======================================
     Coverage   81.88%   81.89%           
   =======================================
     Files         215      215           
     Lines       52988    52972   -16     
   =======================================
   - Hits        43391    43379   -12     
   + Misses       9597     9593    -4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/equal\_json.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvZXF1YWxfanNvbi5ycw==) | `92.16% <85.71%> (-0.26%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.48% <100.00%> (ø)` | |
   | [rust/arrow/src/array/ord.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvb3JkLnJz) | `62.50% <100.00%> (+1.07%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `97.20% <100.00%> (+0.21%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=footer). Last update [437c8c9...032b2ba](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9291: ARROW-11345: Made most ops not rely on `value(i)`

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9291:
URL: https://github.com/apache/arrow/pull/9291#issuecomment-765203292


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=h1) Report
   > Merging [#9291](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=desc) (beacc5d) into [master](https://codecov.io/gh/apache/arrow/commit/c413566b34bd0c13a01a68148bd78df1bdec3c10?el=desc) (c413566) will **increase** coverage by `0.12%`.
   > The diff coverage is `97.67%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9291/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9291      +/-   ##
   ==========================================
   + Coverage   81.61%   81.74%   +0.12%     
   ==========================================
     Files         215      215              
     Lines       52508    52486      -22     
   ==========================================
   + Hits        42854    42903      +49     
   + Misses       9654     9583      -71     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/equal\_json.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvZXF1YWxfanNvbi5ycw==) | `92.16% <85.71%> (-0.26%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.52% <100.00%> (ø)` | |
   | [rust/arrow/src/array/ord.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvb3JkLnJz) | `62.50% <100.00%> (+1.07%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `97.20% <100.00%> (+0.21%)` | :arrow_up: |
   | [rust/arrow/src/array/data.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvZGF0YS5ycw==) | `78.86% <0.00%> (-18.45%)` | :arrow_down: |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.10% <0.00%> (-0.42%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_struct.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RydWN0LnJz) | `88.43% <0.00%> (ø)` | |
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `74.44% <0.00%> (+0.07%)` | :arrow_up: |
   | [rust/arrow/src/record\_batch.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvcmVjb3JkX2JhdGNoLnJz) | `84.32% <0.00%> (+0.35%)` | :arrow_up: |
   | [rust/arrow/src/ipc/convert.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2NvbnZlcnQucnM=) | `92.56% <0.00%> (+0.45%)` | :arrow_up: |
   | ... and [8 more](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=footer). Last update [c413566...beacc5d](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9291: ARROW-11345: [Rust] Made most ops not rely on `value(i)`

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9291:
URL: https://github.com/apache/arrow/pull/9291#issuecomment-809733266


   @jorgecarleitao  is this PR something that you plan to clean up and merge? Or should we close this PR as you have switched to working on arrow 2?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9291: ARROW-11345: Made most ops not rely on `value(i)`

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9291:
URL: https://github.com/apache/arrow/pull/9291#issuecomment-765190026


   cc @tyrelr .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #9291: ARROW-11345: [Rust] Made most ops not rely on `value(i)`

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9291:
URL: https://github.com/apache/arrow/pull/9291


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #9291: ARROW-11345: [Rust] Made most ops not rely on `value(i)`

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #9291:
URL: https://github.com/apache/arrow/pull/9291#discussion_r567897431



##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -569,29 +569,27 @@ pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> {
         (Time64(_), Int64) => cast_array_data::<Int64Type>(array, to_type.clone()),
         (Date32(DateUnit::Day), Date64(DateUnit::Millisecond)) => {
             let date_array = array.as_any().downcast_ref::<Date32Array>().unwrap();
-            let mut b = Date64Builder::new(array.len());
-            for i in 0..array.len() {
-                if array.is_null(i) {
-                    b.append_null()?;
-                } else {
-                    b.append_value(date_array.value(i) as i64 * MILLISECONDS_IN_DAY)?;
-                }
-            }
 
-            Ok(Arc::new(b.finish()) as ArrayRef)
+            // todo: can be optimized by computing this for the whole values buffer and

Review comment:
       we can open a JIRA to track this. I think 'unary' can also work (or was this before the cast improvement?)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #9291: ARROW-11345: [Rust] Made most ops not rely on `value(i)`

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #9291:
URL: https://github.com/apache/arrow/pull/9291#discussion_r567897431



##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -569,29 +569,27 @@ pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> {
         (Time64(_), Int64) => cast_array_data::<Int64Type>(array, to_type.clone()),
         (Date32(DateUnit::Day), Date64(DateUnit::Millisecond)) => {
             let date_array = array.as_any().downcast_ref::<Date32Array>().unwrap();
-            let mut b = Date64Builder::new(array.len());
-            for i in 0..array.len() {
-                if array.is_null(i) {
-                    b.append_null()?;
-                } else {
-                    b.append_value(date_array.value(i) as i64 * MILLISECONDS_IN_DAY)?;
-                }
-            }
 
-            Ok(Arc::new(b.finish()) as ArrayRef)
+            // todo: can be optimized by computing this for the whole values buffer and

Review comment:
       we can open a JIRA to track this. I think 'unary' can also work (or was this before the cast improvement?)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9291: ARROW-11345: [Rust] Made most ops not rely on `value(i)`

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9291:
URL: https://github.com/apache/arrow/pull/9291#issuecomment-765203292


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=h1) Report
   > Merging [#9291](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=desc) (032b2ba) into [master](https://codecov.io/gh/apache/arrow/commit/437c8c944acb3479b76804f041f5f8cbce309fa7?el=desc) (437c8c9) will **increase** coverage by `0.00%`.
   > The diff coverage is `97.67%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9291/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #9291   +/-   ##
   =======================================
     Coverage   81.88%   81.89%           
   =======================================
     Files         215      215           
     Lines       52988    52972   -16     
   =======================================
   - Hits        43391    43379   -12     
   + Misses       9597     9593    -4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/equal\_json.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvZXF1YWxfanNvbi5ycw==) | `92.16% <85.71%> (-0.26%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.48% <100.00%> (ø)` | |
   | [rust/arrow/src/array/ord.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvb3JkLnJz) | `62.50% <100.00%> (+1.07%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `97.20% <100.00%> (+0.21%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9291/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=footer). Last update [437c8c9...032b2ba](https://codecov.io/gh/apache/arrow/pull/9291?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org