You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/16 08:04:04 UTC

[GitHub] [arrow-rs] ritchie46 opened a new pull request #296: fix invalid null handling in filter

ritchie46 opened a new pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296


   # Which issue does this PR close?
   
   This fixes #295.
   
   This is a very simple solution to the problem and doesn't change any filtering behavior (though we may simplify the filtering if this PR is ok). In case of `null` values in a boolean mask I do a `mask` & `null_bitmask` operation and create a new `boolean` predicate that has no null values (all nulls are `false`), due to the AND operation.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#issuecomment-841826066


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#296](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (973a10c) into [master](https://codecov.io/gh/apache/arrow-rs/commit/3665296b4e985a840ad21815b839e5fa3fc462d5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3665296) will **increase** coverage by `0.02%`.
   > The diff coverage is `89.44%`.
   
   > :exclamation: Current head 973a10c differs from pull request most recent head 5fc668f. Consider uploading reports for the commit 5fc668f to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/296/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #296      +/-   ##
   ==========================================
   + Coverage   82.49%   82.52%   +0.02%     
   ==========================================
     Files         162      162              
     Lines       43980    44029      +49     
   ==========================================
   + Hits        36283    36336      +53     
   + Misses       7697     7693       -4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow-rs/pull/296/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2J1aWxkZXIucnM=) | `85.29% <ø> (ø)` | |
   | [arrow/src/ffi.rs](https://codecov.io/gh/apache/arrow-rs/pull/296/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2ZmaS5ycw==) | `82.61% <87.15%> (+1.38%)` | :arrow_up: |
   | [arrow/src/array/ffi.rs](https://codecov.io/gh/apache/arrow-rs/pull/296/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2ZmaS5ycw==) | `100.00% <100.00%> (+14.86%)` | :arrow_up: |
   | [arrow/src/compute/kernels/filter.rs](https://codecov.io/gh/apache/arrow-rs/pull/296/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9maWx0ZXIucnM=) | `92.98% <100.00%> (+0.58%)` | :arrow_up: |
   | [parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow-rs/pull/296/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZW5jb2RpbmdzL2VuY29kaW5nLnJz) | `94.85% <0.00%> (-0.20%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [3665296...5fc668f](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] Dandandan commented on pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#issuecomment-845228943


   Rerunning the CI as part of it showed red for some reason.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] Dandandan commented on pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#issuecomment-841785929


   This looks cool! Also if we can simplify / speed up filters this way, that would be very interesting, there is still quite some optimization potential in the filter kernel I believe.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] ritchie46 commented on a change in pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
ritchie46 commented on a change in pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#discussion_r633096611



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -221,6 +223,23 @@ pub fn build_filter(filter: &BooleanArray) -> Result<Filter> {
 /// # }
 /// ```
 pub fn filter(array: &Array, filter: &BooleanArray) -> Result<ArrayRef> {
+    if filter.null_count() > 0 {
+        // this greatly simplifies subsequent filtering code
+        // now we only have a boolean mask to deal with
+        let array_data = filter.data_ref();
+        let null_bitmap = array_data.null_buffer().unwrap();

Review comment:
       I used `buffer_bin_and`, I assume it does a "binary and" operation?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#issuecomment-843000835


   I checked out the branch to have a look with a bit more context. The logic looks good and makes this kernel a lot easier to use. The test looks good and should cover exactly this problem.
   
   One minor thing: There is a doc comment with a warning about null values above the `filter` function, which is now out of data.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] Dandandan commented on pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#issuecomment-843538046


   One small future improvement might be not creating a new array for filters, but computing / passing the changed buffer instead.
   
   But the removal of undefined behavior for this kernel is really good I would say :+1: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#issuecomment-842661510


   @jhorstmann  / @Dandandan  do you think this one is ready to go?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#issuecomment-841826066


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#296](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b51d484) into [master](https://codecov.io/gh/apache/arrow-rs/commit/3665296b4e985a840ad21815b839e5fa3fc462d5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3665296) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/296/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master     #296   +/-   ##
   =======================================
     Coverage   82.49%   82.50%           
   =======================================
     Files         162      162           
     Lines       43980    44002   +22     
   =======================================
   + Hits        36283    36305   +22     
     Misses       7697     7697           
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/compute/kernels/filter.rs](https://codecov.io/gh/apache/arrow-rs/pull/296/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9maWx0ZXIucnM=) | `92.98% <100.00%> (+0.58%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [3665296...b51d484](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] ritchie46 commented on a change in pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
ritchie46 commented on a change in pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#discussion_r633094815



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -221,6 +223,23 @@ pub fn build_filter(filter: &BooleanArray) -> Result<Filter> {
 /// # }
 /// ```
 pub fn filter(array: &Array, filter: &BooleanArray) -> Result<ArrayRef> {
+    if filter.null_count() > 0 {
+        // this greatly simplifies subsequent filtering code
+        // now we only have a boolean mask to deal with
+        let array_data = filter.data_ref();
+        let null_bitmap = array_data.null_buffer().unwrap();

Review comment:
       Good catch. 

##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -221,6 +223,23 @@ pub fn build_filter(filter: &BooleanArray) -> Result<Filter> {
 /// # }
 /// ```
 pub fn filter(array: &Array, filter: &BooleanArray) -> Result<ArrayRef> {
+    if filter.null_count() > 0 {
+        // this greatly simplifies subsequent filtering code
+        // now we only have a boolean mask to deal with
+        let array_data = filter.data_ref();
+        let null_bitmap = array_data.null_buffer().unwrap();

Review comment:
       I used `buffer_bin_and`, I assume it does a "binary and" operation?

##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -221,6 +223,23 @@ pub fn build_filter(filter: &BooleanArray) -> Result<Filter> {
 /// # }
 /// ```
 pub fn filter(array: &Array, filter: &BooleanArray) -> Result<ArrayRef> {
+    if filter.null_count() > 0 {
+        // this greatly simplifies subsequent filtering code
+        // now we only have a boolean mask to deal with
+        let array_data = filter.data_ref();
+        let null_bitmap = array_data.null_buffer().unwrap();

Review comment:
       I used `buffer_bin_and`, I assume it does a "binary and" operation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] Dandandan commented on a change in pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#discussion_r633061233



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -221,6 +223,23 @@ pub fn build_filter(filter: &BooleanArray) -> Result<Filter> {
 /// # }
 /// ```
 pub fn filter(array: &Array, filter: &BooleanArray) -> Result<ArrayRef> {
+    if filter.null_count() > 0 {
+        // this greatly simplifies subsequent filtering code
+        // now we only have a boolean mask to deal with
+        let array_data = filter.data_ref();
+        let null_bitmap = array_data.null_buffer().unwrap();
+        let mask = filter.values();
+        let new_mask = mask.bitand(&null_bitmap)?;
+
+        let array_data = ArrayData::builder(DataType::Boolean)
+            .len(filter.len())
+            .add_buffer(new_mask)
+            .build();
+        let filter = BooleanArray::from(array_data);
+        // fully syntax, because we have an argument with the same name

Review comment:
       ```suggestion
           // fully qualified syntax, because we have an argument with the same name
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#discussion_r633093885



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -221,6 +223,23 @@ pub fn build_filter(filter: &BooleanArray) -> Result<Filter> {
 /// # }
 /// ```
 pub fn filter(array: &Array, filter: &BooleanArray) -> Result<ArrayRef> {
+    if filter.null_count() > 0 {
+        // this greatly simplifies subsequent filtering code
+        // now we only have a boolean mask to deal with
+        let array_data = filter.data_ref();
+        let null_bitmap = array_data.null_buffer().unwrap();

Review comment:
       I think this needs to take the offset of the array into account, similar to https://github.com/apache/arrow-rs/blob/master/arrow/src/compute/util.rs#L45




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#issuecomment-841826066


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#296](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b51d484) into [master](https://codecov.io/gh/apache/arrow-rs/commit/3665296b4e985a840ad21815b839e5fa3fc462d5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3665296) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/296/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master     #296   +/-   ##
   =======================================
     Coverage   82.49%   82.50%           
   =======================================
     Files         162      162           
     Lines       43980    44002   +22     
   =======================================
   + Hits        36283    36305   +22     
     Misses       7697     7697           
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/compute/kernels/filter.rs](https://codecov.io/gh/apache/arrow-rs/pull/296/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9maWx0ZXIucnM=) | `92.98% <100.00%> (+0.58%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [3665296...b51d484](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] Dandandan commented on pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#issuecomment-841785929


   This looks cool! Also if we can simplify / speed up filters this way, that would be very interesting, there is still quite some optimization potential in the filter kernel I believe.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] ritchie46 commented on a change in pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
ritchie46 commented on a change in pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#discussion_r633094815



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -221,6 +223,23 @@ pub fn build_filter(filter: &BooleanArray) -> Result<Filter> {
 /// # }
 /// ```
 pub fn filter(array: &Array, filter: &BooleanArray) -> Result<ArrayRef> {
+    if filter.null_count() > 0 {
+        // this greatly simplifies subsequent filtering code
+        // now we only have a boolean mask to deal with
+        let array_data = filter.data_ref();
+        let null_bitmap = array_data.null_buffer().unwrap();

Review comment:
       Good catch. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#discussion_r634773487



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -581,4 +602,27 @@ mod tests {
         assert_eq!(chunks, vec![(1, 62), (63, 124), (125, 130)]);
         assert_eq!(filter_count, 61 + 61 + 5);
     }
+
+    #[test]
+    fn test_null_mask() -> Result<()> {
+        use crate::compute::kernels::comparison;
+        let a: PrimitiveArray<Int64Type> =
+            PrimitiveArray::from(vec![Some(1), Some(2), None]);
+        let mask0 = comparison::eq(&a, &a)?;
+        let out0 = filter(&a, &mask0)?;
+        let out_arr0 = out0
+            .as_any()
+            .downcast_ref::<PrimitiveArray<Int64Type>>()
+            .unwrap();
+
+        let mask1 = BooleanArray::from(vec![Some(true), Some(true), None]);
+        let out1 = filter(&a, &mask1)?;
+        let out_arr1 = out1
+            .as_any()
+            .downcast_ref::<PrimitiveArray<Int64Type>>()
+            .unwrap();
+        assert_eq!(mask0, mask1);
+        assert_eq!(out_arr0, out_arr1);

Review comment:
       This check and test makes sense to me (that the result of filtering using the output of `eq` should equal the output of filtering with a boolean mask that has nulls) 👍 
   
   I also ran this test with the change in this PR commented out and it failed (as expected) in this way:
   
   ```
   ---- compute::kernels::filter::tests::test_null_mask stdout ----
   thread 'compute::kernels::filter::tests::test_null_mask' panicked at 'assertion failed: `(left == right)`
     left: `PrimitiveArray<Int64>
   [
     1,
     2,
     null,
   ]`,
    right: `PrimitiveArray<Int64>
   [
     1,
     2,
   ]`', arrow/src/compute/kernels/filter.rs:606:9
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ```
   
   So 👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#discussion_r633093885



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -221,6 +223,23 @@ pub fn build_filter(filter: &BooleanArray) -> Result<Filter> {
 /// # }
 /// ```
 pub fn filter(array: &Array, filter: &BooleanArray) -> Result<ArrayRef> {
+    if filter.null_count() > 0 {
+        // this greatly simplifies subsequent filtering code
+        // now we only have a boolean mask to deal with
+        let array_data = filter.data_ref();
+        let null_bitmap = array_data.null_buffer().unwrap();

Review comment:
       I think this needs to take the offset of the array into account, similar to https://github.com/apache/arrow-rs/blob/master/arrow/src/compute/util.rs#L45




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] Dandandan merged pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
Dandandan merged pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] ritchie46 commented on a change in pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
ritchie46 commented on a change in pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#discussion_r633096611



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -221,6 +223,23 @@ pub fn build_filter(filter: &BooleanArray) -> Result<Filter> {
 /// # }
 /// ```
 pub fn filter(array: &Array, filter: &BooleanArray) -> Result<ArrayRef> {
+    if filter.null_count() > 0 {
+        // this greatly simplifies subsequent filtering code
+        // now we only have a boolean mask to deal with
+        let array_data = filter.data_ref();
+        let null_bitmap = array_data.null_buffer().unwrap();

Review comment:
       I used `buffer_bin_and`, I assume it does a "binary and" operation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#issuecomment-841826066


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#296](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b51d484) into [master](https://codecov.io/gh/apache/arrow-rs/commit/3665296b4e985a840ad21815b839e5fa3fc462d5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3665296) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/296/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master     #296   +/-   ##
   =======================================
     Coverage   82.49%   82.50%           
   =======================================
     Files         162      162           
     Lines       43980    44002   +22     
   =======================================
   + Hits        36283    36305   +22     
     Misses       7697     7697           
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/compute/kernels/filter.rs](https://codecov.io/gh/apache/arrow-rs/pull/296/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9maWx0ZXIucnM=) | `92.98% <100.00%> (+0.58%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [3665296...b51d484](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] Dandandan commented on a change in pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#discussion_r633061233



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -221,6 +223,23 @@ pub fn build_filter(filter: &BooleanArray) -> Result<Filter> {
 /// # }
 /// ```
 pub fn filter(array: &Array, filter: &BooleanArray) -> Result<ArrayRef> {
+    if filter.null_count() > 0 {
+        // this greatly simplifies subsequent filtering code
+        // now we only have a boolean mask to deal with
+        let array_data = filter.data_ref();
+        let null_bitmap = array_data.null_buffer().unwrap();
+        let mask = filter.values();
+        let new_mask = mask.bitand(&null_bitmap)?;
+
+        let array_data = ArrayData::builder(DataType::Boolean)
+            .len(filter.len())
+            .add_buffer(new_mask)
+            .build();
+        let filter = BooleanArray::from(array_data);
+        // fully syntax, because we have an argument with the same name

Review comment:
       ```suggestion
           // fully qualified syntax, because we have an argument with the same name
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #296: fix invalid null handling in filter

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #296:
URL: https://github.com/apache/arrow-rs/pull/296#issuecomment-841826066


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#296](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b51d484) into [master](https://codecov.io/gh/apache/arrow-rs/commit/3665296b4e985a840ad21815b839e5fa3fc462d5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3665296) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/296/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master     #296   +/-   ##
   =======================================
     Coverage   82.49%   82.50%           
   =======================================
     Files         162      162           
     Lines       43980    44002   +22     
   =======================================
   + Hits        36283    36305   +22     
     Misses       7697     7697           
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/compute/kernels/filter.rs](https://codecov.io/gh/apache/arrow-rs/pull/296/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9maWx0ZXIucnM=) | `92.98% <100.00%> (+0.58%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [3665296...b51d484](https://codecov.io/gh/apache/arrow-rs/pull/296?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org