You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/17 00:26:33 UTC

[GitHub] [arrow] velvia opened a new pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

velvia opened a new pull request #8688:
URL: https://github.com/apache/arrow/pull/8688


   This PR implements the NULLIF() SQL function in DataFusion.  It is implemented as a BuiltInScalarFunction, with a boolean kernel at the core which creates a new array with a modified null bitmap from the original array, based on the result of a boolean expression.   When an input data item is equal to the right side in NULLIF(), then the item's nullity becomes set in the output array.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r528303267



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       You could probably avoid the special case in the first option by always using `bit_slice` on the data buffers, but calculating the offset based on `ArrowPrimitiveType::get_bit_width`. That and using `bit_slice` for the boolean and null bitmaps looks like the cleanest solution to me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525618283



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +516,20 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    #[test]
+    fn test_nullif_int_array() {
+        let a = Int32Array::from(vec![Some(15), None, Some(8), Some(1), Some(9)]);
+        let comp =
+            BooleanArray::from(vec![Some(false), None, Some(true), Some(false), None]);
+        let res = nullif(&a, &comp).unwrap();
+
+        assert_eq!(15, res.value(0));
+        assert_eq!(true, res.is_null(1));
+        assert_eq!(true, res.is_null(2)); // comp true, slot 2 turned into null
+        assert_eq!(1, res.value(3));
+        // Even though comp array / right is null, should still pass through original value
+        assert_eq!(9, res.value(4));
+        assert_eq!(false, res.is_null(4)); // comp true, slot 2 turned into null

Review comment:
       So looks like the problem with the above is somehow the equality test fails:
   
   ```
   thread 'compute::kernels::boolean::tests::test_nullif_int_array' panicked at 'assertion failed: `(left == right)`
     left: `PrimitiveArray<Int32>
   [
     15,
     null,
     null,
     1,
     9,
   ]`,
    right: `PrimitiveArray<Int32>
   [
     15,
     null,
     null,
     1,
     9,
   ]`', arrow/src/compute/kernels/boolean.rs:537:9
   ```
   
   My guess is that the equality test compares the data() elements, and the data buffer elements for the null ones are not identical, even though it should not matter (null elements should just check the null bitmap).
   
   Some possibilities:
   - Create a different equality method, maybe a test Trait like `SemanticallyEqual` or something
   - revert back to the old way of testing these arrays
   ???
   @jorgecarleitao 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526473704



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       An offset is just a shift of where we consider reading a buffer. The easiest way to motivate it: suppose that we want to slice a primitive array of 100 elements over the range `[25..75]`. The expensive way of doing this is a `memcopy` of 50 elements. The cheap way is to declare a new array with len=50 elements starting at 25 and increase the refcount of buffer (`Arc::clone` in Rust).
   
   The "starting at 25" is an offset of 25 "slots", or `25 * size_of::<T>()` bytes for a specific type T.
   
   For bit buffers (nullability), the offset is measured in bits, not bytes. Feel free to ping e.g. @vertexclique, @alamb or @jhorstmann , that lately have been working (a lot) on bits - they likely know exactly the API you need for this. :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r530547648



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -223,6 +224,101 @@ pub fn is_not_null(input: &Array) -> Result<BooleanArray> {
     Ok(BooleanArray::from(Arc::new(data)))
 }
 
+/// Copies original array, setting null bit to true if a secondary comparison boolean array is set to true.
+/// Typically used to implement NULLIF.
+// NOTE: For now this only supports Primitive Arrays.  Although the code could be made generic, the issue
+// is that currently the bitmap operations result in a final bitmap which is aligned to bit 0, and thus
+// the left array's data needs to be sliced to a new offset, and for non-primitive arrays shifting the
+// data might be too complicated.   In the future, to avoid shifting left array's data, we could instead
+// shift the final bitbuffer to the right, prepending with 0's instead.
+pub fn nullif<T>(
+    left: &PrimitiveArray<T>,
+    right: &BooleanArray,
+) -> Result<PrimitiveArray<T>>
+where
+    T: ArrowNumericType,
+{
+    if left.len() != right.len() {
+        return Err(ArrowError::ComputeError(
+            "Cannot perform comparison operation on arrays of different length"
+                .to_string(),
+        ));
+    }
+    let left_data = left.data();
+    let right_data = right.data();
+
+    // If left has no bitmap, create a new one with all values set for nullity op later
+    // left=0 (null)   right=null       output bitmap=null
+    // left=0          right=1          output bitmap=null
+    // left=1 (set)    right=null       output bitmap=set   (passthrough)
+    // left=1          right=1 & comp=true    output bitmap=null
+    // left=1          right=1 & comp=false   output bitmap=set
+    //
+    // Thus: result = left null bitmap & (!right_values | !right_bitmap)
+    //              OR left null bitmap & !(right_values & right_bitmap)
+    //
+    // Do the right expression !(right_values & right_bitmap) first since there are two steps
+    // TRICK: convert BooleanArray buffer as a bitmap for faster operation
+    let right_combo_buffer = match right.data().null_bitmap() {
+        Some(right_bitmap) => {
+            // NOTE: right values and bitmaps are combined and stay at bit offset right.offset()
+            (&right.values() & &right_bitmap.bits).ok().map(|b| b.not())
+        }
+        None => Some(!&right.values()),
+    };
+
+    // AND of original left null bitmap with right expression
+    // Here we take care of the possible offsets of the left and right arrays all at once.
+    let modified_null_buffer = match left_data.null_bitmap() {
+        Some(left_null_bitmap) => match right_combo_buffer {
+            Some(rcb) => Some(buffer_bin_and(
+                &left_null_bitmap.bits,
+                left_data.offset(),
+                &rcb,
+                right_data.offset(),
+                left_data.len(),
+            )),
+            None => Some(
+                left_null_bitmap
+                    .bits
+                    .bit_slice(left_data.offset(), left.len()),
+            ),
+        },
+        None => right_combo_buffer,

Review comment:
       Ah yes, thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8688:
URL: https://github.com/apache/arrow/pull/8688


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r530590352



##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -508,6 +508,26 @@ async fn csv_query_avg_multi_batch() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn csv_query_nullif_divide_by_0() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx)?;
+    let sql = "SELECT c8/nullif(c7, 0) FROM aggregate_test_100";
+    let actual: Vec<_> = execute(&mut ctx, sql)
+        .await
+        .iter()
+        .map(|x| x[0].clone())
+        .collect();
+    let actual = actual.join("\n");
+    let expected = "1722\n92\n46\n679\n165\n146\n149\n93\n2211\n6495\n307\n139\n253\n123\n21\n84\n98\n13\n230\n\
+       277\n1\n986\n414\n144\n210\n0\n172\n165\n25\n97\n335\n558\n350\n369\n511\n245\n345\n8\n139\n55\n318\n2614\n\
+       1792\n16\n345\n123\n176\n1171\n20\n199\n147\n115\n335\n23\n847\n94\n315\n391\n176\n282\n459\n197\n978\n281\n\
+       27\n26\n281\n8124\n3\n430\n510\n61\n67\n17\n1601\n362\n202\n50\n10\n346\n258\n664\nNULL\n22\n164\n448\n365\n\
+       1640\n671\n203\n2087\n10060\n1015\n913\n9840\n16\n496\n264\n38\n1";
+    assert_eq!(expected, actual);
+    Ok(())
+}

Review comment:
       Cleaned up.   :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r530546823



##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -508,6 +508,26 @@ async fn csv_query_avg_multi_batch() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn csv_query_nullif_divide_by_0() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx)?;
+    let sql = "SELECT c8/nullif(c7, 0) FROM aggregate_test_100";
+    let actual: Vec<_> = execute(&mut ctx, sql)
+        .await
+        .iter()
+        .map(|x| x[0].clone())
+        .collect();
+    let actual = actual.join("\n");
+    let expected = "1722\n92\n46\n679\n165\n146\n149\n93\n2211\n6495\n307\n139\n253\n123\n21\n84\n98\n13\n230\n\
+       277\n1\n986\n414\n144\n210\n0\n172\n165\n25\n97\n335\n558\n350\n369\n511\n245\n345\n8\n139\n55\n318\n2614\n\
+       1792\n16\n345\n123\n176\n1171\n20\n199\n147\n115\n335\n23\n847\n94\n315\n391\n176\n282\n459\n197\n978\n281\n\
+       27\n26\n281\n8124\n3\n430\n510\n61\n67\n17\n1601\n362\n202\n50\n10\n346\n258\n664\nNULL\n22\n164\n448\n365\n\
+       1640\n671\n203\n2087\n10060\n1015\n913\n9840\n16\n496\n264\n38\n1";
+    assert_eq!(expected, actual);
+    Ok(())
+}

Review comment:
       Ok, let me see if I can improve that test or upload new data (like a subset of the original)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r528249460



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       @velvia In the current version you could use the `bitwise_bin_op_helper`from `buffer.rs` if you make that public. That calls your bitwise operation with two `u64` as an input and takes care of the offset alignment.
   
   Could you clarify what you mean by needing to "right shift"? The helper function takes the offset in bits so that should not be needed.
   
   In our internal codebase we have similar functions also for 3 or 4 inputs, which would come in handy for you usecase, since you could calculate the result without the temporary `right_combo_buffer`. So you could also think about copying that function and modifying to take a function of 3 parameters instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525610462



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -149,6 +150,64 @@ pub fn is_not_null(input: &ArrayRef) -> Result<BooleanArray> {
     Ok(BooleanArray::from(Arc::new(data)))
 }
 
+/// Copies original array, setting null bit to true if a secondary comparison boolean array is set to true.
+/// Typically used to implement NULLIF.
+pub fn nullif<T>(
+    left: &PrimitiveArray<T>,

Review comment:
       Hmmm, I'm trying this, but the return type becomes more complicated....    and as you pointed out, it doesn't help in the end. :-p




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r528409836



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       > What is missing to construct the final nullif is the above combined bitmap but at offset 2, so it can be aligned with the original left data. I need to insert two bits in the LSB of the resulting bitmap so it is aligned with the original data. AFAICT there isn't anything that can do this in the current code.
   
   That's why we need to have bit-slice iterator views on the code, rather than copying slices like suggested before. Bit domain view exposed in the PR that I created can give you what you need by force aligning the msb. You can reuse the code everywhere, without `actually` copying anything.
   
   That was one of the ideas why I introduced that PR here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526498177



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       I don't see an API to shift bits forward, which I think would need to make this work....  (unless we make it work only when both offsets are the same, in which case it doesn't matter anyways)
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r524889515



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -149,6 +150,64 @@ pub fn is_not_null(input: &ArrayRef) -> Result<BooleanArray> {
     Ok(BooleanArray::from(Arc::new(data)))
 }
 
+/// Copies original array, setting null bit to true if a secondary comparison boolean array is set to true.
+/// Typically used to implement NULLIF.
+pub fn nullif<T>(
+    left: &PrimitiveArray<T>,

Review comment:
       If I understood correctly, because we only use `left.data()`, if we replace `PrimitiveArray<T>` by `impl Array` and remove the generic `T`, this is generalized for all data types. :)
   
   Note that this will not address this in DataFusion, as it seems that we can only compute comparison boolean arrays from primitives atm.

##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -2604,6 +2678,61 @@ mod tests {
         )
     }
 
+    #[test]
+    #[rustfmt::skip]
+    fn nullif_int32() -> Result<()> {
+        let schema = Schema::new(vec![Field::new("a", DataType::Int32, false)]);
+        let a = Int32Array::from(vec![Some(1), Some(2), None, None, Some(3), None, None, Some(4), Some(5)]);
+        let a = Arc::new(a);
+        let a_len = a.len();
+        let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![a.clone()])?;
+
+        let literal_expr = lit(ScalarValue::Int32(Some(2)));
+        let lit_array = literal_expr.evaluate(&batch)?;
+
+        let result = nullif_func(&[a.clone(), lit_array])?;
+
+        // Results should be: Some(1), None, None, None, Some(3), None
+        assert_eq!(result.len(), a_len);
+
+        let result = result
+            .as_any()
+            .downcast_ref::<Int32Array>()
+            .expect("failed to downcast to Int32Array");
+
+        assert_eq!(1, result.value(0));
+        assert_eq!(true, result.is_null(1));    // 2==2, slot 1 turned into null
+        assert_eq!(true, result.is_null(2));
+        assert_eq!(true, result.is_null(3));
+        assert_eq!(3, result.value(4));
+        assert_eq!(true, result.is_null(5));
+        assert_eq!(true, result.is_null(6));
+        assert_eq!(5, result.value(8));
+        Ok(())

Review comment:
       Same here: easier to debug if there is a single `assert_eq`, as we can see the whole array in the message.

##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -2604,6 +2678,61 @@ mod tests {
         )
     }
 
+    #[test]
+    #[rustfmt::skip]

Review comment:
       why `skip`?

##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -2604,6 +2678,61 @@ mod tests {
         )
     }
 
+    #[test]
+    #[rustfmt::skip]
+    fn nullif_int32() -> Result<()> {
+        let schema = Schema::new(vec![Field::new("a", DataType::Int32, false)]);
+        let a = Int32Array::from(vec![Some(1), Some(2), None, None, Some(3), None, None, Some(4), Some(5)]);
+        let a = Arc::new(a);
+        let a_len = a.len();
+        let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![a.clone()])?;
+
+        let literal_expr = lit(ScalarValue::Int32(Some(2)));
+        let lit_array = literal_expr.evaluate(&batch)?;
+
+        let result = nullif_func(&[a.clone(), lit_array])?;
+
+        // Results should be: Some(1), None, None, None, Some(3), None
+        assert_eq!(result.len(), a_len);
+
+        let result = result
+            .as_any()
+            .downcast_ref::<Int32Array>()
+            .expect("failed to downcast to Int32Array");
+
+        assert_eq!(1, result.value(0));
+        assert_eq!(true, result.is_null(1));    // 2==2, slot 1 turned into null
+        assert_eq!(true, result.is_null(2));
+        assert_eq!(true, result.is_null(3));
+        assert_eq!(3, result.value(4));
+        assert_eq!(true, result.is_null(5));
+        assert_eq!(true, result.is_null(6));
+        assert_eq!(5, result.value(8));
+        Ok(())
+    }
+
+    #[test]
+    #[rustfmt::skip]
+    // Ensure that arrays with no nulls can also invoke NULLIF() correctly
+    fn nullif_int32_nonulls() -> Result<()> {
+        let schema = Schema::new(vec![Field::new("a", DataType::Int32, false)]);
+        let a = Int32Array::from(vec![1, 3, 10, 7, 8, 1, 2, 4, 5]);
+        let a = Arc::new(a);
+        let a_len = a.len();
+        let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![a.clone()])?;
+
+        let literal_expr = lit(ScalarValue::Int32(Some(1)));
+        let lit_array = literal_expr.evaluate(&batch)?;

Review comment:
       I think that this can be simplified to `let lit_array = Int32Array::from(vec![1; a.len()]);`.

##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -2604,6 +2678,61 @@ mod tests {
         )
     }
 
+    #[test]
+    #[rustfmt::skip]
+    fn nullif_int32() -> Result<()> {
+        let schema = Schema::new(vec![Field::new("a", DataType::Int32, false)]);
+        let a = Int32Array::from(vec![Some(1), Some(2), None, None, Some(3), None, None, Some(4), Some(5)]);
+        let a = Arc::new(a);
+        let a_len = a.len();
+        let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![a.clone()])?;
+
+        let literal_expr = lit(ScalarValue::Int32(Some(2)));
+        let lit_array = literal_expr.evaluate(&batch)?;
+
+        let result = nullif_func(&[a.clone(), lit_array])?;
+
+        // Results should be: Some(1), None, None, None, Some(3), None

Review comment:
       This suggests using `let expected = UInt32::from(vec![Some(1), None, None, None, Some(3), None]);`

##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -508,6 +508,26 @@ async fn csv_query_avg_multi_batch() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn csv_query_nullif_divide_by_0() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx)?;
+    let sql = "SELECT c8/nullif(c7, 0) FROM aggregate_test_100";
+    let actual: Vec<_> = execute(&mut ctx, sql)
+        .await
+        .iter()
+        .map(|x| x[0].clone())
+        .collect();
+    let actual = actual.join("\n");
+    let expected = "1722\n92\n46\n679\n165\n146\n149\n93\n2211\n6495\n307\n139\n253\n123\n21\n84\n98\n13\n230\n\
+       277\n1\n986\n414\n144\n210\n0\n172\n165\n25\n97\n335\n558\n350\n369\n511\n245\n345\n8\n139\n55\n318\n2614\n\
+       1792\n16\n345\n123\n176\n1171\n20\n199\n147\n115\n335\n23\n847\n94\n315\n391\n176\n282\n459\n197\n978\n281\n\
+       27\n26\n281\n8124\n3\n430\n510\n61\n67\n17\n1601\n362\n202\n50\n10\n346\n258\n664\nNULL\n22\n164\n448\n365\n\
+       1640\n671\n203\n2087\n10060\n1015\n913\n9840\n16\n496\n264\n38\n1";
+    assert_eq!(expected, actual);
+    Ok(())
+}

Review comment:
       What do you think if we run against a controlled dataset instead of `aggregate_test_100`? I can't tell whether `expected` is correct. Wouldn't it make it easier if we use something like the `query_not` test is doing, on which it creates a temporary table and runs against that?

##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +516,20 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    #[test]
+    fn test_nullif_int_array() {
+        let a = Int32Array::from(vec![Some(15), None, Some(8), Some(1), Some(9)]);
+        let comp =
+            BooleanArray::from(vec![Some(false), None, Some(true), Some(false), None]);
+        let res = nullif(&a, &comp).unwrap();
+
+        assert_eq!(15, res.value(0));
+        assert_eq!(true, res.is_null(1));
+        assert_eq!(true, res.is_null(2)); // comp true, slot 2 turned into null
+        assert_eq!(1, res.value(3));
+        // Even though comp array / right is null, should still pass through original value
+        assert_eq!(9, res.value(4));
+        assert_eq!(false, res.is_null(4)); // comp true, slot 2 turned into null

Review comment:
       ```suggestion
           let expected = Int32Array::from(vec![
               Some(15),
               None,
               None, // comp true, slot 2 turned into null
               Some(1),
               // Even though comp array / right is null, should still pass through original value
               // comp true, slot 2 turned into null
               Some(9),
           ]);
           
           assert_eq!(expected, res)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#issuecomment-734975494


   @jorgecarleitao I've just gone through this, and don't have anything else to add. Can I merge it?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525058248



##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1385,6 +1385,80 @@ pub fn binary(
     Ok(Arc::new(BinaryExpr::new(l, op, r)))
 }
 
+/// Invoke a compute kernel on a primitive array and a Boolean Array
+macro_rules! compute_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
+        let ll = $LEFT
+            .as_any()
+            .downcast_ref::<$DT>()
+            .expect("compute_op failed to downcast array");
+        let rr = $RIGHT
+            .as_any()
+            .downcast_ref::<BooleanArray>()
+            .expect("compute_op failed to downcast array");
+        Ok(Arc::new($OP(&ll, &rr)?))
+    }};
+}
+
+/// Binary op between primitive and boolean arrays
+macro_rules! primitive_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
+        match $LEFT.data_type() {
+            DataType::Int8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int8Array),
+            DataType::Int16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int16Array),
+            DataType::Int32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int32Array),
+            DataType::Int64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int64Array),
+            DataType::UInt8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt8Array),
+            DataType::UInt16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt16Array),
+            DataType::UInt32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt32Array),
+            DataType::UInt64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt64Array),
+            DataType::Float32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float32Array),
+            DataType::Float64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float64Array),
+            other => Err(DataFusionError::Internal(format!(
+                "Unsupported data type {:?} for NULLIF/primitive/boolean operator",
+                other
+            ))),
+        }
+    }};
+}
+
+///
+/// Implements NULLIF(expr1, expr2)
+/// Args: 0 - left expr is any array
+///       1 - if the left is equal to this expr2, then the result is NULL, otherwise left value is passed.
+///
+pub fn nullif_func(args: &[ArrayRef]) -> Result<ArrayRef> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "{:?} args were supplied but NULLIF takes exactly two args",
+            args.len(),
+        )));
+    }
+
+    // Get args0 == args1 evaluated and produce a boolean array
+    let cond_array = binary_array_op!(args[0], args[1], eq)?;
+
+    // Now, invoke nullif on the result
+    primitive_bool_array_op!(args[0], *cond_array, nullif)
+}
+
+/// Currently supported types by the nullif function.
+/// The order of these types correspond to the order on which coercion applies
+/// This should thus be from least informative to most informative
+pub static SUPPORTED_NULLIF_TYPES: &'static [DataType] = &[
+    DataType::Boolean,
+    DataType::UInt8,
+    DataType::UInt16,
+    DataType::UInt32,
+    DataType::UInt64,
+    DataType::Int8,
+    DataType::Int16,
+    DataType::Int32,
+    DataType::Int64,
+    DataType::Float32,
+    DataType::Float64,
+];

Review comment:
       Better to make these trait bounds with a good comment about how these are selected. I didn't understand:
   > /// The order of these types correspond to the order on which coercion applies
   
   Can you explain it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525618710



##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -2604,6 +2678,61 @@ mod tests {
         )
     }
 
+    #[test]
+    #[rustfmt::skip]

Review comment:
       No real reason.... I've removed it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#issuecomment-734971636


   How does it get merged from here?  Sorry unclear on the process, thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526442585



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+
+    #[test]
+    fn test_nullif_int_array() {
+        let a = Int32Array::from(vec![Some(15), None, Some(8), Some(1), Some(9)]);
+        let comp =
+            BooleanArray::from(vec![Some(false), None, Some(true), Some(false), None]);
+        let res = nullif(&a, &comp).unwrap();
+
+        let expected = Int32Array::from(vec![
+            Some(15),
+            None,
+            None, // comp true, slot 2 turned into null
+            Some(1),
+            // Even though comp array / right is null, should still pass through original value
+            // comp true, slot 2 turned into null
+            Some(9),
+        ]);
+
+        assert_array_eq::<Int32Type>(expected, Arc::new(res));

Review comment:
       Ah, never mind, I see what you mean that we are passing in `Some(left.len())`....   will see about how to derive the null count from the new bitmap.   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r528282671



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       @jhorstmann  Ok let me try to explain.
   
   The nullif function takes the original array, which let's say has offset=2.   It uses the result of another boolean array, which let's say has offset=3.   The nullif kernel must combine both the original nullif null bitmap at offset=2 with the boolean array values and null bitmaps (both at offset=3), and after combining, produce a new array with a combined bitmap at offset=2 (so we can produce a new array, with the same data as the original left array, but a modified nullity bitmap).
   
   Using `bitwise_bin_op_helper` (actually I can just use `bit_slice()` on original bitmaps, though it's a bit slower) takes care of the first part - basically it takes two bitmaps of different alignments, aligns them and does bit operations on 64-bit aligned words.  What I end up with is combined(left_bitmap, right_bitmap) aligned at offset 0.
   
   What is missing to construct the final nullif is the above combined bitmap but at offset 2, so it can be aligned with the original left data.   I need to insert two bits in the LSB of the resulting bitmap so it is aligned with the original data.   AFAICT there isn't anything that can do this in the current code.  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r530549403



##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -508,6 +508,26 @@ async fn csv_query_avg_multi_batch() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn csv_query_nullif_divide_by_0() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx)?;
+    let sql = "SELECT c8/nullif(c7, 0) FROM aggregate_test_100";
+    let actual: Vec<_> = execute(&mut ctx, sql)
+        .await
+        .iter()
+        .map(|x| x[0].clone())
+        .collect();
+    let actual = actual.join("\n");
+    let expected = "1722\n92\n46\n679\n165\n146\n149\n93\n2211\n6495\n307\n139\n253\n123\n21\n84\n98\n13\n230\n\
+       277\n1\n986\n414\n144\n210\n0\n172\n165\n25\n97\n335\n558\n350\n369\n511\n245\n345\n8\n139\n55\n318\n2614\n\
+       1792\n16\n345\n123\n176\n1171\n20\n199\n147\n115\n335\n23\n847\n94\n315\n391\n176\n282\n459\n197\n978\n281\n\
+       27\n26\n281\n8124\n3\n430\n510\n61\n67\n17\n1601\n362\n202\n50\n10\n346\n258\n664\nNULL\n22\n164\n448\n365\n\
+       1640\n671\n203\n2087\n10060\n1015\n913\n9840\n16\n496\n264\n38\n1";
+    assert_eq!(expected, actual);
+    Ok(())
+}

Review comment:
       it is more important to test the different cases here than have a large dataset. You are already providing excellent coverage with the other tests, this is more like an end-to-end test to verify the result in integration, so we do not need to cover many situations.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#issuecomment-728504277


   @andygrove @nevi-me would love to hear your feedback on this....  this addition has been useful to us internally.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526344874



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+
+    #[test]
+    fn test_nullif_int_array() {
+        let a = Int32Array::from(vec![Some(15), None, Some(8), Some(1), Some(9)]);
+        let comp =
+            BooleanArray::from(vec![Some(false), None, Some(true), Some(false), None]);
+        let res = nullif(&a, &comp).unwrap();
+
+        let expected = Int32Array::from(vec![
+            Some(15),
+            None,
+            None, // comp true, slot 2 turned into null
+            Some(1),
+            // Even though comp array / right is null, should still pass through original value
+            // comp true, slot 2 turned into null
+            Some(9),
+        ]);
+
+        assert_array_eq::<Int32Type>(expected, Arc::new(res));

Review comment:
       How is the null_count derived?  Shouldn't it look at the nullity bitmap?  If it is actually checking that the data is zero, then we'd need to modify the array data as part of the nullif() implementation, which would be more expensive and possibly hard to do in a generic way, no?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525084383



##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1385,6 +1385,80 @@ pub fn binary(
     Ok(Arc::new(BinaryExpr::new(l, op, r)))
 }
 
+/// Invoke a compute kernel on a primitive array and a Boolean Array
+macro_rules! compute_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
+        let ll = $LEFT
+            .as_any()
+            .downcast_ref::<$DT>()
+            .expect("compute_op failed to downcast array");
+        let rr = $RIGHT
+            .as_any()
+            .downcast_ref::<BooleanArray>()
+            .expect("compute_op failed to downcast array");
+        Ok(Arc::new($OP(&ll, &rr)?))
+    }};
+}
+
+/// Binary op between primitive and boolean arrays
+macro_rules! primitive_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
+        match $LEFT.data_type() {
+            DataType::Int8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int8Array),
+            DataType::Int16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int16Array),
+            DataType::Int32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int32Array),
+            DataType::Int64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int64Array),
+            DataType::UInt8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt8Array),
+            DataType::UInt16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt16Array),
+            DataType::UInt32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt32Array),
+            DataType::UInt64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt64Array),
+            DataType::Float32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float32Array),
+            DataType::Float64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float64Array),
+            other => Err(DataFusionError::Internal(format!(
+                "Unsupported data type {:?} for NULLIF/primitive/boolean operator",
+                other
+            ))),
+        }
+    }};
+}
+
+///
+/// Implements NULLIF(expr1, expr2)
+/// Args: 0 - left expr is any array
+///       1 - if the left is equal to this expr2, then the result is NULL, otherwise left value is passed.
+///
+pub fn nullif_func(args: &[ArrayRef]) -> Result<ArrayRef> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "{:?} args were supplied but NULLIF takes exactly two args",
+            args.len(),
+        )));
+    }
+
+    // Get args0 == args1 evaluated and produce a boolean array
+    let cond_array = binary_array_op!(args[0], args[1], eq)?;
+
+    // Now, invoke nullif on the result
+    primitive_bool_array_op!(args[0], *cond_array, nullif)
+}
+
+/// Currently supported types by the nullif function.
+/// The order of these types correspond to the order on which coercion applies
+/// This should thus be from least informative to most informative
+pub static SUPPORTED_NULLIF_TYPES: &'static [DataType] = &[
+    DataType::Boolean,
+    DataType::UInt8,
+    DataType::UInt16,
+    DataType::UInt32,
+    DataType::UInt64,
+    DataType::Int8,
+    DataType::Int16,
+    DataType::Int32,
+    DataType::Int64,
+    DataType::Float32,
+    DataType::Float64,
+];

Review comment:
       That is interesting. I would be interested in knowing what is the issue with the current implementation and why type algebra definitions should be used instead. Could you first introduce a proposal with the design e.g. on a google docs, before the PR? In DataFusion we have been doing that for larger changes to avoiding committing to an implementation before some general agreement.

##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1385,6 +1385,80 @@ pub fn binary(
     Ok(Arc::new(BinaryExpr::new(l, op, r)))
 }
 
+/// Invoke a compute kernel on a primitive array and a Boolean Array
+macro_rules! compute_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
+        let ll = $LEFT
+            .as_any()
+            .downcast_ref::<$DT>()
+            .expect("compute_op failed to downcast array");
+        let rr = $RIGHT
+            .as_any()
+            .downcast_ref::<BooleanArray>()
+            .expect("compute_op failed to downcast array");
+        Ok(Arc::new($OP(&ll, &rr)?))
+    }};
+}
+
+/// Binary op between primitive and boolean arrays
+macro_rules! primitive_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
+        match $LEFT.data_type() {
+            DataType::Int8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int8Array),
+            DataType::Int16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int16Array),
+            DataType::Int32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int32Array),
+            DataType::Int64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int64Array),
+            DataType::UInt8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt8Array),
+            DataType::UInt16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt16Array),
+            DataType::UInt32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt32Array),
+            DataType::UInt64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt64Array),
+            DataType::Float32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float32Array),
+            DataType::Float64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float64Array),
+            other => Err(DataFusionError::Internal(format!(
+                "Unsupported data type {:?} for NULLIF/primitive/boolean operator",
+                other
+            ))),
+        }
+    }};
+}
+
+///
+/// Implements NULLIF(expr1, expr2)
+/// Args: 0 - left expr is any array
+///       1 - if the left is equal to this expr2, then the result is NULL, otherwise left value is passed.
+///
+pub fn nullif_func(args: &[ArrayRef]) -> Result<ArrayRef> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "{:?} args were supplied but NULLIF takes exactly two args",
+            args.len(),
+        )));
+    }
+
+    // Get args0 == args1 evaluated and produce a boolean array
+    let cond_array = binary_array_op!(args[0], args[1], eq)?;
+
+    // Now, invoke nullif on the result
+    primitive_bool_array_op!(args[0], *cond_array, nullif)
+}
+
+/// Currently supported types by the nullif function.
+/// The order of these types correspond to the order on which coercion applies
+/// This should thus be from least informative to most informative
+pub static SUPPORTED_NULLIF_TYPES: &'static [DataType] = &[
+    DataType::Boolean,
+    DataType::UInt8,
+    DataType::UInt16,
+    DataType::UInt32,
+    DataType::UInt64,
+    DataType::Int8,
+    DataType::Int16,
+    DataType::Int32,
+    DataType::Int64,
+    DataType::Float32,
+    DataType::Float64,
+];

Review comment:
       That is interesting. I would be interested in knowing what is the issue with the current implementation and why type algebra definitions should be used instead. Could you first introduce a proposal with the design e.g. on a google docs, before the PR? In DataFusion we have been doing that for larger changes to avoid committing to an implementation before some general agreement.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#issuecomment-734979662


   @nevi-me , I went through it already. Thanks a lot!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525621811



##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -2604,6 +2678,61 @@ mod tests {
         )
     }
 
+    #[test]
+    #[rustfmt::skip]
+    fn nullif_int32() -> Result<()> {
+        let schema = Schema::new(vec![Field::new("a", DataType::Int32, false)]);
+        let a = Int32Array::from(vec![Some(1), Some(2), None, None, Some(3), None, None, Some(4), Some(5)]);
+        let a = Arc::new(a);
+        let a_len = a.len();
+        let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![a.clone()])?;
+
+        let literal_expr = lit(ScalarValue::Int32(Some(2)));
+        let lit_array = literal_expr.evaluate(&batch)?;
+
+        let result = nullif_func(&[a.clone(), lit_array])?;
+
+        // Results should be: Some(1), None, None, None, Some(3), None
+        assert_eq!(result.len(), a_len);
+
+        let result = result
+            .as_any()
+            .downcast_ref::<Int32Array>()
+            .expect("failed to downcast to Int32Array");
+
+        assert_eq!(1, result.value(0));
+        assert_eq!(true, result.is_null(1));    // 2==2, slot 1 turned into null
+        assert_eq!(true, result.is_null(2));
+        assert_eq!(true, result.is_null(3));
+        assert_eq!(3, result.value(4));
+        assert_eq!(true, result.is_null(5));
+        assert_eq!(true, result.is_null(6));
+        assert_eq!(5, result.value(8));
+        Ok(())
+    }
+
+    #[test]
+    #[rustfmt::skip]
+    // Ensure that arrays with no nulls can also invoke NULLIF() correctly
+    fn nullif_int32_nonulls() -> Result<()> {
+        let schema = Schema::new(vec![Field::new("a", DataType::Int32, false)]);
+        let a = Int32Array::from(vec![1, 3, 10, 7, 8, 1, 2, 4, 5]);
+        let a = Arc::new(a);
+        let a_len = a.len();
+        let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![a.clone()])?;
+
+        let literal_expr = lit(ScalarValue::Int32(Some(1)));
+        let lit_array = literal_expr.evaluate(&batch)?;

Review comment:
       👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526346099



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       It's a good question @jorgecarleitao and thanks so much for digging into it.    I had found some mention of offsets but haven't completely grokked what was going on there, looks like will have to dive into it a little bit.
   
   I'll rebase and look into the data() discrepancy above and the details of the array equality comparison, and report back.  cheers and thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526511029



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       In my PR if you use `slicing` method and then write a method to align bit slices with smth along the lines:
   ```
           unsafe {
               let (_, data, _) = self.bit_slice.align_to::<u8>();
               let dest = data.as_slice();
            }
   ```
   you will get two aligned middle bit domains. then you can check bit slices however you want to.
   
   slicing method:
   https://github.com/apache/arrow/pull/8664/files#diff-715ddfbc534281523a73117d3bf4986d69d8a375dd320bc19f83298d3ba7ebd6R52
   
   Reference for that method in backing library: https://docs.rs/bitvec/0.19.4/bitvec/slice/struct.BitSlice.html#method.align_to
   
   If you want subslice of bits from the given slice, for example for this thing that you said:
   > After the operation, "right shift" the resulting bitmap so it is based on left_offset again (since the resulting data is based on left array)
   
   You can do `.copy_from_bitslice` and copy into correct position in the middle domain.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526624296



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       @velvia, take a look at `bitwise_unary_op_helper` on `src/buffer.rs`. I think that inside it, there is functionality relevant to this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#issuecomment-728568531


   https://issues.apache.org/jira/browse/ARROW-10330


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526508679



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       To give more details on the issue:  we have basically a left_bitmap and a right_bitmap.  The left_bitmap is at left_offset_bits bits, and the right bitmap is at right_offset_bits.
   
   If left_offset_bits == right_offset_bits, then everything is easy, the resulting bitmap in fact does not need to be shifted.
   
   If left_offset_bits != right_offset_bits, one can do:
   * Use bit_slice() on left_bitmap and on right_bitmap, the result is 0-based and can be operated on
   * After the operation, "right shift" the resulting bitmap so it is based on left_offset again (since the resulting data is based on left array)
   
   An optimization of the above is to shift right_bitmap so it lines up with left_bitmap.
   
   I see `bit_slice()`, but I don't see anything in the Buffer API to "right shift".....
   
   @vertexclique @alamb any ideas?  Thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526498177



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       I don't see an API to shift bits forward, which I think would need to make this work....
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526496812



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       I think I figured it out....   though an example would be great.  Basically, to do bitmap/buffer ops on offsets, you have to use like `bit_slice()` to get a normalized map (based on 0), then shift it back.   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#issuecomment-734998072


   Thanks everyone!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525621539



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +516,20 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    #[test]
+    fn test_nullif_int_array() {
+        let a = Int32Array::from(vec![Some(15), None, Some(8), Some(1), Some(9)]);
+        let comp =
+            BooleanArray::from(vec![Some(false), None, Some(true), Some(false), None]);
+        let res = nullif(&a, &comp).unwrap();
+
+        assert_eq!(15, res.value(0));
+        assert_eq!(true, res.is_null(1));
+        assert_eq!(true, res.is_null(2)); // comp true, slot 2 turned into null
+        assert_eq!(1, res.value(3));
+        // Even though comp array / right is null, should still pass through original value
+        assert_eq!(9, res.value(4));
+        assert_eq!(false, res.is_null(4)); // comp true, slot 2 turned into null

Review comment:
       Actually I'm going to borrow `assert_array_eq` from the expressions.rs test.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#issuecomment-734972509


   @velvia , just needs time from a committer: typically over the weekend I run through all PRs in order of appearenace and merge them if ready. Thanks a for the patience.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525084383



##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1385,6 +1385,80 @@ pub fn binary(
     Ok(Arc::new(BinaryExpr::new(l, op, r)))
 }
 
+/// Invoke a compute kernel on a primitive array and a Boolean Array
+macro_rules! compute_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
+        let ll = $LEFT
+            .as_any()
+            .downcast_ref::<$DT>()
+            .expect("compute_op failed to downcast array");
+        let rr = $RIGHT
+            .as_any()
+            .downcast_ref::<BooleanArray>()
+            .expect("compute_op failed to downcast array");
+        Ok(Arc::new($OP(&ll, &rr)?))
+    }};
+}
+
+/// Binary op between primitive and boolean arrays
+macro_rules! primitive_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
+        match $LEFT.data_type() {
+            DataType::Int8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int8Array),
+            DataType::Int16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int16Array),
+            DataType::Int32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int32Array),
+            DataType::Int64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int64Array),
+            DataType::UInt8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt8Array),
+            DataType::UInt16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt16Array),
+            DataType::UInt32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt32Array),
+            DataType::UInt64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt64Array),
+            DataType::Float32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float32Array),
+            DataType::Float64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float64Array),
+            other => Err(DataFusionError::Internal(format!(
+                "Unsupported data type {:?} for NULLIF/primitive/boolean operator",
+                other
+            ))),
+        }
+    }};
+}
+
+///
+/// Implements NULLIF(expr1, expr2)
+/// Args: 0 - left expr is any array
+///       1 - if the left is equal to this expr2, then the result is NULL, otherwise left value is passed.
+///
+pub fn nullif_func(args: &[ArrayRef]) -> Result<ArrayRef> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "{:?} args were supplied but NULLIF takes exactly two args",
+            args.len(),
+        )));
+    }
+
+    // Get args0 == args1 evaluated and produce a boolean array
+    let cond_array = binary_array_op!(args[0], args[1], eq)?;
+
+    // Now, invoke nullif on the result
+    primitive_bool_array_op!(args[0], *cond_array, nullif)
+}
+
+/// Currently supported types by the nullif function.
+/// The order of these types correspond to the order on which coercion applies
+/// This should thus be from least informative to most informative
+pub static SUPPORTED_NULLIF_TYPES: &'static [DataType] = &[
+    DataType::Boolean,
+    DataType::UInt8,
+    DataType::UInt16,
+    DataType::UInt32,
+    DataType::UInt64,
+    DataType::Int8,
+    DataType::Int16,
+    DataType::Int32,
+    DataType::Int64,
+    DataType::Float32,
+    DataType::Float64,
+];

Review comment:
       That is interesting. I would be interested in knowing what is the issue with the current implementation and why type algebra definitions should be used instead. Could first introduce a proposal with the design e.g. on a google docs, before the PR? In DataFusion we have been doing that for larger changes to avoiding committing to an implementation before some general agreement.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525066482



##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1385,6 +1385,80 @@ pub fn binary(
     Ok(Arc::new(BinaryExpr::new(l, op, r)))
 }
 
+/// Invoke a compute kernel on a primitive array and a Boolean Array
+macro_rules! compute_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
+        let ll = $LEFT
+            .as_any()
+            .downcast_ref::<$DT>()
+            .expect("compute_op failed to downcast array");
+        let rr = $RIGHT
+            .as_any()
+            .downcast_ref::<BooleanArray>()
+            .expect("compute_op failed to downcast array");
+        Ok(Arc::new($OP(&ll, &rr)?))
+    }};
+}
+
+/// Binary op between primitive and boolean arrays
+macro_rules! primitive_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
+        match $LEFT.data_type() {
+            DataType::Int8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int8Array),
+            DataType::Int16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int16Array),
+            DataType::Int32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int32Array),
+            DataType::Int64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int64Array),
+            DataType::UInt8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt8Array),
+            DataType::UInt16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt16Array),
+            DataType::UInt32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt32Array),
+            DataType::UInt64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt64Array),
+            DataType::Float32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float32Array),
+            DataType::Float64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float64Array),
+            other => Err(DataFusionError::Internal(format!(
+                "Unsupported data type {:?} for NULLIF/primitive/boolean operator",
+                other
+            ))),
+        }
+    }};
+}
+
+///
+/// Implements NULLIF(expr1, expr2)
+/// Args: 0 - left expr is any array
+///       1 - if the left is equal to this expr2, then the result is NULL, otherwise left value is passed.
+///
+pub fn nullif_func(args: &[ArrayRef]) -> Result<ArrayRef> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "{:?} args were supplied but NULLIF takes exactly two args",
+            args.len(),
+        )));
+    }
+
+    // Get args0 == args1 evaluated and produce a boolean array
+    let cond_array = binary_array_op!(args[0], args[1], eq)?;
+
+    // Now, invoke nullif on the result
+    primitive_bool_array_op!(args[0], *cond_array, nullif)
+}
+
+/// Currently supported types by the nullif function.
+/// The order of these types correspond to the order on which coercion applies
+/// This should thus be from least informative to most informative
+pub static SUPPORTED_NULLIF_TYPES: &'static [DataType] = &[
+    DataType::Boolean,
+    DataType::UInt8,
+    DataType::UInt16,
+    DataType::UInt32,
+    DataType::UInt64,
+    DataType::Int8,
+    DataType::Int16,
+    DataType::Int32,
+    DataType::Int64,
+    DataType::Float32,
+    DataType::Float64,
+];

Review comment:
       > Better to make these trait bounds with a good comment about how these are selected
   
   AFAIK these cannot be trait bounds because logical and physical planning is dynamically typed. 
   
   In this case, this is enumerating all valid types that can be (dynamically) passed to the function. If someone tries to call this function with e.g. a `ListArray`, the logical planner will error with a description that this function does not support that type.
   
   The order here matters because when a function is planned to be called with type `X` that is not supported by the function, the physical planner will try to (lossless) cast that type to a valid type for that functions, and it does so in the order of this array. In general these should be ordered from less informative to most informative, as to guarantee that the execution uses the smallest supported type (when casting is required).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r528052210



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       @vertexclique ... I had a quick look at the PR but slicing() method was not defined in it.  Is it from bitvec crate or something?
   
   @jorgecarleitao well `bitwise_unary_op_helper` (and `bitwise_bin_op_helper` also) both use `bit_chunks()` which basically gives you an iterator of 64-bit chunks starting at offset.... essentially "left-shifting" into 0 position.
   
   I need a reverse operation, which is take 64-bit chunks and right shift them.  It can be kind of simulated by "pre-pending" 64 0-bits into a buffer, then using bit_chunks() again to get a shift into whatever alignment is needed, and finally writing out the final buffer, though all of that is slightly convoluted.
   
   I'm wondering if it's worth trying that out, or just wait for @vertexclique 's PR.
   
   Actually, what's better than shift-shift-zip-operate-rshift would be simply to shift the right bitmap into the left's position, then use a mask to operate on both of them.  That reduces the number of shifts.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525818191



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+
+    #[test]
+    fn test_nullif_int_array() {
+        let a = Int32Array::from(vec![Some(15), None, Some(8), Some(1), Some(9)]);
+        let comp =
+            BooleanArray::from(vec![Some(false), None, Some(true), Some(false), None]);
+        let res = nullif(&a, &comp).unwrap();
+
+        let expected = Int32Array::from(vec![
+            Some(15),
+            None,
+            None, // comp true, slot 2 turned into null
+            Some(1),
+            // Even though comp array / right is null, should still pass through original value
+            // comp true, slot 2 turned into null
+            Some(9),
+        ]);
+
+        assert_array_eq::<Int32Type>(expected, Arc::new(res));

Review comment:
       Thanks for your changes, @velvia 
   
   If `assert_eq!` is complaining, then we need to investigate why, because that is our canonical way of proving that two arrays are equal.
   
   I think that the issue that `assert_eq` is flagging is that the function `nullif` is returning an arrow array that is out of spec: the `null_count` is equal to the number of elements (due to `Some(left.len()))`), but the actual null count of that array is 2. The array equality takes into account that number (something that `assert_array_eq` does not).
   
   For reference, the way I use to check for these is to debug print the data:
   
   ```rust
   println!("{:?}", expected.data());
   println!("{:?}", result.data());
   ```
   

##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       I wrote the following test with arrays with an offset, as they are typical sources of errors, but I was unable to make it pass (even after fixing the null count):
   
   ```rust
       #[test]
       fn test_nullif_int_array_offset() {
           let a = Int32Array::from(vec![None, Some(15), Some(8), Some(1), Some(9)]);
           let a = a.slice(1, 3); // Some(15), Some(8), Some(1)
           let comp =
               BooleanArray::from(vec![Some(false), Some(false), Some(false), None, Some(true), Some(false), None]);
           let comp = comp.slice(2, 3); // Some(false), None, Some(true)
           let comp = comp.as_any().downcast_ref::<BooleanArray>().unwrap();
           let res = nullif(a.as_ref(), comp).unwrap();
   
           let expected = Arc::new(Int32Array::from(vec![
               Some(15), // False => keep it
               Some(8), // None => keep it
               None, // true => None
           ])) as ArrayRef;
           assert_eq!(&expected, &res)
       }
   ```
   
   Do you know what happens when the arrays have offsets?
   

##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -149,6 +150,64 @@ pub fn is_not_null(input: &ArrayRef) -> Result<BooleanArray> {
     Ok(BooleanArray::from(Arc::new(data)))
 }
 
+/// Copies original array, setting null bit to true if a secondary comparison boolean array is set to true.
+/// Typically used to implement NULLIF.
+pub fn nullif<T>(
+    left: &PrimitiveArray<T>,

Review comment:
       I think that it would help because on the `arrow` crate this would be a complete implementation. There are people that use `arrow` (but not DataFusion), that would still benefit.
   
   The following would work (I tested it locally :)), if you have the energy:
   
   ```rust
   pub fn nullif(
       left: &impl Array,
       right: &BooleanArray,
   ) -> Result<ArrayRef>
   ...
   
       // Construct new array with same values but modified null bitmap
       let data = ArrayData::new(
           left.data_type().clone(),
           left.len(),
           None, // Note how I am using `None` here, to make `new` compute the number.
           modified_null_buffer,
           left.offset(),
           left_data.buffers().to_vec(),
           left_data.child_data().to_vec(),
       );
       Ok(make_array(Arc::new(data)))
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r528249460



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       @velvia In the current version you could use the `bitwise_bin_op_helper`from `buffer.rs` if you make that public. That calls your bitwise operation with two `u64` as an input and takes care of the offset alignment.
   
   In our internal codebase we have similar functions also for 3 or 4 inputs, which would come in handy for you usecase, since you could calculate the result without the temporary `right_combo_buffer`. So you could also think about copying that function and modifying to take a function of 3 parameters instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525076153



##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1385,6 +1385,80 @@ pub fn binary(
     Ok(Arc::new(BinaryExpr::new(l, op, r)))
 }
 
+/// Invoke a compute kernel on a primitive array and a Boolean Array
+macro_rules! compute_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
+        let ll = $LEFT
+            .as_any()
+            .downcast_ref::<$DT>()
+            .expect("compute_op failed to downcast array");
+        let rr = $RIGHT
+            .as_any()
+            .downcast_ref::<BooleanArray>()
+            .expect("compute_op failed to downcast array");
+        Ok(Arc::new($OP(&ll, &rr)?))
+    }};
+}
+
+/// Binary op between primitive and boolean arrays
+macro_rules! primitive_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
+        match $LEFT.data_type() {
+            DataType::Int8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int8Array),
+            DataType::Int16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int16Array),
+            DataType::Int32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int32Array),
+            DataType::Int64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int64Array),
+            DataType::UInt8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt8Array),
+            DataType::UInt16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt16Array),
+            DataType::UInt32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt32Array),
+            DataType::UInt64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt64Array),
+            DataType::Float32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float32Array),
+            DataType::Float64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float64Array),
+            other => Err(DataFusionError::Internal(format!(
+                "Unsupported data type {:?} for NULLIF/primitive/boolean operator",
+                other
+            ))),
+        }
+    }};
+}
+
+///
+/// Implements NULLIF(expr1, expr2)
+/// Args: 0 - left expr is any array
+///       1 - if the left is equal to this expr2, then the result is NULL, otherwise left value is passed.
+///
+pub fn nullif_func(args: &[ArrayRef]) -> Result<ArrayRef> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "{:?} args were supplied but NULLIF takes exactly two args",
+            args.len(),
+        )));
+    }
+
+    // Get args0 == args1 evaluated and produce a boolean array
+    let cond_array = binary_array_op!(args[0], args[1], eq)?;
+
+    // Now, invoke nullif on the result
+    primitive_bool_array_op!(args[0], *cond_array, nullif)
+}
+
+/// Currently supported types by the nullif function.
+/// The order of these types correspond to the order on which coercion applies
+/// This should thus be from least informative to most informative
+pub static SUPPORTED_NULLIF_TYPES: &'static [DataType] = &[
+    DataType::Boolean,
+    DataType::UInt8,
+    DataType::UInt16,
+    DataType::UInt32,
+    DataType::UInt64,
+    DataType::Int8,
+    DataType::Int16,
+    DataType::Int32,
+    DataType::Int64,
+    DataType::Float32,
+    DataType::Float64,
+];

Review comment:
       Oh, I see, in our private project at work, I have used type algebra definitions to not do these. For now, this can go like how it is, but later I can open a type algebra pr to convert all these to castability.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r530107076



##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -508,6 +508,26 @@ async fn csv_query_avg_multi_batch() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn csv_query_nullif_divide_by_0() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx)?;
+    let sql = "SELECT c8/nullif(c7, 0) FROM aggregate_test_100";
+    let actual: Vec<_> = execute(&mut ctx, sql)
+        .await
+        .iter()
+        .map(|x| x[0].clone())
+        .collect();
+    let actual = actual.join("\n");
+    let expected = "1722\n92\n46\n679\n165\n146\n149\n93\n2211\n6495\n307\n139\n253\n123\n21\n84\n98\n13\n230\n\
+       277\n1\n986\n414\n144\n210\n0\n172\n165\n25\n97\n335\n558\n350\n369\n511\n245\n345\n8\n139\n55\n318\n2614\n\
+       1792\n16\n345\n123\n176\n1171\n20\n199\n147\n115\n335\n23\n847\n94\n315\n391\n176\n282\n459\n197\n978\n281\n\
+       27\n26\n281\n8124\n3\n430\n510\n61\n67\n17\n1601\n362\n202\n50\n10\n346\n258\n664\nNULL\n22\n164\n448\n365\n\
+       1640\n671\n203\n2087\n10060\n1015\n913\n9840\n16\n496\n264\n38\n1";
+    assert_eq!(expected, actual);
+    Ok(())
+}

Review comment:
       Difficult to tell what it is happening here.⬆️




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525621412



##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -2604,6 +2678,61 @@ mod tests {
         )
     }
 
+    #[test]
+    #[rustfmt::skip]
+    fn nullif_int32() -> Result<()> {
+        let schema = Schema::new(vec![Field::new("a", DataType::Int32, false)]);
+        let a = Int32Array::from(vec![Some(1), Some(2), None, None, Some(3), None, None, Some(4), Some(5)]);
+        let a = Arc::new(a);
+        let a_len = a.len();
+        let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![a.clone()])?;
+
+        let literal_expr = lit(ScalarValue::Int32(Some(2)));
+        let lit_array = literal_expr.evaluate(&batch)?;
+
+        let result = nullif_func(&[a.clone(), lit_array])?;
+
+        // Results should be: Some(1), None, None, None, Some(3), None
+        assert_eq!(result.len(), a_len);
+
+        let result = result
+            .as_any()
+            .downcast_ref::<Int32Array>()
+            .expect("failed to downcast to Int32Array");
+
+        assert_eq!(1, result.value(0));
+        assert_eq!(true, result.is_null(1));    // 2==2, slot 1 turned into null
+        assert_eq!(true, result.is_null(2));
+        assert_eq!(true, result.is_null(3));
+        assert_eq!(3, result.value(4));
+        assert_eq!(true, result.is_null(5));
+        assert_eq!(true, result.is_null(6));
+        assert_eq!(5, result.value(8));
+        Ok(())

Review comment:
       👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r528302736



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       @velvia Thanks, I got it now. It was already quite late when I looked at this yesterday :)
   
   Reusing the input array including its offset is a tricky problem, it's also not always beneficial from a performance standpoint, the offset might start at a much larger number and you probably do not want to create a null buffer with that much unused space in front. I see two possible solutions then:
   
   - Use the `buffer.slice(offset_bytes)` method for also getting a data slice that can then be used with an offset of 0. Requires a special case for primitive type boolean, for which you'd need to use `bit_slice(offset_in_bits)`.
   
   - Create a copy of the input data if the offset is not 0.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r531184824



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -149,6 +150,64 @@ pub fn is_not_null(input: &ArrayRef) -> Result<BooleanArray> {
     Ok(BooleanArray::from(Arc::new(data)))
 }
 
+/// Copies original array, setting null bit to true if a secondary comparison boolean array is set to true.
+/// Typically used to implement NULLIF.
+pub fn nullif<T>(
+    left: &PrimitiveArray<T>,

Review comment:
       @jorgecarleitao see my comment about why I can't make this more generic for now....  so I'm marking this as resolved.  However everything else should be addressed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526511029



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       In my PR if you use `slicing` method and then write a method to align bit slices with smth along the lines:
   ```
           unsafe {
               let (_, data, _) = self.bit_slice.align_to::<u8>();
               let dest = data.as_slice();
            }
   ```
   you will get two aligned middle bit domains. then you can check bit slices however you want to.
   
   slicing method:
   https://github.com/apache/arrow/pull/8664/files#diff-715ddfbc534281523a73117d3bf4986d69d8a375dd320bc19f83298d3ba7ebd6R52




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r528960800



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       Thanks everyone for their comments.  
   
   It seems to me that taking a slice of the data buffer could potentially be even more expensive than shifting the null buffer, if one is not able to take a cheap slice clone (ie bits are not aligned) -  but perhaps this is not common since most types are 8/16/32 bits wide.   It is easier to implement for now, so I will try @jhorstmann 's suggestion using `bit_slice()` on the data buffers - would rather get in something that works sooner and iterate later when @vertexclique 's PR is merged.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526511029



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       In my PR if you use `slicing` method and then write a method to align bit slices with smth along the lines:
   ```
           unsafe {
               let (_, data, _) = self.bit_slice.align_to::<u8>();
               let dest = data.as_slice();
            }
   ```
   you will get two aligned middle bit domains. then you can check bit slices however you want to.
   
   slicing method:
   https://github.com/apache/arrow/pull/8664/files#diff-715ddfbc534281523a73117d3bf4986d69d8a375dd320bc19f83298d3ba7ebd6R52
   
   Reference for that method: https://docs.rs/bitvec/0.19.4/bitvec/slice/struct.BitSlice.html#method.align_to

##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       In my PR if you use `slicing` method and then write a method to align bit slices with smth along the lines:
   ```
           unsafe {
               let (_, data, _) = self.bit_slice.align_to::<u8>();
               let dest = data.as_slice();
            }
   ```
   you will get two aligned middle bit domains. then you can check bit slices however you want to.
   
   slicing method:
   https://github.com/apache/arrow/pull/8664/files#diff-715ddfbc534281523a73117d3bf4986d69d8a375dd320bc19f83298d3ba7ebd6R52
   
   Reference for that method in backing library: https://docs.rs/bitvec/0.19.4/bitvec/slice/struct.BitSlice.html#method.align_to




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525085824



##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1385,6 +1385,80 @@ pub fn binary(
     Ok(Arc::new(BinaryExpr::new(l, op, r)))
 }
 
+/// Invoke a compute kernel on a primitive array and a Boolean Array
+macro_rules! compute_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
+        let ll = $LEFT
+            .as_any()
+            .downcast_ref::<$DT>()
+            .expect("compute_op failed to downcast array");
+        let rr = $RIGHT
+            .as_any()
+            .downcast_ref::<BooleanArray>()
+            .expect("compute_op failed to downcast array");
+        Ok(Arc::new($OP(&ll, &rr)?))
+    }};
+}
+
+/// Binary op between primitive and boolean arrays
+macro_rules! primitive_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
+        match $LEFT.data_type() {
+            DataType::Int8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int8Array),
+            DataType::Int16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int16Array),
+            DataType::Int32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int32Array),
+            DataType::Int64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int64Array),
+            DataType::UInt8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt8Array),
+            DataType::UInt16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt16Array),
+            DataType::UInt32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt32Array),
+            DataType::UInt64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt64Array),
+            DataType::Float32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float32Array),
+            DataType::Float64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float64Array),
+            other => Err(DataFusionError::Internal(format!(
+                "Unsupported data type {:?} for NULLIF/primitive/boolean operator",
+                other
+            ))),
+        }
+    }};
+}
+
+///
+/// Implements NULLIF(expr1, expr2)
+/// Args: 0 - left expr is any array
+///       1 - if the left is equal to this expr2, then the result is NULL, otherwise left value is passed.
+///
+pub fn nullif_func(args: &[ArrayRef]) -> Result<ArrayRef> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "{:?} args were supplied but NULLIF takes exactly two args",
+            args.len(),
+        )));
+    }
+
+    // Get args0 == args1 evaluated and produce a boolean array
+    let cond_array = binary_array_op!(args[0], args[1], eq)?;
+
+    // Now, invoke nullif on the result
+    primitive_bool_array_op!(args[0], *cond_array, nullif)
+}
+
+/// Currently supported types by the nullif function.
+/// The order of these types correspond to the order on which coercion applies
+/// This should thus be from least informative to most informative
+pub static SUPPORTED_NULLIF_TYPES: &'static [DataType] = &[
+    DataType::Boolean,
+    DataType::UInt8,
+    DataType::UInt16,
+    DataType::UInt32,
+    DataType::UInt64,
+    DataType::Int8,
+    DataType::Int16,
+    DataType::Int32,
+    DataType::Int64,
+    DataType::Float32,
+    DataType::Float64,
+];

Review comment:
       Oh definitely will do, give me some time to wrap my head up.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525066482



##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1385,6 +1385,80 @@ pub fn binary(
     Ok(Arc::new(BinaryExpr::new(l, op, r)))
 }
 
+/// Invoke a compute kernel on a primitive array and a Boolean Array
+macro_rules! compute_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
+        let ll = $LEFT
+            .as_any()
+            .downcast_ref::<$DT>()
+            .expect("compute_op failed to downcast array");
+        let rr = $RIGHT
+            .as_any()
+            .downcast_ref::<BooleanArray>()
+            .expect("compute_op failed to downcast array");
+        Ok(Arc::new($OP(&ll, &rr)?))
+    }};
+}
+
+/// Binary op between primitive and boolean arrays
+macro_rules! primitive_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
+        match $LEFT.data_type() {
+            DataType::Int8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int8Array),
+            DataType::Int16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int16Array),
+            DataType::Int32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int32Array),
+            DataType::Int64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int64Array),
+            DataType::UInt8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt8Array),
+            DataType::UInt16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt16Array),
+            DataType::UInt32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt32Array),
+            DataType::UInt64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt64Array),
+            DataType::Float32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float32Array),
+            DataType::Float64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float64Array),
+            other => Err(DataFusionError::Internal(format!(
+                "Unsupported data type {:?} for NULLIF/primitive/boolean operator",
+                other
+            ))),
+        }
+    }};
+}
+
+///
+/// Implements NULLIF(expr1, expr2)
+/// Args: 0 - left expr is any array
+///       1 - if the left is equal to this expr2, then the result is NULL, otherwise left value is passed.
+///
+pub fn nullif_func(args: &[ArrayRef]) -> Result<ArrayRef> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "{:?} args were supplied but NULLIF takes exactly two args",
+            args.len(),
+        )));
+    }
+
+    // Get args0 == args1 evaluated and produce a boolean array
+    let cond_array = binary_array_op!(args[0], args[1], eq)?;
+
+    // Now, invoke nullif on the result
+    primitive_bool_array_op!(args[0], *cond_array, nullif)
+}
+
+/// Currently supported types by the nullif function.
+/// The order of these types correspond to the order on which coercion applies
+/// This should thus be from least informative to most informative
+pub static SUPPORTED_NULLIF_TYPES: &'static [DataType] = &[
+    DataType::Boolean,
+    DataType::UInt8,
+    DataType::UInt16,
+    DataType::UInt32,
+    DataType::UInt64,
+    DataType::Int8,
+    DataType::Int16,
+    DataType::Int32,
+    DataType::Int64,
+    DataType::Float32,
+    DataType::Float64,
+];

Review comment:
       > Better to make these trait bounds with a good comment about how these are selected
   
   AFAIK these cannot be trait bounds because logical and physical planning is dynamically typed. 
   
   In this case, this is enumerating all valid types that can be (dynamically) passed to the function. If someone tries to call this function with e.g. a `ListArray`, the logical planner will error with a description that this function does not support that type.
   
   The order here matters because when a function is planned to be called with type `X` that is not supported by the function, the physical planner will try to (lossless) cast that type to a valid type for that functions, and it does so in the order of this array. In general these should be ordered from fastest to slowest (in the eyes of the implementation), so that the cast chooses the type with the fastest implementation.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r530215752



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -223,6 +224,101 @@ pub fn is_not_null(input: &Array) -> Result<BooleanArray> {
     Ok(BooleanArray::from(Arc::new(data)))
 }
 
+/// Copies original array, setting null bit to true if a secondary comparison boolean array is set to true.
+/// Typically used to implement NULLIF.
+// NOTE: For now this only supports Primitive Arrays.  Although the code could be made generic, the issue
+// is that currently the bitmap operations result in a final bitmap which is aligned to bit 0, and thus
+// the left array's data needs to be sliced to a new offset, and for non-primitive arrays shifting the
+// data might be too complicated.   In the future, to avoid shifting left array's data, we could instead
+// shift the final bitbuffer to the right, prepending with 0's instead.
+pub fn nullif<T>(
+    left: &PrimitiveArray<T>,
+    right: &BooleanArray,
+) -> Result<PrimitiveArray<T>>
+where
+    T: ArrowNumericType,
+{
+    if left.len() != right.len() {
+        return Err(ArrowError::ComputeError(
+            "Cannot perform comparison operation on arrays of different length"
+                .to_string(),
+        ));
+    }
+    let left_data = left.data();
+    let right_data = right.data();
+
+    // If left has no bitmap, create a new one with all values set for nullity op later
+    // left=0 (null)   right=null       output bitmap=null
+    // left=0          right=1          output bitmap=null
+    // left=1 (set)    right=null       output bitmap=set   (passthrough)
+    // left=1          right=1 & comp=true    output bitmap=null
+    // left=1          right=1 & comp=false   output bitmap=set
+    //
+    // Thus: result = left null bitmap & (!right_values | !right_bitmap)
+    //              OR left null bitmap & !(right_values & right_bitmap)
+    //
+    // Do the right expression !(right_values & right_bitmap) first since there are two steps
+    // TRICK: convert BooleanArray buffer as a bitmap for faster operation
+    let right_combo_buffer = match right.data().null_bitmap() {
+        Some(right_bitmap) => {
+            // NOTE: right values and bitmaps are combined and stay at bit offset right.offset()
+            (&right.values() & &right_bitmap.bits).ok().map(|b| b.not())
+        }
+        None => Some(!&right.values()),
+    };
+
+    // AND of original left null bitmap with right expression
+    // Here we take care of the possible offsets of the left and right arrays all at once.
+    let modified_null_buffer = match left_data.null_bitmap() {
+        Some(left_null_bitmap) => match right_combo_buffer {
+            Some(rcb) => Some(buffer_bin_and(
+                &left_null_bitmap.bits,
+                left_data.offset(),
+                &rcb,
+                right_data.offset(),
+                left_data.len(),
+            )),
+            None => Some(
+                left_null_bitmap
+                    .bits
+                    .bit_slice(left_data.offset(), left.len()),
+            ),
+        },
+        None => right_combo_buffer,

Review comment:
       I think the right side also needs to be sliced because in the ArrayData later it will be accessed with an offset of 0. It could be more efficient to do the slicing before computing the `!(values & bits)`, but it probably would not be a big difference.
   
   Otherwise this looks good.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r525066482



##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1385,6 +1385,80 @@ pub fn binary(
     Ok(Arc::new(BinaryExpr::new(l, op, r)))
 }
 
+/// Invoke a compute kernel on a primitive array and a Boolean Array
+macro_rules! compute_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
+        let ll = $LEFT
+            .as_any()
+            .downcast_ref::<$DT>()
+            .expect("compute_op failed to downcast array");
+        let rr = $RIGHT
+            .as_any()
+            .downcast_ref::<BooleanArray>()
+            .expect("compute_op failed to downcast array");
+        Ok(Arc::new($OP(&ll, &rr)?))
+    }};
+}
+
+/// Binary op between primitive and boolean arrays
+macro_rules! primitive_bool_array_op {
+    ($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
+        match $LEFT.data_type() {
+            DataType::Int8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int8Array),
+            DataType::Int16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int16Array),
+            DataType::Int32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int32Array),
+            DataType::Int64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Int64Array),
+            DataType::UInt8 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt8Array),
+            DataType::UInt16 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt16Array),
+            DataType::UInt32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt32Array),
+            DataType::UInt64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, UInt64Array),
+            DataType::Float32 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float32Array),
+            DataType::Float64 => compute_bool_array_op!($LEFT, $RIGHT, $OP, Float64Array),
+            other => Err(DataFusionError::Internal(format!(
+                "Unsupported data type {:?} for NULLIF/primitive/boolean operator",
+                other
+            ))),
+        }
+    }};
+}
+
+///
+/// Implements NULLIF(expr1, expr2)
+/// Args: 0 - left expr is any array
+///       1 - if the left is equal to this expr2, then the result is NULL, otherwise left value is passed.
+///
+pub fn nullif_func(args: &[ArrayRef]) -> Result<ArrayRef> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "{:?} args were supplied but NULLIF takes exactly two args",
+            args.len(),
+        )));
+    }
+
+    // Get args0 == args1 evaluated and produce a boolean array
+    let cond_array = binary_array_op!(args[0], args[1], eq)?;
+
+    // Now, invoke nullif on the result
+    primitive_bool_array_op!(args[0], *cond_array, nullif)
+}
+
+/// Currently supported types by the nullif function.
+/// The order of these types correspond to the order on which coercion applies
+/// This should thus be from least informative to most informative
+pub static SUPPORTED_NULLIF_TYPES: &'static [DataType] = &[
+    DataType::Boolean,
+    DataType::UInt8,
+    DataType::UInt16,
+    DataType::UInt32,
+    DataType::UInt64,
+    DataType::Int8,
+    DataType::Int16,
+    DataType::Int32,
+    DataType::Int64,
+    DataType::Float32,
+    DataType::Float64,
+];

Review comment:
       > Better to make these trait bounds with a good comment about how these are selected
   
   AFAIK these cannot be trait bounds because logical and physical planning are dynamically typed. 
   
   In this case, this is enumerating all valid types that can be (dynamically) passed to the function. If someone tries to call this function with e.g. a `ListArray`, the logical planner will error with a description that this function does not support that type.
   
   The order here matters because when a function is planned to be called with type `X` that is not supported by the function, the physical planner will try to (lossless) cast that type to a valid type for that functions, and it does so in the order of this array. In general these should be ordered from less informative to most informative, as to guarantee that the execution uses the smallest supported type (when casting is required).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r528960800



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+

Review comment:
       Thanks everyone for their comments.  
   
   It seems to me that taking a slice of the data buffer would be even more expensive than shifting the null buffer?  However it is easier to implement for now, so I will try @jhorstmann 's suggestion using `bit_slice()` on the data buffers - would rather get in something that works sooner and iterate later when @vertexclique 's PR is merged.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r526462964



##########
File path: rust/arrow/src/compute/kernels/boolean.rs
##########
@@ -457,4 +517,42 @@ mod tests {
         assert_eq!(true, res.value(2));
         assert_eq!(false, res.value(3));
     }
+
+    fn assert_array_eq<T: ArrowNumericType>(
+        expected: PrimitiveArray<T>,
+        actual: ArrayRef,
+    ) {
+        let actual = actual
+            .as_any()
+            .downcast_ref::<PrimitiveArray<T>>()
+            .expect("Actual array should unwrap to type of expected array");
+
+        for i in 0..expected.len() {
+            if expected.is_null(i) {
+                assert!(actual.is_null(i));
+            } else {
+                assert_eq!(expected.value(i), actual.value(i));
+            }
+        }
+    }
+
+    #[test]
+    fn test_nullif_int_array() {
+        let a = Int32Array::from(vec![Some(15), None, Some(8), Some(1), Some(9)]);
+        let comp =
+            BooleanArray::from(vec![Some(false), None, Some(true), Some(false), None]);
+        let res = nullif(&a, &comp).unwrap();
+
+        let expected = Int32Array::from(vec![
+            Some(15),
+            None,
+            None, // comp true, slot 2 turned into null
+            Some(1),
+            // Even though comp array / right is null, should still pass through original value
+            // comp true, slot 2 turned into null
+            Some(9),
+        ]);
+
+        assert_array_eq::<Int32Type>(expected, Arc::new(res));

Review comment:
       The null_count is fixed... so the test can use `assert_eq!`...  now onto the offsets, which is slightly trickier




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#discussion_r530549403



##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -508,6 +508,26 @@ async fn csv_query_avg_multi_batch() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn csv_query_nullif_divide_by_0() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx)?;
+    let sql = "SELECT c8/nullif(c7, 0) FROM aggregate_test_100";
+    let actual: Vec<_> = execute(&mut ctx, sql)
+        .await
+        .iter()
+        .map(|x| x[0].clone())
+        .collect();
+    let actual = actual.join("\n");
+    let expected = "1722\n92\n46\n679\n165\n146\n149\n93\n2211\n6495\n307\n139\n253\n123\n21\n84\n98\n13\n230\n\
+       277\n1\n986\n414\n144\n210\n0\n172\n165\n25\n97\n335\n558\n350\n369\n511\n245\n345\n8\n139\n55\n318\n2614\n\
+       1792\n16\n345\n123\n176\n1171\n20\n199\n147\n115\n335\n23\n847\n94\n315\n391\n176\n282\n459\n197\n978\n281\n\
+       27\n26\n281\n8124\n3\n430\n510\n61\n67\n17\n1601\n362\n202\n50\n10\n346\n258\n664\nNULL\n22\n164\n448\n365\n\
+       1640\n671\n203\n2087\n10060\n1015\n913\n9840\n16\n496\n264\n38\n1";
+    assert_eq!(expected, actual);
+    Ok(())
+}

Review comment:
       it is more important to test the main cases here than have a large dataset. You are already providing excellent coverage with the other tests, this is more like an end-to-end test to verify the result in integration, so we do not need to cover many situations.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] velvia commented on pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

Posted by GitBox <gi...@apache.org>.
velvia commented on pull request #8688:
URL: https://github.com/apache/arrow/pull/8688#issuecomment-732451343


   @jorgecarleitao and others: 
   - All outstanding issues should be resolved now, including the nonzero array offsets
   - Rebased to latest master as of today
   - I did not make the nullif kernel more generic, the reason is that I took @vertexclique 's suggestion to slice the left array data buffer, which allows us to work with the shifted/0-based null combined bitmaps.  The result of this is that we must assume that the data buffers are primitive, otherwise shifting/slicing the data buffers might not work for non primitive types or be more complex.  Let me know what you think.  
   The alternative is to re-shift the combined bitmaps back to `left.offset()`, which might have space implications, but would be more generic (would not require any change of the left array's data buffers), and would have to wait for the other PR as well most likely....  I personally would prefer not to wait   :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org