You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/28 02:55:11 UTC

[GitHub] [arrow-datafusion] WinkerDu opened a new pull request, #2364: introduce `ScalarValue::Null` to df

WinkerDu opened a new pull request, #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #2363 .
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   To solve Null constants issues listed in #1184 , and since [/apache/arrow-rs#1572](https://github.com/apache/arrow-rs/pull/1572 ) Null casted from and to most of types in arrow-rs kernel, it's reasonable that introduce Null type to df for type coercion.
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   Introduce `ScalarValue::Null` type to df
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   No.
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1113993162

   This looks great @WinkerDu  -- thank you -- I am working to get arrow 13 -- see  https://github.com/apache/arrow-datafusion/pull/2382. It should only be a few more days now. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2364: introduce `ScalarValue::Null` to df

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#discussion_r861280076


##########
Cargo.toml:
##########
@@ -38,3 +38,6 @@ exclude = ["datafusion-cli"]
 [profile.release]
 codegen-units = 1
 lto = true
+
+[patch.crates-io]

Review Comment:
   BTW arrow-rs 13.0.0 with these changes should be released sometime early next wee



##########
datafusion/common/src/scalar.rs:
##########
@@ -836,6 +844,10 @@ impl ScalarValue {
                     ScalarValue::iter_to_decimal_array(scalars, precision, scale)?;
                 Arc::new(decimal_array)
             }
+            DataType::Null => {
+                let null_array = ScalarValue::iter_to_null_array(scalars)?;

Review Comment:
   Can you use https://docs.rs/arrow/12.0.0/arrow/array/fn.new_null_array.html instead? 
   
   ```rust
   new_null_array(DataType::Null)
   ``` 
   



##########
datafusion/common/src/scalar.rs:
##########
@@ -39,6 +39,8 @@ use std::{convert::TryFrom, fmt, iter::repeat, sync::Arc};
 /// This is the single-valued counter-part of arrow’s `Array`.
 #[derive(Clone)]
 pub enum ScalarValue {
+    /// represents null
+    Null,

Review Comment:
   πŸ‘ 



##########
datafusion/common/src/scalar.rs:
##########
@@ -325,6 +329,8 @@ impl std::hash::Hash for ScalarValue {
                 v.hash(state);
                 t.hash(state);
             }
+            // stable hash for Null value

Review Comment:
   πŸ‘ 



##########
datafusion/common/src/scalar.rs:
##########
@@ -170,6 +172,7 @@ impl PartialEq for ScalarValue {
             (IntervalMonthDayNano(_), _) => false,
             (Struct(v1, t1), Struct(v2, t2)) => v1.eq(v2) && t1.eq(t2),
             (Struct(_, _), _) => false,
+            (Null, _) => false,

Review Comment:
   I think we need to also add the postive case:
   
   ```suggestion
               (Null, Null) => true,
               (Null, _) => false,
   ```



##########
datafusion/common/src/scalar.rs:
##########
@@ -39,6 +39,8 @@ use std::{convert::TryFrom, fmt, iter::repeat, sync::Arc};
 /// This is the single-valued counter-part of arrow’s `Array`.
 #[derive(Clone)]
 pub enum ScalarValue {
+    /// represents null

Review Comment:
   ```suggestion
       /// represents `DataType::Null` (castable to/from any other type)
   ```



##########
datafusion/common/src/scalar.rs:
##########
@@ -1522,6 +1550,7 @@ impl ScalarValue {
                 eq_array_primitive!(array, index, IntervalMonthDayNanoArray, val)
             }
             ScalarValue::Struct(_, _) => unimplemented!(),
+            ScalarValue::Null => false,

Review Comment:
   πŸ€”  



##########
datafusion/expr/src/binary_rule.rs:
##########
@@ -590,8 +590,30 @@ fn eq_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
     numerical_coercion(lhs_type, rhs_type)
         .or_else(|| dictionary_coercion(lhs_type, rhs_type))
         .or_else(|| temporal_coercion(lhs_type, rhs_type))
+        .or_else(|| null_coercion(lhs_type, rhs_type))
 }
 
+/// coercion rules from NULL type. Since NULL can be casted to most of types in arrow,
+/// either lhs or rhs is NULL, if NULL can be casted to type of the other side, the coecion is valid.
+fn null_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {

Review Comment:
   very cool



##########
datafusion/core/src/sql/planner.rs:
##########
@@ -1445,7 +1445,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                         SQLExpr::Value(Value::Number(n, _)) => parse_sql_number(&n),
                         SQLExpr::Value(Value::SingleQuotedString(s)) => Ok(lit(s)),
                         SQLExpr::Value(Value::Null) => {
-                            Ok(Expr::Literal(ScalarValue::Utf8(None)))
+                            Ok(Expr::Literal(ScalarValue::Null))

Review Comment:
   πŸŽ‰ 



##########
datafusion/physical-expr/src/expressions/nullif.rs:
##########
@@ -17,7 +17,7 @@
 
 use std::sync::Arc;
 
-use crate::expressions::binary::{eq_decimal, eq_decimal_scalar};
+use crate::expressions::binary::{eq_decimal, eq_decimal_scalar, eq_null};

Review Comment:
   is this needed?



##########
datafusion/common/src/scalar.rs:
##########
@@ -270,6 +273,7 @@ impl PartialOrd for ScalarValue {
                 }
             }
             (Struct(_, _), _) => None,
+            (Null, _) => None,

Review Comment:
   Two nulls are always equal I think
   
   ```suggestion
               (Null, Null) => Some(Equal),
               (Null, _) => None,
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1119834678

   πŸŽ‰  we are going to have real null support 😍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#discussion_r864844502


##########
datafusion/core/tests/sql/expr.rs:
##########
@@ -121,25 +121,25 @@ async fn case_when_else_with_null_contant() -> Result<()> {
         FROM t1";
     let actual = execute_to_batches(&ctx, sql).await;
     let expected = vec![
-        "+----------------------------------------------------------------------------------------------+",
-        "| CASE WHEN #t1.c1 = Utf8(\"a\") THEN Int64(1) WHEN Utf8(NULL) THEN Int64(2) ELSE Int64(999) END |",
-        "+----------------------------------------------------------------------------------------------+",
-        "| 1                                                                                            |",
-        "| 999                                                                                          |",
-        "| 999                                                                                          |",
-        "| 999                                                                                          |",
-        "+----------------------------------------------------------------------------------------------+",
+        "+----------------------------------------------------------------------------------------+",
+        "| CASE WHEN #t1.c1 = Utf8(\"a\") THEN Int64(1) WHEN NULL THEN Int64(2) ELSE Int64(999) END |",

Review Comment:
   This is the key difference here. Very nice πŸ‘ 



##########
datafusion/physical-expr/src/expressions/binary.rs:
##########
@@ -1336,6 +1360,28 @@ fn is_distinct_from_utf8<OffsetSize: StringOffsetSizeTrait>(
         .collect())
 }
 
+fn is_distinct_from_null(left: &NullArray, _right: &NullArray) -> Result<BooleanArray> {
+    let length = left.len();
+    make_boolean_array(length, false)
+}
+
+fn is_not_distinct_from_null(
+    left: &NullArray,
+    _right: &NullArray,
+) -> Result<BooleanArray> {
+    let length = left.len();
+    make_boolean_array(length, true)
+}
+
+pub fn eq_null(left: &NullArray, _right: &NullArray) -> Result<BooleanArray> {
+    let length = left.len();
+    make_boolean_array(length, false)
+}

Review Comment:
   I don't think this is correct -- specifically I think the resulting boolean array should be full of nulls not false
   
   Perhaps something like: 
   
   ```rust
       std::iter::repeat(left.len(), None).collect()
   ```



##########
datafusion/core/tests/sql/joins.rs:
##########
@@ -829,7 +829,13 @@ async fn inner_join_nulls() {
     let sql = "SELECT * FROM (SELECT null AS id1) t1
             INNER JOIN (SELECT null AS id2) t2 ON id1 = id2";
 
-    let expected = vec!["++", "++"];
+    let expected = vec![

Review Comment:
   This answer is not correct -- there should be no rows that match.
   
   This is because the join should produce rows where `id1 = id2` evaluates to `true` 
   
   However, `null = null` evaluates to `null` 🀯 
   
   Here is the query in postgres:
   
   ```sql
   alamb=# SELECT * FROM (SELECT null AS id1) t1
               INNER JOIN (SELECT null AS id2) t2 ON id1 = id2
   alamb-# ;
    id1 | id2 
   -----+-----
   (0 rows)
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] WinkerDu commented on pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
WinkerDu commented on PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1116907464

   Since #2382 is merged, I switch this pr to `open` status for code reviewing


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#discussion_r865389201


##########
datafusion/core/src/sql/planner.rs:
##########
@@ -1445,7 +1445,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                         SQLExpr::Value(Value::Number(n, _)) => parse_sql_number(&n),
                         SQLExpr::Value(Value::SingleQuotedString(s)) => Ok(lit(s)),
                         SQLExpr::Value(Value::Null) => {
-                            Ok(Expr::Literal(ScalarValue::Utf8(None)))
+                            Ok(Expr::Literal(ScalarValue::Null))

Review Comment:
   Well this explains a lot of odd type coercion errors I have been seeing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] WinkerDu commented on pull request #2364: introduce `ScalarValue::Null` to df

Posted by GitBox <gi...@apache.org>.
WinkerDu commented on PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1111719300

   cc @alamb @andygrove @yjshen @xudong963 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] WinkerDu commented on pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
WinkerDu commented on PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1113626308

   cc @alamb  PTAL, thank you ❀️ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] WinkerDu commented on pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
WinkerDu commented on PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1114817283

   > This looks great @WinkerDu -- thank you -- I am working to get arrow 13 -- see #2382. It should only be a few more days now.
   
   Thanks @alamb look forward to it !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] WinkerDu commented on pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
WinkerDu commented on PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1120132022

   Thank you all @alamb @andygrove 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#discussion_r867029985


##########
datafusion/core/tests/sql/joins.rs:
##########
@@ -829,7 +829,11 @@ async fn inner_join_nulls() {
     let sql = "SELECT * FROM (SELECT null AS id1) t1
             INNER JOIN (SELECT null AS id2) t2 ON id1 = id2";
 
-    let expected = vec!["++", "++"];
+    #[rustfmt::skip]

Review Comment:
   πŸ‘ 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] WinkerDu commented on pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
WinkerDu commented on PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1118998134

   cc @alamb  PTAL, thank you


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] WinkerDu commented on pull request #2364: introduce `ScalarValue::Null` to df

Posted by GitBox <gi...@apache.org>.
WinkerDu commented on PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1111693447

   For someone reviews this pr,
   since `arrow = { version = "12" }` doesn't contain patch [/apache/arrow-rs#1572](https://github.com/apache/arrow-rs/pull/1572), thanks for @alamb suggestion, I create `winkerdu/test_null_cast` in my private repo like:
   ```
   git checkout master -b winkerdu/test_null_cast
   git reset --hard dbc47e030c3038d40dbef39fbf3b39ae41f9e98a                 # prepare for version 12.0.0 release
   git cherry-pick b50cc737efa2a4f2b1e27c0f3da3bc0403c6a2b6
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] WinkerDu commented on pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
WinkerDu commented on PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1116966804

   cc @alamb @andygrove @yjshen @xudong963 
   Please have a review, thank you


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
alamb merged PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #2364:
URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1117943448

   This is looking great. Thanks @WinkerDu! I don't have any additional comments beyond what @alamb has already raised


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org