Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/29 04:58:16 UTC

[GitHub] [arrow-datafusion] AssHero opened a new pull request, #2810: use non-null then result data type as return type for CaseExpr

AssHero opened a new pull request, #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810

   # Which issue does this PR close?
   
   Closes #2798 
   
   # Rationale for this change
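   Use the data type of the first non-null THEN expression as the return data type for `CaseExpr`, falling back to the ELSE expression's data type when every THEN result is null.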
   
   For this query: `select case when b is null then null else b end from (select a,b from (values (1,null),(2,3)) as t (a,b)) a;`
   
   We use the data type of the ELSE expression (`b`) as the return data type, because the only THEN result is `NULL`.
   
   For this query: `select case when b is null then null when b >= 3 then b + 1 else b end from (select a,b from (values (1,null),(2,3)) as t (a,b)) a;`
   
   We use the data type of the THEN expression `b + 1`, the first non-null THEN result, as the return data type.
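   
   Below is a minimal sketch of the selection rule described above. It is not the code in this PR. It is written against a hypothetical two-variant `DataType` enum that stands in for arrow's real `DataType`. The rule takes the first THEN type that is not `Null`. If there is none, it falls back to the ELSE type, and if that is also missing it returns `Null`.
   
   ```rust
   /// Hypothetical stand-in for arrow's `DataType`; only the variants needed here.
   #[derive(Debug, Clone, PartialEq)]
   enum DataType {
       Null,
       Int64,
   }
   
   /// Sketch of the rule: the first non-`Null` THEN type wins; if every THEN
   /// branch is `Null`, use the ELSE type (if there is an ELSE); otherwise `Null`.
   fn case_return_type(then_types: &[DataType], else_type: Option<&DataType>) -> DataType {
       then_types
           .iter()
           .find(|t| **t != DataType::Null) // skip untyped NULL literals
           .or(else_type) // all THENs were NULL: try the ELSE branch
           .cloned()
           .unwrap_or(DataType::Null) // nothing typed at all: result is Null
   }
   
   fn main() {
       // CASE WHEN b IS NULL THEN NULL ELSE b END: only the ELSE branch is typed.
       let q1 = case_return_type(&[DataType::Null], Some(&DataType::Int64));
       // CASE WHEN b IS NULL THEN NULL WHEN b >= 3 THEN b + 1 ELSE b END
       let q2 = case_return_type(&[DataType::Null, DataType::Int64], Some(&DataType::Int64));
       // Every branch is NULL and there is no ELSE: the result stays Null-typed.
       let q3 = case_return_type(&[DataType::Null], None);
       println!("{:?} {:?} {:?}", q1, q2, q3);
   }
   ```
   
   Under this rule the first query resolves to `Int64`, the type of `b` in the test schema. The previous code always read the type of the first THEN branch, which here is the `NULL` literal, so it produced `Null` instead.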
   
   # What changes are included in this PR?
   Modify the `data_type` function of `CaseExpr` in `datafusion/physical-expr/src/expressions/case.rs`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2810: use non-null then result data type as return type for CaseExpr

alamb commented on code in PR #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810#discussion_r910507550


##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -228,7 +238,23 @@ impl PhysicalExpr for CaseExpr {
     }
 
     fn data_type(&self, input_schema: &Schema) -> Result<DataType> {
-        self.when_then_expr[0].1.data_type(input_schema)
+        // since all then results have the same data type, we can choose any one as the
+        // return data type except for the null.
+        let mut data_type = DataType::Null;
+        for i in 0..self.when_then_expr.len() {
+            data_type = self.when_then_expr[i].1.data_type(input_schema)?;
+            if !data_type.equals_datatype(&DataType::Null) {
+                break;
+            }
+        }
+        // if all then results are null, we use data type of else expr instead if possible.

Review Comment:
   If they are all null, I would think the output type is also null.



##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -138,7 +138,12 @@ impl CaseExpr {
             let then_value = self.when_then_expr[i]
                 .1
                 .evaluate_selection(batch, &when_match)?;
-            let then_value = then_value.into_array(batch.num_rows());
+            let then_value = match then_value {
+                ColumnarValue::Scalar(value) if value.is_null() => {
+                    new_null_array(&return_type, batch.num_rows())
+                }

Review Comment:
   I don't understand why this code is needed -- I would have expected that `then_value.into_array()` would have worked 




[GitHub] [arrow-datafusion] alamb commented on pull request #2810: Fix data type calculation for `CaseExpr` s with `NULLs`

alamb commented on PR #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810#issuecomment-1170621950

   Thanks again @AssHero -- keep the PRs rolling 🚋



[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2810: use non-null then result data type as return type for CaseExpr

alamb commented on code in PR #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810#discussion_r910506837


##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -117,7 +117,7 @@ impl CaseExpr {
     ///     [ELSE result]
     /// END
     fn case_when_with_expr(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
-        let return_type = self.when_then_expr[0].1.data_type(&batch.schema())?;
+        let return_type = self.data_type(&*batch.schema())?;

Review Comment:
   👍




[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2810: use non-null then result data type as return type for CaseExpr

codecov-commenter commented on PR #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810#issuecomment-1169559340

   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2810?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#2810](https://codecov.io/gh/apache/arrow-datafusion/pull/2810?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (38e472d) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/839a61896f0fc3f617a79da1dd4018b2aa6af283?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (839a618) will **increase** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #2810      +/-   ##
   ==========================================
   + Coverage   85.20%   85.22%   +0.01%     
   ==========================================
     Files         274      274              
     Lines       48666    48701      +35     
   ==========================================
   + Hits        41468    41506      +38     
   + Misses       7198     7195       -3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/2810?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/core/tests/sql/expr.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2810/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3Rlc3RzL3NxbC9leHByLnJz) | `99.83% <100.00%> (+<0.01%)` | :arrow_up: |
   | [datafusion/physical-expr/src/expressions/case.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2810/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9waHlzaWNhbC1leHByL3NyYy9leHByZXNzaW9ucy9jYXNlLnJz) | `92.08% <100.00%> (+0.76%)` | :arrow_up: |
   | [datafusion/core/src/physical\_plan/metrics/value.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2810/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9waHlzaWNhbF9wbGFuL21ldHJpY3MvdmFsdWUucnM=) | `86.93% <0.00%> (-0.51%)` | :arrow_down: |
   | [datafusion/common/src/scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2810/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb21tb24vc3JjL3NjYWxhci5ycw==) | `74.94% <0.00%> (+0.11%)` | :arrow_up: |
   | [datafusion/expr/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2810/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `74.31% <0.00%> (+0.19%)` | :arrow_up: |
   | [datafusion/expr/src/expr\_schema.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2810/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9leHByX3NjaGVtYS5ycw==) | `69.59% <0.00%> (+0.67%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2810?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2810?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [839a618...38e472d](https://codecov.io/gh/apache/arrow-datafusion/pull/2810?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   



[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2810: use non-null then result data type as return type for CaseExpr

alamb commented on code in PR #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810#discussion_r910509404


##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -138,7 +138,12 @@ impl CaseExpr {
             let then_value = self.when_then_expr[i]
                 .1
                 .evaluate_selection(batch, &when_match)?;
-            let then_value = then_value.into_array(batch.num_rows());
+            let then_value = match then_value {
+                ColumnarValue::Scalar(value) if value.is_null() => {
+                    new_null_array(&return_type, batch.num_rows())
+                }

Review Comment:
   Update: I removed it and the tests errored 🤣
   
   ```
   ---- sql::expr::case_expr_with_null stdout ----
   thread 'sql::expr::case_expr_with_null' panicked at 'called `Result::unwrap()` on an `Err` value: "ArrowError(InvalidArgumentError(\"arguments need to have the same data type\")) at Executing physical plan for 'select case when b is null then null else b end from (select a,b from (values (1,null),(2,3)) as t (a,b)) a;': ProjectionExec { expr: [(CaseExpr { expr: None, when_then_expr: [(IsNullExpr { arg: Column { name: \"b\", index: 0 } }, Literal { value: NULL })], else_expr: Some(Column { name: \"b\", index: 0 }) }, \"CASE WHEN #a.b IS NULL THEN NULL ELSE #a.b END\")], schema: Schema { fields: [Field { name: \"CASE WHEN #a.b IS NULL THEN NULL ELSE #a.b END\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: ProjectionExec { expr: [(Column { name: \"b\", index: 0 }, \"b\")], schema: Schema { fields: [Field { name: \"b\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {
 } }, input: ProjectionExec { expr: [(Column { name: \"b\", index: 0 }, \"b\")], schema: Schema { fields: [Field { name: \"b\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: ProjectionExec { expr: [(Column { name: \"column2\", index: 0 }, \"b\")], schema: Schema { fields: [Field { name: \"b\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: ProjectionExec { expr: [(Column { name: \"column2\", index: 1 }, \"column2\")], schema: Schema { fields: [Field { name: \"column2\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: ValuesExec { schema: Schema { fields: [Field { name: \"column1\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"column2\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: 
 {} }, data: [RecordBatch { schema: Schema { fields: [Field { name: \"column1\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"column2\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, columns: [PrimitiveArray<Int64>\n[\n  1,\n  2,\n], PrimitiveArray<Int64>\n[\n  null,\n  3,\n]], row_count: 2 }] }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }"', datafusion/core/tests/sql/mod.rs:641:10
   ```
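   
   The failure above can be reproduced in miniature. The sketch below does not use the real arrow or DataFusion types. It uses hypothetical, simplified stand-ins: `DataType`, `Array`, `ColumnarValue`, `zip` and `new_null_array`. It also assumes that the kernel combining the branches rejects arrays of different types, as the error message suggests.
   
   Under those assumptions, expanding the `NULL` literal with its own type gives a `Null`-typed array, which cannot be combined with the `Int64` column `b`. Building a null array of the CASE return type instead succeeds.
   
   ```rust
   // Hypothetical, simplified stand-ins; NOT the arrow / DataFusion types.
   #[derive(Debug, Clone, PartialEq)]
   enum DataType { Null, Int64 }
   
   #[derive(Debug, Clone, PartialEq)]
   struct Array { data_type: DataType, values: Vec<Option<i64>> }
   
   #[allow(dead_code)]
   enum ColumnarValue {
       Scalar(DataType, Option<i64>), // one typed value, logically repeated per row
       Array(Array),
   }
   
   impl ColumnarValue {
       // Modeled on `into_array`: a scalar expands to an array of its *own* type,
       // so an untyped SQL NULL literal becomes a Null-typed array.
       fn into_array(self, num_rows: usize) -> Array {
           match self {
               ColumnarValue::Scalar(t, v) => Array { data_type: t, values: vec![v; num_rows] },
               ColumnarValue::Array(a) => a,
           }
       }
   }
   
   // Hypothetical selection kernel: picks `truthy` where mask is true, else `falsy`,
   // and (by assumption) refuses to mix arrays of different types.
   fn zip(mask: &[bool], truthy: &Array, falsy: &Array) -> Result<Array, String> {
       if truthy.data_type != falsy.data_type {
           return Err("arguments need to have the same data type".to_string());
       }
       let values = mask
           .iter()
           .zip(truthy.values.iter().zip(&falsy.values))
           .map(|(&m, (t, f))| if m { *t } else { *f })
           .collect();
       Ok(Array { data_type: truthy.data_type.clone(), values })
   }
   
   // Hypothetical helper: an all-null array carrying the requested type.
   fn new_null_array(data_type: &DataType, num_rows: usize) -> Array {
       Array { data_type: data_type.clone(), values: vec![None; num_rows] }
   }
   
   fn main() {
       // `b` holds (NULL, 3); WHEN `b IS NULL` therefore yields the mask [true, false].
       let mask = [true, false];
       let b = Array { data_type: DataType::Int64, values: vec![None, Some(3)] };
   
       // THEN NULL expanded with its own (Null) type cannot be combined with Int64 `b`.
       let naive = ColumnarValue::Scalar(DataType::Null, None).into_array(2);
       assert!(zip(&mask, &naive, &b).is_err());
   
       // Expanding NULL to the CASE return type (Int64) makes the types line up.
       let typed = new_null_array(&DataType::Int64, 2);
       let out = zip(&mask, &typed, &b).unwrap();
       println!("{:?}", out.values); // [None, Some(3)]
   }
   ```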




[GitHub] [arrow-datafusion] alamb merged pull request #2810: Fix data type calculation for `CaseExpr` s with `NULLs`

alamb merged PR #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810

