Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/29 04:58:16 UTC
[GitHub] [arrow-datafusion] AssHero opened a new pull request, #2810: use non-null then result data type as return type for CaseExpr
AssHero opened a new pull request, #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810
# Which issue does this PR close?
Closes #2798
# Rationale for this change
Use the data type of a non-null then/else expression as the return data type for `CaseExpr`.
For this query: `select case when b is null then null else b end from (select a,b from (values (1,null),(2,3)) as t (a,b)) a;`
we use the data type of the else expression (`b`) as the return data type.
For this query: `select case when b is null then null when b >= 3 then b + 1 else b end from (select a,b from (values (1,null),(2,3)) as t (a,b)) a;`
we use the data type of the then expression (`b + 1`) as the return data type.
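The rule described above can be sketched in isolation. This is a minimal model under stated assumptions, not the DataFusion code: `DataType` here is a stand-in enum rather than Arrow's type, and `case_return_type` is a hypothetical helper name. It picks the first THEN branch whose type is not Null and falls back to the ELSE type when every THEN branch is null.

```rust
/// Stand-in for Arrow's `DataType`; only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int64,
}

/// Hypothetical helper: choose the return type of a CASE expression from the
/// types of its THEN branches and its optional ELSE branch.
fn case_return_type(then_types: &[DataType], else_type: Option<&DataType>) -> DataType {
    // Prefer the first THEN branch whose type is not Null.
    if let Some(t) = then_types.iter().find(|t| **t != DataType::Null) {
        return t.clone();
    }
    // Every THEN branch is NULL: use the ELSE type if there is one,
    // otherwise the whole expression is typed Null.
    else_type.cloned().unwrap_or(DataType::Null)
}
```

For the first query the THEN types are `[Null]` and the ELSE type is `Int64`, so the sketch returns `Int64`. For the second query the THEN types are `[Null, Int64]`, so the second THEN branch decides the result.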
# What changes are included in this PR?
Modify the `data_type` function of `CaseExpr` in `datafusion/physical-expr/src/expressions/case.rs`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2810: use non-null then result data type as return type for CaseExpr
Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810#discussion_r910507550
##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -228,7 +238,23 @@ impl PhysicalExpr for CaseExpr {
     }
     fn data_type(&self, input_schema: &Schema) -> Result<DataType> {
-        self.when_then_expr[0].1.data_type(input_schema)
+        // since all then results have the same data type, we can choose any one as the
+        // return data type except for the null.
+        let mut data_type = DataType::Null;
+        for i in 0..self.when_then_expr.len() {
+            data_type = self.when_then_expr[i].1.data_type(input_schema)?;
+            if !data_type.equals_datatype(&DataType::Null) {
+                break;
+            }
+        }
+        // if all then results are null, we use data type of else expr instead if possible.
Review Comment:
if they are all null, the output type is also null I would think.
##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -138,7 +138,12 @@ impl CaseExpr {
             let then_value = self.when_then_expr[i]
                 .1
                 .evaluate_selection(batch, &when_match)?;
-            let then_value = then_value.into_array(batch.num_rows());
+            let then_value = match then_value {
+                ColumnarValue::Scalar(value) if value.is_null() => {
+                    new_null_array(&return_type, batch.num_rows())
+                }
Review Comment:
I don't understand why this code is needed -- I would have expected that `then_value.into_array()` would have worked
[GitHub] [arrow-datafusion] alamb commented on pull request #2810: Fix data type calculation for `CaseExpr` s with `NULLs`
Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810#issuecomment-1170621950
Thanks again @AssHero -- keep the PRs rolling
[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2810: use non-null then result data type as return type for CaseExpr
Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810#discussion_r910506837
##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -117,7 +117,7 @@ impl CaseExpr {
     /// [ELSE result]
     /// END
     fn case_when_with_expr(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
-        let return_type = self.when_then_expr[0].1.data_type(&batch.schema())?;
+        let return_type = self.data_type(&*batch.schema())?;
Review Comment:
👍
[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2810: use non-null then result data type as return type for CaseExpr
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810#issuecomment-1169559340
# [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2810?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#2810](https://codecov.io/gh/apache/arrow-datafusion/pull/2810?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (38e472d) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/839a61896f0fc3f617a79da1dd4018b2aa6af283?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (839a618) will **increase** coverage by `0.01%`.
> The diff coverage is `100.00%`.
```diff
@@            Coverage Diff             @@
##           master    #2810      +/-   ##
==========================================
+ Coverage   85.20%   85.22%   +0.01%
==========================================
  Files         274      274
  Lines       48666    48701      +35
==========================================
+ Hits        41468    41506      +38
+ Misses       7198     7195       -3
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/2810?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [datafusion/core/tests/sql/expr.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2810/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3Rlc3RzL3NxbC9leHByLnJz) | `99.83% <100.00%> (+<0.01%)` | :arrow_up: |
| [datafusion/physical-expr/src/expressions/case.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2810/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9waHlzaWNhbC1leHByL3NyYy9leHByZXNzaW9ucy9jYXNlLnJz) | `92.08% <100.00%> (+0.76%)` | :arrow_up: |
| [datafusion/core/src/physical\_plan/metrics/value.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2810/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9waHlzaWNhbF9wbGFuL21ldHJpY3MvdmFsdWUucnM=) | `86.93% <0.00%> (-0.51%)` | :arrow_down: |
| [datafusion/common/src/scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2810/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb21tb24vc3JjL3NjYWxhci5ycw==) | `74.94% <0.00%> (+0.11%)` | :arrow_up: |
| [datafusion/expr/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2810/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `74.31% <0.00%> (+0.19%)` | :arrow_up: |
| [datafusion/expr/src/expr\_schema.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2810/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9leHByX3NjaGVtYS5ycw==) | `69.59% <0.00%> (+0.67%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2810?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2810?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [839a618...38e472d](https://codecov.io/gh/apache/arrow-datafusion/pull/2810?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2810: use non-null then result data type as return type for CaseExpr
Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810#discussion_r910509404
##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -138,7 +138,12 @@ impl CaseExpr {
             let then_value = self.when_then_expr[i]
                 .1
                 .evaluate_selection(batch, &when_match)?;
-            let then_value = then_value.into_array(batch.num_rows());
+            let then_value = match then_value {
+                ColumnarValue::Scalar(value) if value.is_null() => {
+                    new_null_array(&return_type, batch.num_rows())
+                }
Review Comment:
Update: I removed it and the tests errored 🤣
```
---- sql::expr::case_expr_with_null stdout ----
thread 'sql::expr::case_expr_with_null' panicked at 'called `Result::unwrap()` on an `Err` value: "ArrowError(InvalidArgumentError(\"arguments need to have the same data type\")) at Executing physical plan for 'select case when b is null then null else b end from (select a,b from (values (1,null),(2,3)) as t (a,b)) a;': ProjectionExec { expr: [(CaseExpr { expr: None, when_then_expr: [(IsNullExpr { arg: Column { name: \"b\", index: 0 } }, Literal { value: NULL })], else_expr: Some(Column { name: \"b\", index: 0 }) }, \"CASE WHEN #a.b IS NULL THEN NULL ELSE #a.b END\")], schema: Schema { fields: [Field { name: \"CASE WHEN #a.b IS NULL THEN NULL ELSE #a.b END\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: ProjectionExec { expr: [(Column { name: \"b\", index: 0 }, \"b\")], schema: Schema { fields: [Field { name: \"b\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {
} }, input: ProjectionExec { expr: [(Column { name: \"b\", index: 0 }, \"b\")], schema: Schema { fields: [Field { name: \"b\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: ProjectionExec { expr: [(Column { name: \"column2\", index: 0 }, \"b\")], schema: Schema { fields: [Field { name: \"b\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: ProjectionExec { expr: [(Column { name: \"column2\", index: 1 }, \"column2\")], schema: Schema { fields: [Field { name: \"column2\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: ValuesExec { schema: Schema { fields: [Field { name: \"column1\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"column2\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata:
{} }, data: [RecordBatch { schema: Schema { fields: [Field { name: \"column1\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"column2\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, columns: [PrimitiveArray<Int64>\n[\n 1,\n 2,\n], PrimitiveArray<Int64>\n[\n null,\n 3,\n]], row_count: 2 }] }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }"', datafusion/core/tests/sql/mod.rs:641:10
```
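The failure above appears consistent with the NULL literal being expanded into an array typed Null while the ELSE branch is `Int64`. The toy model below illustrates that mismatch; it is a sketch under assumptions, not Arrow or DataFusion code. `Array`, `untyped_null_array`, `typed_null_array`, and `select` are hypothetical stand-ins, and `select` only imitates a kernel that rejects inputs of different types.

```rust
/// Stand-in for Arrow's `DataType`; only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int64,
}

/// Toy array: a data type plus nullable values.
#[derive(Debug, Clone, PartialEq)]
struct Array {
    data_type: DataType,
    values: Vec<Option<i64>>,
}

/// Models expanding an untyped NULL scalar: the result is typed Null.
fn untyped_null_array(len: usize) -> Array {
    Array { data_type: DataType::Null, values: vec![None; len] }
}

/// Models the approach in the diff: build the all-null array with the
/// CASE expression's return type instead of the scalar's own type.
fn typed_null_array(data_type: &DataType, len: usize) -> Array {
    Array { data_type: data_type.clone(), values: vec![None; len] }
}

/// Row-wise choice between two arrays; refuses inputs of different types.
/// Assumes `mask`, `then_values`, and `else_values` all have the same length.
fn select(mask: &[bool], then_values: &Array, else_values: &Array) -> Result<Array, String> {
    if then_values.data_type != else_values.data_type {
        return Err("arguments need to have the same data type".to_string());
    }
    let values: Vec<Option<i64>> = mask
        .iter()
        .enumerate()
        .map(|(i, &m)| if m { then_values.values[i] } else { else_values.values[i] })
        .collect();
    Ok(Array { data_type: then_values.data_type.clone(), values })
}
```

With column `b` modeled as an `Int64` array holding `[None, Some(3)]` and a mask of `[true, false]`, passing `untyped_null_array(2)` as the THEN input makes `select` return an error, while `typed_null_array(&DataType::Int64, 2)` lets it succeed with an `Int64` result.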
[GitHub] [arrow-datafusion] alamb merged pull request #2810: Fix data type calculation for `CaseExpr` s with `NULLs`
Posted by GitBox <gi...@apache.org>.
alamb merged PR #2810:
URL: https://github.com/apache/arrow-datafusion/pull/2810