Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/11 16:14:22 UTC
[GitHub] [arrow-datafusion] alamb opened a new issue, #2873: Error with CASE and DictionaryArrays: `ArrowError(InvalidArgumentError("arguments need to have the same data type"))`
alamb opened a new issue, #2873:
URL: https://github.com/apache/arrow-datafusion/issues/2873
**Describe the bug**
For a `DictionaryArray` column `col`, evaluating an expression like
```sql
CASE
WHEN col IS NULL THEN ''
ELSE col
END
```
generates an error:
```
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(InvalidArgumentError("arguments need to have the same data type"))', src/main.rs:45:82
```
**To Reproduce**
```rust
use std::sync::Arc;
use datafusion::arrow::datatypes::Int32Type;
use datafusion::prelude::*;
use datafusion::arrow::array::DictionaryArray;
use datafusion::datasource::MemTable;
use datafusion::logical_plan::{LogicalPlanBuilder, provider_as_source, when};
use datafusion::physical_plan::collect;
use datafusion::error::Result;
use datafusion::arrow::{self, record_batch::RecordBatch};

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    let host: DictionaryArray<Int32Type> = vec![Some("host1"), None, Some("host2")]
        .into_iter()
        .collect();

    let batch = RecordBatch::try_from_iter(vec![
        ("host", Arc::new(host) as _),
    ]).unwrap();

    let t = MemTable::try_new(batch.schema(), vec![vec![batch]]).unwrap();

    let expr = when(col("host").is_null(), lit(""))
        .otherwise(col("host"))
        .unwrap();

    let projection = None;
    let builder = LogicalPlanBuilder::scan(
        "cpu_load_short",
        provider_as_source(Arc::new(t)),
        projection
    ).unwrap();

    let logical_plan = builder
        .project(vec![expr])
        .unwrap()
        .build()
        .unwrap();

    // manually optimize the plan
    let physical_plan = ctx.create_physical_plan(&logical_plan).await.unwrap();
    let results: Vec<RecordBatch> = collect(physical_plan, ctx.task_ctx()).await.unwrap();

    // format the results
    println!("Results:\n\n{}", arrow::util::pretty::pretty_format_batches(&results).unwrap());

    Ok(())
}
```
```
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(InvalidArgumentError("arguments need to have the same data type"))', src/main.rs:45:82
stack backtrace:
0: rust_begin_unwind
at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panicking.rs:142:14
2: core::result::unwrap_failed
at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/result.rs:1785:5
3: core::result::Result<T,E>::unwrap
at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/result.rs:1078:23
4: rust_arrow_playground::main::{{closure}}
at ./src/main.rs:45:37
5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
6: tokio::park::thread::CachedParkThread::block_on::{{closure}}
at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/park/thread.rs:263:54
7: tokio::coop::with_budget::{{closure}}
at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/coop.rs:102:9
8: std::thread::local::LocalKey<T>::try_with
at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/thread/local.rs:445:16
9: std::thread::local::LocalKey<T>::with
at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/thread/local.rs:421:9
10: tokio::coop::with_budget
at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/coop.rs:95:5
11: tokio::coop::budget
at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/coop.rs:72:5
12: tokio::park::thread::CachedParkThread::block_on
at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/park/thread.rs:263:31
13: tokio::runtime::enter::Enter::block_on
at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/enter.rs:151:13
14: tokio::runtime::thread_pool::ThreadPool::block_on
at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/mod.rs:90:9
15: tokio::runtime::Runtime::block_on
at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/mod.rs:482:43
16: rust_arrow_playground::main
at ./src/main.rs:49:5
17: core::ops::function::FnOnce::call_once
at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
**Expected behavior**
The test passes with this output:
```
+------------------------------------------------------------------------------------+
| CASE WHEN #cpu_load_short.host IS NULL THEN Utf8("") ELSE #cpu_load_short.host END |
+------------------------------------------------------------------------------------+
| host1                                                                              |
|                                                                                    |
| host2                                                                              |
+------------------------------------------------------------------------------------+
```
**Additional context**
This test used to pass. The last commit on which it passed was 57f47ab9230a9a12b3244191dcf1623f8b69fd61
It appears to fail starting at da392f4b3d77ad5fec0018a50146746a0efabac6 (which came in via https://github.com/apache/arrow-datafusion/pull/2819), which makes sense given the change.
Found while debugging an upgrade in IOx: https://github.com/influxdata/influxdb_iox/pull/5079
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on issue #2873: Error with CASE and DictionaryArrays: `ArrowError(InvalidArgumentError("arguments need to have the same data type"))`
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2873:
URL: https://github.com/apache/arrow-datafusion/issues/2873#issuecomment-1180690791
The issue appears to be that the constant `''` is (correctly) cast to a `Dictionary(Int32, Utf8)`, but when it is then converted to a `ScalarValue` that optimization is lost, so the two branches of the `CASE` no longer have the same data type. I have a fix, and I will file a ticket for a better one.
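To illustrate the mismatch described above, here is a minimal, self-contained sketch. It is only a toy model: the `DataType` enum and the `combine_branches` and `strip_dictionary` helpers are hypothetical stand-ins written for this example, not Arrow or DataFusion APIs, and the real code paths are more involved.

```rust
// Toy model only: these types imitate, but are NOT, Arrow's `DataType`
// or DataFusion's `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Hypothetical stand-in for a kernel that merges the THEN and ELSE
/// branches and requires both inputs to have exactly the same type.
fn combine_branches(then_type: &DataType, else_type: &DataType) -> Result<DataType, String> {
    if then_type == else_type {
        Ok(then_type.clone())
    } else {
        Err("arguments need to have the same data type".to_string())
    }
}

/// Hypothetical helper: the type left over once the dictionary wrapper
/// is dropped, which is what happens to the literal in this model.
fn strip_dictionary(data_type: &DataType) -> DataType {
    match data_type {
        DataType::Dictionary(_, value) => (**value).clone(),
        other => other.clone(),
    }
}

fn main() {
    let column = DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8));

    // The literal '' is first cast to the column's dictionary type...
    let literal_after_cast = column.clone();
    // ...but its scalar form keeps only the value type.
    let literal_as_scalar = strip_dictionary(&literal_after_cast);

    // Same types on both branches: accepted.
    println!("{:?}", combine_branches(&literal_after_cast, &column));
    // Utf8 versus Dictionary(Int32, Utf8): rejected.
    println!("{:?}", combine_branches(&literal_as_scalar, &column));
}
```

In this model the second call is rejected because a plain `Utf8` value is compared against a `Dictionary(Int32, Utf8)` column, which mirrors the shape of the error reported in this issue.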
[GitHub] [arrow-datafusion] alamb closed issue #2873: Error with CASE and DictionaryArrays: `ArrowError(InvalidArgumentError("arguments need to have the same data type"))`
Posted by GitBox <gi...@apache.org>.
alamb closed issue #2873: Error with CASE and DictionaryArrays: `ArrowError(InvalidArgumentError("arguments need to have the same data type"))`
URL: https://github.com/apache/arrow-datafusion/issues/2873