Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/11 16:14:22 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue, #2873: Error with CASE and DictionaryArrays: `ArrowError(InvalidArgumentError("arguments need to have the same data type"))`

alamb opened a new issue, #2873:
URL: https://github.com/apache/arrow-datafusion/issues/2873

   **Describe the bug**
   For a `DictionaryArray` column `col`, evaluating an expression like
   
   ```sql
   CASE 
     WHEN col IS NULL THEN '' 
     ELSE col
   END
   ```
   
   generates an error:
   
   
   ```
   thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(InvalidArgumentError("arguments need to have the same data type"))', src/main.rs:45:82
   ```
   
   
   **To Reproduce**
   
   
   ```rust
   use std::sync::Arc;
   
   use datafusion::arrow::datatypes::Int32Type;
   use datafusion::prelude::*;
   use datafusion::arrow::array::DictionaryArray;
   use datafusion::datasource::MemTable;
   use datafusion::logical_plan::{LogicalPlanBuilder, provider_as_source, when};
   use datafusion::physical_plan::collect;
   use datafusion::error::Result;
   use datafusion::arrow::{self, record_batch::RecordBatch};
   
   #[tokio::main]
   async fn main() -> Result<()> {
       let ctx = SessionContext::new();
   
       let host: DictionaryArray<Int32Type> = vec![Some("host1"), None, Some("host2")].into_iter().collect();
   
       let batch = RecordBatch::try_from_iter(vec![
           ("host", Arc::new(host) as _),
       ]).unwrap();
   
       let t = MemTable::try_new(batch.schema(), vec![vec![batch]]).unwrap();
   
   
       let expr = when(col("host").is_null(), lit(""))
           .otherwise(col("host"))
           .unwrap();
   
       let projection = None;
       let builder = LogicalPlanBuilder::scan(
           "cpu_load_short",
           provider_as_source(Arc::new(t)),
           projection
       ).unwrap();
   
   
       let logical_plan = builder
           .project(vec![expr])
           .unwrap()
           .build()
           .unwrap();
   
       // create the physical plan and execute it
       let physical_plan = ctx.create_physical_plan(&logical_plan).await.unwrap();
       let results: Vec<RecordBatch> = collect(physical_plan, ctx.task_ctx()).await.unwrap();
   
       // format the results
       println!("Results:\n\n{}", arrow::util::pretty::pretty_format_batches(&results).unwrap());
       Ok(())
   }
   
   ```
   
   
   ```
   thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(InvalidArgumentError("arguments need to have the same data type"))', src/main.rs:45:82
   stack backtrace:
      0: rust_begin_unwind
                at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panicking.rs:584:5
      1: core::panicking::panic_fmt
                at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panicking.rs:142:14
      2: core::result::unwrap_failed
                at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/result.rs:1785:5
      3: core::result::Result<T,E>::unwrap
                at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/result.rs:1078:23
      4: rust_arrow_playground::main::{{closure}}
                at ./src/main.rs:45:37
      5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
      6: tokio::park::thread::CachedParkThread::block_on::{{closure}}
                at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/park/thread.rs:263:54
      7: tokio::coop::with_budget::{{closure}}
                at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/coop.rs:102:9
      8: std::thread::local::LocalKey<T>::try_with
                at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/thread/local.rs:445:16
      9: std::thread::local::LocalKey<T>::with
                at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/thread/local.rs:421:9
     10: tokio::coop::with_budget
                at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/coop.rs:95:5
     11: tokio::coop::budget
                at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/coop.rs:72:5
     12: tokio::park::thread::CachedParkThread::block_on
                at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/park/thread.rs:263:31
     13: tokio::runtime::enter::Enter::block_on
                at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/enter.rs:151:13
     14: tokio::runtime::thread_pool::ThreadPool::block_on
                at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/mod.rs:90:9
     15: tokio::runtime::Runtime::block_on
                at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/mod.rs:482:43
     16: rust_arrow_playground::main
                at ./src/main.rs:49:5
     17: core::ops::function::FnOnce::call_once
                at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/ops/function.rs:248:5
   note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
   
   ```
   
   **Expected behavior**
   
   The test passes with this output:
   
   ```
   +------------------------------------------------------------------------------------+
   | CASE WHEN #cpu_load_short.host IS NULL THEN Utf8("") ELSE #cpu_load_short.host END |
   +------------------------------------------------------------------------------------+
   | host1                                                                              |
   |                                                                                    |
   | host2                                                                              |
   +------------------------------------------------------------------------------------+
   ```
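
   For reference, the row-level semantics the `CASE` expression is expected to have can be sketched in plain Rust, without arrow or DataFusion. The function name `null_to_empty` is hypothetical and exists only for illustration; it is not part of either library.

   ```rust
   /// Hypothetical helper (not a DataFusion API): mirrors
   /// `CASE WHEN col IS NULL THEN '' ELSE col END` one row at a time.
   fn null_to_empty(col: &[Option<&str>]) -> Vec<String> {
       col.iter()
           // A NULL row becomes the empty string; any other row is passed through.
           .map(|value| value.unwrap_or("").to_string())
           .collect()
   }
   ```

   Applied to the three rows of the reproducer (`host1`, NULL, `host2`), this yields `host1`, an empty string, and `host2`, matching the table above.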
   
   
   
   **Additional context**
   
   This test used to pass. The last commit it passed was 57f47ab9230a9a12b3244191dcf1623f8b69fd61
   
   
   It appears to have started failing with da392f4b3d77ad5fec0018a50146746a0efabac6 (which came in via https://github.com/apache/arrow-datafusion/pull/2819), which makes sense given the change.
   
   
   Found while debugging an upgrade in IOx: https://github.com/influxdata/influxdb_iox/pull/5079
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #2873: Error with CASE and DictionaryArrays: `ArrowError(InvalidArgumentError("arguments need to have the same data type"))`

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2873:
URL: https://github.com/apache/arrow-datafusion/issues/2873#issuecomment-1180690791

   The issue appears to be that the constant `''` is (correctly) cast to a `Dictionary(Int32, Utf8)`, but when it is then converted to a `ScalarValue`, that dictionary encoding is lost. I have a fix, and I will file a ticket for a better fix.
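
   The mechanism described above can be illustrated with a toy model in plain Rust. Everything here (`DataType`, `TypedLiteral`, `Scalar`, `merge_branches`, `dict_utf8`) is a hypothetical stand-in written for this sketch, not the arrow or DataFusion API. It only assumes that the scalar representation cannot carry a dictionary type and that the kernel combining the `THEN` and `ELSE` branches requires identical types.

   ```rust
   // Toy model only: none of these types are the real arrow / DataFusion types.
   #[derive(Debug, Clone, PartialEq)]
   enum DataType {
       Utf8,
       Int32,
       Dictionary(Box<DataType>, Box<DataType>),
   }

   /// Stand-in for the `Dictionary(Int32, Utf8)` type of the `host` column.
   fn dict_utf8() -> DataType {
       DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8))
   }

   /// A literal after it has been coerced to the column's type.
   #[derive(Debug, Clone, PartialEq)]
   struct TypedLiteral {
       value: String,
       data_type: DataType,
   }

   /// A scalar that can only describe plain strings (an assumption of this sketch).
   #[derive(Debug, Clone, PartialEq)]
   enum Scalar {
       Utf8(String),
   }

   impl Scalar {
       /// Keeps the value but drops the literal's dictionary type.
       fn from_literal(lit: &TypedLiteral) -> Scalar {
           Scalar::Utf8(lit.value.clone())
       }

       /// Always reports plain `Utf8`, whatever the literal's type was.
       fn data_type(&self) -> DataType {
           DataType::Utf8
       }
   }

   /// Mimics a kernel that refuses to combine branches of different types.
   fn merge_branches(then_type: &DataType, else_type: &DataType) -> Result<DataType, String> {
       if then_type == else_type {
           Ok(then_type.clone())
       } else {
           Err("arguments need to have the same data type".to_string())
       }
   }
   ```

   In this model, merging the coerced literal's type with the column's type succeeds, while merging the type reported by `Scalar::from_literal(&literal)` with the column's type returns the error, because the scalar reports `Utf8` rather than the dictionary type.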




[GitHub] [arrow-datafusion] alamb closed issue #2873: Error with CASE and DictionaryArrays: `ArrowError(InvalidArgumentError("arguments need to have the same data type"))`

Posted by GitBox <gi...@apache.org>.
alamb closed issue #2873: Error with CASE and DictionaryArrays: `ArrowError(InvalidArgumentError("arguments need to have the same data type"))`
URL: https://github.com/apache/arrow-datafusion/issues/2873

