Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/30 16:02:04 UTC

[GitHub] [arrow-datafusion] tustvold opened a new issue, #2653: `ScalarValue::to_array_of_size` panics computing statistics for nested parquet file

tustvold opened a new issue, #2653:
URL: https://github.com/apache/arrow-datafusion/issues/2653

   **Describe the bug**
   
   ```
   let ctx = SessionContext::new();
   
   let mut options = ParquetReadOptions::default()
       .parquet_pruning(true)
       .to_listing_options(2);
   
   // Enable stats collection (this is what triggers the panic)
   options.collect_stat = true;
   
   ctx.register_listing_table("patient", "/home/raphael/Downloads/part-00000-f6337bce-7fcd-4021-9f9d-040413ea83f8-c000.snappy.parquet", options, None).await.unwrap();
   
   let df = ctx.sql("SELECT patient.meta FROM patient LIMIT 10").await.unwrap();
   df.show().await.unwrap();
   ```
   
   Where part-00000-f6337bce-7fcd-4021-9f9d-040413ea83f8-c000.snappy.parquet is the [parquet file](https://github.com/apache/arrow-datafusion/files/8626500/part-00000-f6337bce-7fcd-4021-9f9d-040413ea83f8-c000.snappy.parquet.zip) provided by @kesavkolla in https://github.com/apache/arrow-datafusion/issues/2439
   
   Panics with
   
   ```
   called `Result::unwrap()` on an `Err` value: ArrowError(ComputeError("concat requires input of at least one array"))
   thread 'physical_plan::file_format::parquet::tests::temp' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(ComputeError("concat requires input of at least one array"))', datafusion/common/src/scalar.rs:1206:18
   stack backtrace:
      0: rust_begin_unwind
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
      1: core::panicking::panic_fmt
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
      2: core::result::unwrap_failed
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1785:5
      3: core::result::Result<T,E>::unwrap
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1078:23
      4: datafusion_common::scalar::ScalarValue::to_array_of_size
                at /home/raphael/repos/external/arrow-datafusion/datafusion/common/src/scalar.rs:1198:22
      5: datafusion_common::scalar::ScalarValue::to_array_of_size::{{closure}}
                at /home/raphael/repos/external/arrow-datafusion/datafusion/common/src/scalar.rs:1253:45
      6: core::iter::adapters::map::map_fold::{{closure}}
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/adapters/map.rs:84:28
      7: core::iter::traits::iterator::Iterator::fold
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/traits/iterator.rs:2362:21
      8: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/adapters/map.rs:124:9
      9: core::iter::traits::iterator::Iterator::for_each
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/traits/iterator.rs:779:9
     10: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/spec_extend.rs:40:17
     11: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/spec_from_iter_nested.rs:62:9
     12: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/spec_from_iter.rs:33:9
     13: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/mod.rs:2554:9
     14: core::iter::traits::iterator::Iterator::collect
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/traits/iterator.rs:1784:9
     15: datafusion_common::scalar::ScalarValue::to_array_of_size
                at /home/raphael/repos/external/arrow-datafusion/datafusion/common/src/scalar.rs:1248:48
     16: datafusion_common::scalar::ScalarValue::to_array
                at /home/raphael/repos/external/arrow-datafusion/datafusion/common/src/scalar.rs:658:9
     17: datafusion::datasource::get_statistics_with_limit::{{closure}}
                at ./src/datasource/mod.rs:75:56
     18: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
     19: datafusion::datasource::listing::table::ListingTable::list_files_for_scan::{{closure}}
                at ./src/datasource/listing/table.rs:394:67
     20: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
     21: <datafusion::datasource::listing::table::ListingTable as datafusion::datasource::datasource::TableProvider>::scan::{{closure}}
                at ./src/datasource/listing/table.rs:310:53
     22: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
     23: <core::pin::Pin<P> as core::future::future::Future>::poll
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/future.rs:124:9
     24: datafusion::physical_plan::planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}
                at ./src/physical_plan/planner.rs:392:64
     25: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
     26: <core::pin::Pin<P> as core::future::future::Future>::poll
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/future.rs:124:9
     27: datafusion::physical_plan::planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}
                at ./src/physical_plan/planner.rs:623:84
     28: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
     29: <core::pin::Pin<P> as core::future::future::Future>::poll
                at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/future.rs:124:9
     30: datafusion::physical_plan::planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}
   ```
   
   Setting `options.collect_stat = false` eliminates the panic.
   
   **Expected behavior**
   
   The above should not panic.
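   
   One way to read this expectation is that a failing conversion should surface as an `Err` the caller can handle, rather than as a panic from `unwrap()`. The following is a std-only toy sketch of that difference, not DataFusion's code: `concat`, `to_array_of_size`, and `to_array_of_size_unwrap` are hypothetical stand-ins, and treating a null list as "no input arrays" is an assumption made purely for illustration (the report does not say what produces the empty input).
   
   ```rust
   // Toy model only: these functions are hypothetical stand-ins and do not
   // reproduce DataFusion's or arrow's real definitions.
   
   /// Stand-in for a concat kernel: errors when given no input arrays,
   /// using the same message as the error in the report.
   fn concat(arrays: &[Vec<i64>]) -> Result<Vec<i64>, String> {
       if arrays.is_empty() {
           return Err("concat requires input of at least one array".to_string());
       }
       Ok(arrays.iter().flatten().copied().collect())
   }
   
   /// Fallible variant: repeats the list `size` times and concatenates,
   /// returning any concat error to the caller.
   fn to_array_of_size(values: Option<&[i64]>, size: usize) -> Result<Vec<i64>, String> {
       // Assumption for illustration: a null list contributes no arrays.
       let arrays: Vec<Vec<i64>> = match values {
           Some(v) => std::iter::repeat(v.to_vec()).take(size).collect(),
           None => Vec::new(),
       };
       concat(&arrays)
   }
   
   /// Panicking variant: the same work, but any error becomes a panic.
   fn to_array_of_size_unwrap(values: Option<&[i64]>, size: usize) -> Vec<i64> {
       to_array_of_size(values, size).unwrap()
   }
   
   fn main() {
       // A non-null list converts without error in both variants.
       assert_eq!(to_array_of_size(Some(&[1, 2][..]), 2), Ok(vec![1, 2, 1, 2]));
   
       // With empty input, the fallible variant reports an error...
       let err = to_array_of_size(None, 1).unwrap_err();
       println!("recoverable error: {}", err);
   
       // ...while the unwrap variant panics (caught here only to observe it).
       let panicked = std::panic::catch_unwind(|| to_array_of_size_unwrap(None, 1)).is_err();
       println!("unwrap variant panicked: {}", panicked);
   }
   ```
   
   In this sketch the only difference between the two variants is whether the error is propagated or unwrapped; a caller of the fallible variant could, for example, skip statistics for that column instead of aborting the query.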
   
   **Additional context**
   
   Follow-on to https://github.com/apache/arrow-datafusion/issues/2453, which is fixed by https://github.com/apache/arrow-datafusion/pull/2631.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on issue #2653: `ScalarValue::to_array_of_size` panics computing statistics for nested parquet file

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #2653:
URL: https://github.com/apache/arrow-datafusion/issues/2653#issuecomment-1145789429

   Huzzah, can confirm :tada:




[GitHub] [arrow-datafusion] AssHero commented on issue #2653: `ScalarValue::to_array_of_size` panics computing statistics for nested parquet file

Posted by GitBox <gi...@apache.org>.
AssHero commented on issue #2653:
URL: https://github.com/apache/arrow-datafusion/issues/2653#issuecomment-1145623700

   I think merge request #2671 already fixes this.




[GitHub] [arrow-datafusion] tustvold closed issue #2653: `ScalarValue::to_array_of_size` panics computing statistics for nested parquet file

Posted by GitBox <gi...@apache.org>.
tustvold closed issue #2653: `ScalarValue::to_array_of_size` panics computing statistics for nested parquet file
URL: https://github.com/apache/arrow-datafusion/issues/2653

