Posted to jira@arrow.apache.org by "Morgan Cassels (Jira)" <ji...@apache.org> on 2021/06/18 17:07:00 UTC
[jira] [Updated] (ARROW-13120) [Rust][Parquet] Cannot read multiple batches from parquet with string list column
[ https://issues.apache.org/jira/browse/ARROW-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Morgan Cassels updated ARROW-13120:
-----------------------------------
Description:
This issue only occurs when the batch size < the number of rows in the table. The attached parquet `test.parquet` has 31430 rows and a single column containing string lists. This issue does not appear to occur for parquets with integer list columns.
{code:java}
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
{code}
{code:java}
---- arrow::arrow_reader::tests::failing_test stdout ----
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected infallable creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("offsets do not start at zero")', arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}
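For context, the panic points at the invariant that a list array's offsets buffer must begin at zero. A minimal sketch (plain Rust, no Arrow dependency; `rebase_offsets` is a hypothetical helper, not part of the arrow crate) of what rebasing the offsets of a sliced batch would look like:

```rust
// Hypothetical illustration of the invariant behind the panic:
// a list array's offsets must start at 0. If a reader hands back
// a slice of a larger column without rebasing the offsets,
// validation fails with "offsets do not start at zero".

/// Rebase a slice of list offsets so the first offset is zero
/// (hypothetical helper, assumed semantics only).
fn rebase_offsets(offsets: &[i32]) -> Vec<i32> {
    let first = offsets.first().copied().unwrap_or(0);
    offsets.iter().map(|o| o - first).collect()
}

fn main() {
    // Offsets as they might look for a second 1024-row batch,
    // sliced out of the middle of the 31430-row column:
    let sliced = [2048, 2051, 2055, 2060];
    let rebased = rebase_offsets(&sliced);
    // Rebased offsets start at zero, preserving the same list lengths.
    assert_eq!(rebased, vec![0, 3, 7, 12]);
}
```

This would explain why the failure only shows up when batch size < row count: the first batch's offsets naturally start at zero, while later batches carry offsets relative to the whole column.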
was:
This issue only occurs when the batch size < the number of rows in the table. The attached parquet `test.parquet` has 31430 rows and a single column containing string lists. This issue does not appear to occur for parquets with integer list columns.
```
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
```
```
---- arrow::arrow_reader::tests::failing_test stdout ----
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected infallable creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("offsets do not start at zero")', arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
> [Rust][Parquet] Cannot read multiple batches from parquet with string list column
> ---------------------------------------------------------------------------------
>
> Key: ARROW-13120
> URL: https://issues.apache.org/jira/browse/ARROW-13120
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Morgan Cassels
> Priority: Major
> Attachments: test.parquet
>
>
> This issue only occurs when the batch size < the number of rows in the table. The attached parquet `test.parquet` has 31430 rows and a single column containing string lists. This issue does not appear to occur for parquets with integer list columns.
>
>
> {code:java}
> #[test]
> fn failing_test() {
>     let parquet_file_reader = get_test_reader("test.parquet");
>     let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
>     let mut record_batches = Vec::new();
>     let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
>     for batch in record_batch_reader {
>         record_batches.push(batch);
>     }
> }
> {code}
>
> {code:java}
> ---- arrow::arrow_reader::tests::failing_test stdout ----
> thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected infallable creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("offsets do not start at zero")', arrow/src/array/array_list.rs:195:45
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> {code}
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)