Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/02/26 20:38:00 UTC
[jira] [Updated] (ARROW-11799) [Rust] String and Binary arrays created with incorrect length from unbound iterator
[ https://issues.apache.org/jira/browse/ARROW-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-11799:
-----------------------------------
Labels: pull-request-available (was: )
> [Rust] String and Binary arrays created with incorrect length from unbound iterator
> -----------------------------------------------------------------------------------
>
> Key: ARROW-11799
> URL: https://issues.apache.org/jira/browse/ARROW-11799
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Affects Versions: 3.0.0
> Reporter: Yordan Pavlov
> Assignee: Yordan Pavlov
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While looking for a way to make loading array data from parquet files faster, I stumbled on an edge case where string and binary arrays are created with an incorrect length from an iterator with no upper bound.
> Here is a simple example:
> ```
> // imports needed to compile the snippet (Array provides len())
> use arrow::array::{Array, StringArray};
>
> // iterator that doesn't declare (upper) size bound
> let string_iter = (0..)
>     .scan(0usize, |pos, i| {
>         if *pos < 10 {
>             *pos += 1;
>             Some(Some(format!("value {}", i)))
>         } else {
>             // actually returns up to 10 values
>             None
>         }
>     })
>     // limited using take()
>     .take(100);
> let (lower_size_bound, upper_size_bound) = string_iter.size_hint();
> assert_eq!(lower_size_bound, 0);
> // the upper bound, defined by take above, is 100
> assert_eq!(upper_size_bound, Some(100));
> let string_array: StringArray = string_iter.collect();
> // but the actual number of items in the array is 10
> assert_eq!(string_array.len(), 10);
> ```
> Fortunately, this is easy to fix by using the length of the child offset array, and I will be creating a PR for this shortly.
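The idea of taking the length from the offsets can be illustrated without the arrow crate. What follows is a minimal stdlib-only sketch under stated assumptions, not the actual Arrow implementation or the change made in the PR: `SketchStringArray`, `collect_strings`, and `bounded_but_short` are hypothetical names introduced only for this illustration, and null tracking is left out.

```rust
/// Hypothetical stand-in for a variable-length string array (not an Arrow type):
/// `offsets[i]..offsets[i + 1]` is the byte range of item `i` in `values`.
struct SketchStringArray {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl SketchStringArray {
    /// Length is derived from the offsets actually written:
    /// one offset per item, plus the leading zero.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Returns item `i` as a string slice.
    fn value(&self, i: usize) -> &str {
        let bytes = &self.values[self.offsets[i]..self.offsets[i + 1]];
        std::str::from_utf8(bytes).expect("values were built from valid strings")
    }
}

/// Builds the sketch array from an iterator without trusting `size_hint()`
/// for the final length.
fn collect_strings<I>(iter: I) -> SketchStringArray
where
    I: Iterator<Item = Option<String>>,
{
    let mut offsets = vec![0usize];
    let mut values = Vec::new();
    for item in iter {
        if let Some(s) = item {
            values.extend_from_slice(s.as_bytes());
        }
        // A null item repeats the previous offset (null bitmap omitted here).
        offsets.push(values.len());
    }
    SketchStringArray { offsets, values }
}

/// Same shape as the iterator in the report: the upper bound from
/// `take(100)` is 100, but only 10 items are ever produced.
fn bounded_but_short() -> impl Iterator<Item = Option<String>> {
    (0..)
        .scan(0usize, |pos, i| {
            if *pos < 10 {
                *pos += 1;
                Some(Some(format!("value {}", i)))
            } else {
                None
            }
        })
        .take(100)
}
```

Calling `collect_strings(bounded_but_short())` in this sketch gives an array whose `len()` reflects the items actually consumed, regardless of the upper bound reported by `size_hint()`.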
--
This message was sent by Atlassian Jira
(v8.3.4#803005)