Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/02/26 20:38:00 UTC

[jira] [Updated] (ARROW-11799) [Rust] String and Binary arrays created with incorrect length from unbound iterator

     [ https://issues.apache.org/jira/browse/ARROW-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-11799:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Rust] String and Binary arrays created with incorrect length from unbound iterator
> -----------------------------------------------------------------------------------
>
>                 Key: ARROW-11799
>                 URL: https://issues.apache.org/jira/browse/ARROW-11799
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>    Affects Versions: 3.0.0
>            Reporter: Yordan Pavlov
>            Assignee: Yordan Pavlov
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking for a way to speed up loading array data from Parquet files, I stumbled on an edge case: string and binary arrays are created with an incorrect length when collected from an iterator that does not declare an exact upper size bound.
> Here is a simple example:
> ```
>  // iterator that doesn't declare (upper) size bound
>         let string_iter = (0..).scan(0usize, |pos, i| { 
>             if *pos < 10 {
>                 *pos += 1;
>                 Some(Some(format!("value {}", i)))
>             }
>             else {
>                 // actually returns up to 10 values
>                 None
>             }
>         })
>         // limited using take()
>         .take(100);
>         let (lower_size_bound, upper_size_bound) = string_iter.size_hint();
>         assert_eq!(lower_size_bound, 0);
>         // the upper bound, defined by take above, is 100
>         assert_eq!(upper_size_bound, Some(100));
>         let string_array: StringArray = string_iter.collect();
>         // but the actual number of items in the array is 10
>         assert_eq!(string_array.len(), 10);
> ```
> Fortunately, this is easy to fix by using the length of the child offset array, and I will be creating a PR for this shortly.
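
The idea behind the proposed fix can be sketched without the arrow crate. The code below is only an illustration under stated assumptions, not the actual patch: `SimpleStringArray`, `collect_strings` and `example_iter` are hypothetical names, nulls are left out, and the buffers are plain `Vec`s. It shows the length being derived from the number of offsets actually written, with `size_hint()` used only as an allocation hint.

```rust
/// Hypothetical, simplified stand-in for a string array's buffers
/// (not the arrow crate's types; null handling is omitted).
struct SimpleStringArray {
    /// offsets[i]..offsets[i + 1] delimits item i inside `values`
    offsets: Vec<i32>,
    /// concatenated UTF-8 bytes of all items
    values: Vec<u8>,
    /// number of items in the array
    len: usize,
}

impl SimpleStringArray {
    /// Returns item `i` by slicing `values` with two adjacent offsets.
    fn value(&self, i: usize) -> &str {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).unwrap()
    }
}

/// Same iterator shape as in the report: yields 10 items,
/// but its size hint is (0, Some(100)) because of take(100).
fn example_iter() -> impl Iterator<Item = String> {
    (0..)
        .scan(0usize, |pos, i| {
            if *pos < 10 {
                *pos += 1;
                Some(format!("value {}", i))
            } else {
                None
            }
        })
        .take(100)
}

fn collect_strings<I>(iter: I) -> SimpleStringArray
where
    I: Iterator<Item = String>,
{
    // The upper bound is used only to pre-allocate (capped so that a huge
    // hint cannot trigger a huge allocation), never as the final length.
    let (_, upper) = iter.size_hint();
    let capacity = upper.unwrap_or(0).min(1024);
    let mut offsets: Vec<i32> = Vec::with_capacity(capacity + 1);
    let mut values: Vec<u8> = Vec::new();

    offsets.push(0);
    for s in iter {
        values.extend_from_slice(s.as_bytes());
        offsets.push(values.len() as i32);
    }

    // One offset per item plus the leading 0, so the item count is
    // offsets.len() - 1 regardless of what size_hint() reported.
    let len = offsets.len() - 1;
    SimpleStringArray { offsets, values, len }
}
```

Passing `example_iter()` to `collect_strings` gives an array whose `len` is 10 even though the iterator's upper size bound is 100, which is the behaviour the example in the description asserts.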


