Posted to jira@arrow.apache.org by "Sietse Brouwer (Jira)" <ji...@apache.org> on 2020/09/30 20:52:00 UTC
[jira] [Updated] (ARROW-6774) [Rust] Reading parquet file is slow
[ https://issues.apache.org/jira/browse/ARROW-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sietse Brouwer updated ARROW-6774:
----------------------------------
Attachment: data.py
> [Rust] Reading parquet file is slow
> -----------------------------------
>
> Key: ARROW-6774
> URL: https://issues.apache.org/jira/browse/ARROW-6774
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Affects Versions: 0.15.0
> Reporter: Adam Lippai
> Priority: Major
> Attachments: data.py, main.rs
>
>
> Reading a parquet file with the example at [https://github.com/apache/arrow/tree/master/rust/parquet] is slow.
> The snippet
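> {code:none}
> use parquet::file::reader::{FileReader, SerializedFileReader};
> use std::time::Instant;
>
> // `file` is assumed to be an already opened std::fs::File for the parquet file.
> let reader = SerializedFileReader::new(file).unwrap();
> let mut iter = reader.get_row_iter(None).unwrap();
> // Time only the row iteration; each record is discarded.
> let start = Instant::now();
> while let Some(_record) = iter.next() {}
> let duration = start.elapsed();
> println!("{:?}", duration);
> {code}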
> runs for 17 seconds on a ~160 MB parquet file.
> If there is a more efficient way to load a parquet file, it would be nice to add it to the README.
> P.S.: My goal is to construct an ndarray from it; I'd be happy for any tips.
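For scale, the figures in the report (~160 MB in 17 seconds) work out to roughly 9.4 MB/s. The sketch below is not part of the issue or its attachments; it only shows one way to time a full pass over an iterator and turn the result into a throughput figure. The helper names `time_iteration` and `throughput_mb_per_s` are hypothetical, the integer range stands in for the parquet row iterator, and 1 MB is assumed to mean 1,000,000 bytes.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: drains any iterator and returns the number of
/// items it yielded together with the wall-clock time that took.
fn time_iteration<I: Iterator>(iter: I) -> (usize, Duration) {
    let start = Instant::now();
    let count = iter.count();
    (count, start.elapsed())
}

/// Hypothetical helper: converts a byte count and an elapsed time into
/// megabytes per second, assuming 1 MB = 1_000_000 bytes.
fn throughput_mb_per_s(bytes: u64, elapsed: Duration) -> f64 {
    (bytes as f64 / 1_000_000.0) / elapsed.as_secs_f64()
}

fn main() {
    // Stand-in for the row iterator from the report; any Iterator works.
    let (rows, elapsed) = time_iteration(0..1_000_000u32);
    println!("{} rows in {:?}", rows, elapsed);

    // Figures taken from the report: ~160 MB read in 17 seconds.
    let rate = throughput_mb_per_s(160_000_000, Duration::from_secs(17));
    println!("{:.1} MB/s", rate);
}
```

Reporting rows and MB/s alongside the raw duration makes measurements on files of different sizes easier to compare.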
--
This message was sent by Atlassian Jira
(v8.3.4#803005)