Posted to issues@arrow.apache.org by "Remi Dettai (Jira)" <ji...@apache.org> on 2020/04/15 17:08:00 UTC
[jira] [Commented] (ARROW-7681) [Rust] Explicitly seeking a BufReader will discard the internal buffer
[ https://issues.apache.org/jira/browse/ARROW-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084251#comment-17084251 ]
Remi Dettai commented on ARROW-7681:
------------------------------------
I've proposed a fix in [https://github.com/apache/arrow/pull/6949] that uses a modified version of BufReader instead of nightly methods.
I've tested it on a large Parquet file of mine:
* on fast disk, both versions take ~30s to read the column
* on a slow mount, the old version takes ~160s and the fixed one still takes ~30s (still CPU-bound)
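For reference, the buffer-discarding behaviour behind the slow-mount numbers can be reproduced with the standard library alone. The sketch below is only an illustration, not the code from the pull request: `CountingReader` and `underlying_reads` are hypothetical names, and it assumes a toolchain where `BufReader::seek_relative` is available on stable (Rust 1.53 or later; it was nightly-only when this comment was written).

```rust
use std::io::{self, BufReader, Cursor, Read, Seek, SeekFrom};

/// Hypothetical wrapper that counts how many times the underlying
/// reader is actually asked for data.
struct CountingReader<R> {
    inner: R,
    reads: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reads += 1;
        self.inner.read(buf)
    }
}

impl<R: Seek> Seek for CountingReader<R> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.inner.seek(pos)
    }
}

/// Reads one byte, skips one byte, reads another byte, and returns the
/// number of reads that reached the underlying source.
fn underlying_reads(use_seek: bool) -> io::Result<usize> {
    let source = CountingReader {
        inner: Cursor::new(vec![0u8; 4096]),
        reads: 0,
    };
    let mut reader = BufReader::with_capacity(1024, source);
    let mut byte = [0u8; 1];

    // The first read fills the 1024-byte internal buffer.
    reader.read_exact(&mut byte)?;

    if use_seek {
        // Seek::seek always discards the internal buffer, even though
        // the target position is still inside it.
        reader.seek(SeekFrom::Current(1))?;
    } else {
        // seek_relative keeps the buffer when the target lies within it.
        reader.seek_relative(1)?;
    }

    // With a discarded buffer this has to go back to the source.
    reader.read_exact(&mut byte)?;
    Ok(reader.get_ref().reads)
}
```

Calling `underlying_reads(true)` should report one more underlying read than `underlying_reads(false)`. On a fast disk the extra read is cheap, but on a slow mount each refill is expensive, which would be consistent with the timings above.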
> [Rust] Explicitly seeking a BufReader will discard the internal buffer
> ----------------------------------------------------------------------
>
> Key: ARROW-7681
> URL: https://issues.apache.org/jira/browse/ARROW-7681
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Reporter: Max Burke
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> This behavior was observed in the Parquet Rust file reader (parquet/src/util/io.rs).
>
> Pull request: [https://github.com/apache/arrow/pull/6280]
>
> From the Rust documentation for BufReader:
>
> "Seeking always discards the internal buffer, even if the seek position would otherwise fall within it. This guarantees that calling {{.into_inner()}} immediately after a seek yields the underlying reader at the same position."
>
> [https://doc.rust-lang.org/std/io/struct.BufReader.html#impl-Seek]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)