Posted to issues@arrow.apache.org by "Remi Dettai (Jira)" <ji...@apache.org> on 2020/04/15 17:08:00 UTC

[jira] [Commented] (ARROW-7681) [Rust] Explicitly seeking a BufReader will discard the internal buffer

    [ https://issues.apache.org/jira/browse/ARROW-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084251#comment-17084251 ] 

Remi Dettai commented on ARROW-7681:
------------------------------------

I've proposed a fix in [https://github.com/apache/arrow/pull/6949] that uses a modified version of BufReader instead of nightly methods.

 

I've tested it on a large Parquet file of mine:
 * on fast disk, both versions take ~30s to read the column
 * on a slow mount, the old version takes ~160s and the fixed one still takes ~30s (still CPU-bound)
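
For illustration, the buffer-discarding behavior quoted in the issue description below can be reproduced with a small sketch. It is not the code from the pull request: CountingReader and underlying_reads are hypothetical names made up for this example. It also assumes a recent stable toolchain, where BufReader::seek_relative is available (stable since Rust 1.53; it was nightly-only when this comment was written).

```rust
use std::io::{BufReader, Cursor, Read, Result, Seek, SeekFrom};

/// Hypothetical wrapper (not part of Arrow) that counts how often the
/// underlying reader is actually asked for data.
struct CountingReader<R> {
    inner: R,
    reads: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        self.reads += 1;
        self.inner.read(buf)
    }
}

impl<R: Seek> Seek for CountingReader<R> {
    fn seek(&mut self, pos: SeekFrom) -> Result<u64> {
        self.inner.seek(pos)
    }
}

/// Reads one byte, skips 9 bytes forward (still inside the buffered
/// region), reads one more byte, and returns how many reads reached
/// the underlying reader.
fn underlying_reads(use_seek_relative: bool) -> usize {
    let source = CountingReader {
        inner: Cursor::new(vec![0u8; 1024]),
        reads: 0,
    };
    let mut reader = BufReader::with_capacity(256, source);
    let mut byte = [0u8; 1];

    // The first read fills the 256-byte internal buffer.
    reader.read_exact(&mut byte).unwrap();

    if use_seek_relative {
        // Keeps the buffer when the target position lies inside it.
        reader.seek_relative(9).unwrap();
    } else {
        // Always discards the buffer, as the std documentation states.
        reader.seek(SeekFrom::Current(9)).unwrap();
    }

    reader.read_exact(&mut byte).unwrap();
    reader.get_ref().reads
}
```

In this sketch, underlying_reads(false) hits the underlying reader twice, because the explicit seek throws the buffer away, while underlying_reads(true) hits it only once. On a slow mount, every extra trip to the underlying file is expensive, which is consistent with the timings above.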

> [Rust] Explicitly seeking a BufReader will discard the internal buffer
> ----------------------------------------------------------------------
>
>                 Key: ARROW-7681
>                 URL: https://issues.apache.org/jira/browse/ARROW-7681
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust
>            Reporter: Max Burke
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This behavior was observed in the Parquet Rust file reader (parquet/src/util/io.rs).
>  
> Pull request: [https://github.com/apache/arrow/pull/6280]
>  
> From the Rust documentation for BufReader:
>  
> "Seeking always discards the internal buffer, even if the seek position would otherwise fall within it. This guarantees that calling {{.into_inner()}} immediately after a seek yields the underlying reader at the same position."
>  
> [https://doc.rust-lang.org/std/io/struct.BufReader.html#impl-Seek]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)