
Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/03/25 18:34:00 UTC

[jira] [Updated] (ARROW-12090) [C++] Expose CSV I/O readahead as a read option

     [ https://issues.apache.org/jira/browse/ARROW-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-12090:
--------------------------------
    Summary: [C++] Expose CSV I/O readahead as a read option  (was: [C++] Expose CSV block level readahead as a read option)

> [C++] Expose CSV I/O readahead as a read option
> -----------------------------------------------
>
>                 Key: ARROW-12090
>                 URL: https://issues.apache.org/jira/browse/ARROW-12090
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Minor
>
> All of the CSV readers currently base their I/O readahead on the parallelism of the executor (or use 2 for the serial reader).  This is a reasonable default when the I/O is homogeneous, but better values could presumably be chosen in some situations.
> For example, if most files are buffered in RAM (so the reader is CPU bound for those files) but some are not, the readahead should be large enough to read the unbuffered files while the CPU-bound work is being done (assuming the work even happens to be scheduled that way).
> This is unlikely to be of much benefit in most situations, though, and it adds yet another option, so I'm not really motivated to do this work until such a situation arises.
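
A minimal sketch of how such an option might resolve to an effective readahead, assuming the defaults described above. This is not Arrow's actual API: CsvReadOptionsSketch, io_readahead, kSerialReadahead and EffectiveIoReadahead are hypothetical names invented for illustration, and mapping use_threads == false to the serial reader is an assumption of this sketch.

```cpp
#include <optional>

// Hypothetical option struct for illustration only; this is NOT the real
// arrow::csv::ReadOptions, and io_readahead is an assumed field name.
struct CsvReadOptionsSketch {
  // Assumption: false selects the serial reader.
  bool use_threads = true;
  // Unset means "keep the current behavior": derive the readahead from the
  // executor's parallelism, or use the serial reader's fixed value.
  std::optional<int> io_readahead;
};

// The fixed readahead the issue describes for the serial reader.
constexpr int kSerialReadahead = 2;

// Resolve the number of blocks to read ahead. An explicit user value wins;
// otherwise fall back to the defaults described in the issue.
constexpr int EffectiveIoReadahead(const CsvReadOptionsSketch& options,
                                   int executor_parallelism) {
  if (options.io_readahead.has_value()) {
    return *options.io_readahead;
  }
  return options.use_threads ? executor_parallelism : kSerialReadahead;
}
```

With this shape, a caller reading a mix of RAM-buffered and unbuffered files could set io_readahead to a value larger than the executor's parallelism, while every other caller keeps the existing defaults by leaving it unset.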



--
This message was sent by Atlassian Jira
(v8.3.4#803005)