You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/07/16 21:53:00 UTC

[jira] [Resolved] (ARROW-11889) [C++] Add parallelism to streaming CSV reader

     [ https://issues.apache.org/jira/browse/ARROW-11889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace resolved ARROW-11889.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 10568
[https://github.com/apache/arrow/pull/10568]

> [C++] Add parallelism to streaming CSV reader
> ---------------------------------------------
>
>                 Key: ARROW-11889
>                 URL: https://issues.apache.org/jira/browse/ARROW-11889
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently the streaming CSV reader does not allow for much parallelism.  It doesn't allow for reading more than one segment at once (useful in S3) and it doesn't allow for column fan-out for parsing & converting.
> It seems both of these options would speed up CSV reading in some scenarios although it's possible this is mostly mitigated in cases where there are many more files than cores (as per-file parallelism will occupy all the cores anyways).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)