Posted to issues@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/03/06 01:30:00 UTC

[jira] [Created] (ARROW-11889) [C++] Add parallelism to streaming CSV reader

Weston Pace created ARROW-11889:
-----------------------------------

             Summary: [C++] Add parallelism to streaming CSV reader
                 Key: ARROW-11889
                 URL: https://issues.apache.org/jira/browse/ARROW-11889
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


Currently the streaming CSV reader does not allow for much parallelism.  It cannot read more than one segment at a time (which would be useful on S3), and it cannot fan out the parsing and conversion work across columns.
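A minimal sketch of the column fan-out idea, using only standard C++ (`std::async`) rather than Arrow's actual thread pool or CSV reader APIs. `RawColumn`, `ConvertColumn`, and `ConvertBlockParallel` are hypothetical names. The sketch assumes a block has already been parsed into column-major raw text and that every cell is a valid integer.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for one parsed CSV block, stored column-major:
// columns[c][r] is the raw text of row r in column c.
using RawColumn = std::vector<std::string>;

// Convert one column of raw text to int64 values.
// Assumes every cell is a valid integer (std::stoll throws otherwise).
std::vector<std::int64_t> ConvertColumn(const RawColumn& raw) {
  std::vector<std::int64_t> out;
  out.reserve(raw.size());
  for (const std::string& cell : raw) {
    out.push_back(static_cast<std::int64_t>(std::stoll(cell)));
  }
  return out;
}

// Column fan-out: launch one conversion task per column, then gather the
// results in column order. Each task reads only its own column, so no
// locking is needed.
std::vector<std::vector<std::int64_t>> ConvertBlockParallel(
    const std::vector<RawColumn>& columns) {
  std::vector<std::future<std::vector<std::int64_t>>> tasks;
  tasks.reserve(columns.size());
  for (const RawColumn& col : columns) {
    tasks.push_back(std::async(std::launch::async,
                               [&col] { return ConvertColumn(col); }));
  }
  std::vector<std::vector<std::int64_t>> result;
  result.reserve(tasks.size());
  for (auto& task : tasks) {
    result.push_back(task.get());  // rethrows if a conversion failed
  }
  assert(result.size() == columns.size());
  return result;
}
```

For example, converting the block `{{"1", "2", "3"}, {"10", "20", "30"}}` yields the two int64 columns `{1, 2, 3}` and `{10, 20, 30}`. One task per column is the simplest form of fan-out; a real reader would more likely submit such tasks to a shared, bounded thread pool so that many files and columns do not oversubscribe the cores.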

Both of these options would likely speed up CSV reading in some scenarios, although the benefit may be largely mitigated when there are many more files than cores, since per-file parallelism will occupy all the cores anyway.
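The trade-off above can be put as simple arithmetic. Under a deliberately simplified model (a hypothetical assumption, not from the issue) in which each file is handled by one CPU-bound task and all files are the same size, per-file parallelism keeps min(files, cores) cores busy, so only the remaining cores are left for parallelism within a file. `IdleCoresWithPerFileParallelism` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Simplified, hypothetical model: one CPU-bound task per file, equal-sized
// files. Per-file parallelism keeps min(num_files, num_cores) cores busy;
// the rest are idle and could only be used by intra-file parallelism
// (reading several segments at once, or column fan-out).
std::size_t IdleCoresWithPerFileParallelism(std::size_t num_files,
                                            std::size_t num_cores) {
  assert(num_cores > 0 && "need at least one core");
  return num_cores - std::min(num_files, num_cores);
}
```

With 2 files on 8 cores, 6 cores sit idle and intra-file parallelism could help; with 100 files on 8 cores, none do. The model ignores I/O latency: a task waiting on a slow S3 read is not actually using its core, which this arithmetic does not capture.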



--
This message was sent by Atlassian Jira
(v8.3.4#803005)