Posted to jira@arrow.apache.org by "Andy Grove (Jira)" <ji...@apache.org> on 2020/12/24 18:10:00 UTC

[jira] [Updated] (ARROW-11012) [Rust] [DataFusion] Make write_csv and write_parquet concurrent

     [ https://issues.apache.org/jira/browse/ARROW-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove updated ARROW-11012:
-------------------------------
    Fix Version/s: 3.0.0

> [Rust] [DataFusion] Make write_csv and write_parquet concurrent
> ---------------------------------------------------------------
>
>                 Key: ARROW-11012
>                 URL: https://issues.apache.org/jira/browse/ARROW-11012
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust - DataFusion
>            Reporter: Andy Grove
>            Priority: Major
>             Fix For: 3.0.0
>
>
> ExecutionContext.write_csv and write_parquet currently iterate over the output partitions serially, executing each partition and writing its results before moving on to the next. We should run these as tokio tasks so that the partitions can be executed and written concurrently. This should, in theory, help with memory usage when the plan contains repartition operators.
> We may want to add a configuration option for choosing between serial and parallel writes.
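
The serial versus concurrent choice described above could be sketched roughly as follows. This is an illustration, not DataFusion code: it uses std::thread::spawn as a dependency-free stand-in for the tokio tasks the issue proposes, and Partition, WriteMode, write_partition and write_partitions are all hypothetical names invented for the sketch.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical stand-in for one output partition: rows already formatted as CSV lines.
type Partition = Vec<String>;

/// Hypothetical switch mirroring the configuration option suggested in the issue.
#[derive(Clone, Copy, Debug)]
enum WriteMode {
    Serial,
    Parallel,
}

/// Write a single partition to `<dir>/part-<index>.csv` and return the file path.
fn write_partition(dir: &Path, index: usize, rows: &[String]) -> io::Result<PathBuf> {
    let path = dir.join(format!("part-{}.csv", index));
    let mut out = BufWriter::new(File::create(&path)?);
    for row in rows {
        writeln!(out, "{}", row)?;
    }
    out.flush()?;
    Ok(path)
}

/// Write every partition, either one after another or with one thread per partition.
fn write_partitions(
    dir: &Path,
    partitions: Vec<Partition>,
    mode: WriteMode,
) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(dir)?;
    match mode {
        // Serial: each partition is fully written before the next one starts.
        WriteMode::Serial => partitions
            .iter()
            .enumerate()
            .map(|(i, rows)| write_partition(dir, i, rows))
            .collect(),
        // Parallel: spawn all writers first, then join them in partition order.
        WriteMode::Parallel => {
            let handles: Vec<_> = partitions
                .into_iter()
                .enumerate()
                .map(|(i, rows)| {
                    let dir = dir.to_path_buf();
                    thread::spawn(move || write_partition(&dir, i, &rows))
                })
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("writer thread panicked"))
                .collect()
        }
    }
}
```

In the parallel branch all writers are started before any is joined, so the partitions make progress together rather than in sequence. An async version would presumably swap the threads for spawned tokio tasks and await their join handles, but that detail is an assumption and not something this issue specifies.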



--
This message was sent by Atlassian Jira
(v8.3.4#803005)