Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/07/12 14:05:03 UTC

[jira] [Assigned] (ARROW-8599) [C++][Parquet] Optional parallel processing when writing Parquet files

     [ https://issues.apache.org/jira/browse/ARROW-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Farmer reassigned ARROW-8599:
----------------------------------

    Assignee:     (was: Weston Pace)

This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked on. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked on, or if you plan to start that work soon.

> [C++][Parquet] Optional parallel processing when writing Parquet files
> ----------------------------------------------------------------------
>
>                 Key: ARROW-8599
>                 URL: https://issues.apache.org/jira/browse/ARROW-8599
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>
> If we permit encoded columns in row groups to be buffered in memory rather than immediately written out to the {{OutputStream}}, then we can use multiple threads for the encoding/compression of columns. Combined with a separate thread that takes the encoded columns and writes them out to disk, this should yield substantially faster file writes.
> This could be enabled through an option, since it would increase memory use when writing. Memory use can be somewhat constrained by limiting the size of row groups.
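
The pipeline proposed in the issue (per-column encoding on worker threads, results buffered in memory, one writer thread draining them in column order) could look roughly like the minimal sketch below. It is written in plain standard C++ under stated assumptions and is not Arrow's or parquet-cpp's API: Column, Buffer, EncodeColumn, RowGroupQueue and WriteFile are hypothetical names, and EncodeColumn is a toy run-length encoder standing in for real Parquet encoding and compression, so the bytes written are not a Parquet file.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Column = std::vector<int32_t>;  // hypothetical: one column chunk of a row group
using Buffer = std::string;           // hypothetical: encoded bytes held in memory

// Hypothetical stand-in for Parquet encoding + compression:
// a toy run-length encoding that emits "value:count;" pairs.
Buffer EncodeColumn(const Column& col) {
  std::ostringstream out;
  std::size_t i = 0;
  while (i < col.size()) {
    std::size_t j = i;
    while (j < col.size() && col[j] == col[i]) ++j;
    out << col[i] << ':' << (j - i) << ';';
    i = j;
  }
  return out.str();
}

// Launch one encoding task per column; results stay buffered in the futures.
std::vector<std::future<Buffer>> EncodeRowGroupAsync(const std::vector<Column>& rg) {
  std::vector<std::future<Buffer>> futures;
  futures.reserve(rg.size());
  for (const Column& col : rg) {
    futures.push_back(std::async(std::launch::async, EncodeColumn, std::cref(col)));
  }
  return futures;
}

// Minimal thread-safe FIFO handing row groups (as pending buffers) to the writer.
class RowGroupQueue {
 public:
  void Push(std::vector<std::future<Buffer>> rg) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      assert(!closed_ && "Push after Close");
      items_.push_back(std::move(rg));
    }
    cv_.notify_one();
  }
  void Close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    cv_.notify_one();
  }
  // Blocks until an item is available; returns false once closed and drained.
  bool Pop(std::vector<std::future<Buffer>>* out) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
    if (items_.empty()) return false;
    *out = std::move(items_.front());
    items_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::vector<std::future<Buffer>>> items_;
  bool closed_ = false;
};

// Columns are encoded concurrently while a dedicated writer thread appends
// finished buffers to `sink`, preserving column and row-group order.
void WriteFile(const std::vector<std::vector<Column>>& row_groups, std::ostream& sink) {
  RowGroupQueue queue;
  std::thread writer([&queue, &sink] {
    std::vector<std::future<Buffer>> rg;
    while (queue.Pop(&rg)) {
      for (auto& f : rg) sink << f.get();  // waits only for this column's encoding
      sink << '|';                         // toy row-group delimiter
    }
  });
  for (const auto& rg : row_groups) queue.Push(EncodeRowGroupAsync(rg));
  queue.Close();
  writer.join();
}
```

For example, writing two row groups {{1,1,2},{3,3,3}} and {{7},{}} into a std::ostringstream yields "1:2;2:1;3:3;|7:1;|". Note that the queue here is unbounded, so every row group could end up buffered at once; a real option would presumably cap the number of in-flight row groups, which together with smaller row groups bounds the extra memory the issue mentions.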



--
This message was sent by Atlassian Jira
(v8.20.10#820010)