Posted to dev@parquet.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/05/27 15:58:01 UTC

[jira] [Updated] (PARQUET-1634) [C++] Factor out data/dictionary page writes to allow for page buffering

     [ https://issues.apache.org/jira/browse/PARQUET-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated PARQUET-1634:
------------------------------------
    Fix Version/s:     (was: cpp-4.0.0)
                   cpp-5.0.0

> [C++] Factor out data/dictionary page writes to allow for page buffering 
> -------------------------------------------------------------------------
>
>                 Key: PARQUET-1634
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1634
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: cpp-5.0.0
>
>
> Logic that eagerly writes out data pages is hard-coded into the column writer implementation:
> https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc#L565
> For higher-latency file systems like Amazon S3, it makes more sense to buffer pages in memory and write them in larger batches (and preferably asynchronously). We should refactor this logic so that callers can choose between eager and buffered page writes instead of having the behavior hard-coded.
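
To make the proposed refactoring concrete, here is a minimal sketch of what a pluggable page write policy could look like. It is not the parquet-cpp API: Page, CountingSink, PageWritePolicy, EagerPolicy, BufferedPolicy and the byte threshold are all hypothetical names invented for illustration, and asynchronous writes are left out. The only point is that the column writer hands finished pages to a policy object, and the policy decides when bytes reach the sink.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an encoded data or dictionary page.
struct Page {
  std::vector<std::uint8_t> bytes;
};

// Hypothetical stand-in for the output file. It only counts calls and bytes
// so the effect of each policy is observable.
class CountingSink {
 public:
  void Write(const std::uint8_t* data, std::size_t size) {
    (void)data;  // a real sink would copy or upload these bytes
    ++num_writes_;
    bytes_written_ += size;
  }
  std::size_t num_writes() const { return num_writes_; }
  std::size_t bytes_written() const { return bytes_written_; }

 private:
  std::size_t num_writes_ = 0;
  std::size_t bytes_written_ = 0;
};

// Hypothetical policy interface: the column writer would call AddPage() for
// each finished page and Flush() when the column chunk is closed.
class PageWritePolicy {
 public:
  virtual ~PageWritePolicy() = default;
  virtual void AddPage(Page page) = 0;
  virtual void Flush() = 0;
};

// Eager policy: one sink write per page, as soon as the page is handed over.
class EagerPolicy : public PageWritePolicy {
 public:
  explicit EagerPolicy(CountingSink* sink) : sink_(sink) {
    assert(sink_ != nullptr);
  }
  void AddPage(Page page) override {
    sink_->Write(page.bytes.data(), page.bytes.size());
  }
  void Flush() override {}  // nothing is ever held back

 private:
  CountingSink* sink_;
};

// Buffered policy: pages accumulate in memory and are written as one larger
// batch once the buffer reaches threshold_bytes, or on an explicit Flush().
class BufferedPolicy : public PageWritePolicy {
 public:
  BufferedPolicy(CountingSink* sink, std::size_t threshold_bytes)
      : sink_(sink), threshold_bytes_(threshold_bytes) {
    assert(sink_ != nullptr);
  }
  void AddPage(Page page) override {
    buffer_.insert(buffer_.end(), page.bytes.begin(), page.bytes.end());
    if (buffer_.size() >= threshold_bytes_) Flush();
  }
  void Flush() override {
    if (buffer_.empty()) return;
    sink_->Write(buffer_.data(), buffer_.size());
    buffer_.clear();
  }

 private:
  CountingSink* sink_;
  std::size_t threshold_bytes_;
  std::vector<std::uint8_t> buffer_;
};
```

With four 100-byte pages, the eager policy in this sketch issues four sink writes, while the buffered policy with a 250-byte threshold issues two (one when the third page crosses the threshold, one on the final flush). On a high-latency store the number of round trips is what matters, which is the motivation given in the issue.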


