Posted to dev@parquet.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2019/08/06 21:39:00 UTC

[jira] [Created] (PARQUET-1634) [C++] Factor out data/dictionary page writes to allow for page buffering

Wes McKinney created PARQUET-1634:
-------------------------------------

             Summary: [C++] Factor out data/dictionary page writes to allow for page buffering 
                 Key: PARQUET-1634
                 URL: https://issues.apache.org/jira/browse/PARQUET-1634
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-cpp
            Reporter: Wes McKinney
             Fix For: cpp-1.6.0


The logic that eagerly writes out data pages is hard-coded into the column writer implementation:

https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc#L565

For higher-latency file systems like Amazon S3, it makes more sense to buffer pages in memory and write them in larger batches (preferably asynchronously). We should refactor this logic so that the page-writing behavior can be chosen rather than hard-coded.
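
A minimal sketch of what "being able to choose" could look like, assuming the column writer hands each finished page to a pluggable sink instead of writing to the stream itself. Every name here (CountingOutputStream, PageSink, EagerPageSink, BufferedPageSink, WritePages) is hypothetical and is not the parquet-cpp API. The counting stream stands in for a high-latency file system so the number of write calls (e.g. S3 requests) can be compared. Asynchronous flushing is not shown.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for an output stream on a high-latency file system.
// It records every Write() call so round trips can be counted.
class CountingOutputStream {
 public:
  void Write(const std::string& bytes) {
    ++num_writes_;
    data_ += bytes;
  }
  int num_writes() const { return num_writes_; }
  const std::string& data() const { return data_; }

 private:
  int num_writes_ = 0;
  std::string data_;
};

// Hypothetical interface the column writer would call instead of writing
// serialized pages to the stream directly.
class PageSink {
 public:
  virtual ~PageSink() = default;
  virtual void WriteDictionaryPage(const std::string& page) = 0;
  virtual void WriteDataPage(const std::string& page) = 0;
  virtual void Close() = 0;  // called when the column chunk is finished
};

// Eager strategy (the hard-coded behavior): one stream write per page.
class EagerPageSink : public PageSink {
 public:
  explicit EagerPageSink(CountingOutputStream* out) : out_(out) {}
  void WriteDictionaryPage(const std::string& page) override { out_->Write(page); }
  void WriteDataPage(const std::string& page) override { out_->Write(page); }
  void Close() override {}

 private:
  CountingOutputStream* out_;
};

// Buffered strategy: pages accumulate in memory, in the order received, and
// are written as one larger batch once the buffer reaches flush_threshold
// bytes, plus a final flush on Close().
class BufferedPageSink : public PageSink {
 public:
  BufferedPageSink(CountingOutputStream* out, std::size_t flush_threshold)
      : out_(out), flush_threshold_(flush_threshold) {}
  void WriteDictionaryPage(const std::string& page) override { Append(page); }
  void WriteDataPage(const std::string& page) override { Append(page); }
  void Close() override { Flush(); }

 private:
  void Append(const std::string& page) {
    buffer_ += page;
    if (buffer_.size() >= flush_threshold_) Flush();
  }
  void Flush() {
    if (buffer_.empty()) return;
    out_->Write(buffer_);
    buffer_.clear();
  }

  CountingOutputStream* out_;
  std::size_t flush_threshold_;
  std::string buffer_;
};

// Hypothetical driver standing in for a column writer: one 100-byte
// dictionary page followed by num_data_pages 100-byte data pages.
void WritePages(PageSink* sink, int num_data_pages) {
  sink->WriteDictionaryPage(std::string(100, 'd'));
  for (int i = 0; i < num_data_pages; ++i) {
    sink->WriteDataPage(std::string(100, 'p'));
  }
  sink->Close();
}
```

With four data pages, the eager sink issues five writes. A BufferedPageSink with a 1 MiB threshold issues a single write on Close(), and one with a 250-byte threshold issues two. All three produce identical bytes. Because the buffer preserves page order, the file layout is unchanged, and only the batching of I/O differs. A real implementation would also need to bound memory use, which is what the threshold models here.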



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)