Posted to dev@parquet.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2019/08/06 21:39:00 UTC
[jira] [Created] (PARQUET-1634) [C++] Factor out data/dictionary page writes to allow for page buffering
Wes McKinney created PARQUET-1634:
-------------------------------------
Summary: [C++] Factor out data/dictionary page writes to allow for page buffering
Key: PARQUET-1634
URL: https://issues.apache.org/jira/browse/PARQUET-1634
Project: Parquet
Issue Type: Improvement
Components: parquet-cpp
Reporter: Wes McKinney
Fix For: cpp-1.6.0
Logic that eagerly writes out data pages is hard-coded into the column writer implementation:
https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc#L565
For higher-latency file systems like Amazon S3, it makes more sense to buffer pages in memory and write them in larger batches (and preferably asynchronously). We should refactor this logic so that the write strategy can be chosen rather than being hard-coded.
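To make the proposal concrete, here is a rough sketch of what factoring the page write out behind a strategy might look like. Nothing in it is existing parquet-cpp code: CountingSink, PageWritePolicy, EagerPolicy, and BufferedPolicy are hypothetical names invented for illustration, pages are modeled as plain byte strings, and the asynchronous part is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for an output stream. It counts Write() calls so the
// number of round trips to a (possibly remote) file system is visible.
class CountingSink {
 public:
  void Write(const std::string& bytes) {
    data_ += bytes;
    ++num_writes_;
  }
  const std::string& data() const { return data_; }
  int num_writes() const { return num_writes_; }

 private:
  std::string data_;
  int num_writes_ = 0;
};

// Hypothetical strategy interface: the column writer would hand each
// serialized page to a policy instead of writing it to the sink directly.
class PageWritePolicy {
 public:
  virtual ~PageWritePolicy() = default;
  virtual void WritePage(const std::string& page) = 0;
  // Pushes any pages still held in memory to the sink.
  virtual void Flush() = 0;
};

// Mirrors the behavior described above: every page goes out immediately.
class EagerPolicy : public PageWritePolicy {
 public:
  explicit EagerPolicy(CountingSink* sink) : sink_(sink) {
    assert(sink_ != nullptr);
  }
  void WritePage(const std::string& page) override { sink_->Write(page); }
  void Flush() override {}  // Nothing is ever held back.

 private:
  CountingSink* sink_;
};

// Accumulates pages in memory and issues one larger write once the buffer
// reaches `threshold` bytes, or when Flush() is called explicitly.
class BufferedPolicy : public PageWritePolicy {
 public:
  BufferedPolicy(CountingSink* sink, std::size_t threshold)
      : sink_(sink), threshold_(threshold) {
    assert(sink_ != nullptr);
  }
  void WritePage(const std::string& page) override {
    buffer_ += page;
    if (buffer_.size() >= threshold_) Flush();
  }
  void Flush() override {
    if (buffer_.empty()) return;
    sink_->Write(buffer_);
    buffer_.clear();
  }

 private:
  CountingSink* sink_;
  std::size_t threshold_;
  std::string buffer_;
};
```

With three small pages, the eager policy issues three writes to the sink, while the buffered policy with a large threshold issues a single write at Flush() and produces the same bytes in the same order. Whether the real refactoring should use a virtual interface like this, or some other mechanism, is an open design question.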
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)