Posted to dev@parquet.apache.org by "nileema shingte (JIRA)" <ji...@apache.org> on 2019/07/15 21:39:00 UTC

[jira] [Comment Edited] (PARQUET-1022) [C++] Append mode in parquet-cpp

    [ https://issues.apache.org/jira/browse/PARQUET-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885634#comment-16885634 ] 

nileema shingte edited comment on PARQUET-1022 at 7/15/19 9:38 PM:
-------------------------------------------------------------------

[~xhochy] [~wesmckinn] - The ability to concatenate Parquet files is something we've wanted for some time too. When we generate Parquet files partitioned by an expression, we often end up with many tiny files, and we would like to add a post-processing step that concatenates them.

Is there a plan to add this ability to the library any time soon? 

If not, it would be great if someone could provide somewhat detailed pseudocode (expanding on what [~xhochy] mentioned in the comment above) as a guideline for the conditions and scenarios that need extra care, so we can contribute this as a PR.
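
To make the post-processing step concrete, here is a minimal sketch of how rows from many tiny files could be planned into larger output row groups. It is only a planning model built on the standard library, not parquet-cpp API: `Slice`, `OutputRowGroup`, `PlanConcat`, and the `target` row count are hypothetical names, and it assumes every input file has the same schema.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// A contiguous run of rows taken from one input file (hypothetical type).
struct Slice {
  std::size_t file_index;   // position of the input file in the list
  std::int64_t first_row;   // first row of the run within that file
  std::int64_t num_rows;    // number of rows in the run
};

// The slices that together make up one output row group.
using OutputRowGroup = std::vector<Slice>;

// Greedily packs the rows of the input files, in order, into output row
// groups of exactly `target` rows; only the final group may be smaller.
// Assumption: all inputs share one schema, so rows can be mixed freely.
std::vector<OutputRowGroup> PlanConcat(const std::vector<std::int64_t>& file_rows,
                                       std::int64_t target) {
  assert(target > 0);
  std::vector<OutputRowGroup> plan;
  OutputRowGroup current;
  std::int64_t filled = 0;  // rows already assigned to `current`
  for (std::size_t i = 0; i < file_rows.size(); ++i) {
    assert(file_rows[i] >= 0);
    std::int64_t offset = 0;
    while (offset < file_rows[i]) {
      // Take as many rows as the file still has or the group still needs.
      std::int64_t take = std::min(file_rows[i] - offset, target - filled);
      current.push_back({i, offset, take});
      offset += take;
      filled += take;
      if (filled == target) {
        plan.push_back(std::move(current));
        current.clear();
        filled = 0;
      }
    }
  }
  if (!current.empty()) plan.push_back(std::move(current));
  return plan;
}
```

For example, inputs of 300, 500, and 400 rows with a target of 1000 rows would give one full output row group drawing on all three files and a second group holding the last 200 rows of the third file. A real implementation would still have to read and re-encode the rows that each slice names.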


> [C++] Append mode in parquet-cpp
> --------------------------------
>
>                 Key: PARQUET-1022
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1022
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-cpp
>    Affects Versions: cpp-1.1.0
>            Reporter: yugu
>            Assignee: Wes McKinney
>            Priority: Major
>
> As the title says, I am currently trying to work out an append feature for Parquet files in C++.
> (I have been searching through the repo but can't find an example.)
> My current solution, assuming no schema changes, is to:
> Read in the metadata
> Update the metadata to cover the original rows plus the appended rows
> Append a new row group (or several, using multiple row group writers)
> Write the new rows.
> ---
> The problem is that, if approached this way, the original last row group may not be completely filled. I was wondering whether there is a fix, or whether I'm using the API wrong...
> Thanks! :D
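
The trade-off described in the issue can be illustrated with a small model of row-group sizes. This is a sketch using only the standard library, not parquet-cpp API: `RowGroupSizes`, `AppendAsNewRowGroups`, `AppendFillingLastRowGroup`, and the `target` row count per row group are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Row counts of the row groups in one file, in order (a plain model only).
using RowGroupSizes = std::vector<std::int64_t>;

// The approach from the issue: leave existing row groups untouched and write
// the new rows as additional row groups of at most `target` rows each.
RowGroupSizes AppendAsNewRowGroups(RowGroupSizes existing,
                                   std::int64_t new_rows,
                                   std::int64_t target) {
  assert(target > 0 && new_rows >= 0);
  while (new_rows > 0) {
    std::int64_t n = std::min(new_rows, target);
    existing.push_back(n);
    new_rows -= n;
  }
  return existing;
}

// Alternative: top up a partially filled last row group first, then write
// whatever is left as new row groups. In a real file this would mean
// re-reading and rewriting that last row group, which this model ignores.
RowGroupSizes AppendFillingLastRowGroup(RowGroupSizes existing,
                                        std::int64_t new_rows,
                                        std::int64_t target) {
  assert(target > 0 && new_rows >= 0);
  if (!existing.empty() && existing.back() < target) {
    std::int64_t take = std::min(new_rows, target - existing.back());
    existing.back() += take;
    new_rows -= take;
  }
  return AppendAsNewRowGroups(std::move(existing), new_rows, target);
}
```

With existing row groups of 1000 and 400 rows, 900 appended rows, and a target of 1000, the first function yields row groups of 1000, 400, and 900 rows (the 400-row group stays partially filled), while the second yields 1000, 1000, and 300.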


