Posted to dev@parquet.apache.org by "Daniel Weeks (JIRA)" <ji...@apache.org> on 2015/02/09 20:36:34 UTC

[jira] [Commented] (PARQUET-177) MemoryManager ensure minimum Column Chunk size

    [ https://issues.apache.org/jira/browse/PARQUET-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312708#comment-14312708 ] 

Daniel Weeks commented on PARQUET-177:
--------------------------------------

Based on review comments, the limit is not based on row group size but on an estimated minimum column chunk size.  This helps ensure that a "reasonable" amount of data is written per column (the default minimum is the page size).

> MemoryManager ensure minimum Column Chunk size
> ----------------------------------------------
>
>                 Key: PARQUET-177
>                 URL: https://issues.apache.org/jira/browse/PARQUET-177
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.6.0rc2
>            Reporter: Daniel Weeks
>            Assignee: Daniel Weeks
>            Priority: Minor
>             Fix For: parquet-mr_1.6.0
>
>
> The memory manager currently places no lower limit on how small it will make row groups.  This is problematic because jobs with a large number of writers can end up with tiny row groups that hurt read performance.
> The attached patch adds a configurable minimum size; rather than shrink row groups below it, the job is killed.  The default is currently no limit.
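The check described above can be sketched roughly as follows. This is an illustrative Java sketch, not the actual parquet-mr MemoryManager API; the class and method names, the pool/scaling model, and the fail-by-exception behavior are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a memory manager that scales writers' row group
// sizes to fit a shared pool, but refuses to let the estimated per-column
// chunk size drop below a configured minimum (e.g. the page size).
public class MemoryManagerSketch {
    private final long totalPoolBytes;
    private final long minChunkBytes;
    private final Map<String, Long> writerRequests = new HashMap<>();

    public MemoryManagerSketch(long totalPoolBytes, long minChunkBytes) {
        this.totalPoolBytes = totalPoolBytes;
        this.minChunkBytes = minChunkBytes;
    }

    /** Registers a writer's requested row group size and returns its scaled allocation. */
    public long addWriter(String name, long requestedRowGroupBytes, int columnCount) {
        writerRequests.put(name, requestedRowGroupBytes);
        long totalRequested =
            writerRequests.values().stream().mapToLong(Long::longValue).sum();
        // Scale all requests down proportionally when they exceed the pool.
        double scale = totalRequested <= totalPoolBytes
                ? 1.0
                : (double) totalPoolBytes / totalRequested;
        long scaledRowGroup = (long) (requestedRowGroupBytes * scale);
        // Estimated column chunk size after scaling; fail fast if it is too small,
        // instead of silently writing tiny column chunks.
        long estimatedChunk = scaledRowGroup / columnCount;
        if (estimatedChunk < minChunkBytes) {
            throw new IllegalStateException(
                "Estimated column chunk size " + estimatedChunk
                + " B is below the configured minimum " + minChunkBytes + " B");
        }
        return scaledRowGroup;
    }
}
```

With many concurrent writers the scale factor shrinks each row group, and the per-column estimate (row group size divided by column count) is what gets compared against the minimum.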



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)