Posted to dev@parquet.apache.org by "Daniel Weeks (JIRA)" <ji...@apache.org> on 2015/02/09 20:36:34 UTC
[jira] [Commented] (PARQUET-177) MemoryManager ensure minimum Column Chunk size
[ https://issues.apache.org/jira/browse/PARQUET-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312708#comment-14312708 ]
Daniel Weeks commented on PARQUET-177:
--------------------------------------
Based on the review comments, the limit is based not on row group size but on an estimated minimum column chunk size. This helps ensure that a "reasonable" amount of data is written per column (the default minimum is the page size).
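A minimal sketch of the arithmetic described above: when many writers share a fixed buffering pool, row groups get scaled down, and the check rejects configurations where the estimated per-column chunk would fall below the minimum (by default, one page size). The class and method names here are illustrative, not the actual parquet-mr MemoryManager API.

```java
// Illustrative sketch only -- these are NOT the actual parquet-mr
// MemoryManager APIs, just the sizing logic described in the comment.
public class MinChunkSizeSketch {

    // Each concurrent writer gets an equal share of the buffering pool,
    // capped at the requested row group size.
    static long scaledRowGroupSize(long poolBytes, int writers, long requestedRowGroupSize) {
        return Math.min(requestedRowGroupSize, poolBytes / writers);
    }

    // Fail if the scaled-down row group would leave each column chunk
    // with less than the minimum (by default, one page size).
    static void checkMinChunkSize(long rowGroupBytes, int columnCount, long minChunkBytes) {
        long perColumn = rowGroupBytes / columnCount;
        if (perColumn < minChunkBytes) {
            throw new IllegalStateException("estimated column chunk size " + perColumn
                    + " B is below the minimum " + minChunkBytes + " B");
        }
    }

    public static void main(String[] args) {
        long pool = 512L << 20;      // 512 MiB available for buffering row groups
        long requested = 128L << 20; // 128 MiB requested row group size
        long pageSize = 1L << 20;    // 1 MiB default page size = minimum chunk size

        // Few writers: row groups keep their full size, so the check passes.
        checkMinChunkSize(scaledRowGroupSize(pool, 4, requested), 10, pageSize);

        // Many writers: row groups shrink to ~5 MiB; with 10 columns each
        // chunk would be ~0.5 MiB, below the 1 MiB minimum.
        boolean rejected = false;
        try {
            checkMinChunkSize(scaledRowGroupSize(pool, 100, requested), 10, pageSize);
        } catch (IllegalStateException e) {
            rejected = true;
        }
        System.out.println("rejected=" + rejected);
    }
}
```

This mirrors the trade-off in the issue: without a floor, a job with many concurrent writers silently produces tiny column chunks; with the floor, it fails fast instead.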
> MemoryManager ensure minimum Column Chunk size
> ----------------------------------------------
>
> Key: PARQUET-177
> URL: https://issues.apache.org/jira/browse/PARQUET-177
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.6.0rc2
> Reporter: Daniel Weeks
> Assignee: Daniel Weeks
> Priority: Minor
> Fix For: parquet-mr_1.6.0
>
>
> The memory manager currently has no limit on how small it will make row groups. This is problematic because jobs with a large number of writers can produce tiny row groups that hurt performance.
> The following patch adds a configurable minimum size; if row groups would be scaled below it, the job is killed. The default is currently no limit.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)