Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/09/09 09:52:00 UTC

[jira] [Commented] (PARQUET-869) Min/Max record counts for block size checks are not configurable

    [ https://issues.apache.org/jira/browse/PARQUET-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192753#comment-17192753 ] 

ASF GitHub Bot commented on PARQUET-869:
----------------------------------------

panthony edited a comment on pull request #470:
URL: https://github.com/apache/parquet-mr/pull/470#issuecomment-682457362


   Same here: we have one or two columns whose values vary widely in size (from a few KB up to 10 MB), and we often hit an OutOfMemoryError because the writer does not check the size of the buffered rows in time.
   
   Being able to adjust the check frequency would be a huge help 👍 
   
   I have a [rebased branch](https://github.com/cogniteev/parquet-mr/tree/PARQUET-869-configurable-row-group-min-max-record-check) against master if anyone is interested.
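
The failure mode described in this comment can be illustrated with a small simulation. This is only a sketch under stated assumptions: `BlockSizeCheckSketch`, `peakBufferedBytes`, and `sampleRows` are hypothetical names, the scheduling rule is a simplified model of a row-count-gated size check rather than the actual parquet-mr code, and the bounds 100 and 10000 are illustrative values.

```java
// Simplified, hypothetical model of a size check that only runs at scheduled
// row counts. It is NOT the parquet-mr implementation.
class BlockSizeCheckSketch {

    /**
     * Feeds rows into a simulated buffer and returns the largest number of
     * bytes held at once. The buffer size is inspected only when the row
     * count reaches the next scheduled check.
     */
    static long peakBufferedBytes(long[] rowSizes, long targetBytes,
                                  long minRowsForCheck, long maxRowsForCheck) {
        long buffered = 0;
        long rows = 0;
        long peak = 0;
        long nextCheck = minRowsForCheck;
        for (long size : rowSizes) {
            buffered += size;
            rows++;
            peak = Math.max(peak, buffered);
            if (rows < nextCheck) {
                continue; // no size check for this row
            }
            long avgRowSize = Math.max(1, buffered / rows);
            if (buffered > targetBytes - 2 * avgRowSize) {
                // Simulated flush: the buffer is emptied and counting restarts.
                buffered = 0;
                rows = 0;
                nextCheck = minRowsForCheck;
            } else {
                // Schedule the next check roughly halfway to the estimated
                // target, clamped by the min/max row-count bounds.
                long estimatedRowsToTarget = targetBytes / avgRowSize;
                long candidate = (rows + estimatedRowsToTarget) / 2;
                nextCheck = Math.min(Math.max(minRowsForCheck, candidate),
                                     rows + maxRowsForCheck);
            }
        }
        return peak;
    }

    /** Builds a workload of small 1 KB rows followed by large 10 MB rows. */
    static long[] sampleRows(int smallRows, int largeRows) {
        long[] sizes = new long[smallRows + largeRows];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = (i < smallRows) ? 1024L : 10L * 1024 * 1024;
        }
        return sizes;
    }

    public static void main(String[] args) {
        long target = 128L * 1024 * 1024; // assumed 128 MB row group target
        long[] rows = sampleRows(100, 1000);
        // Assumed fixed bounds: the next check is scheduled from small rows,
        // so the large rows accumulate unchecked.
        System.out.println("sparse checks peak: "
                + peakBufferedBytes(rows, target, 100, 10000));
        // Bounds lowered to 1: the size is checked after every row.
        System.out.println("per-row checks peak: "
                + peakBufferedBytes(rows, target, 1, 1));
    }
}
```

In this model the first check happens while rows are still small, so the next check is scheduled thousands of rows ahead and the large rows that follow pile up unnoticed. Lowering both bounds forces a check after every row, which keeps the peak close to the target at the cost of more frequent size computations.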


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Min/Max record counts for block size checks are not configurable
> ----------------------------------------------------------------
>
>                 Key: PARQUET-869
>                 URL: https://issues.apache.org/jira/browse/PARQUET-869
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Pradeep Gollakota
>            Priority: Major
>
> While the min/max record counts for the page size check are configurable via the ParquetOutputFormat.MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK and ParquetOutputFormat.MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK configs and via ParquetProperties directly, the min/max record counts for the block size check are hard-coded inside InternalParquetRecordWriter.
> These two settings should also be configurable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)