Posted to dev@parquet.apache.org by "Gabor Szadovszky (JIRA)" <ji...@apache.org> on 2018/09/07 11:57:00 UTC

[jira] [Created] (PARQUET-1414) Limit page size based on maximum row count

Gabor Szadovszky created PARQUET-1414:
-----------------------------------------

             Summary: Limit page size based on maximum row count
                 Key: PARQUET-1414
                 URL: https://issues.apache.org/jira/browse/PARQUET-1414
             Project: Parquet
          Issue Type: Improvement
            Reporter: Gabor Szadovszky
            Assignee: Gabor Szadovszky


For column-index-based filtering it is important that each column has enough pages, since pages are the unit the filter can skip. When an encoding matches the data perfectly, all of the values may end up encoded in a single page (e.g. a column holding an ascending counter).
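
To make the counter example concrete, here is a rough sketch of how small a delta-encoded ascending counter can become. The function `toy_delta_size_bytes` and its layout (8 bytes for the first value, 8 bytes for the minimum delta, then bit-packed residuals) are a simplified model assumed for illustration only, not the actual Parquet encoding. The 1 MiB page size threshold is also an assumption of the example.

```python
def toy_delta_size_bytes(values):
    """Rough size of a delta-encoded run of integers (toy model, not Parquet's format)."""
    if not values:
        return 0
    if len(values) == 1:
        return 8  # only the first value is stored
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas)
    # Residuals (delta - min_delta) are non-negative, so bit_length() is the packed width.
    bit_width = max(d - min_delta for d in deltas).bit_length()
    packed_bits = bit_width * len(deltas)
    # First value + minimum delta + bit-packed residuals, rounded up to whole bytes.
    return 8 + 8 + (packed_bits + 7) // 8


ASSUMED_PAGE_SIZE_BYTES = 1024 * 1024  # assumed size threshold for this example

counter = list(range(1_000_000))  # ascending counter: every delta is 1
encoded = toy_delta_size_bytes(counter)
fits_in_one_page = encoded <= ASSUMED_PAGE_SIZE_BYTES
```

In this model every residual of the counter is zero, so the encoded size does not grow with the number of values and a purely size-based limit never closes the page.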

With this improvement we would be able to limit each page by the maximum number of rows written to it, so that every column has enough pages. A good default value should be determined by benchmarking. As an initial value, we can use 10k.
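
The proposed limit could be sketched as follows. This is a minimal simulation, not the parquet-mr implementation: `cut_pages`, its parameters, and the per-row size input are hypothetical names chosen for the example, and the limits are checked after every row for simplicity, which a real writer need not do.

```python
def cut_pages(row_sizes, max_page_bytes, max_page_rows=None):
    """Return the row count of each page produced for a column.

    row_sizes      -- encoded size in bytes contributed by each row (hypothetical input)
    max_page_bytes -- existing size-based limit
    max_page_rows  -- proposed row-count limit; None disables it
    """
    pages = []
    rows = 0
    size = 0
    for row_size in row_sizes:
        rows += 1
        size += row_size
        size_limit_hit = size >= max_page_bytes
        row_limit_hit = max_page_rows is not None and rows >= max_page_rows
        if size_limit_hit or row_limit_hit:
            pages.append(rows)  # close the current page
            rows = 0
            size = 0
    if rows:
        pages.append(rows)  # flush the last, partially filled page
    return pages


ONE_MIB = 1024 * 1024

# A perfectly encoded counter adds (almost) nothing per row, modelled here as 0 bytes.
counter_rows = [0] * 100_000
size_only = cut_pages(counter_rows, ONE_MIB)
with_row_limit = cut_pages(counter_rows, ONE_MIB, max_page_rows=10_000)

# A wide column still hits the size limit first, so the row limit changes nothing.
wide_rows = [1000] * 3000
wide_pages = cut_pages(wide_rows, ONE_MIB, max_page_rows=10_000)
```

With only the size limit, the 100,000-row counter column ends up in one page; with the 10k row limit it is split into ten pages, while the wide column is unaffected.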



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)