Posted to dev@parquet.apache.org by "Zoltan Ivanfi (JIRA)" <ji...@apache.org> on 2018/01/10 14:14:00 UTC

[jira] [Created] (PARQUET-1190) Use the same default page size across different language bindings

Zoltan Ivanfi created PARQUET-1190:
--------------------------------------

             Summary: Use the same default page size across different language bindings
                 Key: PARQUET-1190
                 URL: https://issues.apache.org/jira/browse/PARQUET-1190
             Project: Parquet
          Issue Type: Task
            Reporter: Zoltan Ivanfi


Currently, many different page size recommendations and defaults are in use:
* [parquet-format|https://github.com/apache/parquet-format#configurations] recommends 8 KB.
* [parquet-mr|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L46] uses 1 MB.
* [Impala|https://github.com/apache/impala/blob/daff8eb0ca19aa612c9fc7cc2ddd647735b31266/be/src/exec/hdfs-parquet-table-writer.h#L83] uses 64 KB.
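
To give a sense of how far apart these defaults are, here is a rough back-of-the-envelope sketch in Python that counts how many data pages each one would produce for the same amount of column data. It assumes KB means 1024 bytes and ignores compression, page headers, and the fact that writers only check the page size periodically. The 128 MiB input is an arbitrary illustrative figure, not a value taken from this issue.

```python
# Rough illustration: how many data pages each default page size would produce
# for the same amount of encoded column data. Assumes KB = 1024 bytes and
# ignores compression, page headers, and the fact that writers only check the
# page size periodically (so real pages overshoot or undershoot the target).

import math

KIB = 1024
MIB = 1024 * KIB

# Defaults as listed in the issue.
DEFAULT_PAGE_SIZES = {
    "parquet-format (recommended)": 8 * KIB,
    "parquet-mr": 1 * MIB,
    "Impala": 64 * KIB,
}


def pages_needed(column_bytes: int, page_size: int) -> int:
    """Number of pages needed to hold column_bytes at the given target page size."""
    return math.ceil(column_bytes / page_size)


def compare(column_bytes: int) -> dict:
    """Page count per default for the same amount of column data."""
    return {name: pages_needed(column_bytes, size)
            for name, size in DEFAULT_PAGE_SIZES.items()}


if __name__ == "__main__":
    # Hypothetical example: 128 MiB of encoded data for one column.
    for name, pages in compare(128 * MIB).items():
        print(f"{name:30s} {pages:6d} pages")
```

For 128 MiB of data this gives 16384 pages at 8 KB, 2048 at 64 KB and 128 at 1 MB: a 128x spread in page count, and therefore in per-page overhead such as page headers, between the parquet-format recommendation and the parquet-mr default.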

These defaults (and those of any other language bindings not listed above) should be made consistent.

Picking a sensible new value may require some measurements, so we should wait for column indexes to be implemented before choosing one.

The new default page size no longer has to be a single value; there are several options:
* A single default page size, as before.
* Different page size defaults depending on the column type.
* Using a specified number of values instead of data size (e.g., every page contains 10000 values).
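
The three options differ mainly in when a writer decides to close the current page. The following is a minimal sketch in Python, assuming a writer that tracks the buffered bytes and value count of the page it is building and that "type" means the Parquet physical type. `PagePolicy` and `should_close_page` are hypothetical names, not part of parquet-mr, Impala or any other Parquet implementation, and the limits used below are illustrative only, not proposed defaults.

```python
# Hypothetical sketch of the three page-boundary options listed above.
# None of these names come from a real Parquet implementation; they only
# illustrate when a writer would flush the page it is currently building.

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PagePolicy:
    # Option 1: one byte-size target for every column (None = no byte limit).
    default_page_bytes: Optional[int] = 1024 * 1024
    # Option 2: per-physical-type byte targets that override the default.
    page_bytes_by_type: Dict[str, int] = field(default_factory=dict)
    # Option 3: close a page after this many values, regardless of its size.
    max_values_per_page: Optional[int] = None

    def should_close_page(self, physical_type: str,
                          buffered_bytes: int, buffered_values: int) -> bool:
        """Return True if the page being built should be flushed now."""
        if (self.max_values_per_page is not None
                and buffered_values >= self.max_values_per_page):
            return True
        limit = self.page_bytes_by_type.get(physical_type, self.default_page_bytes)
        return limit is not None and buffered_bytes >= limit


# Illustrative configurations for each option (values are arbitrary examples).
single_default = PagePolicy(default_page_bytes=64 * 1024)
per_type = PagePolicy(default_page_bytes=64 * 1024,
                      page_bytes_by_type={"BOOLEAN": 8 * 1024})
by_count = PagePolicy(default_page_bytes=None, max_values_per_page=10000)
```

Letting a value-count limit coexist with a byte limit in one policy object is just one possible design; the issue itself presents the options as alternatives.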

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)