Posted to dev@parquet.apache.org by "Claire McGinty (Jira)" <ji...@apache.org> on 2023/09/19 14:36:00 UTC

[jira] [Created] (PARQUET-2350) Create Configuration key for enabling Byte Stream Split Encoding in ParquetWriter

Claire McGinty created PARQUET-2350:
---------------------------------------

             Summary: Create Configuration key for enabling Byte Stream Split Encoding in ParquetWriter
                 Key: PARQUET-2350
                 URL: https://issues.apache.org/jira/browse/PARQUET-2350
             Project: Parquet
          Issue Type: Improvement
            Reporter: Claire McGinty


All of the properties in [ParquetWriter|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java] have an associated Configuration key (for example, [ParquetOutputFormat.DICTIONARY_PAGE_SIZE|https://github.com/apache/parquet-mr/blob/910bcc4edc2d707670e02e9ceadd98dacd9f08d2/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java#L140] corresponds to ParquetWriter#withDictionaryPageSize), except for `ParquetWriter#withByteStreamSplitEncoding`.

Can we add a Configuration key for this? I'm happy to open a PR, given some input on the naming convention (perhaps `parquet.encoding.bytestreamsplit.enabled`?).
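
To make the request concrete, here is a minimal, self-contained sketch of how such a key might be read and passed to the writer. It does not use the real parquet-mr or Hadoop classes: a plain `Map<String, String>` stands in for Hadoop's `Configuration`, and `WriterSettings` is a hypothetical stand-in for the `ParquetWriter` builder. The key name is only the one suggested in this issue, and the default of `false` is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class ByteStreamSplitConfigSketch {
    // Key name suggested in this issue; not an existing parquet-mr constant.
    static final String BYTE_STREAM_SPLIT_ENABLED = "parquet.encoding.bytestreamsplit.enabled";

    // Assumed default: the encoding stays off unless explicitly enabled.
    static final boolean DEFAULT_ENABLED = false;

    // Hypothetical stand-in for the writer builder; only this one property is modeled.
    static final class WriterSettings {
        private boolean byteStreamSplit = DEFAULT_ENABLED;

        WriterSettings withByteStreamSplitEncoding(boolean enabled) {
            this.byteStreamSplit = enabled;
            return this;
        }

        boolean isByteStreamSplitEnabled() {
            return byteStreamSplit;
        }
    }

    // Reads the key from a Map standing in for Hadoop's Configuration,
    // falling back to the assumed default when the key is absent.
    static boolean getByteStreamSplitEnabled(Map<String, String> conf) {
        String raw = conf.get(BYTE_STREAM_SPLIT_ENABLED);
        return raw == null ? DEFAULT_ENABLED : Boolean.parseBoolean(raw.trim());
    }

    // Mirrors how other keys are translated into builder calls.
    static WriterSettings fromConfiguration(Map<String, String> conf) {
        return new WriterSettings()
                .withByteStreamSplitEncoding(getByteStreamSplitEnabled(conf));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(BYTE_STREAM_SPLIT_ENABLED, "true");
        System.out.println(fromConfiguration(conf).isByteStreamSplitEnabled());
    }
}
```

In the real code base, the lookup would presumably sit next to the existing getters in ParquetOutputFormat and be applied wherever the other keys are mapped onto the builder.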

--
This message was sent by Atlassian Jira
(v8.20.10#820010)