Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/11/13 09:15:00 UTC

[jira] [Commented] (PARQUET-1685) Truncate the stored min and max for String statistics to reduce the footer size

    [ https://issues.apache.org/jira/browse/PARQUET-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973152#comment-16973152 ] 

ASF GitHub Bot commented on PARQUET-1685:
-----------------------------------------

gszadovszky commented on pull request #696: PARQUET-1685: Truncate Min/Max for Statistics
URL: https://github.com/apache/parquet-mr/pull/696
 
 
   
 


> Truncate the stored min and max for String statistics to reduce the footer size 
> --------------------------------------------------------------------------------
>
>                 Key: PARQUET-1685
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1685
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.10.1
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>
> Iceberg has a cool feature that truncates the stored min and max statistics to minimize the metadata size. We can borrow this approach and truncate them in Parquet as well, to reduce the size of the footer and possibly the page header. Here is the code in Iceberg: [https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/util/UnicodeUtil.java].
>  
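
To make the idea in the description concrete, here is a minimal sketch of how truncated bounds can stay valid for filtering. It is not the code from Iceberg's UnicodeUtil, nor what pull request #696 implements; the class and method names (StatTruncationSketch, truncateMin, truncateMax) are hypothetical. It assumes well-formed strings compared in Unicode code point order, which matches unsigned byte order of their UTF-8 encoding. A truncated min is simply a prefix, since a prefix never sorts after the full value. A truncated max is a prefix whose last code point is incremented, so that it sorts after every value sharing that prefix.

```java
final class StatTruncationSketch {
    private StatTruncationSketch() {}

    /** Lower bound: a prefix always sorts at or before the full string. */
    public static String truncateMin(String value, int maxCodePoints) {
        if (maxCodePoints < 0) {
            throw new IllegalArgumentException("maxCodePoints must be >= 0");
        }
        if (value.codePointCount(0, value.length()) <= maxCodePoints) {
            return value; // already short enough, nothing to truncate
        }
        // offsetByCodePoints never splits a surrogate pair.
        return value.substring(0, value.offsetByCodePoints(0, maxCodePoints));
    }

    /** Upper bound: take the prefix, then increment its last incrementable code point. */
    public static String truncateMax(String value, int maxCodePoints) {
        if (maxCodePoints < 0) {
            throw new IllegalArgumentException("maxCodePoints must be >= 0");
        }
        if (value.codePointCount(0, value.length()) <= maxCodePoints) {
            return value;
        }
        StringBuilder prefix = new StringBuilder(
                value.substring(0, value.offsetByCodePoints(0, maxCodePoints)));
        int end = prefix.length();
        while (end > 0) {
            int codePoint = prefix.codePointBefore(end);
            int start = end - Character.charCount(codePoint);
            int next = codePoint + 1;
            if (next == Character.MIN_SURROGATE) {
                next = Character.MAX_SURROGATE + 1; // skip the surrogate range
            }
            if (next <= Character.MAX_CODE_POINT) {
                prefix.setLength(start);       // drop this code point and everything after it
                prefix.appendCodePoint(next);  // replace it with its successor
                return prefix.toString();
            }
            end = start; // cannot increment U+10FFFF; try the previous code point
        }
        // No shorter upper bound exists (assumption: fall back to the full value).
        return value;
    }
}
```

With a limit of 3 code points, a min of "abcdefgh" would be stored as "abc" and a max of "abcdefgh" as "abd". The bounds become looser, so a reader may scan a few more row groups or pages than strictly necessary, but it never skips data that could match.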


