Posted to dev@parquet.apache.org by "mapleFU (via GitHub)" <gi...@apache.org> on 2023/03/27 14:25:33 UTC

[GitHub] [parquet-format] mapleFU commented on a diff in pull request #197: PARQUET-2261: Proposal for unencoded/uncompressed statistics

mapleFU commented on code in PR #197:
URL: https://github.com/apache/parquet-format/pull/197#discussion_r1149327881


##########
src/main/thrift/parquet.thrift:
##########
@@ -223,6 +223,17 @@ struct Statistics {
     */
    5: optional binary max_value;
    6: optional binary min_value;
+   /** The number of bytes the row/group or page would take if encoded with plain-encoding */
+   7: optional i64 plain_encoded_bytes;
+   /** 
+     * When present there is expected to be one element corresponding to each repetition level (i.e. size=max repetition_level)
+     * where each element represents the count of the number of times that level occurs in the page/column chunk.
+     */
+   8: optional list<i64> repetition_level_histogram;

Review Comment:
   It seems this can help push down some filters on List/Map columns and help with constructing lists. That's great, but I think we may need some examples, because it's a bit hard to understand how to make full use of it. Perhaps something like the rules in "Storing and Querying Tree-Structured Records in Dremel"?
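
   As a starting point for such an example, here is a minimal sketch of what the histogram might contain for a single-level list column (max repetition level 1). It is not taken from the PR: the helper names `repetition_levels` and `level_histogram` are hypothetical, and the sketch assumes one bucket per level from 0 to the max repetition level, which may differ from the sizing the proposed comment describes.

```python
from typing import List, Optional, Sequence


def repetition_levels(rows: Sequence[Optional[Sequence[int]]]) -> List[int]:
    """Dremel-style repetition levels for a list<int> column (max level 1).

    Level 0 starts a new top-level record; level 1 continues the current list.
    A null or empty list still occupies one level entry with level 0.
    """
    levels: List[int] = []
    for row in rows:
        levels.append(0)  # every record starts with repetition level 0
        if row:
            levels.extend([1] * (len(row) - 1))  # remaining elements repeat at level 1
    return levels


def level_histogram(levels: Sequence[int], max_level: int) -> List[int]:
    """Count how often each level occurs (assumed layout: buckets 0..max_level)."""
    hist = [0] * (max_level + 1)
    for level in levels:
        hist[level] += 1
    return hist


rows = [[1, 2, 3], [], None, [4]]
levels = repetition_levels(rows)
hist = level_histogram(levels, max_level=1)

num_records = hist[0]      # one level-0 entry per top-level record
num_level_entries = sum(hist)  # total level entries in the page/column chunk
```

   Under these assumptions, bucket 0 gives the number of top-level records without decoding the levels, and the sum of all buckets gives the total number of level entries, which a reader could use to pre-size buffers when reconstructing lists. If bucket 1 is zero, no list in the page holds more than one element.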


