Posted to dev@parquet.apache.org by "emkornfield (via GitHub)" <gi...@apache.org> on 2023/03/26 05:51:18 UTC

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: Initial proposal for unencoded/uncompressed statistics

emkornfield commented on code in PR #197:
URL: https://github.com/apache/parquet-format/pull/197#discussion_r1148481707


##########
src/main/thrift/parquet.thrift:
##########
@@ -223,6 +223,17 @@ struct Statistics {
     */
    5: optional binary max_value;
    6: optional binary min_value;
+   /** The number of bytes the row/group or page would take if encoded with plain-encoding */
+   7: optional i64 plain_encoded_bytes;

Review Comment:
   I'm open to either approach. IIUC, the suggestion here is to change the name to something like:
   ```
   /** Optionally set, but only for byte array columns, to help applications determine the total unencoded/uncompressed size of the page.
      * This is equivalent to PlainEncoding(values) - (num_values_encoded * 4), i.e. it excludes the 4-byte length prefix
      * that plain encoding writes before each value, and it does not include any size to account for nulls.
      */
   7: optional i64 encoded_byte_array_data_bytes;
   ```
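   As a rough illustration of the arithmetic in the proposed doc comment, here is a minimal Python sketch. It assumes PLAIN encoding of BYTE_ARRAY writes each non-null value as a 4-byte little-endian length followed by the raw bytes, and that nulls are carried only in definition levels, so they add nothing to the data section. The helpers `plain_encode_byte_array` and `byte_array_data_bytes` are hypothetical names for illustration only; they are not part of parquet-format or any Parquet library.

```python
import struct
from typing import Optional, Sequence


def plain_encode_byte_array(values: Sequence[Optional[bytes]]) -> bytes:
    """Hypothetical helper: PLAIN-encode BYTE_ARRAY values.

    Each non-null value becomes a 4-byte little-endian length prefix
    followed by its raw bytes. Nulls are skipped, since (by assumption)
    they are represented only in the definition levels.
    """
    out = bytearray()
    for v in values:
        if v is None:
            continue
        out += struct.pack("<I", len(v))  # 4-byte length prefix
        out += v                          # raw value bytes
    return bytes(out)


def byte_array_data_bytes(values: Sequence[Optional[bytes]]) -> int:
    """Hypothetical helper: sum of the raw lengths of non-null values,
    excluding length prefixes and any allowance for nulls."""
    return sum(len(v) for v in values if v is not None)


if __name__ == "__main__":
    values = [b"parquet", None, b"", b"thrift"]
    encoded = plain_encode_byte_array(values)
    num_values_encoded = sum(1 for v in values if v is not None)
    # PlainEncoding(values) - (num_values_encoded * 4) equals the raw data size.
    assert len(encoded) - num_values_encoded * 4 == byte_array_data_bytes(values)
    print(len(encoded), byte_array_data_bytes(values))
```

   With three non-null values totalling 13 bytes, the plain-encoded size is 13 + 3 * 4 = 25 bytes, and subtracting the length prefixes recovers the 13 bytes the proposed field would record.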



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.