Posted to commits@parquet.apache.org by ap...@apache.org on 2021/04/22 15:04:21 UTC

[parquet-testing] branch master updated: PARQUET-1998: Add LZ_RAW compressed files

This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git


The following commit(s) were added to refs/heads/master by this push:
     new ddd8989  PARQUET-1998: Add LZ_RAW compressed files
ddd8989 is described below

commit ddd898958803cb89b7156c6350584d1cda0fe8de
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Tue Mar 23 16:29:57 2021 +0100

    PARQUET-1998: Add LZ_RAW compressed files
    
    The files were generated with Parquet C++.
    
    * `lz4_raw_compressed.parquet` contains the same data as `hadoop_lz4_compressed.parquet`.
    * `lz4_raw_compressed_larger.parquet` contains the same data as `hadoop_lz4_compressed_larger.parquet`.
    
    Here are the file contents for `lz4_raw_compressed.parquet`:
    ```
    File Name: parquet-testing/data/lz4_raw_compressed.parquet
    Version: 1.0
    Created By: parquet-cpp version 1.5.1-SNAPSHOT
    Total rows: 4
    Number of RowGroups: 1
    Number of Real Columns: 3
    Number of Columns: 3
    Number of Selected Columns: 3
    Column 0: c0 (INT64)
    Column 1: c1 (BYTE_ARRAY)
    Column 2: v11 (DOUBLE)
    --- Row Group: 0 ---
    --- Total Bytes: 251 ---
    --- Total Compressed Bytes: 238 ---
    --- Rows: 4 ---
    Column 0
      Values: 4, Null Values: 0, Distinct Values: 0
      Max: 1593604801, Min: 1593604800
      Compression: LZ4_RAW, Encodings: PLAIN RLE
      Uncompressed Size: 93, Compressed Size: 85
    Column 1
      Values: 4, Null Values: 0, Distinct Values: 0
      Max: def, Min: abc
      Compression: LZ4_RAW, Encodings: PLAIN RLE
      Uncompressed Size: 59, Compressed Size: 58
    Column 2
      Values: 4, Null Values: 0, Distinct Values: 0
      Max: 42.125, Min: 7.7
      Compression: LZ4_RAW, Encodings: PLAIN RLE
      Uncompressed Size: 99, Compressed Size: 95
    --- Values ---
    c0                            |c1                            |v11                           |
    1593604800                    |abc                           |42.000000                     |
    1593604800                    |def                           |7.700000                      |
    1593604801                    |abc                           |42.125000                     |
    1593604801                    |def                           |7.700000                      |
    ```
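In the listing above, each row-group total is the sum of the per-column chunk sizes: 93 + 59 + 99 = 251 uncompressed bytes, and 85 + 58 + 95 = 238 compressed bytes. The minimal Python sketch below recomputes both totals from the numbers in the listing and derives a compressed/uncompressed ratio. `ColumnChunk` and `row_group_totals` are hypothetical helpers written for this illustration. They are not part of any Parquet library.

```python
from typing import NamedTuple


class ColumnChunk(NamedTuple):
    """Sizes reported for one column chunk (hypothetical helper type)."""
    name: str
    uncompressed: int
    compressed: int


# Values copied from the listing of lz4_raw_compressed.parquet.
CHUNKS = [
    ColumnChunk("c0", uncompressed=93, compressed=85),
    ColumnChunk("c1", uncompressed=59, compressed=58),
    ColumnChunk("v11", uncompressed=99, compressed=95),
]


def row_group_totals(chunks):
    """Return (total uncompressed bytes, total compressed bytes)."""
    total = sum(c.uncompressed for c in chunks)
    total_compressed = sum(c.compressed for c in chunks)
    return total, total_compressed


if __name__ == "__main__":
    total, total_compressed = row_group_totals(CHUNKS)
    print(f"Total Bytes: {total}")                        # 251
    print(f"Total Compressed Bytes: {total_compressed}")  # 238
    print(f"Ratio: {total_compressed / total:.3f}")       # 0.948
```

With only four rows, the compressed column data is barely smaller than the input. That is expected for such a small payload.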
    
    Here are the partial file contents for `lz4_raw_compressed_larger.parquet`:
    ```
    File Name: parquet-testing/data/lz4_raw_compressed_larger.parquet
    Version: 1.0
    Created By: parquet-cpp version 1.5.1-SNAPSHOT
    Total rows: 10000
    Number of RowGroups: 1
    Number of Real Columns: 1
    Number of Columns: 1
    Number of Selected Columns: 1
    Column 0: a (BYTE_ARRAY/UTF8)
    --- Row Group: 0 ---
    --- Total Bytes: 400103 ---
    --- Total Compressed Bytes: 380480 ---
    --- Rows: 10000 ---
    Column 0
      Values: 10000, Null Values: 0, Distinct Values: 0
      Max: ffffe6a0-e0c0-4e65-a9d4-f7f4c176aea2, Min: 00087de7-10df-4979-94cf-79279f9745ce
      Compression: LZ4_RAW, Encodings: PLAIN RLE
      Uncompressed Size: 400103, Compressed Size: 380480
    --- Values ---
    a                             |
    c7ce6bef-d5b0-4863-b199-8ea8c7fb117b|
    e8fb9197-cb9f-4118-b67f-fbfa65f61843|
    885136e1-0aa1-4fdb-8847-63d87b07c205|
    ce7b2019-8ebe-4906-a74d-0afa2409e5df|
    a9ee2527-821b-4b71-a926-03f73c3fc8b7|
    [...]
    ```
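The uncompressed size of the larger file is consistent with Parquet's PLAIN encoding of BYTE_ARRAY values. In that encoding, each value is stored as a 4-byte little-endian length followed by its bytes. This sketch assumes every value is a 36-character UUID string, as the sampled rows are. Under that assumption, 10000 values take 10000 × (4 + 36) = 400000 bytes. That leaves 103 of the reported 400103 bytes, presumably for page headers and level data. This is an assumption: the listing does not break that figure down.

The function `plain_encode_byte_array` is a hypothetical name that mirrors the layout from the format specification. It is not a library API.

```python
import struct


def plain_encode_byte_array(values):
    """PLAIN-encode BYTE_ARRAY values: 4-byte little-endian length + raw bytes.

    Hypothetical helper for illustration; it follows the layout described in
    the Parquet format specification, not any particular library's API.
    """
    out = bytearray()
    for value in values:
        data = value.encode("utf-8")
        out += struct.pack("<I", len(data))  # unsigned 32-bit little-endian length
        out += data
    return bytes(out)


# First rows shown in the listing of lz4_raw_compressed_larger.parquet.
SAMPLE = [
    "c7ce6bef-d5b0-4863-b199-8ea8c7fb117b",
    "e8fb9197-cb9f-4118-b67f-fbfa65f61843",
    "885136e1-0aa1-4fdb-8847-63d87b07c205",
]

if __name__ == "__main__":
    print(len(plain_encode_byte_array(SAMPLE)))  # 3 * (4 + 36) = 120

    # Extrapolate to 10000 rows, assuming every value is a 36-char UUID string.
    estimate = 10000 * (4 + 36)
    print(estimate)                   # 400000
    print(400103 - estimate)          # 103 bytes left (presumably headers/levels)
    print(f"{380480 / 400103:.3f}")   # compressed/uncompressed ratio: 0.951
```

Random UUIDs have little redundancy, so a ratio close to 1 is plausible for this column.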
---
 data/lz4_raw_compressed.parquet        | Bin 0 -> 797 bytes
 data/lz4_raw_compressed_larger.parquet | Bin 0 -> 380836 bytes
 2 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/data/lz4_raw_compressed.parquet b/data/lz4_raw_compressed.parquet
new file mode 100644
index 0000000..4f78711
Binary files /dev/null and b/data/lz4_raw_compressed.parquet differ
diff --git a/data/lz4_raw_compressed_larger.parquet b/data/lz4_raw_compressed_larger.parquet
new file mode 100644
index 0000000..b83c59e
Binary files /dev/null and b/data/lz4_raw_compressed_larger.parquet differ
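Git reports only that the two new `.parquet` files are binary. The standard-library sketch below runs one quick sanity check on such fixtures. It relies on the Parquet file layout:

* A Parquet file starts and ends with the 4-byte magic `PAR1`.
* The 4 bytes before the trailing magic hold the footer (file metadata) length as a little-endian integer.

`check_parquet_envelope` is a hypothetical helper. It only checks this outer envelope and does not verify that the LZ4_RAW pages themselves decode correctly.

```python
import struct
import sys

MAGIC = b"PAR1"


def check_parquet_envelope(data: bytes) -> int:
    """Return the footer length if `data` looks like a Parquet file.

    Hypothetical helper: it only checks the leading/trailing magic and that
    the footer length fits inside the file; it does not parse the footer.
    """
    # Smallest possible layout: magic + footer length field + magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    # The footer must fit between the leading magic and the length field.
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len


if __name__ == "__main__":
    # Usage: python check.py data/lz4_raw_compressed.parquet
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, "footer bytes:", check_parquet_envelope(f.read()))
```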