Posted to commits@parquet.apache.org by ap...@apache.org on 2021/04/22 15:04:21 UTC
[parquet-testing] branch master updated: PARQUET-1998: Add LZ_RAW compressed files
This is an automated email from the ASF dual-hosted git repository.
apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git
The following commit(s) were added to refs/heads/master by this push:
new ddd8989 PARQUET-1998: Add LZ_RAW compressed files
ddd8989 is described below
commit ddd898958803cb89b7156c6350584d1cda0fe8de
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Tue Mar 23 16:29:57 2021 +0100
PARQUET-1998: Add LZ_RAW compressed files
The files were generated with Parquet C++.
* `lz4_raw_compressed.parquet` contains the same data as `hadoop_lz4_compressed.parquet`.
* `lz4_raw_compressed_larger.parquet` contains the same data as `hadoop_lz4_compressed_larger.parquet`.
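One way to check the "same data" claim locally is to decode each pair of files and compare the resulting tables. The following is a minimal sketch, not part of the commit: it assumes a `pyarrow` build that can decode both the Hadoop-framed LZ4 files and the LZ4_RAW files, and the helper name `same_data` and its `read_table` parameter are hypothetical.

```python
def same_data(path_a, path_b, read_table=None):
    """Return True if both Parquet files decode to equal tables.

    `read_table` defaults to pyarrow.parquet.read_table; it is a
    parameter only so the helper can be exercised without pyarrow.
    """
    if read_table is None:
        # Assumption: pyarrow is installed and supports both codecs.
        import pyarrow.parquet as pq
        read_table = pq.read_table
    # Table.equals compares schema and values, not the on-disk encoding.
    return read_table(path_a).equals(read_table(path_b))

# Usage (paths are relative to a parquet-testing checkout):
#   same_data("data/lz4_raw_compressed.parquet",
#             "data/hadoop_lz4_compressed.parquet")
```

Because the comparison happens after decoding, it is independent of which compression codec each file uses.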
Here are the file contents for `lz4_raw_compressed.parquet`:
```
File Name: parquet-testing/data/lz4_raw_compressed.parquet
Version: 1.0
Created By: parquet-cpp version 1.5.1-SNAPSHOT
Total rows: 4
Number of RowGroups: 1
Number of Real Columns: 3
Number of Columns: 3
Number of Selected Columns: 3
Column 0: c0 (INT64)
Column 1: c1 (BYTE_ARRAY)
Column 2: v11 (DOUBLE)
--- Row Group: 0 ---
--- Total Bytes: 251 ---
--- Total Compressed Bytes: 238 ---
--- Rows: 4 ---
Column 0
Values: 4, Null Values: 0, Distinct Values: 0
Max: 1593604801, Min: 1593604800
Compression: LZ4_RAW, Encodings: PLAIN RLE
Uncompressed Size: 93, Compressed Size: 85
Column 1
Values: 4, Null Values: 0, Distinct Values: 0
Max: def, Min: abc
Compression: LZ4_RAW, Encodings: PLAIN RLE
Uncompressed Size: 59, Compressed Size: 58
Column 2
Values: 4, Null Values: 0, Distinct Values: 0
Max: 42.125, Min: 7.7
Compression: LZ4_RAW, Encodings: PLAIN RLE
Uncompressed Size: 99, Compressed Size: 95
--- Values ---
c0         |c1  |v11       |
1593604800 |abc |42.000000 |
1593604800 |def |7.700000  |
1593604801 |abc |42.125000 |
1593604801 |def |7.700000  |
```
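As a quick sanity check on the dump above, the per-column sizes can be summed and compared with the row group totals. This is a small sketch using only the numbers printed in the dump; the variable names are illustrative.

```python
# Sizes copied from the Row Group 0 dump of lz4_raw_compressed.parquet above.
columns = {
    "c0": {"uncompressed": 93, "compressed": 85},
    "c1": {"uncompressed": 59, "compressed": 58},
    "v11": {"uncompressed": 99, "compressed": 95},
}
reported_total_bytes = 251
reported_total_compressed_bytes = 238

total_uncompressed = sum(c["uncompressed"] for c in columns.values())
total_compressed = sum(c["compressed"] for c in columns.values())

# The per-column sizes should add up to the row group totals.
assert total_uncompressed == reported_total_bytes
assert total_compressed == reported_total_compressed_bytes

ratio = total_compressed / total_uncompressed
print(f"compressed/uncompressed = {ratio:.3f}")
```

With only four rows there is little redundancy to exploit, so the ratio stays just under 0.95.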
Here are the partial file contents for `lz4_raw_compressed_larger.parquet`:
```
File Name: parquet-testing/data/lz4_raw_compressed_larger.parquet
Version: 1.0
Created By: parquet-cpp version 1.5.1-SNAPSHOT
Total rows: 10000
Number of RowGroups: 1
Number of Real Columns: 1
Number of Columns: 1
Number of Selected Columns: 1
Column 0: a (BYTE_ARRAY/UTF8)
--- Row Group: 0 ---
--- Total Bytes: 400103 ---
--- Total Compressed Bytes: 380480 ---
--- Rows: 10000 ---
Column 0
Values: 10000, Null Values: 0, Distinct Values: 0
Max: ffffe6a0-e0c0-4e65-a9d4-f7f4c176aea2, Min: 00087de7-10df-4979-94cf-79279f9745ce
Compression: LZ4_RAW, Encodings: PLAIN RLE
Uncompressed Size: 400103, Compressed Size: 380480
--- Values ---
a                                   |
c7ce6bef-d5b0-4863-b199-8ea8c7fb117b|
e8fb9197-cb9f-4118-b67f-fbfa65f61843|
885136e1-0aa1-4fdb-8847-63d87b07c205|
ce7b2019-8ebe-4906-a74d-0afa2409e5df|
a9ee2527-821b-4b71-a926-03f73c3fc8b7|
[...]
```
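The per-column fields shown in these dumps can also be read programmatically. The sketch below is an illustration only: `column_summaries` is a hypothetical helper, and it assumes its argument behaves like `pyarrow.parquet.FileMetaData` (as returned by `pyarrow.parquet.ParquetFile(path).metadata`).

```python
def column_summaries(metadata):
    """Yield (path, compression, uncompressed_size, compressed_size)
    for every column chunk in every row group.

    `metadata` is expected to behave like pyarrow.parquet.FileMetaData.
    """
    for rg_index in range(metadata.num_row_groups):
        row_group = metadata.row_group(rg_index)
        for col_index in range(row_group.num_columns):
            chunk = row_group.column(col_index)
            yield (
                chunk.path_in_schema,
                chunk.compression,
                chunk.total_uncompressed_size,
                chunk.total_compressed_size,
            )

# Usage (assumes pyarrow is installed and the file exists):
#   import pyarrow.parquet as pq
#   md = pq.ParquetFile("data/lz4_raw_compressed_larger.parquet").metadata
#   for row in column_summaries(md):
#       print(row)
```

The exact string reported for the compression codec may vary between `pyarrow` versions, so it is best treated as informational.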
---
data/lz4_raw_compressed.parquet | Bin 0 -> 797 bytes
data/lz4_raw_compressed_larger.parquet | Bin 0 -> 380836 bytes
2 files changed, 0 insertions(+), 0 deletions(-)
diff --git a/data/lz4_raw_compressed.parquet b/data/lz4_raw_compressed.parquet
new file mode 100644
index 0000000..4f78711
Binary files /dev/null and b/data/lz4_raw_compressed.parquet differ
diff --git a/data/lz4_raw_compressed_larger.parquet b/data/lz4_raw_compressed_larger.parquet
new file mode 100644
index 0000000..b83c59e
Binary files /dev/null and b/data/lz4_raw_compressed_larger.parquet differ