Posted to dev@parquet.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/06/20 19:31:04 UTC
[GitHub] [parquet-site] westonpace commented on a diff in pull request #34: PARQUET-2310: implementation status
westonpace commented on code in PR #34:
URL: https://github.com/apache/parquet-site/pull/34#discussion_r1235735459
##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,178 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+
+### Physical types
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Data type | C++ | Python | Java | Go | Rust |
+| | | | | | |
++===========================================+=======+========+========+=======+=======+
+| BOOLEAN | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| INT32 | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| INT64 | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| INT96 | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| FLOAT | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DOUBLE | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BYTE_ARRAY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| FIXED_LEN_BYTE_ARRAY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+
+### Logical types
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Data type | C++ | Python | Java | Go | Rust |
+| | | | | | |
++===========================================+=======+========+========+=======+=======+
+| STRING | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| ENUM | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| UUID | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| 8 and 16 bit signed INT | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| 8, 16, 32, 64 bit unsigned INT | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DECIMAL (INT32) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DECIMAL (INT64) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DECIMAL (BYTE_ARRAY) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DATE | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| TIME (INT32) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| TIME (INT64) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| TIMESTAMP (INT32) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| TIMESTAMP (INT64) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| INTERVAL | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| JSON | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BSON | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LIST | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| MAP | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| UNKNOWN | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+
+### Encoding
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| | C++ | Python | Java | Go | Rust |
+| | | | | | |
++===========================================+=======+========+========+=======+=======+
+| PLAIN | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| PLAIN_DICTIONARY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RLE_DICTIONARY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RLE | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BIT_PACKED | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DELTA_BINARY_PACKED | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DELTA_LENGTH_BYTE_ARRAY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DELTA_BYTE_ARRAY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BYTE_STREAM_SPLIT | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+
+### Compression
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| | C++ | Python | Java | Go | Rust |
+| | | | | | |
++===========================================+=======+========+========+=======+=======+
+| UNCOMPRESSED | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| SNAPPY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| GZIP | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZO | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BROTLI | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4 | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| ZSTD | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4_RAW | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+
+### Other format level features
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| | C++ | Python | Java | Go | Rust |
+| | | | | | |
++===========================================+=======+========+========+=======+=======+
+| xxHash Bloom filters | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| bloom filter length | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Statistics min_value, max_value | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Column index | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Offset index | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Modular encryption | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page CRC32 checksum | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Modular encryption | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+
+High level data API-s for parquet feature usage
+===============================================
+
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Format | C++ | Python | Java | Go | Rust |
+| | | | | | |
++==============================================+=======+========+========+=======+=======+
+| Hive-style partitioning | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Partition pruning on the partition column | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| External column data | | | | | |
Review Comment:
What is this?
##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,178 @@
+High level data API-s for parquet feature usage
+===============================================
+
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Format | C++ | Python | Java | Go | Rust |
+| | | | | | |
++==============================================+=======+========+========+=======+=======+
+| Hive-style partitioning | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Partition pruning on the partition column | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| External column data | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup Sorting column | | | | | |
Review Comment:
Also, what does this mean?
##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,178 @@
+High level data API-s for parquet feature usage
+===============================================
+
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Format | C++ | Python | Java | Go | Rust |
+| | | | | | |
++==============================================+=======+========+========+=======+=======+
+| Hive-style partitioning | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Partition pruning on the partition column | | | | | |
Review Comment:
I think these two features could probably be left out. I can see how the others are high-level API features, but they are still very much "exposing Parquet capabilities". Hive-style partitioning, though, seems completely unrelated to Parquet as a format.
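To illustrate why partitioning looks independent of the file format, here is a minimal sketch using only the Python standard library. The paths, the `partition_values` helper, and the `prune` helper are all hypothetical and are not taken from any Parquet implementation; the point is only that the `key=value` directory convention can be parsed and pruned without opening a single file.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Collect key=value directory segments from a Hive-style path."""
    values = {}
    for part in PurePosixPath(path).parent.parts:
        key, sep, value = part.partition("=")
        if sep:  # only segments that actually contain '=' are partition keys
            values[key] = value
    return values


def prune(paths, key, wanted):
    """Keep only paths whose partition value for `key` equals `wanted`."""
    return [p for p in paths if partition_values(p).get(key) == wanted]


# Hypothetical dataset layout; note that one file is not Parquet at all.
paths = [
    "sales/year=2022/month=12/part-0.parquet",
    "sales/year=2023/month=01/part-0.parquet",
    "sales/year=2023/month=01/part-0.csv",
]

selected = prune(paths, "year", "2023")
print(selected)
```

The CSV file is selected just like the Parquet one, which is the sense in which the partitioning scheme lives above the file format.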
##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,178 @@
+High level data API-s for parquet feature usage
+===============================================
+
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Format | C++ | Python | Java | Go | Rust |
+| | | | | | |
++==============================================+=======+========+========+=======+=======+
+| Hive-style partitioning | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Partition pruning on the partition column | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| External column data | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup Sorting column | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Read / Write RowGroup metadata and data (1) | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup pruning using statistics | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Read / Write page metadata and data (2) | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Page pruning using projection pushdown | | | | | |
Review Comment:
When I think "projection pushdown" I think "column selection". Is that what is being discussed here? I guess I wouldn't associate it with "page pruning" in my mind.
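To make the distinction concrete, here is a toy sketch in plain Python. It is not any Parquet library's API: the `Page` class, `project`, and `prune_pages` are made-up names, and the in-memory "table" only mimics the idea of per-page min/max statistics. Projection chooses which columns are touched; page pruning skips pages inside a column based on a predicate.

```python
from dataclasses import dataclass


@dataclass
class Page:
    values: list
    min_value: int
    max_value: int


def make_pages(values, page_size):
    """Split a column into pages and record min/max for each page."""
    pages = []
    for start in range(0, len(values), page_size):
        chunk = values[start:start + page_size]
        pages.append(Page(chunk, min(chunk), max(chunk)))
    return pages


# Toy "file": column name -> list of pages (three pages of four values each).
table = {
    "id": make_pages(list(range(12)), 4),
    "score": make_pages([10 * i for i in range(12)], 4),
}


def project(table, columns):
    """Projection pushdown: only the requested columns are read."""
    return {name: table[name] for name in columns}


def prune_pages(pages, lower, upper):
    """Page pruning: drop pages whose [min, max] cannot overlap [lower, upper]."""
    return [p for p in pages if p.max_value >= lower and p.min_value <= upper]


projected = project(table, ["id"])           # column selection
kept = prune_pages(projected["id"], 5, 6)    # predicate 5 <= id <= 6
print(list(projected), [p.values for p in kept])
```

In this sketch the two steps are independent: `project` never looks at statistics, and `prune_pages` never decides which columns to read, which is why pairing "page pruning" with "projection pushdown" reads as a mismatch.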