Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/06/12 09:35:50 UTC

[GitHub] [arrow] tustvold commented on a diff in pull request #36027: GH-36028: [Docs][Parquet] Detailed parquet format support and parquet integration status

tustvold commented on code in PR #36027:
URL: https://github.com/apache/arrow/pull/36027#discussion_r1226372851


##########
docs/source/status.rst:
##########
@@ -348,3 +348,107 @@ Notes:
 * \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)
 
 * \(2) Through JNI bindings to Arrow C++ Datasets. (Provided by ``org.apache.arrow:arrow-dataset``)
+
+
+Parquet format public API details
+=================================
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Format                                    | C++   | Python | Java   | Go    | Rust  |
+|                                           |       |        |        |       |       |
++===========================================+=======+========+========+=======+=======+
+| Basic compression                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Brotli, LZ4, ZSTD                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4_RAW                                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Hive-style partitioning                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| File metadata                             |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup metadata                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Column metadata                           |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Chunk metadata                            |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Sorting column                            |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| ColumnIndex statistics                    |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page statistics                           |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Statistics min_value                      |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| xxHash based bloom filter                 |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| bloom filter length                       |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Modular encryption                        |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| External column data                      |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Nanosecond support                        |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| FIXED_LEN_BYTE_ARRAY                      |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Complete Delta encoding support           |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Complete RLE support                      |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BYTE_STREAM_SPLIT                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Partition pruning on the partition column |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup pruning using statistics         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup pruning using bloom filter       |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page pruning using projection pushdown    |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page pruning using statistics             |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page pruning using bloom filter           |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Partition append / delete                 |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup append / delete                  |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page append / delete                      |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page CRC32 checksum                       |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Parallel partition processing             |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Parallel RowGroup processing              |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Parallel Page processing                  |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Storage-aware defaults (1)                |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Adaptive concurrency (2)                  |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Adaptive IO when pruning used (3)         |       |        |        |       |       |

Review Comment:
   Perhaps just call it "Vectorized IO Pushdown"? I believe there are efforts to add such an API to parquet-mr.
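
   The general idea behind a vectorized IO pushdown is this: after row group and page pruning, the Parquet reader passes the storage layer the complete list of byte ranges it still needs. The storage layer can then merge nearby ranges and issue fewer, larger requests. The following is a minimal sketch of that idea, assuming a simple gap-threshold merge policy. `coalesce_ranges`, `read_vectored` and `max_gap` are hypothetical names, not the parquet-mr or Arrow API, and an in-memory `bytes` object stands in for remote storage.

```python
from typing import Dict, List, Tuple

# Hypothetical type: (start, end) byte offsets into the file, end exclusive.
Range = Tuple[int, int]


def coalesce_ranges(ranges: List[Range], max_gap: int) -> List[Range]:
    """Merge ranges that overlap or are separated by at most max_gap bytes."""
    if not ranges:
        return []
    ordered = sorted(ranges)
    merged = [ordered[0]]
    for start, end in ordered[1:]:
        last_start, last_end = merged[-1]
        if start - last_end <= max_gap:
            # Close enough: extend the previous request instead of issuing a new one.
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


def read_vectored(data: bytes, ranges: List[Range], max_gap: int = 1024) -> List[bytes]:
    """Serve every requested range from a minimal set of coalesced reads.

    `data` is a stand-in for remote storage; each slice of it below models
    one IO request. Results are returned in the caller's original order.
    """
    fetched: Dict[Range, bytes] = {}
    for start, end in coalesce_ranges(ranges, max_gap):
        fetched[(start, end)] = data[start:end]  # one "IO request" per merged range

    results = []
    for start, end in ranges:
        # Every requested range lies inside exactly one merged range.
        for (cs, ce), buf in fetched.items():
            if cs <= start and end <= ce:
                results.append(buf[start - cs:end - cs])
                break
    return results
```

   For example, with `max_gap=64` the requests `(0, 100)`, `(150, 200)` and `(5000, 5100)` collapse into two reads, `(0, 200)` and `(5000, 5100)`. The gap threshold trades a little wasted bandwidth for fewer round trips, which is usually the right trade on object stores where per-request latency dominates.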


