Posted to commits@parquet.apache.org by we...@apache.org on 2022/07/04 14:32:28 UTC
[parquet-testing] branch master updated: add test file for page index filter. (#25)
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git
The following commit(s) were added to refs/heads/master by this push:
new aafd3fc add test file for page index filter. (#25)
aafd3fc is described below
commit aafd3fc9df431c2625a514fb46626e5614f1d199
Author: Yang Jiang <ji...@163.com>
AuthorDate: Mon Jul 4 22:32:23 2022 +0800
add test file for page index filter. (#25)
* add test file for page index filter.
* add link
---
data/README.md | 22 ++++++++++++----------
data/alltypes_tiny_pages.parquet | Bin 0 -> 454233 bytes
data/alltypes_tiny_pages_plain.parquet | Bin 0 -> 811756 bytes
3 files changed, 12 insertions(+), 10 deletions(-)
diff --git a/data/README.md b/data/README.md
index b1227d4..970c37b 100644
--- a/data/README.md
+++ b/data/README.md
@@ -19,16 +19,18 @@
# Test data files for Parquet compatibility and regression testing
-| File | Description |
-|----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| delta_byte_array.parquet | string columns with DELTA_BYTE_ARRAY encoding. See [delta_byte_array.md](delta_byte_array.md) for details. |
-| delta_length_byte_array.parquet | string columns with DELTA_LENGTH_BYTE_ARRAY encoding. |
-| delta_binary_packed.parquet | INT32 and INT64 columns with DELTA_BINARY_PACKED encoding. See [delta_binary_packed.md](delta_binary_packed.md) for details. |
-| delta_encoding_required_column.parquet | required INT32 and STRING columns with delta encoding. See [delta_encoding_required_column.md](delta_encoding_required_column.md) for details. |
-| delta_encoding_optional_column.parquet | optional INT64 and STRING columns with delta encoding. See [delta_encoding_optional_column.md](delta_encoding_optional_column.md) for details. |
-| nested_structs.rust.parquet | Used to test that the Rust Arrow reader can lookup the correct field from a nested struct. See [ARROW-11452](https://issues.apache.org/jira/browse/ARROW-11452) |
-| data_index_bloom_encoding_stats.parquet | optional STRING column. Contains optional metadata: bloom filters, column index, offset index and encoding stats. |
-|null_list.parquet | an empty list. Generated from this json `{"emptylist":[]}` and for the purposes of testing correct read/write behaviour of this base case. |
+| File | Description |
+|----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| delta_byte_array.parquet | string columns with DELTA_BYTE_ARRAY encoding. See [delta_byte_array.md](delta_byte_array.md) for details. |
+| delta_length_byte_array.parquet | string columns with DELTA_LENGTH_BYTE_ARRAY encoding. |
+| delta_binary_packed.parquet | INT32 and INT64 columns with DELTA_BINARY_PACKED encoding. See [delta_binary_packed.md](delta_binary_packed.md) for details. |
+| delta_encoding_required_column.parquet | required INT32 and STRING columns with delta encoding. See [delta_encoding_required_column.md](delta_encoding_required_column.md) for details. |
+| delta_encoding_optional_column.parquet | optional INT64 and STRING columns with delta encoding. See [delta_encoding_optional_column.md](delta_encoding_optional_column.md) for details. |
+| nested_structs.rust.parquet | Used to test that the Rust Arrow reader can lookup the correct field from a nested struct. See [ARROW-11452](https://issues.apache.org/jira/browse/ARROW-11452) |
+| data_index_bloom_encoding_stats.parquet | optional STRING column. Contains optional metadata: bloom filters, column index, offset index and encoding stats. |
+| null_list.parquet | an empty list. Generated from this json `{"emptylist":[]}` and for the purposes of testing correct read/write behaviour of this base case. |
+| alltypes_tiny_pages.parquet | small page sizes with dictionary encoding with page index from [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
+| alltypes_tiny_pages_plain.parquet | small page sizes with plain encoding with page index [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
TODO: Document what each file is in the table above.
diff --git a/data/alltypes_tiny_pages.parquet b/data/alltypes_tiny_pages.parquet
new file mode 100644
index 0000000..90019d1
Binary files /dev/null and b/data/alltypes_tiny_pages.parquet differ
diff --git a/data/alltypes_tiny_pages_plain.parquet b/data/alltypes_tiny_pages_plain.parquet
new file mode 100644
index 0000000..68d4dcb
Binary files /dev/null and b/data/alltypes_tiny_pages_plain.parquet differ