Posted to commits@parquet.apache.org by we...@apache.org on 2022/07/04 14:32:28 UTC

[parquet-testing] branch master updated: add test file for page index filter. (#25)

This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git


The following commit(s) were added to refs/heads/master by this push:
     new aafd3fc  add test file for page index filter. (#25)
aafd3fc is described below

commit aafd3fc9df431c2625a514fb46626e5614f1d199
Author: Yang Jiang <ji...@163.com>
AuthorDate: Mon Jul 4 22:32:23 2022 +0800

    add test file for page index filter. (#25)
    
    * add test file for page index filter.
    
    * add link
---
 data/README.md                         |  22 ++++++++++++----------
 data/alltypes_tiny_pages.parquet       | Bin 0 -> 454233 bytes
 data/alltypes_tiny_pages_plain.parquet | Bin 0 -> 811756 bytes
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/data/README.md b/data/README.md
index b1227d4..970c37b 100644
--- a/data/README.md
+++ b/data/README.md
@@ -19,16 +19,18 @@
 
 # Test data files for Parquet compatibility and regression testing
 
-| File                                         | Description                                                                                                                                                     |
-|----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| delta_byte_array.parquet                     | string columns with DELTA_BYTE_ARRAY encoding. See [delta_byte_array.md](delta_byte_array.md) for details.                                                      |
-| delta_length_byte_array.parquet              | string columns with DELTA_LENGTH_BYTE_ARRAY encoding.                                                                                                           |
-| delta_binary_packed.parquet                  | INT32 and INT64 columns with DELTA_BINARY_PACKED encoding. See [delta_binary_packed.md](delta_binary_packed.md) for details.                                    |
-| delta_encoding_required_column.parquet       | required INT32 and STRING columns with delta encoding. See [delta_encoding_required_column.md](delta_encoding_required_column.md) for details.                  |
-| delta_encoding_optional_column.parquet       | optional INT64 and STRING columns with delta encoding. See [delta_encoding_optional_column.md](delta_encoding_optional_column.md) for details.                  |
-| nested_structs.rust.parquet                  | Used to test that the Rust Arrow reader can lookup the correct field from a nested struct. See [ARROW-11452](https://issues.apache.org/jira/browse/ARROW-11452) |
-| data_index_bloom_encoding_stats.parquet | optional STRING column. Contains optional metadata: bloom filters, column index, offset index and encoding stats.                                               |
-|null_list.parquet                             | an empty list. Generated from this json `{"emptylist":[]}` and for the purposes of testing correct read/write behaviour of this base case.                 |
+| File                                         | Description                                                                                                                                                      |
+|----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| delta_byte_array.parquet                     | string columns with DELTA_BYTE_ARRAY encoding. See [delta_byte_array.md](delta_byte_array.md) for details.                                                       |
+| delta_length_byte_array.parquet              | string columns with DELTA_LENGTH_BYTE_ARRAY encoding.                                                                                                            |
+| delta_binary_packed.parquet                  | INT32 and INT64 columns with DELTA_BINARY_PACKED encoding. See [delta_binary_packed.md](delta_binary_packed.md) for details.                                     |
+| delta_encoding_required_column.parquet       | required INT32 and STRING columns with delta encoding. See [delta_encoding_required_column.md](delta_encoding_required_column.md) for details.                   |
+| delta_encoding_optional_column.parquet       | optional INT64 and STRING columns with delta encoding. See [delta_encoding_optional_column.md](delta_encoding_optional_column.md) for details.                   |
+| nested_structs.rust.parquet                  | Used to test that the Rust Arrow reader can lookup the correct field from a nested struct. See [ARROW-11452](https://issues.apache.org/jira/browse/ARROW-11452)  |
+| data_index_bloom_encoding_stats.parquet | optional STRING column. Contains optional metadata: bloom filters, column index, offset index and encoding stats.                                                |
+| null_list.parquet                       | an empty list. Generated from this json `{"emptylist":[]}` and for the purposes of testing correct read/write behaviour of this base case.                       |
+| alltypes_tiny_pages.parquet             | small page sizes with dictionary encoding with page index from [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
+| alltypes_tiny_pages_plain.parquet       | small page sizes with plain encoding with page index [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet).           |
 
 TODO: Document what each file is in the table above.
 
diff --git a/data/alltypes_tiny_pages.parquet b/data/alltypes_tiny_pages.parquet
new file mode 100644
index 0000000..90019d1
Binary files /dev/null and b/data/alltypes_tiny_pages.parquet differ
diff --git a/data/alltypes_tiny_pages_plain.parquet b/data/alltypes_tiny_pages_plain.parquet
new file mode 100644
index 0000000..68d4dcb
Binary files /dev/null and b/data/alltypes_tiny_pages_plain.parquet differ