Posted to commits@parquet.apache.org by ap...@apache.org on 2022/12/08 16:44:44 UTC

[parquet-testing] branch master updated: ARROW-18420: Add fixed_length_byte_array.parquet for page index test (#31)

This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git


The following commit(s) were added to refs/heads/master by this push:
     new de7570a  ARROW-18420: Add fixed_length_byte_array.parquet for page index test (#31)
de7570a is described below

commit de7570a865af017add78432e4c045912c213ae24
Author: Gang Wu <us...@gmail.com>
AuthorDate: Fri Dec 9 00:44:17 2022 +0800

    ARROW-18420: Add fixed_length_byte_array.parquet for page index test (#31)
---
 data/README.md                       |   1 +
 data/fixed_length_byte_array.md      |  73 +++++++++++++++++++++++++++++++++++
 data/fixed_length_byte_array.parquet | Bin 0 -> 4335 bytes
 3 files changed, 74 insertions(+)

diff --git a/data/README.md b/data/README.md
index 398a88c..4bb59c2 100644
--- a/data/README.md
+++ b/data/README.md
@@ -32,6 +32,7 @@
 | alltypes_tiny_pages.parquet             | small page sizes with dictionary encoding with page index from [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
 | alltypes_tiny_pages_plain.parquet       | small page sizes with plain encoding with page index [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet).           |
 | rle_boolean_encoding.parquet            | option boolean columns with RLE encoding                                                                                                                         |
+| fixed_length_byte_array.parquet                | optional FIXED_LENGTH_BYTE_ARRAY column with page index. See [fixed_length_byte_array.md](fixed_length_byte_array.md) for details.                        |
 | datapage_v1-uncompressed-checksum.parquet      | uncompressed INT32 columns in v1 data pages with a matching CRC        |
 | datapage_v1-snappy-compressed-checksum.parquet | compressed INT32 columns in v1 data pages with a matching CRC          |
 | datapage_v1-corrupt-checksum.parquet           | uncompressed INT32 columns in v1 data pages with a mismatching CRC     |
diff --git a/data/fixed_length_byte_array.md b/data/fixed_length_byte_array.md
new file mode 100644
index 0000000..a0d98ac
--- /dev/null
+++ b/data/fixed_length_byte_array.md
@@ -0,0 +1,73 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+`fixed_length_byte_array.parquet` is generated by parquet-mr version 1.13.0-SNAPSHOT.
+
+It has a single column of fixed-length byte array type with length 4.
+
+In total, 1000 values are written in descending order, with nulls at random positions.
+
+# File Metadata (from parquet-cli meta command)
+```
+File path:  fixed_length_byte_array.parquet
+Created by: parquet-mr version 1.13.0-SNAPSHOT (build d057b39d93014fe40f5067ee4a33621e65c91552)
+Properties:
+  writer.model.name: example
+Schema:
+message schema {
+  required fixed_len_byte_array(4) flba_field;
+}
+
+
+Row group 0:  count: 1000  3.84 B records  start: 4  total(compressed): 3.749 kB total(uncompressed):3.749 kB
+--------------------------------------------------------------------------------
+            type      encodings count     avg size   nulls   min / max
+flba_field  FIXED[4] _   _     1000      3.84 B   105     "0x00000001" / "0x000003E8"
+```
+
+# Column Index (from parquet-cli column-index command)
+```
+row-group 0:
+column index for column flba_field:
+Boundary order: DESCENDING
+                      null count  min                                       max
+page-0                         9  0x00000385                                0x000003E8
+page-1                         9  0x00000321                                0x00000384
+page-2                        19  0x000002BD                                0x00000320
+page-3                        10  0x00000259                                0x000002BC
+page-4                        13  0x000001F5                                0x00000258
+page-5                        11  0x00000191                                0x000001F4
+page-6                        11  0x0000012D                                0x00000190
+page-7                         8  0x000000C9                                0x0000012C
+page-8                         9  0x00000065                                0x000000C8
+page-9                         6  0x00000001                                0x00000064
+
+offset index for column flba_field:
+                          offset   compressed size       first row index
+page-0                         4               390                     0
+page-1                       394               390                   100
+page-2                       784               350                   200
+page-3                      1134               386                   300
+page-4                      1520               373                   400
+page-5                      1893               382                   500
+page-6                      2275               382                   600
+page-7                      2657               394                   700
+page-8                      3051               390                   800
+page-9                      3441               402                   900
+```
diff --git a/data/fixed_length_byte_array.parquet b/data/fixed_length_byte_array.parquet
new file mode 100644
index 0000000..e86a886
Binary files /dev/null and b/data/fixed_length_byte_array.parquet differ
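
The page-level figures in the column index and offset index listings above can be cross-checked against the row-group summary. The following is a minimal sketch, not part of the commit: the `PAGES` table is copied by hand from those listings, and `decode` and `check_pages` are hypothetical helper names. It assumes each 4-byte value is a big-endian unsigned integer, which matches the printed range 0x00000001 to 0x000003E8 (1 to 1000) but is not stated in the commit.

```python
import struct

# Copied by hand from the column-index and offset-index listings:
# (null_count, min_hex, max_hex, offset, compressed_size, first_row_index)
PAGES = [
    (9,  "00000385", "000003E8", 4,    390, 0),
    (9,  "00000321", "00000384", 394,  390, 100),
    (19, "000002BD", "00000320", 784,  350, 200),
    (10, "00000259", "000002BC", 1134, 386, 300),
    (13, "000001F5", "00000258", 1520, 373, 400),
    (11, "00000191", "000001F4", 1893, 382, 500),
    (11, "0000012D", "00000190", 2275, 382, 600),
    (8,  "000000C9", "0000012C", 2657, 394, 700),
    (9,  "00000065", "000000C8", 3051, 390, 800),
    (6,  "00000001", "00000064", 3441, 402, 900),
]


def decode(hex_str):
    """Read a 4-byte value as a big-endian unsigned int (an assumption)."""
    return struct.unpack(">I", bytes.fromhex(hex_str))[0]


def check_pages(pages):
    """Check page contiguity and descending order; return (nulls, bytes)."""
    for prev, cur in zip(pages, pages[1:]):
        # Each page should start where the previous one ends.
        assert prev[3] + prev[4] == cur[3], "pages are not contiguous"
        # DESCENDING boundary order: mins and maxes never increase.
        assert decode(prev[1]) >= decode(cur[1]), "min values not descending"
        assert decode(prev[2]) >= decode(cur[2]), "max values not descending"
        # First row indexes must increase from page to page.
        assert prev[5] < cur[5], "first row index not increasing"
    total_nulls = sum(p[0] for p in pages)
    total_bytes = sum(p[4] for p in pages)
    return total_nulls, total_bytes


if __name__ == "__main__":
    nulls, size = check_pages(PAGES)
    print(f"nulls={nulls} compressed_bytes={size}")
```

The per-page null counts sum to 105 and the compressed sizes sum to 3839 bytes, which agrees with the row-group summary if "3.749 kB" is read as 3839 / 1024 and "3.84 B" as 3839 bytes averaged over 1000 records.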