Posted to commits@parquet.apache.org by ap...@apache.org on 2022/12/08 16:44:44 UTC
[parquet-testing] branch master updated: ARROW-18420: Add fixed_length_byte_array.parquet for page index test (#31)
This is an automated email from the ASF dual-hosted git repository.
apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git
The following commit(s) were added to refs/heads/master by this push:
new de7570a ARROW-18420: Add fixed_length_byte_array.parquet for page index test (#31)
de7570a is described below
commit de7570a865af017add78432e4c045912c213ae24
Author: Gang Wu <us...@gmail.com>
AuthorDate: Fri Dec 9 00:44:17 2022 +0800
ARROW-18420: Add fixed_length_byte_array.parquet for page index test (#31)
---
data/README.md | 1 +
data/fixed_length_byte_array.md | 73 +++++++++++++++++++++++++++++++++++
data/fixed_length_byte_array.parquet | Bin 0 -> 4335 bytes
3 files changed, 74 insertions(+)
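A minimal Python sketch of data shaped like the test file described in `fixed_length_byte_array.md` below. It assumes (the commit does not say so) that each value is the 4-byte big-endian encoding of an integer and that row i holds 1000 − i unless it is null. That assumption is consistent with the min/max statistics (`0x00000001` through `0x000003E8`) and with the DESCENDING boundary order, because equal-length big-endian byte strings compare byte-wise in the same order as the unsigned integers they encode. The helper names and the `null_fraction` and `seed` parameters are hypothetical; the real file was written by parquet-mr and its exact 105 nulls are not reproduced here.

```python
import random
import struct
from typing import List, Optional

NUM_ROWS = 1000  # row count of the test file


def encode_flba4(value: int) -> bytes:
    """Encode a non-negative integer as a 4-byte big-endian FIXED_LEN_BYTE_ARRAY(4) value."""
    return struct.pack(">I", value)


def make_rows(null_fraction: float = 0.1, seed: int = 42) -> List[Optional[bytes]]:
    """Produce NUM_ROWS values counting down from 1000 to 1, replacing some with None.

    `null_fraction` and `seed` are hypothetical knobs; they do not reproduce the
    exact null positions of the real file, which was written by parquet-mr.
    """
    rng = random.Random(seed)
    rows: List[Optional[bytes]] = []
    for value in range(NUM_ROWS, 0, -1):
        rows.append(None if rng.random() < null_fraction else encode_flba4(value))
    return rows


if __name__ == "__main__":
    print(encode_flba4(1000).hex())  # 000003e8
    rows = make_rows()
    non_null = [r for r in rows if r is not None]
    # Equal-length byte strings compare like the unsigned integers they encode,
    # so the non-null values are also in descending order as raw bytes.
    print(non_null == sorted(non_null, reverse=True))  # True
```

Comparing raw `bytes` rather than decoded integers mirrors how page statistics for a FIXED_LEN_BYTE_ARRAY column without a logical type are ordered.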
diff --git a/data/README.md b/data/README.md
index 398a88c..4bb59c2 100644
--- a/data/README.md
+++ b/data/README.md
@@ -32,6 +32,7 @@
| alltypes_tiny_pages.parquet | small page sizes with dictionary encoding with page index from [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
| alltypes_tiny_pages_plain.parquet | small page sizes with plain encoding with page index [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
| rle_boolean_encoding.parquet | option boolean columns with RLE encoding |
+| fixed_length_byte_array.parquet | optional FIXED_LEN_BYTE_ARRAY column with page index. See [fixed_length_byte_array.md](fixed_length_byte_array.md) for details. |
| datapage_v1-uncompressed-checksum.parquet | uncompressed INT32 columns in v1 data pages with a matching CRC |
| datapage_v1-snappy-compressed-checksum.parquet | compressed INT32 columns in v1 data pages with a matching CRC |
| datapage_v1-corrupt-checksum.parquet | uncompressed INT32 columns in v1 data pages with a mismatching CRC |
diff --git a/data/fixed_length_byte_array.md b/data/fixed_length_byte_array.md
new file mode 100644
index 0000000..a0d98ac
--- /dev/null
+++ b/data/fixed_length_byte_array.md
@@ -0,0 +1,73 @@
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+`fixed_length_byte_array.parquet` was generated by parquet-mr version 1.13.0-SNAPSHOT.
+
+It has a single optional column, `flba_field`, of type FIXED_LEN_BYTE_ARRAY with a length of 4 bytes.
+
+It contains 1000 rows whose values are written in descending order, from 0x000003E8 (1000) down to 0x00000001 (1), with 105 of the rows randomly set to null.
+
+# File Metadata (from the parquet-cli `meta` command)
+```
+File path: fixed_length_byte_array.parquet
+Created by: parquet-mr version 1.13.0-SNAPSHOT (build d057b39d93014fe40f5067ee4a33621e65c91552)
+Properties:
+ writer.model.name: example
+Schema:
+message schema {
+  optional fixed_len_byte_array(4) flba_field;
+}
+
+
+Row group 0: count: 1000 3.84 B records start: 4 total(compressed): 3.749 kB total(uncompressed):3.749 kB
+--------------------------------------------------------------------------------
+            type      encodings  count  avg size  nulls  min / max
+flba_field  FIXED[4]  _ _        1000   3.84 B    105    "0x00000001" / "0x000003E8"
+```
+
+# Column Index (from the parquet-cli `column-index` command)
+```
+row-group 0:
+column index for column flba_field:
+Boundary order: DESCENDING
+        null count  min         max
+page-0           9  0x00000385  0x000003E8
+page-1           9  0x00000321  0x00000384
+page-2          19  0x000002BD  0x00000320
+page-3          10  0x00000259  0x000002BC
+page-4          13  0x000001F5  0x00000258
+page-5          11  0x00000191  0x000001F4
+page-6          11  0x0000012D  0x00000190
+page-7           8  0x000000C9  0x0000012C
+page-8           9  0x00000065  0x000000C8
+page-9           6  0x00000001  0x00000064
+
+offset index for column flba_field:
+        offset  compressed size  first row index
+page-0       4              390                0
+page-1     394              390              100
+page-2     784              350              200
+page-3    1134              386              300
+page-4    1520              373              400
+page-5    1893              382              500
+page-6    2275              382              600
+page-7    2657              394              700
+page-8    3051              390              800
+page-9    3441              402              900
+```
diff --git a/data/fixed_length_byte_array.parquet b/data/fixed_length_byte_array.parquet
new file mode 100644
index 0000000..e86a886
Binary files /dev/null and b/data/fixed_length_byte_array.parquet differ
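The column index and offset index recorded in `fixed_length_byte_array.md` are what a reader uses to skip pages. A minimal Python sketch, using only the numbers printed by parquet-cli in that file, which (a) checks that the pages are laid out back to back, giving 3839 compressed bytes in total (3839 / 1024 ≈ 3.749 kB, and 3839 / 1000 ≈ 3.84 B per row, matching the `meta` output), and (b) binary-searches the DESCENDING column index for the page that may hold a given value, then maps that page to its row range through the offset index. The helper names are hypothetical. Values are compared as raw bytes (unsigned byte-wise order, which Parquet uses for a FIXED_LEN_BYTE_ARRAY column without a logical type), and returning a single page is only valid here because the pages' [min, max] ranges do not overlap.

```python
from typing import List, NamedTuple, Optional

NUM_ROWS = 1000  # row count from the `meta` output


class PageEntry(NamedTuple):
    """One page of `flba_field`: column index stats plus offset index location."""
    null_count: int
    min_value: bytes
    max_value: bytes
    offset: int
    compressed_size: int
    first_row_index: int


# (null count, min, max, offset, compressed size, first row index), copied from parquet-cli.
_RAW = [
    (9, "00000385", "000003E8", 4, 390, 0),
    (9, "00000321", "00000384", 394, 390, 100),
    (19, "000002BD", "00000320", 784, 350, 200),
    (10, "00000259", "000002BC", 1134, 386, 300),
    (13, "000001F5", "00000258", 1520, 373, 400),
    (11, "00000191", "000001F4", 1893, 382, 500),
    (11, "0000012D", "00000190", 2275, 382, 600),
    (8, "000000C9", "0000012C", 2657, 394, 700),
    (9, "00000065", "000000C8", 3051, 390, 800),
    (6, "00000001", "00000064", 3441, 402, 900),
]
PAGES: List[PageEntry] = [
    PageEntry(n, bytes.fromhex(lo), bytes.fromhex(hi), off, size, first)
    for n, lo, hi, off, size, first in _RAW
]


def total_compressed_bytes(pages: List[PageEntry]) -> int:
    """Check that each page starts where the previous one ends; return the byte span."""
    for prev, cur in zip(pages, pages[1:]):
        if cur.offset != prev.offset + prev.compressed_size:
            raise ValueError(f"gap or overlap before page at offset {cur.offset}")
    return pages[-1].offset + pages[-1].compressed_size - pages[0].offset


def find_page_descending(pages: List[PageEntry], target: bytes) -> Optional[int]:
    """Return the index of the page whose [min, max] may contain `target`, or None.

    Relies on the DESCENDING boundary order: min values shrink as the page index
    grows, so binary search finds the first page whose min is <= target.
    """
    lo, hi = 0, len(pages)
    while lo < hi:
        mid = (lo + hi) // 2
        if pages[mid].min_value > target:  # every value in this page is larger: look right
            lo = mid + 1
        else:
            hi = mid
    if lo < len(pages) and pages[lo].max_value >= target:
        return lo
    return None


def row_range(pages: List[PageEntry], page: int, num_rows: int = NUM_ROWS) -> range:
    """Rows covered by `page`: from its first row index to the next page's first row index."""
    end = pages[page + 1].first_row_index if page + 1 < len(pages) else num_rows
    return range(pages[page].first_row_index, end)


if __name__ == "__main__":
    print(total_compressed_bytes(PAGES))  # 3839
    target = (500).to_bytes(4, "big")  # 0x000001F4
    page = find_page_descending(PAGES, target)
    print(page, row_range(PAGES, page))  # 5 range(500, 600)
```

Binary search is only possible because the boundary order is DESCENDING; with an UNORDERED column index a reader would instead have to test every page's [min, max] range.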