Posted to dev@parquet.apache.org by "fatemah (Jira)" <ji...@apache.org> on 2022/08/23 21:30:00 UTC
[jira] [Created] (PARQUET-2175) Skip method skips levels and not rows for repeated fields
fatemah created PARQUET-2175:
--------------------------------
Summary: Skip method skips levels and not rows for repeated fields
Key: PARQUET-2175
URL: https://issues.apache.org/jira/browse/PARQUET-2175
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Reporter: fatemah
The TypedColumnReader::Skip method, with the signature:
virtual int64_t Skip(int64_t num_levels_to_skip) = 0;
skips levels for both repeated and non-repeated fields. For repeated fields we want to be able to skip rows instead. Skipping levels is not that useful there, because a single row can span several levels and a skip can stop partway through a row.
For example, for the following rows:
message M { repeated int32 b = 1 }
rows: {}, {[10, 10]}, {[20, 20, 20]}
values = {10, 10, 20, 20, 20};
def_levels = {0, 1, 1, 1, 1, 1};
rep_levels = {0, 0, 1, 0, 1, 1};
We want Skip(2) to skip the first two rows, so that the next value we read is 20. Instead, it skips the first two levels (the empty first row and the first 10), so the next value we read is the second 10.
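To make the difference concrete, here is a minimal sketch of how a number of rows could be translated into a number of levels and values to skip. It is not the parquet-cpp implementation: SkipCounts and CountLevelsForRows are hypothetical names, and the sketch assumes that all levels are already buffered in memory, that def_levels and rep_levels have the same length, and that a repetition level of 0 marks the start of a new row.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type, not part of the Parquet C++ API.
struct SkipCounts {
  int64_t levels;  // level entries covered by the skipped rows
  int64_t values;  // leaf values covered by the skipped rows
};

// Hypothetical helper: counts how many levels and values the first
// `num_rows` rows occupy. Assumes def_levels.size() == rep_levels.size().
SkipCounts CountLevelsForRows(const std::vector<int16_t>& def_levels,
                              const std::vector<int16_t>& rep_levels,
                              int16_t max_def_level, int64_t num_rows) {
  SkipCounts counts{0, 0};
  int64_t rows_started = 0;
  for (std::size_t i = 0; i < rep_levels.size(); ++i) {
    if (rep_levels[i] == 0) {
      // A repetition level of 0 starts a new row. Stop once the row
      // after the last one to skip begins.
      if (rows_started == num_rows) break;
      ++rows_started;
    }
    ++counts.levels;
    // Only levels at the maximum definition level carry a stored value.
    if (def_levels[i] == max_def_level) ++counts.values;
  }
  return counts;
}
```

For the example above, with max_def_level = 1 and num_rows = 2, this yields 3 levels and 2 values, so the next value read would be 20.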
--
This message was sent by Atlassian Jira
(v8.20.10#820010)