Posted to dev@parquet.apache.org by "fatemah (Jira)" <ji...@apache.org> on 2022/08/23 21:30:00 UTC

[jira] [Created] (PARQUET-2175) Skip method skips levels and not rows for repeated fields

fatemah created PARQUET-2175:
--------------------------------

             Summary: Skip method skips levels and not rows for repeated fields
                 Key: PARQUET-2175
                 URL: https://issues.apache.org/jira/browse/PARQUET-2175
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cpp
            Reporter: fatemah


The TypedColumnReader::Skip method, with signature:

virtual int64_t Skip(int64_t num_levels_to_skip) = 0;

skips levels for both repeated and non-repeated fields. For repeated fields we want to be able to skip rows instead; skipping levels is not very useful there, because a single row can span several levels.

For example, for the following rows:

message M { repeated int32 b = 1 }

rows: {}, {[10, 10]}, {[20, 20, 20]}

values = {10, 10, 20, 20, 20};
def_levels = {0, 1, 1, 1, 1, 1};
rep_levels = {0, 0, 1, 0, 1, 1};

We want Skip(2) to skip the first two rows, so that the next value we read is 20. Instead, it skips the first two levels (the empty first row and the first 10 of the second row), so the next value we read is 10.
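
The difference can be illustrated with a small standalone sketch. This is not the parquet-cpp implementation: LevelData, Position, SkipLevels and SkipRows are hypothetical names made up for this example, and it assumes the levels of the whole column are already decoded in memory. It only shows how counting levels differs from counting row boundaries (a level with rep_level == 0 starts a new row).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory view of a column's levels (not a parquet-cpp type).
struct LevelData {
  std::vector<int16_t> def_levels;
  std::vector<int16_t> rep_levels;
  int16_t max_def_level;  // a level carries a value only when def == max
};

// Where a reader would stand after a skip.
struct Position {
  std::size_t level_index;  // number of levels consumed
  std::size_t value_index;  // number of values consumed
};

// Skip `n` levels, which is the behaviour described in the issue.
Position SkipLevels(const LevelData& d, int64_t n) {
  assert(d.def_levels.size() == d.rep_levels.size());
  Position p{0, 0};
  while (p.level_index < d.def_levels.size() && n > 0) {
    if (d.def_levels[p.level_index] == d.max_def_level) ++p.value_index;
    ++p.level_index;
    --n;
  }
  return p;
}

// Skip `n` rows, which is the behaviour the issue asks for.
// A new row begins at every level whose repetition level is 0.
Position SkipRows(const LevelData& d, int64_t n) {
  assert(d.def_levels.size() == d.rep_levels.size());
  Position p{0, 0};
  int64_t rows_started = 0;
  while (p.level_index < d.def_levels.size()) {
    if (d.rep_levels[p.level_index] == 0) {
      if (rows_started == n) break;  // start of the first row to keep
      ++rows_started;
    }
    if (d.def_levels[p.level_index] == d.max_def_level) ++p.value_index;
    ++p.level_index;
  }
  return p;
}
```

With the levels from the example above and max_def_level = 1, SkipLevels(d, 2) stops at value index 1 (the second 10), while SkipRows(d, 2) stops at level index 3 and value index 2 (the first 20).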


