Posted to jira@arrow.apache.org by "Matthew Topol (Jira)" <ji...@apache.org> on 2022/06/09 15:19:00 UTC
[jira] [Resolved] (ARROW-16638) [Go][Parquet] Boolean column reader fails to skip rows
[ https://issues.apache.org/jira/browse/ARROW-16638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthew Topol resolved ARROW-16638.
-----------------------------------
Resolution: Fixed
Issue resolved by pull request 13277
[https://github.com/apache/arrow/pull/13277]
> [Go][Parquet] Boolean column reader fails to skip rows
> ------------------------------------------------------
>
> Key: ARROW-16638
> URL: https://issues.apache.org/jira/browse/ARROW-16638
> Project: Apache Arrow
> Issue Type: Bug
> Components: Go
> Reporter: Matt DePero
> Priority: Major
> Labels: pull-request-available
> Fix For: 9.0.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> Skipping values in the Go Parquet column reader is effectively implemented by reading the target number of rows into scratch space, which is then discarded. In the boolean case, [BytesRequired|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader.go#L439] sizes the scratch buffer at one bit per row; however, that [same scratch space|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader_types.gen.go#L212-L213] is also used for `defLvls` and `repLvls` (both int16), which require two bytes per row each. Since the boolean `values` buffer is not large enough to hold the same number of rows' worth of def and rep levels, skipping too many rows results in an index-out-of-bounds panic.
>
> Note that for other column types this does not seem to be an issue, since the buffer needed for `values` is always larger than the buffer needed for def and rep levels. However, there still seems to be no reason to pass any non-nil value to `cr.ReadBatch(...)` for [rep and def lvls|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader_types.gen.go#L212-L213] when skipping any column in the reader.
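
The size mismatch described in the report can be illustrated with a small standalone sketch. It does not use the Arrow Go code; the helper names (`valueScratchBytes`, `levelBytes`, `levelsFit`) are hypothetical and only model the arithmetic, assuming one bit per boolean value and two bytes per int16 level as stated above.

```go
package main

import "fmt"

// valueScratchBytes returns the number of bytes needed to hold `rows`
// values of the given bit width, rounded up to whole bytes.
// Hypothetical helper: it models "one bit per row" for booleans and
// fixed-width storage for other types; it is not the Arrow API.
func valueScratchBytes(bitWidth, rows int64) int64 {
	return (bitWidth*rows + 7) / 8
}

// levelBytes returns the bytes needed for `rows` int16 levels
// (definition or repetition levels), i.e. two bytes per row.
func levelBytes(rows int64) int64 {
	return 2 * rows
}

// levelsFit reports whether a scratch buffer sized only for the values
// is also large enough to be reused as an int16 level buffer.
func levelsFit(bitWidth, rows int64) bool {
	return valueScratchBytes(bitWidth, rows) >= levelBytes(rows)
}

func main() {
	const rows = 1024
	for _, c := range []struct {
		name     string
		bitWidth int64
	}{
		{"boolean", 1},
		{"int32", 32},
		{"int64", 64},
	} {
		fmt.Printf("%-7s values=%d bytes, levels=%d bytes, levels fit: %v\n",
			c.name,
			valueScratchBytes(c.bitWidth, rows),
			levelBytes(rows),
			levelsFit(c.bitWidth, rows))
	}
}
```

Under these assumptions only the boolean case fails the check, which matches the report: a buffer sized at one bit per row is 16 times too small to be reinterpreted as int16 levels for the same row count.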
--
This message was sent by Atlassian Jira
(v8.20.7#820007)