Posted to jira@arrow.apache.org by "Matthew Topol (Jira)" <ji...@apache.org> on 2022/06/09 15:19:00 UTC

[jira] [Resolved] (ARROW-16638) [Go][Parquet] Boolean column reader fails to skip rows

     [ https://issues.apache.org/jira/browse/ARROW-16638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Topol resolved ARROW-16638.
-----------------------------------
    Resolution: Fixed

Issue resolved by pull request 13277
[https://github.com/apache/arrow/pull/13277]

> [Go][Parquet] Boolean column reader fails to skip rows
> ------------------------------------------------------
>
>                 Key: ARROW-16638
>                 URL: https://issues.apache.org/jira/browse/ARROW-16638
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Go
>            Reporter: Matt DePero
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 9.0.0
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> Skipping values in the Go Parquet column reader is effectively implemented by reading the target number of rows into scratch space, which is then discarded. In the boolean case, [BytesRequired|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader.go#L439] sizes the scratch buffer at one bit per row. However, that [same scratch space|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader_types.gen.go#L212-L213] is also used for `defLvls` and `repLvls` (both int16), which require two bytes per row. Since the boolean `values` buffer is not large enough to hold the same number of rows' worth of def and rep levels, skipping too many rows results in an index out of bounds panic.
>  
> Note that for other column types this does not seem to be an issue, since the buffer needed for `values` is always larger than the buffer needed for def and rep levels. However, there still seems to be no reason to pass any non-nil value to `cr.ReadBatch(...)` for [rep and def lvls|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader_types.gen.go#L212-L213] when skipping any column in the reader.
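
The size mismatch described above can be illustrated with a little arithmetic. This is only a minimal sketch under the assumptions stated in the description (one bit per boolean row, one int16 per level); the function names are hypothetical and are not taken from the Arrow Go library.

```go
package main

import "fmt"

// boolScratchBytes is a hypothetical helper: bytes needed to hold `rows`
// booleans at one bit per row, rounded up to a whole byte.
func boolScratchBytes(rows int) int {
	return (rows + 7) / 8
}

// levelBytes is a hypothetical helper: bytes needed to hold `rows`
// definition or repetition levels stored as int16 (two bytes each).
func levelBytes(rows int) int {
	return rows * 2
}

// levelsThatFit reports how many int16 levels fit in a scratch buffer
// that was sized only for `rows` bit-packed booleans.
func levelsThatFit(rows int) int {
	return boolScratchBytes(rows) / 2
}

func main() {
	for _, rows := range []int{1, 16, 1024} {
		fmt.Printf("rows=%d boolScratch=%dB levelsNeed=%dB levelsThatFit=%d\n",
			rows, boolScratchBytes(rows), levelBytes(rows), levelsThatFit(rows))
	}
}
```

Under these assumptions, a scratch buffer sized for the boolean values holds far fewer int16 levels than the number of rows being skipped, which is consistent with the out-of-bounds panic reported here.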



--
This message was sent by Atlassian Jira
(v8.20.7#820007)