Posted to dev@parquet.apache.org by "Xinli Shang (Jira)" <ji...@apache.org> on 2020/08/28 02:24:00 UTC

[jira] [Comment Edited] (PARQUET-1901) Add filter null check for ColumnIndex

    [ https://issues.apache.org/jira/browse/PARQUET-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186211#comment-17186211 ] 

Xinli Shang edited comment on PARQUET-1901 at 8/28/20, 2:23 AM:
----------------------------------------------------------------

I have an initial version of the Iceberg integration working in my private repo https://github.com/shangxinli/iceberg/commit/4cc9351f8a511a3179cb3ac857541f9116dd8661. It can now skip pages based on the column index. It is a very early version that I haven't finalized yet, and no tests have been added. I also haven't had time to address your feedback on the idtoAlias comments yet. But I hope it gives you an idea of what the integration looks like.


was (Author: shangx@uber.com):
I have the initial version of Iceberg integration working in my private repo https://github.com/shangxinli/iceberg/commit/4cc9351f8a511a3179cb3ac857541f9116dd8661. It can skip the pages now based on the column index. But it is very initial version and I didn't finalize it yet, also no tests are added. I also didn't get time to address your feedback to idtoAlias comments yet. But I hope it can give you an idea on what the integration looks like. 

> Add filter null check for ColumnIndex  
> ---------------------------------------
>
>                 Key: PARQUET-1901
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1901
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.11.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 1.12.0
>
>
> This Jira is opened to discuss whether we should add a null check for the filter when ColumnIndex is enabled. 
> The ColumnIndexFilter#calculateRowRanges() method assumes its input parameter 'filter' is non-null and does not check it. It throws an NPE when ColumnIndex is enabled (the default) but no filter is set in the ParquetReadOptions. The call stack is below. 
>     java.lang.NullPointerException
>         at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
>         at org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:961)
>         at org.apache.parquet.hadoop.ParquetFileReader.readNextFilteredRowGroup(ParquetFileReader.java:891)
> If we don't add the check, users need to choose between readNextRowGroup() and readNextFilteredRowGroup() themselves, depending on whether a filter exists. 
> Thoughts?  
>   
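
For illustration, the caller-side choice described above could look roughly like the sketch below. RowGroupReader is a hypothetical stand-in interface, not the parquet-mr ParquetFileReader class; only the two method names come from the issue, and the String return type and the readNext helper are assumptions made to keep the example self-contained.

```java
// A minimal sketch, not the parquet-mr implementation.
final class NullFilterGuard {

    /** Hypothetical stand-in for a reader that exposes both read paths. */
    interface RowGroupReader {
        String readNextRowGroup();
        String readNextFilteredRowGroup();
    }

    /**
     * Picks the read path based on whether a filter exists.
     * With no filter (or with the column index disabled) there is nothing
     * to evaluate against the index, so fall back to the unfiltered read
     * instead of passing a null filter down to the row-range calculation.
     */
    static String readNext(RowGroupReader reader, Object filter, boolean columnIndexEnabled) {
        if (filter == null || !columnIndexEnabled) {
            return reader.readNextRowGroup();
        }
        return reader.readNextFilteredRowGroup();
    }

    private NullFilterGuard() {
        // static helper only
    }
}
```

Putting the same guard inside the reader itself would spare every caller from repeating this check, which is the alternative the issue asks about.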



--
This message was sent by Atlassian Jira
(v8.3.4#803005)