Posted to issues-all@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2022/02/22 15:19:00 UTC

[jira] [Assigned] (IMPALA-11147) Min/max filtering crashes on Parquet file that contains partition columns

     [ https://issues.apache.org/jira/browse/IMPALA-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy reassigned IMPALA-11147:
------------------------------------------

    Assignee: Zoltán Borók-Nagy

> Min/max filtering crashes on Parquet file that contains partition columns
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-11147
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11147
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>
> Impala can crash on a Parquet file that contains the partition columns.
> Data files usually don't contain the partition columns, so Impala doesn't expect to find such columns in the data files. Unfortunately, min/max filtering triggers a SEGFAULT when a partition column is present in the data files.
> It happens because FindSkipRangesForPagesWithMinMaxFilters() tries to retrieve the Parquet schema element for a given slot descriptor. When the slot descriptor refers to a partition column, we usually don't find a schema element, so we don't try to skip pages.
> But when the partition column is present in the data file, the schema element is found and the code tries to calculate the filtered pages in the column. It uses the column reader object corresponding to the column, but this is null for partition columns, hence the SEGFAULT.
> The code shouldn't do anything at the page level for partition columns, as the data in such columns is the same for the whole file and is already filtered at a higher level.
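
The guard described above could look roughly like the following sketch. It is not Impala's actual code: ColumnReader, SlotDesc, and FindSkippablePages are hypothetical stand-ins invented for illustration, and the min/max filter is simplified to a closed integer range [lo, hi]. The only point it tries to show is returning early for partition columns (or a null reader) before any page-level work touches the reader.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Parquet column reader (NOT Impala's class).
// It only holds the per-page min/max statistics of one column.
struct ColumnReader {
  std::vector<int> page_mins;
  std::vector<int> page_maxs;
};

// Hypothetical stand-in for a slot descriptor (NOT Impala's class).
struct SlotDesc {
  std::string name;
  bool is_partition_col;
};

// Returns the indices of pages whose [min, max] range cannot overlap the
// filter range [lo, hi]. For partition columns, or when no reader exists,
// it returns an empty list instead of dereferencing a null reader.
std::vector<size_t> FindSkippablePages(const SlotDesc& slot,
                                       const ColumnReader* reader,
                                       int lo, int hi) {
  std::vector<size_t> skipped;
  // Partition values are constant per file and are filtered at a higher
  // level, so there is no page-level work to do here.
  if (slot.is_partition_col || reader == nullptr) return skipped;
  for (size_t i = 0; i < reader->page_mins.size(); ++i) {
    if (reader->page_maxs[i] < lo || reader->page_mins[i] > hi) {
      skipped.push_back(i);
    }
  }
  return skipped;
}
```

With this shape, calling FindSkippablePages for a partition column with a null reader yields an empty result rather than a crash, while regular columns still get their non-overlapping pages skipped.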



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
