Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/03/13 16:26:00 UTC

[jira] [Commented] (IMPALA-2272) Parquet scanner always materializes NULL for empty collections

    [ https://issues.apache.org/jira/browse/IMPALA-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058895#comment-17058895 ] 

Tim Armstrong commented on IMPALA-2272:
---------------------------------------

[~gaborkaszab] this is good to keep in mind.

> Parquet scanner always materializes NULL for empty collections
> --------------------------------------------------------------
>
>                 Key: IMPALA-2272
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2272
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.3.0
>            Reporter: Skye Wanderman-Milne
>            Priority: Minor
>              Labels: nested_types
>
> Currently the Parquet scanner always materializes a NULL slot for an empty collection, rather than an empty ArrayValue/CollectionValue. No query can expose this bug yet, because no query can distinguish an empty collection from a NULL one, but that will change once we add expressions that take collections as input (e.g. "select array_column is null from tbl").
> We have this bug because the Parquet scanner only looks at the repeated field of an array, not the containing group field. To fix it, the scanner will have to consider the definition and repetition (def/rep) levels of both.
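
To illustrate the distinction described above, here is a minimal sketch of how def/rep levels can separate a NULL collection from an empty one. It is not Impala's scanner code. It assumes the standard three-level Parquet list layout (optional group arr (LIST) { repeated group list { optional int32 element; } }), and the function and constant names are hypothetical, chosen only for this example.

```python
from typing import List, Optional, Sequence

# Definition levels for the ASSUMED schema:
#   optional group arr (LIST) { repeated group list { optional int32 element; } }
DEF_ARR_NULL = 0    # the containing group 'arr' itself is NULL
DEF_ARR_EMPTY = 1   # 'arr' is defined but has no repeated entries
DEF_ELEM_NULL = 2   # a list entry exists but its element is NULL
DEF_ELEM_VALUE = 3  # a list entry exists and holds a non-NULL element


def assemble_arrays(
    def_levels: Sequence[int],
    rep_levels: Sequence[int],
    values: Sequence[int],
) -> List[Optional[List[Optional[int]]]]:
    """Rebuild one array per row from def/rep levels (hypothetical helper).

    'values' holds only the non-NULL elements, i.e. one value for every
    position whose definition level is DEF_ELEM_VALUE.
    """
    rows: List[Optional[List[Optional[int]]]] = []
    value_iter = iter(values)
    for d, r in zip(def_levels, rep_levels):
        if r == 0:
            # Repetition level 0 starts a new top-level row.
            if d == DEF_ARR_NULL:
                rows.append(None)  # NULL collection
                continue
            rows.append([])        # collection is defined
            if d == DEF_ARR_EMPTY:
                continue           # defined but empty: keep the empty list
        # From here on a list entry exists (d is 2 or 3).
        current = rows[-1]
        current.append(next(value_iter) if d == DEF_ELEM_VALUE else None)
    return rows


if __name__ == "__main__":
    # Four rows: [10, 20], NULL, [], [NULL]
    print(assemble_arrays([3, 3, 0, 1, 2], [0, 1, 0, 0, 0], [10, 20]))
```

A reader that checks only whether a repeated entry exists (definition level of 2 or more) cannot tell levels 0 and 1 apart, which matches the symptom reported in this issue.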



