Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/06/23 00:22:00 UTC

[jira] [Commented] (IMPALA-9496) Allow Struct type in SELECT list for Parquet tables

    [ https://issues.apache.org/jira/browse/IMPALA-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557739#comment-17557739 ] 

ASF subversion and git services commented on IMPALA-9496:
---------------------------------------------------------

Commit 5d021ce5a72060d243ae4c56ad803c2fc686a5ce in impala's branch refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5d021ce5a ]

IMPALA-9496: Allow struct type in the select list for Parquet tables

This patch extends the support for struct columns in the select list
to Parquet files as well.

There are some limitations with this patch:
  - Dictionary filtering could work when there are conjuncts on a
    member of a struct; however, if this struct is also given in the
    select list, dictionary filtering is disabled. The reason is that
    in this case the slot/tuple IDs in the conjunct would not match the
    ones in the select list, due to the expr substitution logic applied
    when a struct is in the select list. Solving this puzzle would be a
    nice future performance enhancement. See IMPALA-11361.
  - When structs are read in a batched manner, the struct reader
    delegates the actual reading of the data to the column readers of
    its children; however, it uses the simple ReadValue() on these
    readers instead of the batched version. The reason is that calling
    the batched reader on the member column readers would in fact read
    in batches, but it would not handle the case when the parent struct
    is NULL: it would set only the member itself to NULL, not the parent
    struct. This might also be a future performance enhancement. See
    IMPALA-11363.
  - If there is a struct in the select list, late materialization (LM)
    is turned off. The reason is that LM expects the column readers to
    be used through the batched reading interface; however, as
    described in the previous bullet point, struct column readers
    currently use the non-batched reading interface of their children.
    As a result, after reading, the column readers are not in the state
    that LM's SkipRows() expects, which leads to a query failure because
    LM is unable to skip the rows for the non-filter readers.
    Once IMPALA-11363 is implemented and structs also use the
    ReadValueBatch() interface of their children, late materialization
    could be turned on even if structs are in the select list. See
    IMPALA-11364.
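As background for the dictionary filtering limitation (IMPALA-11361), the following Python sketch models the general technique. If a Parquet column chunk is fully dictionary-encoded and no dictionary entry satisfies the conjunct, the whole row group can be skipped. The function, its parameters, and the `struct_in_select_list` flag are hypothetical illustrations, not Impala's API. The flag only mirrors the limitation described in the first bullet point. The sketch also assumes that the predicate evaluates to false for NULL.

```python
from typing import Callable, Iterable, Optional


def can_skip_row_group(
    dictionary: Optional[Iterable[object]],
    predicate: Callable[[object], bool],
    fully_dict_encoded: bool,
    struct_in_select_list: bool,
) -> bool:
    """Hypothetical model of dictionary filtering for one column chunk.

    Returns True only when it is safe to skip the row group, i.e. no
    dictionary value can satisfy the predicate. Assumes the predicate is
    false for NULL, so NULL values never force the row group to be read.
    """
    if struct_in_select_list:
        # Mirrors the limitation in this patch: filtering is disabled.
        return False
    if dictionary is None or not fully_dict_encoded:
        # Some values may be plain-encoded, so the dictionary is not a
        # complete list of the values in the chunk.
        return False
    return not any(predicate(v) for v in dictionary)
```

For example, a dictionary of `["x", "y"]` with the conjunct `v == "z"` allows the row group to be skipped, unless the struct is also in the select list.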
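The NULL-handling problem behind IMPALA-11363 can be illustrated with Parquet definition levels. The Python sketch below is a simplified model, not Impala code. It assumes the schema `optional group s { optional int32 a; }`, in which the leaf column `s.a` has these definition levels:

- 0 when `s` is NULL
- 1 when `s` is present but `a` is NULL
- 2 when `a` has a value

A reader that only records whether the child value is present cannot tell the first two cases apart. Reading value by value, by contrast, lets the struct reader mark the parent NULL. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

# Assumed schema: optional group s { optional int32 a; }
STRUCT_DEF_LEVEL = 1     # def level at which the parent struct s is defined
CHILD_MAX_DEF_LEVEL = 2  # def level at which the leaf a has a value


@dataclass
class StructRow:
    is_null: bool     # whether the parent struct s is NULL
    a: Optional[Any]  # value of the member a (None if NULL)


def read_child_batch(def_levels: List[int], values: List[Any]) -> List[Optional[Any]]:
    """Hypothetical batched child read: it only knows if the child is NULL."""
    it = iter(values)  # Parquet stores only non-NULL leaf values
    return [next(it) if d == CHILD_MAX_DEF_LEVEL else None for d in def_levels]


def read_struct_value_by_value(def_levels: List[int], values: List[Any]) -> List[StructRow]:
    """Hypothetical per-value read: the struct reader inspects each def
    level and can therefore mark the parent struct itself as NULL."""
    it = iter(values)
    rows = []
    for d in def_levels:
        if d < STRUCT_DEF_LEVEL:
            rows.append(StructRow(is_null=True, a=None))   # s is NULL
        elif d < CHILD_MAX_DEF_LEVEL:
            rows.append(StructRow(is_null=False, a=None))  # only a is NULL
        else:
            rows.append(StructRow(is_null=False, a=next(it)))
    return rows
```

With definition levels `[2, 0, 1, 2]` and values `[10, 20]`, `read_child_batch` yields `[10, None, None, 20]`. This output loses the fact that row 1 is a NULL struct. `read_struct_value_by_value` reports `is_null=True` only for that row.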
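As background for IMPALA-11364, late materialization reads the filter columns first. It then materializes the remaining columns only for rows that pass, skipping the others. The Python sketch below models that general pattern with a hypothetical `ColumnReader` class, which is not Impala's reader interface. It shows why the non-filter readers must keep an accurate row position across `skip_rows()` calls and reads. If a reader's position drifted, every later value would be attributed to the wrong row.

```python
from typing import Any, Callable, List, Tuple


class ColumnReader:
    """Hypothetical in-memory column reader that tracks its row position."""

    def __init__(self, values: List[Any]) -> None:
        self.values = values
        self.pos = 0

    def read_batch(self, n: int) -> List[Any]:
        out = self.values[self.pos:self.pos + n]
        self.pos += len(out)
        return out

    def read_value(self) -> Any:
        value = self.values[self.pos]
        self.pos += 1
        return value

    def skip_rows(self, n: int) -> None:
        self.pos += n


def scan_with_late_materialization(
    filter_reader: ColumnReader,
    other_reader: ColumnReader,
    predicate: Callable[[Any], bool],
    n: int,
) -> List[Tuple[Any, Any]]:
    # 1. Read and evaluate the filter column for the whole batch.
    filter_vals = filter_reader.read_batch(n)
    selected = [i for i, v in enumerate(filter_vals) if predicate(v)]
    # 2. Materialize the other column only for surviving rows,
    #    skipping the filtered-out rows in between.
    result = []
    cursor = 0
    for i in selected:
        other_reader.skip_rows(i - cursor)
        result.append((filter_vals[i], other_reader.read_value()))
        cursor = i + 1
    # 3. Skip trailing filtered-out rows so both readers end aligned.
    other_reader.skip_rows(len(filter_vals) - cursor)
    return result
```

The design choice to model is that skipping is expressed purely as a position change. In the patch described above, the struct readers leave their children in a state that SkipRows() does not expect, which is why LM is disabled there.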

Testing:
  - There were already many tests exercising this functionality, but
    they only ran on ORC tables. I changed them to cover Parquet tables
    too.

Change-Id: I3e8b4cbc2c4d1dd5fbefb7c87dad8d4e6ac2f452
Reviewed-on: http://gerrit.cloudera.org:8080/18596
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Allow Struct type in SELECT list for Parquet tables
> ---------------------------------------------------
>
>                 Key: IMPALA-9496
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9496
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend, Frontend
>            Reporter: Gabor Kaszab
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: complextype
>



