Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/11/03 15:33:00 UTC

[jira] [Commented] (IMPALA-9873) Skip decoding of non-materialised columns in Parquet

    [ https://issues.apache.org/jira/browse/IMPALA-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438143#comment-17438143 ] 

ASF subversion and git services commented on IMPALA-9873:
---------------------------------------------------------

Commit ef2a8f6f57c8feb11197d2e632e18e65e05cc4ab in impala's branch refs/heads/master from Amogh Margoor
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ef2a8f6 ]

IMPALA-9873: (addendum) Fix test case for scratch_tuple_batch

The patch contains two minor fixes:
1. Fix the scratch_tuple_batch test that was causing a failure in the
   ASAN build (IMPALA-10998).
2. Remove a DCHECK that is not needed and gets triggered by
   cancellation tests (IMPALA-11000).

Change-Id: I74ee41718745b8dca26f88082d3f2efe474e3bf9
Reviewed-on: http://gerrit.cloudera.org:8080/17992
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Skip decoding of non-materialised columns in Parquet
> ----------------------------------------------------
>
>                 Key: IMPALA-9873
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9873
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Amogh Margoor
>            Priority: Major
>
> This is the first milestone for lazy materialization in Parquet, focusing on avoiding decompression and decoding of columns.
> * Identify the columns referenced by predicates and runtime row filters, and determine the order in which the columns need to be materialised. We probably want to evaluate static predicates before runtime filters to match current behaviour.
> * Rework this loop so that it alternates between materialising columns and evaluating predicates: https://github.com/apache/impala/blob/052129c/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1110
> * We probably need to keep track of filtered rows using a new data structure, e.g. a bitmap (a toy sketch follows below the quoted description).
> * We then need to check that bitmap at each step to decide whether we can skip materialising part or all of the following columns. E.g. if the first N rows were pruned, the remaining readers can skip forward N rows.
> * This part may be a little tricky - there is a risk of adding overhead compared to the current code.
> * It is probably OK to just materialise the partition columns to start with - avoiding materialising those is not going to buy much.
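
For illustration, here is a minimal, self-contained C++ toy of the bitmap idea described above. All names (Column, selected, and so on) are hypothetical stand-ins, not Impala's actual scanner or column-reader API; it only shows how alternating per-column materialisation with predicate evaluation lets later columns avoid decoding rows that an earlier predicate already pruned.

// Toy sketch: alternate materialising a column with evaluating its predicate,
// using a selection "bitmap" so later columns skip rows already filtered out.
// Names are illustrative, not Impala's actual classes.
#include <cstdio>
#include <functional>
#include <vector>

struct Column {
  std::vector<int> values;             // stands in for an encoded Parquet column
  std::function<bool(int)> predicate;  // empty if the column has no predicate
};

int main() {
  // Two predicate columns followed by a payload column with no predicate.
  std::vector<Column> columns = {
      {{1, 5, 3, 9, 2}, [](int v) { return v > 2; }},
      {{7, 7, 0, 7, 7}, [](int v) { return v == 7; }},
      {{10, 20, 30, 40, 50}, nullptr},
  };
  const int num_rows = 5;
  std::vector<bool> selected(num_rows, true);  // the proposed filtered-row bitmap

  for (const Column& col : columns) {
    for (int row = 0; row < num_rows; ++row) {
      if (!selected[row]) continue;  // skip decoding rows already pruned
      int v = col.values[row];       // "materialise" the value
      if (col.predicate && !col.predicate(v)) selected[row] = false;
    }
  }

  for (int row = 0; row < num_rows; ++row) {
    if (selected[row]) {
      std::printf("row %d survives, payload=%d\n", row, columns[2].values[row]);
    }
  }
  return 0;
}

In the real scanner, a run of cleared bits at the front of a batch would presumably become one bulk skip on each remaining column reader rather than per-row checks, which is where the overhead risk mentioned in the description comes in.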



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
