Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2019/09/12 16:24:00 UTC

[jira] [Updated] (IMPALA-2138) Get rid of unused columns by upstream operators at points of materialization

     [ https://issues.apache.org/jira/browse/IMPALA-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-2138:
----------------------------------
    Attachment: performance_result.txt

> Get rid of unused columns by upstream operators at points of materialization
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-2138
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2138
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 1.4, Impala 2.0, Impala 2.2
>            Reporter: Ippokratis Pandis
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: performance
>         Attachments: 0001-Projection-prototype.patch, performance_result.txt
>
>
> It would be a significant performance improvement if we could drop columns as soon as we know that no operator upstream is going to use them. This would reduce the amount of data we handle, making network and I/O (spilling) transfers more efficient, and it would also improve cache performance.
> The current row-wise in-memory format does not make it easy to drop such unused columns. However, there are points of materialization where we copy out the tuples, and at those points we can actually perform these projections. There are multiple points of materialization, notably:
> * The exchange operator
> * The build side of hash join
> * The probe side of hash join when we have spilling
> * The aggregation
> * Sorts and analytic function evaluation
> To do these projections we need to modify the FE so that it knows, at each operator, the minimum set of columns referenced by that operator and all the upstream ones. (That minimum set is easy to calculate during an additional top-down traversal of the plan.) We also need to modify the BE to make the copy-out operation aware of such projections.
> Assigning first to Alex because of the needed FE changes. I am happy to take care of the needed BE changes. Perhaps we could split this issue into two sub-tasks, one for the FE changes and one for the BE changes.
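
The top-down traversal and the projection-aware copy-out described in the issue can be sketched roughly as follows. This is only an illustration in Python (Impala's FE is Java and its BE is C++), not the attached prototype patch. The names PlanNode, assign_needed_columns, project_row and example_plan, as well as the sample columns, are hypothetical. It also assumes that "upstream" operators are the ones visited earlier in the top-down traversal, that is, the operators that consume a node's output.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical plan operator; not Impala's actual PlanNode class."""
    name: str
    referenced: frozenset                      # columns this operator's own expressions read
    children: list = field(default_factory=list)
    needed: frozenset = frozenset()            # filled in by assign_needed_columns


def assign_needed_columns(node, needed_above=frozenset()):
    """Top-down pass over the plan.

    For each node, record the columns referenced by the node itself or by
    any operator already visited on the path from the root.
    """
    node.needed = frozenset(node.referenced | needed_above)
    for child in node.children:
        assign_needed_columns(child, node.needed)


def project_row(row, node):
    """Copy out a row (dict of column -> value), keeping only needed columns."""
    return {col: val for col, val in row.items() if col in node.needed}


def example_plan():
    """Build a small made-up plan: agg -> hash join -> (scan, exchange -> scan)."""
    scan_orders = PlanNode("scan orders", frozenset())
    exchange = PlanNode("exchange", frozenset(), [scan_orders])
    scan_lineitem = PlanNode("scan lineitem", frozenset())
    join = PlanNode("hash join",
                    frozenset({"l_orderkey", "o_orderkey"}),
                    [scan_lineitem, exchange])
    root = PlanNode("aggregate", frozenset({"o_date", "l_price"}), [join])
    assign_needed_columns(root)
    return root, exchange
```

In this sketch, a row copied out at the exchange keeps o_orderkey (read by the join) and o_date (read by the aggregate) but drops any wide column, such as a free-text comment, that no operator above the exchange references. The needed set may also contain columns from the other join input; they are harmless here because project_row only keeps columns that are present in the row.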



--
This message was sent by Atlassian Jira
(v8.3.2#803003)