Posted to issues-all@impala.apache.org by "Riza Suminto (Jira)" <ji...@apache.org> on 2023/12/19 21:58:00 UTC
[jira] [Updated] (IMPALA-12657) Improve ProcessingCost of ScanNode and NonGroupingAggregator

     [ https://issues.apache.org/jira/browse/IMPALA-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Riza Suminto updated IMPALA-12657:
----------------------------------
    Description: 
Several benchmark runs measuring Impala scan performance indicate opportunities to improve costing around ScanNode and NonGroupingAggregator.

[^profile_1f4d7a679a3e12d5_4223115700000000.txt] shows an example of a simple count star query.

Key takeaways:
 # There is a strong correlation between total materialized bytes (row-size * cardinality) and total materialized tuple time per fragment. Row materialization cost should be based on this row size instead of an equal cost per scan range.
 # NonGroupingAggregator should have a much lower cost than GroupingAggregator. In the example above, the cost of NonGroupingAggregator dominates the scan fragment even though it only does simple counting rather than hash table operations.
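
The two takeaways above can be sketched as a toy cost model. This is only an illustration under stated assumptions: the function names, the ScanEstimate type, and every coefficient below are hypothetical and are not taken from Impala's planner. It shows how a per-scan-range cost cannot tell a narrow scan from a wide one, while a cost based on row-size * cardinality can, and how a non-grouping aggregation could be charged a smaller per-row weight than a hash-based one.

```python
from dataclasses import dataclass

# Hypothetical coefficients chosen only for illustration; not Impala's values.
MATERIALIZE_COST_PER_BYTE = 1.0
COST_PER_SCAN_RANGE = 1000.0
GROUPING_AGG_COST_PER_ROW = 10.0      # stands in for hash table work per row
NON_GROUPING_AGG_COST_PER_ROW = 1.0   # stands in for a simple counter update


@dataclass
class ScanEstimate:
    """Planner-side estimates for one scan (hypothetical type)."""
    cardinality: int
    avg_row_size_bytes: float
    num_scan_ranges: int


def per_range_scan_cost(scan: ScanEstimate) -> float:
    """Equal cost per scan range: ignores how many bytes get materialized."""
    return scan.num_scan_ranges * COST_PER_SCAN_RANGE


def materialization_scan_cost(scan: ScanEstimate) -> float:
    """Cost proportional to total materialized bytes (row-size * cardinality)."""
    total_bytes = scan.cardinality * scan.avg_row_size_bytes
    return total_bytes * MATERIALIZE_COST_PER_BYTE


def aggregation_cost(input_rows: int, has_grouping: bool) -> float:
    """Charge a lower per-row weight when there is no grouping expression."""
    if has_grouping:
        per_row = GROUPING_AGG_COST_PER_ROW
    else:
        per_row = NON_GROUPING_AGG_COST_PER_ROW
    return input_rows * per_row


# Same row count and scan ranges, different row widths.
narrow = ScanEstimate(cardinality=1_000_000, avg_row_size_bytes=8.0, num_scan_ranges=100)
wide = ScanEstimate(cardinality=1_000_000, avg_row_size_bytes=256.0, num_scan_ranges=100)
```

With these made-up numbers, per_range_scan_cost returns the same value for narrow and wide, while materialization_scan_cost differs by the ratio of the row sizes. The real weights would have to be calibrated against profiles such as the attached one.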

  was:
Several benchmark runs measuring Impala scan performance indicate opportunities to improve costing around ScanNode and NonGroupingAggregator.

[^profile_1f4d7a679a3e12d5_4223115700000000.txt] shows an example of a simple count star query.

Key takeaways:
 # There is a strong correlation between total materialized bytes (row-size * cardinality) and total materialized tuple time per fragment. Row materialization cost should be based on this row size instead of an equal cost per scan fragment.
 # NonGroupingAggregator should have a much lower cost than GroupingAggregator. In the example above, the cost of NonGroupingAggregator dominates the scan fragment even though it only does simple counting rather than hash table operations.


> Improve ProcessingCost of ScanNode and NonGroupingAggregator
> ------------------------------------------------------------
>
>                 Key: IMPALA-12657
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12657
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 4.3.0
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Major
>             Fix For: Impala 4.4.0
>
>         Attachments: profile_1f4d7a679a3e12d5_4223115700000000.txt


