Posted to issues-all@impala.apache.org by "Riza Suminto (Jira)" <ji...@apache.org> on 2023/03/06 17:21:00 UTC

[jira] [Created] (IMPALA-11972) Factor in row width during ProcessingCost calculation.

Riza Suminto created IMPALA-11972:
-------------------------------------

             Summary: Factor in row width during ProcessingCost calculation.
                 Key: IMPALA-11972
                 URL: https://issues.apache.org/jira/browse/IMPALA-11972
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 4.3.0
            Reporter: Riza Suminto
            Assignee: Riza Suminto


IMPALA-11604 added the ProcessingCost (PC) concept, which measures the cost for a distinct PlanNode / DataSink / PlanFragment to process its input rows globally across all of its instances.

We should investigate whether row width should be considered when computing PC for more operators, and whether doing so makes the PC model more accurate. The code in IMPALA-11604 has a materialization cost parameter to accommodate PC where row width should factor in. Currently, the PC of ScanNode, ExchangeNode, and DataStreamSink has row width factored in through this materialization parameter.

For VARCHAR columns, we can use average width stats, if available. For fixed-width columns, we just use the column's width. In both cases, the unit should be bytes. The idea of including width in costing is to make the cost estimate more precise and less error-prone.
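
As a rough illustration of the proposal, the sketch below estimates a row width in bytes and feeds it into a per-row cost. It is not Impala's actual code: the class RowWidthSketch, the Column helper, the fallback width of 16 bytes for columns without stats, and the cost formula (input rows times the sum of a per-row cost and a per-byte materialization cost) are all assumptions made for this example.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical sketch; none of these names are Impala classes.
final class RowWidthSketch {

  // Hypothetical column descriptor used only for this example.
  static final class Column {
    final String name;
    final int fixedWidthBytes;           // > 0 for fixed-width types, 0 otherwise
    final OptionalDouble avgWidthBytes;  // average width stat, if available

    private Column(String name, int fixedWidthBytes, OptionalDouble avgWidthBytes) {
      this.name = name;
      this.fixedWidthBytes = fixedWidthBytes;
      this.avgWidthBytes = avgWidthBytes;
    }

    static Column fixed(String name, int widthBytes) {
      return new Column(name, widthBytes, OptionalDouble.empty());
    }

    static Column variable(String name, OptionalDouble avgWidthBytes) {
      return new Column(name, 0, avgWidthBytes);
    }
  }

  // Assumed fallback when a variable-width column has no width stats.
  static final double DEFAULT_VAR_WIDTH_BYTES = 16.0;

  // Fixed-width columns use their width; VARCHAR-like columns use the
  // average width stat if present, else the assumed default.
  static double columnWidthBytes(Column c) {
    if (c.fixedWidthBytes > 0) return c.fixedWidthBytes;
    return c.avgWidthBytes.orElse(DEFAULT_VAR_WIDTH_BYTES);
  }

  // Estimated row width in bytes: the sum of the column widths.
  static double rowWidthBytes(List<Column> columns) {
    double total = 0.0;
    for (Column c : columns) total += columnWidthBytes(c);
    return total;
  }

  // Assumed cost shape: each input row pays a flat per-row cost plus a
  // materialization cost proportional to its width in bytes.
  static double processingCost(
      long inputRows, double rowWidthBytes, double perRowCost, double perByteCost) {
    return inputRows * (perRowCost + perByteCost * rowWidthBytes);
  }
}
```

Under these assumptions, a row with an 8-byte fixed column, a VARCHAR averaging 12.5 bytes, and a VARCHAR without stats is estimated at 8 + 12.5 + 16 = 36.5 bytes. Whether a linear per-byte term is the right model for a given operator is exactly what this ticket proposes to investigate.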

 


