Posted to issues@calcite.apache.org by "Thomas Rebele (Jira)" <ji...@apache.org> on 2020/09/28 16:05:00 UTC

[jira] [Comment Edited] (CALCITE-4264) The query planner should take CPU cost into account

    [ https://issues.apache.org/jira/browse/CALCITE-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203325#comment-17203325 ] 

Thomas Rebele edited comment on CALCITE-4264 at 9/28/20, 4:04 PM:
------------------------------------------------------------------

I implemented the proposal and ran the tests. It improves some plans slightly: where a sort and a project can be applied in either order, the planner now places the sort on the input with the fewest columns, because the CPU cost of a sort takes the number of columns into account. A pull request is available.
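
To illustrate why the column count matters, here is a minimal sketch of a sort CPU model. It assumes the cost grows as rows * log(rows) * columns; the class name, method name, and numbers are hypothetical and are not taken from Calcite's code.

```java
/** Hypothetical sketch; not Calcite's actual cost computation. */
final class SortCpuSketch {

  /**
   * Assumed CPU model for a sort: n * log(n) comparisons,
   * each scaled by the number of columns in the row.
   */
  static double sortCpu(double rowCount, int columnCount) {
    if (rowCount <= 1) {
      return 0; // nothing to sort
    }
    return rowCount * Math.log(rowCount) * columnCount;
  }

  public static void main(String[] args) {
    // Same 1000 rows in both plans; only the width of the sorted input differs.
    double sortBeforeProject = sortCpu(1000, 10); // sort the wide input, project afterwards
    double sortAfterProject = sortCpu(1000, 2);   // project down to 2 columns, then sort

    // Under this model, sorting the narrower input is cheaper.
    System.out.println(sortAfterProject < sortBeforeProject); // prints "true"
  }
}
```

Both plans read the same number of rows, so a comparison based on row count alone cannot tell them apart; only the CPU term does.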

Edit: [~vladimirsitnikov], as I understand it, the rowCost specifies how many rows are read from the input. Both Limit(Sort(...)) and LimitSort(...) need to read the whole input, since even the last row might be part of the result. So the rowCost should be the same for both.


> The query planner should take CPU cost into account
> ---------------------------------------------------
>
>                 Key: CALCITE-4264
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4264
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Thomas Rebele
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Calcite only takes the row count into account when optimizing queries. See [the relevant lines in VolcanoCost|https://github.com/apache/calcite/blob/52a57078ba081b24b9d086ed363c715485d1a519/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoCost.java#L98-L116]. However, two plans might have the same row count but differ greatly in CPU cost. This happens, for example, when the limit sort rule ([CALCITE-3920|https://issues.apache.org/jira/browse/CALCITE-3920]) is activated. The row cost is the same, but EnumerableLimitSort only sorts the input partially, so it has a lower CPU cost.
> Low-impact proposal: compare by row cost first, and only if the row costs are equal, compare by CPU cost.
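
The proposal amounts to a lexicographic comparison of (row cost, CPU cost). The following is a minimal sketch of that idea only; the class LexicographicCost, its fields, and the numbers are hypothetical and do not reproduce VolcanoCost or the code in the pull request.

```java
/** Hypothetical cost holder used for illustration; not Calcite's VolcanoCost. */
final class LexicographicCost {
  final double rowCount;
  final double cpu;

  LexicographicCost(double rowCount, double cpu) {
    this.rowCount = rowCount;
    this.cpu = cpu;
  }

  /** Less-or-equal: row count decides first; CPU is only a tie-breaker. */
  boolean isLe(LexicographicCost other) {
    if (rowCount != other.rowCount) {
      return rowCount < other.rowCount;
    }
    return cpu <= other.cpu;
  }

  /** Strictly less: less-or-equal, and not equal in both components. */
  boolean isLt(LexicographicCost other) {
    return isLe(other)
        && !(rowCount == other.rowCount && cpu == other.cpu);
  }

  public static void main(String[] args) {
    // Made-up numbers: both plans read 1000 rows, the partial sort uses less CPU.
    LexicographicCost limitOverSort = new LexicographicCost(1000, 900);
    LexicographicCost limitSort = new LexicographicCost(1000, 300);

    System.out.println(limitSort.isLt(limitOverSort)); // prints "true"
  }
}
```

Because the CPU cost is consulted only when the row counts tie, any pair of plans that the row-count-only comparison already ranks keeps its ranking, which is what makes the change low impact.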



--
This message was sent by Atlassian Jira
(v8.3.4#803005)