
Posted to issues@calcite.apache.org by "Vladimir Sitnikov (Jira)" <ji...@apache.org> on 2020/12/16 12:56:00 UTC

[jira] [Commented] (CALCITE-4264) The query planner should take CPU cost into account

    [ https://issues.apache.org/jira/browse/CALCITE-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250285#comment-17250285 ] 

Vladimir Sitnikov commented on CALCITE-4264:
--------------------------------------------

[~rubenql] mentions a sample in https://issues.apache.org/jira/browse/CALCITE-4264?focusedCommentId=17199258&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17199258

AFAIK PostgreSQL models LIMIT using two costs: one for "startup" and one for "full execution" of the query.

The LIMIT cost is then calculated as "startup + limit_percentage*execution_cost".
In practice, limit(index scan over all rows in the table) is a good plan only when a very small number of rows is returned.

For instance, the query "SELECT * from myTable ORDER BY myField LIMIT 1000000" would be much more efficient if it used a "full table scan" (or something similar), since that avoids the random-access reads incurred when fetching rows through the index.
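
A minimal Java sketch of the two-cost model described above follows. It is not PostgreSQL's planner code: `TwoPartCost`, `limitCost`, the per-row costs and the 1,000,000-row table size are hypothetical, and "execution_cost" is assumed to mean total cost minus startup cost. It compares an index scan (cheap to start, expensive per row) with a full scan plus sort (expensive to start, cheap per row).

```java
// Hypothetical sketch of a two-part (startup / total) cost model for LIMIT,
// following the formula "startup + limit_percentage * execution_cost".
// Class names, per-row costs and the table size are illustrative assumptions.
public final class LimitCostSketch {

    /** Plan cost split into startup work (before the first row) and total work (all rows). */
    public static final class TwoPartCost {
        final double startup;
        final double total;
        final double rows;

        public TwoPartCost(double startup, double total, double rows) {
            this.startup = startup;
            this.total = total;
            this.rows = rows;
        }

        /** Cost of fetching only the first {@code limit} rows. */
        public double limitCost(double limit) {
            double fraction = Math.min(1.0, limit / rows); // limit_percentage
            double executionCost = total - startup;         // work spent producing rows
            return startup + fraction * executionCost;
        }
    }

    public static void main(String[] args) {
        double rows = 1_000_000; // assumed table size

        // Index scan in ORDER BY order: no startup work, but an assumed
        // random-access read cost of 4.0 per row.
        TwoPartCost indexScan = new TwoPartCost(0.0, 4.0 * rows, rows);

        // Full scan + sort: the sort must finish before the first row is returned,
        // so most of the (assumed) cost is startup; emitting rows adds 10,000 in total.
        TwoPartCost scanThenSort = new TwoPartCost(2.0 * rows, 2.0 * rows + 10_000, rows);

        for (double limit : new double[] {10, 1_000_000}) {
            System.out.printf("LIMIT %.0f: index scan = %.1f, full scan + sort = %.1f%n",
                limit, indexScan.limitCost(limit), scanThenSort.limitCost(limit));
        }
    }
}
```

With these assumed numbers, LIMIT 10 favours the index scan (about 40 versus about 2,000,000), while a limit that covers the whole table favours the full scan plus sort (2,010,000 versus 4,000,000).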

However, the example [~rubenql] mentions would always favour the {{limit(index_scan)}} plan, no matter what :-(

> The query planner should take CPU cost into account
> ---------------------------------------------------
>
>                 Key: CALCITE-4264
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4264
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Thomas Rebele
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Calcite only takes the row count into account when optimizing queries. See [the relevant lines in VolcanoCost|https://github.com/apache/calcite/blob/52a57078ba081b24b9d086ed363c715485d1a519/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoCost.java#L98-L116]. However, two plans might have the same row count but differ greatly in CPU cost. This happens, for example, when the limit sort rule ([CALCITE-3920|https://issues.apache.org/jira/browse/CALCITE-3920]) is activated: the row cost is the same, but EnumerableLimitSort only sorts the input partially, so it has a lower CPU cost.
> Low-impact proposal: compare the row cost first, and only if the row costs are equal, compare by CPU cost.
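
The low-impact proposal amounts to a lexicographic comparison. The Java sketch below is hypothetical and is not Calcite's VolcanoCost API: `PlanCost`, `ROWS_ONLY` and `ROWS_THEN_CPU` are illustrative names, and the CPU figures are assumed textbook estimates (about n * log2(n) comparisons for a full sort versus n * log2(k) for a top-k partial sort), not EnumerableLimitSort's actual cost formula.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;

// Hypothetical sketch of "compare row cost first, then CPU cost"; not Calcite's VolcanoCost.
public final class LexicographicCostSketch {

    /** Illustrative plan cost with a row cost and a CPU cost. */
    public static final class PlanCost {
        final String name;
        final double rowCost;
        final double cpu;

        public PlanCost(String name, double rowCost, double cpu) {
            this.name = name;
            this.rowCost = rowCost;
            this.cpu = cpu;
        }
    }

    /** Behaviour described in the issue: only the row cost matters. */
    public static final Comparator<PlanCost> ROWS_ONLY =
        Comparator.comparingDouble(c -> c.rowCost);

    /** Proposed behaviour: row cost first; CPU cost only breaks ties. */
    public static final Comparator<PlanCost> ROWS_THEN_CPU =
        Comparator.comparingDouble((PlanCost c) -> c.rowCost)
            .thenComparingDouble(c -> c.cpu);

    private static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        double n = 1_000_000;   // assumed input size
        double k = 10;          // assumed LIMIT
        double rowCost = n + k; // assumed identical row cost for both plans (illustrative)

        // Full sort followed by a limit: roughly n * log2(n) comparisons (textbook estimate).
        PlanCost sortThenLimit = new PlanCost("Sort + Limit", rowCost, n * log2(n));
        // Partial (top-k) sort: roughly n * log2(k) comparisons (textbook estimate).
        PlanCost limitSort = new PlanCost("LimitSort", rowCost, n * log2(k));

        System.out.println("rows only: compare = " + ROWS_ONLY.compare(sortThenLimit, limitSort));
        PlanCost best = Collections.min(Arrays.asList(sortThenLimit, limitSort), ROWS_THEN_CPU);
        System.out.println("rows then CPU picks: " + best.name);
    }
}
```

A row-count-only comparison treats the two plans as a tie, whereas using CPU cost as a tie-breaker selects the partial sort. Plans whose row costs differ are still ordered by row cost alone, so their relative order is unchanged.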



--
This message was sent by Atlassian Jira
(v8.3.4#803005)