You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Vladimir Sitnikov (Jira)" <ji...@apache.org> on 2021/03/29 07:44:00 UTC
[jira] [Comment Edited] (CALCITE-4557) Sort costing should account
the number of sort keys: the more keys to compare, the more the cost
[ https://issues.apache.org/jira/browse/CALCITE-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310474#comment-17310474 ]
Vladimir Sitnikov edited comment on CALCITE-4557 at 3/29/21, 7:43 AM:
----------------------------------------------------------------------
{quote}Most likely, they all terminate in 1 or 1.1 comparisons on average.{quote}
If the input is already sorted, then the leading columns would not yield an early return from the comparisons.
For instance, if the input is already sorted on {{a, b, c}}, then {{sort by a, b, c, d}} would have to perform 4 comparisons always.
{code:sql}
select *
from (
select a, b, c, d ... inherently ordered by a, b
)
order by a, b, c, d
{code}
---
However, I'm not much into assessing the number of distinct values in the sort key fields, so I think we should be fine to assume that every next field would be compared with 0.1 probability decay.
In other words, "sort key comparison cost factor" should be like {{numberOfPreSortedFieldsInSortKey + 1 * (1 - 0.1^numberOfNonSortedFieldsInSortKey)/(1-0.1)}
was (Author: vladimirsitnikov):
(quote}Most likely, they all terminate in 1 or 1.1 comparisons on average.{quote}
If the input is already sorted, then the leading columns would not yield an early return from the comparisons.
For instance, if the input is already sorted on {{a, b, c}}, then {{sort by a, b, c, d}} would have to perform 4 comparisons always.
{code:sql}
select *
from (
select a, b, c, d ... inherently ordered by a, b
)
order by a, b, c, d
{code}
---
However, I'm not much into assessing the number of distinct values in the sort key fields, so I think we should be fine to assume that every next field would be compared with 0.1 probability decay.
In other words, "sort key comparison cost factor" should be like {{numberOfPreSortedFieldsInSortKey + 1 * (1 - 0.1^numberOfNonSortedFieldsInSortKey)/(1-0.1)}
> Sort costing should account the number of sort keys: the more keys to compare, the more the cost
> ------------------------------------------------------------------------------------------------
>
> Key: CALCITE-4557
> URL: https://issues.apache.org/jira/browse/CALCITE-4557
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.26.0
> Reporter: Vladimir Sitnikov
> Priority: Major
>
> Current costing for sort does not account for the number of keys in the comparison function which might result in worse plan selection.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)