You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Vladimir Sitnikov (Jira)" <ji...@apache.org> on 2021/03/29 07:44:00 UTC

[jira] [Comment Edited] (CALCITE-4557) Sort costing should account the number of sort keys: the more keys to compare, the more the cost

    [ https://issues.apache.org/jira/browse/CALCITE-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310474#comment-17310474 ] 

Vladimir Sitnikov edited comment on CALCITE-4557 at 3/29/21, 7:43 AM:
----------------------------------------------------------------------

{quote}Most likely, they all terminate in 1 or 1.1 comparisons on average.{quote}

If the input is already sorted, then the leading columns would not yield an early return from the comparisons.

For instance, if the input is already sorted on {{a, b, c}}, then {{sort by a, b, c, d}} would have to perform 4 comparisons always.

{code:sql}
select *
  from (
   select a, b, c, d ... inherently ordered by a, b
  )
 order by a, b, c, d
{code}

---

However, I'm not much into assessing the number of distinct values in the sort key fields, so I think we should be fine to assume that every next field would be compared with 0.1 probability decay.

In other words, "sort key comparison cost factor" should be like {{numberOfPreSortedFieldsInSortKey + 1 * (1 - 0.1^numberOfNonSortedFieldsInSortKey)/(1-0.1)}






was (Author: vladimirsitnikov):
(quote}Most likely, they all terminate in 1 or 1.1 comparisons on average.{quote}

If the input is already sorted, then the leading columns would not yield an early return from the comparisons.

For instance, if the input is already sorted on {{a, b, c}}, then {{sort by a, b, c, d}} would have to perform 4 comparisons always.

{code:sql}
select *
  from (
   select a, b, c, d ... inherently ordered by a, b
  )
 order by a, b, c, d
{code}

---

However, I'm not much into assessing the number of distinct values in the sort key fields, so I think we should be fine to assume that every next field would be compared with 0.1 probability decay.

In other words, "sort key comparison cost factor" should be like {{numberOfPreSortedFieldsInSortKey + 1 * (1 - 0.1^numberOfNonSortedFieldsInSortKey)/(1-0.1)}





> Sort costing should account the number of sort keys: the more keys to compare, the more the cost
> ------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-4557
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4557
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.26.0
>            Reporter: Vladimir Sitnikov
>            Priority: Major
>
> Current costing for sort does not account for the number of keys in the comparison function which might result in worse plan selection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)