Posted to issues@calcite.apache.org by "Vladimir Sitnikov (Jira)" <ji...@apache.org> on 2021/03/10 15:02:00 UTC
[jira] [Commented] (CALCITE-4522) Sort cost should account for the number of columns in collation
[ https://issues.apache.org/jira/browse/CALCITE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298888#comment-17298888 ]
Vladimir Sitnikov commented on CALCITE-4522:
--------------------------------------------
{quote}You should multiply by constant + row_bytes, not row_bytes. You cannot sort 4-byte rows 25 times faster than 100-byte rows. There is a per-row overhead.{quote}
The per-row overhead is caused by the comparison, which is not related to the row size.
I suggest we use {{collation.getFieldCollations().size()}} as an estimate of the per-tuple comparison cost.
A slightly better estimate would be {{sum of datatype widths of all the fields in the collation columns}}.
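A minimal sketch of the two per-comparison weights suggested above, assuming a sort performs roughly n * log2(n) comparisons. The class and method names are hypothetical (this is not Calcite API), and the column widths are assumed to be supplied by the caller from the column data types. Note that the full row width plays no part here, which is the point of the comment.

```java
/**
 * Hypothetical sketch (not Calcite API): two per-comparison cost weights
 * for a sort, following the suggestion in the comment above.
 */
public final class SortComparisonCostSketch {

  private SortComparisonCostSketch() {}

  /** Approximate number of comparisons needed to sort n rows: n * log2(n). */
  static double comparisons(double rowCount) {
    if (rowCount <= 1) {
      return 0; // zero or one row: nothing to compare
    }
    return rowCount * (Math.log(rowCount) / Math.log(2));
  }

  /** Estimate 1: each comparison costs one unit per collation column. */
  static double costByColumnCount(double rowCount, int[] collationWidths) {
    return comparisons(rowCount) * collationWidths.length;
  }

  /** Estimate 2: each comparison costs the summed byte width of the collation columns. */
  static double costByColumnWidth(double rowCount, int[] collationWidths) {
    int totalWidth = 0;
    for (int width : collationWidths) {
      totalWidth += width;
    }
    return comparisons(rowCount) * totalWidth;
  }

  public static void main(String[] args) {
    // Hypothetical collation on an INTEGER (4 bytes) and a BIGINT (8 bytes) column.
    int[] widths = {4, 8};
    System.out.println("by column count: " + costByColumnCount(1_000_000, widths));
    System.out.println("by column width: " + costByColumnWidth(1_000_000, widths));
  }
}
```

With an empty collation both estimates are zero, because the column count and the width sum are both zero.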
> Sort cost should account for the number of columns in collation
> ---------------------------------------------------------------
>
> Key: CALCITE-4522
> URL: https://issues.apache.org/jira/browse/CALCITE-4522
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: hqx
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 9h
> Remaining Estimate: 0h
>
> The old method of computing the cost of a sort has some problems.
> # When the RelCollation is empty, there is no need to sort, but it still computes the CPU cost of sorting.
> # Using n * log\(n) * row_byte to estimate the CPU cost may be inaccurate, where n is the output row count of the sort operator and row_byte is the average number of bytes per row.
> Instead, I suggest the following:
> # The CPU cost is zero if the RelCollation is empty.
> # Let heap_size be min\(offset + output_count, input_count), and use input_count * log\(heap_size) * row_byte to compute the CPU cost.
> When fetch is zero, I found that output_count is 1, not 0. This conveniently ensures that log\(heap_size) is no less than zero.
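The two rules proposed in the issue description can be sketched as follows. Assumptions: the class and method names are hypothetical and this is not Calcite's actual Sort cost code; the natural logarithm is used because only relative costs matter; and the clamp of heap_size to at least 1 is a guard added in this sketch (the issue relies on output_count being at least 1, which does not cover an empty input).

```java
/**
 * Hypothetical sketch (not Calcite's actual Sort cost code) of the CPU-cost
 * rules proposed in CALCITE-4522.
 */
public final class TopNSortCostSketch {

  private TopNSortCostSketch() {}

  /**
   * @param inputCount    rows entering the sort
   * @param offset        rows skipped by OFFSET (0 if absent)
   * @param outputCount   rows returned after OFFSET/FETCH
   * @param rowBytes      average row width in bytes
   * @param collationSize number of columns in the collation
   */
  static double cpuCost(double inputCount, double offset, double outputCount,
      double rowBytes, int collationSize) {
    // Rule 1: an empty collation needs no sorting, so there is no CPU cost.
    if (collationSize == 0) {
      return 0;
    }
    // Rule 2: a bounded heap only has to keep offset + output rows,
    // and never more rows than the input provides.
    double heapSize = Math.min(offset + outputCount, inputCount);
    // Guard added in this sketch: keep log(heapSize) >= 0 even for an empty input.
    heapSize = Math.max(heapSize, 1.0);
    // Each input row costs O(log heapSize) heap work, weighted by row width.
    return inputCount * Math.log(heapSize) * rowBytes;
  }

  public static void main(String[] args) {
    // Full sort of 1M rows vs. a top-10 query over the same input.
    System.out.println("full sort: " + cpuCost(1_000_000, 0, 1_000_000, 32, 1));
    System.out.println("top 10:    " + cpuCost(1_000_000, 0, 10, 32, 1));
    System.out.println("no keys:   " + cpuCost(1_000_000, 0, 1_000_000, 32, 0));
  }
}
```

This follows the issue's formula as written, with rowBytes as the per-row weight; Vladimir's comment argues for replacing that weight with one derived from the collation columns instead.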
--
This message was sent by Atlassian Jira
(v8.3.4#803005)