Posted to issues@calcite.apache.org by "Ruben Q L (Jira)" <ji...@apache.org> on 2021/04/09 07:48:00 UTC
[jira] [Comment Edited] (CALCITE-4522) CPU cost of Sort should be lower if sort keys are empty
[ https://issues.apache.org/jira/browse/CALCITE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317748#comment-17317748 ]
Ruben Q L edited comment on CALCITE-4522 at 4/9/21, 7:47 AM:
-------------------------------------------------------------
That is a fair question, and honestly I'm not sure about the answer ({{RelOptCostFactory}} interface does not clarify it). However, it would seem that, at least for some operators, this parameter is calculated with a certain formula that represents (somehow) their computational cost, e.g. [EnumerableHashJoin|https://github.com/apache/calcite/blob/8581f0a3fe9a4f079cb4d36f02121ae22118714c/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java#L129], [EnumerableMergeJoin|https://github.com/apache/calcite/blob/8581f0a3fe9a4f079cb4d36f02121ae22118714c/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java#L396], [Correlate|https://github.com/apache/calcite/blob/8581f0a3fe9a4f079cb4d36f02121ae22118714c/core/src/main/java/org/apache/calcite/rel/core/Correlate.java#L206], ...
In any case, this is not a blocking issue; as a workaround, I can define my own metadata for NonCumulativeCost for EnumerableSort & EnumerableLimitSort. I just wanted to flag this "regression" before the next release is built.
was (Author: rubenql):
That is a fair question, and honestly I'm not sure about the answer ({{RelOptCostFactory}} interface does not clarify it). However, it would seem that, at least for some operators, this parameter is calculated with a certain formula that represents (somehow) their computational cost, e.g. [EnumerableHashJoin|https://github.com/apache/calcite/blob/8581f0a3fe9a4f079cb4d36f02121ae22118714c/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java#L129], [EnumerableMergeJoin|https://github.com/apache/calcite/blob/8581f0a3fe9a4f079cb4d36f02121ae22118714c/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java#L396], [Correlate|https://github.com/apache/calcite/blob/8581f0a3fe9a4f079cb4d36f02121ae22118714c/core/src/main/java/org/apache/calcite/rel/core/Correlate.java#L206], ...
> CPU cost of Sort should be lower if sort keys are empty
> -------------------------------------------------------
>
> Key: CALCITE-4522
> URL: https://issues.apache.org/jira/browse/CALCITE-4522
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: hqx
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.27.0
>
> Time Spent: 9h 50m
> Remaining Estimate: 0h
>
> The old method of computing the cost of a sort has some problems.
> # When the RelCollation is empty, there is no need to sort, but the CPU cost of sorting is still computed.
> # Using n * log(n) * row_byte to estimate the CPU cost may be inaccurate, where n is the output row count of the sort operator and row_byte is the average number of bytes per row.
> Instead, I suggest the following.
> # The CPU cost is zero if the RelCollation is empty.
> # Let heap_size be min(offset + fetch, input_count), and use input_count * max(1, log(heap_size)) * row_byte to compute the CPU cost.
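The two formulas in the description above can be compared with a small standalone sketch. This is not Calcite code: the class and method names are hypothetical, the natural logarithm is an assumption (the description does not state the base), and an absent fetch is modelled here as positive infinity so that heap_size falls back to input_count.

```java
// Hypothetical helper, not part of Calcite; it only mirrors the formulas quoted above.
final class SortCpuCostSketch {
  private SortCpuCostSketch() {}

  /** Old estimate: n * log(n) * row_byte, with n the output row count. */
  static double oldCpuCost(double outputCount, double rowBytes) {
    // Assumption: treat an empty output as zero cost to avoid log(0).
    if (outputCount <= 0) {
      return 0d;
    }
    return outputCount * Math.log(outputCount) * rowBytes;
  }

  /** Proposed estimate: input_count * max(1, log(heap_size)) * row_byte. */
  static double proposedCpuCost(boolean collationEmpty, double inputCount,
      double offset, double fetch, double rowBytes) {
    // No sort keys means nothing to sort, so the CPU cost is zero.
    if (collationEmpty) {
      return 0d;
    }
    // Only offset + fetch rows need to be kept, capped by the input size.
    double heapSize = Math.min(offset + fetch, inputCount);
    // max(1, ...) keeps a per-row cost of at least row_byte for tiny heaps.
    return inputCount * Math.max(1d, Math.log(heapSize)) * rowBytes;
  }

  public static void main(String[] args) {
    double rows = 1000d;
    double rowBytes = 8d;
    System.out.println("old, full sort:       " + oldCpuCost(rows, rowBytes));
    System.out.println("proposed, fetch = 10: "
        + proposedCpuCost(false, rows, 0d, 10d, rowBytes));
    System.out.println("proposed, no keys:    "
        + proposedCpuCost(true, rows, 0d, 10d, rowBytes));
  }
}
```

With a small fetch, the proposed estimate grows with log(heap_size) rather than log(n), so a top-k sort is costed below a full sort of the same input.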