Posted to dev@calcite.apache.org by Vladimir Sitnikov <si...@gmail.com> on 2014/11/25 09:58:15 UTC

The cost of correlation

Hi,

While working on LogicalCorrelate, I found that Calcite's cost
model has no room for rescans.
In other words, the cost of the plan Correlate(ScanA, ScanB) is just
selfCost(ScanA)+selfCost(ScanB)+selfCost(Correlate).

What is the intended way to report rescan cost?
Should selfCost(Correlate) include numRows(A)*fullCost(ScanB) -
fullCost(ScanB)? /* Calcite will add fullCost(ScanB) itself, so
subtracting it once is a compensation */

-- 
Vladimir Sitnikov
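
To make the compensation asked about above concrete, here is a minimal
sketch in plain Java. It does not use Calcite's API: plain doubles stand
in for cost objects, and the class name, method names, and numbers are
all hypothetical. It assumes that the inner scan is restarted once per
outer row.

```java
// Hypothetical sketch: plain doubles stand in for Calcite's cost objects.
final class RescanCostSketch {

    /** Cumulative cost as the planner computes it today: a plain sum. */
    static double cumulativeCost(double selfCostCorrelate,
                                 double fullCostA,
                                 double fullCostB) {
        return selfCostCorrelate + fullCostA + fullCostB;
    }

    /**
     * Self cost of Correlate with the rescans folded in.
     * Assumption: ScanB is restarted once per row of A, and the planner
     * already adds fullCost(ScanB) once, so only (numRowsA - 1) scans
     * are added here. The count is clamped at zero so that the self
     * cost never goes negative for an empty outer input.
     */
    static double correlateSelfCost(double ownWork,
                                    double numRowsA,
                                    double fullCostB) {
        double extraScans = Math.max(numRowsA - 1, 0);
        return ownWork + extraScans * fullCostB;
    }

    public static void main(String[] args) {
        double fullCostA = 100, fullCostB = 50, numRowsA = 10, ownWork = 10;

        // Without compensation: ScanB is counted once.
        double naive = cumulativeCost(ownWork, fullCostA, fullCostB);

        // With compensation: ScanB is effectively counted numRowsA times.
        double withRescans = cumulativeCost(
            correlateSelfCost(ownWork, numRowsA, fullCostB),
            fullCostA, fullCostB);

        System.out.println(naive);       // 160.0
        System.out.println(withRescans); // 610.0
    }
}
```

With these made-up numbers the compensated total equals
ownWork + fullCost(A) + numRows(A)*fullCost(B) = 10 + 100 + 10*50.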

Re: The cost of correlation

Posted by Julian Hyde <ju...@gmail.com>.
Yes. But it’s never been tried… so let me know how it goes.

The related thing that has not been tried: a DAG query plan where node X is evaluated once but used by 2 or more parents, and we only want to cost it once. See https://issues.apache.org/jira/browse/CALCITE-481.

Julian
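
As an illustration of the DAG case mentioned above, the following is a
rough sketch, not Calcite code. The Node class and both cost methods are
hypothetical. It contrasts a tree-style sum, which counts a shared node
once per parent, with a traversal that remembers visited nodes by
identity and counts each one only once.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of costing a shared plan node only once.
final class DagCostSketch {

    /** A plan node with a self cost and zero or more inputs. */
    static final class Node {
        final double selfCost;
        final List<Node> inputs;

        Node(double selfCost, Node... inputs) {
            this.selfCost = selfCost;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Tree-style sum: a node with two parents is counted twice. */
    static double treeCost(Node n) {
        double cost = n.selfCost;
        for (Node input : n.inputs) {
            cost += treeCost(input);
        }
        return cost;
    }

    /** DAG-aware sum: each distinct node object is counted once. */
    static double dagCost(Node root) {
        Set<Node> seen =
            Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        return dagCost(root, seen);
    }

    private static double dagCost(Node n, Set<Node> seen) {
        if (!seen.add(n)) {
            return 0; // already costed through another parent
        }
        double cost = n.selfCost;
        for (Node input : n.inputs) {
            cost += dagCost(input, seen);
        }
        return cost;
    }

    public static void main(String[] args) {
        Node x = new Node(100);            // evaluated once, used twice
        Node p1 = new Node(10, x);
        Node p2 = new Node(20, x);
        Node root = new Node(5, p1, p2);

        System.out.println(treeCost(root)); // 235.0
        System.out.println(dagCost(root));  // 135.0
    }
}
```

The sketch assumes that the shared node really is evaluated once and its
result reused; if each parent re-executes it, the tree-style sum is the
more honest estimate.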


On Nov 25, 2014, at 5:31 AM, Vladimir Sitnikov <si...@gmail.com> wrote:

> Do you mean to specialize RelMetadataQuery.getCumulativeCost(Correlate) ?
> 
> Vladimir


Re: The cost of correlation

Posted by Vladimir Sitnikov <si...@gmail.com>.
Do you mean to specialize RelMetadataQuery.getCumulativeCost(Correlate) ?

Vladimir

Re: The cost of correlation

Posted by Julian Hyde <ju...@hydromatic.net>.
I don't know. How about if a RelNode's cost means the cost of each
execution (yeah, I know, we're ignoring the fact that the 1st is
always slower than the rest) and Correlate multiplies the cost of the
RHS by the estimated number of restarts?
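
A minimal sketch of that idea in plain Java, with hypothetical names
throughout (Rel, Scan, Correlate here are toy classes, not Calcite's).
It assumes the number of restarts equals the row count of the left
input, and it ignores Correlate's own per-row work as well as the
first-execution effect mentioned above.

```java
// Hypothetical sketch: every node reports the cost of ONE execution.
final class PerExecutionCostSketch {

    interface Rel {
        double rowCount();
        double costPerExecution();
    }

    /** A leaf with a fixed row count and a fixed cost per scan. */
    static final class Scan implements Rel {
        private final double rows;
        private final double cost;

        Scan(double rows, double cost) {
            this.rows = rows;
            this.cost = cost;
        }

        public double rowCount() { return rows; }
        public double costPerExecution() { return cost; }
    }

    /** Restarts the right input once per row of the left input. */
    static final class Correlate implements Rel {
        private final Rel left;
        private final Rel right;

        Correlate(Rel left, Rel right) {
            this.left = left;
            this.right = right;
        }

        public double rowCount() {
            // Assumption: every left row pairs with every right row.
            return left.rowCount() * right.rowCount();
        }

        public double costPerExecution() {
            // Assumption: estimated restarts == left row count.
            double restarts = left.rowCount();
            return left.costPerExecution()
                + restarts * right.costPerExecution();
        }
    }

    public static void main(String[] args) {
        Rel a = new Scan(10, 100);
        Rel b = new Scan(5, 50);
        Rel c = new Scan(2, 20);

        // 100 + 10 * 50
        System.out.println(new Correlate(a, b).costPerExecution()); // 600.0

        // Nested correlates multiply: 100 + 10 * (50 + 5 * 20)
        System.out.println(
            new Correlate(a, new Correlate(b, c)).costPerExecution()); // 1600.0
    }
}
```

One consequence of treating cost as per-execution is that nested
correlates compose without any subtraction: each level only multiplies
the cost of its own right-hand side.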
