You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2017/09/07 23:20:00 UTC
[jira] [Commented] (CALCITE-1933) Cost estimation for expression evaluate

    [ https://issues.apache.org/jira/browse/CALCITE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157839#comment-16157839 ] 

Julian Hyde commented on CALCITE-1933:
--------------------------------------

Sorry for the delayed reply; somehow I mislaid this message.

Yes, I think this would be great to add.

Can you explain the purpose of the {{predicate}} argument? I think I understand the rest of it.

There is a related issue that I will need to solve that crops up in CALCITE-1861 (spatial indexes). I wonder if you have any advice, or whether it affects this proposal. Given an expensive filter such as {code}WHERE ST_DWithin(x, y, ST_Point(10, 20), 3){code} (let's call it "f1"), we would like to add an approximate pre-filter (call it "f2"), converting the condition into {code}WHERE (h BETWEEN 10 AND 13 OR h BETWEEN 17 AND 19)
AND ST_DWithin(x, y, ST_Point(10, 20), 3){code}

The pre-filter f2 is cheaper than the regular filter f1 because ({{BETWEEN}} uses less CPU than {{ST_DWithin}}, and exploits sorting so will inspect many fewer rows) but it is approximate. Also it affects the selectivity of the regular filter; or, put another way, the selectivity of (f2 AND f1) is the same as the selectivity of f1. To model the cost effectively, we need to represent the fact that f1 is being applied to many fewer rows after f2 has been applied. Will your proposal handle this?

> Cost estimation for expression evaluate
> ---------------------------------------
>
>                 Key: CALCITE-1933
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1933
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Ted Xu
>            Assignee: Julian Hyde
>
> Expression evaluate performance is different one to another based on its operator type and underlying execution engine. For example, a regex pattern matching (e.g., col like 'foo%bar%') may requires 100 times more cost compares to a simple comparison (e.g., col > 3). Currently we don't have this kind of differentiation.
> Expression cost will grant following applications:
> 1. Improved predicate push down: predicates can be pushed down to the right position based on row count and its own cost.
> 2. Predicate ordering rule: predicate with different order, e.g.,
> {code:sql}
> col1 like 'foo%bar%' and col2 > 3
> col2 > 3 and col1 like 'foo%bar%'
> {code}
>  
> may have different performance, introducing a predicate ordering rule will help find a better one.
> The cost of each RexCall / RexAggCall is provided by Schema, stored in an external metadata service.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)