Posted to issues@calcite.apache.org by "Xiening Dai (Jira)" <ji...@apache.org> on 2020/06/01 17:16:00 UTC

[jira] [Commented] (CALCITE-3963) Maintain logical properties at RelSet (equivalent group) instead of RelNode

    [ https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121177#comment-17121177 ] 

Xiening Dai commented on CALCITE-3963:
--------------------------------------

_*What's wrong with my suggestion to treat all RelNode instances in a set as equivalent? Use order-independent (and monotonic) folding operations such as 'min', 'max', 'union' to combine property values.*_

Right now each RelNode has its own algorithm and comes up with an estimated row count independently. From a relational algebra point of view, I just don't see why we would average them or choose the min/max value to represent the set. Also, in the example I described, let's say we have a materialized view that's equivalent to a join, so a TableScan node exists in the same RelSet as the join node. That table scan comes from a materialized view that has accurate statistics, such as row count, so why would we average it with the join node's estimate, or use the min/max between them? The other example is MultiJoin. Currently MultiJoin doesn't have an implementation of getEstimatedRowCount(), and always returns 1 as the row count. That's understandable, since it's usually harder to estimate the row count when you have multiple join inputs. After it's been converted into a LogicalJoin, we get a better estimate. Using the confidence level, we can then update the RelSet row count to use the one from LogicalJoin. Again, it wouldn't make sense to average them or use the min/max here.
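To make the confidence-level idea concrete, here is a minimal sketch of how a set-level row count could be replaced only by a more trustworthy estimate. Everything in it is hypothetical: SetRowCountSketch, Estimate, Confidence and offer() are illustration names, not Calcite APIs, and the row counts other than MultiJoin's default of 1 are made-up numbers.

```java
import java.util.Optional;

/** Hypothetical sketch, not Calcite code: one row count kept per equivalence set. */
final class SetRowCountSketch {
  /** Assumed confidence levels, ordered from least to most trusted. */
  enum Confidence { LOW, MEDIUM, HIGH }

  /** A row count estimate together with the node kind that produced it. */
  static final class Estimate {
    final String source;
    final double rowCount;
    final Confidence confidence;

    Estimate(String source, double rowCount, Confidence confidence) {
      this.source = source;
      this.rowCount = rowCount;
      this.confidence = confidence;
    }
  }

  private Estimate current; // null until the first estimate is offered

  /**
   * Keeps the candidate only if it is strictly more confident than the
   * current estimate. Values are never averaged or folded with min/max.
   */
  boolean offer(Estimate candidate) {
    if (current == null
        || candidate.confidence.compareTo(current.confidence) > 0) {
      current = candidate;
      return true;
    }
    return false;
  }

  Optional<Estimate> get() {
    return Optional.ofNullable(current);
  }
}
```

With this shape, a MultiJoin offering 1 at LOW confidence is overridden by a LogicalJoin estimate at MEDIUM, which in turn is overridden by a materialized view's TableScan statistics at HIGH. A later low-confidence estimate leaves the set's value untouched.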

Regarding the non-deterministic comment, I believe that as long as the rule firing order doesn't change, the behavior is deterministic. If the rule firing order is different, a lot of things could be different. Even for the best plan, we don't update a subset's best as long as the cost is the same, which means we always choose the first plan among those that have the same cost. Thus, when the rule firing order changes, the order of RelNode creation changes, and the best plan could be different. We have seen that a lot.
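The tie-breaking behavior described above can be illustrated with a small sketch. It is not the planner's actual code: BestPlanSketch, Plan and pickBest() are hypothetical names, and the only assumption carried over from the discussion is that the best plan is replaced on a strictly lower cost and never on an equal one.

```java
import java.util.List;

/** Hypothetical sketch, not Calcite code: first plan wins among equal costs. */
final class BestPlanSketch {
  static final class Plan {
    final String name;
    final double cost;

    Plan(String name, double cost) {
      this.name = name;
      this.cost = cost;
    }
  }

  /** Scans plans in creation order and replaces the best only on a strictly lower cost. */
  static Plan pickBest(List<Plan> plansInCreationOrder) {
    Plan best = null;
    for (Plan p : plansInCreationOrder) {
      if (best == null || p.cost < best.cost) {
        best = p; // an equal-cost plan never displaces the earlier one
      }
    }
    return best;
  }
}
```

Given two plans of equal cost, the one created first is returned, so the same input order always yields the same result, while swapping the creation order (for example because rules fired in a different order) swaps the chosen plan.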

> Maintain logical properties at RelSet (equivalent group) instead of RelNode
> ---------------------------------------------------------------------------
>
>                 Key: CALCITE-3963
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3963
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Xiening Dai
>            Assignee: Xiening Dai
>            Priority: Major
>
> Currently the logical properties (such as row count, distinct row count, etc.) are maintained at the RelNode level. This creates a number of metadata consistency problems, e.g. CALCITE-1048, CALCITE-2166.
> In theory, all RelNodes in a RelSet should share the same logical properties, per the definition of relational equivalence. So it makes more sense to keep logical properties at the RelSet level rather than the RelNode level. Such properties shouldn't change when a new subset is created or a subset's best is changed.
> Specifically, I think the built-in metadata below should fall into the logical properties category -
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)