Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2020/04/30 19:06:00 UTC
[jira] [Commented] (CALCITE-3963) Maintains logical properties at RelSet (equivalent group) instead of RelNode

    [ https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096889#comment-17096889 ] 

Julian Hyde commented on CALCITE-3963:
--------------------------------------

Minor quibble: in the JIRA subject, use the imperative form of the verb ("Maintain") rather than the third-person singular ("Maintains").

When you say "maintain" do you mean "store"? I'm not sure I agree. The metadata system allows us to derive a property for any {{RelNode}} (e.g. calling {{RelMetadataQuery.getUniqueKeys(RelNode rel, boolean ignoreNulls)}} on a particular {{LogicalProject}}) and it also maintains a cache, so that once derived, the value does not have to be re-computed.
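
To make the "derive once, then cache" idea concrete, here is a minimal sketch. It is not Calcite's actual cache: {{MemoizedMetadata}}, its {{deriver}} function and the {{derivations}} counter are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical memoizing wrapper; illustrative only, not Calcite's real cache. */
final class MemoizedMetadata<N, V> {
  private final Map<N, V> cache = new HashMap<>();
  private final Function<N, V> deriver;
  private int derivations; // counts how often the deriver actually ran

  MemoizedMetadata(Function<N, V> deriver) {
    this.deriver = deriver;
  }

  /** Returns the cached value for the node, deriving it on first request. */
  V get(N node) {
    if (!cache.containsKey(node)) {
      derivations++;
      cache.put(node, deriver.apply(node));
    }
    return cache.get(node);
  }

  int derivations() {
    return derivations;
  }
}
```

The caller only ever asks for a value; whether it was stored or freshly computed is invisible to the caller.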

So, the metadata system allows us to not worry too much about whether values are stored, which is good.

Now, let's suppose that you want to know the unique keys of a particular {{RelSet}} (or {{RelSubset}} - the reasoning is similar). Unique keys are a logical property, so we should be able to derive the set of unique keys by taking the union of the unique keys of every {{RelNode}} in that set.
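
As a rough sketch of that union, assume each member of the set reports its unique keys as a set of column-ordinal bit sets. The class {{SetUniqueKeys}} and its methods are hypothetical, and {{java.util.BitSet}} stands in for whatever key representation the planner really uses.

```java
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; illustrative only, not part of Calcite. */
final class SetUniqueKeys {
  /** Unions the unique keys reported by each member of an equivalence set. */
  static Set<BitSet> unionOfKeys(List<Set<BitSet>> memberKeys) {
    Set<BitSet> result = new LinkedHashSet<>();
    for (Set<BitSet> keys : memberKeys) {
      result.addAll(keys); // BitSet has value-based equals, so duplicates collapse
    }
    return result;
  }

  /** Builds a key from the given column ordinals. */
  static BitSet key(int... columns) {
    BitSet bits = new BitSet();
    for (int column : columns) {
      bits.set(column);
    }
    return bits;
  }
}
```

For example, if one member knows that column 0 is a key and another knows both column 0 and the pair (1, 2), the set as a whole has two keys: {0} and {1, 2}.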

If you add a {{RelNode}} to a set, or merge sets, then the set may acquire additional unique keys. And those keys may cause changes to unique keys (and other metadata) for any {{RelNode}} that consumes any {{RelNode}} in the set. It's complicated, so we should lean on the metadata system to maintain everything for us.

I think we need to add a 'fold' operator to each type of metadata to say how the metadata of the {{RelSet}} is derived from those of the constituent nodes. In the case of {{RelMdUniqueKeys}} the fold operator is 'union'. (In SQL terms, the 'fold' operator would be called a 'roll up', that is, an aggregate function. {{RelMdMinRowCount}} rolls up using {{MAX}}. Et cetera.)
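
A minimal sketch of what such a fold could look like, with one operator per kind of metadata. Everything here ({{MetadataFold}}, {{rollUp}}, {{MIN_ROW_COUNT_FOLD}}, {{union}}) is a hypothetical name, not an existing Calcite API. The choice of {{MAX}} for the minimum row count rests on the assumption that every member's value is a valid lower bound for the whole set, so the largest one is the tightest.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BinaryOperator;

/** Hypothetical roll-up of per-node metadata to the set level; not Calcite API. */
final class MetadataFold {
  /** Fold for a min-row-count style property: keep the largest lower bound. */
  static final BinaryOperator<Double> MIN_ROW_COUNT_FOLD = Double::max;

  /** Fold for a unique-keys style property: union of the members' keys. */
  static <K> Set<K> union(Set<K> left, Set<K> right) {
    Set<K> out = new HashSet<>(left);
    out.addAll(right);
    return out;
  }

  /** Rolls up the members' values with the given fold; needs at least one member. */
  static <T> T rollUp(List<T> memberValues, BinaryOperator<T> fold) {
    return memberValues.stream()
        .reduce(fold)
        .orElseThrow(() -> new IllegalArgumentException("set has no members"));
  }
}
```

With this shape, {{rollUp(values, MIN_ROW_COUNT_FOLD)}} gives the set-level minimum row count and {{rollUp(keySets, MetadataFold::union)}} gives the set-level unique keys; supporting another kind of metadata would mean supplying one more operator.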

As I said earlier, we should not focus on where the {{RelSet}}'s metadata is stored. Let the metadata system worry about that. Focus instead on how the metadata is derived.



> Maintains logical properties at RelSet (equivalent group) instead of RelNode
> ----------------------------------------------------------------------------
>
>                 Key: CALCITE-3963
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3963
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Xiening Dai
>            Assignee: Xiening Dai
>            Priority: Major
>
> Currently the logical properties (such as row count, distinct row count, etc.) are maintained at the RelNode level. This creates a number of metadata consistency problems, e.g. CALCITE-1048, CALCITE-2166.
> In theory, all RelNodes in a RelSet should share the same logical properties, by the definition of relational equivalence. So it makes more sense to keep logical properties at the RelSet level rather than the RelNode level. And such properties shouldn't change when a new subset is created or a subset's best is changed.
> Specifically, I think the following built-in metadata should fall into the logical properties category -
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)