Posted to issues@calcite.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2017/03/08 11:35:38 UTC

[jira] [Created] (CALCITE-1682) New metadata providers for expression column origin and all predicates in plan

Jesus Camacho Rodriguez created CALCITE-1682:
------------------------------------------------

             Summary: New metadata providers for expression column origin and all predicates in plan
                 Key: CALCITE-1682
                 URL: https://issues.apache.org/jira/browse/CALCITE-1682
             Project: Calcite
          Issue Type: New Feature
          Components: core
    Affects Versions: 1.12.0
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez


I am working on the integration of materialized view rewriting within Hive.

Once a view matches an operator plan, rewriting is broadly split into two steps. The first step verifies that the input to the root operator of the matched plan is equivalent to, or contained within, the input to the root operator of the query that defines the view. The second step triggers a _unify_ rule, which tries to rewrite the matched operator tree into a scan on the view, possibly followed by additional operators that compute the exact results needed by the query (for example, a Project that alters the column order, an additional Filter on the view, or an additional Join).

If we focus on step 1, checking equivalence/containment, I would like to extend the metadata providers in Calcite to give us more information about the matched (sub)plan. In particular, I am thinking of:
- Expression column origin. Currently Calcite can provide the column origins for a certain column and whether it is derived or not. However, we would need to obtain the expression that generated a certain column. This expression should contain references to the input tables. For instance, given column _c_, the new metadata provider would report that it was generated by the expression _A.a + B.b_.
- All predicates. Currently Calcite can extract the predicates that hold on a RelNode's output (we can think of them as constraints on the output). However, I would like to extract all predicates that have been applied anywhere in a given RelNode (sub)plan. Since the columns these predicates reference might not be part of the output, the expressions should contain references to the input tables. For instance, the new metadata provider might return the expression _A.a + B.b > C.c AND D.d = 100_.
- PK-FK relationship. I do not plan to implement this one immediately. However, exposing this information (when it is provided) can help us trigger more rewritings that involve join operators. Thus, I was wondering whether it is worth adding.
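
To make the first two items concrete, here is a minimal, self-contained sketch of what such providers would compute. It does not use Calcite classes: LineageSketch, TableCol, InputRef, Call, Scan, Project, Filter and Join are hypothetical stand-ins for RelNode/RexNode, and the code assumes inner joins and binary calls only. It needs Java 16 or later (records, pattern matching for instanceof).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types are Calcite classes. */
final class LineageSketch {
  /** Expression tree: table column, positional input reference, literal, or binary call. */
  interface Expr {}
  record TableCol(String table, String column) implements Expr {
    public String toString() { return table + "." + column; }
  }
  record InputRef(int index) implements Expr {
    public String toString() { return "$" + index; }
  }
  record Literal(Object value) implements Expr {
    public String toString() { return String.valueOf(value); }
  }
  record Call(String op, Expr left, Expr right) implements Expr {
    public String toString() { return "(" + left + " " + op + " " + right + ")"; }
  }

  /** Plan nodes; a Join's condition refers to left fields followed by right fields. */
  interface Node {}
  record Scan(String table, List<String> columns) implements Node {}
  record Project(Node input, List<Expr> exprs) implements Node {}
  record Filter(Node input, Expr condition) implements Node {}
  record Join(Node left, Node right, Expr condition) implements Node {}

  /** "Expression column origin": each output column rewritten over table columns. */
  static List<Expr> origins(Node n) {
    List<Expr> out = new ArrayList<>();
    if (n instanceof Scan s) {
      for (String c : s.columns()) out.add(new TableCol(s.table(), c));
    } else if (n instanceof Filter f) {
      out.addAll(origins(f.input()));   // a filter does not change columns
    } else if (n instanceof Join j) {
      out.addAll(origins(j.left()));
      out.addAll(origins(j.right()));
    } else {
      Project p = (Project) n;
      List<Expr> in = origins(p.input());
      for (Expr e : p.exprs()) out.add(substitute(e, in));
    }
    return out;
  }

  /** Replaces every InputRef in e with the matching expression from in. */
  static Expr substitute(Expr e, List<Expr> in) {
    if (e instanceof InputRef r) return in.get(r.index());
    if (e instanceof Call c) {
      return new Call(c.op(), substitute(c.left(), in), substitute(c.right(), in));
    }
    return e;                           // literals and table columns stay as they are
  }

  /** "All predicates": every condition applied in the subplan, over table columns. */
  static List<Expr> allPredicates(Node n) {
    List<Expr> out = new ArrayList<>();
    if (n instanceof Project p) {
      out.addAll(allPredicates(p.input()));
    } else if (n instanceof Filter f) {
      out.addAll(allPredicates(f.input()));
      out.add(substitute(f.condition(), origins(f.input())));
    } else if (n instanceof Join j) {
      out.addAll(allPredicates(j.left()));
      out.addAll(allPredicates(j.right()));
      out.add(substitute(j.condition(), origins(j)));
    }
    return out;                         // a Scan contributes no predicates
  }
}
```

For a plan Project($0 + $1) over Filter($1 > 100) over Join(A, B, $0 = $1), origins would describe the projected column as (A.a + B.b), and allPredicates would collect (A.a = B.b) and (B.b > 100), even though neither predicate mentions an output column of the Project.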

Once this information is available, we can rely on it to implement logic similar to [1] to check whether a given (sub)plan is equivalent to, or contained within, a given view.
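
As a rough illustration of the kind of check this enables, the sketch below performs a purely syntactic containment test. It is far simpler than the algorithm in [1]: predicates are assumed to be conjuncts already normalized to strings over table columns, both plans must read the same tables, and it does not verify that the leftover predicates can be evaluated from the view's output columns. ContainmentSketch and residual are hypothetical names, not Calcite API.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of a syntactic containment test; not Calcite code. */
final class ContainmentSketch {
  /**
   * Returns the compensating predicates to apply on top of the view, or
   * Optional.empty() when the view cannot answer the query under this test.
   */
  static Optional<Set<String>> residual(Set<String> queryTables, Set<String> queryPreds,
                                        Set<String> viewTables, Set<String> viewPreds) {
    // Simplifying assumption: query and view read exactly the same tables.
    if (!queryTables.equals(viewTables)) {
      return Optional.empty();
    }
    // If the view filters on something the query does not, it may be missing rows.
    if (!queryPreds.containsAll(viewPreds)) {
      return Optional.empty();
    }
    // Whatever the query asks for beyond the view becomes an extra Filter.
    Set<String> rest = new LinkedHashSet<>(queryPreds);
    rest.removeAll(viewPreds);
    return Optional.of(rest);
  }
}
```

An empty residual set means the two plans are equivalent under this test; a non-empty one lists the conditions an additional Filter on the view would have to apply.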

One question I have is how to represent table columns as a RexNode, since I think a RexNode is the easiest thing for the new metadata providers to return. I checked _RexPatternFieldRef_ and I think it will meet our requirements: alpha would be the qualified table name, while the index would be the column index within the table. Thoughts?

I have started working on this and will provide a patch shortly; feedback is greatly appreciated.

[1] ftp://ftp10.us.freebsd.org/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf


