Posted to notifications@asterixdb.apache.org by "Taewoo Kim (JIRA)" <ji...@apache.org> on 2016/01/03 03:48:39 UTC

[jira] [Created] (ASTERIXDB-1246) Unnecessary decor variables of a group-by are not removed until PushProjectDownRule is fired.

Taewoo Kim created ASTERIXDB-1246:
-------------------------------------

             Summary: Unnecessary decor variables of a group-by are not removed until PushProjectDownRule is fired.
                 Key: ASTERIXDB-1246
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1246
             Project: Apache AsterixDB
          Issue Type: Bug
            Reporter: Taewoo Kim
            Assignee: Taewoo Kim


Unnecessary decor variables of a group-by are not removed until PushProjectDownRule is fired.

Currently, a group-by for a subplan is introduced when IntroduceGroupByForSubplanRule is fired. At that time, decor variables for the new group-by operator are also added, based on which variables are used after the new group-by operator.

After this rule, other optimizations might make some decor variables unnecessary. For example, an assign after the group-by can be moved before the group-by operator, so that a record variable (e.g., $$0) required by that assign no longer needs to be passed through the group-by operator. These unnecessary decor variables are removed only when PushProjectDownRule is fired.

As the rule name suggests, PushProjectDownRule is fired only when the plan contains a project operator. Currently in my branch (index-only plan branch), this affects IntroduceSelectAccessMethodRule, which transforms a plan into an index-utilizing plan. This rule checks whether the given plan is an index-only plan by checking the variables used after a SELECT operator. If only the secondary key and/or primary key are used, then the given plan is an index-only plan and we can use a secondary-index search to return the SK and PK.
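The check described above can be sketched as a simple set comparison. This is only an illustration, not the actual IntroduceSelectAccessMethodRule code: the function name is_index_only and the variable names ($$countA, $$tweetid, $$0) are hypothetical.

```python
def is_index_only(vars_used_after_select, secondary_key_vars, primary_key_vars):
    """Return True when every variable used after the SELECT is an SK or PK variable.

    Hypothetical helper for illustration; it does not mirror the real optimizer API.
    """
    allowed = set(secondary_key_vars) | set(primary_key_vars)
    return set(vars_used_after_select) <= allowed


# Hypothetical variable names for the secondary key and primary key.
sk_vars = {"$$countA"}
pk_vars = {"$$tweetid"}

# Only SK and PK are used after the SELECT: the plan qualifies.
clean_plan = is_index_only({"$$countA", "$$tweetid"}, sk_vars, pk_vars)

# A record variable such as $$0 still listed as used: the plan is rejected.
stale_plan = is_index_only({"$$countA", "$$tweetid", "$$0"}, sk_vars, pk_vars)
```

Under these assumptions, a single leftover variable that is neither SK nor PK is enough to make the check fail, even if nothing actually consumes it.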

The issue is that IntroduceSelectAccessMethodRule is fired before PushProjectDownRule, and generally no project operator has been introduced in the plan before IntroduceSelectAccessMethodRule. So these unnecessary decor variables are not used; however, they still sit in the plan, so the optimizer wrongly decides that the given plan is a non-index-only plan. The following is an example query. If we have a secondary index on countA (PK: tweetid), then the outer branch should qualify as an index-only plan. In fact, it does not, because of unnecessary decor variables that remain after some optimizations.

for $t1 in dataset('TweetMessages')
where $t1.countA > 0
return {
"tweetid1": $t1.tweetid,
"count1":$t1.countA,
"t2info": for $t2 in dataset('TweetMessages')
                        where $t1.countA /* +indexnl */= $t2.tweetid
                        return {"tweetid2": $t2.tweetid,
                                "count2": $t2.countB}
}

We can separate PushProjectDownRule into two rules: one that pushes projects down and one that cleans up decor variables.
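A rough sketch of what a standalone decor-cleanup step might do, independent of any project operator. The GroupBy class and clean_decor_vars function are hypothetical and greatly simplified; they are not the Algebricks operator or rule interfaces.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class GroupBy:
    """Simplified stand-in for a group-by operator (hypothetical, for illustration)."""
    group_vars: List[str]
    decor_vars: List[str]


def clean_decor_vars(group_by: GroupBy, vars_used_above: Set[str]) -> List[str]:
    """Drop decor variables that no operator above the group-by uses.

    Returns the removed variables; group_by.decor_vars is updated in place.
    """
    removed = [v for v in group_by.decor_vars if v not in vars_used_above]
    group_by.decor_vars = [v for v in group_by.decor_vars if v in vars_used_above]
    return removed


# $$0 was needed by an assign that has since moved below the group-by,
# so nothing above the group-by uses it any more.
gb = GroupBy(group_vars=["$$tweetid"], decor_vars=["$$0", "$$countA"])
removed = clean_decor_vars(gb, vars_used_above={"$$tweetid", "$$countA"})
```

If a step like this ran before IntroduceSelectAccessMethodRule, the variable-usage check would no longer see the stale decor variables.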



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)