Posted to notifications@asterixdb.apache.org by "Till Westmann (JIRA)" <ji...@apache.org> on 2016/01/03 07:47:47 UTC
[jira] [Commented] (ASTERIXDB-1246) Unnecessary decor variables of a group-by are not removed until PushProjectDownRule is fired.

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076773#comment-15076773 ] 

Till Westmann commented on ASTERIXDB-1246:
------------------------------------------

[~wangsaeu] I assume that you see this behavior after [~buyingyi]'s commit e3e13735b760491482ac7dd680dec58c5f635c16 on master.
Is that correct?

> Unnecessary decor variables of a group-by are not removed until PushProjectDownRule is fired.
> ---------------------------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1246
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1246
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>
> Unnecessary decor variables of a group-by are not removed until PushProjectDownRule is fired.
> Currently, group-by for a subplan is introduced when IntroduceGroupByForSubplanRule is fired. At this time, decor variables for the new group-by operator are also added based on the variable usage after the new group-by operator.
> After this rule, other optimizations might make decor variables unnecessary. One example is that an assign after group-by can be moved before the group-by operator so that a record variable (e.g., $$0) that is required for the given assign does not need to be passed through the group-by operator. These unnecessary decor variables will be removed only when PushProjectDownRule is fired. 
> As the rule name suggests, PushProjectDownRule fires only when the plan contains a project operator. Currently in my branch (index-only plan branch), this affects IntroduceSelectAccessMethodRule, which transforms a plan into an index-utilization plan. This rule checks whether the given plan is an index-only plan by examining the variables used after a SELECT operator. If only the secondary key and/or primary key are used, then the given plan is an index-only plan and we can use a secondary-index search to return SK and PK.
> The issue is that IntroduceSelectAccessMethodRule is fired before PushProjectDownRule, and generally no project operator is introduced into the plan before IntroduceSelectAccessMethodRule. So these unnecessary decor variables are not used; however, they still sit in the plan, so the optimizer wrongly classifies the given plan as a non-index-only plan. The following is an example query. If we have a secondary index on count1 (PK:tweetid), then this should qualify as an index-only plan for the outer branch. In fact, it does not, because of unnecessary decor variables that remain in the plan after some optimizations.
> for $t1 in dataset('TweetMessages')
> where $t1.countA > 0
> return {
> "tweetid1": $t1.tweetid,
> "count1":$t1.countA,
> "t2info": for $t2 in dataset('TweetMessages')
>                         where $t1.countA /* +indexnl */= $t2.tweetid
>                         return {"tweetid2": $t2.tweetid,
>                                 "count2": $t2.countB}
> }
> We can separate PushProjectDownRule into two rules: push project down and clean decor variables.
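
The interaction described in the issue can be illustrated with a small toy model in Java. This is only a sketch under stated assumptions: it models variables as plain strings and does not use any AsterixDB or Algebricks classes. The class DecorCleanupSketch, the methods pruneDecor and isIndexOnly, and the variable names $$sk and $$pk are all hypothetical and exist only for illustration. It shows that a stale decor variable such as $$0 makes the index-only check fail, and that pruning decor variables against the set of variables still used after the group-by lets the check succeed.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: not the AsterixDB implementation. All names are hypothetical.
public class DecorCleanupSketch {

    // Keep only the decor variables that some operator after the group-by still uses.
    static Set<String> pruneDecor(List<String> decorVars, Set<String> usedAfterGroupBy) {
        Set<String> kept = new LinkedHashSet<>(decorVars);
        kept.retainAll(usedAfterGroupBy);
        return kept;
    }

    // A plan is treated as index-only if every variable still carried
    // after the SELECT is a secondary-key or primary-key variable.
    static boolean isIndexOnly(Set<String> carriedVars, Set<String> keyVars) {
        return keyVars.containsAll(carriedVars);
    }

    public static void main(String[] args) {
        // Assumed names: $$0 is the record variable, $$sk and $$pk are the key variables.
        List<String> decorVars = List.of("$$0", "$$sk", "$$pk");
        Set<String> keyVars = Set.of("$$sk", "$$pk");

        // After the assign has moved below the group-by, only the keys are still used.
        Set<String> usedAfterGroupBy = Set.of("$$sk", "$$pk");

        // Without cleanup, the stale $$0 decor variable blocks the index-only decision.
        System.out.println(isIndexOnly(new LinkedHashSet<>(decorVars), keyVars));

        // With cleanup, only key variables remain and the check passes.
        Set<String> cleaned = pruneDecor(decorVars, usedAfterGroupBy);
        System.out.println(isIndexOnly(cleaned, keyVars));
    }
}
```

In this model the cleanup step does not depend on a project operator being present, which is the point of splitting it out of PushProjectDownRule so that it can run before IntroduceSelectAccessMethodRule.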



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)