Posted to dev@asterixdb.apache.org by Wail Alkowaileet <wa...@gmail.com> on 2017/06/25 00:50:55 UTC

[COMP] Few questions about Query Optimizer

Hi Devs,

I have a few questions about the query optimizer.

*- Given the query:*
use dataverse TwitterDataverse

for $x in dataset Tweets
where $x.name = "Alice"
let $geo := $x.geo
group by $name:=$x.name with $geo
return {"name": $name, "geo":$geo[0].coordinates.coordinates}

*- Logical Plan:*
distribute result [$$10] -- |UNPARTITIONED|
  project ([$$10]) -- |UNPARTITIONED|
    assign [$$10] <- [{"name": $$name, "geo": get-item($$9, 0).getField("coordinates").getField("coordinates")}] -- |UNPARTITIONED|
      group by ([$$name := $$x.getField("name")]) decor ([]) {
                aggregate [$$9] <- [listify($$geo)] -- |UNPARTITIONED|
                  nested tuple source -- |UNPARTITIONED|
             } -- |UNPARTITIONED|
        assign [$$geo] <- [$$x.getField("geo")] -- |UNPARTITIONED|
          select (eq($$x.getField("name"), "Alice")) -- |UNPARTITIONED|
            unnest $$x <- dataset("Tweets") -- |UNPARTITIONED|
              empty-tuple-source -- |UNPARTITIONED|

*- Optimized Logical Plan:*
distribute result [$$10]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    project ([$$10])
    -- STREAM_PROJECT  |PARTITIONED|
      assign [$$10] <- [{"name": $$name, "geo": $$19.getField("coordinates")}]
      -- ASSIGN  |PARTITIONED|
        project ([$$name, $$19])
        -- STREAM_PROJECT  |PARTITIONED|
          assign [$$19, $$22] <- [get-item($$9, 0).getField("coordinates"), get-item($$9, 0)]
          -- ASSIGN  |PARTITIONED|
            exchange
            -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
              group by ([$$name := $$15]) decor ([]) {
                        aggregate [$$9] <- [listify($$geo)]
                        -- AGGREGATE  |LOCAL|
                          nested tuple source
                          -- NESTED_TUPLE_SOURCE  |LOCAL|
                     }
              -- PRE_CLUSTERED_GROUP_BY[$$15]  |PARTITIONED|
                exchange
                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                  order (ASC, $$15)
                  -- STABLE_SORT [$$15(ASC)]  |PARTITIONED|
                    exchange
                    -- HASH_PARTITION_EXCHANGE [$$15]  |PARTITIONED|
                      select (eq($$15, "Alice"))
                      -- STREAM_SELECT  |PARTITIONED|
                        project ([$$geo, $$15])
                        -- STREAM_PROJECT  |PARTITIONED|
                          assign [$$geo, $$15] <- [$$x.getField("geo"), $$x.getField("name")]
                          -- ASSIGN  |PARTITIONED|
                            project ([$$x])
                            -- STREAM_PROJECT  |PARTITIONED|
                              exchange
                              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                data-scan []<-[$$16, $$x] <- TwitterDataverse.Tweets
                                -- DATASOURCE_SCAN  |PARTITIONED|
                                  exchange
                                  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                    empty-tuple-source
                                    -- EMPTY_TUPLE_SOURCE  |PARTITIONED|

*- Questions:*
$$22:

   - Why is the variable $$22 produced even though nothing uses it? Is it
   just a harmless bug, or is there some intuition I might be missing?

$$19:

   - It seems that getField function calls are (sometimes) split across
   separate assign operators. Is there a reason for that? (There's another
   example that reproduces the same behavior.)
   - That leads to my next question: I see no rule for "FieldAccessNested",
   which could be exploited here to save a few function calls. Can this
   function interfere with other functions/access methods?
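
To make the last question concrete, here is a minimal Python sketch of the kind of rewrite being asked about: folding a chain of getField calls into one nested access. The tuple-based expression trees and the "getFieldNested" node name are assumptions made for illustration only; they stand in for the "FieldAccessNested" function mentioned above and are not AsterixDB classes or APIs.

```python
# Toy expression trees built from tuples. These shapes are illustrative
# assumptions, not AsterixDB's expression classes.
def var(name):
    return ("var", name)


def get_item(expr, index):
    return ("get-item", expr, index)


def get_field(expr, field):
    return ("getField", expr, field)


def collapse_field_chain(expr):
    """Fold getField(getField(e, a), b) into one nested access on e."""
    path = []
    # Walk down the chain, collecting field names from outermost to innermost.
    while expr[0] == "getField":
        path.append(expr[2])
        expr = expr[1]
    if len(path) < 2:
        # Nothing to fuse: rebuild the (at most one) getField unchanged.
        for field in reversed(path):
            expr = get_field(expr, field)
        return expr
    # "getFieldNested" is a hypothetical node name for the fused access.
    return ("getFieldNested", expr, list(reversed(path)))


# The expression from the logical plan:
# get-item($$9, 0).getField("coordinates").getField("coordinates")
original = get_field(
    get_field(get_item(var("$$9"), 0), "coordinates"), "coordinates"
)
fused = collapse_field_chain(original)
```

Note that such a rewrite only applies when the whole chain sits in one expression; in the optimized plan above the chain is split across two assign operators, so the inner part would have to be inlined first.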


-- 

*Regards,*
Wail Alkowaileet

Re: [COMP] Few questions about Query Optimizer

Posted by Ildar Absalyamov <il...@gmail.com>.
If I remember correctly, we eliminated the FieldAccessNested function in favor of chained FieldAccessByName/FieldAccessByIndex calls. @Steven, correct me if I am wrong.

> On Jun 24, 2017, at 18:00, Yingyi Bu <bu...@gmail.com> wrote:
> 
> Hi Wail,
> 
>    $22 should be a harmless bug -- it's related to the ordering of rules.
>    For $19:  we could potentially have a rule for that.
> 
> Best,
> Yingyi
> 

Best regards,
Ildar


Re: [COMP] Few questions about Query Optimizer

Posted by Yingyi Bu <bu...@gmail.com>.
Hi Wail,

    $$22 should be a harmless bug -- it's related to the ordering of rules.
    For $$19: we could potentially have a rule for that.
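
As an illustration of how rule ordering alone can leave an unused variable behind, here is a minimal Python sketch. It is an assumed mechanism, not the actual AsterixDB rule set: the two passes (`remove_unused`, `inline_vars`) and the string-based plan representation are hypothetical. Running the cleanup before inlining leaves $$22 in place, while the opposite order removes it.

```python
import re

# Matches variable references such as $$9 or $$22 in an expression string.
VAR = re.compile(r"\$\$\w+")


def remove_unused(assigns, live):
    """Drop assignments whose variable is never read (hypothetical pass)."""
    used = set(live)
    for _, expr in assigns:
        used.update(VAR.findall(expr))
    return [(v, e) for v, e in assigns if v in used]


def inline_vars(assigns):
    """Substitute earlier assigned expressions into later ones (hypothetical pass)."""
    env, out = {}, []
    for v, e in assigns:
        e = VAR.sub(lambda m: env.get(m.group(0), m.group(0)), e)
        env[v] = e
        out.append((v, e))
    return out


# Assumed intermediate state: $$19 still reads $$22.
plan = [
    ("$$22", "get-item($$9, 0)"),
    ("$$19", '$$22.getField("coordinates")'),
]
live = {"$$name", "$$19"}  # variables projected upward in the optimized plan

# Cleanup first: $$22 is still referenced, so it survives, and inlining
# afterwards makes it dead without anything left to remove it.
cleanup_first = inline_vars(remove_unused(plan, live))

# Inlining first: $$22 is already dead when the cleanup pass runs.
inline_first = remove_unused(inline_vars(plan), live)
```

In this toy model, `cleanup_first` ends up with both assignments, matching the shape of the `assign [$$19, $$22]` operator in the optimized plan, while `inline_first` keeps only $$19.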

Best,
Yingyi
