Posted to dev@asterixdb.apache.org by Wail Alkowaileet <wa...@gmail.com> on 2017/06/25 00:50:55 UTC
[COMP] Few questions about Query Optimizer
Hi Devs,
I have a few questions about the query optimizer.
*- Given the query:*
use dataverse TwitterDataverse
for $x in dataset Tweets
where $x.name = "trump"
let $geo := $x.geo
group by $name:=$x.name with $geo
return {"name": $name, "geo":$geo[0].coordinates.coordinates}
*- Logical Plan:*
distribute result [$$10] -- |UNPARTITIONED|
  project ([$$10]) -- |UNPARTITIONED|
    assign [$$10] <- [{"name": $$name, "geo": get-item($$9, 0).getField("coordinates").getField("coordinates")}] -- |UNPARTITIONED|
      group by ([$$name := $$x.getField("name")]) decor ([]) {
                aggregate [$$9] <- [listify($$geo)] -- |UNPARTITIONED|
                  nested tuple source -- |UNPARTITIONED|
             } -- |UNPARTITIONED|
        assign [$$geo] <- [$$x.getField("geo")] -- |UNPARTITIONED|
          select (eq($$x.getField("name"), "Alice")) -- |UNPARTITIONED|
            unnest $$x <- dataset("Tweets") -- |UNPARTITIONED|
              empty-tuple-source -- |UNPARTITIONED|
*- Optimized Logical Plan:*
distribute result [$$10]
-- DISTRIBUTE_RESULT |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
    project ([$$10])
    -- STREAM_PROJECT |PARTITIONED|
      assign [$$10] <- [{"name": $$name, "geo": $$19.getField("coordinates")}]
      -- ASSIGN |PARTITIONED|
        project ([$$name, $$19])
        -- STREAM_PROJECT |PARTITIONED|
          assign [$$19, $$22] <- [get-item($$9, 0).getField("coordinates"), get-item($$9, 0)]
          -- ASSIGN |PARTITIONED|
            exchange
            -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
              group by ([$$name := $$15]) decor ([]) {
                        aggregate [$$9] <- [listify($$geo)]
                        -- AGGREGATE |LOCAL|
                          nested tuple source
                          -- NESTED_TUPLE_SOURCE |LOCAL|
                     }
              -- PRE_CLUSTERED_GROUP_BY[$$15] |PARTITIONED|
                exchange
                -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                  order (ASC, $$15)
                  -- STABLE_SORT [$$15(ASC)] |PARTITIONED|
                    exchange
                    -- HASH_PARTITION_EXCHANGE [$$15] |PARTITIONED|
                      select (eq($$15, "Alice"))
                      -- STREAM_SELECT |PARTITIONED|
                        project ([$$geo, $$15])
                        -- STREAM_PROJECT |PARTITIONED|
                          assign [$$geo, $$15] <- [$$x.getField("geo"), $$x.getField("name")]
                          -- ASSIGN |PARTITIONED|
                            project ([$$x])
                            -- STREAM_PROJECT |PARTITIONED|
                              exchange
                              -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                                data-scan []<-[$$16, $$x] <- TwitterDataverse.Tweets
                                -- DATASOURCE_SCAN |PARTITIONED|
                                  exchange
                                  -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                                    empty-tuple-source
                                    -- EMPTY_TUPLE_SOURCE |PARTITIONED|
*- Questions:*
$$22:
- Why is the variable $$22 produced even though nothing uses it? Is it
just a harmless bug, or is there some intuition I might be missing?
$$19:
- It seems that getField function calls are (sometimes) split. Is there a
reason for that? (There's another example that reproduces the same
behavior.)
- That leads to my next question: I see no rule for "FieldAccessNested",
which could be exploited here to save a few function calls. Can this
function interfere with other functions/access methods?
--
*Regards,*
Wail Alkowaileet
Re: [COMP] Few questions about Query Optimizer
Posted by Ildar Absalyamov <il...@gmail.com>.
If I remember correctly, we eliminated the FieldAccessNested function in favor of chained FieldAccessByName/ByIndex calls. @Steven, correct me if I am wrong.
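To make the two forms concrete, here is a minimal Python sketch (not AsterixDB code) that models a chain of by-name field accesses and collapses it into a single base expression plus a field path, which is roughly what a nested access would take as input. The classes `Var` and `GetField` and the helpers `collapse_chain`, `eval_chained`, and `eval_nested` are hypothetical names invented for this illustration, and the record contents are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Var:
    """A plan variable such as $$22 (hypothetical stand-in)."""
    name: str


@dataclass(frozen=True)
class GetField:
    """One by-name field access: base.getField(field)."""
    base: "Expr"
    field: str


Expr = Union[Var, GetField]


def collapse_chain(expr: Expr) -> Tuple[Expr, List[str]]:
    """Split a chain of GetField calls into (innermost base, field path)."""
    path: List[str] = []
    while isinstance(expr, GetField):
        path.append(expr.field)
        expr = expr.base
    path.reverse()  # fields were collected outermost-first
    return expr, path


def eval_chained(expr: Expr, env: dict):
    """Evaluate one access at a time, as chained calls would."""
    if isinstance(expr, Var):
        return env[expr.name]
    return eval_chained(expr.base, env)[expr.field]


def eval_nested(base: Expr, path: List[str], env: dict):
    """Evaluate the same access as a single walk over a field path."""
    value = eval_chained(base, env)
    for field in path:
        value = value[field]
    return value


# $$22.getField("coordinates").getField("coordinates"), with made-up data.
chain = GetField(GetField(Var("$$22"), "coordinates"), "coordinates")
base, path = collapse_chain(chain)
env = {"$$22": {"coordinates": {"coordinates": [1.0, 2.0]}}}
```

Both evaluators return the same value for the same record; the only difference in this sketch is whether the path is walked by one call or by one call per field.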
> On Jun 24, 2017, at 18:00, Yingyi Bu <bu...@gmail.com> wrote:
>
> Hi Wail,
>
> $22 should be a harmless bug -- it's related to the ordering of rules.
> For $19: we could potentially have a rule for that.
>
> Best,
> Yingyi
Best regards,
Ildar
Re: [COMP] Few questions about Query Optimizer
Posted by Yingyi Bu <bu...@gmail.com>.
Hi Wail,
$22 should be a harmless bug -- it's related to the ordering of rules.
For $19: we could potentially have a rule for that.
Best,
Yingyi
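As an illustration of the $$22 case, the following Python sketch shows what a pass that prunes unused assigned variables could look like. It is not the AsterixDB rule; `prune_unused` and the list-of-tuples plan model are hypothetical. One way an unused variable can survive is if the rule that introduces it fires after such a pruning pass has already run, but that reading of "ordering of rules" is an assumption, not something confirmed in this thread.

```python
from typing import List, Set, Tuple

# An assign operator modeled as a list of (variable, variables-it-reads).
Assign = List[Tuple[str, Set[str]]]


def prune_unused(plan: List[Assign], live: Set[str]) -> List[Assign]:
    """Walk assigns from the plan root downward, dropping dead variables.

    `plan` is ordered root-first; `live` holds the variables that the
    operators above the first assign need.
    """
    pruned: List[Assign] = []
    needed = set(live)
    for assign in plan:
        # Keep only the variables that something above still needs.
        kept = [(var, reads) for var, reads in assign if var in needed]
        # Whatever the kept expressions read becomes needed further down.
        for _, reads in kept:
            needed |= reads
        pruned.append(kept)
    return pruned


# Mirrors the two assigns in the optimized plan (root first).
plan = [
    [("$$10", {"$$name", "$$19"})],
    [("$$19", {"$$9"}), ("$$22", {"$$9"})],
]
result = prune_unused(plan, live={"$$10"})
```

In this model the second assign keeps only $$19, because nothing above it reads $$22.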