You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@asterixdb.apache.org by "Till (JIRA)" <ji...@apache.org> on 2016/08/17 20:45:20 UTC
[jira] [Updated] (ASTERIXDB-1266) Sqlpp query degrades the
performance by picking NLJ instead of HHJ
[ https://issues.apache.org/jira/browse/ASTERIXDB-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till updated ASTERIXDB-1266:
----------------------------
Labels: soon (was: )
> Sqlpp query degrades the performance by picking NLJ instead of HHJ
> ------------------------------------------------------------------
>
> Key: ASTERIXDB-1266
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1266
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Pouria
> Assignee: Yingyi Bu
> Labels: soon
>
> Translating TPCH Q17 from AQL to Sqlpp causes the optimizer to pick NLJ (instead of HHJ) which drops the performance of the query.
> Below you can find the AQL and Sqlpp versions of the query along with their corresponding optimized plans:
> Q17-SQL-PP
> {noformat}
> use tpch_1g;
> declare function tmp() {
> (
> select element {'t_partkey':l_partkey,'t_avg_quantity':(0.2 * avg((
> select element i.l_quantity
> from l as i
> )))}
> from LineItem as l
> group by l.l_partkey as l_partkey
> )
> };
> select element ( sum((
> select element l.l_extendedprice
> from LineItem as l,
> Part as p,
> tmp() as t
> where (((p.p_partkey = l.l_partkey) and (p.p_brand = 'Brand#23') and (p.p_container = 'MED BOX')) and ((l.l_partkey = t.t_partkey) and (l.l_quantity < t.t_avg_quantity)))
> )) / 7.0);
> {noformat}
> Query plan:
> {noformat}
> distribute result [%0->$$54]
> -- DISTRIBUTE_RESULT |UNPARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |UNPARTITIONED|
> project ([$$54])
> -- STREAM_PROJECT |UNPARTITIONED|
> assign [$$54] <- [function-call: asterix:numeric-divide, Args:[%0->$$55, ADouble: {7.0}]]
> -- ASSIGN |UNPARTITIONED|
> aggregate [$$55] <- [function-call: asterix:agg-sum, Args:[%0->$$72]]
> -- AGGREGATE |UNPARTITIONED|
> exchange
> -- RANDOM_MERGE_EXCHANGE |PARTITIONED|
> aggregate [$$72] <- [function-call: asterix:agg-local-sum, Args:[%0->$$51]]
> -- AGGREGATE |PARTITIONED|
> project ([$$51])
> -- STREAM_PROJECT |PARTITIONED|
> select (function-call: algebricks:lt, Args:[%0->$$65, %0->$$66])
> -- STREAM_SELECT |PARTITIONED|
> project ([$$65, $$66, $$51])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> join (function-call: algebricks:and, Args:[function-call: algebricks:eq, Args:[%0->$$57, %0->$$14], function-call: algebricks:eq, Args:[%0->$$60, %0->$$57]])
> -- NESTED_LOOP |PARTITIONED|
> exchange
> -- BROADCAST_EXCHANGE |PARTITIONED|
> project ([$$65, $$51, $$57])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$51, $$65, $$57] <- [function-call: asterix:field-access-by-index, Args:[%0->$$11, AInt32: {5}], function-call: asterix:field-access-by-index, Args:[%0->$$11, AInt32: {4}], function-call: asterix:field-access-by-index, Args:[%0->$$11, AInt32: {1}]]
> -- ASSIGN |PARTITIONED|
> project ([$$11])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$11] <- [%0->$$15]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> replicate
> -- SPLIT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$15])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$61, $$62, $$15] <- tpch_1g:LineItem
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> join (TRUE)
> -- NESTED_LOOP |PARTITIONED|
> exchange
> -- BROADCAST_EXCHANGE |PARTITIONED|
> project ([$$60])
> -- STREAM_PROJECT |PARTITIONED|
> select (function-call: algebricks:and, Args:[function-call: algebricks:eq, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$16, AInt32: {6}], AString: {MED BOX}], function-call: algebricks:eq, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$16, AInt32: {3}], AString: {Brand#23}]])
> -- STREAM_SELECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$60, $$16] <- tpch_1g:Part
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$66, $$14])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$66] <- [function-call: asterix:numeric-multiply, Args:[ADouble: {0.2}, %0->$$63]]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> group by ([$$14 := %0->$$71]) decor ([]) {
> aggregate [$$63] <- [function-call: asterix:agg-global-avg, Args:[%0->$$70]]
> -- AGGREGATE |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> -- PRE_CLUSTERED_GROUP_BY[$$71] |PARTITIONED|
> exchange
> -- HASH_PARTITION_MERGE_EXCHANGE MERGE:[$$71(ASC)] HASH:[$$71] |PARTITIONED|
> group by ([$$71 := %0->$$56]) decor ([]) {
> aggregate [$$70] <- [function-call: asterix:agg-local-avg, Args:[%0->$$29]]
> -- AGGREGATE |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> -- SORT_GROUP_BY[$$56] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$56, $$29])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$29, $$56] <- [function-call: asterix:field-access-by-index, Args:[%0->$$15, AInt32: {4}], function-call: asterix:field-access-by-index, Args:[%0->$$15, AInt32: {1}]]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> replicate
> -- SPLIT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$15])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$61, $$62, $$15] <- tpch_1g:LineItem
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> {noformat}
> Q17-AQL
> {noformat}
> declare function tmp(){
> for $l in dataset('LineItem')
> group by $l_partkey := $l.l_partkey with $l
> return {
> "t_partkey": $l_partkey,
> "t_avg_quantity": 0.2 * avg(for $i in $l return $i.l_quantity)
> }
> }
> sum(
> for $l in dataset('LineItem')
> for $p in dataset('Part')
> where $p.p_partkey = $l.l_partkey
> and $p.p_brand = 'Brand#23'
> and $p.p_container = 'MED BOX'
> for $t in tmp()
> where $l.l_partkey = $t.t_partkey
> and $l.l_quantity < $t.t_avg_quantity
> return $l.l_extendedprice
> )/7.0
> {noformat}
> {noformat}
> distribute result [%0->$$7]
> -- DISTRIBUTE_RESULT |UNPARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |UNPARTITIONED|
> project ([$$7])
> -- STREAM_PROJECT |UNPARTITIONED|
> assign [$$7] <- [function-call: asterix:numeric-divide, Args:[%0->$$41, ADouble: {7.0}]]
> -- ASSIGN |UNPARTITIONED|
> aggregate [$$41] <- [function-call: asterix:agg-sum, Args:[%0->$$58]]
> -- AGGREGATE |UNPARTITIONED|
> exchange
> -- RANDOM_MERGE_EXCHANGE |PARTITIONED|
> aggregate [$$58] <- [function-call: asterix:agg-local-sum, Args:[%0->$$38]]
> -- AGGREGATE |PARTITIONED|
> project ([$$38])
> -- STREAM_PROJECT |PARTITIONED|
> select (function-call: algebricks:lt, Args:[%0->$$50, %0->$$51])
> -- STREAM_SELECT |PARTITIONED|
> project ([$$50, $$51, $$38])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> join (function-call: algebricks:eq, Args:[%0->$$43, %0->$$9])
> -- HYBRID_HASH_JOIN [$$43][$$9] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$50, $$38, $$43])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> join (function-call: algebricks:eq, Args:[%0->$$46, %0->$$43])
> -- HYBRID_HASH_JOIN [$$43][$$46] |PARTITIONED|
> exchange
> -- HASH_PARTITION_EXCHANGE [$$43] |PARTITIONED|
> project ([$$50, $$38, $$43])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$38, $$50, $$43] <- [function-call: asterix:field-access-by-index, Args:[%0->$$4, AInt32: {5}], function-call: asterix:field-access-by-index, Args:[%0->$$4, AInt32: {4}], function-call: asterix:field-access-by-index, Args:[%0->$$4, AInt32: {1}]]
> -- ASSIGN |PARTITIONED|
> project ([$$4])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$4] <- [%0->$$8]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> replicate
> -- SPLIT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$8])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$47, $$48, $$8] <- tpch_1g:LineItem
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$46])
> -- STREAM_PROJECT |PARTITIONED|
> select (function-call: algebricks:and, Args:[function-call: algebricks:eq, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$5, AInt32: {3}], AString: {Brand#23}], function-call: algebricks:eq, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$5, AInt32: {6}], AString: {MED BOX}]])
> -- STREAM_SELECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$46, $$5] <- tpch_1g:Part
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> exchange
> -- HASH_PARTITION_EXCHANGE [$$9] |PARTITIONED|
> project ([$$51, $$9])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$51] <- [function-call: asterix:numeric-multiply, Args:[ADouble: {0.2}, %0->$$53]]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> group by ([$$9 := %0->$$57]) decor ([]) {
> aggregate [$$53] <- [function-call: asterix:agg-global-avg, Args:[%0->$$56]]
> -- AGGREGATE |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> -- PRE_CLUSTERED_GROUP_BY[$$57] |PARTITIONED|
> exchange
> -- HASH_PARTITION_MERGE_EXCHANGE MERGE:[$$57(ASC)] HASH:[$$57] |PARTITIONED|
> group by ([$$57 := %0->$$42]) decor ([]) {
> aggregate [$$56] <- [function-call: asterix:agg-local-avg, Args:[%0->$$27]]
> -- AGGREGATE |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> -- SORT_GROUP_BY[$$42] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$42, $$27])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$27, $$42] <- [function-call: asterix:field-access-by-index, Args:[%0->$$8, AInt32: {4}], function-call: asterix:field-access-by-index, Args:[%0->$$8, AInt32: {1}]]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> replicate
> -- SPLIT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$8])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$47, $$48, $$8] <- tpch_1g:LineItem
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> {noformat}
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)