You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@asterixdb.apache.org by "Chen Luo (JIRA)" <ji...@apache.org> on 2018/03/30 17:30:00 UTC
[jira] [Created] (ASTERIXDB-2350) Improvement of IN subquery
Chen Luo created ASTERIXDB-2350:
-----------------------------------
Summary: Improvement of IN subquery
Key: ASTERIXDB-2350
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2350
Project: Apache AsterixDB
Issue Type: Improvement
Components: COMP - Compiler
Reporter: Chen Luo
By default, IN is translated into a hybrid hash join followed by a sort group by, which is expensive for simply in queries with constant values.
For example
{code}
select *
from lineitem
where l_partkey in [1, 2 ,3];
{code}
is translated into
{code}
distribute result [$$16]
-- DISTRIBUTE_RESULT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$16])
-- STREAM_PROJECT |PARTITIONED|
assign [$$16] <- [{"lineitem": $$lineitem}]
-- ASSIGN |PARTITIONED|
project ([$$lineitem])
-- STREAM_PROJECT |PARTITIONED|
select ($$14)
-- STREAM_SELECT |PARTITIONED|
project ([$$14, $$lineitem])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
group by ([$$20 := $$17]) decor ([$$lineitem]) {
aggregate [$$14] <- [non-empty-stream()]
-- AGGREGATE |LOCAL|
select (not(is-missing($$19)))
-- STREAM_SELECT |LOCAL|
nested tuple source
-- NESTED_TUPLE_SOURCE |LOCAL|
}
-- PRE_CLUSTERED_GROUP_BY[$$17] |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
order (ASC, $$17)
-- STABLE_SORT [$$17(ASC)] |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$17] |PARTITIONED|
project ([$$lineitem, $$19, $$17])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
left outer join (eq($$18, $#1))
-- HYBRID_HASH_JOIN [$$18][$#1] |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$18] |PARTITIONED|
assign [$$18] <- [$$lineitem.getField(1)]
-- ASSIGN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
data-scan []<-[$$17, $$lineitem] <- Default.lineitem
-- DATASOURCE_SCAN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$#1] |PARTITIONED|
assign [$$19] <- [TRUE]
-- ASSIGN |UNPARTITIONED|
unnest $#1 <- scan-collection(array: [ 1, 2, 3 ])
-- UNNEST |UNPARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |UNPARTITIONED|
{code}
While the following query
{code}
select *
from lineitem
where l_partkey = 1 OR l_partkey=2 OR l_partkey =3;
{code}
is translated into
{code}
distribute result [$$18]
-- DISTRIBUTE_RESULT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$18])
-- STREAM_PROJECT |PARTITIONED|
assign [$$18] <- [{"lineitem": $$lineitem}]
-- ASSIGN |PARTITIONED|
project ([$$lineitem])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
join (eq($$21, $$19))
-- HYBRID_HASH_JOIN [$$19][$$21] |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
assign [$$19] <- [$$lineitem.getField(1)]
-- ASSIGN |PARTITIONED|
project ([$$lineitem])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
data-scan []<-[$$20, $$lineitem] <- Default.lineitem
-- DATASOURCE_SCAN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
exchange
-- BROADCAST_EXCHANGE |PARTITIONED|
unnest $$21 <- scan-collection(array: [ 1, 2, 3 ])
-- UNNEST |UNPARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |UNPARTITIONED|
{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)