You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@asterixdb.apache.org by "Chen Luo (JIRA)" <ji...@apache.org> on 2018/03/30 17:30:00 UTC
[jira] [Updated] (ASTERIXDB-2350) Improve IN subquery
[ https://issues.apache.org/jira/browse/ASTERIXDB-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chen Luo updated ASTERIXDB-2350:
--------------------------------
Summary: Improve IN subquery (was: Improvement of IN subquery)
> Improve IN subquery
> -------------------
>
> Key: ASTERIXDB-2350
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2350
> Project: Apache AsterixDB
> Issue Type: Improvement
> Components: COMP - Compiler
> Reporter: Chen Luo
> Priority: Minor
>
> By default, IN is translated into a hybrid hash join followed by a sort group by, which is expensive for simply in queries with constant values.
> For example
> {code}
> select *
> from lineitem
> where l_partkey in [1, 2 ,3];
> {code}
> is translated into
> {code}
> distribute result [$$16]
> -- DISTRIBUTE_RESULT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$16])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$16] <- [{"lineitem": $$lineitem}]
> -- ASSIGN |PARTITIONED|
> project ([$$lineitem])
> -- STREAM_PROJECT |PARTITIONED|
> select ($$14)
> -- STREAM_SELECT |PARTITIONED|
> project ([$$14, $$lineitem])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> group by ([$$20 := $$17]) decor ([$$lineitem]) {
> aggregate [$$14] <- [non-empty-stream()]
> -- AGGREGATE |LOCAL|
> select (not(is-missing($$19)))
> -- STREAM_SELECT |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> -- PRE_CLUSTERED_GROUP_BY[$$17] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> order (ASC, $$17)
> -- STABLE_SORT [$$17(ASC)] |PARTITIONED|
> exchange
> -- HASH_PARTITION_EXCHANGE [$$17] |PARTITIONED|
> project ([$$lineitem, $$19, $$17])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> left outer join (eq($$18, $#1))
> -- HYBRID_HASH_JOIN [$$18][$#1] |PARTITIONED|
> exchange
> -- HASH_PARTITION_EXCHANGE [$$18] |PARTITIONED|
> assign [$$18] <- [$$lineitem.getField(1)]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$17, $$lineitem] <- Default.lineitem
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> exchange
> -- HASH_PARTITION_EXCHANGE [$#1] |PARTITIONED|
> assign [$$19] <- [TRUE]
> -- ASSIGN |UNPARTITIONED|
> unnest $#1 <- scan-collection(array: [ 1, 2, 3 ])
> -- UNNEST |UNPARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |UNPARTITIONED|
> {code}
> While the following query
> {code}
> select *
> from lineitem
> where l_partkey = 1 OR l_partkey=2 OR l_partkey =3;
> {code}
> is translated into
> {code}
> distribute result [$$18]
> -- DISTRIBUTE_RESULT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$18])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$18] <- [{"lineitem": $$lineitem}]
> -- ASSIGN |PARTITIONED|
> project ([$$lineitem])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> join (eq($$21, $$19))
> -- HYBRID_HASH_JOIN [$$19][$$21] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> assign [$$19] <- [$$lineitem.getField(1)]
> -- ASSIGN |PARTITIONED|
> project ([$$lineitem])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$20, $$lineitem] <- Default.lineitem
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> exchange
> -- BROADCAST_EXCHANGE |PARTITIONED|
> unnest $$21 <- scan-collection(array: [ 1, 2, 3 ])
> -- UNNEST |UNPARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |UNPARTITIONED|
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)