Posted to notifications@asterixdb.apache.org by "Ildar Absalyamov (JIRA)" <ji...@apache.org> on 2018/01/19 19:24:00 UTC

[jira] [Commented] (ASTERIXDB-2253) Disjunctive predicts on the same fields introduces join

    [ https://issues.apache.org/jira/browse/ASTERIXDB-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332785#comment-16332785 ] 

Ildar Absalyamov commented on ASTERIXDB-2253:
---------------------------------------------

[~wyk], this plan is the result of the rule DisjunctivePredicateToJoinRule firing.
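The following minimal Python sketch makes the rewrite concrete. It is a simplified model for illustration, not AsterixDB's implementation, and it assumes records are plain dicts with hashable field values. The hypothetical stream_select stands for a single select that evaluates both disjuncts. The hypothetical hash_join_on_constants mimics the rewritten plan, where the constants [20, 10] are unnested into a tiny relation and equi-joined with the dataset on retweet_count. This sketch chooses to hash the constant side; it does not claim which side the real operator builds on.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def stream_select(records: Iterable[Record], field: str,
                  constants: List[Any]) -> List[Record]:
    """Plan shape without the rewrite: one select that evaluates
    `field = c1 or field = c2 or ...` against every record."""
    return [r for r in records if any(r.get(field) == c for c in constants)]


def hash_join_on_constants(records: Iterable[Record], field: str,
                           constants: List[Any]) -> List[Record]:
    """Simplified plan shape after the rewrite: the constants become a tiny
    relation, the sketch hashes it, and each record probes it by `field`."""
    # Build side: the unnested constants. A set drops duplicate constants, so
    # a record matches at most once (same result as the select).
    build = set(constants)
    # Probe side: the scanned records; emit (record, matching constant) pairs.
    pairs: List[Tuple[Record, Any]] = []
    for r in records:
        key = r.get(field)
        if key in build:
            pairs.append((r, key))
    # Project away the constant, keeping only the record.
    return [r for r, _ in pairs]


tweets = [
    {"text": "a", "retweet_count": 10},
    {"text": "b", "retweet_count": 15},
    {"text": "c", "retweet_count": 20},
    {"text": "d"},  # field missing: matches neither plan
]
via_select = [r["text"] for r in stream_select(tweets, "retweet_count", [10, 20])]
via_join = [r["text"] for r in hash_join_on_constants(tweets, "retweet_count", [20, 10])]
```

Both calls return the texts of the tweets whose retweet_count is 10 or 20. The rewrite changes how the result is computed, not the result itself.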

But I agree that in your example, keeping just the two selects would be the better alternative. I guess it comes down to the broader issue of nested-loop vs. hash join, whose relative cost depends on the outer/inner cardinalities.
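A toy cost model in Python can illustrate that trade-off. It treats a select with k disjuncts like a nested-loop join against k constants, with k comparisons per tuple. The hash join pays a fixed setup cost plus a per-tuple hashing cost. All constants and function names below are invented for illustration. They are not AsterixDB's cost model.

```python
# Toy cost model: every constant here is made up for illustration.
CMP_COST = 1.0        # cost of one equality comparison
HASH_COST = 3.0       # cost of hashing and probing one tuple
JOIN_SETUP = 1_000.0  # fixed overhead of a hash-join operator (memory, broadcast, ...)


def nested_loop_cost(outer: int, inner: int) -> float:
    """Each outer tuple is compared against every inner tuple; a select with
    `inner` disjuncts behaves like this."""
    return outer * inner * CMP_COST


def hash_join_cost(outer: int, inner: int) -> float:
    """Hash the inner side once, then hash and probe with each outer tuple."""
    return JOIN_SETUP + (inner + outer) * HASH_COST


def cheaper_plan(outer: int, inner: int) -> str:
    """Pick the plan with the lower toy cost (ties go to the nested loop)."""
    if nested_loop_cost(outer, inner) <= hash_join_cost(outer, inner):
        return "nested_loop"
    return "hash_join"


for k in (2, 3, 4, 10):
    print(k, cheaper_plan(1_000_000, k))
```

With these made-up constants and one million outer tuples, the nested-loop shape wins for 2 or 3 disjuncts, and the hash join wins from 4 upward. Changing any constant moves the break-even point, so it has to be measured rather than guessed.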

An intermediate solution could be to determine experimentally a threshold on the number of disjuncts and fire this rule only above it.
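The following Python sketch shows roughly how such a threshold check could look. EqPredicate, choose_plan, and the threshold value 3 are all hypothetical. The real rule operates on Algebricks expressions, and the actual threshold would come from experiments. The sketch also assumes the constants are hashable so that duplicates can be counted once.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class EqPredicate:
    """Hypothetical representation of one disjunct `<field> = <constant>`."""
    field: str
    constant: Any


# Hypothetical value; the real one would be determined by experiment.
DISJUNCT_THRESHOLD = 3


def choose_plan(disjuncts: List[EqPredicate],
                threshold: int = DISJUNCT_THRESHOLD) -> str:
    """Return "join" if the disjunction-to-join rewrite should fire, else "select"."""
    fields = {d.field for d in disjuncts}
    if len(fields) != 1:
        # The rewrite only targets equality disjuncts on one and the same field.
        return "select"
    # Count distinct constants, so `x = 10 or x = 10` counts once.
    distinct = len({d.constant for d in disjuncts})
    # Fire only once the number of disjuncts exceeds the threshold.
    return "join" if distinct > threshold else "select"


# The query from this issue has two disjuncts, so it would keep the select.
plan = choose_plan([EqPredicate("retweet_count", 10),
                    EqPredicate("retweet_count", 20)])
```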

> Disjunctive predicts on the same fields introduces join
> -------------------------------------------------------
>
>                 Key: ASTERIXDB-2253
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2253
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: COMP - Compiler
>            Reporter: Wail Alkowaileet
>            Priority: Major
>
> I'm not sure if I'm missing something... This plan looks more expensive than a StreamSelect.
> Query:
> {noformat}
> SELECT value t.text
> FROM Tweets as t
> WHERE t.retweet_count = 10 or t.retweet_count = 20
> {noformat}
> Plan:
> {noformat}
> distribute result [$$16]
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>     project ([$$16])
>     -- STREAM_PROJECT  |PARTITIONED|
>       exchange
>       -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>         join (eq($$19, $$17))
>         -- HYBRID_HASH_JOIN [$$17][$$19]  |PARTITIONED|
>           exchange
>           -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>             project ([$$16, $$17])
>             -- STREAM_PROJECT  |PARTITIONED|
>               assign [$$16, $$17] <- [$$t.getField("text"), $$t.getField("retweet_count")]
>               -- ASSIGN  |PARTITIONED|
>                 project ([$$t])
>                 -- STREAM_PROJECT  |PARTITIONED|
>                   exchange
>                   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                     data-scan []<-[$$18, $$t] <- TwitterDataverse.Tweets
>                     -- DATASOURCE_SCAN  |PARTITIONED|
>                       exchange
>                       -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                         empty-tuple-source
>                         -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
>           exchange
>           -- BROADCAST_EXCHANGE  |PARTITIONED|
>             unnest $$19 <- scan-collection(array: [ 20, 10 ])
>             -- UNNEST  |UNPARTITIONED|
>               empty-tuple-source
>               -- EMPTY_TUPLE_SOURCE  |UNPARTITIONED|
> {noformat}
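The plan above also shows how the join is parallelized. The constant collection is produced once by an UNPARTITIONED source and sent to every partition through a BROADCAST_EXCHANGE, while the scanned Tweets stay in place (ONE_TO_ONE_EXCHANGE). The following minimal Python sketch models that data movement. It assumes a hypothetical "id" primary key and three partitions, and the function names are made up. It is not Hyracks code.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def partition(records: List[Record], key: str, n: int) -> List[List[Record]]:
    """Spread records over n partitions by hashing `key` (a stand-in for the
    dataset's primary-key partitioning)."""
    parts: List[List[Record]] = [[] for _ in range(n)]
    for r in records:
        parts[hash(r[key]) % n].append(r)
    return parts


def broadcast_join(parts: List[List[Record]], field: str,
                   constants: List[Any]) -> List[List[Record]]:
    """Each partition gets its own full copy of the small constant side (the
    broadcast), so it joins locally without repartitioning the large side."""
    results: List[List[Record]] = []
    for part in parts:
        local_build = set(constants)  # this partition's copy of the broadcast input
        results.append([r for r in part if r.get(field) in local_build])
    return results


tweets = [{"id": i, "retweet_count": i * 5} for i in range(10)]
per_partition = broadcast_join(partition(tweets, "id", 3), "retweet_count", [20, 10])
matched_ids = sorted(r["id"] for part in per_partition for r in part)
```

Concatenating the per-partition outputs gives the same rows a single global select would return. Because the broadcast side holds only two values, copying it to every partition is cheap.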



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)