Posted to notifications@asterixdb.apache.org by "Ildar Absalyamov (JIRA)" <ji...@apache.org> on 2018/01/19 19:24:00 UTC
[jira] [Commented] (ASTERIXDB-2253) Disjunctive predicts on the same fields introduces join
[ https://issues.apache.org/jira/browse/ASTERIXDB-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332785#comment-16332785 ]
Ildar Absalyamov commented on ASTERIXDB-2253:
---------------------------------------------
[~wyk], this plan is the result of the rule DisjunctivePredicateToJoinRule firing.
But I agree that in your example, keeping just the two selects would be a better alternative. I guess it comes down to the broader issue of nested-loop vs. hash join, where the better choice depends on the outer/inner cardinalities.
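To make the trade-off concrete, here is a minimal Python sketch (not AsterixDB code) of the two ways to evaluate `t.retweet_count = 10 or t.retweet_count = 20`. The first keeps the select and compares every tuple against every constant, which behaves like a nested loop over the constants. The second mirrors the rewritten plan in the quoted issue below: the constant list `[20, 10]` is unnested, a hash table is built on it, and the scanned tuples probe it. The function names and the dict-based tuple layout are hypothetical, and the sketch assumes the constants are distinct (as in `[20, 10]`), so the equi-join and the select return the same rows.

```python
def select_disjunction(tweets, values):
    """Strategy 1: keep the select and evaluate the OR per tuple.
    Each tuple is compared with every constant: O(len(tweets) * len(values))."""
    return [t["text"] for t in tweets
            if any(t["retweet_count"] == v for v in values)]


def hash_join_disjunction(tweets, values):
    """Strategy 2: unnest the constants, build a hash table on them, probe it.
    Expected O(len(tweets) + len(values)), plus the fixed cost of the build
    (and, in a real engine, of broadcasting the build side)."""
    build = set(values)  # build side: the unnested constant list
    return [t["text"] for t in tweets if t["retweet_count"] in build]


# Hypothetical sample data standing in for the Tweets dataset.
sample = [
    {"text": "a", "retweet_count": 10},
    {"text": "b", "retweet_count": 5},
    {"text": "c", "retweet_count": 20},
]
print(select_disjunction(sample, [10, 20]))     # ['a', 'c']
print(hash_join_disjunction(sample, [20, 10]))  # ['a', 'c']
```

With only two constants the per-tuple comparison cost of the select is tiny, so the extra build and exchange work of the join is unlikely to pay off; the join only wins as the number of constants grows.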
An intermediate solution could be to determine experimentally a threshold (in number of disjuncts) above which this rule fires.
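The threshold idea could look roughly like the following Python sketch. It is not the actual DisjunctivePredicateToJoinRule (which lives in AsterixDB's Java compiler); the expression classes, `rewrite_to_join_constants`, and the placeholder value `DISJUNCT_THRESHOLD = 4` are all hypothetical. The sketch flattens nested ORs, checks that every disjunct is an equality on the same field, and only reports a join rewrite when the number of distinct constants reaches the threshold; otherwise the plain select is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Eq:
    left: Field
    right: Const


@dataclass(frozen=True)
class Or:
    args: tuple


# Hypothetical placeholder; the real value would come from experiments.
DISJUNCT_THRESHOLD = 4


def collect_disjuncts(expr):
    """Flatten nested ORs into a flat list of leaf predicates."""
    if isinstance(expr, Or):
        out = []
        for arg in expr.args:
            out.extend(collect_disjuncts(arg))
        return out
    return [expr]


def rewrite_to_join_constants(expr, threshold=DISJUNCT_THRESHOLD):
    """Return (field_name, constants) if the OR should become a join against
    a constant list, or None to keep the plain select."""
    disjuncts = collect_disjuncts(expr)
    if not all(isinstance(d, Eq) for d in disjuncts):
        return None  # only equality disjuncts are eligible
    fields = {d.left.name for d in disjuncts}
    if len(fields) != 1:
        return None  # all disjuncts must test the same field
    # Distinct constants, in first-seen order, so the join emits no duplicates.
    constants = list(dict.fromkeys(d.right.value for d in disjuncts))
    if len(constants) < threshold:
        return None  # too few disjuncts: the select is expected to be cheaper
    return fields.pop(), constants


pred = Or((Eq(Field("retweet_count"), Const(10)),
           Eq(Field("retweet_count"), Const(20))))
print(rewrite_to_join_constants(pred))               # None
print(rewrite_to_join_constants(pred, threshold=2))  # ('retweet_count', [10, 20])
```

With the placeholder threshold, the two-disjunct query from this issue would keep its select, while a predicate listing many constants on the same field would still be turned into the hash join.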
> Disjunctive predicts on the same fields introduces join
> -------------------------------------------------------
>
> Key: ASTERIXDB-2253
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2253
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: COMP - Compiler
> Reporter: Wail Alkowaileet
> Priority: Major
>
> I'm not sure if I'm missing something, but this plan looks more expensive than a StreamSelect.
> Query:
> {noformat}
> SELECT value t.text
> FROM Tweets as t
> WHERE t.retweet_count = 10 or t.retweet_count = 20{noformat}
> Plan:
> {noformat}
> distribute result [$$16]
> -- DISTRIBUTE_RESULT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$16])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> join (eq($$19, $$17))
> -- HYBRID_HASH_JOIN [$$17][$$19] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$16, $$17])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$16, $$17] <- [$$t.getField("text"), $$t.getField("retweet_count")]
> -- ASSIGN |PARTITIONED|
> project ([$$t])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$18, $$t] <- TwitterDataverse.Tweets
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> exchange
> -- BROADCAST_EXCHANGE |PARTITIONED|
> unnest $$19 <- scan-collection(array: [ 20, 10 ])
> -- UNNEST |UNPARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |UNPARTITIONED|
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)