You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@asterixdb.apache.org by "Taewoo Kim (JIRA)" <ji...@apache.org> on 2016/08/25 20:01:20 UTC
[jira] [Closed] (ASTERIXDB-1186) Index Join Hint on an index with
composite keys causes query to return incorrect results
[ https://issues.apache.org/jira/browse/ASTERIXDB-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Taewoo Kim closed ASTERIXDB-1186.
---------------------------------
Resolution: Fixed
> Index Join Hint on an index with composite keys causes query to return incorrect results
> ----------------------------------------------------------------------------------------
>
> Key: ASTERIXDB-1186
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1186
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: AsterixDB, Optimizer
> Environment: AsterixDB 0.8.7-SNAPSHOT
> Reporter: Pouria
> Assignee: Taewoo Kim
> Priority: Critical
> Labels: soon
>
> If one of the attributes in the join predicate is a *composite* PK (consists of more than one attribute), if the Index Join hint is used the query plan includes wrong exchange (partitioning) which results in wrong results (only a subset of the correct total results will be returned).
> The root cause of it is that the records from the PK side (which is composite) are hash-partitioned on a "combination" of hash values of all attributes in the PK. As a result the records of the non-PK side need to be sent to "all" partitions, but the plan contains the wrong exchange and the runtime wont send them to all partitions.
> Below is an example:
> > Here is DDL:
> create type LineItemType as {
> l_orderkey: int64,
> l_partkey: int64,
> l_suppkey: int64,
> l_linenumber: int32,
> l_quantity: int32,
> l_extendedprice: double,
> l_discount: double,
> l_tax: double,
> l_returnflag: string,
> l_linestatus: string,
> l_shipdate: string,
> l_commitdate: string,
> l_receiptdate: string,
> l_shipinstruct: string,
> l_shipmode: string,
> l_comment: string
> }
> create type OrderType as {
> o_orderkey: int64,
> o_custkey: int64,
> o_orderstatus: string,
> o_totalprice: double,
> o_orderdate: string,
> o_orderpriority: string,
> o_clerk: string,
> o_shippriority: int32,
> o_comment: string
> }
> create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber;
> create dataset Orders(OrderType) primary key o_orderkey;
> > The following query returns wrong (a subset) of results:
> for $o in dataset('Orders')
> for $l in dataset('LineItem')
> where
> $l.l_orderkey /*+ indexnl */ = $o.o_orderkey
> return{
> "o_orderkey": $o.o_orderkey,
> "l_orderkey": $l.l_orderkey
> }
> Here is the plan - As you can see the Orders record are One-on-One exchanged (rather than broadcast). The issue is that the l_orderkey is not the partitioning attribute of LineItem, and matching Orders for a specific order_key can be in any partition (depending on their l_linenumber value - which is the 2nd component of PK in LineItem)
> distribute result [%0->$$13]
> -- DISTRIBUTE_RESULT |UNPARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |UNPARTITIONED|
> aggregate [$$13] <- [function-call: asterix:agg-sum, Args:[%0->$$19]]
> -- AGGREGATE |UNPARTITIONED|
> exchange
> -- RANDOM_MERGE_EXCHANGE |PARTITIONED|
> aggregate [$$19] <- [function-call: asterix:agg-count, Args:[%0->$$9]]
> -- AGGREGATE |PARTITIONED|
> project ([$$9])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$9] <- [function-call: asterix:closed-record-constructor, Args:[AString: {o_orderkey}, %0->$$16, AString: {l_orderkey}, %0->$$17]]
> -- ASSIGN |PARTITIONED|
> project ([$$17, $$16])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> unnest-map [$$17, $$18, $$1] <- function-call: asterix:index-search, Args:[AString: {LineItem}, AInt32: {0}, AString: {dummy}, AString: {LineItem}, ABoolean: {true}, ABoolean: {false}, ABoolean: {true}, AInt32: {1}, %0->$$16, AInt32: {1}, %0->$$16, TRUE, TRUE, TRUE]
> -- BTREE_SEARCH |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$16])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$16, $$0] <- dummy:Orders
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)