Posted to notifications@asterixdb.apache.org by "Till Westmann (JIRA)" <ji...@apache.org> on 2015/11/10 23:44:10 UTC
[jira] [Commented] (ASTERIXDB-1168) Should not sort&group after an OrderedList left-join with a dataset
[ https://issues.apache.org/jira/browse/ASTERIXDB-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999496#comment-14999496 ]
Till Westmann commented on ASTERIXDB-1168:
------------------------------------------
It seems that you got what you asked for by putting the nested subquery into the return clause.
To get what you are looking for, I think the query should look something like this:
let $ps := ["b","a", "b","c","c"]
for $p in $ps
for $x in dataset TData where $x.content = $p
return
{ "p":$p, "match": [ $x.id ] }
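The difference between the two query shapes can be sketched outside AsterixDB. The following is a minimal Python model, not AQL and not the engine's implementation: the function names are hypothetical, the data is copied from the issue, and plain list iteration stands in for the query's order. Whether AsterixDB itself preserves that order without an explicit order by is not established in this thread.

```python
# Data copied from the issue: the contents of TData and the key list $ps.
TDATA = [
    {"id": 1, "content": "a"},
    {"id": 2, "content": "b"},
    {"id": 3, "content": "c"},
]
PS = ["b", "a", "b", "c", "c"]


def flat_join(ps, rows):
    """Model of the flattened query: one output record per (p, x) match.

    This behaves like an inner join, so a key with no matching row
    produces no record at all.
    """
    return [
        {"p": p, "match": [x["id"]]}
        for p in ps
        for x in rows
        if x["content"] == p
    ]


def sorted_grouped(ps, rows):
    """Model of the reported behaviour: sort on p, group on p, listify ids.

    Duplicate keys collapse into a single group, which is why "b" and "c"
    each end up with two ids in one record.
    """
    groups = {}
    for p in sorted(ps):
        matches = groups.setdefault(p, [])  # outer join: keep unmatched keys
        matches.extend(x["id"] for x in rows if x["content"] == p)
    return [{"p": p, "match": ids} for p, ids in groups.items()]


if __name__ == "__main__":
    for record in flat_join(PS, TDATA):
        print(record)
    for record in sorted_grouped(PS, TDATA):
        print(record)
```

In this model, flat_join yields five records in key order and sorted_grouped yields the three aggregated records shown in the issue. Note that the flattened form drops keys that match nothing, which the nested form would report with an empty "match" list.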
> Should not sort&group after an OrderedList left-join with a dataset
> -------------------------------------------------------------------
>
> Key: ASTERIXDB-1168
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: Optimizer
> Reporter: Jianfeng Jia
>
> Hi,
> Here is the context for this issue: I wanted to look up some records in the DB through the REST API, and I wanted to do the lookups as a batch. So I packaged the "keys" into an OrderedList and expected a left outer join to give me all matching records in an order consistent with the query order. However, the result was re-sorted and grouped, which confused the client-side response handler.
> Here is a synthetic query that emulates a similar use case:
> ---------------------------------------------------------------------------
> drop dataverse test if exists;
> create dataverse test;
> use dataverse test;
> create type TType as closed {
> id: int64,
> content: string
> }
> create dataset TData (TType) primary key id;
> insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content": "b"}, {"id":3, "content":"c"}])
> // now let's query on
> let $ps := ["b","a", "b","c","c"]
> for $p in $ps
> return { "p":$p,
> "match": for $x in dataset TData where $x.content = $p return $x.id
> }
> ---------------------------------------------------------------------------
> What I expected is the following:
> ---------------------------------------------------------------------------
> [ { "p": "b", "match": [ 2 ] }
> , { "p": "a", "match": [ 1 ] }
> , { "p": "b", "match": [ 2 ] }
> , { "p": "c", "match": [ 3 ] }
> , { "p": "c", "match": [ 3 ] }
> ]
> ---------------------------------------------------------------------------
> The returned result is the following, which is aggregated and re-sorted:
> ---------------------------------------------------------------------------
> [ { "p": "a", "match": [ 1 ] }
> , { "p": "b", "match": [ 2, 2 ] }
> , { "p": "c", "match": [ 3, 3 ] }
> ]
> ---------------------------------------------------------------------------
> The optimized logical plan is the following:
> ---------------------------------------------------------------------------
> distribute result [%0->$$4]
> -- DISTRIBUTE_RESULT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$4])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$4] <- [function-call: asterix:closed-record-constructor, Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
> -- ASSIGN |PARTITIONED|
> project ([$$1, $$9])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
> aggregate [$$9] <- [function-call: asterix:listify, Args:[%0->$$10]]
> -- AGGREGATE |LOCAL|
> select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
> -- STREAM_SELECT |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> -- PRE_CLUSTERED_GROUP_BY[$$12, $$13] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> order (ASC, %0->$$12) (ASC, %0->$$13)
> -- STABLE_SORT [$$12(ASC), $$13(ASC)] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$10, $$11, $$12, $$13])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> left outer join (function-call: algebricks:eq, Args:[%0->$$14, %0->$$13])
> -- HYBRID_HASH_JOIN [$$13][$$14] |PARTITIONED|
> exchange
> -- HASH_PARTITION_EXCHANGE [$$13] |PARTITIONED|
> unnest $$13 <- function-call: asterix:scan-collection, Args:[%0->$$12]
> -- UNNEST |UNPARTITIONED|
> assign [$$12] <- [AOrderedList: [ AString: {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
> -- ASSIGN |UNPARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |UNPARTITIONED|
> exchange
> -- HASH_PARTITION_EXCHANGE [$$14] |PARTITIONED|
> project ([$$10, $$11, $$14])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$11, $$14] <- [TRUE, function-call: asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$10, $$2] <- test:TData
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE
> ---------------------------------------------------------------------------------
> Why is there a STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left outer join?
> We can close this issue if this is the intended design.