Posted to notifications@asterixdb.apache.org by "Jianfeng Jia (JIRA)" <ji...@apache.org> on 2015/11/10 23:34:10 UTC
[jira] [Created] (ASTERIXDB-1168) Should not sort&group after an
OrderedList left-join with a dataset
Jianfeng Jia created ASTERIXDB-1168:
---------------------------------------
Summary: Should not sort&group after an OrderedList left-join with a dataset
Key: ASTERIXDB-1168
URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
Project: Apache AsterixDB
Issue Type: Bug
Components: Optimizer
Reporter: Jianfeng Jia
Hi,
Here is the context for this issue: I wanted to look up some records in the DB through the REST API, and I wanted to do the lookup in a batch. So I packaged the "keys" into an OrderedList and expected a left outer join to give me all matching records in an order consistent with the query order. However, the result was re-sorted and grouped, which confused the client-side response handler.
Here is a synthetic query that emulates a similar use case:
---------------------------------------------------------------------------
drop dataverse test if exists;
create dataverse test;
use dataverse test;
create type TType as closed {
id: int64,
content: string
}
create dataset TData (TType) primary key id;
insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content": "b"}, {"id":3, "content":"c"}])
// now let's query on
let $ps := ["b","a", "b","c","c"]
for $p in $ps
return { "p":$p,
"match": for $x in dataset TData where $x.content = $p return $x.id
}
---------------------------------------------------------------------------
What I expected is the following:
---------------------------------------------------------------------------
[ { "p": "b", "match": [ 2 ] }
, { "p": "a", "match": [ 1 ] }
, { "p": "b", "match": [ 2 ] }
, { "p": "c", "match": [ 3 ] }
, { "p": "c", "match": [ 3 ] }
]
---------------------------------------------------------------------------
The returned result is the following, which is aggregated and re-sorted:
---------------------------------------------------------------------------
[ { "p": "a", "match": [ 1 ] }
, { "p": "b", "match": [ 2, 2 ] }
, { "p": "c", "match": [ 3, 3 ] }
]
---------------------------------------------------------------------------
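For reference, the expected semantics can be modeled outside AsterixDB. The following is a minimal Python sketch, not AsterixDB code: TDATA mirrors the three inserted records, and lookup_in_order is a hypothetical helper name. It evaluates the inner "for" once per binding of $p, so the order and the duplicates of $ps survive in the output.

```python
# Illustrative model only; TDATA and lookup_in_order are hypothetical names.
TDATA = [
    {"id": 1, "content": "a"},
    {"id": 2, "content": "b"},
    {"id": 3, "content": "c"},
]


def lookup_in_order(ps, dataset):
    """Evaluate the nested query once per binding of p, keeping list order."""
    return [
        {"p": p, "match": [x["id"] for x in dataset if x["content"] == p]}
        for p in ps
    ]


results = lookup_in_order(["b", "a", "b", "c", "c"], TDATA)
for r in results:
    print(r)
```

Under this model the output has one record per element of $ps (five here), in the order b, a, b, c, c.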
The optimized logical plan is the following:
---------------------------------------------------------------------------
distribute result [%0->$$4]
-- DISTRIBUTE_RESULT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$4])
-- STREAM_PROJECT |PARTITIONED|
assign [$$4] <- [function-call: asterix:closed-record-constructor, Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
-- ASSIGN |PARTITIONED|
project ([$$1, $$9])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
aggregate [$$9] <- [function-call: asterix:listify, Args:[%0->$$10]]
-- AGGREGATE |LOCAL|
select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
-- STREAM_SELECT |LOCAL|
nested tuple source
-- NESTED_TUPLE_SOURCE |LOCAL|
}
-- PRE_CLUSTERED_GROUP_BY[$$12, $$13] |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
order (ASC, %0->$$12) (ASC, %0->$$13)
-- STABLE_SORT [$$12(ASC), $$13(ASC)] |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$10, $$11, $$12, $$13])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
left outer join (function-call: algebricks:eq, Args:[%0->$$14, %0->$$13])
-- HYBRID_HASH_JOIN [$$13][$$14] |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$13] |PARTITIONED|
unnest $$13 <- function-call: asterix:scan-collection, Args:[%0->$$12]
-- UNNEST |UNPARTITIONED|
assign [$$12] <- [AOrderedList: [ AString: {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
-- ASSIGN |UNPARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |UNPARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$14] |PARTITIONED|
project ([$$10, $$11, $$14])
-- STREAM_PROJECT |PARTITIONED|
assign [$$11, $$14] <- [TRUE, function-call: asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
-- ASSIGN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
data-scan []<-[$$10, $$2] <- test:TData
-- DATASOURCE_SCAN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE
---------------------------------------------------------------------------------
Why is there a STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left outer join?
We can close this issue if this is the intended design.
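One way to read the plan above: the group-by keys are $$12 (the whole list constant, identical for every row) and $$13 (the unnested value), so rows produced by duplicate list elements end up in the same group. The Python sketch below models that reading. It is an illustration under stated assumptions, not AsterixDB code: every name in it is hypothetical, and grouping on the list position is only one conceivable way to keep the bindings apart, not a fix confirmed by the project.

```python
from itertools import groupby

# Illustrative model only; all names here are hypothetical.
TDATA = [
    {"id": 1, "content": "a"},
    {"id": 2, "content": "b"},
    {"id": 3, "content": "c"},
]
PS = ["b", "a", "b", "c", "c"]


def left_outer_join(ps, dataset):
    """Emit (position, value, id) rows; id is None when nothing matches."""
    rows = []
    for pos, p in enumerate(ps):
        ids = [x["id"] for x in dataset if x["content"] == p]
        for match_id in ids or [None]:
            rows.append((pos, p, match_id))
    return rows


def sort_and_group(rows, key):
    """Stable sort, then group adjacent rows; unmatched (None) ids are dropped."""
    out = []
    for _, grp in groupby(sorted(rows, key=key), key=key):
        grp = list(grp)
        matches = [m for _, _, m in grp if m is not None]
        out.append({"p": grp[0][1], "match": matches})
    return out


rows = left_outer_join(PS, TDATA)
# Grouping on the value only merges the duplicate "b" and "c" bindings.
by_value = sort_and_group(rows, key=lambda r: r[1])
# Grouping on the list position keeps one group per binding, in list order.
by_position = sort_and_group(rows, key=lambda r: r[0])

print(by_value)
print(by_position)
```

In this model, by_value reproduces the three aggregated records reported above, while by_position yields the five records that were expected.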
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: [jira] [Created] (ASTERIXDB-1168) Should not sort&group after an OrderedList left-join with a dataset
Posted by Jianfeng Jia <ji...@gmail.com>.
No problem. Let me re-open it.
> On Nov 10, 2015, at 3:20 PM, Yingyi Bu <bu...@gmail.com> wrote:
>
> Ah, yes!
> So this should be a bug then...
>
> Best,
> Yingyi
>
>
> On Tue, Nov 10, 2015 at 3:15 PM, Jianfeng Jia <ji...@gmail.com>
> wrote:
>
>> Actually, I’m still confused about the “cardinality” here. Isn’t the
>> cardinality of $ps 5?
>>>> let $ps := ["b","a", "b","c","c"]
>>
>>
>>> On Nov 10, 2015, at 2:50 PM, Yingyi Bu <bu...@gmail.com> wrote:
>>>
>>> Jianfeng,
>>>
>>> The result of the query is correct.
>>> The cardinality of returned results should be the same as the number of
>>> input binding tuples for $p.
>>>
>>> Best,
>>> Yingyi
Best,
Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
Re: [jira] [Created] (ASTERIXDB-1168) Should not sort&group after an
OrderedList left-join with a dataset
Posted by Yingyi Bu <bu...@gmail.com>.
Ah, yes!
So this should be a bug then...
Best,
Yingyi
On Tue, Nov 10, 2015 at 3:15 PM, Jianfeng Jia <ji...@gmail.com>
wrote:
> Actually, I’m still confused about the “cardinality” here. Isn’t the
> cardinality of $ps 5?
> >> let $ps := ["b","a", "b","c","c"]
>
>
> > On Nov 10, 2015, at 2:50 PM, Yingyi Bu <bu...@gmail.com> wrote:
> >
> > Jianfeng,
> >
> > The result of the query is correct.
> > The cardinality of returned results should be the same as the number of
> > input binding tuples for $p.
> >
> > Best,
> > Yingyi
Re: [jira] [Created] (ASTERIXDB-1168) Should not sort&group after an OrderedList left-join with a dataset
Posted by Jianfeng Jia <ji...@gmail.com>.
Actually, I’m still confused about the “cardinality” here. Isn’t the cardinality of $ps 5?
>> let $ps := ["b","a", "b","c","c"]
> On Nov 10, 2015, at 2:50 PM, Yingyi Bu <bu...@gmail.com> wrote:
>
> Jianfeng,
>
> The result of the query is correct.
> The cardinality of returned results should be the same as the number of
> input binding tuples for $p.
>
> Best,
> Yingyi
Best,
Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
Re: [jira] [Created] (ASTERIXDB-1168) Should not sort&group after an OrderedList left-join with a dataset
Posted by Jianfeng Jia <ji...@gmail.com>.
Got it. I’ve closed the issue. Thanks for answering.
> On Nov 10, 2015, at 2:50 PM, Yingyi Bu <bu...@gmail.com> wrote:
>
> Jianfeng,
>
> The result of the query is correct.
> The cardinality of returned results should be the same as the number of
> input binding tuples for $p.
>
> Best,
> Yingyi
Best,
Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
Re: [jira] [Created] (ASTERIXDB-1168) Should not sort&group after an
OrderedList left-join with a dataset
Posted by Yingyi Bu <bu...@gmail.com>.
Jianfeng,
The result of the query is correct.
The cardinality of returned results should be the same as the number of
input binding tuples for $p.
Best,
Yingyi
On Tue, Nov 10, 2015 at 2:34 PM, Jianfeng Jia (JIRA) <ji...@apache.org>
wrote:
> Jianfeng Jia created ASTERIXDB-1168:
> ---------------------------------------
>
> Summary: Should not sort&group after an OrderedList left-join
> with a dataset
> Key: ASTERIXDB-1168
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: Optimizer
> Reporter: Jianfeng Jia
>
>
> Hi,
> Here is the context for this issue, I wanted to lookup some records in
> the DB through REST API, and I wanted to lookup in a batch way. Then I
> packaged the "keys" into an OrderdList and expected a left-out join would
> give me all matching records that consistent with query order. However, the
> result was re-sorted and grouped, which confused the client side response
> handler.
>
> Here is the synthetic query that emulates the similar use case:
> ---------------------------------------------------------------------------
> drop dataverse test if exists;
> create dataverse test;
>
> use dataverse test;
>
> create type TType as closed {
> id: int64,
> content: string
> }
>
> create dataset TData (TType) primary key id;
>
> insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content":
> "b"}, {"id":3, "content":"c"}])
>
> // now let's query on
> let $ps := ["b","a", "b","c","c"]
>
> for $p in $ps
> return { "p":$p,
> "match": for $x in dataset TData where $x.content = $p return $x.id
> }
> ---------------------------------------------------------------------------
>
> What I expected is following:
> ---------------------------------------------------------------------------
> [ { "p": "b", "match": [ 2 ] }
> , { "p": "a", "match": [ 1 ] }
> , { "p": "b", "match": [ 2 ] }
> , { "p": "c", "match": [ 3 ] }
> , { "p": "c", "match": [ 3 ] }
> ]
> ---------------------------------------------------------------------------
>
> The returned result is following, which is aggregated and re-sorted.
> ---------------------------------------------------------------------------
> [ { "p": "a", "match": [ 1 ] }
> , { "p": "b", "match": [ 2, 2 ] }
> , { "p": "c", "match": [ 3, 3 ] }
> ]
> ---------------------------------------------------------------------------
>
> The optimized logical plan is following:
> ---------------------------------------------------------------------------
> distribute result [%0->$$4]
> -- DISTRIBUTE_RESULT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$4])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$4] <- [function-call: asterix:closed-record-constructor,
> Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
> -- ASSIGN |PARTITIONED|
> project ([$$1, $$9])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
> aggregate [$$9] <- [function-call: asterix:listify,
> Args:[%0->$$10]]
> -- AGGREGATE |LOCAL|
> select (function-call: algebricks:not,
> Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
> -- STREAM_SELECT |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> -- PRE_CLUSTERED_GROUP_BY[$$12, $$13] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> order (ASC, %0->$$12) (ASC, %0->$$13)
> -- STABLE_SORT [$$12(ASC), $$13(ASC)] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$10, $$11, $$12, $$13])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> left outer join (function-call: algebricks:eq,
> Args:[%0->$$14, %0->$$13])
> -- HYBRID_HASH_JOIN [$$13][$$14] |PARTITIONED|
> exchange
> -- HASH_PARTITION_EXCHANGE [$$13] |PARTITIONED|
> unnest $$13 <- function-call:
> asterix:scan-collection, Args:[%0->$$12]
> -- UNNEST |UNPARTITIONED|
> assign [$$12] <- [AOrderedList: [ AString:
> {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
> -- ASSIGN |UNPARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |UNPARTITIONED|
> exchange
> -- HASH_PARTITION_EXCHANGE [$$14] |PARTITIONED|
> project ([$$10, $$11, $$14])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$11, $$14] <- [TRUE, function-call:
> asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$10, $$2] <- test:TData
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE
>
> ---------------------------------------------------------------------------------
>
> Why there is an STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left out
> join?
> We can close this issue if this is an intended design.
Re: [jira] [Created] (ASTERIXDB-1168) Should not sort&group after an
OrderedList left-join with a dataset
Posted by Yingyi Bu <bu...@gmail.com>.
Jianfeng,
The results of the query are correct.
The cardinality of returned results should be the same as the number of
input binding tuples for $p.
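To make the cardinality point concrete, here is a minimal Python sketch (not AsterixDB code) that contrasts the two behaviours discussed in this thread. The function names `per_binding` and `sort_then_group` are made up for illustration, and the second one only approximates the STABLE_SORT + PRE_CLUSTERED_GROUP_BY part of the plan under the assumption that grouping on ($$12, $$13) amounts to grouping on the value of $p, since $$12 is the same constant list for every tuple.

```python
from itertools import groupby

# Toy copies of the dataset and the ordered list from the report.
TDATA = [
    {"id": 1, "content": "a"},
    {"id": 2, "content": "b"},
    {"id": 3, "content": "c"},
]
PS = ["b", "a", "b", "c", "c"]


def per_binding(ps, data):
    """One output record per binding of $p, in list order."""
    return [
        {"p": p, "match": [x["id"] for x in data if x["content"] == p]}
        for p in ps
    ]


def sort_then_group(ps, data):
    """Left outer join, then sort and group on the value of $p (assumed
    approximation of the plan; duplicates of $p fall into one group)."""
    joined = []
    for p in ps:
        hits = [x["id"] for x in data if x["content"] == p]
        # A left outer join keeps an unmatched p, paired with a null.
        joined.extend((p, h) for h in (hits or [None]))
    joined.sort(key=lambda t: t[0])  # Python's sort is stable
    return [
        {"p": p, "match": [h for _, h in rows if h is not None]}
        for p, rows in groupby(joined, key=lambda t: t[0])
    ]


print(per_binding(PS, TDATA))      # 5 records, same order as PS
print(sort_then_group(PS, TDATA))  # 3 records, duplicates merged
```

With the data from the report, `per_binding` yields five records in the order of the list, while `sort_then_group` yields three records with the duplicate bindings of "b" and "c" folded together, which is the shape of the output in the report.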
Best,
Yingyi