You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by khaleel mershad <kh...@gmail.com> on 2015/04/22 15:45:23 UTC

Error when using aggregates with correlated subqueries

Hello,

I tried executing the following query in Phoenix:
select distinct W."rowId", W."md5_current", W."size_current" from "Wiki" AS
W where EXISTS( select MAX(H."timestamp") from "History" AS H where
(H."rowId" = W."rowId") AND (H."status" = 'approved') );

I got the following error:
ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP BY.
(state=42Y27,code=1018)

If I replace W."rowId" (in the where clause) with a constant value (for
example: '3587'), the query works fine. Also, if I replace the aggregate
MAX(H."timestamp") with H."timestamp", the query also works fine.

So it seems that Phoenix generates error when using a reference to an outer
query while using an aggregate within the inner query.

Any solutions?



Regards,

Khaleel Mershad, Ph.D.
Research Associate

American University of Beirut
Department of Electrical and Computer Engineering
Bliss Street, Beirut, Lebanon
email: kwm03@aub.edu.lb

Re: Error when using aggregates with correlated subqueries

Posted by Maryann Xue <ma...@gmail.com>.
Thank you in advance! Just let me know.


Thanks,
Maryann

On Sunday, April 26, 2015, khaleel mershad <kh...@gmail.com> wrote:

>
> Thanks for the explanation. I see what you mean. I will also try testing a
> query that contains nested correlated subqueries with multiple level
> outer reference, and with using IN instead of EXISTS and I will see if it
> produces an error.
>
>
>
> Best,
> Khaleel
>
> On Sun, Apr 26, 2015 at 7:14 PM, Maryann Xue <maryann.xue@gmail.com
> <javascript:_e(%7B%7D,'cvml','maryann.xue@gmail.com');>> wrote:
>
>> Hi Khaleel,
>>
>> Thanks for looking into the problem! But there IS a bug with EXISTS
>> having an aggregate function, say "max(H."timestamp") as used in your very
>> first query that did not work. Otherwise, if subquery is SELECT * or SELECT
>> column1 or SELECT substr(column1), it's fine coz there is no aggregate.
>>
>> And interestingly the query with W."rowId" failed because it referenced
>> to the outmost query that had EXISTS. Otherwise if using H.
>> rowId", it only had to deal with "SELECT * from H" which was a comparison
>> subquery and so it worked fine.
>>
>> Anyway, I will verify if there is any problem with multiple level outer
>> reference in correlated subqueries, as an independent issue without EXISTS.
>>
>> Given that our EXISTS support is currently incomplete, you may want to
>> rewrite your EXISTS subqueries with equivalent IN subqueries, which would
>> most likely just work.
>>
>>
>> Thanks,
>> Maryann
>>
>>
>> On Sun, Apr 26, 2015 at 11:54 AM, khaleel mershad <khellom007@gmail.com
>> <javascript:_e(%7B%7D,'cvml','khellom007@gmail.com');>> wrote:
>>
>>>
>>> I don't think that the bug is with EXISTS. As you can see with the query
>>> from my last email, it contains a nested correlated subquery which contains
>>> an aggregate within *EXISTS* and it worked after replacing the
>>> reference to the outer table (W."rowId") with a reference to the inner
>>> subquery table (H."rowId") within the aggregate subquery. In other words,
>>> the following query generates an error:
>>>
>>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>>> AS W
>>> where EXISTS (select * from "History" AS H
>>>                         where
>>> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId")
>>>                         AND H."timestamp" = (select MAX(H2."timestamp")
>>> from "History" AS H2
>>>                                                           where (*REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+')
>>> =*
>>> *
>>> W."rowId"*)
>>>                                                           AND
>>> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000))
>>> AND NOT EXISTS (select * from "History" AS H3
>>>                               where
>>> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
>>>                                 AND (H3."status" = 'pending') AND
>>> (TO_NUMBER(H3."value") < 1000));
>>>
>>>
>>> But if I replace *W."rowId"* with REGEXP_SUBSTR(SUBSTR(*H."rowId"*,19),'[^:]+'),
>>> the query works fine (which is the query from my last email). So I think
>>> the problem is not with EXISTS, but with the fact that a reference to a
>>> table from the outer query (W in my query) can be done up to a maximum one
>>> nested level, and not more than that. In other words, referring to a table
>>> from the outer query from within a subquery that is in the second or more
>>> nested level will generate an error.
>>>
>>>
>>>
>>> Best,
>>> Khaleel
>>>
>>>
>>>
>>> On Sun, Apr 26, 2015 at 6:33 PM, Maryann Xue <maryann.xue@gmail.com
>>> <javascript:_e(%7B%7D,'cvml','maryann.xue@gmail.com');>> wrote:
>>>
>>>> Glad that your queries worked. Please do let us know if any further
>>>> problems.
>>>>
>>>> The bug is with EXISTS. Right now Phoenix does not handle aggregate
>>>> function calls correctly within an EXISTS subquery or EXISTS nested
>>>> subquery. I have opened a JIRA:
>>>> https://issues.apache.org/jira/browse/PHOENIX-1923.
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>>
>>>> On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad <khellom007@gmail.com
>>>> <javascript:_e(%7B%7D,'cvml','khellom007@gmail.com');>> wrote:
>>>>
>>>>>
>>>>> Dear Maryann,
>>>>>
>>>>> I tried your query and it worked. I also executed a more complex query
>>>>> which I need in my testing, which is:
>>>>>
>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>> "Wiki" AS W
>>>>> where EXISTS (select * from "History" AS H
>>>>>                         where
>>>>> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId")
>>>>>                         AND H."timestamp" = (select
>>>>> MAX(H2."timestamp") from "History" AS H2
>>>>>                                                           where
>>>>> (REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') =
>>>>>
>>>>> REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+'))
>>>>>                                                           AND
>>>>> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000))
>>>>> AND NOT EXISTS (select * from "History" AS H3
>>>>>                               where
>>>>> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
>>>>>                                 AND (H3."status" = 'pending') AND
>>>>> (TO_NUMBER(H3."value") < 1000));
>>>>>
>>>>> and it also worked after I understood your fix of the original query.
>>>>>
>>>>> So the trick here is that we can use the reference to the outer query
>>>>> within the next subquery level only, and not up to two levels as I was
>>>>> doing? Maybe this limitation exists because Phoenix joins the tables from
>>>>> the outer and the inner correlated query, but it can perform this join up
>>>>> to one level only?
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>> Khaleel
>>>>>
>>>>>
>>>>> On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue <maryann.xue@gmail.com
>>>>> <javascript:_e(%7B%7D,'cvml','maryann.xue@gmail.com');>> wrote:
>>>>>
>>>>>> Hi Khaleel,
>>>>>>
>>>>>> Mind if you try the following query? I think it's the same semantics
>>>>>> as you meant for your query.
>>>>>>
>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>>> "Wiki" AS W
>>>>>> where
>>>>>> W."rowId" in (select H"rowId" from "History" AS H
>>>>>>                          where H."timestamp" =
>>>>>>                                 (select MAX(H2."timestamp") from
>>>>>> "History" AS H2 where H2."rowId" = H."rowId")
>>>>>>                          AND
>>>>>>  H."status" = 'approved')
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Maryann
>>>>>>
>>>>>> On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad <
>>>>>> khellom007@gmail.com
>>>>>> <javascript:_e(%7B%7D,'cvml','khellom007@gmail.com');>> wrote:
>>>>>>
>>>>>>> Dear Maryann,
>>>>>>>
>>>>>>> Thanks for your question. You are right: the query that I was
>>>>>>> writing wasn't the correct one for my purpose. The query that will satisfy
>>>>>>> my request would be:
>>>>>>>
>>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>>>> "Wiki" AS W
>>>>>>> where
>>>>>>> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId")
>>>>>>> AND
>>>>>>> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2
>>>>>>> where (H2."rowId" = W."rowId")) AND
>>>>>>>  (H."status" = 'approved') )
>>>>>>>
>>>>>>> In this query I specify that I need to select the exact version
>>>>>>> which has the Maximum timestamp among all versions of the same data item by
>>>>>>> using MAX(H2."timestamp") within the inner subquery within EXISTS.
>>>>>>>
>>>>>>> However I tried such query and it still produces the same error as
>>>>>>> the old query, which is: (Aggregate may not contain columns not in
>>>>>>> GROUP BY.)
>>>>>>>
>>>>>>> Thank you for your help.
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>> Khaleel
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <maryann.xue@gmail.com
>>>>>>> <javascript:_e(%7B%7D,'cvml','maryann.xue@gmail.com');>> wrote:
>>>>>>>
>>>>>>>> Hi Khaleel,
>>>>>>>>
>>>>>>>> Thanks for the explanation! But my question was since this is an
>>>>>>>> EXISTS, I assume if there ever is a H."timestamp" with an 'approved'
>>>>>>>> status, the EXISTS will return true regardless of whether you are testing
>>>>>>>> H."timestamp" or Max(H."timestamp"). Is that correct? or have I missed
>>>>>>>> something?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Maryann
>>>>>>>>
>>>>>>>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <
>>>>>>>> khellom007@gmail.com
>>>>>>>> <javascript:_e(%7B%7D,'cvml','khellom007@gmail.com');>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hello Maryann,
>>>>>>>>>
>>>>>>>>> Thanks very much for your reply. Hopefully this bug gets fixed in
>>>>>>>>> the next release so that I can continue working with this part in my
>>>>>>>>> research project. Thanks for keeping me posted.
>>>>>>>>>
>>>>>>>>> With respect to your question, I am using the "History" table as a
>>>>>>>>> data store of all versions of a certain data item. When I say Max(
>>>>>>>>> H."timestamp") inside the query, I am selecting the latest
>>>>>>>>> version (most recent) that is "approved" (which is checked using the
>>>>>>>>> condition H."status" = 'approved')
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Khaleel
>>>>>>>>>
>>>>>>>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <
>>>>>>>>> maryann.xue@gmail.com
>>>>>>>>> <javascript:_e(%7B%7D,'cvml','maryann.xue@gmail.com');>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Khaleel,
>>>>>>>>>>
>>>>>>>>>> Thanks a lot for reporting the problem, which looks like a bug. I
>>>>>>>>>> will file a JIRA and keep you posted.
>>>>>>>>>>
>>>>>>>>>> One question though, why would we use MAX(H."timestamp") instead
>>>>>>>>>> of H."timestamp"? What difference would it make?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Maryann
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <
>>>>>>>>>> khellom007@gmail.com
>>>>>>>>>> <javascript:_e(%7B%7D,'cvml','khellom007@gmail.com');>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I tried executing the following query in Phoenix:
>>>>>>>>>>> select distinct W."rowId", W."md5_current", W."size_current"
>>>>>>>>>>> from "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History" AS
>>>>>>>>>>> H where (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>>>>>>>>>>
>>>>>>>>>>> I got the following error:
>>>>>>>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in
>>>>>>>>>>> GROUP BY. (state=42Y27,code=1018)
>>>>>>>>>>>
>>>>>>>>>>> If I replace W."rowId" (in the where clause) with a constant
>>>>>>>>>>> value (for example: '3587'), the query works fine. Also, if I replace the
>>>>>>>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also works fine.
>>>>>>>>>>>
>>>>>>>>>>> So it seems that Phoenix generates error when using a reference
>>>>>>>>>>> to an outer query while using an aggregate within the inner query.
>>>>>>>>>>>
>>>>>>>>>>> Any solutions?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Khaleel Mershad, Ph.D.
>>>>>>>>>>> Research Associate
>>>>>>>>>>>
>>>>>>>>>>> American University of Beirut
>>>>>>>>>>> Department of Electrical and Computer Engineering
>>>>>>>>>>> Bliss Street, Beirut, Lebanon
>>>>>>>>>>> email: kwm03@aub.edu.lb
>>>>>>>>>>> <javascript:_e(%7B%7D,'cvml','kwm03@aub.edu.lb');>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Error when using aggregates with correlated subqueries

Posted by khaleel mershad <kh...@gmail.com>.
Thanks for the explanation. I see what you mean. I will also try testing a
query that contains nested correlated subqueries with multiple level outer
reference, and with using IN instead of EXISTS and I will see if it
produces an error.



Best,
Khaleel

On Sun, Apr 26, 2015 at 7:14 PM, Maryann Xue <ma...@gmail.com> wrote:

> Hi Khaleel,
>
> Thanks for looking into the problem! But there IS a bug with EXISTS having
> an aggregate function, say "max(H."timestamp") as used in your very first
> query that did not work. Otherwise, if subquery is SELECT * or SELECT
> column1 or SELECT substr(column1), it's fine coz there is no aggregate.
>
> And interestingly the query with W."rowId" failed because it referenced to
> the outmost query that had EXISTS. Otherwise if using H.
> rowId", it only had to deal with "SELECT * from H" which was a comparison
> subquery and so it worked fine.
>
> Anyway, I will verify if there is any problem with multiple level outer
> reference in correlated subqueries, as an independent issue without EXISTS.
>
> Given that our EXISTS support is currently incomplete, you may want to
> rewrite your EXISTS subqueries with equivalent IN subqueries, which would
> most likely just work.
>
>
> Thanks,
> Maryann
>
>
> On Sun, Apr 26, 2015 at 11:54 AM, khaleel mershad <kh...@gmail.com>
> wrote:
>
>>
>> I don't think that the bug is with EXISTS. As you can see with the query
>> from my last email, it contains a nested correlated subquery which contains
>> an aggregate within *EXISTS* and it worked after replacing the reference
>> to the outer table (W."rowId") with a reference to the inner subquery table
>> (H."rowId") within the aggregate subquery. In other words, the following
>> query generates an error:
>>
>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>> AS W
>> where EXISTS (select * from "History" AS H
>>                         where
>> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId")
>>                         AND H."timestamp" = (select MAX(H2."timestamp")
>> from "History" AS H2
>>                                                           where (*REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+')
>> =*
>> *
>> W."rowId"*)
>>                                                           AND
>> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000))
>> AND NOT EXISTS (select * from "History" AS H3
>>                               where
>> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
>>                                 AND (H3."status" = 'pending') AND
>> (TO_NUMBER(H3."value") < 1000));
>>
>>
>> But if I replace *W."rowId"* with REGEXP_SUBSTR(SUBSTR(*H."rowId"*,19),'[^:]+'),
>> the query works fine (which is the query from my last email). So I think
>> the problem is not with EXISTS, but with the fact that a reference to a
>> table from the outer query (W in my query) can be done up to a maximum one
>> nested level, and not more than that. In other words, referring to a table
>> from the outer query from within a subquery that is in the second or more
>> nested level will generate an error.
>>
>>
>>
>> Best,
>> Khaleel
>>
>>
>>
>> On Sun, Apr 26, 2015 at 6:33 PM, Maryann Xue <ma...@gmail.com>
>> wrote:
>>
>>> Glad that your queries worked. Please do let us know if any further
>>> problems.
>>>
>>> The bug is with EXISTS. Right now Phoenix does not handle aggregate
>>> function calls correctly within an EXISTS subquery or EXISTS nested
>>> subquery. I have opened a JIRA:
>>> https://issues.apache.org/jira/browse/PHOENIX-1923.
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>>
>>> On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad <kh...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Dear Maryann,
>>>>
>>>> I tried your query and it worked. I also executed a more complex query
>>>> which I need in my testing, which is:
>>>>
>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>> "Wiki" AS W
>>>> where EXISTS (select * from "History" AS H
>>>>                         where
>>>> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId")
>>>>                         AND H."timestamp" = (select MAX(H2."timestamp")
>>>> from "History" AS H2
>>>>                                                           where
>>>> (REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') =
>>>>
>>>> REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+'))
>>>>                                                           AND
>>>> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000))
>>>> AND NOT EXISTS (select * from "History" AS H3
>>>>                               where
>>>> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
>>>>                                 AND (H3."status" = 'pending') AND
>>>> (TO_NUMBER(H3."value") < 1000));
>>>>
>>>> and it also worked after I understood your fix of the original query.
>>>>
>>>> So the trick here is that we can use the reference to the outer query
>>>> within the next subquery level only, and not up to two levels as I was
>>>> doing? Maybe this limitation exists because Phoenix joins the tables from
>>>> the outer and the inner correlated query, but it can perform this join up
>>>> to one level only?
>>>>
>>>>
>>>>
>>>> Best,
>>>> Khaleel
>>>>
>>>>
>>>> On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue <ma...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Khaleel,
>>>>>
>>>>> Mind if you try the following query? I think it's the same semantics
>>>>> as you meant for your query.
>>>>>
>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>> "Wiki" AS W
>>>>> where
>>>>> W."rowId" in (select H"rowId" from "History" AS H
>>>>>                          where H."timestamp" =
>>>>>                                 (select MAX(H2."timestamp") from
>>>>> "History" AS H2 where H2."rowId" = H."rowId")
>>>>>                          AND
>>>>>  H."status" = 'approved')
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Maryann
>>>>>
>>>>> On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad <khellom007@gmail.com
>>>>> > wrote:
>>>>>
>>>>>> Dear Maryann,
>>>>>>
>>>>>> Thanks for your question. You are right: the query that I was writing
>>>>>> wasn't the correct one for my purpose. The query that will satisfy my
>>>>>> request would be:
>>>>>>
>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>>> "Wiki" AS W
>>>>>> where
>>>>>> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") AND
>>>>>> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2
>>>>>> where (H2."rowId" = W."rowId")) AND
>>>>>>  (H."status" = 'approved') )
>>>>>>
>>>>>> In this query I specify that I need to select the exact version which
>>>>>> has the Maximum timestamp among all versions of the same data item by using
>>>>>> MAX(H2."timestamp") within the inner subquery within EXISTS.
>>>>>>
>>>>>> However I tried such query and it still produces the same error as
>>>>>> the old query, which is: (Aggregate may not contain columns not in
>>>>>> GROUP BY.)
>>>>>>
>>>>>> Thank you for your help.
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Khaleel
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <ma...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Khaleel,
>>>>>>>
>>>>>>> Thanks for the explanation! But my question was since this is an
>>>>>>> EXISTS, I assume if there ever is a H."timestamp" with an 'approved'
>>>>>>> status, the EXISTS will return true regardless of whether you are testing
>>>>>>> H."timestamp" or Max(H."timestamp"). Is that correct? or have I missed
>>>>>>> something?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maryann
>>>>>>>
>>>>>>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <
>>>>>>> khellom007@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hello Maryann,
>>>>>>>>
>>>>>>>> Thanks very much for your reply. Hopefully this bug gets fixed in
>>>>>>>> the next release so that I can continue working with this part in my
>>>>>>>> research project. Thanks for keeping me posted.
>>>>>>>>
>>>>>>>> With respect to your question, I am using the "History" table as a
>>>>>>>> data store of all versions of a certain data item. When I say Max(
>>>>>>>> H."timestamp") inside the query, I am selecting the latest version
>>>>>>>> (most recent) that is "approved" (which is checked using the condition H."status"
>>>>>>>> = 'approved')
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Khaleel
>>>>>>>>
>>>>>>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <maryann.xue@gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> Hi Khaleel,
>>>>>>>>>
>>>>>>>>> Thanks a lot for reporting the problem, which looks like a bug. I
>>>>>>>>> will file a JIRA and keep you posted.
>>>>>>>>>
>>>>>>>>> One question though, why would we use MAX(H."timestamp") instead
>>>>>>>>> of H."timestamp"? What difference would it make?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Maryann
>>>>>>>>>
>>>>>>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <
>>>>>>>>> khellom007@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I tried executing the following query in Phoenix:
>>>>>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>>>>>>> "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History" AS H
>>>>>>>>>> where (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>>>>>>>>>
>>>>>>>>>> I got the following error:
>>>>>>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in
>>>>>>>>>> GROUP BY. (state=42Y27,code=1018)
>>>>>>>>>>
>>>>>>>>>> If I replace W."rowId" (in the where clause) with a constant
>>>>>>>>>> value (for example: '3587'), the query works fine. Also, if I replace the
>>>>>>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also works fine.
>>>>>>>>>>
>>>>>>>>>> So it seems that Phoenix generates error when using a reference
>>>>>>>>>> to an outer query while using an aggregate within the inner query.
>>>>>>>>>>
>>>>>>>>>> Any solutions?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Khaleel Mershad, Ph.D.
>>>>>>>>>> Research Associate
>>>>>>>>>>
>>>>>>>>>> American University of Beirut
>>>>>>>>>> Department of Electrical and Computer Engineering
>>>>>>>>>> Bliss Street, Beirut, Lebanon
>>>>>>>>>> email: kwm03@aub.edu.lb
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Error when using aggregates with correlated subqueries

Posted by Maryann Xue <ma...@gmail.com>.
Hi Khaleel,

Thanks for looking into the problem! But there IS a bug with EXISTS having
an aggregate function, say "max(H."timestamp") as used in your very first
query that did not work. Otherwise, if subquery is SELECT * or SELECT
column1 or SELECT substr(column1), it's fine coz there is no aggregate.

And interestingly the query with W."rowId" failed because it referenced to
the outmost query that had EXISTS. Otherwise if using H.
rowId", it only had to deal with "SELECT * from H" which was a comparison
subquery and so it worked fine.

Anyway, I will verify if there is any problem with multiple level outer
reference in correlated subqueries, as an independent issue without EXISTS.

Given that our EXISTS support is currently incomplete, you may want to
rewrite your EXISTS subqueries with equivalent IN subqueries, which would
most likely just work.


Thanks,
Maryann


On Sun, Apr 26, 2015 at 11:54 AM, khaleel mershad <kh...@gmail.com>
wrote:

>
> I don't think that the bug is with EXISTS. As you can see with the query
> from my last email, it contains a nested correlated subquery which contains
> an aggregate within *EXISTS* and it worked after replacing the reference
> to the outer table (W."rowId") with a reference to the inner subquery table
> (H."rowId") within the aggregate subquery. In other words, the following
> query generates an error:
>
> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
> AS W
> where EXISTS (select * from "History" AS H
>                         where (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+')
> = W."rowId")
>                         AND H."timestamp" = (select MAX(H2."timestamp")
> from "History" AS H2
>                                                           where (*REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+')
> =*
> *
> W."rowId"*)
>                                                           AND (H2."status"
> = 'approved')) AND (TO_NUMBER(H."value") > 1000))
> AND NOT EXISTS (select * from "History" AS H3
>                               where
> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
>                                 AND (H3."status" = 'pending') AND
> (TO_NUMBER(H3."value") < 1000));
>
>
> But if I replace *W."rowId"* with REGEXP_SUBSTR(SUBSTR(*H."rowId"*,19),'[^:]+'),
> the query works fine (which is the query from my last email). So I think
> the problem is not with EXISTS, but with the fact that a reference to a
> table from the outer query (W in my query) can be done up to a maximum one
> nested level, and not more than that. In other words, referring to a table
> from the outer query from within a subquery that is in the second or more
> nested level will generate an error.
>
>
>
> Best,
> Khaleel
>
>
>
> On Sun, Apr 26, 2015 at 6:33 PM, Maryann Xue <ma...@gmail.com>
> wrote:
>
>> Glad that your queries worked. Please do let us know if any further
>> problems.
>>
>> The bug is with EXISTS. Right now Phoenix does not handle aggregate
>> function calls correctly within an EXISTS subquery or EXISTS nested
>> subquery. I have opened a JIRA:
>> https://issues.apache.org/jira/browse/PHOENIX-1923.
>>
>>
>> Thanks,
>> Maryann
>>
>>
>> On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad <kh...@gmail.com>
>> wrote:
>>
>>>
>>> Dear Maryann,
>>>
>>> I tried your query and it worked. I also executed a more complex query
>>> which I need in my testing, which is:
>>>
>>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>>> AS W
>>> where EXISTS (select * from "History" AS H
>>>                         where
>>> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId")
>>>                         AND H."timestamp" = (select MAX(H2."timestamp")
>>> from "History" AS H2
>>>                                                           where
>>> (REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') =
>>>
>>> REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+'))
>>>                                                           AND
>>> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000))
>>> AND NOT EXISTS (select * from "History" AS H3
>>>                               where
>>> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
>>>                                 AND (H3."status" = 'pending') AND
>>> (TO_NUMBER(H3."value") < 1000));
>>>
>>> and it also worked after I understood your fix of the original query.
>>>
>>> So the trick here is that we can use the reference to the outer query
>>> within the next subquery level only, and not up to two levels as I was
>>> doing? Maybe this limitation exists because Phoenix joins the tables from
>>> the outer and the inner correlated query, but it can perform this join up
>>> to one level only?
>>>
>>>
>>>
>>> Best,
>>> Khaleel
>>>
>>>
>>> On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue <ma...@gmail.com>
>>> wrote:
>>>
>>>> Hi Khaleel,
>>>>
>>>> Mind if you try the following query? I think it's the same semantics as
>>>> you meant for your query.
>>>>
>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>> "Wiki" AS W
>>>> where
>>>> W."rowId" in (select H"rowId" from "History" AS H
>>>>                          where H."timestamp" =
>>>>                                 (select MAX(H2."timestamp") from
>>>> "History" AS H2 where H2."rowId" = H."rowId")
>>>>                          AND
>>>>  H."status" = 'approved')
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>> On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad <kh...@gmail.com>
>>>> wrote:
>>>>
>>>>> Dear Maryann,
>>>>>
>>>>> Thanks for your question. You are right: the query that I was writing
>>>>> wasn't the correct one for my purpose. The query that will satisfy my
>>>>> request would be:
>>>>>
>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>> "Wiki" AS W
>>>>> where
>>>>> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") AND
>>>>> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2
>>>>> where (H2."rowId" = W."rowId")) AND
>>>>>  (H."status" = 'approved') )
>>>>>
>>>>> In this query I specify that I need to select the exact version which
>>>>> has the Maximum timestamp among all versions of the same data item by using
>>>>> MAX(H2."timestamp") within the inner subquery within EXISTS.
>>>>>
>>>>> However I tried such query and it still produces the same error as the
>>>>> old query, which is: (Aggregate may not contain columns not in GROUP
>>>>> BY.)
>>>>>
>>>>> Thank you for your help.
>>>>>
>>>>>
>>>>> Best,
>>>>> Khaleel
>>>>>
>>>>>
>>>>> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <ma...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Khaleel,
>>>>>>
>>>>>> Thanks for the explanation! But my question was since this is an
>>>>>> EXISTS, I assume if there ever is a H."timestamp" with an 'approved'
>>>>>> status, the EXISTS will return true regardless of whether you are testing
>>>>>> H."timestamp" or Max(H."timestamp"). Is that correct? or have I missed
>>>>>> something?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Maryann
>>>>>>
>>>>>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <
>>>>>> khellom007@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hello Maryann,
>>>>>>>
>>>>>>> Thanks very much for your reply. Hopefully this bug gets fixed in
>>>>>>> the next release so that I can continue working with this part in my
>>>>>>> research project. Thanks for keeping me posted.
>>>>>>>
>>>>>>> With respect to your question, I am using the "History" table as a
>>>>>>> data store of all versions of a certain data item. When I say Max(
>>>>>>> H."timestamp") inside the query, I am selecting the latest version
>>>>>>> (most recent) that is "approved" (which is checked using the condition H."status"
>>>>>>> = 'approved')
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Khaleel
>>>>>>>
>>>>>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <ma...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Khaleel,
>>>>>>>>
>>>>>>>> Thanks a lot for reporting the problem, which looks like a bug. I
>>>>>>>> will file a JIRA and keep you posted.
>>>>>>>>
>>>>>>>> One question though, why would we use MAX(H."timestamp") instead
>>>>>>>> of H."timestamp"? What difference would it make?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Maryann
>>>>>>>>
>>>>>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <
>>>>>>>> khellom007@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I tried executing the following query in Phoenix:
>>>>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>>>>>> "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History" AS H
>>>>>>>>> where (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>>>>>>>>
>>>>>>>>> I got the following error:
>>>>>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP
>>>>>>>>> BY. (state=42Y27,code=1018)
>>>>>>>>>
>>>>>>>>> If I replace W."rowId" (in the where clause) with a constant value
>>>>>>>>> (for example: '3587'), the query works fine. Also, if I replace the
>>>>>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also works fine.
>>>>>>>>>
>>>>>>>>> So it seems that Phoenix generates error when using a reference to
>>>>>>>>> an outer query while using an aggregate within the inner query.
>>>>>>>>>
>>>>>>>>> Any solutions?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Khaleel Mershad, Ph.D.
>>>>>>>>> Research Associate
>>>>>>>>>
>>>>>>>>> American University of Beirut
>>>>>>>>> Department of Electrical and Computer Engineering
>>>>>>>>> Bliss Street, Beirut, Lebanon
>>>>>>>>> email: kwm03@aub.edu.lb
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Error when using aggregates with correlated subqueries

Posted by khaleel mershad <kh...@gmail.com>.
I don't think that the bug is with EXISTS. As you can see with the query
from my last email, it contains a nested correlated subquery which contains
an aggregate within *EXISTS* and it worked after replacing the reference to
the outer table (W."rowId") with a reference to the inner subquery table
(H."rowId") within the aggregate subquery. In other words, the following
query generates an error:

select distinct W."rowId", W."md5_current", W."size_current" from "Wiki" AS
W
where EXISTS (select * from "History" AS H
                        where (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+')
= W."rowId")
                        AND H."timestamp" = (select MAX(H2."timestamp")
from "History" AS H2
                                                          where
(*REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+')
=*
*
W."rowId"*)
                                                          AND (H2."status"
= 'approved')) AND (TO_NUMBER(H."value") > 1000))
AND NOT EXISTS (select * from "History" AS H3
                              where
(REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
                                AND (H3."status" = 'pending') AND
(TO_NUMBER(H3."value") < 1000));


But if I replace *W."rowId"* with
REGEXP_SUBSTR(SUBSTR(*H."rowId"*,19),'[^:]+'),
the query works fine (which is the query from my last email). So I think
the problem is not with EXISTS, but with the fact that a reference to a
table from the outer query (W in my query) can be done up to a maximum one
nested level, and not more than that. In other words, referring to a table
from the outer query from within a subquery that is in the second or more
nested level will generate an error.



Best,
Khaleel



On Sun, Apr 26, 2015 at 6:33 PM, Maryann Xue <ma...@gmail.com> wrote:

> Glad that your queries worked. Please do let us know if any further
> problems.
>
> The bug is with EXISTS. Right now Phoenix does not handle aggregate
> function calls correctly within an EXISTS subquery or EXISTS nested
> subquery. I have opened a JIRA:
> https://issues.apache.org/jira/browse/PHOENIX-1923.
>
>
> Thanks,
> Maryann
>
>
> On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad <kh...@gmail.com>
> wrote:
>
>>
>> Dear Maryann,
>>
>> I tried your query and it worked. I also executed a more complex query
>> which I need in my testing, which is:
>>
>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>> AS W
>> where EXISTS (select * from "History" AS H
>>                         where
>> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId")
>>                         AND H."timestamp" = (select MAX(H2."timestamp")
>> from "History" AS H2
>>                                                           where
>> (REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') =
>>
>> REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+'))
>>                                                           AND
>> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000))
>> AND NOT EXISTS (select * from "History" AS H3
>>                               where
>> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
>>                                 AND (H3."status" = 'pending') AND
>> (TO_NUMBER(H3."value") < 1000));
>>
>> and it also worked after I understood your fix of the original query.
>>
>> So the trick here is that we can use the reference to the outer query
>> within the next subquery level only, and not up to two levels as I was
>> doing? Maybe this limitation exists because Phoenix joins the tables from
>> the outer and the inner correlated query, but it can perform this join up
>> to one level only?
>>
>>
>>
>> Best,
>> Khaleel
>>
>>
>> On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue <ma...@gmail.com>
>> wrote:
>>
>>> Hi Khaleel,
>>>
>>> Mind if you try the following query? I think it's the same semantics as
>>> you meant for your query.
>>>
>>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>>> AS W
>>> where
>>> W."rowId" in (select H"rowId" from "History" AS H
>>>                          where H."timestamp" =
>>>                                 (select MAX(H2."timestamp") from
>>> "History" AS H2 where H2."rowId" = H."rowId")
>>>                          AND
>>>  H."status" = 'approved')
>>>
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad <kh...@gmail.com>
>>> wrote:
>>>
>>>> Dear Maryann,
>>>>
>>>> Thanks for your question. You are right: the query that I was writing
>>>> wasn't the correct one for my purpose. The query that will satisfy my
>>>> request would be:
>>>>
>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>> "Wiki" AS W
>>>> where
>>>> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") AND
>>>> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2
>>>> where (H2."rowId" = W."rowId")) AND
>>>>  (H."status" = 'approved') )
>>>>
>>>> In this query I specify that I need to select the exact version which
>>>> has the Maximum timestamp among all versions of the same data item by using
>>>> MAX(H2."timestamp") within the inner subquery within EXISTS.
>>>>
>>>> However I tried such query and it still produces the same error as the
>>>> old query, which is: (Aggregate may not contain columns not in GROUP
>>>> BY.)
>>>>
>>>> Thank you for your help.
>>>>
>>>>
>>>> Best,
>>>> Khaleel
>>>>
>>>>
>>>> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <ma...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Khaleel,
>>>>>
>>>>> Thanks for the explanation! But my question was since this is an
>>>>> EXISTS, I assume if there ever is a H."timestamp" with an 'approved'
>>>>> status, the EXISTS will return true regardless of whether you are testing
>>>>> H."timestamp" or Max(H."timestamp"). Is that correct? or have I missed
>>>>> something?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Maryann
>>>>>
>>>>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <khellom007@gmail.com
>>>>> > wrote:
>>>>>
>>>>>>
>>>>>> Hello Maryann,
>>>>>>
>>>>>> Thanks very much for your reply. Hopefully this bug gets fixed in the
>>>>>> next release so that I can continue working with this part in my research
>>>>>> project. Thanks for keeping me posted.
>>>>>>
>>>>>> With respect to your question, I am using the "History" table as a
>>>>>> data store of all versions of a certain data item. When I say Max(
>>>>>> H."timestamp") inside the query, I am selecting the latest version
>>>>>> (most recent) that is "approved" (which is checked using the condition H."status"
>>>>>> = 'approved')
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> Khaleel
>>>>>>
>>>>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <ma...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Khaleel,
>>>>>>>
>>>>>>> Thanks a lot for reporting the problem, which looks like a bug. I
>>>>>>> will file a JIRA and keep you posted.
>>>>>>>
>>>>>>> One question though, why would we use MAX(H."timestamp") instead of H."timestamp"?
>>>>>>> What difference would it make?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maryann
>>>>>>>
>>>>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <
>>>>>>> khellom007@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I tried executing the following query in Phoenix:
>>>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>>>>> "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History" AS H
>>>>>>>> where (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>>>>>>>
>>>>>>>> I got the following error:
>>>>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP
>>>>>>>> BY. (state=42Y27,code=1018)
>>>>>>>>
>>>>>>>> If I replace W."rowId" (in the where clause) with a constant value
>>>>>>>> (for example: '3587'), the query works fine. Also, if I replace the
>>>>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also works fine.
>>>>>>>>
>>>>>>>> So it seems that Phoenix generates error when using a reference to
>>>>>>>> an outer query while using an aggregate within the inner query.
>>>>>>>>
>>>>>>>> Any solutions?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Khaleel Mershad, Ph.D.
>>>>>>>> Research Associate
>>>>>>>>
>>>>>>>> American University of Beirut
>>>>>>>> Department of Electrical and Computer Engineering
>>>>>>>> Bliss Street, Beirut, Lebanon
>>>>>>>> email: kwm03@aub.edu.lb
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Error when using aggregates with correlated subqueries

Posted by Maryann Xue <ma...@gmail.com>.
Glad that your queries worked. Please do let us know if any further
problems.

The bug is with EXISTS. Right now Phoenix does not handle aggregate
function calls correctly within an EXISTS subquery or EXISTS nested
subquery. I have opened a JIRA:
https://issues.apache.org/jira/browse/PHOENIX-1923.


Thanks,
Maryann

On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad <kh...@gmail.com>
wrote:

>
> Dear Maryann,
>
> I tried your query and it worked. I also executed a more complex query
> which I need in my testing, which is:
>
> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
> AS W
> where EXISTS (select * from "History" AS H
>                         where (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+')
> = W."rowId")
>                         AND H."timestamp" = (select MAX(H2."timestamp")
> from "History" AS H2
>                                                           where
> (REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') =
>
> REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+'))
>                                                           AND (H2."status"
> = 'approved')) AND (TO_NUMBER(H."value") > 1000))
> AND NOT EXISTS (select * from "History" AS H3
>                               where
> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
>                                 AND (H3."status" = 'pending') AND
> (TO_NUMBER(H3."value") < 1000));
>
> and it also worked after I understood your fix of the original query.
>
> So the trick here is that we can use the reference to the outer query
> within the next subquery level only, and not up to two levels as I was
> doing? Maybe this limitation exists because Phoenix joins the tables from
> the outer and the inner correlated query, but it can perform this join up
> to one level only?
>
>
>
> Best,
> Khaleel
>
>
> On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue <ma...@gmail.com>
> wrote:
>
>> Hi Khaleel,
>>
>> Mind if you try the following query? I think it's the same semantics as
>> you meant for your query.
>>
>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>> AS W
>> where
>> W."rowId" in (select H"rowId" from "History" AS H
>>                          where H."timestamp" =
>>                                 (select MAX(H2."timestamp") from
>> "History" AS H2 where H2."rowId" = H."rowId")
>>                          AND
>>  H."status" = 'approved')
>>
>>
>>
>> Thanks,
>> Maryann
>>
>> On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad <kh...@gmail.com>
>> wrote:
>>
>>> Dear Maryann,
>>>
>>> Thanks for your question. You are right: the query that I was writing
>>> wasn't the correct one for my purpose. The query that will satisfy my
>>> request would be:
>>>
>>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>>> AS W
>>> where
>>> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") AND
>>> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2
>>> where (H2."rowId" = W."rowId")) AND
>>>  (H."status" = 'approved') )
>>>
>>> In this query I specify that I need to select the exact version which
>>> has the Maximum timestamp among all versions of the same data item by using
>>> MAX(H2."timestamp") within the inner subquery within EXISTS.
>>>
>>> However I tried such query and it still produces the same error as the
>>> old query, which is: (Aggregate may not contain columns not in GROUP BY.
>>> )
>>>
>>> Thank you for your help.
>>>
>>>
>>> Best,
>>> Khaleel
>>>
>>>
>>> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <ma...@gmail.com>
>>> wrote:
>>>
>>>> Hi Khaleel,
>>>>
>>>> Thanks for the explanation! But my question was since this is an
>>>> EXISTS, I assume if there ever is a H."timestamp" with an 'approved'
>>>> status, the EXISTS will return true regardless of whether you are testing
>>>> H."timestamp" or Max(H."timestamp"). Is that correct? or have I missed
>>>> something?
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <kh...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hello Maryann,
>>>>>
>>>>> Thanks very much for your reply. Hopefully this bug gets fixed in the
>>>>> next release so that I can continue working with this part in my research
>>>>> project. Thanks for keeping me posted.
>>>>>
>>>>> With respect to your question, I am using the "History" table as a
>>>>> data store of all versions of a certain data item. When I say Max(
>>>>> H."timestamp") inside the query, I am selecting the latest version
>>>>> (most recent) that is "approved" (which is checked using the condition H."status"
>>>>> = 'approved')
>>>>>
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Khaleel
>>>>>
>>>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <ma...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Khaleel,
>>>>>>
>>>>>> Thanks a lot for reporting the problem, which looks like a bug. I
>>>>>> will file a JIRA and keep you posted.
>>>>>>
>>>>>> One question though, why would we use MAX(H."timestamp") instead of H."timestamp"?
>>>>>> What difference would it make?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Maryann
>>>>>>
>>>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <
>>>>>> khellom007@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I tried executing the following query in Phoenix:
>>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>>>> "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History" AS H
>>>>>>> where (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>>>>>>
>>>>>>> I got the following error:
>>>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP
>>>>>>> BY. (state=42Y27,code=1018)
>>>>>>>
>>>>>>> If I replace W."rowId" (in the where clause) with a constant value
>>>>>>> (for example: '3587'), the query works fine. Also, if I replace the
>>>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also works fine.
>>>>>>>
>>>>>>> So it seems that Phoenix generates error when using a reference to
>>>>>>> an outer query while using an aggregate within the inner query.
>>>>>>>
>>>>>>> Any solutions?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Khaleel Mershad, Ph.D.
>>>>>>> Research Associate
>>>>>>>
>>>>>>> American University of Beirut
>>>>>>> Department of Electrical and Computer Engineering
>>>>>>> Bliss Street, Beirut, Lebanon
>>>>>>> email: kwm03@aub.edu.lb
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Error when using aggregates with correlated subqueries

Posted by khaleel mershad <kh...@gmail.com>.
Dear Maryann,

I tried your query and it worked. I also executed a more complex query
which I need in my testing, which is:

select distinct W."rowId", W."md5_current", W."size_current" from "Wiki" AS
W
where EXISTS (select * from "History" AS H
                        where (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+')
= W."rowId")
                        AND H."timestamp" = (select MAX(H2."timestamp")
from "History" AS H2
                                                          where
(REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') =

REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+'))
                                                          AND (H2."status"
= 'approved')) AND (TO_NUMBER(H."value") > 1000))
AND NOT EXISTS (select * from "History" AS H3
                              where
(REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
                                AND (H3."status" = 'pending') AND
(TO_NUMBER(H3."value") < 1000));

and it also worked after I understood your fix of the original query.

So the trick here is that we can use the reference to the outer query
within the next subquery level only, and not up to two levels as I was
doing? Maybe this limitation exists because Phoenix joins the tables from
the outer and the inner correlated query, but it can perform this join up
to one level only?



Best,
Khaleel


On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue <ma...@gmail.com> wrote:

> Hi Khaleel,
>
> Mind if you try the following query? I think it's the same semantics as
> you meant for your query.
>
> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
> AS W
> where
> W."rowId" in (select H"rowId" from "History" AS H
>                          where H."timestamp" =
>                                 (select MAX(H2."timestamp") from "History"
> AS H2 where H2."rowId" = H."rowId")
>                          AND
>  H."status" = 'approved')
>
>
>
> Thanks,
> Maryann
>
> On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad <kh...@gmail.com>
> wrote:
>
>> Dear Maryann,
>>
>> Thanks for your question. You are right: the query that I was writing
>> wasn't the correct one for my purpose. The query that will satisfy my
>> request would be:
>>
>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>> AS W
>> where
>> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") AND
>> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2
>> where (H2."rowId" = W."rowId")) AND
>>  (H."status" = 'approved') )
>>
>> In this query I specify that I need to select the exact version which has
>> the Maximum timestamp among all versions of the same data item by using
>> MAX(H2."timestamp") within the inner subquery within EXISTS.
>>
>> However I tried such query and it still produces the same error as the
>> old query, which is: (Aggregate may not contain columns not in GROUP BY.)
>>
>> Thank you for your help.
>>
>>
>> Best,
>> Khaleel
>>
>>
>> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <ma...@gmail.com>
>> wrote:
>>
>>> Hi Khaleel,
>>>
>>> Thanks for the explanation! But my question was since this is an EXISTS,
>>> I assume if there ever is a H."timestamp" with an 'approved' status, the
>>> EXISTS will return true regardless of whether you are testing H."timestamp"
>>> or Max(H."timestamp"). Is that correct? or have I missed something?
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <kh...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hello Maryann,
>>>>
>>>> Thanks very much for your reply. Hopefully this bug gets fixed in the
>>>> next release so that I can continue working with this part in my research
>>>> project. Thanks for keeping me posted.
>>>>
>>>> With respect to your question, I am using the "History" table as a data
>>>> store of all versions of a certain data item. When I say Max(
>>>> H."timestamp") inside the query, I am selecting the latest version
>>>> (most recent) that is "approved" (which is checked using the condition H."status"
>>>> = 'approved')
>>>>
>>>>
>>>>
>>>> Best Regards,
>>>> Khaleel
>>>>
>>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <ma...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Khaleel,
>>>>>
>>>>> Thanks a lot for reporting the problem, which looks like a bug. I will
>>>>> file a JIRA and keep you posted.
>>>>>
>>>>> One question though, why would we use MAX(H."timestamp") instead of H."timestamp"?
>>>>> What difference would it make?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Maryann
>>>>>
>>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <khellom007@gmail.com
>>>>> > wrote:
>>>>>
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I tried executing the following query in Phoenix:
>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>>> "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History" AS H
>>>>>> where (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>>>>>
>>>>>> I got the following error:
>>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP
>>>>>> BY. (state=42Y27,code=1018)
>>>>>>
>>>>>> If I replace W."rowId" (in the where clause) with a constant value
>>>>>> (for example: '3587'), the query works fine. Also, if I replace the
>>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also works fine.
>>>>>>
>>>>>> So it seems that Phoenix generates error when using a reference to an
>>>>>> outer query while using an aggregate within the inner query.
>>>>>>
>>>>>> Any solutions?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Khaleel Mershad, Ph.D.
>>>>>> Research Associate
>>>>>>
>>>>>> American University of Beirut
>>>>>> Department of Electrical and Computer Engineering
>>>>>> Bliss Street, Beirut, Lebanon
>>>>>> email: kwm03@aub.edu.lb
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Error when using aggregates with correlated subqueries

Posted by Maryann Xue <ma...@gmail.com>.
Hi Khaleel,

Mind if you try the following query? I think it's the same semantics as you
meant for your query.

select distinct W."rowId", W."md5_current", W."size_current" from "Wiki" AS
W
where
W."rowId" in (select H"rowId" from "History" AS H
                         where H."timestamp" =
                                (select MAX(H2."timestamp") from "History"
AS H2 where H2."rowId" = H."rowId")
                         AND
 H."status" = 'approved')



Thanks,
Maryann

On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad <kh...@gmail.com>
wrote:

> Dear Maryann,
>
> Thanks for your question. You are right: the query that I was writing
> wasn't the correct one for my purpose. The query that will satisfy my
> request would be:
>
> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
> AS W
> where
> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") AND
> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2
> where (H2."rowId" = W."rowId")) AND
>  (H."status" = 'approved') )
>
> In this query I specify that I need to select the exact version which has
> the Maximum timestamp among all versions of the same data item by using
> MAX(H2."timestamp") within the inner subquery within EXISTS.
>
> However I tried such query and it still produces the same error as the old
> query, which is: (Aggregate may not contain columns not in GROUP BY.)
>
> Thank you for your help.
>
>
> Best,
> Khaleel
>
>
> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <ma...@gmail.com>
> wrote:
>
>> Hi Khaleel,
>>
>> Thanks for the explanation! But my question was since this is an EXISTS,
>> I assume if there ever is a H."timestamp" with an 'approved' status, the
>> EXISTS will return true regardless of whether you are testing H."timestamp"
>> or Max(H."timestamp"). Is that correct? or have I missed something?
>>
>>
>> Thanks,
>> Maryann
>>
>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <kh...@gmail.com>
>> wrote:
>>
>>>
>>> Hello Maryann,
>>>
>>> Thanks very much for your reply. Hopefully this bug gets fixed in the
>>> next release so that I can continue working with this part in my research
>>> project. Thanks for keeping me posted.
>>>
>>> With respect to your question, I am using the "History" table as a data
>>> store of all versions of a certain data item. When I say Max(
>>> H."timestamp") inside the query, I am selecting the latest version
>>> (most recent) that is "approved" (which is checked using the condition H."status"
>>> = 'approved')
>>>
>>>
>>>
>>> Best Regards,
>>> Khaleel
>>>
>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <ma...@gmail.com>
>>> wrote:
>>>
>>>> Hi Khaleel,
>>>>
>>>> Thanks a lot for reporting the problem, which looks like a bug. I will
>>>> file a JIRA and keep you posted.
>>>>
>>>> One question though, why would we use MAX(H."timestamp") instead of H."timestamp"?
>>>> What difference would it make?
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <kh...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> I tried executing the following query in Phoenix:
>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>> "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History" AS H
>>>>> where (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>>>>
>>>>> I got the following error:
>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP BY.
>>>>> (state=42Y27,code=1018)
>>>>>
>>>>> If I replace W."rowId" (in the where clause) with a constant value
>>>>> (for example: '3587'), the query works fine. Also, if I replace the
>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also works fine.
>>>>>
>>>>> So it seems that Phoenix generates error when using a reference to an
>>>>> outer query while using an aggregate within the inner query.
>>>>>
>>>>> Any solutions?
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Khaleel Mershad, Ph.D.
>>>>> Research Associate
>>>>>
>>>>> American University of Beirut
>>>>> Department of Electrical and Computer Engineering
>>>>> Bliss Street, Beirut, Lebanon
>>>>> email: kwm03@aub.edu.lb
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Error when using aggregates with correlated subqueries

Posted by khaleel mershad <kh...@gmail.com>.
Dear Maryann,

Thanks for your question. You are right: the query that I was writing
wasn't the correct one for my purpose. The query that will satisfy my
request would be:

select distinct W."rowId", W."md5_current", W."size_current" from "Wiki" AS
W
where
EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") AND
H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2
where (H2."rowId" = W."rowId")) AND
 (H."status" = 'approved') )

In this query I specify that I need to select the exact version which has
the Maximum timestamp among all versions of the same data item by using
MAX(H2."timestamp") within the inner subquery within EXISTS.

However I tried such query and it still produces the same error as the old
query, which is: (Aggregate may not contain columns not in GROUP BY.)

Thank you for your help.


Best,
Khaleel


On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <ma...@gmail.com> wrote:

> Hi Khaleel,
>
> Thanks for the explanation! But my question was since this is an EXISTS, I
> assume if there ever is a H."timestamp" with an 'approved' status, the
> EXISTS will return true regardless of whether you are testing H."timestamp"
> or Max(H."timestamp"). Is that correct? or have I missed something?
>
>
> Thanks,
> Maryann
>
> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <kh...@gmail.com>
> wrote:
>
>>
>> Hello Maryann,
>>
>> Thanks very much for your reply. Hopefully this bug gets fixed in the
>> next release so that I can continue working with this part in my research
>> project. Thanks for keeping me posted.
>>
>> With respect to your question, I am using the "History" table as a data
>> store of all versions of a certain data item. When I say Max(
>> H."timestamp") inside the query, I am selecting the latest version (most
>> recent) that is "approved" (which is checked using the condition H."status"
>> = 'approved')
>>
>>
>>
>> Best Regards,
>> Khaleel
>>
>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <ma...@gmail.com>
>> wrote:
>>
>>> Hi Khaleel,
>>>
>>> Thanks a lot for reporting the problem, which looks like a bug. I will
>>> file a JIRA and keep you posted.
>>>
>>> One question though, why would we use MAX(H."timestamp") instead of H."timestamp"?
>>> What difference would it make?
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <kh...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hello,
>>>>
>>>> I tried executing the following query in Phoenix:
>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>> "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History" AS H
>>>> where (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>>>
>>>> I got the following error:
>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP BY.
>>>> (state=42Y27,code=1018)
>>>>
>>>> If I replace W."rowId" (in the where clause) with a constant value (for
>>>> example: '3587'), the query works fine. Also, if I replace the aggregate
>>>> MAX(H."timestamp") with H."timestamp", the query also works fine.
>>>>
>>>> So it seems that Phoenix generates error when using a reference to an
>>>> outer query while using an aggregate within the inner query.
>>>>
>>>> Any solutions?
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Khaleel Mershad, Ph.D.
>>>> Research Associate
>>>>
>>>> American University of Beirut
>>>> Department of Electrical and Computer Engineering
>>>> Bliss Street, Beirut, Lebanon
>>>> email: kwm03@aub.edu.lb
>>>>
>>>
>>>
>>
>

Re: Error when using aggregates with correlated subqueries

Posted by Maryann Xue <ma...@gmail.com>.
Hi Khaleel,

Thanks for the explanation! But my question was since this is an EXISTS, I
assume if there ever is a H."timestamp" with an 'approved' status, the
EXISTS will return true regardless of whether you are testing H."timestamp"
or Max(H."timestamp"). Is that correct? or have I missed something?


Thanks,
Maryann

On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <kh...@gmail.com>
wrote:

>
> Hello Maryann,
>
> Thanks very much for your reply. Hopefully this bug gets fixed in the next
> release so that I can continue working with this part in my research
> project. Thanks for keeping me posted.
>
> With respect to your question, I am using the "History" table as a data
> store of all versions of a certain data item. When I say Max(H."timestamp")
> inside the query, I am selecting the latest version (most recent) that is
> "approved" (which is checked using the condition H."status" = 'approved')
>
>
>
> Best Regards,
> Khaleel
>
> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <ma...@gmail.com>
> wrote:
>
>> Hi Khaleel,
>>
>> Thanks a lot for reporting the problem, which looks like a bug. I will
>> file a JIRA and keep you posted.
>>
>> One question though, why would we use MAX(H."timestamp") instead of H."timestamp"?
>> What difference would it make?
>>
>>
>> Thanks,
>> Maryann
>>
>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <kh...@gmail.com>
>> wrote:
>>
>>>
>>> Hello,
>>>
>>> I tried executing the following query in Phoenix:
>>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>>> AS W where EXISTS( select MAX(H."timestamp") from "History" AS H where
>>> (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>>
>>> I got the following error:
>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP BY.
>>> (state=42Y27,code=1018)
>>>
>>> If I replace W."rowId" (in the where clause) with a constant value (for
>>> example: '3587'), the query works fine. Also, if I replace the aggregate
>>> MAX(H."timestamp") with H."timestamp", the query also works fine.
>>>
>>> So it seems that Phoenix generates error when using a reference to an
>>> outer query while using an aggregate within the inner query.
>>>
>>> Any solutions?
>>>
>>>
>>>
>>> Regards,
>>>
>>> Khaleel Mershad, Ph.D.
>>> Research Associate
>>>
>>> American University of Beirut
>>> Department of Electrical and Computer Engineering
>>> Bliss Street, Beirut, Lebanon
>>> email: kwm03@aub.edu.lb
>>>
>>
>>
>

Re: Error when using aggregates with correlated subqueries

Posted by khaleel mershad <kh...@gmail.com>.
Hello Maryann,

Thanks very much for your reply. Hopefully this bug gets fixed in the next
release so that I can continue working with this part in my research
project. Thanks for keeping me posted.

With respect to your question, I am using the "History" table as a data
store of all versions of a certain data item. When I say Max(H."timestamp")
inside the query, I am selecting the latest version (most recent) that is
"approved" (which is checked using the condition H."status" = 'approved')



Best Regards,
Khaleel

On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <ma...@gmail.com> wrote:

> Hi Khaleel,
>
> Thanks a lot for reporting the problem, which looks like a bug. I will
> file a JIRA and keep you posted.
>
> One question though, why would we use MAX(H."timestamp") instead of H."timestamp"?
> What difference would it make?
>
>
> Thanks,
> Maryann
>
> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <kh...@gmail.com>
> wrote:
>
>>
>> Hello,
>>
>> I tried executing the following query in Phoenix:
>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>> AS W where EXISTS( select MAX(H."timestamp") from "History" AS H where
>> (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>
>> I got the following error:
>> ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP BY.
>> (state=42Y27,code=1018)
>>
>> If I replace W."rowId" (in the where clause) with a constant value (for
>> example: '3587'), the query works fine. Also, if I replace the aggregate
>> MAX(H."timestamp") with H."timestamp", the query also works fine.
>>
>> So it seems that Phoenix generates error when using a reference to an
>> outer query while using an aggregate within the inner query.
>>
>> Any solutions?
>>
>>
>>
>> Regards,
>>
>> Khaleel Mershad, Ph.D.
>> Research Associate
>>
>> American University of Beirut
>> Department of Electrical and Computer Engineering
>> Bliss Street, Beirut, Lebanon
>> email: kwm03@aub.edu.lb
>>
>
>

Re: Error when using aggregates with correlated subqueries

Posted by Maryann Xue <ma...@gmail.com>.
Hi Khaleel,

Thanks a lot for reporting the problem, which looks like a bug. I will file
a JIRA and keep you posted.

One question though, why would we use MAX(H."timestamp") instead of
H."timestamp"?
What difference would it make?


Thanks,
Maryann

On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <kh...@gmail.com>
wrote:

>
> Hello,
>
> I tried executing the following query in Phoenix:
> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
> AS W where EXISTS( select MAX(H."timestamp") from "History" AS H where
> (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>
> I got the following error:
> ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP BY.
> (state=42Y27,code=1018)
>
> If I replace W."rowId" (in the where clause) with a constant value (for
> example: '3587'), the query works fine. Also, if I replace the aggregate
> MAX(H."timestamp") with H."timestamp", the query also works fine.
>
> So it seems that Phoenix generates error when using a reference to an
> outer query while using an aggregate within the inner query.
>
> Any solutions?
>
>
>
> Regards,
>
> Khaleel Mershad, Ph.D.
> Research Associate
>
> American University of Beirut
> Department of Electrical and Computer Engineering
> Bliss Street, Beirut, Lebanon
> email: kwm03@aub.edu.lb
>