You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Ankit Singhal <an...@gmail.com> on 2021/10/11 23:29:58 UTC

Re: Major problem for us with Phoenix joins with certain aggregations

-user@hbase.apache.org,+user@phoenix.apache.org

Are you trying to use the secondary index directly in the join query
(VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX) or is it just another table? as
we don't support direct reference to the secondary index, so either you
need to use HINTs or refactor your query/schema to use indices
automatically.

Regards,
Ankit Singhal

On Mon, Oct 11, 2021 at 2:08 PM Josh Elser <el...@apache.org> wrote:

> No worries. Thanks for confirming!
>
> On 10/10/21 1:43 PM, Simon Mottram wrote:
> > Hi
> >
> > Thanks for the reply, I posted here by mistake and wasn't sure how to
> delete.  It's indeed a problem with phoenix
> >
> > Sorry to waste your time
> >
> > Cheers
> >
> > S
> >
> >
> >
> >
> > ________________________________
> > From: Josh Elser <el...@apache.org>
> > Sent: Saturday, 9 October 2021 3:25 am
> > To: user@hbase.apache.org
> > Subject: Re: Major problem for us with Phoenix joins with certain
> aggregations
> >
> > That error sounds like a bug in Phoenix.
> >
> > Maybe you could try with a newer version of Phoenix? Asking over on
> > user@phoenix might net a better result.
> >
> > On 9/27/21 11:47 PM, Simon Mottram wrote:
> >> Forgot to mention this is only an issue for LAST_VALUE (so far!)
> >>
> >> This works fine
> >>
> >>    SELECT
> >> "BIOMATERIAL_NAME",
> >> AVG("PLANT_FRUIT_COUNT")
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION
> >> WHERE
> >> EXISTS (
> >> SELECT
> >> DOCID
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
> >> WHERE
> >> DOCID = OBSERVATION_VALUE_ID
> >> AND TAGNAME = 'TRIAL_ID'
> >> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
> >> GROUP BY
> >> "BIOMATERIAL_NAME"
> >> LIMIT 10
> >>    OFFSET 0;
> >> ________________________________
> >> From: Simon Mottram <Si...@cucumber.co.nz>
> >> Sent: 28 September 2021 4:34 PM
> >> To: user@hbase.apache.org <us...@hbase.apache.org>
> >> Subject: Major problem for us with Phoenix joins with certain
> aggregations
> >>
> >> Hi
> >>
> >> Got my fingers crossed that there's a work around for this as this
> really is a big problem for us
> >>
> >> We are using:
> >>
> >> Amazon EMR
> >>
> >> Release label:emr-6.1.0
> >> Hadoop distribution:Amazon
> >> Applications:Hbase 2.2.5, Hive 3.1.2, Phoenix 5.0.0, Pig 0.17.0
> >>
> >> Thin Client version:
> >> phoenix-5.0.0-HBase-2.0-thin-client.jar
> >>
> >> We get the following error when doing an aggregation where
> >>
> >>     1.  A JOIN is empty
> >>     2.  The column is INTEGER or DATETIME
> >>
> >> Remote driver error: IllegalArgumentException: offset (25) + length (4)
> exceed the capacity of the array: 25
> >>
> >> The query that breaks is:
> >>
> >> SELECT
> >> "BIOMATERIAL_NAME",
> >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION
> >> JOIN VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
> >> ON DOCID = OBSERVATION_VALUE_ID
> >>     AND TAGNAME = 'TRIAL_ID'
> >> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe'
> >> GROUP BY
> >> "BIOMATERIAL_NAME"
> >> LIMIT 10
> >>    OFFSET 0;
> >>
> >> I can refactor this using EXIST but get same error, presumably the
> driver knows to treat them the same:
> >>
> >> SELECT
> >> "BIOMATERIAL_NAME",
> >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION
> >> WHERE
> >> EXISTS (
> >> SELECT
> >> DOCID
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
> >> WHERE
> >> DOCID = OBSERVATION_VALUE_ID
> >> AND TAGNAME = 'TRIAL_ID'
> >> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
> >> GROUP BY
> >> "BIOMATERIAL_NAME"
> >> LIMIT 10
> >>    OFFSET 0;
> >>
> >> If we remove the external reference we get no error, regardless of
> whether there are any hits or not
> >>
> >> -- these all work
> >> There are no hits for this query
> >>
> >> SELECT
> >> "BIOMATERIAL_NAME",
> >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION
> >> WHERE
> >> BIOMATERIAL_TYPE = 'aardvark'
> >> GROUP BY
> >> "BIOMATERIAL_NAME"
> >> LIMIT 10
> >>    OFFSET 0;
> >>
> >> Lots of hits for this query:
> >>
> >> SELECT
> >> "BIOMATERIAL_NAME",
> >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION
> >> GROUP BY
> >> "BIOMATERIAL_NAME"
> >> LIMIT 10
> >>    OFFSET 0;
> >>
> >>
> >>
> >>
> >>
> >
>

Re: Major problem for us with Phoenix joins with certain aggregations

Posted by Simon Mottram <Si...@cucumber.co.nz>.
Hiya, no it's not a phoenix index. It's my own table.

As it turns out, the bug turns up when there is any filtering, either via join or via a where clause.

Check my last example which has a simple Where TRUE


________________________________
From: Ankit Singhal <an...@gmail.com>
Sent: Tuesday, 12 October 2021 12:30 pm
To: user
Subject: Re: Major problem for us with Phoenix joins with certain aggregations

-user@hbase.apache.org<ma...@phoenix.apache.org>

Are you trying to use the secondary index directly in the join query (VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX) or is it just another table? as we don't support direct reference to the secondary index, so either you need to use HINTs or refactor your query/schema to use indices automatically.

Regards,
Ankit Singhal

On Mon, Oct 11, 2021 at 2:08 PM Josh Elser <el...@apache.org>> wrote:
No worries. Thanks for confirming!

On 10/10/21 1:43 PM, Simon Mottram wrote:
> Hi
>
> Thanks for the reply, I posted here by mistake and wasn't sure how to delete.  It's indeed a problem with phoenix
>
> Sorry to waste your time
>
> Cheers
>
> S
>
>
>
>
> ________________________________
> From: Josh Elser <el...@apache.org>>
> Sent: Saturday, 9 October 2021 3:25 am
> To: user@hbase.apache.org<ma...@hbase.apache.org>
> Subject: Re: Major problem for us with Phoenix joins with certain aggregations
>
> That error sounds like a bug in Phoenix.
>
> Maybe you could try with a newer version of Phoenix? Asking over on
> user@phoenix might net a better result.
>
> On 9/27/21 11:47 PM, Simon Mottram wrote:
>> Forgot to mention this is only an issue for LAST_VALUE (so far!)
>>
>> This works fine
>>
>>    SELECT
>> "BIOMATERIAL_NAME",
>> AVG("PLANT_FRUIT_COUNT")
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION
>> WHERE
>> EXISTS (
>> SELECT
>> DOCID
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
>> WHERE
>> DOCID = OBSERVATION_VALUE_ID
>> AND TAGNAME = 'TRIAL_ID'
>> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
>> GROUP BY
>> "BIOMATERIAL_NAME"
>> LIMIT 10
>>    OFFSET 0;
>> ________________________________
>> From: Simon Mottram <Si...@cucumber.co.nz>>
>> Sent: 28 September 2021 4:34 PM
>> To: user@hbase.apache.org<ma...@hbase.apache.org> <us...@hbase.apache.org>>
>> Subject: Major problem for us with Phoenix joins with certain aggregations
>>
>> Hi
>>
>> Got my fingers crossed that there's a work around for this as this really is a big problem for us
>>
>> We are using:
>>
>> Amazon EMR
>>
>> Release label:emr-6.1.0
>> Hadoop distribution:Amazon
>> Applications:Hbase 2.2.5, Hive 3.1.2, Phoenix 5.0.0, Pig 0.17.0
>>
>> Thin Client version:
>> phoenix-5.0.0-HBase-2.0-thin-client.jar
>>
>> We get the following error when doing an aggregation where
>>
>>     1.  A JOIN is empty
>>     2.  The column is INTEGER or DATETIME
>>
>> Remote driver error: IllegalArgumentException: offset (25) + length (4) exceed the capacity of the array: 25
>>
>> The query that breaks is:
>>
>> SELECT
>> "BIOMATERIAL_NAME",
>> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
>> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION
>> JOIN VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
>> ON DOCID = OBSERVATION_VALUE_ID
>>     AND TAGNAME = 'TRIAL_ID'
>> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe'
>> GROUP BY
>> "BIOMATERIAL_NAME"
>> LIMIT 10
>>    OFFSET 0;
>>
>> I can refactor this using EXIST but get same error, presumably the driver knows to treat them the same:
>>
>> SELECT
>> "BIOMATERIAL_NAME",
>> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
>> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION
>> WHERE
>> EXISTS (
>> SELECT
>> DOCID
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
>> WHERE
>> DOCID = OBSERVATION_VALUE_ID
>> AND TAGNAME = 'TRIAL_ID'
>> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
>> GROUP BY
>> "BIOMATERIAL_NAME"
>> LIMIT 10
>>    OFFSET 0;
>>
>> If we remove the external reference we get no error, regardless of whether there are any hits or not
>>
>> -- these all work
>> There are no hits for this query
>>
>> SELECT
>> "BIOMATERIAL_NAME",
>> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
>> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION
>> WHERE
>> BIOMATERIAL_TYPE = 'aardvark'
>> GROUP BY
>> "BIOMATERIAL_NAME"
>> LIMIT 10
>>    OFFSET 0;
>>
>> Lots of hits for this query:
>>
>> SELECT
>> "BIOMATERIAL_NAME",
>> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
>> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION
>> GROUP BY
>> "BIOMATERIAL_NAME"
>> LIMIT 10
>>    OFFSET 0;
>>
>>
>>
>>
>>
>