You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Maryann Xue <ma...@gmail.com> on 2015/04/07 19:43:37 UTC

Re: indexed query question

Hi Ralph,

Now that we have implemented (on master)
https://issues.apache.org/jira/browse/PHOENIX-1580 and
https://issues.apache.org/jira/browse/PHOENIX-1807, I think you should be
able to enable your multiple index use case, which actually looks great to
us. And please let us know if there are other problems.

For the AND logic, you still need to use multiple subqueries connected by
AND, like I mentioned before:

SELECT * FROM t WHERE pk IN (SELECT pk from t where q1 = ?) AND pk IN
(SELECT pk from t where q2 = ?);

And for the OR logic, you can now use UNION ALL in your subquery connecting
those "or" conditions. For example:

SELECT * FROM t WHERE pk IN (SELECT pk from t where q1 = ? UNION ALL SELECT
pk from t where q2 = ?);





Thanks,
Maryann

On Mon, Jan 19, 2015 at 2:34 PM, Maryann Xue <ma...@gmail.com> wrote:

> Hi Ralph,
>
> I think in your case this is indeed a nice approach. Given that INTERSECT
> is not yet supported in Phoenix, you can instead use AND to connect your
> conditions, which would work almost as efficiently as applying INTERSECT on
> your inner queries:
>
> SELECT * FROM t WHERE pk IN (SELECT pk from t where q1 = ?) AND pk IN
> (SELECT pk from t where q2 = ?);
>
> If your query is of OR logic, unfortunately there is currently no optimal
> way to do this until we have UNION (
> https://issues.apache.org/jira/browse/PHOENIX-671) implemented. That is
> because with OR subqueries, we do a left join instead of a semi join, and
> left joins do not use PK skip scan according to its semantics. A query
> similar to the above example with OR logic would only run slower than one
> simply doing a full-scan on the main data table (without using the index
> table at all).
>
>
>
> Thanks,
> Maryann
>
>
> On Mon, Jan 19, 2015 at 1:25 PM, Perko, Ralph J <Ra...@pnnl.gov>
> wrote:
>
>>   Hi
>>
>>  I have a question about the most efficient way to query many indexed
>> columns.  Here is the scenario:
>>
>>  Say I have a table with 100 fields
>>
>>  Table {f1,f2,f3,f4,…f100}
>>
>>  The first 10 fields are core fields and the client wishes to query them
>> in any combination.
>>
>>  This is too many fields to create a secondary index for every
>> combination so I create just 10, one for each core field.
>>
>>  Then when a query is submitted, I create a single query for each
>> secondary index and return just the PK, thus taking advantage to the index:
>>
>>  Example – for each indexed field included in the query:
>>
>>  SELECT PK FROM Table WHERE <indexed field> = <some value>’;
>>
>>  Then once I have a list of all PK values from all the index queries I
>> will either get the combination or intersection of all PKs, depending on
>> the query logic (and/or)and then run a final select:
>>
>>  SELECT * FROM Table WHERE pk IN(…);
>>
>>  Does this sound like a reasonable approach?
>>
>>  Thanks,
>> Ralph
>>
>>
>

Re: indexed query question

Posted by "Perko, Ralph J" <Ra...@pnnl.gov>.
Maryann,

Thanks for the fix.  Which versions are these available in?

Ralph

__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory
(509) 375-2272
ralph.perko@pnnl.gov

From: Maryann Xue <ma...@gmail.com>>
Reply-To: "user@phoenix.apache.org<ma...@phoenix.apache.org>" <us...@phoenix.apache.org>>
Date: Tuesday, April 7, 2015 at 10:43 AM
To: "user@phoenix.apache.org<ma...@phoenix.apache.org>" <us...@phoenix.apache.org>>
Subject: Re: indexed query question

Hi Ralph,

Now that we have implemented (on master) https://issues.apache.org/jira/browse/PHOENIX-1580 and https://issues.apache.org/jira/browse/PHOENIX-1807, I think you should be able to enable your multiple index use case, which actually looks great to us. And please let us know if there are other problems.

For the AND logic, you still need to use multiple subqueries connected by AND, like I mentioned before:

SELECT * FROM t WHERE pk IN (SELECT pk from t where q1 = ?) AND pk IN (SELECT pk from t where q2 = ?);

And for the OR logic, you can now use UNION ALL in your subquery connecting those "or" conditions. For example:

SELECT * FROM t WHERE pk IN (SELECT pk from t where q1 = ? UNION ALL SELECT pk from t where q2 = ?);





Thanks,
Maryann

On Mon, Jan 19, 2015 at 2:34 PM, Maryann Xue <ma...@gmail.com>> wrote:
Hi Ralph,

I think in your case this is indeed a nice approach. Given that INTERSECT is not yet supported in Phoenix, you can instead use AND to connect your conditions, which would work almost as efficiently as applying INTERSECT on your inner queries:

SELECT * FROM t WHERE pk IN (SELECT pk from t where q1 = ?) AND pk IN (SELECT pk from t where q2 = ?);

If your query is of OR logic, unfortunately there is currently no optimal way to do this until we have UNION (https://issues.apache.org/jira/browse/PHOENIX-671) implemented. That is because with OR subqueries, we do a left join instead of a semi join, and left joins do not use PK skip scan according to its semantics. A query similar to the above example with OR logic would only run slower than one simply doing a full-scan on the main data table (without using the index table at all).



Thanks,
Maryann


On Mon, Jan 19, 2015 at 1:25 PM, Perko, Ralph J <Ra...@pnnl.gov>> wrote:
Hi

I have a question about the most efficient way to query many indexed columns.  Here is the scenario:

Say I have a table with 100 fields

Table {f1,f2,f3,f4,…f100}

The first 10 fields are core fields and the client wishes to query them in any combination.

This is too many fields to create a secondary index for every combination so I create just 10, one for each core field.

Then when a query is submitted, I create a single query for each secondary index and return just the PK, thus taking advantage to the index:

Example – for each indexed field included in the query:

SELECT PK FROM Table WHERE <indexed field> = <some value>’;

Then once I have a list of all PK values from all the index queries I will either get the combination or intersection of all PKs, depending on the query logic (and/or)and then run a final select:

SELECT * FROM Table WHERE pk IN(…);

Does this sound like a reasonable approach?

Thanks,
Ralph