You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by James Taylor <ja...@apache.org> on 2017/10/03 17:13:41 UTC

Re: using index with "or" query

Please file a JIRA for the ability to use multiple tables for an OR query
(but that's a pretty big effort). In the meantime, you may be able to work
around the lack of support for UNION by doing a merge sort yourself on the
client as an efficient way to dedup. For example, if you have a table like
this:

CREATE TABLE T (K VARCHAR PRIMARY KEY, A VARCHAR, B VARCHAR)

With an index on A and an index on B, you could do the following:

SELECT A,K FROM T WHERE A=1 ORDER BY A,K
SELECT B,K FROM T WHERE B=2 ORDER BY B,K

Then as you retrieve a row from the first query, you can merge sort (based
on the PK) with the second query and only emit rows when the primary key is
different.

Thanks,
James

On Mon, Sep 25, 2017 at 4:38 AM, venkata subbarayudu <av...@gmail.com>
wrote:

> Hi Noam,
>          You can evaluate below approach
>
>    - Create a temp-table with your target-schema for the result-set
>    - Have indexes created for all your OR clause queries
>    - Fire one-query at a time (with one OR clause) and dump data to
>    target-temp table (here, all later queries updates the records if they are
>    already present)
>    - Read data from target-temp table (after firing all queries) , and
>    drop temp-table
>
> though this approach results in firing more queries, it can complete in
> less time than firing queries without using index
>
> On Wed, Sep 20, 2017 at 9:31 PM, James Taylor <ja...@apache.org>
> wrote:
>
>> If you're using local index, then the hint isn't necessary. However, a
>> covered local index may improve performance (at the expense of extra space)
>> by preventing a local get to retrieve the data row (when a column not in
>> the index is referenced).
>>
>> Phoenix will only use a single index. The only exception is with UNION
>> ALL.
>>
>> Thanks,
>> James
>>
>> On Tue, Sep 19, 2017 at 12:27 PM Sapir, Yoav <Yo...@teoco.com>
>> wrote:
>>
>>> Hi James,
>>> Thank you for the fast reply.
>>>
>>> 1. As far as we saw in Phoenix documentation UNION is not supported,
>>> only UNION ALL.
>>> Breaking the queries to multiple queries with UNION ALL will return
>>> duplicates.
>>> It may be possible to wrap these queries with select distinct, but it
>>> significantly complicates the queries and will have performance impact.
>>> 2. In case of select …. Where a='xyz' or b='123'
>>> I don't see how adding column b as covered column to index on column a
>>> will help for finding a row such as the row (a: 'abc', b: '123')
>>> It will help only for a row such as (a: 'xyz', b: '123')
>>> 3. Hint on global index works for a single index. Is there a way to use
>>> multiple indexes? Hint on using multiple indexes?
>>> Hint on local index cause an error of unknown field if the where clause
>>> refer to a field in the index and to another field that is not part of the
>>> specific local index. There is an open bug on it.
>>>
>>> BR,
>>>
>>> Yoav Sapir
>>>
>>>
>>> On 19 Sep 2017, at 18:21, James Taylor <ja...@apache.org> wrote:
>>>
>>> Hi Noam,
>>> A few ideas:
>>> 1) Use a UNION instead of an OR and you may be able to use more than one
>>> index for one query.
>>> 2) Include the columns you're referencing in the index to make it a
>>> covered index [1].
>>> 3) Hint [2] the query to force the index to be used.
>>>
>>> Thanks,
>>> James
>>>
>>> [1] http://phoenix.apache.org/secondary_indexing.html#Index_Usage (#1)
>>> [2] http://phoenix.apache.org/secondary_indexing.html#Index_Usage (#2)
>>>
>>> On Tue, Sep 19, 2017 at 4:21 AM Bulvik, Noam <No...@teoco.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> We have a case where we have a table with few index on different
>>>> columns  a, b, c  etc' . It works well if we do select with "and"
>>>> condition  (for example select …. Where a='xyz' and b='123' )but when we
>>>> have or condition (for example select …. Where a='xyz' or b='123') we get
>>>> full scan even though we have index on a and on b.
>>>>
>>>>
>>>>
>>>> Is there a way to get this query to use indexes and not full scan
>>>> beside creating index on all available column combination (index on a+b ,
>>>> index on a +c …)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>>
>>>>
>>>> *Noam *
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> PRIVILEGED AND CONFIDENTIAL
>>>> PLEASE NOTE: The information contained in this message is privileged
>>>> and confidential, and is intended only for the use of the individual to
>>>> whom it is addressed and others who have been specifically authorized to
>>>> receive it. If you are not the intended recipient, you are hereby notified
>>>> that any dissemination, distribution or copying of this communication is
>>>> strictly prohibited. If you have received this communication in error, or
>>>> if any problems occur with transmission, please contact sender. Thank you.
>>>>
>>>
>>> ------------------------------
>>>
>>> PRIVILEGED AND CONFIDENTIAL
>>> PLEASE NOTE: The information contained in this message is privileged and
>>> confidential, and is intended only for the use of the individual to whom it
>>> is addressed and others who have been specifically authorized to receive
>>> it. If you are not the intended recipient, you are hereby notified that any
>>> dissemination, distribution or copying of this communication is strictly
>>> prohibited. If you have received this communication in error, or if any
>>> problems occur with transmission, please contact sender. Thank you.
>>>
>>
>
>
> --
> *Venkata Subbarayudu Amanchi.*
>

Re: using index with "or" query

Posted by Jonathan Leech <jo...@gmail.com>.
I had an idea a while back that I’ll share here because it’s relevant. It’s basically a combined index, or index group, and it would work in this case. It could be implemented in both global and local indexes. The data for two or more indexes would be interleaved. For a local index, the combined index would essentially share the same view_index_id, then there would be an additional field to identify which index it is. Likewise for a global index, minus the view_index_id. 

> On Oct 3, 2017, at 11:13 AM, James Taylor <ja...@apache.org> wrote:
> 
> Please file a JIRA for the ability to use multiple tables for an OR query (but that's a pretty big effort). In the meantime, you may be able to work around the lack of support for UNION by doing a merge sort yourself on the client as an efficient way to dedup. For example, if you have a table like this:
> 
> CREATE TABLE T (K VARCHAR PRIMARY KEY, A VARCHAR, B VARCHAR)
> 
> With an index on A and an index on B, you could do the following:
> 
> SELECT A,K FROM T WHERE A=1 ORDER BY A,K
> SELECT B,K FROM T WHERE B=2 ORDER BY B,K
> 
> Then as you retrieve a row from the first query, you can merge sort (based on the PK) with the second query and only emit rows when the primary key is different.
> 
> Thanks,
> James
> 
>> On Mon, Sep 25, 2017 at 4:38 AM, venkata subbarayudu <av...@gmail.com> wrote:
>> Hi Noam,
>>          You can evaluate below approach
>> Create a temp-table with your target-schema for the result-set
>> Have indexes created for all your OR clause queries
>> Fire one-query at a time (with one OR clause) and dump data to target-temp table (here, all later queries updates the records if they are already present)
>> Read data from target-temp table (after firing all queries) , and drop temp-table
>> though this approach results in firing more queries, it can complete in less time than firing queries without using index
>> 
>> 
>>> On Wed, Sep 20, 2017 at 9:31 PM, James Taylor <ja...@apache.org> wrote:
>>> If you're using local index, then the hint isn't necessary. However, a covered local index may improve performance (at the expense of extra space) by preventing a local get to retrieve the data row (when a column not in the index is referenced).
>>> 
>>> Phoenix will only use a single index. The only exception is with UNION ALL. 
>>> 
>>> Thanks,
>>> James
>>> 
>>>> On Tue, Sep 19, 2017 at 12:27 PM Sapir, Yoav <Yo...@teoco.com> wrote:
>>>> Hi James,
>>>> Thank you for the fast reply. 
>>>> 
>>>> 1. As far as we saw in Phoenix documentation UNION is not supported, only UNION ALL. 
>>>> Breaking the queries to multiple queries with UNION ALL will return duplicates.
>>>> It may be possible to wrap these queries with select distinct, but it significantly complicates the queries and will have performance impact. 
>>>> 2. In case of select …. Where a='xyz' or b='123' 
>>>> I don't see how adding column b as covered column to index on column a will help for finding a row such as the row (a: 'abc', b: '123')
>>>> It will help only for a row such as (a: 'xyz', b: '123')
>>>> 3. Hint on global index works for a single index. Is there a way to use multiple indexes? Hint on using multiple indexes?
>>>> Hint on local index cause an error of unknown field if the where clause refer to a field in the index and to another field that is not part of the specific local index. There is an open bug on it. 
>>>> 
>>>> BR,
>>>> 
>>>> Yoav Sapir
>>>> 
>>>> 
>>>> On 19 Sep 2017, at 18:21, James Taylor <ja...@apache.org> wrote:
>>>> 
>>>> Hi Noam,
>>>> A few ideas:
>>>> 1) Use a UNION instead of an OR and you may be able to use more than one index for one query.
>>>> 2) Include the columns you're referencing in the index to make it a covered index [1].
>>>> 3) Hint [2] the query to force the index to be used.
>>>> 
>>>> Thanks,
>>>> James 
>>>> 
>>>> [1] http://phoenix.apache.org/secondary_indexing.html#Index_Usage (#1)
>>>> [2] http://phoenix.apache.org/secondary_indexing.html#Index_Usage (#2)
>>>> 
>>>>> On Tue, Sep 19, 2017 at 4:21 AM Bulvik, Noam <No...@teoco.com> wrote:
>>>>> Hi,
>>>>> 
>>>>>  
>>>>> 
>>>>> We have a case where we have a table with few index on different columns  a, b, c  etc' . It works well if we do select with "and" condition  (for example select …. Where a='xyz' and b='123' )but when we have or condition (for example select …. Where a='xyz' or b='123') we get full scan even though we have index on a and on b.
>>>>> 
>>>>>  
>>>>> 
>>>>> Is there a way to get this query to use indexes and not full scan beside creating index on all available column combination (index on a+b , index on a +c …)
>>>>> 
>>>>>  
>>>>> 
>>>>>  
>>>>> 
>>>>> Regards,
>>>>> 
>>>>>  
>>>>> 
>>>>> Noam
>>>>> 
>>>>>  
>>>>> 
>>>>> 
>>>>> 
>>>>> PRIVILEGED AND CONFIDENTIAL
>>>>> PLEASE NOTE: The information contained in this message is privileged and confidential, and is intended only for the use of the individual to whom it is addressed and others who have been specifically authorized to receive it. If you are not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, or if any problems occur with transmission, please contact sender. Thank you.
>>>> 
>>>> 
>>>> PRIVILEGED AND CONFIDENTIAL
>>>> PLEASE NOTE: The information contained in this message is privileged and confidential, and is intended only for the use of the individual to whom it is addressed and others who have been specifically authorized to receive it. If you are not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, or if any problems occur with transmission, please contact sender. Thank you.
>> 
>> 
>> 
>> -- 
>> Venkata Subbarayudu Amanchi.
>