You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Simon Lee (JIRA)" <ji...@apache.org> on 2016/01/15 20:27:39 UTC
[jira] [Updated] (PHOENIX-2601) Query result is incorrect when both index hint and limit are used

     [ https://issues.apache.org/jira/browse/PHOENIX-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Lee updated PHOENIX-2601:
-------------------------------
    Description: 
Query result is incorrect when both index hint and limit are used.

To reproduce the problem,
1. Create an HBase table with a column family. The column family has 4 columns (a1, a2, a3, a4). Create an index on (a1, a2).

2. Populate the table with the following data
{code}
a1  a2     a3      a4
--  -----  -----   -----
1   Small  Red     USA
1   Small  Yellow  UK
1   Small  Green   China
{code}

3. Run the following Phoenix queries, and the query results are correct.
{code}
select a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' ;
{code}
Three rows are returned as expected
{code}
select a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' and a3 = "Yellow" limit 1;
{code}
One row is returned as expected
{code}
select /*+ INDEX(add_index_hint_here) */ a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' and a3 = "Yellow";
{code}
One row is returned as expected

4. However, with the combination of index hint and limit clause, the query result is incorrect.

{code}
select /*+ INDEX(add_index_hint_here) */ a1, a2, a3, a4 from table where a1 = '1' and a2 = 'small' and a3 = "Yellow";
{code}
Zero row is returned. The expected result is 1 row (i.e. the second row in the example).

*The explain plan of the problematic query* 
{code}
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 11-CHUNK PARALLEL 1-WAY ... MY_TABLE |
|     SERVER FILTER BY A.A3 = 'Yellow' |
| CLIENT 1 ROW LIMIT                       |
|     SKIP-SCAN-JOIN TABLE 0               |
|         CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER MY_INDEX [[52,48,48,45,69,79,84,45,48,48,57],23] |
|             SERVER FILTER BY FIRST KEY ONLY |
|             SERVER 1 ROW LIMIT           |
|         CLIENT 1 ROW LIMIT               |
|     DYNAMIC SERVER FILTER BY ("MY_ROW_KEY") IN (($148.$150, $1 |
|     JOIN-SCANNER 1 ROW LIMIT             |
+------------------------------------------+
{code}
 It looks like the query plan uses the index first.  Although all three rows match the index (a1 = '1' and 'a2 = 'Small'), but the intermediate result only has the first row due the "limit 1" clause. In this case, the first row (a3 = 'Red') is returned. Then the query engine tries to apply the (a3 = 'Yellow') to the intermediate result, and get zero row as the final result.

  was:
Query result is incorrect when both index hint and limit are used.

To reproduce the problem,
1. Create an HBase table with a column family. The column family has 4 columns (a1, a2, a3, a4). Create an index on (a1, a2).

2. Populate the table with the following data
{code}
a1  a2     a3      a4
--  -----  -----   -----
1   Small  Red     USA
1   Small  Yellow  UK
1   Small  Green   China
{code}

3. Run the following Phoenix queries, and the query results are correct.
{code}
select a1, a2, a3, a4 from table where a1 = '1' and a2 = 'small' ;
{code}
3 rows are returned as expected
{code}
select a1, a2, a3, a4 from table where a1 = '1' and a2 = 'small' and a3 = "Yellow" limit 1;
{code}
1 row is returned as expected
{code}
select /*+ INDEX(add_index_hint_here) */ a1, a2, a3, a4 from table where a1 = '1' and a2 = 'small' and a3 = "Yellow";
{code}
1 row is returned as expected

4. However, with the combination of index hint and limit clause, the query result is incorrect.

{code}
select /*+ INDEX(add_index_hint_here) */ a1, a2, a3, a4 from table where a1 = '1' and a2 = 'small' and a3 = "Yellow";
{code}
0 row is returned. The expected result is 1 row (i.e. the second row in the example).

*The explain plan of the problematic query* 
{code}
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 11-CHUNK PARALLEL 1-WAY ... MY_TABLE |
|     SERVER FILTER BY A.A3 = 'Yellow' |
| CLIENT 1 ROW LIMIT                       |
|     SKIP-SCAN-JOIN TABLE 0               |
|         CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER MY_INDEX [[52,48,48,45,69,79,84,45,48,48,57],23] |
|             SERVER FILTER BY FIRST KEY ONLY |
|             SERVER 1 ROW LIMIT           |
|         CLIENT 1 ROW LIMIT               |
|     DYNAMIC SERVER FILTER BY ("MY_ROW_KEY") IN (($148.$150, $1 |
|     JOIN-SCANNER 1 ROW LIMIT             |
+------------------------------------------+
{code}
 It looks like the query plan uses the index first.  Although all three rows match the index (a1 = '1' and 'a2 = 'Small'), but the intermediate result only has the first row due the "limit 1" clause. In this case, the first row (a3 = 'Red') is returned. Then the query engine tries to apply the (a3 = 'Yellow') to the intermediate result, and get 0 row as the final result.


> Query result is incorrect when both index hint and limit are used
> -----------------------------------------------------------------
>
>                 Key: PHOENIX-2601
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2601
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.4.0
>         Environment: Linux
>            Reporter: Simon Lee
>
> Query result is incorrect when both index hint and limit are used.
> To reproduce the problem,
> 1. Create an HBase table with a column family. The column family has 4 columns (a1, a2, a3, a4). Create an index on (a1, a2).
> 2. Populate the table with the following data
> {code}
> a1  a2     a3      a4
> --  -----  -----   -----
> 1   Small  Red     USA
> 1   Small  Yellow  UK
> 1   Small  Green   China
> {code}
> 3. Run the following Phoenix queries, and the query results are correct.
> {code}
> select a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' ;
> {code}
> Three rows are returned as expected
> {code}
> select a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' and a3 = "Yellow" limit 1;
> {code}
> One row is returned as expected
> {code}
> select /*+ INDEX(add_index_hint_here) */ a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' and a3 = "Yellow";
> {code}
> One row is returned as expected
> 4. However, with the combination of index hint and limit clause, the query result is incorrect.
> {code}
> select /*+ INDEX(add_index_hint_here) */ a1, a2, a3, a4 from table where a1 = '1' and a2 = 'small' and a3 = "Yellow";
> {code}
> Zero row is returned. The expected result is 1 row (i.e. the second row in the example).
> *The explain plan of the problematic query* 
> {code}
> +------------------------------------------+
> |                   PLAN                   |
> +------------------------------------------+
> | CLIENT 11-CHUNK PARALLEL 1-WAY ... MY_TABLE |
> |     SERVER FILTER BY A.A3 = 'Yellow' |
> | CLIENT 1 ROW LIMIT                       |
> |     SKIP-SCAN-JOIN TABLE 0               |
> |         CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER MY_INDEX [[52,48,48,45,69,79,84,45,48,48,57],23] |
> |             SERVER FILTER BY FIRST KEY ONLY |
> |             SERVER 1 ROW LIMIT           |
> |         CLIENT 1 ROW LIMIT               |
> |     DYNAMIC SERVER FILTER BY ("MY_ROW_KEY") IN (($148.$150, $1 |
> |     JOIN-SCANNER 1 ROW LIMIT             |
> +------------------------------------------+
> {code}
>  It looks like the query plan uses the index first.  Although all three rows match the index (a1 = '1' and 'a2 = 'Small'), but the intermediate result only has the first row due the "limit 1" clause. In this case, the first row (a3 = 'Red') is returned. Then the query engine tries to apply the (a3 = 'Yellow') to the intermediate result, and get zero row as the final result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)