Posted to dev@phoenix.apache.org by "alex (JIRA)" <ji...@apache.org> on 2014/03/13 19:21:43 UTC

[jira] [Updated] (PHOENIX-846) Select DISTINCT with LIMIT does full scans

     [ https://issues.apache.org/jira/browse/PHOENIX-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

alex updated PHOENIX-846:
-------------------------

    Description: 
When running SELECT DISTINCT with LIMIT, Phoenix performs a full scan and aggregation (no PageFilter/limit is applied on the server side).
This severely affects performance: the query returns in 20 sec, versus 300 ms without DISTINCT.

: jdbc:phoenix:localhost> explain select DISTINCT ROWKEY from TEST_1M LIMIT 100;
+------------+
|    PLAN    |
+------------+
| CLIENT PARALLEL 30-WAY FULL SCAN OVER TEST_1M |
|     SERVER FILTER BY FIRST KEY ONLY |
|     SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [ROWKEY] |
| CLIENT MERGE SORT |
| CLIENT 100 ROW LIMIT |
+------------+

-------------------------------------------------
For comparison, SELECT without DISTINCT applies a limit (PageFilter of 100) on the server side and does not do a full scan (query returns in 300 ms).

explain select ROWKEY from TEST_1M LIMIT 100;
+------------+
|    PLAN    |
+------------+
| CLIENT PARALLEL 30-WAY FULL SCAN OVER TEST_1M |
|     SERVER FILTER BY FIRST KEY ONLY AND PageFilter 100 |
| CLIENT MERGE SORT |
| CLIENT 100 ROW LIMIT |
+------------+
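
The difference between the two plans comes down to whether any SERVER step carries a PageFilter. As a rough sketch only (the helper name has_server_side_limit is hypothetical and not part of Phoenix; it just inspects the plan text pasted above), the check can be written as:

```python
def has_server_side_limit(plan_lines):
    """Return True if any SERVER step of an EXPLAIN plan mentions a PageFilter.

    Hypothetical helper for illustration: it only looks at the plan text
    and makes no calls into Phoenix or HBase.
    """
    for line in plan_lines:
        # Drop the table borders and padding that sqlline prints around each step.
        step = line.strip().strip("|").strip()
        if step.startswith("SERVER") and "PageFilter" in step:
            return True
    return False


# Plan for: explain select DISTINCT ROWKEY from TEST_1M LIMIT 100;
distinct_plan = [
    "| CLIENT PARALLEL 30-WAY FULL SCAN OVER TEST_1M |",
    "|     SERVER FILTER BY FIRST KEY ONLY |",
    "|     SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [ROWKEY] |",
    "| CLIENT MERGE SORT |",
    "| CLIENT 100 ROW LIMIT |",
]

# Plan for: explain select ROWKEY from TEST_1M LIMIT 100;
plain_plan = [
    "| CLIENT PARALLEL 30-WAY FULL SCAN OVER TEST_1M |",
    "|     SERVER FILTER BY FIRST KEY ONLY AND PageFilter 100 |",
    "| CLIENT MERGE SORT |",
    "| CLIENT 100 ROW LIMIT |",
]

print(has_server_side_limit(distinct_plan))  # False
print(has_server_side_limit(plain_plan))     # True
```

With DISTINCT the limit shows up only as a CLIENT step, which matches the 20 sec vs 300 ms difference reported above.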




  was:
When running SELECT DISTINCT with LIMIT, Phoenix performs a full scan on the server side (no PageFilter/limit is applied on the server side).
This severely affects performance: the query returns in 20 sec, versus 300 ms without DISTINCT.

: jdbc:phoenix:localhost> explain select DISTINCT ROWKEY from TEST_1M LIMIT 100;
+------------+
|    PLAN    |
+------------+
| CLIENT PARALLEL 30-WAY FULL SCAN OVER TEST_1M |
|     SERVER FILTER BY FIRST KEY ONLY |
|     SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [ROWKEY] |
| CLIENT MERGE SORT |
| CLIENT 100 ROW LIMIT |
+------------+

-------------------------------------------------
For comparison, SELECT without DISTINCT applies a limit (PageFilter of 100) on the server side and does not do a full scan (query returns in 300 ms).

explain select ROWKEY from TEST_1M LIMIT 100;
+------------+
|    PLAN    |
+------------+
| CLIENT PARALLEL 30-WAY FULL SCAN OVER TEST_1M |
|     SERVER FILTER BY FIRST KEY ONLY AND PageFilter 100 |
| CLIENT MERGE SORT |
| CLIENT 100 ROW LIMIT |
+------------+





> Select DISTINCT with LIMIT does full scans
> ------------------------------------------
>
>                 Key: PHOENIX-846
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-846
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 4.0.0
>            Reporter: alex
>            Priority: Critical


