You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Arthur (JIRA)" <ji...@apache.org> on 2017/03/02 09:15:45 UTC
[jira] [Comment Edited] (PHOENIX-3689) Not determinist order by with limit

    [ https://issues.apache.org/jira/browse/PHOENIX-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891859#comment-15891859 ] 

Arthur edited comment on PHOENIX-3689 at 3/2/17 9:14 AM:
---------------------------------------------------------

I'm using Phoenix 4.7.2. 
I encounter the issue with a more complete dataset. I have reproduced it with this simple generated dataset, so it's not an isolated issue.
With a little set of data, order by is fine because there is one region. I increased number of rows and I checked in hbase (scan 'hbase:meta',{FILTER=>"PrefixFilter('TT')"} ) when it was using more than one region. From this moment, order by returns bad result.

{noformat}
 explain select dt from TT group by dt order by dt  desc limit 1;
+---------------------------------------------------------------------------------------+
|                                         PLAN                                          |
+---------------------------------------------------------------------------------------+
| CLIENT 3-CHUNK 4915200 ROWS 314572800 BYTES PARALLEL 1-WAY REVERSE FULL SCAN OVER TT  |
|     SERVER FILTER BY FIRST KEY ONLY                                                   |
|     SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [DT] LIMIT 1 GROUP                 |
| CLIENT MERGE SORT                                                                     |
| CLIENT 1 ROW LIMIT                                                                    |
+---------------------------------------------------------------------------------------+
{noformat}

{noformat}
explain select dt from TT order by dt  desc limit 1;
+-------------------------------------------------------------------------------------+
|                                        PLAN                                         |
+-------------------------------------------------------------------------------------+
| CLIENT 3-CHUNK 4915200 ROWS 314572800 BYTES SERIAL 1-WAY REVERSE FULL SCAN OVER TT  |
|     SERVER FILTER BY FIRST KEY ONLY                                                 |
|     SERVER 1 ROW LIMIT                                                              |
| CLIENT 1 ROW LIMIT                                                                  |
+-------------------------------------------------------------------------------------+
{noformat}


was (Author: arthurj):
I'm using Phoenix 4.7.2. 

{noformat}
 explain select dt from TT group by dt order by dt  desc limit 1;
+---------------------------------------------------------------------------------------+
|                                         PLAN                                          |
+---------------------------------------------------------------------------------------+
| CLIENT 3-CHUNK 4915200 ROWS 314572800 BYTES PARALLEL 1-WAY REVERSE FULL SCAN OVER TT  |
|     SERVER FILTER BY FIRST KEY ONLY                                                   |
|     SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [DT] LIMIT 1 GROUP                 |
| CLIENT MERGE SORT                                                                     |
| CLIENT 1 ROW LIMIT                                                                    |
+---------------------------------------------------------------------------------------+
{noformat}

{noformat}
explain select dt from TT order by dt  desc limit 1;
+-------------------------------------------------------------------------------------+
|                                        PLAN                                         |
+-------------------------------------------------------------------------------------+
| CLIENT 3-CHUNK 4915200 ROWS 314572800 BYTES SERIAL 1-WAY REVERSE FULL SCAN OVER TT  |
|     SERVER FILTER BY FIRST KEY ONLY                                                 |
|     SERVER 1 ROW LIMIT                                                              |
| CLIENT 1 ROW LIMIT                                                                  |
+-------------------------------------------------------------------------------------+
{noformat}

> Not determinist order by with limit
> -----------------------------------
>
>                 Key: PHOENIX-3689
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3689
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>            Reporter: Arthur
>
> The following request does not return the last value of table TT:
> select * from TT order by dt desc limit 1;
> Adding a 'group by dt' clause gets back the good result.
> I noticed that an order by with 'limit 10' returns a merge of 10 results from each region and not 10 results of the whole request.
> So 'order by' is not determinist. It is a bug or a feature ?
> Here is my DDL:
> {code}
> CREATE TABLE TT (dt timestamp NOT NULL, message bigint NOT NULL, id varchar(20) NOT NULL, version varchar CONSTRAINT PK PRIMARY KEY (dt, message, id));
> {code}
> The issue occurs with a lot of data. I think the 'order by' clause is done by region and not for the whole result, so limit 1 returns the first region that answers and phoenix cache it. With only one region, this does not occur.
> This script generate enough data to throw the issue:
> {code}
> #!/usr/bin/python
> import string
> from datetime import datetime, timedelta
> dt = datetime(2017, 1, 1, 3)
> with open('data.csv', 'w') as file:
>         for i in range(0, 10000000):
>                 newdt = dt + timedelta(microseconds=i*10000)
>                 file.write("{};{};{};\n".format(datetime.strftime(newdt, "%Y-%m-%d %H:%M:%S.%f"), 91 if i  % 10  == 0 else 100, str(i).zfill(20)))
> {code}
> With this data set, the last data is : 2017-01-02 06:46:39.990000
> Result with order by clause is not the last value:
> {noformat}
> select dt from TT order by dt desc limit 1;
> +--------------------------+
> |            DT            |
> +--------------------------+
> | 2017-01-01 07:54:40.730  |
> {noformat}
> Correct result is given when using group by, but I need to get all columns.
> {noformat}
> select dt from TT group by dt order by dt  desc limit 1;
> +--------------------------+
> |            DT            |
> +--------------------------+
> | 2017-01-02 06:46:39.990  |
> +--------------------------+
> {noformat}
> I use a subquery as a workaroud, but performance are not good.
> {noformat}
> select * from TT where dt = ANY(select dt from TT group by dt order by dt desc limit 1);
> +--------------------------+----------+-----------------------+----------+
> |            DT            | MESSAGE  |          ID           | VERSION  |
> +--------------------------+----------+-----------------------+----------+
> | 2017-01-02 06:46:39.990  | 100      | 00000000000009999999  |          |
> +--------------------------+----------+-----------------------+----------+
> 1 row selected (8.393 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)