Posted to issues@drill.apache.org by "Chun Chang (JIRA)" <ji...@apache.org> on 2015/04/28 20:15:05 UTC
[jira] [Closed] (DRILL-2083) order by on large dataset returns wrong results

     [ https://issues.apache.org/jira/browse/DRILL-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun Chang closed DRILL-2083.
-----------------------------
    Assignee: Chun Chang  (was: Steven Phillips)

Verified the fix:

{code}
0: jdbc:drill:schema=dfs.drillTestDirAdvanced> select count(*) from (select t.id from `complex.json` t order by t.id);
+------------+
|   EXPR$0   |
+------------+
| 1000000    |
+------------+
1 row selected (12.89 seconds)
0: jdbc:drill:schema=dfs.drillTestDirAdvanced> select * from sys.version;
+------------+----------------+-------------+-------------+------------+
| commit_id  | commit_message | commit_time | build_email | build_time |
+------------+----------------+-------------+-------------+------------+
| 57a96d200e12c0efcad3f3ca9d935c42647234b1 | DRILL-2083: Fix bug in merging receiver | 27.04.2015 @ 17:12:13 EDT | Unknown     | 27.04.2015 @ 23:22:07 EDT |
+------------+----------------+-------------+-------------+------------+
{code}

A test case exists: complex112.q

> order by on large dataset returns wrong results
> -----------------------------------------------
>
>                 Key: DRILL-2083
>                 URL: https://issues.apache.org/jira/browse/DRILL-2083
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types, Execution - Relational Operators
>    Affects Versions: 0.8.0
>            Reporter: Chun Chang
>            Assignee: Chun Chang
>            Priority: Critical
>             Fix For: 0.9.0
>
>         Attachments: DRILL-2083.patch
>
>
> #Mon Jan 26 14:10:51 PST 2015
> git.commit.id.abbrev=3c6d0ef
> Test data has 1 million rows and can be accessed at 
> http://apache-drill.s3.amazonaws.com/files/complex.json.gz
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select count (t.id) from `complex.json` t;
> +------------+
> |   EXPR$0   |
> +------------+
> | 1000000    |
> +------------+
> {code}
> But ORDER BY returned 30 extra rows (1,000,030 instead of 1,000,000).
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.id from `complex.json` t order by t.id;
> ....
> | 999997     |
> | 999998     |
> | 999999     |
> | 1000000    |
> +------------+
> 1,000,030 rows selected (19.449 seconds)
> {code}
> Physical plan:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select t.id from `complex.json` t order by t.id;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      SingleMergeExchange(sort0=[0 ASC])
> 01-01        SelectionVectorRemover
> 01-02          Sort(sort0=[$0], dir0=[ASC])
> 01-03            HashToRandomExchange(dist0=[[$0]])
> 02-01              Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, columns=[`id`], files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
> {code}
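
The plan above hash-distributes rows, sorts each partition, and then merges the sorted streams into one (SingleMergeExchange). The merge step should emit every input row exactly once, so the ORDER BY result must have the same row count as the scan. The following is only an illustrative Python sketch of that invariant, not Drill's implementation. The function names (hash_partition, sorted_merge, run_plan) and the partition count are hypothetical, and a small input stands in for the 1 million row file.

```python
import heapq
import random


def hash_partition(rows, num_partitions):
    """Distribute rows across partitions by hash (stand-in for HashToRandomExchange)."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row) % num_partitions].append(row)
    return parts


def sorted_merge(partitions):
    """Sort each partition locally, then k-way merge the sorted streams
    (stand-in for Sort followed by SingleMergeExchange)."""
    sorted_parts = [sorted(p) for p in partitions]
    return list(heapq.merge(*sorted_parts))


def run_plan(ids, num_partitions=4):
    """Hypothetical end-to-end pipeline: partition, sort, merge."""
    return sorted_merge(hash_partition(ids, num_partitions))


# Small shuffled input; the real data set has 1,000,000 rows.
ids = list(range(1, 1001))
random.Random(0).shuffle(ids)

result = run_plan(ids)

# A correct merge neither drops nor duplicates rows ...
assert len(result) == len(ids)
# ... and yields a globally sorted stream.
assert result == sorted(ids)
```

A result with more rows than the input, such as the 1,000,030 rows reported above, would fail the first assertion, which is consistent with the fix being described as a bug in the merging receiver.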



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)