Posted to issues@drill.apache.org by "Steven Phillips (JIRA)" <ji...@apache.org> on 2015/04/22 21:02:01 UTC

[jira] [Commented] (DRILL-2083) order by on large dataset returns wrong results

    [ https://issues.apache.org/jira/browse/DRILL-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507681#comment-14507681 ] 

Steven Phillips commented on DRILL-2083:
----------------------------------------

This is a bug in MergingReceiver. When the outgoing batch reaches a batch boundary, the last record copied into it is copied again into the next batch.
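
To illustrate the failure mode, here is a minimal Python sketch of a k-way merge that emits fixed-size outgoing batches. It is not Drill's MergingReceiver code: the function name `merge_into_batches`, the `buggy` flag, and the list-based streams are all hypothetical and exist only to show how failing to advance past the last copied record at a batch boundary produces one duplicate row per boundary.

```python
import heapq


def merge_into_batches(sorted_inputs, batch_size, buggy=False):
    """K-way merge of already-sorted lists into fixed-size outgoing batches.

    Hypothetical illustration only. With buggy=True, the record that fills
    a batch is not consumed from its input stream, so it is copied again
    as the first record of the next batch.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 for this sketch")

    # Heap entries: (value, index of source stream, position in that stream).
    heap = [(s[0], i, 0) for i, s in enumerate(sorted_inputs) if s]
    heapq.heapify(heap)

    batches, current = [], []
    while heap:
        value, src, pos = heap[0]
        current.append(value)  # copy the smallest pending record

        if len(current) == batch_size:
            # Outgoing batch is full: hand it off and start a new one.
            batches.append(current)
            current = []
            if buggy:
                # Assumed bug: return at the boundary without advancing
                # the stream that supplied the record just copied.
                continue

        # Advance the source stream past the record just copied.
        heapq.heappop(heap)
        if pos + 1 < len(sorted_inputs[src]):
            heapq.heappush(heap, (sorted_inputs[src][pos + 1], src, pos + 1))

    if current:
        batches.append(current)
    return batches


inputs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
good = merge_into_batches(inputs, batch_size=4)
bad = merge_into_batches(inputs, batch_size=4, buggy=True)
print(sum(len(b) for b in good))  # 9
print(sum(len(b) for b in bad))   # 11
```

In this sketch the output stays sorted but gains one extra row each time a full batch is handed off, which is the same shape of symptom as a row count slightly above the input size.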

> order by on large dataset returns wrong results
> -----------------------------------------------
>
>                 Key: DRILL-2083
>                 URL: https://issues.apache.org/jira/browse/DRILL-2083
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types, Execution - Relational Operators
>    Affects Versions: 0.8.0
>            Reporter: Chun Chang
>            Assignee: Steven Phillips
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> #Mon Jan 26 14:10:51 PST 2015
> git.commit.id.abbrev=3c6d0ef
> Test data has 1 million rows and can be accessed at 
> http://apache-drill.s3.amazonaws.com/files/complex.json.gz
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select count (t.id) from `complex.json` t;
> +------------+
> |   EXPR$0   |
> +------------+
> | 1000000    |
> +------------+
> {code}
> But the same query with order by returned 1,000,030 rows, 30 more than the table contains.
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.id from `complex.json` t order by t.id;
> ....
> | 999997     |
> | 999998     |
> | 999999     |
> | 1000000    |
> +------------+
> 1,000,030 rows selected (19.449 seconds)
> {code}
> Physical plan:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select t.id from `complex.json` t order by t.id;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      SingleMergeExchange(sort0=[0 ASC])
> 01-01        SelectionVectorRemover
> 01-02          Sort(sort0=[$0], dir0=[ASC])
> 01-03            HashToRandomExchange(dist0=[[$0]])
> 02-01              Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, columns=[`id`], files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
> {code}


