You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/12/21 07:11:58 UTC

[jira] [Created] (DRILL-5144) Query with two identical windows produces incorrect results

Paul Rogers created DRILL-5144:
----------------------------------

             Summary: Query with two identical windows produces incorrect results
                 Key: DRILL-5144
                 URL: https://issues.apache.org/jira/browse/DRILL-5144
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers


Consider the following query (from {{TestWindowFrame.test4657}}):

{code}
select row_number() over(order by position_id) rn, 
        rank() over(order by position_id) rnk
from dfs_test.`%s/window/b3.p2`
{code}

The planner produces the following plan:

{code}
{
  ...
  "graph" : [ {
    "pop" : "fs-scan",
    ...
    "columns" : [ "`position_id`" ],
    ...
  }, {
    "pop" : "external-sort",
    "@id" : 8,
    "child" : 9,
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    ...
  }, {
    "pop" : "selection-vector-remover",
    ...
  }, {
    "pop" : "window",
    "@id" : 6,
    "child" : 7,
    "aggregations" : [ {
      "ref" : "`w0$o0`",
      "expr" : "row_number(1) "
    } ],
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    "frameUnitsRows" : true,
    "start" : {
      "unbounded" : true,
      "offset" : -9223372036854775808
    },
    "end" : {
      "unbounded" : false,
      "offset" : 0
    },
    ...
  }, {
    "pop" : "external-sort",
    "@id" : 5,
    "child" : 6,
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    ...
  }, {
    "pop" : "selection-vector-remover",
    ...
  }, {
    "pop" : "window",
    ...
    "aggregations" : [ {
      "ref" : "`w1$o0`",
      "expr" : "rank(1) "
    } ],
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    "frameUnitsRows" : false,
    "start" : {
      "unbounded" : true,
      "offset" : -9223372036854775808
    },
    "end" : {
      "unbounded" : false,
      "offset" : 0
    },
    ...
  }, {
    "pop" : "project",
    "@id" : 2,
    "exprs" : [ {
      "ref" : "`$0`",
      "expr" : "`w0$o0`"
    }, {
      "ref" : "`$1`",
      "expr" : "`w1$o0`"
    } ],
    ...
  }, {
    "pop" : "project",
    ...
    "exprs" : [ {
      "ref" : "`rn`",
      "expr" : "`$0`"
    }, {
      "ref" : "`rnk`",
      "expr" : "`$1`"
    } ],
    ...
  }, {
    "pop" : "screen",
    ...
  } ]
}
{code}

Note that the plan sorts the input data on {{position_id}} twice. One would work just as well.

Note also that, since Drill's sort is unstable, the order of like keys can change between sorts; resulting in non-deterministic results (which show up in failures of the test mentioned earlier).

Indeed, earlier versions of the tests "fudged" the external sort to make it stable so that the tests would pass; but those tests hid the instability of this particular query. Without the fudge, results are incorrect:

{code}
rn, rnk
1,1
3,1
4,1
5,1
2,1 <-- Wrong results
31,6 <-- Wrong results
6,6
...
30,6
32,6
...
60,6
{code}

The expected results is the equivalent of the following (invalid) query:

{code}
select row_number() rn, rank() rnk
         over(order by position_id)
from dfs_test.`%s/window/b3.p2`
{code}

{{TestWindowFrame.test4657}} will be disabled until a fix for this problem is available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)