You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/12/21 07:11:58 UTC
[jira] [Created] (DRILL-5144) Query with two identical windows
produces incorrect results
Paul Rogers created DRILL-5144:
----------------------------------
Summary: Query with two identical windows produces incorrect results
Key: DRILL-5144
URL: https://issues.apache.org/jira/browse/DRILL-5144
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers
Consider the following query (from {{TestWindowFrame.test4657}}):
{code}
select row_number() over(order by position_id) rn,
rank() over(order by position_id) rnk
from dfs_test.`%s/window/b3.p2`
{code}
The planner produces the following plan:
{code}
{
...
"graph" : [ {
"pop" : "fs-scan",
...
"columns" : [ "`position_id`" ],
...
}, {
"pop" : "external-sort",
"@id" : 8,
"child" : 9,
"orderings" : [ {
"order" : "ASC",
"expr" : "`position_id`",
"nullDirection" : "UNSPECIFIED"
} ],
...
}, {
"pop" : "selection-vector-remover",
...
}, {
"pop" : "window",
"@id" : 6,
"child" : 7,
"aggregations" : [ {
"ref" : "`w0$o0`",
"expr" : "row_number(1) "
} ],
"orderings" : [ {
"order" : "ASC",
"expr" : "`position_id`",
"nullDirection" : "UNSPECIFIED"
} ],
"frameUnitsRows" : true,
"start" : {
"unbounded" : true,
"offset" : -9223372036854775808
},
"end" : {
"unbounded" : false,
"offset" : 0
},
...
}, {
"pop" : "external-sort",
"@id" : 5,
"child" : 6,
"orderings" : [ {
"order" : "ASC",
"expr" : "`position_id`",
"nullDirection" : "UNSPECIFIED"
} ],
...
}, {
"pop" : "selection-vector-remover",
...
}, {
"pop" : "window",
...
"aggregations" : [ {
"ref" : "`w1$o0`",
"expr" : "rank(1) "
} ],
"orderings" : [ {
"order" : "ASC",
"expr" : "`position_id`",
"nullDirection" : "UNSPECIFIED"
} ],
"frameUnitsRows" : false,
"start" : {
"unbounded" : true,
"offset" : -9223372036854775808
},
"end" : {
"unbounded" : false,
"offset" : 0
},
...
}, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
"ref" : "`$0`",
"expr" : "`w0$o0`"
}, {
"ref" : "`$1`",
"expr" : "`w1$o0`"
} ],
...
}, {
"pop" : "project",
...
"exprs" : [ {
"ref" : "`rn`",
"expr" : "`$0`"
}, {
"ref" : "`rnk`",
"expr" : "`$1`"
} ],
...
}, {
"pop" : "screen",
...
} ]
}
{code}
Note that the plan sorts the input data on {{position_id}} twice. One would work just as well.
Note also that, since Drill's sort is unstable, the order of like keys can change between sorts; resulting in non-deterministic results (which show up in failures of the test mentioned earlier).
Indeed, earlier versions of the tests "fudged" the external sort to make it stable so that the tests would pass; but those tests hid the instability of this particular query. Without the fudge, results are incorrect:
{code}
rn, rnk
1,1
3,1
4,1
5,1
2,1 <-- Wrong results
31,6 <-- Wrong results
6,6
...
30,6
32,6
...
60,6
{code}
The expected results is the equivalent of the following (invalid) query:
{code}
select row_number() rn, rank() rnk
over(order by position_id)
from dfs_test.`%s/window/b3.p2`
{code}
{{TestWindowFrame.test4657}} will be disabled until a fix for this problem is available.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)