You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Deneche A. Hakim (JIRA)" <ji...@apache.org> on 2015/06/23 16:16:00 UTC

[jira] [Commented] (DRILL-3325) Explicitly specified default window frame throws an error requiring order by

    [ https://issues.apache.org/jira/browse/DRILL-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597698#comment-14597698 ] 

Deneche A. Hakim commented on DRILL-3325:
-----------------------------------------

On the latest master I'm getting some really misleading error messages, I am reporting them here as they seem related to this issue:

When the ORDER BY clause is missing, the following queries fail with the same error message:
{noformat}
SELECT SUM(salary) OVER(PARTITION BY position_id RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) FROM cp.`employee.json`;
Jun 23, 2015 6:57:59 AM org.apache.calcite.sql.validate.SqlValidatorException <init>
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Window specification must contain an ORDER BY clause
Jun 23, 2015 6:57:59 AM org.apache.calcite.runtime.CalciteException <init>
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 37 to line 1, column 113: Window specification must contain an ORDER BY clause
Error: PARSE ERROR: From line 1, column 37 to line 1, column 113: Window specification must contain an ORDER BY clause

[Error Id: bf6f0c94-3254-44c9-b32b-7d769774ce34 on 172.30.1.107:31010] (state=,code=0)
{noformat}

{noformat}
SELECT SUM(salary) OVER(PARTITION BY position_id RANGE UNBOUNDED PRECEDING) FROM cp.`employee.json`;
Jun 23, 2015 7:01:05 AM org.apache.calcite.sql.validate.SqlValidatorException <init>
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Window specification must contain an ORDER BY clause
Jun 23, 2015 7:01:05 AM org.apache.calcite.runtime.CalciteException <init>
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 24 to line 1, column 76: Window specification must contain an ORDER BY clause
Error: PARSE ERROR: From line 1, column 24 to line 1, column 76: Window specification must contain an ORDER BY clause

[Error Id: 6ebfef24-27a7-448b-8f2a-5c0ae77612a8 on 172.30.1.107:31010] (state=,code=0)
0: jdbc:drill:zk=local> 
{noformat}

{noformat}
SELECT SUM(salary) OVER(PARTITION BY position_id RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) from cp.`employee.json`;
Jun 23, 2015 7:09:13 AM org.apache.calcite.sql.validate.SqlValidatorException <init>
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Window specification must contain an ORDER BY clause
Jun 23, 2015 7:09:13 AM org.apache.calcite.runtime.CalciteException <init>
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 24 to line 1, column 108: Window specification must contain an ORDER BY clause
Error: PARSE ERROR: From line 1, column 24 to line 1, column 108: Window specification must contain an ORDER BY clause

[Error Id: 1fc6fc0a-9a6d-4147-8098-d7251675753c on 172.30.1.107:31010] (state=,code=0)
{noformat}

Please note that in this case (missing ORDER By clause) all three frame definitions are equivalent to the default frame, and thus should be valid. All three frame definitions should not require an ORDER BY. The same queries work fine in Postgress.

It's interesting to note that the following query, which define the default frame using ROWS, actually works fine in Drill:
{noformat}
SELECT SUM(salary) OVER(PARTITION BY position_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM cp.`employee.json`;
{noformat}
 

> Explicitly specified default window frame throws an error requiring order by
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-3325
>                 URL: https://issues.apache.org/jira/browse/DRILL-3325
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Victoria Markman
>            Assignee: Sean Hsuan-Yi Chu
>              Labels: window_function
>
> Calcite requires explicit ORDER BY clause when "default" frame ("UNBOUNDED PRECEDING AND CURRENT ROW")  is specified:
> {code}
> 0: jdbc:drill:schema=dfs> select sum(a1) over(partition by b1 range between unbounded preceding and unbounded following ) from t1;
> Error: PARSE ERROR: From line 1, column 20 to line 1, column 95: Window specification must contain an ORDER BY clause
> [Error Id: 446c7fd3-f588-4832-b79d-08cec610ff24 on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> I thought that we decided to make query above to be equivalent to:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select sum(a1) over(partition by b1) from t1;
> {code}
> Explain plan (notice frame and an empty "order by"):
> {code}
> | 00-00    Screen
> 00-01      Project(EXPR$0=[CASE(>($2, 0), CAST($3):ANY, null)])
> 00-02        Window(window#0=[window(partition {1} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [COUNT($0), $SUM0($0)])])
> 00-03          SelectionVectorRemover
> 00-04            Sort(sort0=[$1], dir0=[ASC])
> 00-05              Project(a1=[$1], b1=[$0])
> 00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/t1]], selectionRoot=/drill/testdata/subqueries/t1, numFiles=1, columns=[`a1`, `b1`]]])
> {code}
> If I add order by, I get an error. Because based on our design, see DRILL-3188, order by only allows frames:  "RANGE UNBOUNDED PRECEDING" and  "RANGE BETEWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
> {code}
> 0: jdbc:drill:schema=dfs> select sum(a1) over(partition by b1 order by c1 range between unbounded preceding and unbounded following ) from t1;
> Error: UNSUPPORTED_OPERATION ERROR: This type of window frame is currently not supported 
> See Apache Drill JIRA: DRILL-3188
> [Error Id: 7b2f1e39-0ad2-4584-aa4c-bdace84adfe4 on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> "{color:red}ROWS{color} BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING" works:
> {code}
> 0: jdbc:drill:schema=dfs> select sum(a1) over(partition by b1 rows between unbounded preceding and unbounded following ) from t1;
> +---------+
> | EXPR$0  |
> +---------+
> | 1       |
> | 2       |
> | 3       |
> | 5       |
> | 6       |
> | 7       |
> | null    |
> | 9       |
> | 10      |
> | 4       |
> +---------+
> 10 rows selected (0.312 seconds)
> {code}
> explain plan: notice empty "order by" as well.
> {code}
> 00-01      Project(EXPR$0=[CASE(>($2, 0), CAST($3):ANY, null)])
> 00-02        Window(window#0=[window(partition {1} order by [] rows between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [COUNT($0), $SUM0($0)])])
> 00-03          SelectionVectorRemover
> 00-04            Sort(sort0=[$1], dir0=[ASC])
> 00-05              Project(a1=[$1], b1=[$0])
> 00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/t1]], selectionRoot=/drill/testdata/subqueries/t1, numFiles=1, columns=[`a1`, `b1`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)