Posted to issues@drill.apache.org by "Chun Chang (JIRA)" <ji...@apache.org> on 2014/06/06 20:36:07 UTC
[jira] [Created] (DRILL-922) group by fails with csv file
Chun Chang created DRILL-922:
--------------------------------
Summary: group by fails with csv file
Key: DRILL-922
URL: https://issues.apache.org/jira/browse/DRILL-922
Project: Apache Drill
Issue Type: Bug
Components: Execution - Operators
Reporter: Chun Chang
#Fri Jun 06 10:06:50 PDT 2014
git.commit.id.abbrev=3db1d5a
Group by fails on CSV data but works on Parquet. For example, I have the following CSV data:
[root@qa-node120 ~]# cat jira.csv
1,a
2,b
3,ab
4,
5,abc
6,c
7,a
8,ab
The following query without group by works:
0: jdbc:drill:schema=dfs> select sum(cast(columns[0] as bigint)) from `jira.csv`;
+------------+
|   EXPR$0   |
+------------+
| 36         |
+------------+
But if I add group by, it fails:
0: jdbc:drill:schema=dfs> select sum(cast(columns[0] as bigint)) from `jira.csv` group by columns[1];
Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "f1ffdac3-f454-4ba9-95db-374658db3654"
endpoint {
  address: "qa-node117.qa.lab"
  user_port: 31010
  control_port: 31011
  data_port: 31012
}
error_type: 0
message: "Failure while running fragment. < NumberFormatException:[ ]"
]
Error: exception while executing query (state=,code=0)
But if I add a where clause on columns[0] (one that still matches all eight rows) and also select the grouping column, then it works:
0: jdbc:drill:schema=dfs> select columns[1], sum(cast(columns[0] as bigint)) from `jira.csv` where columns[0] <= 8 group by columns[1];
+------------+------------+
|   EXPR$0   |   EXPR$1   |
+------------+------------+
| b          | 2          |
| c          | 6          |
| a          | 8          |
|            | 4          |
| ab         | 11         |
| abc        | 5          |
+------------+------------+
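As a cross-check that does not depend on Drill, the grouped sums can be recomputed from the eight sample rows. This is only a minimal sketch: the helper name `grouped_sums` and the inline `JIRA_CSV` string are illustrative assumptions, not part of Drill or of the original report. It groups on the second field (treating the missing value in row 4 as an empty string) and sums the first field as an integer.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from jira.csv in this report (inlined for illustration).
JIRA_CSV = "1,a\n2,b\n3,ab\n4,\n5,abc\n6,c\n7,a\n8,ab\n"


def grouped_sums(text):
    """Sum field 0 (as int) grouped by field 1, mirroring the GROUP BY query."""
    sums = defaultdict(int)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        # A row such as "4," yields an empty string for the second field.
        key = row[1] if len(row) > 1 else ""
        sums[key] += int(row[0])
    return dict(sums)


result = grouped_sums(JIRA_CSV)
total = sum(result.values())

for key, value in result.items():
    print(repr(key), value)
print("total:", total)
```

The per-group sums agree with the table returned by the filtered query, and the total agrees with the ungrouped sum of 36, so the same six groups would be the expected output of the failing group by query.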
It seems to me that, when group by is used, the scanner does not know where a column ends.
--
This message was sent by Atlassian JIRA
(v6.2#6252)