Posted to issues@drill.apache.org by "Chun Chang (JIRA)" <ji...@apache.org> on 2014/06/06 20:36:07 UTC

[jira] [Created] (DRILL-922) group by fails with csv file

Chun Chang created DRILL-922:
--------------------------------

             Summary: group by fails with csv file
                 Key: DRILL-922
                 URL: https://issues.apache.org/jira/browse/DRILL-922
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Operators
            Reporter: Chun Chang


#Fri Jun 06 10:06:50 PDT 2014
git.commit.id.abbrev=3db1d5a

Group by fails on CSV data; the same query works on Parquet. For example, I have the following CSV data:

[root@qa-node120 ~]# cat jira.csv
1,a
2,b
3,ab
4,
5,abc
6,c
7,a
8,ab

The following query without group by works:

0: jdbc:drill:schema=dfs> select sum(cast(columns[0] as bigint)) from `jira.csv`;
+------------+
|   EXPR$0   |
+------------+
| 36         |
+------------+

But if I add group by, it fails:

0: jdbc:drill:schema=dfs> select sum(cast(columns[0] as bigint)) from `jira.csv` group by columns[1];
Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "f1ffdac3-f454-4ba9-95db-374658db3654"
endpoint {
  address: "qa-node117.qa.lab"
  user_port: 31010
  control_port: 31011
  data_port: 31012
}
error_type: 0
message: "Failure while running fragment. < NumberFormatException:[  ]"
]
Error: exception while executing query (state=,code=0)

But if I add a where filter on columns[0] (one that still matches every row), then it works:

0: jdbc:drill:schema=dfs> select columns[1], sum(cast(columns[0] as bigint)) from `jira.csv` where columns[0] <= 8 group by columns[1];
+------------+------------+
|   EXPR$0   |   EXPR$1   |
+------------+------------+
| b          | 2          |
| c          | 6          |
| a          | 8          |
|            | 4          |
| ab         | 11         |
| abc        | 5          |
+------------+------------+
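
Since every row in jira.csv satisfies columns[0] <= 8, the result above is also what the group by query without the filter should return. As a cross-check independent of Drill, here is a minimal Python sketch that recomputes the per-group sums from the sample data. The function name group_sums and the inlined JIRA_CSV string are illustrative assumptions, not part of Drill:

```python
import csv
import io
from collections import defaultdict

# Sample data copied from jira.csv above (row 4 has an empty second column).
JIRA_CSV = "1,a\n2,b\n3,ab\n4,\n5,abc\n6,c\n7,a\n8,ab\n"


def group_sums(text):
    """Sum column 0 (parsed as an integer) grouped by column 1.

    Hypothetical helper mirroring:
      select columns[1], sum(cast(columns[0] as bigint)) ... group by columns[1]
    """
    sums = defaultdict(int)
    for row in csv.reader(io.StringIO(text)):
        sums[row[1]] += int(row[0])
    return dict(sums)


result = group_sums(JIRA_CSV)
for key, total in result.items():
    print(repr(key), total)
```

The sums agree with the table above (a = 8, ab = 11, empty key = 4, and so on), and they add up to 36, the total returned by the query without group by.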

It seems to me that the scanner used for group by does not know where the end of the column is.



--
This message was sent by Atlassian JIRA
(v6.2#6252)