Posted to issues@drill.apache.org by "Chun Chang (JIRA)" <ji...@apache.org> on 2014/06/06 20:38:02 UTC
[jira] [Commented] (DRILL-922) group by fails with csv file

    [ https://issues.apache.org/jira/browse/DRILL-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020195#comment-14020195 ] 

Chun Chang commented on DRILL-922:
----------------------------------

java.lang.NumberFormatException: 
	org.apache.drill.exec.test.generated.ProjectorGen0.doEval(ProjectorTemplate.java:56) ~[na:na]
	org.apache.drill.exec.test.generated.ProjectorGen0.projectRecords(ProjectorTemplate.java:66) ~[na:na]
	org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:95) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
	org.apache.drill.exec.record.AbstractSingleRecordBatch.next(AbstractSingleRecordBatch.java:71) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
	org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.next(ProjectRecordBatch.java:83) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
	org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:111) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
	org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.next(PartitionSenderRootExec.java:91) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
	org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:98) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
	java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
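
The exception message in the report (NumberFormatException:[  ]) does not show the offending text. The sketch below is hypothetical and is not Drill's generated cast code. It assumes the VARCHAR-to-BIGINT cast rejects malformed text the way java.lang.Long.parseLong does, and shows which field contents would trigger the exception: text that runs into a neighboring delimiter, carries stray whitespace, or is empty. A miscomputed column boundary could produce any of these.

```java
// Hypothetical illustration, NOT Drill's generated cast code.
// Assumption: the VARCHAR -> BIGINT cast rejects malformed text the way
// java.lang.Long.parseLong does.
public class CastSketch {
    // Returns the parsed value, or null if the text is not a valid long.
    static Long tryCast(String text) {
        try {
            return Long.parseLong(text);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A correctly delimited field parses.
        System.out.println(tryCast("7"));    // 7
        // Field text that includes the next delimiter fails.
        System.out.println(tryCast("7,a"));  // null
        // Leading whitespace fails; parseLong does not trim.
        System.out.println(tryCast(" 7"));   // null
        // An empty field fails.
        System.out.println(tryCast(""));     // null
    }
}
```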


> group by fails with csv file
> ----------------------------
>
>                 Key: DRILL-922
>                 URL: https://issues.apache.org/jira/browse/DRILL-922
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Operators
>            Reporter: Chun Chang
>
> #Fri Jun 06 10:06:50 PDT 2014
> git.commit.id.abbrev=3db1d5a
> Group by fails on CSV data, but it works with Parquet. For example, given the following CSV data:
> [root@qa-node120 ~]# cat jira.csv
> 1,a
> 2,b
> 3,ab
> 4,
> 5,abc
> 6,c
> 7,a
> 8,ab
> The following query without group by works:
> 0: jdbc:drill:schema=dfs> select sum(cast(columns[0] as bigint)) from `jira.csv`;
> +------------+
> |   EXPR$0   |
> +------------+
> | 36         |
> +------------+
> But if I add group by, it fails:
> 0: jdbc:drill:schema=dfs> select sum(cast(columns[0] as bigint)) from `jira.csv` group by columns[1];
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "f1ffdac3-f454-4ba9-95db-374658db3654"
> endpoint {
>   address: "qa-node117.qa.lab"
>   user_port: 31010
>   control_port: 31011
>   data_port: 31012
> }
> error_type: 0
> message: "Failure while running fragment. < NumberFormatException:[  ]"
> ]
> Error: exception while executing query (state=,code=0)
> But if I add a WHERE filter on columns[0], then it works:
> 0: jdbc:drill:schema=dfs> select columns[1], sum(cast(columns[0] as bigint)) from `jira.csv` where columns[0] <= 8 group by columns[1];
> +------------+------------+
> |   EXPR$0   |   EXPR$1   |
> +------------+------------+
> | b          | 2          |
> | c          | 6          |
> | a          | 8          |
> |            | 4          |
> | ab         | 11         |
> | abc        | 5          |
> +------------+------------+
> It seems to me that, with group by, the scanner does not know where the end of the column is.
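
As a cross-check on the expected result, the sketch below computes the grouped sums for the eight jira.csv rows in plain Java. It is a hypothetical in-memory reference and not how Drill evaluates the query. It also shows why the end-of-column boundary matters for the row "4,": the second column is an empty string, and Java's String.split(",") would drop that trailing empty field unless it is given a limit of -1.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory reference, NOT Drill code. It evaluates
//   select columns[1], sum(cast(columns[0] as bigint)) ... group by columns[1]
// over the jira.csv rows quoted in the report.
public class GroupBySketch {
    static Map<String, Long> sumByKey(String[] lines) {
        // LinkedHashMap keeps keys in first-seen order for readable output.
        Map<String, Long> sums = new LinkedHashMap<>();
        for (String line : lines) {
            // A limit of -1 keeps the trailing empty field in "4,";
            // plain split(",") would return only one column for that row.
            String[] columns = line.split(",", -1);
            long value = Long.parseLong(columns[0]);
            // Add this row's value to the running total for its key.
            sums.merge(columns[1], value, Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] rows = {"1,a", "2,b", "3,ab", "4,", "5,abc", "6,c", "7,a", "8,ab"};
        System.out.println(sumByKey(rows));  // {a=8, b=2, ab=11, =4, abc=5, c=6}
    }
}
```

The grouped values match the table above, apart from row order, which group by does not guarantee. They also add up to 36, the result of the ungrouped query.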



--
This message was sent by Atlassian JIRA
(v6.2#6252)