You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Chun Chang (JIRA)" <ji...@apache.org> on 2014/09/11 01:32:33 UTC
[jira] [Created] (DRILL-1401) join on csv files does not work
Chun Chang created DRILL-1401:
---------------------------------
Summary: join on csv files does not work
Key: DRILL-1401
URL: https://issues.apache.org/jira/browse/DRILL-1401
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 0.5.0
Reporter: Chun Chang
#Wed Sep 10 13:35:05 PDT 2014
git.commit.id.abbrev=686eb9e
join (inner,full outer,left,right) does not work if the join is directly applied to csv files. It will produce either null values on the projected columns or NumberFormatException depends on joining condition applied on which columns (first or second columns). If you create views on the csv files, then join works on the views.
For example, the following join give null values:
0: jdbc:drill:schema=dfs> select cast(`aggregate_100r.csv`.columns[0] as int), cast(`aggregate_100r.csv`.columns[1] as int), cast(`join_100r.csv`.columns[1] as int) from `aggregate_100r.csv` inner join `join_100r.csv` on cast(`aggregate_100r.csv`.columns[0] as int) = cast(`join_100r.csv`.columns[0] as int);
+------------+------------+------------+
| EXPR$0 | EXPR$1 | EXPR$2 |
+------------+------------+------------+
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 0 | null | null |
| 1 | null | null |
| 1 | null | null |
| 1 | null | null |
| 1 | null | null |
The following give NumberFormatException:
0: jdbc:drill:schema=dfs> select cast(`aggregate_100r.csv`.columns[0] as int), cast(`aggregate_100r.csv`.columns[1] as int), cast(`join_100r.csv`.columns[1] as int) from `aggregate_100r.csv` inner join `join_100r.csv` on cast(`aggregate_100r.csv`.columns[1] as int) = cast(`join_100r.csv`.columns[1] as int);
Query failed: Failure while running fragment. [3f67299e-f312-445b-8d6b-74984a820f0c]
Error: exception while executing query: Failure while trying to get next result batch. (state=,code=0)
The following with views works:
0: jdbc:drill:schema=dfs> select `aggregate_100r_v`.c0, `aggregate_100r_v`.c1, `join_100r_v`.c1 from `aggregate_100r_v` inner join `join_100r_v` on `aggregate_100r_v`.c0 = `join_100r_v`.c0;
+------------+------------+------------+
| c0 | c1 | c10 |
+------------+------------+------------+
| 0 | 0 | 0 |
| 0 | 0 | 2 |
| 0 | 0 | 1 |
| 0 | 0 | 0 |
| 0 | 0 | 0 |
| 0 | 0 | 2 |
| 0 | 0 | 1 |
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 0 | 1 | 2 |
| 0 | 1 | 1 |
| 0 | 1 | 0 |
| 0 | 2 | 0 |
| 0 | 2 | 2 |
| 0 | 2 | 1 |
| 0 | 2 | 0 |
| 1 | 1 | 1 |
| 1 | 1 | 2 |
| 1 | 1 | 1 |
| 1 | 1 | 0 |
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)