You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Hanifi Gunes (JIRA)" <ji...@apache.org> on 2014/09/26 00:01:34 UTC
[jira] [Assigned] (DRILL-1408) SELECT column from CSV with JOIN
returns null if not part of JOIN condition
[ https://issues.apache.org/jira/browse/DRILL-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hanifi Gunes reassigned DRILL-1408:
-----------------------------------
Assignee: Hanifi Gunes
> SELECT column from CSV with JOIN returns null if not part of JOIN condition
> ---------------------------------------------------------------------------
>
> Key: DRILL-1408
> URL: https://issues.apache.org/jira/browse/DRILL-1408
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 0.5.0
> Environment: CentOS 6.5, MapR M7
> Reporter: Mike Davis
> Assignee: Hanifi Gunes
>
> A SELECT for a column from a CSV file with a JOIN condition always returns null for columns not included in the JOIN condition. When querying each table separately, the values are returned as expected.
> Note that this works fine for any combination of JSON and Parquet, but fails when at least one of the files is CSV.
> Simple example with two small CSV files:
> beatles.csv:
> ---------------------------
> 1,John,Lennon
> 2,Paul,McCartney
> 3,George,Harrison
> 4,Ringo,Starr
> ----------------------------
> songs.csv:
> ----------------------------
> 1,Help
> 2,Yesterday
> 3,Blue Jay Way
> 4,Yellow Submarine
> ----------------------------
> This queries returns values as expected:
> SELECT columns[0] AS id, CONCAT(columns[1], ' ', columns[2]) AS singer FROM dfs.`beatles.csv`;
> This query returns 4 results, all with null values:
> SELECT S.columns[1] AS song, CONCAT(B.columns[1], ' ', B.columns[2]) AS singer
> FROM dfs.`beatles.csv` AS B
> INNER JOIN dfs.`songs.csv` AS S ON B.columns[0] = S.columns[0];
> The only columns that return non-null values are the ones from the JOIN condition (B.columns[0] and S.columns[0] in the following query):
> SELECT S.columns[1] AS song, CONCAT(B.columns[1], ' ', B.columns[2]) AS singer, S.columns[0] AS beatles_id, B.columns[0] AS id
> FROM dfs.`beatles.csv` AS B
> INNER JOIN dfs.`songs.csv` AS S ON B.columns[0] = S.columns[0];
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)