You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@drill.apache.org by "Hanifi Gunes (JIRA)" <ji...@apache.org> on 2014/09/26 00:01:34 UTC

[jira] [Assigned] (DRILL-1408) SELECT column from CSV with JOIN returns null if not part of JOIN condition

     [ https://issues.apache.org/jira/browse/DRILL-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hanifi Gunes reassigned DRILL-1408:
-----------------------------------

    Assignee: Hanifi Gunes

> SELECT column from CSV with JOIN returns null if not part of JOIN condition
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-1408
>                 URL: https://issues.apache.org/jira/browse/DRILL-1408
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 0.5.0
>         Environment: CentOS 6.5, MapR M7
>            Reporter: Mike Davis
>            Assignee: Hanifi Gunes
>
> A SELECT for a column from a CSV file with a JOIN condition always returns null for columns not included in the JOIN condition.  When querying each table separately, the values are returned as expected.
> Note that this works fine for any combination of JSON and Parquet, but fails when at least one of the files is CSV.
> Simple example with two small CSV files:
> beatles.csv:
> ---------------------------
> 1,John,Lennon
> 2,Paul,McCartney
> 3,George,Harrison
> 4,Ringo,Starr
> ----------------------------
> songs.csv:
> ----------------------------
> 1,Help
> 2,Yesterday
> 3,Blue Jay Way
> 4,Yellow Submarine
> ----------------------------
> This queries returns values as expected:
> SELECT columns[0] AS id, CONCAT(columns[1], ' ', columns[2]) AS singer FROM dfs.`beatles.csv`;
> This query returns 4 results, all with null values:
> SELECT S.columns[1] AS song, CONCAT(B.columns[1], ' ', B.columns[2]) AS singer 
> FROM dfs.`beatles.csv` AS B
> INNER JOIN dfs.`songs.csv` AS S ON B.columns[0] = S.columns[0];
> The only columns that return non-null values are the ones from the JOIN condition (B.columns[0] and S.columns[0] in the following query):
> SELECT S.columns[1] AS song, CONCAT(B.columns[1], ' ', B.columns[2]) AS singer, S.columns[0] AS beatles_id, B.columns[0] AS id 
> FROM dfs.`beatles.csv` AS B
> INNER JOIN dfs.`songs.csv` AS S ON B.columns[0] = S.columns[0];



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)