You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2017/07/06 07:02:01 UTC

[jira] [Updated] (DRILL-5550) SELECT non-existent column produces empty required VARCHAR

     [ https://issues.apache.org/jira/browse/DRILL-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz updated DRILL-5550:
----------------------------------
    Component/s: Storage - Text & CSV

> SELECT non-existent column produces empty required VARCHAR
> ----------------------------------------------------------
>
>                 Key: DRILL-5550
>                 URL: https://issues.apache.org/jira/browse/DRILL-5550
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Drill's CSV column reader supports two forms of files:
> * Files with column headers as the first line of the file.
> * Files without column headers.
> The CSV storage plugin specifies which format to use for files accessed via that storage plugin config.
> Suppose we have a CSV file with headers:
> {code}
> a,b,c
> 10,foo,bar
> {code}
> Suppose we configure a storage plugin to use headers:
> {code}
>     TextFormatConfig csvFormat = new TextFormatConfig();
>     csvFormat.fieldDelimiter = ',';
>     csvFormat.skipFirstLine = false;
>     csvFormat.extractHeader = true;
> {code}
> (The above can also be done using JSON when running Drill as a server.)
> Execute the following query:
> {code}
> SELECT a, c, d FROM `dfs.data.example.csv`
> {code}
> Results:
> {code}
> a,c,d
> 10,bar,
> {code}
> The actual type of column {{d}} is non-nullable VARCHAR.
> This is inconsistent with other parts of Drill in two ways, one may be a bug. Most other parts of Drill use a nullable INT for "missing" columns.
> 1. For CSV it makes sense for the data type to be VARCHAR, since all CSV columns are of that type.
> 2. It may *not* make sense for the column to be non-nullable and blank rather than nullable and NULL. In SQL, NULL means that the data is unknown, which is the case here.
> In the future, we may want to use some other indication for a missing column. Until then, the requested change is to make the type of a missing CSV column a nullable VARCHAR set to value NULL.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)