You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Vitalii Diravka (JIRA)" <ji...@apache.org> on 2018/10/30 11:42:00 UTC
[jira] [Updated] (DRILL-5550) SELECT non-existent column produces
empty required VARCHAR
[ https://issues.apache.org/jira/browse/DRILL-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitalii Diravka updated DRILL-5550:
-----------------------------------
Fix Version/s: Future
> SELECT non-existent column produces empty required VARCHAR
> ----------------------------------------------------------
>
> Key: DRILL-5550
> URL: https://issues.apache.org/jira/browse/DRILL-5550
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Priority: Minor
> Fix For: Future
>
>
> Drill's CSV column reader supports two forms of files:
> * Files with column headers as the first line of the file.
> * Files without column headers.
> The CSV storage plugin specifies which format to use for files accessed via that storage plugin config.
> Suppose we have a CSV file with headers:
> {code}
> a,b,c
> 10,foo,bar
> {code}
> Suppose we configure a storage plugin to use headers:
> {code}
> TextFormatConfig csvFormat = new TextFormatConfig();
> csvFormat.fieldDelimiter = ',';
> csvFormat.skipFirstLine = false;
> csvFormat.extractHeader = true;
> {code}
> (The above can also be done using JSON when running Drill as a server.)
> Execute the following query:
> {code}
> SELECT a, c, d FROM `dfs.data.example.csv`
> {code}
> Results:
> {code}
> a,c,d
> 10,bar,
> {code}
> The actual type of column {{d}} is non-nullable VARCHAR.
> This is inconsistent with other parts of Drill in two ways, one may be a bug. Most other parts of Drill use a nullable INT for "missing" columns.
> 1. For CSV it makes sense for the data type to be VARCHAR, since all CSV columns are of that type.
> 2. It may *not* make sense for the column to be non-nullable and blank rather than nullable and NULL. In SQL, NULL means that the data is unknown, which is the case here.
> In the future, we may want to use some other indication for a missing column. Until then, the requested change is to make the type of a missing CSV column a nullable VARCHAR set to value NULL.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)