Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/10/30 11:48:00 UTC
[jira] [Commented] (DRILL-5550) SELECT non-existent column produces empty required VARCHAR

    [ https://issues.apache.org/jira/browse/DRILL-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668606#comment-16668606 ] 

ASF GitHub Bot commented on DRILL-5550:
---------------------------------------

vdiravka commented on issue #939: DRILL-5550: Missing CSV column value set to null
URL: https://github.com/apache/drill/pull/939#issuecomment-434269947
 
 
   @prasadns14 Could you please respond to Paul's review comments? [DRILL-6147](https://issues.apache.org/jira/browse/DRILL-6147) is now resolved, which may also help in moving this issue forward.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> SELECT non-existent column produces empty required VARCHAR
> ----------------------------------------------------------
>
>                 Key: DRILL-5550
>                 URL: https://issues.apache.org/jira/browse/DRILL-5550
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Prasad Nagaraj Subramanya
>            Priority: Minor
>             Fix For: Future
>
>
> Drill's CSV column reader supports two forms of files:
> * Files with column headers as the first line of the file.
> * Files without column headers.
> The CSV storage plugin specifies which format to use for files accessed via that storage plugin config.
> Suppose we have a CSV file with headers:
> {code}
> a,b,c
> 10,foo,bar
> {code}
> Suppose we configure a storage plugin to use headers:
> {code}
>     TextFormatConfig csvFormat = new TextFormatConfig();
>     csvFormat.fieldDelimiter = ',';
>     csvFormat.skipFirstLine = false;
>     csvFormat.extractHeader = true;
> {code}
> (The above can also be done using JSON when running Drill as a server.)
> Execute the following query:
> {code}
> SELECT a, c, d FROM dfs.data.`example.csv`
> {code}
> Results:
> {code}
> a,c,d
> 10,bar,
> {code}
> The actual type of column {{d}} is non-nullable VARCHAR.
> This is inconsistent with other parts of Drill in two ways, one of which may be a bug. Most other parts of Drill use a nullable INT for "missing" columns.
> 1. For CSV it makes sense for the data type to be VARCHAR, since all CSV columns are of that type.
> 2. It may *not* make sense for the column to be non-nullable and blank rather than nullable and NULL. In SQL, NULL means that the data is unknown, which is the case here.
> In the future, we may want to use some other indication for a missing column. Until then, the requested change is to make the type of a missing CSV column a nullable VARCHAR set to value NULL.
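
The difference between the current and the requested behavior can be illustrated with a minimal, self-contained sketch. It does not use Drill's reader or vector APIs: the class name `MissingCsvColumnSketch` and the helper `project` are hypothetical and exist only for illustration. An empty string stands in for the current non-nullable VARCHAR, and Java `null` stands in for the requested nullable VARCHAR set to NULL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: models the projection of CSV columns, not Drill's actual reader.
class MissingCsvColumnSketch {

    // Hypothetical helper: picks the requested columns out of one CSV data row.
    // A column that is absent from the header yields `missingValue`:
    // "" models the current required (non-nullable) VARCHAR,
    // null models the requested nullable VARCHAR set to NULL.
    static List<String> project(String[] header, String[] row,
                                List<String> requested, String missingValue) {
        List<String> headerNames = Arrays.asList(header);
        List<String> result = new ArrayList<>();
        for (String name : requested) {
            int index = headerNames.indexOf(name);
            result.add(index >= 0 ? row[index] : missingValue);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same data and projection as in the issue description.
        String[] header = "a,b,c".split(",");
        String[] row = "10,foo,bar".split(",");
        List<String> requested = Arrays.asList("a", "c", "d");

        // Current behavior: column d comes back as an empty string.
        System.out.println("current:  " + project(header, row, requested, ""));
        // Requested behavior: column d comes back as NULL.
        System.out.println("proposed: " + project(header, row, requested, null));
    }
}
```

With the empty-string model, a consumer cannot distinguish the missing column `d` from a column that is present but blank. With the null model, the two cases remain distinguishable, which matches the SQL meaning of NULL as unknown data.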



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)