Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/05/29 22:29:04 UTC
[jira] [Created] (DRILL-5551) `columns` changes meaning for CSV files depending on query
Paul Rogers created DRILL-5551:
----------------------------------
Summary: `columns` changes meaning for CSV files depending on query
Key: DRILL-5551
URL: https://issues.apache.org/jira/browse/DRILL-5551
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Priority: Minor
Drill's CSV column reader supports two forms of files:
* Files with column headers as the first line of the file.
* Files without column headers.
The format configuration in the storage plugin determines which of the two forms Drill assumes for every file read through that plugin config.
Suppose we have a CSV file with headers:
{code}
a,b,c
10,foo,bar
{code}
Suppose we configure a storage plugin to use headers:
{code}
TextFormatConfig csvFormat = new TextFormatConfig();
csvFormat.fieldDelimiter = ',';
csvFormat.skipFirstLine = false;
csvFormat.extractHeader = true;
{code}
(The above can also be done using JSON when running Drill as a server.)
Suppose we execute this query:
{code}
SELECT columns FROM dfs.data.`example.csv`
{code}
The result is a single column, the special {{columns}} array, that contains all three fields.
Suppose we alter the query just a bit:
{code}
SELECT columns, a FROM dfs.data.`example.csv`
{code}
Now the result set is two non-nullable Varchar columns:
{code}
columns,a
,10
{code}
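It seems that the meaning of {{columns}} shifts depending on whether it is the only item in the SELECT list or appears alongside other columns: alone it is the special array, while next to {{a}} it is treated as an ordinary (and here nonexistent, hence empty) column.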
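To make that inference concrete, here is a toy Java model of the two results above. It is only a sketch of what the observed output implies, not Drill's reader code: the class {{ColumnsProjectionModel}}, the method {{project}}, and the rule it encodes are all hypothetical and were written only to reproduce the two result sets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the behavior reported above. NOT Drill's actual reader code. */
public class ColumnsProjectionModel {

    /**
     * Projects one data row of a CSV file that has a header line.
     *
     * Assumed rule, inferred only from the two queries above:
     * - `columns` as the sole select item yields the whole row as an array;
     * - otherwise every select item is looked up by name in the header,
     *   and a name that is not in the header yields an empty string.
     */
    public static Map<String, Object> project(
            List<String> header, List<String> row, List<String> selectList) {
        Map<String, Object> result = new LinkedHashMap<>();
        boolean columnsOnly =
                selectList.size() == 1 && selectList.get(0).equals("columns");
        if (columnsOnly) {
            // Special meaning: all fields of the row, in file order.
            result.put("columns", new ArrayList<>(row));
            return result;
        }
        for (String name : selectList) {
            // Ordinary meaning: a header lookup, even for the name "columns".
            int index = header.indexOf(name);
            result.put(name, index >= 0 ? row.get(index) : "");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("a", "b", "c");
        List<String> row = Arrays.asList("10", "foo", "bar");
        // SELECT columns: one array-valued column.
        System.out.println(project(header, row, Arrays.asList("columns")));
        // SELECT columns, a: two scalar columns, the first one empty.
        System.out.println(project(header, row, Arrays.asList("columns", "a")));
    }
}
```

In this model the same name takes two different code paths depending on what else is in the select list, which is the shift in meaning described above.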
Perhaps this behavior is meant to handle a file whose header itself contains a field named {{columns}}, such as:
{code}
columns,values
a;b,10;10
c;d,20;30
{code}
That is fine, but what if I wanted just the first column:
{code}
SELECT columns FROM dfs.data.`strange.csv`
{code}
How would the code know whether {{columns}} refers to the special array or to the ordinary header column named "columns"?
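The ambiguity can be stated as a small Java sketch. This is illustrative only: {{ColumnsAmbiguity}}, {{Meaning}} and {{possibleMeanings}} are hypothetical names, and the exact-match name comparison is a simplification rather than a description of how Drill resolves identifiers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: lists the possible readings of a selected name. */
public class ColumnsAmbiguity {

    /** The two readings discussed in this report. */
    public enum Meaning { SPECIAL_ARRAY, HEADER_COLUMN }

    /**
     * Returns every reading of `selected` that is consistent with the
     * file header. More than one entry means the query text alone does
     * not say which reading the user intended.
     */
    public static List<Meaning> possibleMeanings(List<String> header, String selected) {
        List<Meaning> meanings = new ArrayList<>();
        if (selected.equals("columns")) {
            // The reserved name always admits the special-array reading.
            meanings.add(Meaning.SPECIAL_ARRAY);
        }
        if (header.contains(selected)) {
            // The file defines a field with this name.
            meanings.add(Meaning.HEADER_COLUMN);
        }
        return meanings;
    }

    public static void main(String[] args) {
        // Header of the first example file: only one reading is possible.
        System.out.println(possibleMeanings(Arrays.asList("a", "b", "c"), "columns"));
        // Header of the second example file: both readings are possible.
        System.out.println(possibleMeanings(Arrays.asList("columns", "values"), "columns"));
    }
}
```

For the second file both readings apply, so any rule that picks one of them silently makes the other unreachable with this syntax.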
Perhaps one long-term solution is to make {{columns}} into a table function (as has been proposed for the implicit columns):
{code}
SELECT columns(t) FROM dfs.data.`strange.csv` AS t
{code}