Posted to dev@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/05/29 21:14:05 UTC

[jira] [Created] (DRILL-5548) SELECT * against an empty CSV file with headers produces error

Paul Rogers created DRILL-5548:
----------------------------------

Summary: SELECT * against an empty CSV file with headers produces error
Key: DRILL-5548
URL: https://issues.apache.org/jira/browse/DRILL-5548
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Priority: Minor

Drill's CSV column reader supports two forms of files:

* Files with column headers as the first line of the file.
* Files without column headers.

The CSV storage plugin specifies which format to use for files accessed via that storage plugin config.

Suppose we have an empty file. When queried with the CSV configuration without headers, the query works. The schema returned is the {{columns}} Varchar array, and the results contain no rows. Good.

Now, query the same file with the CSV plugin configured to use headers.

{code}
// Configure the text (CSV) format so that the first line is read as column headers.
TextFormatConfig csvFormat = new TextFormatConfig();
csvFormat.fieldDelimiter = ',';
csvFormat.skipFirstLine = false;
csvFormat.extractHeader = true;
{code}

(The above can also be done using JSON when running Drill as a server.)
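
For reference, a rough JSON equivalent of the settings above might look like the following fragment, placed inside a storage plugin configuration. The format name {{csv}} and the {{extensions}} list are assumptions chosen for illustration; only the delimiter, {{skipFirstLine}} and {{extractHeader}} values come from the Java snippet.

```json
{
  "formats": {
    "csv": {
      "type": "text",
      "extensions": ["csv"],
      "delimiter": ",",
      "skipFirstLine": false,
      "extractHeader": true
    }
  }
}
```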

We get the following exception:

{code}
org.apache.drill.common.exceptions.UserRemoteException:
SYSTEM ERROR: IllegalStateException:
Incoming batch [#4, ProjectRecordBatch] has an empty schema.
This is not allowed.
{code}

This particular case is a bit tricky. First, we want headers, but there are none. We can interpret this as an error (a file with headers must have headers). Or, we can treat it as a file that happens to have no columns. The latter choice is a bit more general.

The file also has no data rows. This could be an error, or it too could just be treated as a result set of zero rows.

Combined, the result set is one with no columns and no rows: an empty result set. This is actually a valid (if not very useful) result in SQL.
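
The "no columns, no rows" interpretation can be illustrated with a small standalone sketch. This is not Drill's reader code: the class {{EmptyCsvSketch}}, the {{Result}} holder and the {{readWithHeaders}} method are hypothetical names, and the parsing is deliberately naive (no quoting or escaping). It only shows that an empty input maps naturally to an empty schema plus an empty row list, rather than to an error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyCsvSketch {

    /** Hypothetical result holder: column names plus data rows. */
    static final class Result {
        final List<String> columns;
        final List<List<String>> rows;

        Result(List<String> columns, List<List<String>> rows) {
            this.columns = columns;
            this.rows = rows;
        }
    }

    /**
     * Reads CSV text whose first line is expected to hold column headers.
     * Assumption for this sketch: an input with no header line is treated
     * as a table with zero columns and zero rows, not as an error.
     */
    static Result readWithHeaders(String text) {
        List<String> columns = new ArrayList<>();
        List<List<String>> rows = new ArrayList<>();

        // Split into lines; trailing empty lines are dropped by split().
        String[] lines = text.isEmpty() ? new String[0] : text.split("\r?\n");
        if (lines.length == 0) {
            // No header line at all: empty schema, empty result set.
            return new Result(columns, rows);
        }

        // First line supplies the column names.
        columns.addAll(Arrays.asList(lines[0].split(",", -1)));

        // Remaining lines are data rows.
        for (int i = 1; i < lines.length; i++) {
            rows.add(Arrays.asList(lines[i].split(",", -1)));
        }
        return new Result(columns, rows);
    }

    public static void main(String[] args) {
        Result empty = readWithHeaders("");
        System.out.println(empty.columns.size() + " columns, " + empty.rows.size() + " rows");

        Result small = readWithHeaders("a,b\n1,2\n");
        System.out.println(small.columns.size() + " columns, " + small.rows.size() + " rows");
    }
}
```

In this sketch the empty file is handled entirely in the reader; whether downstream operators such as Project accept a batch with an empty schema is a separate question, and is the one this issue is about.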

A conversation with Jinfeng suggested that, in such a scenario, the reader is supposed to make up a dummy column so that the result is not empty. While this is a workaround, it seems to just push the problem from the Project operator into each of the many record readers.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)