Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/05/29 21:29:04 UTC
[jira] [Commented] (DRILL-5548) SELECT * against an empty CSV file with headers produces error
[ https://issues.apache.org/jira/browse/DRILL-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028631#comment-16028631 ]
Paul Rogers commented on DRILL-5548:
------------------------------------
See DRILL-5549. Drill has a problem with CSV files with headers when the header line is empty. The {{columns}} solution can fix both cases.
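To make the {{columns}} idea concrete, here is a rough sketch of the fallback decision. It is only an illustration, not Drill's reader code: the class {{HeaderFallback}} and the method {{resolveColumnNames}} are hypothetical names, and the sketch assumes the reader already has the first line of the file (or null if the file is empty) as a string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Drill): picks output column names for a text reader. */
public class HeaderFallback {

    /** Name of the Varchar array column Drill uses for files read without headers. */
    static final String COLUMNS = "columns";

    /**
     * Returns the header names if a usable header line exists; otherwise
     * falls back to the single "columns" array so the schema is never empty.
     *
     * @param headerLine first line of the file, or null if the file is empty (assumption)
     * @param extractHeader whether the format config asks for headers
     * @param delimiter field delimiter from the format config
     */
    static List<String> resolveColumnNames(String headerLine, boolean extractHeader, char delimiter) {
        // Covers the headerless config, the empty file, and the blank header line.
        if (!extractHeader || headerLine == null || headerLine.trim().isEmpty()) {
            return Collections.singletonList(COLUMNS);
        }
        List<String> names = new ArrayList<>();
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        String regex = Pattern.quote(String.valueOf(delimiter));
        for (String field : headerLine.split(regex, -1)) {
            names.add(field.trim());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(resolveColumnNames(null, true, ','));    // prints [columns]
        System.out.println(resolveColumnNames("", true, ','));      // prints [columns]
        System.out.println(resolveColumnNames("a,b,c", true, ',')); // prints [a, b, c]
    }
}
```

With this shape, the empty file (DRILL-5548) and the blank header line (DRILL-5549) take the same branch, so downstream operators always see at least one column.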
> SELECT * against an empty CSV file with headers produces error
> --------------------------------------------------------------
>
> Key: DRILL-5548
> URL: https://issues.apache.org/jira/browse/DRILL-5548
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Priority: Minor
>
> Drill's CSV column reader supports two forms of files:
> * Files with column headers as the first line of the file.
> * Files without column headers.
> The CSV storage plugin specifies which format to use for files accessed via that storage plugin config.
> Suppose we have an empty file. When queried in the CSV configuration without headers, the query works. The schema returned is the {{columns}} Varchar array, and the results contain no rows. Good.
> Now, query the same file with the CSV plugin configured to use headers.
> {code}
> TextFormatConfig csvFormat = new TextFormatConfig();
> csvFormat.fieldDelimiter = ',';
> csvFormat.skipFirstLine = false;
> csvFormat.extractHeader = true;
> {code}
> (The above can also be done using JSON when running Drill as a server.)
> We get the following exception:
> {code}
> org.apache.drill.common.exceptions.UserRemoteException:
> SYSTEM ERROR: IllegalStateException:
> Incoming batch [#4, ProjectRecordBatch] has an empty schema.
> This is not allowed.
> {code}
> This particular case is a bit tricky. First, we want headers, but there are none. We can interpret this as an error (a file with headers must have headers). Or, we can treat it as a file that happens to have no columns. The latter choice is a bit more general.
> The file also has no data rows. This could be an error, or it too could just be treated as a result set of zero rows.
> Combined, the result set is one with no columns and no rows: an empty result set. This is actually a valid (if not very useful) result in SQL.
> Conversation with Jinfeng suggested that, in such a scenario, the reader is supposed to make up a dummy column so that the result is not empty. While this is a workaround, it seems to just push the problem from the Project operator into each of many record readers.
> Another alternative is to revert to the {{columns}} column: generate a result set with the {{columns}} array, but with no data. This solution avoids the empty batch problem.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)