Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/05/09 05:03:04 UTC
[jira] [Created] (DRILL-5491) NPE when reading a CSV file, with headers, but blank header line
Paul Rogers created DRILL-5491:
----------------------------------
Summary: NPE when reading a CSV file, with headers, but blank header line
Key: DRILL-5491
URL: https://issues.apache.org/jira/browse/DRILL-5491
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers
See DRILL-5490 for background.
Try this unit test case:
{code}
FixtureBuilder builder = ClusterFixture.builder()
    .maxParallelization(1);
try (ClusterFixture cluster = builder.build();
     ClientFixture client = cluster.clientFixture()) {
  // Treat the first line of the file as the header line.
  TextFormatConfig csvFormat = new TextFormatConfig();
  csvFormat.fieldDelimiter = ',';
  csvFormat.skipFirstLine = false;
  csvFormat.extractHeader = true;
  cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
  String sql = "SELECT * FROM `dfs.data`.`csv/test7.csv`";
  client.queryBuilder().sql(sql).printCsv();
}
{code}
The test can also be run as a query using your favorite client.
Using this input file:
{code}

a,b,c
d,e,f
{code}
(The first line is blank.)
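Because a leading blank line is easy to lose when copying the input, here is a minimal sketch that generates the file programmatically. The class and method names ({{BlankHeaderFile}}, {{writeTestFile}}) are hypothetical, and the location {{/tmp/data/csv/test7.csv}} is an assumption inferred from the workspace path and query in the test case above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class BlankHeaderFile {

    /**
     * Writes csv/test7.csv under the given workspace root.
     * The first line is intentionally empty, followed by two data lines.
     */
    public static Path writeTestFile(Path workspaceRoot) throws IOException {
        Path file = workspaceRoot.resolve("csv").resolve("test7.csv");
        Files.createDirectories(file.getParent());
        List<String> lines = Arrays.asList("", "a,b,c", "d,e,f");
        // Each element is written as one line, terminated by the platform line separator.
        return Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: "/tmp/data" is the workspace root passed to defineWorkspace().
        Path file = writeTestFile(Paths.get("/tmp/data"));
        System.out.println("Wrote " + file);
    }
}
```

Taking the workspace root as a parameter lets the same helper write to a temporary directory instead of {{/tmp/data}}.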
The following is the result:
{code}
Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException:
SYSTEM ERROR: NullPointerException
{code}
The {{RepeatedVarCharOutput}} class tries (but fails for the reasons outlined in DRILL-5490) to detect this case.
The code crashes here in {{CompliantTextRecordReader.extractHeader()}}:
{code}
String [] fieldNames = ((RepeatedVarCharOutput)hOutput).getTextOutput();
{code}
Because of bad code in {{RepeatedVarCharOutput.getTextOutput()}}:
{code}
public String [] getTextOutput () throws ExecutionSetupException {
  if (recordCount == 0 || fieldIndex == -1) {
    return null;
  }
  if (this.recordStart != characterData) {
    throw new ExecutionSetupException("record text was requested before finishing record");
  }
{code}
Since there is no text on the line, special code elsewhere (see DRILL-5490) elects not to increment the {{recordCount}}. (BTW: {{recordCount}} is the total across-batch count; the in-batch count, {{batchIndex}}, is probably what was wanted here.) Since the count is zero, we return null.
The author probably thought we'd get a zero-length record in this case, and that the if-statement on {{recordStart}} would throw an exception. But see DRILL-5490 for why this code does not actually work.
The result is one bug (not incrementing the record count) triggering another (returning null), which masks a third ({{recordStart}} is not set correctly, so the exception would not be thrown).
All that bad code is just fun and games until we get an NPE, however.
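The failure path can be modeled in isolation. The following is a simplified sketch, not Drill's code: {{HeaderOutput}} is a hypothetical stand-in for {{RepeatedVarCharOutput}}, it only reproduces the {{recordCount == 0}} guard from the excerpt above, and the null check in {{guardedFieldCount}} is one possible mitigation offered as an assumption, not the committed fix.

```java
public class BlankHeaderSketch {

    /** Hypothetical stand-in for RepeatedVarCharOutput; not Drill's actual class. */
    static class HeaderOutput {
        private final long recordCount;
        private final String[] fields;

        HeaderOutput(String headerLine) {
            // Models the behavior described in the report: a line with no
            // text does not increment the record count.
            if (headerLine.isEmpty()) {
                recordCount = 0;
                fields = new String[0];
            } else {
                recordCount = 1;
                fields = headerLine.split(",");
            }
        }

        /** Mirrors the first guard in the excerpt: null when no record was counted. */
        String[] getTextOutput() {
            if (recordCount == 0) {
                return null;
            }
            return fields;
        }
    }

    /** A caller that uses the result without a null check. */
    public static int unguardedFieldCount(String headerLine) {
        String[] fieldNames = new HeaderOutput(headerLine).getTextOutput();
        return fieldNames.length; // throws NullPointerException for a blank header line
    }

    /** One possible guard (an assumption): fail with a descriptive message instead. */
    public static int guardedFieldCount(String headerLine) {
        String[] fieldNames = new HeaderOutput(headerLine).getTextOutput();
        if (fieldNames == null) {
            throw new IllegalStateException("CSV header line is blank; no column names found");
        }
        return fieldNames.length;
    }

    public static void main(String[] args) {
        System.out.println(guardedFieldCount("a,b,c"));
        try {
            unguardedFieldCount("");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on blank header line");
        }
        try {
            guardedFieldCount("");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether a blank header line should be rejected with an error or treated as a file with no named columns is a design question this sketch does not settle.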