Posted to dev@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/05/09 05:03:04 UTC

[jira] [Created] (DRILL-5491) NPE when reading a CSV file, with headers, but blank header line

Paul Rogers created DRILL-5491:
----------------------------------

             Summary: NPE when reading a CSV file, with headers, but blank header line
                 Key: DRILL-5491
                 URL: https://issues.apache.org/jira/browse/DRILL-5491
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers


See DRILL-5490 for background.

Try this unit test case:

{code}
    FixtureBuilder builder = ClusterFixture.builder()
        .maxParallelization(1);

    try (ClusterFixture cluster = builder.build();
         ClientFixture client = cluster.clientFixture()) {
      TextFormatConfig csvFormat = new TextFormatConfig();
      csvFormat.fieldDelimiter = ',';
      csvFormat.skipFirstLine = false;
      csvFormat.extractHeader = true;
      cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
      String sql = "SELECT * FROM `dfs.data`.`csv/test7.csv`";
      client.queryBuilder().sql(sql).printCsv();
    }
{code}

The test can also be run as a query using your favorite client.

Using this input file:

{code}

a,b,c
d,e,f
{code}

(The first line is blank.)
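
For anyone reproducing this, the input file can be generated with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and the location {{/tmp/data/csv/test7.csv}} is inferred from the workspace path and the query in the test above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class BlankHeaderCsv {
  // Hypothetical helper: writes csv/test7.csv under the given workspace root.
  static Path write(Path workspaceRoot) throws IOException {
    Path dir = Files.createDirectories(workspaceRoot.resolve("csv"));
    Path file = dir.resolve("test7.csv");
    // The leading "\n" produces the blank header line.
    Files.write(file, "\na,b,c\nd,e,f\n".getBytes(StandardCharsets.UTF_8));
    return file;
  }

  public static void main(String[] args) throws IOException {
    // Assumed workspace root, matching defineWorkspace(...) in the test.
    System.out.println(write(Paths.get("/tmp/data")));
  }
}
```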

The following is the result:

{code}
Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException: 
SYSTEM ERROR: NullPointerException
{code}

The {{RepeatedVarCharOutput}} class tries (but fails for the reasons outlined in DRILL-5490) to detect this case.

The code crashes here in {{CompliantTextRecordReader.extractHeader()}}:

{code}
    String [] fieldNames = ((RepeatedVarCharOutput)hOutput).getTextOutput();
{code}

Because of bad code in {{RepeatedVarCharOutput.getTextOutput()}}:

{code}
  public String [] getTextOutput () throws ExecutionSetupException {
    if (recordCount == 0 || fieldIndex == -1) {
      return null;
    }

    if (this.recordStart != characterData) {
      throw new ExecutionSetupException("record text was requested before finishing record");
    }
{code}

Since there is no text on the line, special code elsewhere (see DRILL-5490) elects not to increment {{recordCount}}. (BTW: {{recordCount}} is the total count across batches; the in-batch count, {{batchIndex}}, was probably what was wanted here.) Since the count is zero, we return null.

The author probably expected a zero-length record here, in which case the second if-statement would throw an exception. But see DRILL-5490 for why this code does not actually work.

The result is one bug (not incrementing the record count) triggering another (returning null), which masks a third ({{recordStart}} is not set correctly, so the exception would not be thrown).
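
The null-return path can be modeled outside Drill with a toy class. The sketch below is not Drill code and not a proposed patch: {{FakeOutput}}, {{finishRecord()}} and {{readHeader()}} are hypothetical stand-ins, and only the first if-statement of {{getTextOutput()}} is mirrored. It shows how a blank line leaves the count at zero, how that yields null, and one possible shape for a caller that checks the result instead of dereferencing it.

```java
import java.util.Arrays;

class HeaderNullSketch {
  // Stand-in for RepeatedVarCharOutput; only the null-return path is modeled.
  static class FakeOutput {
    int recordCount = 0;   // never incremented for a blank line
    int fieldIndex = -1;
    String[] fields = new String[0];

    void finishRecord(String line) {
      if (line.isEmpty()) {
        return;            // blank line: the count stays at zero
      }
      fields = line.split(",");
      fieldIndex = fields.length - 1;
      recordCount++;
    }

    String[] getTextOutput() {
      if (recordCount == 0 || fieldIndex == -1) {
        return null;       // same condition as in the excerpt above
      }
      return fields;
    }
  }

  // Hypothetical caller that reports a blank header rather than hitting an NPE.
  static String[] readHeader(String headerLine) {
    FakeOutput out = new FakeOutput();
    out.finishRecord(headerLine);
    String[] names = out.getTextOutput();
    if (names == null) {
      throw new IllegalStateException("CSV header line is blank");
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(readHeader("a,b,c")));
    try {
      readHeader("");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Whether a blank header should be an error or should fall back to generated column names is a design decision this sketch does not try to settle.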

All that bad code is just fun and games until we get an NPE, however.


