Posted to issues@drill.apache.org by "Arina Ielchiieva (Jira)" <ji...@apache.org> on 2019/10/11 10:52:00 UTC
[jira] [Resolved] (DRILL-5491) NPE when reading a CSV file, with headers, but blank header line

     [ https://issues.apache.org/jira/browse/DRILL-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva resolved DRILL-5491.
-------------------------------------
    Resolution: Fixed

> NPE when reading a CSV file, with headers, but blank header line
> ----------------------------------------------------------------
>
>                 Key: DRILL-5491
>                 URL: https://issues.apache.org/jira/browse/DRILL-5491
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.17.0
>
>
> See DRILL-5490 for background.
> Try this unit test case:
> {code}
>     FixtureBuilder builder = ClusterFixture.builder()
>         .maxParallelization(1);
>     try (ClusterFixture cluster = builder.build();
>          ClientFixture client = cluster.clientFixture()) {
>       TextFormatConfig csvFormat = new TextFormatConfig();
>       csvFormat.fieldDelimiter = ',';
>       csvFormat.skipFirstLine = false;
>       csvFormat.extractHeader = true;
>       cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
>       String sql = "SELECT * FROM `dfs.data`.`csv/test7.csv`";
>       client.queryBuilder().sql(sql).printCsv();
>     }
> {code}
> The test can also be run as a query using your favorite client.
> Using this input file:
> {code}
>
> a,b,c
> d,e,f
> {code}
> (The first line is blank.)
> The following is the result:
> {code}
> Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException: 
> SYSTEM ERROR: NullPointerException
> {code}
> The {{RepeatedVarCharOutput}} class tries (but fails for the reasons outlined in DRILL-5490) to detect this case.
> The code crashes here in {{CompliantTextRecordReader.extractHeader()}}:
> {code}
>     String [] fieldNames = ((RepeatedVarCharOutput)hOutput).getTextOutput();
> {code}
> Because of bad code in {{RepeatedVarCharOutput.getTextOutput()}}:
> {code}
>   public String [] getTextOutput () throws ExecutionSetupException {
>     if (recordCount == 0 || fieldIndex == -1) {
>       return null;
>     }
>     if (this.recordStart != characterData) {
>       throw new ExecutionSetupException("record text was requested before finishing record");
>     }
> {code}
> Since there is no text on the line, special code elsewhere (see DRILL-5490) elects not to increment {{recordCount}}. (BTW: {{recordCount}} is the total count across batches; the in-batch count, {{batchIndex}}, was probably what was wanted here.) Since the count is zero, the method returns null.
> But the author probably expected a zero-length record here, for which the second if-statement would throw an exception. See DRILL-5490 for why that code does not actually work.
> The result is one bug (not incrementing the record count) triggering another (returning null), which masks a third ({{recordStart}} is not set correctly, so the exception would not be thrown).
> All that bad code is just fun and games until we get an NPE, however.
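
The failure chain described above (a blank header line leads to a null return, which the caller then dereferences) can be sketched in isolation. The following is only a minimal illustration, not the change that resolved this issue: HeaderOutputSketch and HeaderExtractionSketch are hypothetical stand-ins for RepeatedVarCharOutput and the header-extraction step, and the error message is invented. It assumes the caller should report a descriptive error when getTextOutput() returns null rather than fail with an NPE.

```java
import java.util.Arrays;

/** Hypothetical stand-in for RepeatedVarCharOutput; this is not Drill's actual class. */
final class HeaderOutputSketch {
  private final String headerLine;

  HeaderOutputSketch(String headerLine) {
    this.headerLine = headerLine;
  }

  /** Mimics the reported behavior: a blank line yields null rather than an empty array. */
  String[] getTextOutput() {
    if (headerLine.isEmpty()) {
      return null;
    }
    return headerLine.split(",");
  }
}

final class HeaderExtractionSketch {
  /** Hypothetical guard: convert the null into a descriptive error instead of an NPE. */
  static String[] extractHeader(HeaderOutputSketch output) {
    String[] fieldNames = output.getTextOutput();
    if (fieldNames == null) {
      // Invented message; the real reader may report this condition differently.
      throw new IllegalStateException(
          "CSV header line is blank; cannot extract column names");
    }
    return fieldNames;
  }

  public static void main(String[] args) {
    // Normal header line: the column names are returned.
    System.out.println(Arrays.toString(extractHeader(new HeaderOutputSketch("a,b,c"))));
    // Blank header line: a clear error is raised instead of a NullPointerException.
    try {
      extractHeader(new HeaderOutputSketch(""));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A guard like this only addresses the last link in the chain; the record-count and {{recordStart}} problems covered in DRILL-5490 would still need their own fixes.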



--
This message was sent by Atlassian Jira
(v8.3.4#803005)