Posted to issues@drill.apache.org by "Arina Ielchiieva (Jira)" <ji...@apache.org> on 2019/10/11 10:52:00 UTC
[jira] [Resolved] (DRILL-5491) NPE when reading a CSV file, with headers, but blank header line
[ https://issues.apache.org/jira/browse/DRILL-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arina Ielchiieva resolved DRILL-5491.
-------------------------------------
Resolution: Fixed
> NPE when reading a CSV file, with headers, but blank header line
> ----------------------------------------------------------------
>
> Key: DRILL-5491
> URL: https://issues.apache.org/jira/browse/DRILL-5491
> Project: Apache Drill
> Issue Type: Sub-task
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.17.0
>
>
> See DRILL-5490 for background.
> Try this unit test case:
> {code}
> FixtureBuilder builder = ClusterFixture.builder()
>     .maxParallelization(1);
> try (ClusterFixture cluster = builder.build();
>      ClientFixture client = cluster.clientFixture()) {
>   // CSV format that reads column names from the first line of the file.
>   TextFormatConfig csvFormat = new TextFormatConfig();
>   csvFormat.fieldDelimiter = ',';
>   csvFormat.skipFirstLine = false;
>   csvFormat.extractHeader = true;
>   cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
>   String sql = "SELECT * FROM `dfs.data`.`csv/test7.csv`";
>   client.queryBuilder().sql(sql).printCsv();
> }
> {code}
> The test can also be run as a query using your favorite client.
> Using this input file:
> {code}
>
> a,b,c
> d,e,f
> {code}
> (The first line is blank.)
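Because the leading blank line is easy to lose when copying the sample, here is a minimal sketch that writes the input file programmatically. The class name `BlankHeaderCsv` and its `write` helper are hypothetical; the target path is assumed from the workspace root `/tmp/data` and the query path `csv/test7.csv` used in the test case above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Drill: creates the test input for this issue.
public class BlankHeaderCsv {

  // Writes a CSV whose first (header) line is empty, followed by two data lines.
  static Path write(Path dir) throws IOException {
    Files.createDirectories(dir);
    Path file = dir.resolve("test7.csv");
    Files.write(file, "\na,b,c\nd,e,f\n".getBytes(StandardCharsets.UTF_8));
    return file;
  }

  public static void main(String[] args) throws IOException {
    // Assumed location: workspace root /tmp/data plus the relative path csv/test7.csv.
    Path file = write(Paths.get("/tmp/data/csv"));
    List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
    System.out.println(lines.size() + " lines, first line empty: " + lines.get(0).isEmpty());
  }
}
```

Running `main` should report three lines with an empty first line, which is the shape of input that triggers the failure described here.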
> The following is the result:
> {code}
> Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException:
> SYSTEM ERROR: NullPointerException
> {code}
> The {{RepeatedVarCharOutput}} class tries (but fails for the reasons outlined in DRILL-5490) to detect this case.
> The code crashes here in {{CompliantTextRecordReader.extractHeader()}}:
> {code}
> String [] fieldNames = ((RepeatedVarCharOutput)hOutput).getTextOutput();
> {code}
> Because of bad code in {{RepeatedVarCharOutput.getTextOutput()}}:
> {code}
> public String [] getTextOutput () throws ExecutionSetupException {
>   if (recordCount == 0 || fieldIndex == -1) {
>     return null;
>   }
>   if (this.recordStart != characterData) {
>     throw new ExecutionSetupException("record text was requested before finishing record");
>   }
> {code}
> Since there is no text on the line, special code elsewhere (see DRILL-5490) elects not to increment {{recordCount}}. (BTW: {{recordCount}} is the total count across batches; the in-batch count, {{batchIndex}}, was probably what was wanted here.) Since the count is zero, we return null.
> The author probably thought we'd get a zero-length record, in which case the second if-statement would throw an exception. But see DRILL-5490 for why this code does not actually work.
> The result is one bug (not incrementing the record count) triggering another (returning a null), which masks a third ({{recordStart}} is not set correctly, so the exception would not be thrown).
> All that bad code is just fun and games until we get an NPE, however.
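The chain described above can be reproduced in isolation. The following is a simplified sketch, not Drill code: `HeaderOutput` is a hypothetical stand-in that models only the early-return check quoted from {{RepeatedVarCharOutput.getTextOutput()}}, and the guard shown is one possible way to surface a readable error. It is not necessarily the fix that shipped in 1.17.0, which this message does not describe.

```java
// Simplified model of the failure; all class and method names are hypothetical.
public class BlankHeaderDemo {

  // Stand-in for the header output: after a blank line, no record was counted
  // and no field was started.
  static class HeaderOutput {
    long recordCount = 0;
    int fieldIndex = -1;
    String[] fields = new String[0];

    // Same early-return condition as the snippet quoted in the issue.
    String[] getTextOutput() {
      if (recordCount == 0 || fieldIndex == -1) {
        return null;
      }
      return fields;
    }
  }

  // Assumed caller behavior: uses the result without a null check.
  static int unguardedColumnCount(HeaderOutput out) {
    String[] fieldNames = out.getTextOutput();
    return fieldNames.length; // throws NullPointerException when fieldNames is null
  }

  // One possible guard: convert the null into a descriptive error.
  static String[] guardedHeader(HeaderOutput out, String fileName) {
    String[] fieldNames = out.getTextOutput();
    if (fieldNames == null || fieldNames.length == 0) {
      throw new IllegalStateException("CSV header line is blank in " + fileName);
    }
    return fieldNames;
  }

  public static void main(String[] args) {
    HeaderOutput blank = new HeaderOutput();
    try {
      unguardedColumnCount(blank);
    } catch (NullPointerException e) {
      System.out.println("unguarded: NullPointerException");
    }
    try {
      guardedHeader(blank, "test7.csv");
    } catch (IllegalStateException e) {
      System.out.println("guarded: " + e.getMessage());
    }
  }
}
```

The design point is that a blank header is a user-data problem, so reporting it with a message that names the file is more useful than letting a null escape as a SYSTEM ERROR.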
--
This message was sent by Atlassian Jira
(v8.3.4#803005)