Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/07/06 05:32:00 UTC

[jira] [Created] (DRILL-5662) Compliant text reader (CSV) opens, closes, reopens file with headers

Paul Rogers created DRILL-5662:
----------------------------------

             Summary: Compliant text reader (CSV) opens, closes, reopens file with headers
                 Key: DRILL-5662
                 URL: https://issues.apache.org/jira/browse/DRILL-5662
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
            Priority: Minor
             Fix For: Future


The "compliant" (CSV) reader can optionally read headers from a file. To do so, the reader:

* Opens the input stream
* Reads headers
* Closes the input stream
* Opens the input stream
* Reads data (skipping headers)
* Closes the input stream

While the above works, it includes an unnecessary close/open cycle. Many CSV readers simply read the header and then continue reading data from the same stream. Drill should do the same.
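
To illustrate the single-open pattern, here is a minimal sketch in plain Java. It is not Drill's reader code: the class name SingleOpenCsvSketch, the Result holder, and the naive comma split are all hypothetical and exist only to show the header and the data being read from one stream that is opened once and closed once.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Drill or UniVocity.
public class SingleOpenCsvSketch {

    /** Header names plus data rows gathered in a single pass over the stream. */
    static final class Result {
        final List<String> headers;
        final List<List<String>> rows;

        Result(List<String> headers, List<List<String>> rows) {
            this.headers = headers;
            this.rows = rows;
        }
    }

    /** Reads the header line, then keeps reading data from the same stream. */
    static Result readWithHeaders(Reader source) throws IOException {
        // The stream is opened once here and closed once by try-with-resources.
        try (BufferedReader in = new BufferedReader(source)) {
            String headerLine = in.readLine();
            List<String> headers =
                headerLine == null ? new ArrayList<>() : split(headerLine);

            // No reopen: the reader is already positioned just past the header.
            List<List<String>> rows = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(split(line));
            }
            return new Result(headers, rows);
        }
    }

    // Naive split for the sketch: ignores quoting and escaping, which a real
    // CSV parser must handle.
    private static List<String> split(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) throws IOException {
        Result r = readWithHeaders(new StringReader("a,b\n1,2\n3,4\n"));
        System.out.println(r.headers + " " + r.rows);
    }
}
```

The point of the sketch is only the stream lifecycle: one open, one close, with the header consumed as the first record rather than in a separate pass.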

In fact, Drill has historically coded its own header scanner. The first version was badly broken, but DRILL-5498 improved the parsing (though not the file handling).

Given that Drill's "compliant" text reader is based on the UniVocity library, and that library can parse headers, we should probably just reuse that existing code, which has very likely evolved to handle the header usages seen in the wild.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)