Posted to dev@drill.apache.org by "Hari Sekhon (JIRA)" <ji...@apache.org> on 2015/07/21 15:25:04 UTC

[jira] [Created] (DRILL-3529) Drill proper DESCRIBE support for CSV

Hari Sekhon created DRILL-3529:
----------------------------------

             Summary: Drill proper DESCRIBE support for CSV
                 Key: DRILL-3529
                 URL: https://issues.apache.org/jira/browse/DRILL-3529
             Project: Apache Drill
          Issue Type: Bug
          Components: Metadata, Storage - Text & CSV
    Affects Versions: 1.1.0
            Reporter: Hari Sekhon
            Assignee: Steven Phillips


Request to add full DESCRIBE support for CSV files.

Currently the DESCRIBE command prints a blank table instead of the CSV header / schema.

This is dependent on DRILL-624 actually reading the header line as the schema of the CSV.

After DRILL-624 is completed, I propose the following solution:

When dealing with a directory containing multiple CSV files, it might make sense to read the headers of N CSV files by default. Extend the DESCRIBE command so that the number of CSV file headers read and presented is user-configurable, and add an ALL keyword that scans every CSV file header so that the schema of all the data can be printed authoritatively.
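To illustrate the idea outside of Drill, here is a minimal Python sketch of sampling headers from a directory. It is not Drill code: the function names `sample_csv_headers` and `merge_headers`, the `limit` parameter (with `None` standing in for the proposed ALL keyword), and the default of 5 are all hypothetical, and it assumes each file's first line is its header.

```python
import csv
from pathlib import Path


def sample_csv_headers(directory, limit=5):
    """Return {file name: header columns} for up to `limit` CSV files.

    `limit=None` reads every file, mimicking the proposed ALL keyword.
    Assumes the first line of each file is the header row.
    """
    files = sorted(Path(directory).glob("*.csv"))
    if limit is not None:
        files = files[:limit]
    headers = {}
    for path in files:
        with path.open(newline="") as handle:
            # An empty file yields an empty header list.
            headers[path.name] = next(csv.reader(handle), [])
    return headers


def merge_headers(headers):
    """Union of all column names, kept in first-seen order."""
    merged = []
    for columns in headers.values():
        for column in columns:
            if column not in merged:
                merged.append(column)
    return merged
```

With a bounded `limit` the merged result is only a sample of the schema; only a full scan can be treated as authoritative, which is the trade-off the ALL keyword is meant to expose.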

It might also make sense to read the newest and oldest CSV files by timestamp or by file name formatting conventions to pick up the different headers.
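As a rough sketch of the timestamp variant, the following Python picks the oldest and newest CSV files in a directory by modification time, so that their headers can be compared. The function name `oldest_and_newest_csv` is hypothetical, and using the file system modification time as "timestamp" is an assumption; a file-name convention would need its own sort key.

```python
from pathlib import Path


def oldest_and_newest_csv(directory):
    """Return the oldest and newest CSV files by modification time.

    Returns an empty list if there are no CSV files, and a single-element
    list if there is only one.
    """
    files = sorted(Path(directory).glob("*.csv"),
                   key=lambda path: path.stat().st_mtime)
    if len(files) <= 1:
        return files
    return [files[0], files[-1]]
```

Reading only these two files is cheap, but it will miss any header variant that appears solely in files between them.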



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)