Posted to issues@drill.apache.org by "Hari Sekhon (JIRA)" <ji...@apache.org> on 2015/07/21 15:13:05 UTC

[jira] [Updated] (DRILL-3525) Drill proper DESCRIBE support for Parquet

     [ https://issues.apache.org/jira/browse/DRILL-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sekhon updated DRILL-3525:
-------------------------------
    Description: 
Request to add full DESCRIBE support for Parquet.

Currently the DESCRIBE command prints a blank table instead of the schema, which is unhelpful, so I run a select * with limit 1 instead.

Although describing large amounts of Parquet data could be inefficient, I propose the following solution:

Read the first Parquet file and assume its schema applies to the whole data source. Extend the DESCRIBE command with a user-configurable number of Parquet files to read, presenting a merged schema for the data source, as well as an ALL keyword that scans all Parquet files to build a true global schema.

In case of schema evolution, you could try reading the newest and oldest Parquet files.
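A minimal sketch of the proposed selection-and-merge logic, in Python, follows. It assumes each file's schema is already available as a column-to-type mapping (for example, read from the Parquet footer) and takes "first" to mean first by path order. ParquetFileInfo, describe, merge_schemas and the ALL sentinel are hypothetical names for illustration, not Drill APIs. Reporting conflicting column types as a "T1 | T2" union is one possible policy; the issue does not specify one.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Union

# Hypothetical sentinel standing in for the proposed ALL keyword.
ALL = "ALL"


@dataclass
class ParquetFileInfo:
    """Hypothetical per-file metadata: path, modification time, column -> type."""
    path: str
    mtime: float
    schema: Dict[str, str]


def merge_schemas(schemas: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Union the columns of several schemas, keeping first-seen column order.

    If a column appears with different types, all distinct types are kept
    and reported as "T1 | T2" (an illustrative policy, not Drill's).
    """
    seen: Dict[str, List[str]] = {}
    for schema in schemas:
        for column, col_type in schema.items():
            types = seen.setdefault(column, [])
            if col_type not in types:
                types.append(col_type)
    return {column: " | ".join(types) for column, types in seen.items()}


def describe(files: List[ParquetFileInfo],
             limit: Union[int, str] = 1) -> Dict[str, str]:
    """Sketch of the proposed DESCRIBE: sample `limit` files, or ALL of them."""
    if not files:
        return {}
    # Assumption: "first" means first by path order.
    ordered = sorted(files, key=lambda f: f.path)
    if limit == ALL:
        chosen = ordered  # scan every file for a true global schema
    elif isinstance(limit, int) and limit >= 1:
        chosen = ordered[:limit]
        # Schema-evolution heuristic: also read the oldest and newest files.
        by_time = sorted(files, key=lambda f: f.mtime)
        for extra in (by_time[0], by_time[-1]):
            if extra not in chosen:
                chosen.append(extra)
    else:
        raise ValueError("limit must be a positive int or ALL")
    return merge_schemas(f.schema for f in chosen)


# Usage: the newest file adds an "email" column.
files = [
    ParquetFileInfo("part-0.parquet", 100.0, {"id": "INT64", "name": "BINARY"}),
    ParquetFileInfo("part-1.parquet", 200.0, {"id": "INT64", "name": "BINARY"}),
    ParquetFileInfo("part-2.parquet", 300.0,
                    {"id": "INT64", "name": "BINARY", "email": "BINARY"}),
]
print(describe(files, 1))
# {'id': 'INT64', 'name': 'BINARY', 'email': 'BINARY'}
```

Reading only the first file would miss the email column added in the newest file; the oldest/newest heuristic picks it up cheaply, while ALL trades a full scan for certainty.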

  was:
Request to add full DESCRIBE support for Parquet.

Currently the DESCRIBE command prints a blank table instead of the schema, which is unhelpful, so I run a select * with limit 1 instead.

Although describing large amounts of Parquet data could be inefficient, I propose the following solution:

Read the first Parquet file and assume its schema applies to the whole data source. Extend the DESCRIBE command with a user-configurable number of Parquet files to read, presenting a merged schema for the data source, as well as an ALL keyword that scans all Parquet files to build a true global schema.


> Drill proper DESCRIBE support for Parquet
> -----------------------------------------
>
>                 Key: DRILL-3525
>                 URL: https://issues.apache.org/jira/browse/DRILL-3525
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata, Storage - Parquet
>    Affects Versions: 1.1.0
>            Reporter: Hari Sekhon
>            Assignee: Steven Phillips
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)