Posted to user@drill.apache.org by Vova Vysotskyi <vv...@gmail.com> on 2018/02/28 17:47:08 UTC

Avro storage format behaviour

Hi all,

I am working on DRILL-4120: dir0 does not work when the directory structure
contains Avro files.

DRILL-3810 added validation of the query against the Avro schema before
query execution starts.
As a result, Drill throws an exception when the query references a
non-existent column and the table is in Avro format.
Other storage formats, such as JSON or Parquet, allow the use of non-existent
fields.
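
To make the difference concrete, here is a minimal Python sketch of the two
behaviours described above. It is not Drill code: the function names, the
exception class, and the choice to model a missing field as None are all
assumptions made for illustration only. In particular, a real implicit column
such as dir0 would be filled in from the directory structure rather than left
empty.

```python
from typing import Any, Dict, List, Optional, Set


class UnknownColumnError(Exception):
    """Hypothetical error: the query names a column the schema does not declare."""


def project_fixed(
    schema_fields: Set[str], rows: List[Dict[str, Any]], columns: List[str]
) -> List[Dict[str, Optional[Any]]]:
    """Fixed-schema behaviour: validate every column before reading any row."""
    missing = [c for c in columns if c not in schema_fields]
    if missing:
        raise UnknownColumnError("columns not in schema: " + ", ".join(missing))
    return [{c: row.get(c) for c in columns} for row in rows]


def project_dynamic(
    rows: List[Dict[str, Any]], columns: List[str]
) -> List[Dict[str, Optional[Any]]]:
    """Dynamic behaviour: no up-front validation; a missing field becomes None."""
    return [{c: row.get(c) for c in columns} for row in rows]


if __name__ == "__main__":
    schema = {"name"}            # hypothetical schema with a single field
    data = [{"name": "a"}]

    # The dynamic reader accepts a column that the data does not contain.
    print(project_dynamic(data, ["name", "dir0"]))

    # The fixed-schema reader rejects the same query before reading any row.
    try:
        project_fixed(schema, data, ["name", "dir0"])
    except UnknownColumnError as err:
        print("rejected:", err)
```

With the fixed-schema variant, a query that mentions dir0 fails during
validation, which is the situation DRILL-4120 describes.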

So here is my question: should we continue to treat Avro as a format with a
fixed schema, or should we start treating Avro as a dynamic format to be
consistent with other storage formats?

-- 
Kind regards,
Volodymyr Vysotskyi

Re: Avro storage format behaviour

Posted by Arina Yelchiyeva <ar...@gmail.com>.
As Paul mentioned in PR [1], when we move to the new scan framework it will
handle implicit columns for all file readers.
Until then, I guess we should treat Avro like other file formats (for example,
Parquet) so users can benefit from implicit columns for this format as well.

[1] https://github.com/apache/drill/pull/1138

On Wed, Feb 28, 2018 at 7:47 PM, Vova Vysotskyi <vv...@gmail.com> wrote:

> Hi all,
>
> I am working on DRILL-4120: dir0 does not work when the directory structure
> contains Avro files.
>
> In DRILL-3810 was added validation of query using avro schema before start
> executing the query.
> Therefore with these changes Drill throws an exception when the
> query contains non-existent column and table has avro format.
> Other storage formats such as json or parquet allow usage of non-existing
> fields.
>
> So here is my question: should we continue to treat avro as a format with
> fixed schema, or we should start treating avro as a dynamic format to be
> consistent with other storage formats?
>
> --
> Kind regards,
> Volodymyr Vysotskyi
>
