Posted to dev@drill.apache.org by Jacques Nadeau <ja...@dremio.com> on 2015/10/28 20:53:33 UTC

Broader feedback on DRILL-3810

Hey Guys,

DRILL-3810 is a patch that adds schema support to a format plugin. To do
this, Kamesh has suggested a change to FormatPlugin that adds a secondary
call, getDrillTable(Object selection), which is invoked after the
FormatMatcher. However, it seems odd to have a multi-stage interaction
between the engine and a format plugin here. One idea I had is that the
FormatMatcher should return the Table object directly (and thus have the
ability to return a schema'd pattern). Kamesh's most recent patch presents
this approach. I wanted to get some more feedback from others on this issue
before we finalize a particular direction, since this should ultimately be
a stable external API.
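
To make the two shapes under discussion concrete, here is a rough sketch. It is not taken from the patch: every name below (MatcherSketch, Table, TwoStagePlugin, SingleStageMatcher, avroLike) is hypothetical and does not reflect Drill's actual signatures. It only contrasts "match, then ask the plugin for a table" with "the matcher returns the table directly".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class MatcherSketch {

    // Hypothetical stand-in for a table handle; schema is empty when unknown.
    static final class Table {
        final String path;
        final Optional<List<String>> schema;

        Table(String path, Optional<List<String>> schema) {
            this.path = path;
            this.schema = schema;
        }
    }

    // Shape A (two-stage): the matcher only answers yes/no, and the engine
    // then makes a second call into the plugin to build the table.
    interface TwoStagePlugin {
        boolean matches(String path);
        Table getTable(String path);
    }

    // Shape B (single-stage): the matcher returns the table directly,
    // or empty when this format cannot read the path.
    interface SingleStageMatcher {
        Optional<Table> match(String path);
    }

    // What the engine has to do under shape A.
    static Optional<Table> resolveTwoStage(TwoStagePlugin plugin, String path) {
        return plugin.matches(path) ? Optional.of(plugin.getTable(path)) : Optional.empty();
    }

    // Illustrative shape-A plugin with a made-up, hard-coded schema.
    static TwoStagePlugin avroLikeTwoStage() {
        return new TwoStagePlugin() {
            public boolean matches(String path) {
                return path.endsWith(".avro");
            }
            public Table getTable(String path) {
                return new Table(path, Optional.of(Arrays.asList("id", "name")));
            }
        };
    }

    // Illustrative shape-B matcher with the same made-up schema.
    static SingleStageMatcher avroLike() {
        return path -> path.endsWith(".avro")
                ? Optional.of(new Table(path, Optional.of(Arrays.asList("id", "name"))))
                : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Table> a = resolveTwoStage(avroLikeTwoStage(), "data/users.avro");
        Optional<Table> b = avroLike().match("data/users.avro");
        // Both shapes yield a table carrying a schema; only the call pattern differs.
        System.out.println(a.get().schema.equals(b.get().schema));
    }
}
```

In this sketch the engine-side glue (resolveTwoStage) disappears under shape B, which is the main appeal of having the matcher return the table.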

What do others think? (For reference, my expectation is that CSV and
Parquet should ultimately also implement this interface.)

https://issues.apache.org/jira/browse/DRILL-3810

--
Jacques Nadeau
CTO and Co-Founder, Dremio

Re: Broader feedback on DRILL-3810

Posted by AnilKumar B <ak...@gmail.com>.
Hi,

I posted the review comments below on the Avro implementation, but I
think they apply to all schema-based file formats, so I want to raise the
same questions on this thread.

1) This approach only works if the input data satisfies the conditions
below. Are we going to impose these conditions on all schema-based
FormatPlugins?
    i. If the input directory is a leaf directory, then all the files in it
should have the same schema.
    ii. If the input directory contains sub-directories, then all the files
in those sub-directories should have the same schema.

2) What if a directory contains files with different schemas? Then it
will break. How do we handle this scenario?
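
As a rough illustration of the condition in (1), here is a minimal sketch of a check that all files under an input directory share one schema. It is an assumption-laden toy, not Drill or Avro code: the class and method names are hypothetical, schemas are represented as plain strings, and file discovery is replaced by a pre-built map from file path to schema.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class SchemaCheckSketch {

    // Returns the single schema shared by every file, or empty when the
    // files disagree (or when there are no files at all).
    static Optional<String> commonSchema(Map<String, String> schemaByFile) {
        Set<String> distinct = new HashSet<>(schemaByFile.values());
        return distinct.size() == 1
                ? Optional.of(distinct.iterator().next())
                : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical files spread across sub-directories, all with one schema.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("logs/2015/a.avro", "id:long,name:string");
        files.put("logs/2016/b.avro", "id:long,name:string");
        System.out.println(commonSchema(files).isPresent());

        // Adding a file with a different schema is the case raised in (2).
        files.put("logs/2016/c.avro", "id:long,email:string");
        System.out.println(commonSchema(files).isPresent());
    }
}
```

Returning an empty result rather than throwing leaves the policy for mixed-schema directories (fail, fall back to schemaless, or merge) to the caller, which is exactly the open question in (2).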



Thanks & Regards,
B Anil Kumar.
