Posted to user@drill.apache.org by Adam Gilmore <dr...@gmail.com> on 2015/07/30 03:44:00 UTC

DRILL-1257

Wanted to touch base to see what the status was of DRILL-1257.

We've run into a few instances where JSON/Mongo data is changing types and
Drill is unable to query it (e.g. a numeric type becomes a string type).

I know this is a pretty massive change with a lot of tough decisions to
make on how to handle that, but wanted to see what the roadmap looked like
- that is, is it in the near future?

At the moment I'm trying to work out some sort of temporary fix (i.e.
"upgrading" vectors, e.g. converting a float vector to a varchar vector in my
above example).
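
To make the "upgrading" idea concrete, here is a rough sketch in Python rather than against Drill's actual vector classes. The names (`promote_column`, `type_of`, the int < float < varchar ordering) are made up for illustration and are not Drill APIs; it only shows the general shape of widening a column to the broadest type seen.

```python
# Hypothetical promotion order: a column may only move to a wider type.
_RANK = {"int": 0, "float": 1, "varchar": 2}
_CONVERT = {"int": int, "float": float, "varchar": str}


def type_of(value):
    """Return the simplified type name of one non-null value."""
    if isinstance(value, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "varchar"
    raise TypeError(f"unsupported value: {value!r}")


def promote_column(values):
    """Find the widest type in the column and convert every value to it."""
    widest = "int"
    for value in values:
        if value is not None and _RANK[type_of(value)] > _RANK[widest]:
            widest = type_of(value)
    convert = _CONVERT[widest]
    # Nulls stay null; everything else is rewritten in the widest type.
    return widest, [None if v is None else convert(v) for v in values]
```

For instance, `promote_column([1.5, 2.0, "n/a"])` gives `("varchar", ["1.5", "2.0", "n/a"])`. The obvious downside is that once the column is varchar, numeric aggregations need a cast again.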

Because our users run aggregations etc. against their data without
having knowledge of the schema, we can't really use "all_text_mode" and do
our own casting (quite apart from the huge performance degradation associated
with it).

Re: DRILL-1257

Posted by Adam Gilmore <dr...@gmail.com>.

Interesting.  I'm quite interested in how this would translate into the
creation of Parquet files too, considering that Parquet as a format doesn't
support embedded types (as far as I know).  In our implementations, we have
ended up manually checking schemas before Parquet creation (i.e. splitting
the data across separate Parquet files).

Do you have any thoughts on that?
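
For reference, the schema check before Parquet creation could look roughly like the sketch below. It is not our actual code: `schema_signature` and `split_by_schema` are hypothetical names, it only handles flat records, and the Parquet writing itself is left out.

```python
from collections import defaultdict


def schema_signature(record):
    """Sorted (field, type name) pairs; records that share a signature
    have the same type for every field and can go in the same file."""
    return tuple(sorted((name, type(value).__name__)
                        for name, value in record.items()))


def split_by_schema(records):
    """Group flat records by schema signature, one group per future file."""
    groups = defaultdict(list)
    for record in records:
        groups[schema_signature(record)].append(record)
    return dict(groups)
```

Each group returned by `split_by_schema` would then be written out as its own Parquet file, so that no single file has to hold a column with mixed types.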

On Thu, Jul 30, 2015 at 12:45 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Well the "good news" is that this is such an important issue that we
> recreated it:
>
> https://issues.apache.org/jira/browse/DRILL-3228
>
> :)
>
> We're starting discussions about it now.  Realistically, it will take a
> little time to get right.  Simple promotion is easier to achieve but it
> would only work as long as it was done in the first batch until we fix
> schema change in all the operators.  (This might be good enough for your
> use cases... and could be a good start to this work).
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio

Re: DRILL-1257

Posted by Jacques Nadeau <ja...@dremio.com>.

Well the "good news" is that this is such an important issue that we
recreated it:

https://issues.apache.org/jira/browse/DRILL-3228

:)

We're starting discussions about it now.  Realistically, it will take a
little time to get right.  Simple promotion is easier to achieve but it
would only work as long as it was done in the first batch until we fix
schema change in all the operators.  (This might be good enough for your
use cases... and could be a good start to this work).

--
Jacques Nadeau
CTO and Co-Founder, Dremio
