Posted to dev@drill.apache.org by Nitin Pawar <ni...@gmail.com> on 2018/12/05 10:46:13 UTC

Help for statistic functions

Hi,

We have a multistep workflow system, and in one of the steps we compute sum(x).
This step leaves the column as float-optional for subsequent queries, and
functions then start failing if the value is a large float number (more than 8
digits).

Is there a setting we can change to avoid this, or does it need to be fixed in
code?
The error we are getting is:

Error: SYSTEM ERROR: SchemaChangeException: Failure while materializing
expression.
Error in expression at index -1.  Error: Missing function implementation:
[covar_samp(BIGINT-REQUIRED, FLOAT8-OPTIONAL)].  Full expression: --UNKNOWN
EXPRESSION--.




-- 
Nitin Pawar

Re: Help for statistic functions

Posted by Nitin Pawar <ni...@gmail.com>.
Thank you, Anton.

I will watch the Jira issue for further progress.

On Mon, Dec 10, 2018 at 10:01 PM Anton Gozhiy <an...@gmail.com> wrote:

> This is really a bug, reported it here:
> https://issues.apache.org/jira/browse/DRILL-6891.
> Thanks for finding this case.


-- 
Nitin Pawar

Re: Help for statistic functions

Posted by Anton Gozhiy <an...@gmail.com>.
This is indeed a bug. I reported it here:
https://issues.apache.org/jira/browse/DRILL-6891
Thanks for finding this case.

On Sat, Dec 8, 2018 at 7:15 AM Nitin Pawar <ni...@gmail.com> wrote:

> here is the link
> https://drive.google.com/open?id=1PSJMjIvwNObhGsc9jmD9wPapSPI1jQWb
>
> We have tried with direct fields as well but it keeps failing
> Basically, we have a query which creates above parquet file
> and the sequential query which fails, I already provided


-- 
Sincerely, Anton Gozhiy
anton5813@gmail.com

Re: Help for statistic functions

Posted by Nitin Pawar <ni...@gmail.com>.
Here is the link:
https://drive.google.com/open?id=1PSJMjIvwNObhGsc9jmD9wPapSPI1jQWb

We have tried with direct fields as well, but it keeps failing.
Basically, we have one query that creates the above parquet file,
and the subsequent query, which I already provided, is the one that fails.



On Thu, Dec 6, 2018 at 10:09 PM Anton Gozhiy <an...@gmail.com> wrote:

> Nitin, I don't see the attachment, maybe due to apache politics. Could you
> share it by google drive?
> Regarding explicit casting, nullable and non-nullable double are
> represented by different types inside Drill and cannot be cast that way.
> That may cause the error.


-- 
Nitin Pawar

Re: Help for statistic functions

Posted by Anton Gozhiy <an...@gmail.com>.
Nitin, I don't see the attachment, maybe due to Apache mailing list policies.
Could you share it via Google Drive?
Regarding explicit casting: nullable and non-nullable double are
represented by different types inside Drill and cannot be cast that way.
That may be what causes the error.

On Thu, Dec 6, 2018 at 5:55 PM Nitin Pawar <ni...@gmail.com> wrote:

> Hello Anton,
> Thanks for the reply.
> I have tried explicit casting as well as with subquery mechanism
> I have attached the parquet file along with this email
>
> following is the query
> select covar_samp(cast(id_dist as double), cast(num2 as double)) from
> dfs.tmp.`/nitin`;


-- 
Sincerely, Anton Gozhiy
anton5813@gmail.com

Re: Help for statistic functions

Posted by Nitin Pawar <ni...@gmail.com>.
Hello Anton,
Thanks for the reply.
I have tried explicit casting as well as a subquery.
I have attached the parquet file to this email.

The query is as follows:
select covar_samp(cast(id_dist as double), cast(num2 as double)) from
dfs.tmp.`/nitin`;
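
For reference while comparing results, here is a minimal Python sketch of the sample covariance that a function named covar_samp conventionally computes. It is not Drill's implementation. The function name `covar_samp_reference`, the choice to skip pairs containing a null, and the choice to return None for fewer than two usable pairs are assumptions made for illustration, following common SQL behaviour.

```python
from typing import Iterable, List, Optional, Tuple

Number = Optional[float]


def covar_samp_reference(pairs: Iterable[Tuple[Number, Number]]) -> Optional[float]:
    """Sample covariance over (x, y) pairs, skipping pairs that contain None.

    Assumption: None stands in for a SQL NULL, and pairs with a NULL on
    either side are ignored, as SQL aggregates usually do.
    """
    kept: List[Tuple[float, float]] = [
        (float(x), float(y)) for x, y in pairs if x is not None and y is not None
    ]
    n = len(kept)
    if n < 2:
        # Assumption: the sample covariance is undefined for fewer than two pairs.
        return None
    mean_x = sum(x for x, _ in kept) / n
    mean_y = sum(y for _, y in kept) / n
    # Sum of products of deviations, divided by n - 1 (sample, not population).
    return sum((x - mean_x) * (y - mean_y) for x, y in kept) / (n - 1)


if __name__ == "__main__":
    # An integer column paired with a nullable double column, as in the thread.
    rows = [(1, 2.0), (2, None), (3, 6.0)]
    print(covar_samp_reference(rows))
```

Mixing integers and nullable floats is harmless in this sketch because every value is converted to float before the arithmetic, which is the effect the explicit casts in the query above are trying to achieve.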


On Thu, Dec 6, 2018 at 7:23 PM Anton Gozhiy <an...@gmail.com> wrote:

> Hi Nitin Pawar,
> I was investigating this. Indeed, when one of the parameters has optional
> data mode, Drill cannot cast the parameters to the same type, and there is
> no "covar_samp" UDF that takes parameters with different types.
> To reproduce this, I used a nullable column, but I'm not sure if it is your
> case.
> You mentioned that it depends on the float number size.
> It would be helpful if you share the whole query and describe what data did
> you use.


-- 
Nitin Pawar

Fwd: Help for statistic functions

Posted by Anton Gozhiy <an...@gmail.com>.
---------- Forwarded message ---------
From: Anton Gozhiy <an...@gmail.com>
Date: Thu, Dec 6, 2018 at 3:44 PM
Subject: Re: Help for statistic functions
To: <de...@drill.apache.org>


Hi Nitin Pawar,
I was investigating this. Indeed, when one of the parameters has optional
data mode, Drill cannot cast the parameters to the same type, and there is
no "covar_samp" UDF that takes parameters with different types.
To reproduce this, I used a nullable column, but I'm not sure if that is your
case.
You mentioned that it depends on the size of the float number.
It would be helpful if you shared the whole query and described what data
you used.
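
To illustrate the explanation above, here is a toy Python model of a function lookup keyed on the exact argument types and data modes. It is only a sketch: the `REGISTRY` contents, the `resolve` helper, and the implementation names are hypothetical, and Drill's real function resolver is more involved. The point is only that when no entry matches the pair (BIGINT-REQUIRED, FLOAT8-OPTIONAL) and the arguments cannot be brought to a common type, the lookup fails with the kind of message reported in this thread.

```python
from typing import Dict, Tuple

# An argument type is (minor type, data mode), e.g. ("FLOAT8", "OPTIONAL").
ArgType = Tuple[str, str]

# Hypothetical registry: every entry takes two arguments of the SAME type and mode.
REGISTRY: Dict[Tuple[str, ArgType, ArgType], str] = {
    ("covar_samp", ("BIGINT", "REQUIRED"), ("BIGINT", "REQUIRED")): "covar_samp_bigint",
    ("covar_samp", ("BIGINT", "OPTIONAL"), ("BIGINT", "OPTIONAL")): "covar_samp_nullable_bigint",
    ("covar_samp", ("FLOAT8", "REQUIRED"), ("FLOAT8", "REQUIRED")): "covar_samp_float8",
    ("covar_samp", ("FLOAT8", "OPTIONAL"), ("FLOAT8", "OPTIONAL")): "covar_samp_nullable_float8",
}


def describe(arg: ArgType) -> str:
    """Render a type the way it appears in the error message, e.g. FLOAT8-OPTIONAL."""
    return f"{arg[0]}-{arg[1]}"


def resolve(name: str, left: ArgType, right: ArgType) -> str:
    """Return the implementation name for an exact signature match, else fail."""
    key = (name, left, right)
    if key not in REGISTRY:
        raise LookupError(
            f"Missing function implementation: "
            f"[{name}({describe(left)}, {describe(right)})]"
        )
    return REGISTRY[key]


if __name__ == "__main__":
    # Matching types and modes resolve.
    print(resolve("covar_samp", ("FLOAT8", "OPTIONAL"), ("FLOAT8", "OPTIONAL")))
    # Mixed types and modes, as in the reported error, do not.
    try:
        resolve("covar_samp", ("BIGINT", "REQUIRED"), ("FLOAT8", "OPTIONAL"))
    except LookupError as exc:
        print(exc)
```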


On Thu, Dec 6, 2018 at 3:23 PM Nitin Pawar <ni...@gmail.com> wrote:

> any help on this ??
>
> just to put some more data on this
> if a query has select count(1), sum(b) from c
> then we keep getting the error mentioned above as count ends up being
> bigint and sum ends being double and it is read as float-optional for large
> numbers

-- 
Sincerely, Anton Gozhiy
anton5813@gmail.com

Re: Help for statistic functions

Posted by Nitin Pawar <ni...@gmail.com>.
Any help on this?

Just to add some more detail:
if a query has select count(1), sum(b) from c,
then we keep getting the error mentioned above, because count ends up being
bigint and sum ends up being double, and it is read as float-optional for large
numbers.

On Wed, Dec 5, 2018 at 4:16 PM Nitin Pawar <ni...@gmail.com> wrote:

> Hi,
>
> We have a multistep workflow system and in one of the step we do sum(x)
> this step results the column being float-optional for next queries and
> then functions start failing if the value is large float number (more than
> 8 digits)
>
> Is there any setting where we can change this or it needs to be fixed in
> code?
> error we are getting is
>
> Error: SYSTEM ERROR: SchemaChangeException: Failure while materializing
> expression.
> Error in expression at index -1.  Error: Missing function implementation:
> [covar_samp(BIGINT-REQUIRED, FLOAT8-OPTIONAL)].  Full expression: --UNKNOWN
> EXPRESSION--.


-- 
Nitin Pawar
