You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@drill.apache.org by Gayathri Selvaraj <ga...@gmail.com> on 2019/04/09 11:39:27 UTC

Issue faced in Apache drill

Hi Team,


Facing some issues with the following case:

Json file (*sample.json*) is having the following content:
{"a":2,"b":null} {"a":2,"b":null} {"a":3,"b":null} {"a":4,"b":null}

*Query:*

SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop
sample.json` group by a;

*Error:*

UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions
supported for VarChar type

*Observation:*

If we query without using group by, then it is working fine without any
error. If group by is used, then sum of null values is throwing the above
error.



Can anyone please let us know the solution for this or if there are any
alternative. I have raised a JIRA ticket for the same -
https://issues.apache.org/jira/browse/DRILL-7161


Regards,

Gayathri

RE: Issue faced in Apache drill

Posted by "Lee, David" <Da...@blackrock.com>.

This is the basic problem with JSON and Schema Learning..

If all the values are null you don’t know what datatype the object is..

Forcing a CAST() is really the only option.. This also implies that you almost always should do a CAST() to INT or FLOAT if you except possible NULL values..

I think the best solution is to try to introduce: https://json-schema.org/ as an alternative to schema learning which is also expensive..

When type is JSON allow a JSON schema to be passed in as well...

-----Original Message-----
From: Aman Sinha <am...@gmail.com> 
Sent: Tuesday, April 9, 2019 6:03 PM
To: user <us...@drill.apache.org>
Subject: Re: Issue faced in Apache drill

External Email: Use caution with links and attachments

The last suggestion from Paul about CASTing to desired type should work:
   SELECT a, SUM(CAST(b as INT) ) FROM dfs.`C:\\Users\\user\\Desktop  sample.json` group by a;

I suggest filing a JIRA for the original query because for some reason if all values are NULLs,  (and this is with group-by), Drill's aggregate function code generator is defaulting to a varchar specific function.

Aman

On Tue, Apr 9, 2019 at 4:58 PM Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi Gayathri,
>
> If you only have the content shown, then note that your "b" columns 
> are always null. Drill has no way to know what type b is supposed to be.
> Normally, Drill guesses Nullable Int.
>
> I wonder, have you turned on the "store.json.all_text_mode" session 
> option to tell JSON to read all columns as VarChar? If so, that would 
> explain why the error message says that b is VarChar.
>
> The team is in the process of adding a schema system to resolve this 
> kind of ambiguity: you'll be able to say, "b is a DOUBLE. Even if it 
> is all nulls, go ahead and treat it as double."
>
> In the mean time, you can try Rahul's suggestion. Or, you can ensure 
> that the b column has actual data. Or, you can try casting the (NULL 
> VarChar) b columns to the desired type.
>
> Thanks,
> - Paul
>
>
>
>     On Tuesday, April 9, 2019, 4:32:13 PM PDT, Gayathri Selvaraj < 
> gayathri.selvaraaj@gmail.com> wrote:
>
>  Hi Team,
>
>
> Facing some issues with the following case:
>
> Json file (*sample.json*) is having the following content:
> {"a":2,"b":null} {"a":2,"b":null} {"a":3,"b":null} {"a":4,"b":null}
>
> *Query:*
>
> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop sample.json` group 
> by a;
>
> *Error:*
>
> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate 
> functions supported for VarChar type
>
> *Observation:*
>
> If we query without using group by, then it is working fine without 
> any error. If group by is used, then sum of null values is throwing 
> the above error.
>
>
>
> Can anyone please let us know the solution for this or if there are 
> any alternative. I have raised a JIRA ticket for the same - 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org
> _jira_browse_DRILL-2D7161&d=DwIBaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=SpeiLeBT
> ifecUrj1SErsTRw4nAqzMxT043sp_gndNeI&m=IH3U7J4Wr70dhjl2Nht2IfigsUXlPIIa
> b5nyYMJZ3wE&s=SQSZK0g-nvdnIa6wJIr-c8zYDmowCrLnkktKyrYXP1Y&e=
>
>
> Regards,
>
> Gayathri
>

This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/compliance/email-disclaimers for further information.  Please refer to http://www.blackrock.com/corporate/compliance/privacy-policy for more information about BlackRock’s Privacy Policy.

For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/about-us/contacts-locations.

© 2019 BlackRock, Inc. All rights reserved.

Re: Issue faced in Apache drill

Posted by Aman Sinha <am...@gmail.com>.

The last suggestion from Paul about CASTing to desired type should work:
   SELECT a, SUM(CAST(b as INT) ) FROM dfs.`C:\\Users\\user\\Desktop
 sample.json` group by a;

I suggest filing a JIRA for the original query because for some reason if
all values are NULLs,  (and this is with group-by),
Drill's aggregate function code generator is defaulting to a varchar
specific function.

Aman

On Tue, Apr 9, 2019 at 4:58 PM Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi Gayathri,
>
> If you only have the content shown, then note that your "b" columns are
> always null. Drill has no way to know what type b is supposed to be.
> Normally, Drill guesses Nullable Int.
>
> I wonder, have you turned on the "store.json.all_text_mode" session option
> to tell JSON to read all columns as VarChar? If so, that would explain why
> the error message says that b is VarChar.
>
> The team is in the process of adding a schema system to resolve this kind
> of ambiguity: you'll be able to say, "b is a DOUBLE. Even if it is all
> nulls, go ahead and treat it as double."
>
> In the mean time, you can try Rahul's suggestion. Or, you can ensure that
> the b column has actual data. Or, you can try casting the (NULL VarChar) b
> columns to the desired type.
>
> Thanks,
> - Paul
>
>
>
>     On Tuesday, April 9, 2019, 4:32:13 PM PDT, Gayathri Selvaraj <
> gayathri.selvaraaj@gmail.com> wrote:
>
>  Hi Team,
>
>
> Facing some issues with the following case:
>
> Json file (*sample.json*) is having the following content:
> {"a":2,"b":null} {"a":2,"b":null} {"a":3,"b":null} {"a":4,"b":null}
>
> *Query:*
>
> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop
> sample.json` group by a;
>
> *Error:*
>
> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions
> supported for VarChar type
>
> *Observation:*
>
> If we query without using group by, then it is working fine without any
> error. If group by is used, then sum of null values is throwing the above
> error.
>
>
>
> Can anyone please let us know the solution for this or if there are any
> alternative. I have raised a JIRA ticket for the same -
> https://issues.apache.org/jira/browse/DRILL-7161
>
>
> Regards,
>
> Gayathri
>

Re: Issue faced in Apache drill

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.

Hi Gayathri,

If you only have the content shown, then note that your "b" columns are always null. Drill has no way to know what type b is supposed to be. Normally, Drill guesses Nullable Int.

I wonder, have you turned on the "store.json.all_text_mode" session option to tell JSON to read all columns as VarChar? If so, that would explain why the error message says that b is VarChar.

The team is in the process of adding a schema system to resolve this kind of ambiguity: you'll be able to say, "b is a DOUBLE. Even if it is all nulls, go ahead and treat it as double."

In the mean time, you can try Rahul's suggestion. Or, you can ensure that the b column has actual data. Or, you can try casting the (NULL VarChar) b columns to the desired type.

Thanks,
- Paul

 

    On Tuesday, April 9, 2019, 4:32:13 PM PDT, Gayathri Selvaraj <ga...@gmail.com> wrote:  
 
 Hi Team,


Facing some issues with the following case:

Json file (*sample.json*) is having the following content:
{"a":2,"b":null} {"a":2,"b":null} {"a":3,"b":null} {"a":4,"b":null}

*Query:*

SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop
sample.json` group by a;

*Error:*

UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions
supported for VarChar type

*Observation:*

If we query without using group by, then it is working fine without any
error. If group by is used, then sum of null values is throwing the above
error.



Can anyone please let us know the solution for this or if there are any
alternative. I have raised a JIRA ticket for the same -
https://issues.apache.org/jira/browse/DRILL-7161


Regards,

Gayathri

Re: Issue faced in Apache drill

Posted by rahul challapalli <ch...@gmail.com>.

My above solution made an implicit assumption that we return null even if a
single value in the column b is null. However you can modify the query to
replace nulls with 0's if that is what you want to do.

On Tue, Apr 9, 2019 at 4:41 PM rahul challapalli <ch...@gmail.com>
wrote:

> I haven't tried it myself but something like the below workaround should
> be helpful
>
> select
>   a,
>   case
>     when exists (select 1 from dfs.`sample.json` where b is null) then
> null
>    else sum(b)
>   end
> from dfs.`sample.json`
> group by a
>
> - Rahul
>
> On Tue, Apr 9, 2019 at 4:32 PM Gayathri Selvaraj <
> gayathri.selvaraaj@gmail.com> wrote:
>
>> Hi Team,
>>
>>
>> Facing some issues with the following case:
>>
>> Json file (*sample.json*) is having the following content:
>> {"a":2,"b":null} {"a":2,"b":null} {"a":3,"b":null} {"a":4,"b":null}
>>
>> *Query:*
>>
>> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop
>> sample.json` group by a;
>>
>> *Error:*
>>
>> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions
>> supported for VarChar type
>>
>> *Observation:*
>>
>> If we query without using group by, then it is working fine without any
>> error. If group by is used, then sum of null values is throwing the above
>> error.
>>
>>
>>
>> Can anyone please let us know the solution for this or if there are any
>> alternative. I have raised a JIRA ticket for the same -
>> https://issues.apache.org/jira/browse/DRILL-7161
>>
>>
>> Regards,
>>
>> Gayathri
>>
>

Re: Issue faced in Apache drill

Posted by rahul challapalli <ch...@gmail.com>.

I haven't tried it myself but something like the below workaround should be
helpful

select
  a,
  case
    when exists (select 1 from dfs.`sample.json` where b is null) then null
   else sum(b)
  end
from dfs.`sample.json`
group by a

- Rahul

On Tue, Apr 9, 2019 at 4:32 PM Gayathri Selvaraj <
gayathri.selvaraaj@gmail.com> wrote:

> Hi Team,
>
>
> Facing some issues with the following case:
>
> Json file (*sample.json*) is having the following content:
> {"a":2,"b":null} {"a":2,"b":null} {"a":3,"b":null} {"a":4,"b":null}
>
> *Query:*
>
> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop
> sample.json` group by a;
>
> *Error:*
>
> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions
> supported for VarChar type
>
> *Observation:*
>
> If we query without using group by, then it is working fine without any
> error. If group by is used, then sum of null values is throwing the above
> error.
>
>
>
> Can anyone please let us know the solution for this or if there are any
> alternative. I have raised a JIRA ticket for the same -
> https://issues.apache.org/jira/browse/DRILL-7161
>
>
> Regards,
>
> Gayathri
>