Posted to user@drill.apache.org by Divya Gehlot <di...@gmail.com> on 2018/05/31 02:56:06 UTC

Read complex json file gives list type doesn't support different data types

Hi,
I am reading a complex JSON file and get an unsupported-operation error when
reading the following:
 "Coordinates":[
            [
               23.53,
               4.99,
               11
            ],
            [
               35.09,
               7.7,
               16
            ]
]


Error : Query execution error. Details:[
> UNSUPPORTED_OPERATION ERROR: In a list of type FLOAT8, encountered a value
> of type BIGINT. Drill does not support lists of different types.
> Line  15
> Column  19
> Field  Coordinates
> Line  15
> Column  19
> Field  Coordinates
> Line  15
> Column  19
> Field  Coordinates
> Fragment 0:0


If I remove the third coordinate of each array (11 and 16), which is an
integer, it works like a charm.

Does that mean Drill doesn't support values of different data types in an
array?

Appreciate the help!

Thanks,
Divya

RE: Read complex json file gives list type doesn't support different data types

Posted by "Lee, David" <Da...@blackrock.com>.
Parquet is the target format, but getting to Parquet is difficult. Using Drill's CTAS to create Parquet is really easy, but there are pitfalls when converting JSON to Parquet. I think the only libraries that support nested Parquet creation with a schema are in C++, so there aren't many options out there to generate Parquet.

This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information.  Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock’s Privacy Policy.

For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations.

© 2018 BlackRock, Inc. All rights reserved.

Re: Read complex json file gives list type doesn't support different data types

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
+1

We had a long discussion on this topic on the dev list a month or so ago. The conclusion seemed to be that Drill is intended to pull schema out of data without an external schema; the data must support Drill's schema inference. There remain holes where knowing the schema up front would be a huge win. For now, the solution is to ETL to Parquet, which does carry a schema.

Thanks,
- Paul

 

Re: Read complex json file gives list type doesn't support different data types

Posted by "Lee, David" <Da...@blackrock.com>.
I think I opened an enhancement ticket to pass a JSON schema object into a query, bypassing schema learning to avoid problems like this. Coordinates could be typed as float in the schema object so that Drill can cast it to float without converting everything to doubles.

It would also address the case where some key is NULL throughout an entire file. Drill will cast NULL to an int, which results in a schema error if the next file read has non-NULL string values for that key.

Turning on read-everything-as-string is a hack, and even that fails once you start hitting NULL values for keys that hold nested objects or nested arrays.

An alternative short-term solution would be to not include NULLs.

Re: Read complex json file gives list type doesn't support different data types

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Actually, you want to enable all-text mode (the store.json.all_text_mode option). Unions won't help, as you'd need a union in a list (Drill's repeated list type) which, frankly, does not yet work.

With all-text mode, every value is read as text, so you'd need to do the conversion to numbers yourself, which is awkward for lists...

Thanks,
- Paul

 

Re: Read complex json file gives list type doesn't support different data types

Posted by Divya Gehlot <di...@gmail.com>.
I tried exec.enable_union_type, but it didn't work for me. However, the following helped:

ALTER SESSION SET `store.json.read_numbers_as_double` = true;


Re: Read complex json file gives list type doesn't support different data types

Posted by "Lee, David" <Da...@blackrock.com>.
11 and 16 are ints; 11.0 and 16.0 are floats in JSON. Everything in the array needs to be a float.
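
A minimal sketch of how one might locate such arrays before querying, assuming Python 3 and its standard json module, which parses 11 as an int and 11.0 as a float. The helper name find_mixed_numeric_lists and the "$.key[index]" path notation are hypothetical and chosen only for illustration; this is not part of Drill.

```python
import json


def find_mixed_numeric_lists(node, path="$"):
    """Yield paths of JSON arrays that hold both int and float scalars."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_mixed_numeric_lists(value, f"{path}.{key}")
    elif isinstance(node, list):
        # bool is a subclass of int in Python, so leave booleans out.
        kinds = {
            type(value)
            for value in node
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        }
        if kinds == {int, float}:
            yield path
        for index, value in enumerate(node):
            yield from find_mixed_numeric_lists(value, f"{path}[{index}]")


# The fragment from the original question, wrapped in an object.
sample = json.loads('{"Coordinates": [[23.53, 4.99, 11], [35.09, 7.7, 16]]}')
mixed = list(find_mixed_numeric_lists(sample))
print(mixed)
```

Both inner arrays are reported, since each one mixes floats with a trailing int.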

Re: Read complex json file gives list type doesn't support different data types

Posted by Divya Gehlot <di...@gmail.com>.
Hi Paul,
Thanks for all your input.
I have now found a workaround: if I cast the values I am able to read them,
but I am unable to read them when I just select the columns without casting.
Is this the right way of doing it?
Do other users in the Drill community see the same behavior?


Thanks,
Divya

Re: Read complex json file gives list type doesn't support different data types

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Divya,
Please file a bug. Your file has a float, followed by an int. It is entirely possible for Drill to read the int as a float. There is work in progress that contains this fix, but it would be good to file a bug (and get a fix) even with the current reader.

What Drill can't do is "predict the future": the following cannot work: [10, 10.2].
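
The asymmetry can be sketched with a toy single-pass model in Python. This is only an illustration of the reasoning above, not Drill's reader code: infer_list_type is a hypothetical function, and the widening rule models the fix described here rather than the current behavior, which rejects the int.

```python
def infer_list_type(values):
    """Toy model: the first value fixes the list type; later ints may widen."""
    list_type = None
    for value in values:
        value_type = "FLOAT8" if isinstance(value, float) else "BIGINT"
        if list_type is None:
            # The reader commits to a type as soon as it sees the first value.
            list_type = value_type
        elif list_type == "FLOAT8" and value_type == "BIGINT":
            # An int arriving in a float list can simply be stored as a float.
            continue
        elif list_type != value_type:
            # Earlier values were already written as BIGINT; a later float
            # cannot change them without looking ahead.
            raise TypeError(
                f"list of type {list_type} encountered a value of type {value_type}"
            )
    return list_type


print(infer_list_type([23.53, 4.99, 11]))
try:
    infer_list_type([10, 10.2])
except TypeError as error:
    print("cannot infer:", error)
```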

Short-term fix: change the file so that the "11" is "11.0", etc.
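
One way to apply that change without editing by hand is a small preprocessing script. The following is a sketch assuming Python 3 and the standard json module; ints_to_floats is a hypothetical helper, and it converts every integer in the record, so it would need narrowing if other fields (IDs, counts) must stay integers.

```python
import json


def ints_to_floats(node):
    """Return a copy of a parsed JSON value with every int turned into a float."""
    if isinstance(node, bool):
        return node  # bool is a subclass of int; keep true/false as they are
    if isinstance(node, int):
        return float(node)
    if isinstance(node, list):
        return [ints_to_floats(value) for value in node]
    if isinstance(node, dict):
        return {key: ints_to_floats(value) for key, value in node.items()}
    return node


record = json.loads('{"Coordinates": [[23.53, 4.99, 11], [35.09, 7.7, 16]]}')
fixed_text = json.dumps(ints_to_floats(record))
print(fixed_text)
# prints {"Coordinates": [[23.53, 4.99, 11.0], [35.09, 7.7, 16.0]]}
```

Note that integers larger than 2**53 would lose precision when converted to floats this way.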

Thanks,
- Paul

 

Re: Read complex json file gives list type doesn't support different data types

Posted by Padma Penumarthy <pp...@mapr.com>.
Yes, that is correct.
You can try setting the option “exec.enable_union_type” for that to work, with the caveat that
the union type is not fully supported in Drill.

Thanks
Padma

