Posted to user@arrow.apache.org by Pranav Yogi Lodha <pr...@cloudera.com> on 2022/09/19 03:44:25 UTC

[C++] Decimal support for unquoted json values

Hi, I am trying to use Arrow 8.0.0 to add decimal support to a JSON
scanner. Reading unquoted decimal values fails with a parse error saying
that the column changed from string to number.
When I read the same file with quoted values, it works. Is there a way to
work around this for unquoted values as well? A lot of existing decimal
data is unquoted.
I'd really appreciate any pointers, or a quick chat to explain more
about this.

Regards

Re: [C++] Decimal support for unquoted json values

Posted by Quanlong Huang <hu...@gmail.com>.
Hi Jin,

We've verified that the fix works in the arrow-helloworld reproduction
program. We are now verifying the arrow-json lib in Impala with the fix.
Thanks a lot for your help!

Best,
Quanlong


Re: [C++] Decimal support for unquoted json values

Posted by Jin Shang <sh...@gmail.com>.
This PR is merged. Could you try again with the latest master branch?

Best,
Jin


Re: [C++] Decimal support for unquoted json values

Posted by Pranav Yogi Lodha <pr...@cloudera.com>.
Thanks so much Jin! Really appreciate it!

Best regards,
Pranav


Re: [C++] Decimal support for unquoted json values

Posted by Quanlong Huang <hu...@gmail.com>.
Hi Jin,

Thanks for working on this! I see you uploaded a PR at
https://github.com/apache/arrow/pull/14242
Looking forward to this feature!

Thanks,
Quanlong


Re: [C++] Decimal support for unquoted json values

Posted by Jin Shang <sh...@gmail.com>.
Hi Quanlong and Pranav,

Thanks for reporting this issue and providing an example! We are currently working on unquoted decimal support for our JSON parser. It should be done within a few days. I will send you an update once it’s ready.

Best regards,
Jin



Re: [C++] Decimal support for unquoted json values

Posted by Quanlong Huang <hu...@gmail.com>.
FWIW, here is an example to reproduce the issue:
https://github.com/stiga-huang/arrow-helloworld

It seems the C++ library expects JSON decimals to be represented as
strings (quoted) instead of numbers (unquoted):
https://github.com/apache/arrow/blob/release-8.0.0/cpp/src/arrow/json/parser.cc#L107

Decimal128Type is a subclass of DecimalType, which extends
FixedSizeBinaryType, so the expected kind is kString. It'd be nice if
someone could confirm this, i.e. that the C++ Arrow library can currently
only read JSON decimals represented as strings.
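
To make the check described above concrete, here is a minimal sketch that
uses only the standard library. It is not Arrow's code: JsonKind,
ColumnType, ExpectedKind and CheckKind are hypothetical names, and the
mapping of decimals to "string" only models the 8.0.0 behaviour as
described in this message.

```cpp
#include <string>

// Hypothetical, simplified set of value kinds a JSON parser distinguishes.
enum class JsonKind { kString, kNumber };

// Hypothetical column types for this sketch only.
enum class ColumnType { kInt64, kUtf8, kDecimal128 };

std::string KindName(JsonKind kind) {
  return kind == JsonKind::kString ? "string" : "number";
}

// Models the behaviour described above: a decimal column is treated like
// fixed-size binary, so the parser expects a quoted JSON string for it.
JsonKind ExpectedKind(ColumnType type) {
  switch (type) {
    case ColumnType::kInt64:
      return JsonKind::kNumber;
    case ColumnType::kUtf8:
      return JsonKind::kString;
    case ColumnType::kDecimal128:
      return JsonKind::kString;
  }
  return JsonKind::kString;
}

// Returns an empty string when the value kind matches the column's expected
// kind, otherwise an error message shaped like the one reported in the thread.
std::string CheckKind(const std::string& path, ColumnType type,
                      JsonKind actual, int row) {
  const JsonKind expected = ExpectedKind(type);
  if (expected == actual) return "";
  return "JSON parse error: Column(" + path + ") changed from " +
         KindName(expected) + " to " + KindName(actual) + " in row " +
         std::to_string(row);
}
```

Under these assumptions, CheckKind("/age", ColumnType::kDecimal128,
JsonKind::kNumber, 0) yields the same message as the reported error, while
a quoted value (JsonKind::kString) passes the check.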

Note that Hive writes decimals as (unquoted) numbers in JSON. So reading
unquoted decimals in JSON is an important feature for us.
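
As an illustration of what reading an unquoted decimal exactly involves,
the sketch below converts the text of a number token straight into an
integer scaled by 10^scale, without going through a double. It is an
assumption-laden example, not what Arrow or the eventual fix does:
ParseScaledDecimal is a hypothetical helper, it rejects exponents, and it
is limited to 18 digits so the result fits in an int64_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: parses decimal text such as "12.5" into an integer
// scaled by 10^scale (1250 for scale 2) and stores it in *out.
// Returns false for input it does not handle: empty text, exponents, more
// fractional digits than `scale`, or more than 18 digits after padding.
bool ParseScaledDecimal(const std::string& text, int scale, std::int64_t* out) {
  std::size_t pos = 0;
  bool negative = false;
  if (pos < text.size() && text[pos] == '-') {
    negative = true;
    ++pos;
  }
  std::int64_t value = 0;
  int digits = 0;       // digits consumed so far (including padding below)
  int frac_digits = 0;  // digits seen after the decimal point
  bool seen_point = false;
  for (; pos < text.size(); ++pos) {
    const char c = text[pos];
    if (c == '.') {
      if (seen_point || digits == 0) return false;
      seen_point = true;
      continue;
    }
    if (c < '0' || c > '9') return false;  // also rejects 'e' / 'E'
    if (++digits > 18) return false;       // keep the value within int64_t
    if (seen_point && ++frac_digits > scale) return false;
    value = value * 10 + (c - '0');
  }
  if (digits == 0 || (seen_point && frac_digits == 0)) return false;
  // Pad with zeros so the result carries exactly `scale` fractional digits.
  for (int i = frac_digits; i < scale; ++i) {
    if (++digits > 18) return false;
    value *= 10;
  }
  *out = negative ? -value : value;
  return true;
}
```

With scale 2, the token 12.5 becomes 1250 and -0.07 becomes -7, whereas
1.234 is rejected because it would need rounding.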

Thanks,
Quanlong

Re: [C++] Decimal support for unquoted json values

Posted by Pranav Yogi Lodha <pr...@cloudera.com>.
All the values are unquoted.


Re: [C++] Decimal support for unquoted json values

Posted by Antoine Pitrou <an...@python.org>.
On Thu, 22 Sep 2022 16:43:58 +0530
Pranav Yogi Lodha <pr...@cloudera.com> wrote:
> The json scanner would be used for impala and this is the error that's
> shown when unquoted values are read:
> 
> ERROR: JSON parse error: Column(/age) changed from string to number in row 0
> 
> Age is decimal type column. I and my team have been stuck on this for a
> while, any pointers would be highly appreciated.

This probably means you've got mixed types in the JSON column. Some
values are quoted, some are not. Is that right?



Re: [C++] Decimal support for unquoted json values

Posted by Pranav Yogi Lodha <pr...@cloudera.com>.
The JSON scanner will be used in Impala, and this is the error shown when
unquoted values are read:

ERROR: JSON parse error: Column(/age) changed from string to number in row 0

Age is a decimal-type column. My team and I have been stuck on this for a
while; any pointers would be highly appreciated.
