Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/10/12 00:39:00 UTC

[jira] [Updated] (ARROW-17995) [C++] arrow::json::DecimalConverter should rescale values based on the explicit_schema

     [ https://issues.apache.org/jira/browse/ARROW-17995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-17995:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] arrow::json::DecimalConverter should rescale values based on the explicit_schema
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-17995
>                 URL: https://issues.apache.org/jira/browse/ARROW-17995
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 6.0.0, 6.0.1, 6.0.2, 7.0.0, 7.0.1, 8.0.0, 8.0.1, 9.0.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The C++ library ignores the decimal scale declared in the explicit_schema when it reads JSON decimal values. This can be reproduced with this helloworld program: [https://github.com/stiga-huang/arrow-helloworld/tree/d267862]
> The input JSON file has the following rows:
> {code:json}
> {"id":1,"str":"Some","price":"30.04"}
> {"id":2,"str":"data","price":"1.234"}
> {code}
> If we read the price column as decimal128(9, 2), the values come out as follows (1.234 is misread as 12.34):
> {noformat}
>       30.04,
>       12.34
> {noformat}
> If we use decimal128(9, 3) instead, the values come out as follows (30.04 is misread as 3.004):
> {noformat}
>       3.004,
>       1.234
> {noformat}
> The decimal type in the explicit_schema is set here: https://github.com/stiga-huang/arrow-helloworld/blob/d26786270e87d9ab847658ead96a96190461b98f/json_decimal_example.cc#L38
> The cause is that {{arrow::json::DecimalConverter}} parses each string at whatever scale the text happens to have and appends the result without rescaling it to the scale of out_type_:
> {code:cpp}
>   Status Convert(const std::shared_ptr<Array>& in, std::shared_ptr<Array>* out) override {
>     if (in->type_id() == Type::NA) {
>       return MakeArrayOfNull(out_type_, in->length(), pool_).Value(out);
>     }
>     const auto& dict_array = GetDictionaryArray(in);
>     using Builder = typename TypeTraits<T>::BuilderType;
>     Builder builder(out_type_, pool_);
>     RETURN_NOT_OK(builder.Resize(dict_array.indices()->length()));
>     auto visit_valid = [&builder](string_view repr) {
>       ARROW_ASSIGN_OR_RAISE(value_type value,
>                             TypeTraits<T>::BuilderType::ValueType::FromString(repr));
>       // BUG: the value should be rescaled to the scale of out_type_ here
>       builder.UnsafeAppend(value);
>       return Status::OK();
>     };
>     auto visit_null = [&builder]() {
>       builder.UnsafeAppendNull();
>       return Status::OK();
>     };
>     RETURN_NOT_OK(VisitDictionaryEntries(dict_array, visit_valid, visit_null));
>     return builder.Finish(out);
>   }
> {code}
> https://github.com/apache/arrow/blob/cdd0fdf39033b9cf132a5cfc9caa5ed60713845a/cpp/src/arrow/json/converter.cc#L171-L173
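
The mechanism described above can be modelled without Arrow. The following is a minimal sketch, not the actual patch: ParsedDecimal, ParseDecimal, Rescale and FormatDecimal are hypothetical helpers written for illustration, an int64_t stands in for the 128-bit decimal, and overflow is not checked. In Arrow itself the analogous operations are probably the Decimal128::FromString overload that also reports the parsed scale, and Decimal128::Rescale, but that mapping is an assumption here.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for a parsed decimal: the unscaled integer plus the
// scale (number of digits after the decimal point) found in the text.
struct ParsedDecimal {
  int64_t unscaled;
  int32_t scale;
};

// Parses plain decimal text such as "30.04" or "-1.234".
// Exponents are not supported and overflow is not checked (sketch only).
std::optional<ParsedDecimal> ParseDecimal(std::string_view repr) {
  std::size_t i = 0;
  bool negative = false;
  if (!repr.empty() && (repr[0] == '-' || repr[0] == '+')) {
    negative = (repr[0] == '-');
    i = 1;
  }
  int64_t unscaled = 0;
  int32_t scale = 0;
  bool seen_point = false;
  bool seen_digit = false;
  for (; i < repr.size(); ++i) {
    const char c = repr[i];
    if (c == '.') {
      if (seen_point) return std::nullopt;  // a second '.' is invalid
      seen_point = true;
    } else if (c >= '0' && c <= '9') {
      unscaled = unscaled * 10 + (c - '0');
      if (seen_point) ++scale;  // count fractional digits
      seen_digit = true;
    } else {
      return std::nullopt;
    }
  }
  if (!seen_digit) return std::nullopt;
  return ParsedDecimal{negative ? -unscaled : unscaled, scale};
}

// Converts an unscaled value from one scale to another. Increasing the scale
// multiplies by ten; decreasing it fails if non-zero digits would be dropped.
std::optional<int64_t> Rescale(int64_t unscaled, int32_t from_scale,
                               int32_t to_scale) {
  while (from_scale < to_scale) {
    unscaled *= 10;
    ++from_scale;
  }
  while (from_scale > to_scale) {
    if (unscaled % 10 != 0) return std::nullopt;  // would lose data
    unscaled /= 10;
    --from_scale;
  }
  return unscaled;
}

// Renders an unscaled value the way a reader of the column sees it.
std::string FormatDecimal(int64_t unscaled, int32_t scale) {
  const bool negative = unscaled < 0;
  std::string digits = std::to_string(negative ? -unscaled : unscaled);
  if (scale > 0) {
    const std::size_t frac = static_cast<std::size_t>(scale);
    if (digits.size() <= frac) {
      // Left-pad so that one digit precedes the decimal point.
      digits = std::string(frac + 1 - digits.size(), '0') + digits;
    }
    digits.insert(digits.size() - frac, ".");
  }
  return negative ? "-" + digits : digits;
}
```

In this model, ParseDecimal("1.234") gives the unscaled value 1234 at scale 3. Storing 1234 unchanged in a scale-2 column renders as 12.34, and storing 3004 (from "30.04") unchanged in a scale-3 column renders as 3.004, which matches the wrong values reported above. Calling Rescale(3004, 2, 3) first gives 30040, which renders as 30.040. Rescaling 1234 from scale 3 to scale 2 would drop a digit, so this sketch rejects it; whether the converter should raise an error or round in that case is a design decision this sketch does not settle.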



--
This message was sent by Atlassian Jira
(v8.20.10#820010)