Posted to jira@arrow.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/10/12 00:33:00 UTC

[jira] [Created] (ARROW-17995) [C++] arrow::json::DecimalConverter should rescale values based on the explicit_schema

Quanlong Huang created ARROW-17995:
--------------------------------------

             Summary: [C++] arrow::json::DecimalConverter should rescale values based on the explicit_schema
                 Key: ARROW-17995
                 URL: https://issues.apache.org/jira/browse/ARROW-17995
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
    Affects Versions: 9.0.0, 8.0.1, 8.0.0, 7.0.1, 7.0.0, 6.0.2, 6.0.1, 6.0.0
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


The C++ library does not read JSON decimal values correctly when the explicit_schema specifies a decimal type. This can be reproduced with this hello-world program: [https://github.com/stiga-huang/arrow-helloworld/tree/d267862]

The input JSON file has the following rows:
{code:json}
{"id":1,"str":"Some","price":"30.04"}
{"id":2,"str":"data","price":"1.234"}
{code}

If we read the price column using decimal128(9, 2), the values are as follows (the second row's "1.234" comes back as 12.34):
{noformat}
      30.04,
      12.34
{noformat}
If we use decimal128(9, 3) instead, the values are as follows (now the first row's "30.04" comes back as 3.004):
{noformat}
      3.004,
      1.234
{noformat}
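
Both results are consistent with the digits of each string being stored as an unscaled integer and then read back with the schema's scale, regardless of how many fractional digits the string had. A minimal sketch of that arithmetic in plain C++ follows; {{FormatUnscaled}} is a hypothetical helper written for this illustration, not an Arrow API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Render an unscaled integer as a decimal string with `scale` fractional
// digits. Hypothetical helper for illustration only; not part of Arrow.
std::string FormatUnscaled(int64_t unscaled, int scale) {
  assert(scale >= 0);
  const bool negative = unscaled < 0;
  // Work on the magnitude as unsigned so INT64_MIN does not overflow.
  const uint64_t magnitude = negative ? 0 - static_cast<uint64_t>(unscaled)
                                      : static_cast<uint64_t>(unscaled);
  std::string digits = std::to_string(magnitude);
  const std::size_t frac = static_cast<std::size_t>(scale);
  // Left-pad with zeros so at least one digit precedes the decimal point.
  if (digits.size() <= frac) {
    digits.insert(0, frac + 1 - digits.size(), '0');
  }
  if (frac > 0) {
    digits.insert(digits.size() - frac, ".");
  }
  return negative ? "-" + digits : digits;
}
```

Under that assumption, "1.234" parses to the unscaled integer 1234 and {{FormatUnscaled(1234, 2)}} gives 12.34, while "30.04" parses to 3004 and {{FormatUnscaled(3004, 3)}} gives 3.004, matching the two wrong values above.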
The decimal type in the explicit_schema is set here: https://github.com/stiga-huang/arrow-helloworld/blob/d26786270e87d9ab847658ead96a96190461b98f/json_decimal_example.cc#L38

The cause is that {{arrow::json::DecimalConverter}} does not rescale the parsed value to the scale of {{out_type_}}:
{code:cpp}
  Status Convert(const std::shared_ptr<Array>& in, std::shared_ptr<Array>* out) override {
    if (in->type_id() == Type::NA) {
      return MakeArrayOfNull(out_type_, in->length(), pool_).Value(out);
    }
    const auto& dict_array = GetDictionaryArray(in);

    using Builder = typename TypeTraits<T>::BuilderType;
    Builder builder(out_type_, pool_);
    RETURN_NOT_OK(builder.Resize(dict_array.indices()->length()));

    auto visit_valid = [&builder](string_view repr) {
      ARROW_ASSIGN_OR_RAISE(value_type value,
                            TypeTraits<T>::BuilderType::ValueType::FromString(repr));
      //////////// Should rescale the value based on out_type_ here
      builder.UnsafeAppend(value);
      return Status::OK();
    };

    auto visit_null = [&builder]() {
      builder.UnsafeAppendNull();
      return Status::OK();
    };

    RETURN_NOT_OK(VisitDictionaryEntries(dict_array, visit_valid, visit_null));
    return builder.Finish(out);
  }
{code}
https://github.com/apache/arrow/blob/cdd0fdf39033b9cf132a5cfc9caa5ed60713845a/cpp/src/arrow/json/converter.cc#L171-L173
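
The missing step could look roughly like the sketch below: keep the scale found while parsing the string, then rescale the unscaled value to the target scale before appending. This is standalone C++ over int64_t, not Arrow code; {{ParsedDecimal}}, {{ParseDecimal}} and {{RescaleTo}} are hypothetical names. In Arrow itself this would presumably map onto the {{Decimal128::FromString}} overload that reports the parsed scale and onto {{Decimal128::Rescale}}, but that mapping is an assumption and not something stated in this issue.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for a parsed decimal: the unscaled digits plus the
// scale implied by the position of the decimal point in the input string.
struct ParsedDecimal {
  int64_t unscaled;
  int scale;
};

// Parse strings such as "30.04" or "-1.234". Returns nullopt on malformed
// input. No exponent support and no overflow checks: illustration only.
std::optional<ParsedDecimal> ParseDecimal(const std::string& repr) {
  std::size_t i = 0;
  bool negative = false;
  if (i < repr.size() && (repr[i] == '+' || repr[i] == '-')) {
    negative = repr[i] == '-';
    ++i;
  }
  int64_t unscaled = 0;
  int scale = 0;
  bool seen_point = false;
  bool seen_digit = false;
  for (; i < repr.size(); ++i) {
    const char c = repr[i];
    if (c == '.' && !seen_point) {
      seen_point = true;
    } else if (std::isdigit(static_cast<unsigned char>(c))) {
      unscaled = unscaled * 10 + (c - '0');
      seen_digit = true;
      if (seen_point) ++scale;  // count fractional digits
    } else {
      return std::nullopt;
    }
  }
  if (!seen_digit) return std::nullopt;
  return ParsedDecimal{negative ? -unscaled : unscaled, scale};
}

// Rescale to `target_scale`. Scaling up multiplies by powers of ten; scaling
// down succeeds only if no non-zero digit would be dropped.
std::optional<int64_t> RescaleTo(const ParsedDecimal& d, int target_scale) {
  int64_t value = d.unscaled;
  for (int s = d.scale; s < target_scale; ++s) value *= 10;
  for (int s = d.scale; s > target_scale; --s) {
    if (value % 10 != 0) return std::nullopt;  // would lose data
    value /= 10;
  }
  return value;
}
```

With this sketch, "30.04" rescaled to scale 3 becomes the unscaled value 30040 (that is, 30.040). "1.234" cannot be represented exactly at scale 2, so the sketch rejects it. Rounding or truncating would be another option, and which behaviour the converter should adopt is a design choice this sketch does not settle.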


