Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/30 17:15:29 UTC

[GitHub] [arrow] pitrou opened a new pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON

pitrou opened a new pull request #8309:
URL: https://github.com/apache/arrow/pull/8309


   Simple value types are supported: integers, string-like types and decimals.
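A dictionary array stores each distinct value once and refers to it by integer index. The standalone sketch below is not Arrow's API. It shows that encoding step for a JSON array such as `["foo", "bar", "foo", null]`. The names `DictionaryEncoded` and `DictionaryEncode` are hypothetical, and JSON parsing is skipped. The sketch also assumes two behaviours of a memo-table-based builder: distinct values receive indices in first-seen order, and nulls are stored as null indices rather than as dictionary entries.

```c++
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical illustration, not Arrow's API: dictionary-encode values that
// have already been parsed from a JSON array. std::nullopt stands for null.
struct DictionaryEncoded {
  std::vector<std::string> dictionary;          // unique values, first-seen order
  std::vector<std::optional<int32_t>> indices;  // one index (or null) per input
};

DictionaryEncoded DictionaryEncode(
    const std::vector<std::optional<std::string>>& values) {
  DictionaryEncoded out;
  // Maps each distinct value to its position in out.dictionary.
  std::unordered_map<std::string, int32_t> memo;
  for (const auto& value : values) {
    if (!value.has_value()) {
      // Nulls become null indices; they never enter the dictionary.
      out.indices.push_back(std::nullopt);
      continue;
    }
    auto it = memo.find(*value);
    if (it == memo.end()) {
      // First occurrence: append to the dictionary and remember its index.
      const auto index = static_cast<int32_t>(out.dictionary.size());
      it = memo.emplace(*value, index).first;
      out.dictionary.push_back(*value);
    }
    out.indices.push_back(it->second);
  }
  return out;
}
```

For that input, the dictionary is `["foo", "bar"]` and the indices are `[0, 1, 0, null]`.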


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON

pitrou closed pull request #8309:
URL: https://github.com/apache/arrow/pull/8309


   



[GitHub] [arrow] github-actions[bot] commented on pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON

github-actions[bot] commented on pull request #8309:
URL: https://github.com/apache/arrow/pull/8309#issuecomment-701534126


   https://issues.apache.org/jira/browse/ARROW-7372



[GitHub] [arrow] pitrou commented on a change in pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON

pitrou commented on a change in pull request #8309:
URL: https://github.com/apache/arrow/pull/8309#discussion_r498102852



##########
File path: cpp/src/arrow/array/builder_dict.h
##########
@@ -190,6 +190,12 @@ class DictionaryBuilderBase : public ArrayBuilder {
   /// \brief The current number of entries in the dictionary
   int64_t dictionary_length() const { return memo_table_->size(); }
 
+  /// \brief The value byte width (for FixedSizeBinaryType)
+  template <typename T1 = T>
+  enable_if_fixed_size_binary<T1, int32_t> byte_width() const {
+    return byte_width_;
+  }
+

Review comment:
       The property is already exposed in `FixedSizeBinaryBuilder`.
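The accessor in this hunk relies on a standard SFINAE idiom. It is a member function template whose defaulted parameter `T1 = T` makes the `enable_if` condition depend on the function's own template parameter. Below is a self-contained sketch of the idiom. `DictionaryBuilderSketch`, the empty type structs and the simplified `enable_if_fixed_size_binary` alias are illustrative stand-ins, not Arrow's definitions.

```c++
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for Arrow's type classes.
struct Int32Type {};
struct FixedSizeBinaryType {};

// Simplified stand-in for Arrow's enable_if_fixed_size_binary<T, R>:
// yields R when T is FixedSizeBinaryType, otherwise names no type at all.
template <typename T, typename R>
using enable_if_fixed_size_binary =
    typename std::enable_if<std::is_same<T, FixedSizeBinaryType>::value, R>::type;

template <typename T>
class DictionaryBuilderSketch {
 public:
  explicit DictionaryBuilderSketch(int32_t byte_width = 0)
      : byte_width_(byte_width) {}

  // T1 defaults to T, so the enable_if condition depends on a template
  // parameter of this member function. For other T the accessor simply
  // drops out of overload resolution; the rest of the class still compiles.
  template <typename T1 = T>
  enable_if_fixed_size_binary<T1, int32_t> byte_width() const {
    return byte_width_;
  }

 private:
  int32_t byte_width_;
};
```

Calling `byte_width()` on `DictionaryBuilderSketch<Int32Type>` fails to compile. Merely instantiating that class is fine.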





[GitHub] [arrow] pitrou commented on pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON

pitrou commented on pull request #8309:
URL: https://github.com/apache/arrow/pull/8309#issuecomment-702009333


   Thanks for the suggestions. I'm going to merge when CI is green.



[GitHub] [arrow] pitrou commented on a change in pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON

pitrou commented on a change in pull request #8309:
URL: https://github.com/apache/arrow/pull/8309#discussion_r498099713



##########
File path: cpp/src/arrow/ipc/json_simple.cc
##########
@@ -50,13 +51,35 @@ namespace json {
 using ::arrow::internal::checked_cast;
 using ::arrow::internal::checked_pointer_cast;
 
-static constexpr auto kParseFlags = rj::kParseFullPrecisionFlag | rj::kParseNanAndInfFlag;
+namespace {
 
-static Status JSONTypeError(const char* expected_type, rj::Type json_type) {
+constexpr auto kParseFlags = rj::kParseFullPrecisionFlag | rj::kParseNanAndInfFlag;
+
+Status JSONTypeError(const char* expected_type, rj::Type json_type) {
   return Status::Invalid("Expected ", expected_type, " or null, got JSON type ",
                          json_type);
 }
 
+template <typename Type>
+struct RegularBuilderTraits {
+  using BuilderType = typename TypeTraits<Type>::BuilderType;
+
+  static const std::shared_ptr<DataType>& value_type(
+      const std::shared_ptr<DataType>& type) {
+    return type;
+  }
+};
+
+template <typename Type>
+struct DictionaryBuilderTraits {
+  using BuilderType = DictionaryBuilder<Type>;
+
+  static const std::shared_ptr<DataType>& value_type(
+      const std::shared_ptr<DataType>& type) {
+    return checked_cast<const DictionaryType&>(*type).value_type();
+  }
+};
+

Review comment:
       Ah, right, thank you.
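The two traits in this hunk let a single converter template produce either a regular builder or a dictionary builder. The choice is made through a template template parameter. A reduced sketch of that pattern follows. `PlainBuilder`, `DictBuilder`, `StringConverterSketch` and the minimal type classes are hypothetical stand-ins for the Arrow classes. A plain `static_cast` also replaces the diff's `checked_cast`.

```c++
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for Arrow classes; only the shape of the pattern matters.
struct DataType {
  virtual ~DataType() = default;
  virtual std::string name() const = 0;
};

struct StringType : DataType {
  std::string name() const override { return "utf8"; }
};

class DictionaryType : public DataType {
 public:
  explicit DictionaryType(std::shared_ptr<DataType> value_type)
      : value_type_(std::move(value_type)) {}
  std::string name() const override { return "dictionary"; }
  const std::shared_ptr<DataType>& value_type() const { return value_type_; }

 private:
  std::shared_ptr<DataType> value_type_;
};

template <typename T>
struct PlainBuilder {
  static constexpr bool kIsDictionary = false;
};

template <typename T>
struct DictBuilder {
  static constexpr bool kIsDictionary = true;
};

// Each trait answers two questions: which builder to create, and where the
// value type lives (the type itself, or inside the DictionaryType).
template <typename Type>
struct RegularBuilderTraits {
  using BuilderType = PlainBuilder<Type>;
  static const std::shared_ptr<DataType>& value_type(
      const std::shared_ptr<DataType>& type) {
    return type;
  }
};

template <typename Type>
struct DictionaryBuilderTraits {
  using BuilderType = DictBuilder<Type>;
  static const std::shared_ptr<DataType>& value_type(
      const std::shared_ptr<DataType>& type) {
    return static_cast<const DictionaryType&>(*type).value_type();
  }
};

// One converter template serves both cases via a template template parameter.
template <template <typename> class BuilderTraits = RegularBuilderTraits>
struct StringConverterSketch {
  using BuilderType = typename BuilderTraits<StringType>::BuilderType;

  static bool builds_dictionary() { return BuilderType::kIsDictionary; }

  static std::string value_type_name(const std::shared_ptr<DataType>& type) {
    return BuilderTraits<StringType>::value_type(type)->name();
  }
};
```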





[GitHub] [arrow] bkietz commented on a change in pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON

bkietz commented on a change in pull request #8309:
URL: https://github.com/apache/arrow/pull/8309#discussion_r497762312



##########
File path: cpp/src/arrow/ipc/json_simple.cc
##########
@@ -50,13 +51,35 @@ namespace json {
 using ::arrow::internal::checked_cast;
 using ::arrow::internal::checked_pointer_cast;
 
-static constexpr auto kParseFlags = rj::kParseFullPrecisionFlag | rj::kParseNanAndInfFlag;
+namespace {
 
-static Status JSONTypeError(const char* expected_type, rj::Type json_type) {
+constexpr auto kParseFlags = rj::kParseFullPrecisionFlag | rj::kParseNanAndInfFlag;
+
+Status JSONTypeError(const char* expected_type, rj::Type json_type) {
   return Status::Invalid("Expected ", expected_type, " or null, got JSON type ",
                          json_type);
 }
 
+template <typename Type>
+struct RegularBuilderTraits {
+  using BuilderType = typename TypeTraits<Type>::BuilderType;
+
+  static const std::shared_ptr<DataType>& value_type(
+      const std::shared_ptr<DataType>& type) {
+    return type;
+  }
+};
+
+template <typename Type>
+struct DictionaryBuilderTraits {
+  using BuilderType = DictionaryBuilder<Type>;
+
+  static const std::shared_ptr<DataType>& value_type(
+      const std::shared_ptr<DataType>& type) {
+    return checked_cast<const DictionaryType&>(*type).value_type();
+  }
+};
+

Review comment:
       I think the converters would be simplified by passing the `BuilderType` as the template parameter directly, rather than passing a trait with which to look it up. `value_type` need not be a trait member at all, I think:
   ```c++
   inline const std::shared_ptr<DataType>& value_type(const std::shared_ptr<DataType>& type) {
     if (type->id() != Type::DICTIONARY) return type;
     return checked_cast<const DictionaryType&>(*type).value_type();
   }
   ```
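For comparison, here is a sketch of the reviewer's alternative. The converter is templated on the builder type itself, and the value type is found by a free `value_type()` function rather than a trait member. The type classes, `TypeId`, the two builder structs and `DirectStringConverter` are hypothetical stand-ins. `static_cast` replaces Arrow's `checked_cast` so the block compiles on its own.

```c++
#include <memory>
#include <utility>

// Hypothetical stand-ins for Arrow's type classes and type ids.
enum class TypeId { STRING, DICTIONARY };

struct DataType {
  virtual ~DataType() = default;
  virtual TypeId id() const = 0;
};

struct StringType : DataType {
  TypeId id() const override { return TypeId::STRING; }
};

class DictionaryType : public DataType {
 public:
  explicit DictionaryType(std::shared_ptr<DataType> value_type)
      : value_type_(std::move(value_type)) {}
  TypeId id() const override { return TypeId::DICTIONARY; }
  const std::shared_ptr<DataType>& value_type() const { return value_type_; }

 private:
  std::shared_ptr<DataType> value_type_;
};

// Free function in the spirit of the suggestion: dictionary types yield
// their value type, every other type is returned unchanged.
inline const std::shared_ptr<DataType>& value_type(
    const std::shared_ptr<DataType>& type) {
  if (type->id() != TypeId::DICTIONARY) return type;
  return static_cast<const DictionaryType&>(*type).value_type();
}

// Hypothetical builders: the converter is templated on these directly.
struct StringBuilder {
  static constexpr bool kIsDictionary = false;
};
struct StringDictionaryBuilder {
  static constexpr bool kIsDictionary = true;
};

template <typename BuilderType = StringBuilder>
class DirectStringConverter {
 public:
  explicit DirectStringConverter(std::shared_ptr<DataType> type)
      : type_(std::move(type)) {}

  bool builds_dictionary() const { return BuilderType::kIsDictionary; }

  // Same code path for plain and dictionary types, no trait lookup needed.
  TypeId value_type_id() const { return value_type(type_)->id(); }

 private:
  std::shared_ptr<DataType> type_;
};
```

With this design, the choice between regular and dictionary builders moves into the template argument. The type-inspection logic lives in a single function.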

##########
File path: cpp/src/arrow/ipc/json_simple.cc
##########
@@ -412,12 +463,22 @@ class StringConverter final : public ConcreteConverter<StringConverter<TYPE>> {
 // ------------------------------------------------------------------------
 // Converter for fixed-size binary arrays
 
+template <template <typename T> class BuilderTraits = RegularBuilderTraits>
 class FixedSizeBinaryConverter final
-    : public ConcreteConverter<FixedSizeBinaryConverter> {
+    : public ConcreteConverter<FixedSizeBinaryConverter<BuilderTraits>> {
+  using BuilderType = typename BuilderTraits<FixedSizeBinaryType>::BuilderType;
+
  public:
   explicit FixedSizeBinaryConverter(const std::shared_ptr<DataType>& type) {
     this->type_ = type;
-    builder_ = std::make_shared<FixedSizeBinaryBuilder>(type, default_memory_pool());
+  }
+
+  Status Init() override {
+    std::unique_ptr<ArrayBuilder> builder;
+    RETURN_NOT_OK(MakeBuilder(default_memory_pool(), this->type_, &builder));
+    builder_ = checked_pointer_cast<BuilderType>(std::move(builder));
+    DCHECK(builder_);
+    return Status::OK();

Review comment:
       This pattern is repeated several times. Could it be simplified by adding a helper to `ConcreteConverter`, for example:
   ```suggestion
       return this->MakeBuilder(&builder_);
   ```
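Below is one possible shape for such a helper, using CRTP like `ConcreteConverter`. Everything here is a hypothetical stand-in simplified so the block compiles without Arrow. That includes `Status`, the builder classes and `MakeBuilderFor`, which replaces `arrow::MakeBuilder` (called in the diff with a memory pool, the type and an output pointer). `std::dynamic_pointer_cast` stands in for the diff's `checked_pointer_cast`.

```c++
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Minimal stand-in for arrow::Status (hypothetical, for illustration).
class Status {
 public:
  static Status OK() { return Status(true); }
  static Status Invalid() { return Status(false); }
  bool ok() const { return ok_; }

 private:
  explicit Status(bool ok) : ok_(ok) {}
  bool ok_;
};

struct ArrayBuilder {
  virtual ~ArrayBuilder() = default;
};
struct Int32Builder : ArrayBuilder {};
struct StringBuilder : ArrayBuilder {};

// Hypothetical factory playing the role of arrow::MakeBuilder: callers only
// see the result as a generic ArrayBuilder.
Status MakeBuilderFor(const std::string& type_name,
                      std::unique_ptr<ArrayBuilder>* out) {
  if (type_name == "int32") {
    out->reset(new Int32Builder);
    return Status::OK();
  }
  if (type_name == "utf8") {
    out->reset(new StringBuilder);
    return Status::OK();
  }
  return Status::Invalid();
}

template <typename Derived>
class ConcreteConverter {
 protected:
  // The proposed helper: create a builder for the converter's type and
  // downcast it to the builder class the derived converter stores.
  template <typename BuilderType>
  Status MakeBuilder(std::shared_ptr<BuilderType>* out) {
    std::unique_ptr<ArrayBuilder> builder;
    Status st = MakeBuilderFor(type_name_, &builder);
    if (!st.ok()) return st;
    *out = std::dynamic_pointer_cast<BuilderType>(
        std::shared_ptr<ArrayBuilder>(std::move(builder)));
    assert(*out != nullptr);  // plays the role of DCHECK(builder_) in the diff
    return Status::OK();
  }

  std::string type_name_;
};

class StringConverter final : public ConcreteConverter<StringConverter> {
 public:
  explicit StringConverter(std::string type_name) {
    type_name_ = std::move(type_name);
  }

  // Each converter's Init() shrinks to a single line.
  Status Init() { return this->MakeBuilder(&builder_); }

  const std::shared_ptr<StringBuilder>& builder() const { return builder_; }

 private:
  std::shared_ptr<StringBuilder> builder_;
};
```

With the helper in place, the body of each `Init()` override reduces to the one-line `return this->MakeBuilder(&builder_);` from the suggestion.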

##########
File path: cpp/src/arrow/array/builder_dict.h
##########
@@ -190,6 +190,12 @@ class DictionaryBuilderBase : public ArrayBuilder {
   /// \brief The current number of entries in the dictionary
   int64_t dictionary_length() const { return memo_table_->size(); }
 
+  /// \brief The value byte width (for FixedSizeBinaryType)
+  template <typename T1 = T>
+  enable_if_fixed_size_binary<T1, int32_t> byte_width() const {
+    return byte_width_;
+  }
+

Review comment:
       This is probably fine. Nit: I'd prefer to get the `FixedSizeBinaryType` inside the converter and use its `byte_width()` rather than propagating the property to builders.
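The following sketch illustrates the reviewer's preference. The converter reads `byte_width()` from the `FixedSizeBinaryType` it was given, so the builders need no extra accessor. The classes shown are hypothetical stand-ins. The length check in `Accepts` is an assumed example of why a converter would need the width. For a dictionary type, the converter would first extract the value type, for instance with the `value_type()` helper suggested earlier in this thread.

```c++
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for Arrow's type classes.
struct DataType {
  virtual ~DataType() = default;
};

class FixedSizeBinaryType : public DataType {
 public:
  explicit FixedSizeBinaryType(int32_t byte_width) : byte_width_(byte_width) {}
  int32_t byte_width() const { return byte_width_; }

 private:
  int32_t byte_width_;
};

// The converter reads the width from its (value) type, so neither a plain
// nor a dictionary builder needs to expose byte_width().
class FixedSizeBinaryConverterSketch {
 public:
  explicit FixedSizeBinaryConverterSketch(std::shared_ptr<DataType> value_type)
      : value_type_(std::move(value_type)) {}

  int32_t byte_width() const {
    // Unchecked downcast: the caller must pass a FixedSizeBinaryType.
    return static_cast<const FixedSizeBinaryType&>(*value_type_).byte_width();
  }

  // Assumed use of the width: reject string values of the wrong length.
  bool Accepts(const std::string& value) const {
    return static_cast<int64_t>(value.size()) == byte_width();
  }

 private:
  std::shared_ptr<DataType> value_type_;
};
```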



