Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/30 17:15:29 UTC
[GitHub] [arrow] pitrou opened a new pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON
pitrou opened a new pull request #8309:
URL: https://github.com/apache/arrow/pull/8309
Simple value types are supported: integers, string-like types, and decimals.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou closed pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON
Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8309:
URL: https://github.com/apache/arrow/pull/8309
[GitHub] [arrow] github-actions[bot] commented on pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8309:
URL: https://github.com/apache/arrow/pull/8309#issuecomment-701534126
https://issues.apache.org/jira/browse/ARROW-7372
[GitHub] [arrow] pitrou commented on a change in pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8309:
URL: https://github.com/apache/arrow/pull/8309#discussion_r498102852
##########
File path: cpp/src/arrow/array/builder_dict.h
##########
@@ -190,6 +190,12 @@ class DictionaryBuilderBase : public ArrayBuilder {
/// \brief The current number of entries in the dictionary
int64_t dictionary_length() const { return memo_table_->size(); }
+ /// \brief The value byte width (for FixedSizeBinaryType)
+ template <typename T1 = T>
+ enable_if_fixed_size_binary<T1, int32_t> byte_width() const {
+ return byte_width_;
+ }
+
Review comment:
The property is already exposed in `FixedSizeBinaryBuilder`.
[GitHub] [arrow] pitrou commented on pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8309:
URL: https://github.com/apache/arrow/pull/8309#issuecomment-702009333
Thanks for the suggestions. I'm going to merge when CI is green.
[GitHub] [arrow] pitrou commented on a change in pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8309:
URL: https://github.com/apache/arrow/pull/8309#discussion_r498099713
##########
File path: cpp/src/arrow/ipc/json_simple.cc
##########
@@ -50,13 +51,35 @@ namespace json {
using ::arrow::internal::checked_cast;
using ::arrow::internal::checked_pointer_cast;
-static constexpr auto kParseFlags = rj::kParseFullPrecisionFlag | rj::kParseNanAndInfFlag;
+namespace {
-static Status JSONTypeError(const char* expected_type, rj::Type json_type) {
+constexpr auto kParseFlags = rj::kParseFullPrecisionFlag | rj::kParseNanAndInfFlag;
+
+Status JSONTypeError(const char* expected_type, rj::Type json_type) {
return Status::Invalid("Expected ", expected_type, " or null, got JSON type ",
json_type);
}
+template <typename Type>
+struct RegularBuilderTraits {
+ using BuilderType = typename TypeTraits<Type>::BuilderType;
+
+ static const std::shared_ptr<DataType>& value_type(
+ const std::shared_ptr<DataType>& type) {
+ return type;
+ }
+};
+
+template <typename Type>
+struct DictionaryBuilderTraits {
+ using BuilderType = DictionaryBuilder<Type>;
+
+ static const std::shared_ptr<DataType>& value_type(
+ const std::shared_ptr<DataType>& type) {
+ return checked_cast<const DictionaryType&>(*type).value_type();
+ }
+};
+
Review comment:
Ah, right, thank you.
[GitHub] [arrow] bkietz commented on a change in pull request #8309: ARROW-7372: [C++] Allow creating dictionary array from simple JSON
Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #8309:
URL: https://github.com/apache/arrow/pull/8309#discussion_r497762312
##########
File path: cpp/src/arrow/ipc/json_simple.cc
##########
@@ -50,13 +51,35 @@ namespace json {
using ::arrow::internal::checked_cast;
using ::arrow::internal::checked_pointer_cast;
-static constexpr auto kParseFlags = rj::kParseFullPrecisionFlag | rj::kParseNanAndInfFlag;
+namespace {
-static Status JSONTypeError(const char* expected_type, rj::Type json_type) {
+constexpr auto kParseFlags = rj::kParseFullPrecisionFlag | rj::kParseNanAndInfFlag;
+
+Status JSONTypeError(const char* expected_type, rj::Type json_type) {
return Status::Invalid("Expected ", expected_type, " or null, got JSON type ",
json_type);
}
+template <typename Type>
+struct RegularBuilderTraits {
+ using BuilderType = typename TypeTraits<Type>::BuilderType;
+
+ static const std::shared_ptr<DataType>& value_type(
+ const std::shared_ptr<DataType>& type) {
+ return type;
+ }
+};
+
+template <typename Type>
+struct DictionaryBuilderTraits {
+ using BuilderType = DictionaryBuilder<Type>;
+
+ static const std::shared_ptr<DataType>& value_type(
+ const std::shared_ptr<DataType>& type) {
+ return checked_cast<const DictionaryType&>(*type).value_type();
+ }
+};
+
Review comment:
I think the converters would be simpler if the `BuilderType` were passed as the template parameter directly, rather than passing a trait with which to look it up. `value_type` need not be a trait member at all; a free function would do:
```c++
inline const std::shared_ptr<DataType>& value_type(const std::shared_ptr<DataType>& type) {
  if (type->id() != Type::DICTIONARY) return type;
  return checked_cast<const DictionaryType&>(*type).value_type();
}
```
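To make the suggestion above concrete, here is a minimal standalone sketch of the free-function lookup. `DataType`, `DictionaryType`, `TypeId` and `ValueTypeOf` below are hypothetical stand-ins reduced to what the lookup needs, not the real Arrow classes, and the real code would use `checked_cast` rather than `static_cast`.

```c++
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for arrow::DataType and arrow::DictionaryType.
enum class TypeId { INT32, STRING, DICTIONARY };

class DataType {
 public:
  explicit DataType(TypeId id) : id_(id) {}
  virtual ~DataType() = default;
  TypeId id() const { return id_; }

 private:
  TypeId id_;
};

class DictionaryType : public DataType {
 public:
  explicit DictionaryType(std::shared_ptr<DataType> value_type)
      : DataType(TypeId::DICTIONARY), value_type_(std::move(value_type)) {}
  const std::shared_ptr<DataType>& value_type() const { return value_type_; }

 private:
  std::shared_ptr<DataType> value_type_;
};

// Free function replacing the per-trait value_type() members: returns the
// dictionary's value type for dictionary types and the type itself otherwise.
inline const std::shared_ptr<DataType>& ValueTypeOf(
    const std::shared_ptr<DataType>& type) {
  assert(type != nullptr);
  if (type->id() != TypeId::DICTIONARY) return type;
  // The id check above is what makes this downcast safe.
  return static_cast<const DictionaryType&>(*type).value_type();
}
```

With a single runtime check on the type id, the converter templates no longer need a trait just to find the value type.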
##########
File path: cpp/src/arrow/ipc/json_simple.cc
##########
@@ -412,12 +463,22 @@ class StringConverter final : public ConcreteConverter<StringConverter<TYPE>> {
// ------------------------------------------------------------------------
// Converter for fixed-size binary arrays
+template <template <typename T> class BuilderTraits = RegularBuilderTraits>
class FixedSizeBinaryConverter final
- : public ConcreteConverter<FixedSizeBinaryConverter> {
+ : public ConcreteConverter<FixedSizeBinaryConverter<BuilderTraits>> {
+ using BuilderType = typename BuilderTraits<FixedSizeBinaryType>::BuilderType;
+
public:
explicit FixedSizeBinaryConverter(const std::shared_ptr<DataType>& type) {
this->type_ = type;
- builder_ = std::make_shared<FixedSizeBinaryBuilder>(type, default_memory_pool());
+ }
+
+ Status Init() override {
+ std::unique_ptr<ArrayBuilder> builder;
+ RETURN_NOT_OK(MakeBuilder(default_memory_pool(), this->type_, &builder));
+ builder_ = checked_pointer_cast<BuilderType>(std::move(builder));
+ DCHECK(builder_);
+ return Status::OK();
Review comment:
This is repeated several times. Could it be simplified by adding a helper to `ConcreteConverter`?
```suggestion
return this->MakeBuilder(&builder_);
```
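One possible shape for such a helper, as a standalone sketch rather than the code that was merged: every type here (`Status`, `ArrayBuilder`, `FixedSizeBinaryBuilder`, `MakeErasedBuilder`, `MakeConcreteBuilder`) is a simplified, hypothetical stand-in, and the factory takes a byte width only to keep the example small.

```c++
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins; none of these are the real Arrow classes.
struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status Invalid(std::string msg) { return {false, std::move(msg)}; }
};

struct ArrayBuilder {
  virtual ~ArrayBuilder() = default;
};

struct FixedSizeBinaryBuilder : ArrayBuilder {
  explicit FixedSizeBinaryBuilder(int width) : byte_width(width) {}
  int byte_width;
};

// Stand-in for a factory that returns a type-erased builder.
inline Status MakeErasedBuilder(int byte_width,
                                std::unique_ptr<ArrayBuilder>* out) {
  if (byte_width <= 0) return Status::Invalid("byte width must be positive");
  *out = std::make_unique<FixedSizeBinaryBuilder>(byte_width);
  return Status::OK();
}

template <typename Derived>
class ConcreteConverter {
 protected:
  // Shared helper: create the type-erased builder, then downcast it to the
  // concrete builder type that the derived converter stores.
  template <typename BuilderType>
  Status MakeConcreteBuilder(int byte_width, std::shared_ptr<BuilderType>* out) {
    assert(out != nullptr);
    std::unique_ptr<ArrayBuilder> builder;
    Status st = MakeErasedBuilder(byte_width, &builder);
    if (!st.ok) return st;
    std::shared_ptr<ArrayBuilder> erased = std::move(builder);
    *out = std::dynamic_pointer_cast<BuilderType>(erased);
    if (*out == nullptr) return Status::Invalid("unexpected builder type");
    return Status::OK();
  }
};

class FixedSizeBinaryConverter
    : public ConcreteConverter<FixedSizeBinaryConverter> {
 public:
  explicit FixedSizeBinaryConverter(int byte_width) : byte_width_(byte_width) {}

  // Init() shrinks to one line once the boilerplate lives in the base class.
  Status Init() { return this->MakeConcreteBuilder(byte_width_, &builder_); }

  const std::shared_ptr<FixedSizeBinaryBuilder>& builder() const {
    return builder_;
  }

 private:
  int byte_width_;
  std::shared_ptr<FixedSizeBinaryBuilder> builder_;
};
```

The concrete builder type is deduced from the output pointer, so each converter only has to name it once, in its member declaration.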
##########
File path: cpp/src/arrow/array/builder_dict.h
##########
@@ -190,6 +190,12 @@ class DictionaryBuilderBase : public ArrayBuilder {
/// \brief The current number of entries in the dictionary
int64_t dictionary_length() const { return memo_table_->size(); }
+ /// \brief The value byte width (for FixedSizeBinaryType)
+ template <typename T1 = T>
+ enable_if_fixed_size_binary<T1, int32_t> byte_width() const {
+ return byte_width_;
+ }
+
Review comment:
This is probably fine. Nit: I'd prefer to get the `FixedSizeBinaryType` inside the converter and use its `byte_width()` rather than propagating the property to builders.
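A rough sketch of that alternative, assuming the converter only needs the width to validate incoming values: `DataType`, `FixedSizeBinaryType` and `WidthCheckingConverter` are hypothetical stand-ins, and the length check is an assumed use of the width, not taken from the diff.

```c++
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-ins for the Arrow type classes.
struct DataType {
  virtual ~DataType() = default;
};

class FixedSizeBinaryType : public DataType {
 public:
  explicit FixedSizeBinaryType(int32_t byte_width) : byte_width_(byte_width) {}
  int32_t byte_width() const { return byte_width_; }

 private:
  int32_t byte_width_;
};

// The converter reads the width from the value type once, so neither the
// plain builder nor the dictionary builder has to expose it.
class WidthCheckingConverter {
 public:
  explicit WidthCheckingConverter(const std::shared_ptr<DataType>& value_type)
      : byte_width_(ByteWidthOf(value_type)) {}

  int32_t byte_width() const { return byte_width_; }

  // Assumed use of the width: accept only values of exactly that length.
  bool AcceptsValue(const std::string& value) const {
    return static_cast<int64_t>(value.size()) == byte_width_;
  }

 private:
  static int32_t ByteWidthOf(const std::shared_ptr<DataType>& type) {
    const auto* fsb = dynamic_cast<const FixedSizeBinaryType*>(type.get());
    assert(fsb != nullptr && "expected a fixed-size binary type");
    return fsb->byte_width();
  }

  int32_t byte_width_;
};
```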