You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/09 16:26:53 UTC

[GitHub] [arrow] bkietz opened a new pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

bkietz opened a new pull request #7378:
URL: https://github.com/apache/arrow/pull/7378


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#discussion_r436932116



##########
File path: cpp/src/arrow/c/bridge.cc
##########
@@ -1320,14 +1324,16 @@ struct ArrayImporter {
 
   Status Visit(const UnionType& type) {
     auto mode = type.mode();
-    RETURN_NOT_OK(CheckNumBuffers(3));
+    if (mode == UnionMode::SPARSE) {
+      RETURN_NOT_OK(CheckNumBuffers(2));
+    } else {
+      RETURN_NOT_OK(CheckNumBuffers(3));
+    }

Review comment:
       That looks fine to me. The more nagging question is: is the IPC format changed?
   (given that the integration tests weren't changed in this PR, I would assume "no": is it true?)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#discussion_r439478099



##########
File path: cpp/src/arrow/type_fwd.h
##########
@@ -351,39 +358,21 @@ std::shared_ptr<DataType> ARROW_EXPORT time64(TimeUnit::type unit);
 std::shared_ptr<DataType> ARROW_EXPORT
 struct_(const std::vector<std::shared_ptr<Field>>& fields);
 
-/// \brief Create a UnionType instance
-std::shared_ptr<DataType> ARROW_EXPORT
-union_(const std::vector<std::shared_ptr<Field>>& child_fields,

Review comment:
       UnionMode still exists but I've already factored out usage of `union_`. If we need it later it'll be straightforward to add




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#issuecomment-643390508


   Sorry about the noise I think I've got it now. We might need to create an "arrow-deprecated-test" at some point where we verify that deprecated APIs still work as advertised


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
bkietz commented on pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#issuecomment-640878504


   @wesm no problem it didn't take much time. Sparse and Dense were basically distinct code paths already in vector_selection_internal. No need to rebase yours; rebasing this one across that file shouldn't be a problem


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#issuecomment-643413102


   thanks @bkietz!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#issuecomment-640849143






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#discussion_r439479004



##########
File path: cpp/src/arrow/type.h
##########
@@ -1015,25 +1018,12 @@ class ARROW_EXPORT Decimal128Type : public DecimalType {
 /// \brief Concrete type class for union data
 class ARROW_EXPORT UnionType : public NestedType {
  public:
-  static constexpr Type::type type_id = Type::UNION;
   static constexpr int8_t kMaxTypeCode = 127;
   static constexpr int kInvalidChildId = -1;
 
-  static constexpr const char* type_name() { return "union"; }
-
-  UnionType(const std::vector<std::shared_ptr<Field>>& fields,

Review comment:
       Ah, I see. At least keep the static factory function then?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#discussion_r438027429



##########
File path: cpp/src/arrow/array/array_nested.h
##########
@@ -396,70 +445,43 @@ class ARROW_EXPORT UnionArray : public Array {
   /// \param[in] children Vector of children Arrays containing the data for each type.
   /// \param[in] field_names Vector of strings containing the name of each field.
   /// \param[in] type_codes Vector of type codes.
-  static Result<std::shared_ptr<Array>> MakeSparse(
-      const Array& type_ids, const std::vector<std::shared_ptr<Array>>& children,
-      const std::vector<std::string>& field_names = {},
-      const std::vector<type_code_t>& type_codes = {});
+  static Result<std::shared_ptr<Array>> Make(const Array& type_ids,
+                                             const Array& value_offsets,
+                                             ArrayVector children,
+                                             std::vector<std::string> field_names = {},
+                                             std::vector<type_code_t> type_codes = {});
 
-  /// \brief Construct Sparse UnionArray from type_ids and children
+  /// \brief Construct Dense UnionArray from type_ids and children

Review comment:
       "DenseUnionArray"

##########
File path: cpp/src/arrow/array/array_nested.h
##########
@@ -337,57 +338,105 @@ class ARROW_EXPORT StructArray : public Array {
 /// Concrete Array class for union data
 class ARROW_EXPORT UnionArray : public Array {
  public:
-  using TypeClass = UnionType;
-
   using type_code_t = int8_t;
 
-  explicit UnionArray(const std::shared_ptr<ArrayData>& data);
+  /// Note that this buffer does not account for any slice offset
+  std::shared_ptr<Buffer> type_codes() const { return data_->buffers[1]; }
+
+  const type_code_t* raw_type_codes() const { return raw_type_codes_ + data_->offset; }
+
+  /// The physical child id containing value at index.
+  int child_id(int64_t i) const {
+    return union_type_->child_ids()[raw_type_codes_[i + data_->offset]];
+  }
+
+  const UnionType* union_type() const { return union_type_; }
+
+  UnionMode::type mode() const { return union_type_->mode(); }
+
+  // Return the given field as an individual array.
+  // For sparse unions, the returned array has its offset, length and null
+  // count adjusted.
+  ARROW_DEPRECATED("Deprecated in 1.0.0. Use field(pos)")
+  std::shared_ptr<Array> child(int pos) const;
+
+  /// \brief Return the given field as an individual array.
+  ///
+  /// For sparse unions, the returned array has its offset, length and null
+  /// count adjusted.
+  std::shared_ptr<Array> field(int pos) const;
+
+ protected:
+  void SetData(std::shared_ptr<ArrayData> data);
+
+  const type_code_t* raw_type_codes_;
+  const UnionType* union_type_;
+
+  // For caching boxed child data
+  mutable std::vector<std::shared_ptr<Array>> boxed_fields_;
+};
+
+/// Concrete Array class for union data

Review comment:
       Need to update docstring.

##########
File path: cpp/src/arrow/array/array_nested.h
##########
@@ -337,57 +338,105 @@ class ARROW_EXPORT StructArray : public Array {
 /// Concrete Array class for union data
 class ARROW_EXPORT UnionArray : public Array {
  public:
-  using TypeClass = UnionType;
-
   using type_code_t = int8_t;
 
-  explicit UnionArray(const std::shared_ptr<ArrayData>& data);
+  /// Note that this buffer does not account for any slice offset
+  std::shared_ptr<Buffer> type_codes() const { return data_->buffers[1]; }
+
+  const type_code_t* raw_type_codes() const { return raw_type_codes_ + data_->offset; }
+
+  /// The physical child id containing value at index.
+  int child_id(int64_t i) const {
+    return union_type_->child_ids()[raw_type_codes_[i + data_->offset]];
+  }
+
+  const UnionType* union_type() const { return union_type_; }
+
+  UnionMode::type mode() const { return union_type_->mode(); }
+
+  // Return the given field as an individual array.
+  // For sparse unions, the returned array has its offset, length and null
+  // count adjusted.
+  ARROW_DEPRECATED("Deprecated in 1.0.0. Use field(pos)")
+  std::shared_ptr<Array> child(int pos) const;
+
+  /// \brief Return the given field as an individual array.
+  ///
+  /// For sparse unions, the returned array has its offset, length and null
+  /// count adjusted.
+  std::shared_ptr<Array> field(int pos) const;
+
+ protected:
+  void SetData(std::shared_ptr<ArrayData> data);
+
+  const type_code_t* raw_type_codes_;
+  const UnionType* union_type_;
+
+  // For caching boxed child data
+  mutable std::vector<std::shared_ptr<Array>> boxed_fields_;
+};
+
+/// Concrete Array class for union data
+class ARROW_EXPORT SparseUnionArray : public UnionArray {
+ public:
+  using TypeClass = SparseUnionType;
+
+  explicit SparseUnionArray(std::shared_ptr<ArrayData> data);
 
-  UnionArray(const std::shared_ptr<DataType>& type, int64_t length,
-             const std::vector<std::shared_ptr<Array>>& children,
-             const std::shared_ptr<Buffer>& type_ids,
-             const std::shared_ptr<Buffer>& value_offsets = NULLPTR,
-             const std::shared_ptr<Buffer>& null_bitmap = NULLPTR,
-             int64_t null_count = kUnknownNullCount, int64_t offset = 0);
+  SparseUnionArray(std::shared_ptr<DataType> type, int64_t length, ArrayVector children,
+                   std::shared_ptr<Buffer> type_ids,
+                   std::shared_ptr<Buffer> null_bitmap = NULLPTR,
+                   int64_t null_count = kUnknownNullCount, int64_t offset = 0);
 
-  /// \brief Construct Dense UnionArray from types_ids, value_offsets and children
+  /// \brief Construct SparseUnionArray from type_ids and children
   ///
   /// This function does the bare minimum of validation of the offsets and
-  /// input types. The value_offsets are assumed to be well-formed.
+  /// input types.
   ///
   /// \param[in] type_ids An array of logical type ids for the union type
-  /// \param[in] value_offsets An array of signed int32 values indicating the
-  /// relative offset into the respective child array for the type in a given slot.
-  /// The respective offsets for each child value array must be in order / increasing.
   /// \param[in] children Vector of children Arrays containing the data for each type.
   /// \param[in] field_names Vector of strings containing the name of each field.
   /// \param[in] type_codes Vector of type codes.
-  static Result<std::shared_ptr<Array>> MakeDense(
-      const Array& type_ids, const Array& value_offsets,
-      const std::vector<std::shared_ptr<Array>>& children,
-      const std::vector<std::string>& field_names = {},
-      const std::vector<type_code_t>& type_codes = {});
+  static Result<std::shared_ptr<Array>> Make(const Array& type_ids, ArrayVector children,
+                                             std::vector<std::string> field_names = {},
+                                             std::vector<type_code_t> type_codes = {});
 
-  /// \brief Construct Dense UnionArray from types_ids, value_offsets and children
+  /// \brief Construct Sparse UnionArray from type_ids and children
   ///
   /// This function does the bare minimum of validation of the offsets and
-  /// input types. The value_offsets are assumed to be well-formed.
+  /// input types.
   ///
   /// \param[in] type_ids An array of logical type ids for the union type
-  /// \param[in] value_offsets An array of signed int32 values indicating the
-  /// relative offset into the respective child array for the type in a given slot.
-  /// The respective offsets for each child value array must be in order / increasing.
   /// \param[in] children Vector of children Arrays containing the data for each type.
   /// \param[in] type_codes Vector of type codes.
-  static Result<std::shared_ptr<Array>> MakeDense(
-      const Array& type_ids, const Array& value_offsets,
-      const std::vector<std::shared_ptr<Array>>& children,
-      const std::vector<type_code_t>& type_codes) {
-    return MakeDense(type_ids, value_offsets, children, std::vector<std::string>{},
-                     type_codes);
+  static Result<std::shared_ptr<Array>> Make(const Array& type_ids, ArrayVector children,
+                                             std::vector<type_code_t> type_codes) {
+    return Make(std::move(type_ids), std::move(children), std::vector<std::string>{},
+                std::move(type_codes));
   }
 
-  /// \brief Construct Sparse UnionArray from type_ids and children
+  const SparseUnionType* union_type() const {
+    return internal::checked_cast<const SparseUnionType*>(union_type_);
+  }
+
+ protected:
+  void SetData(std::shared_ptr<ArrayData> data);
+};
+
+/// Concrete Array class for union data

Review comment:
       Need to update docstring.

##########
File path: cpp/src/arrow/array/array_primitive.h
##########
@@ -108,11 +108,11 @@ class ARROW_EXPORT DayTimeIntervalArray : public PrimitiveArray {
   // For compatibility with Take kernel.
   TypeClass::DayMilliseconds GetView(int64_t i) const { return GetValue(i); }
 
-  int32_t byte_width() const { return sizeof(TypeClass::DayMilliseconds); }
-
   const uint8_t* raw_values() const { return raw_values_ + data_->offset * byte_width(); }
 
  protected:
+  static constexpr int32_t byte_width() { return sizeof(TypeClass::DayMilliseconds); }

Review comment:
       Not sure why you made this protected? The byte width may be useful.

##########
File path: cpp/src/arrow/type_fwd.h
##########
@@ -351,39 +358,21 @@ std::shared_ptr<DataType> ARROW_EXPORT time64(TimeUnit::type unit);
 std::shared_ptr<DataType> ARROW_EXPORT
 struct_(const std::vector<std::shared_ptr<Field>>& fields);
 
-/// \brief Create a UnionType instance
-std::shared_ptr<DataType> ARROW_EXPORT
-union_(const std::vector<std::shared_ptr<Field>>& child_fields,

Review comment:
       I think we should keep this factory function, for compatibility and for convenience. `UnionMode` still exists, right?

##########
File path: cpp/src/arrow/type.h
##########
@@ -1015,25 +1018,12 @@ class ARROW_EXPORT Decimal128Type : public DecimalType {
 /// \brief Concrete type class for union data
 class ARROW_EXPORT UnionType : public NestedType {
  public:
-  static constexpr Type::type type_id = Type::UNION;
   static constexpr int8_t kMaxTypeCode = 127;
   static constexpr int kInvalidChildId = -1;
 
-  static constexpr const char* type_name() { return "union"; }
-
-  UnionType(const std::vector<std::shared_ptr<Field>>& fields,

Review comment:
       I'm not sure it's a good idea to remove these constructors, while external code may rely on them.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#issuecomment-643412670


   Appveyor was passing two commits ago: https://ci.appveyor.com/project/BenjaminKietzman/arrow/builds/33489614. The Ursabot CI failure looks transient. I'll go ahead and merge this so I can rebase what I'm working on 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#discussion_r439480534



##########
File path: cpp/src/arrow/array/array_primitive.h
##########
@@ -108,11 +108,11 @@ class ARROW_EXPORT DayTimeIntervalArray : public PrimitiveArray {
   // For compatibility with Take kernel.
   TypeClass::DayMilliseconds GetView(int64_t i) const { return GetValue(i); }
 
-  int32_t byte_width() const { return sizeof(TypeClass::DayMilliseconds); }
-
   const uint8_t* raw_values() const { return raw_values_ + data_->offset * byte_width(); }
 
  protected:
+  static constexpr int32_t byte_width() { return sizeof(TypeClass::DayMilliseconds); }

Review comment:
       Well, it was public so could be used by anybody really. I'm ok with not making the public API too plethoric but this seems a rather useful function (and trivial to maintain).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm edited a comment on pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#issuecomment-640849143






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#issuecomment-643004964


   @bkietz can you rebase and address @pitrou's comments tomorrow? This is going to collide with ARROW-9075 so would prefer that this go in first and then I can rebase my patch on that


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#discussion_r439479405



##########
File path: cpp/src/arrow/array/array_primitive.h
##########
@@ -108,11 +108,11 @@ class ARROW_EXPORT DayTimeIntervalArray : public PrimitiveArray {
   // For compatibility with Take kernel.
   TypeClass::DayMilliseconds GetView(int64_t i) const { return GetValue(i); }
 
-  int32_t byte_width() const { return sizeof(TypeClass::DayMilliseconds); }
-
   const uint8_t* raw_values() const { return raw_values_ + data_->offset * byte_width(); }
 
  protected:
+  static constexpr int32_t byte_width() { return sizeof(TypeClass::DayMilliseconds); }

Review comment:
       It's not used anywhere except in this class. If needed it can easily be reintroduced but it seemed better to minimize the public API




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#issuecomment-641498953


   My patch #7382 should not conflict with this so no need to worry about rebasing


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#discussion_r439539505



##########
File path: cpp/src/arrow/type_fwd.h
##########
@@ -351,39 +358,21 @@ std::shared_ptr<DataType> ARROW_EXPORT time64(TimeUnit::type unit);
 std::shared_ptr<DataType> ARROW_EXPORT
 struct_(const std::vector<std::shared_ptr<Field>>& fields);
 
-/// \brief Create a UnionType instance
-std::shared_ptr<DataType> ARROW_EXPORT
-union_(const std::vector<std::shared_ptr<Field>>& child_fields,

Review comment:
       I'm quickly restoring backward compatibility functions




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#discussion_r439539160



##########
File path: cpp/src/arrow/array/array_primitive.h
##########
@@ -108,11 +108,11 @@ class ARROW_EXPORT DayTimeIntervalArray : public PrimitiveArray {
   // For compatibility with Take kernel.
   TypeClass::DayMilliseconds GetView(int64_t i) const { return GetValue(i); }
 
-  int32_t byte_width() const { return sizeof(TypeClass::DayMilliseconds); }
-
   const uint8_t* raw_values() const { return raw_values_ + data_->offset * byte_width(); }
 
  protected:
+  static constexpr int32_t byte_width() { return sizeof(TypeClass::DayMilliseconds); }

Review comment:
       I also don't think this change belongs in this PR




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#issuecomment-640813632


   https://issues.apache.org/jira/browse/ARROW-8866


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm closed pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
wesm closed pull request #7378:
URL: https://github.com/apache/arrow/pull/7378


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#discussion_r436930007



##########
File path: cpp/src/arrow/c/bridge.cc
##########
@@ -1320,14 +1324,16 @@ struct ArrayImporter {
 
   Status Visit(const UnionType& type) {
     auto mode = type.mode();
-    RETURN_NOT_OK(CheckNumBuffers(3));
+    if (mode == UnionMode::SPARSE) {
+      RETURN_NOT_OK(CheckNumBuffers(2));
+    } else {
+      RETURN_NOT_OK(CheckNumBuffers(3));
+    }

Review comment:
       @pitrou

##########
File path: cpp/src/arrow/c/bridge.cc
##########
@@ -1320,14 +1324,16 @@ struct ArrayImporter {
 
   Status Visit(const UnionType& type) {
     auto mode = type.mode();
-    RETURN_NOT_OK(CheckNumBuffers(3));
+    if (mode == UnionMode::SPARSE) {
+      RETURN_NOT_OK(CheckNumBuffers(2));
+    } else {
+      RETURN_NOT_OK(CheckNumBuffers(3));
+    }

Review comment:
       I did not change the IPC format, which has always stored just two buffers in the sparse case. See https://github.com/apache/arrow/pull/7378/files#diff-78fdd845f659c6d03d17716eb6e27cebR316




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#discussion_r439480534



##########
File path: cpp/src/arrow/array/array_primitive.h
##########
@@ -108,11 +108,11 @@ class ARROW_EXPORT DayTimeIntervalArray : public PrimitiveArray {
   // For compatibility with Take kernel.
   TypeClass::DayMilliseconds GetView(int64_t i) const { return GetValue(i); }
 
-  int32_t byte_width() const { return sizeof(TypeClass::DayMilliseconds); }
-
   const uint8_t* raw_values() const { return raw_values_ + data_->offset * byte_width(); }
 
  protected:
+  static constexpr int32_t byte_width() { return sizeof(TypeClass::DayMilliseconds); }

Review comment:
       Well, it was public so could be used by anybody really.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
bkietz commented on pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#issuecomment-643386870


   @wesm thanks, I've been trying to get those to pass locally.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#discussion_r439477647



##########
File path: cpp/src/arrow/type.h
##########
@@ -1015,25 +1018,12 @@ class ARROW_EXPORT Decimal128Type : public DecimalType {
 /// \brief Concrete type class for union data
 class ARROW_EXPORT UnionType : public NestedType {
  public:
-  static constexpr Type::type type_id = Type::UNION;
   static constexpr int8_t kMaxTypeCode = 127;
   static constexpr int kInvalidChildId = -1;
 
-  static constexpr const char* type_name() { return "union"; }
-
-  UnionType(const std::vector<std::shared_ptr<Field>>& fields,

Review comment:
       I'm not sure what you mean; UnionType is an abstract base class now and should never be constructed except as a base object of SparseUnionType or DenseUnionType. If we want to maintain compatibility of code which constructs UnionType directly then we'll need to have a single concrete `UnionType` which corresponds to both `Type::SPARSE_UNION` and `Type::DENSE_UNION`. That's possible but very much not our pattern with `DataType` subclasses.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7378: ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7378:
URL: https://github.com/apache/arrow/pull/7378#discussion_r439478696



##########
File path: cpp/src/arrow/type_fwd.h
##########
@@ -351,39 +358,21 @@ std::shared_ptr<DataType> ARROW_EXPORT time64(TimeUnit::type unit);
 std::shared_ptr<DataType> ARROW_EXPORT
 struct_(const std::vector<std::shared_ptr<Field>>& fields);
 
-/// \brief Create a UnionType instance
-std::shared_ptr<DataType> ARROW_EXPORT
-union_(const std::vector<std::shared_ptr<Field>>& child_fields,

Review comment:
       This is more of a "don't break third-party code" concern.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org