You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/20 17:11:24 UTC

[GitHub] [arrow] lidavidm opened a new pull request #12000: ARROW-14705: [C++][Python] Handle more types in UnifySchemas

lidavidm opened a new pull request #12000:
URL: https://github.com/apache/arrow/pull/12000


   This expands UnifySchemas to handle more types, e.g. int32 + float64 = float64, by adding flags for various types.
   
   The flags here may be overly fine-grained.
   
   Not implemented here:
   - Confirming that all these various casts exist
   - Adding the bindings to the Python datasets module (would need PR #8912 first)
   - Binding individual flags in Python instead of all-or-nothing
   - Adding the bindings to the R datasets bindings


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12000: ARROW-14705: [C++][Python] Handle more types in UnifySchemas

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #12000:
URL: https://github.com/apache/arrow/pull/12000#discussion_r774603800



##########
File path: cpp/src/arrow/type.cc
##########
@@ -275,6 +256,429 @@ std::shared_ptr<Field> Field::WithNullable(const bool nullable) const {
   return std::make_shared<Field>(name_, type_, nullable, metadata_);
 }
 
+Field::MergeOptions Field::MergeOptions::Permissive() {
+  MergeOptions options = Defaults();
+  options.promote_nullability = true;
+  options.promote_numeric_width = true;
+  options.promote_integer_float = true;
+  options.promote_integer_decimal = true;
+  options.promote_decimal_float = true;
+  options.promote_date = true;
+  options.promote_time = true;
+  options.promote_duration = true;
+  options.promote_timestamp = true;
+  options.promote_nested = true;
+  options.promote_dictionary = true;
+  options.promote_integer_sign = true;
+  options.promote_large = true;
+  options.promote_binary = true;
+  return options;

Review comment:
       For reading clarity, can we list here the ones that are still left to False in a comment? (eg `promote_dictionary_ordered` is still False, which I think is good)
   
   `promote_decimal` should probably be added as true?

##########
File path: cpp/src/arrow/type.h
##########
@@ -303,14 +303,73 @@ class ARROW_EXPORT Field : public detail::Fingerprintable {
   /// \brief Options that control the behavior of `MergeWith`.
   /// Options are to be added to allow type conversions, including integer
   /// widening, promotion from integer to float, or conversion to or from boolean.
-  struct MergeOptions {
+  struct ARROW_EXPORT MergeOptions : public util::ToStringOstreamable<MergeOptions> {
     /// If true, a Field of NullType can be unified with a Field of another type.
     /// The unified field will be of the other type and become nullable.
     /// Nullability will be promoted to the looser option (nullable if one is not
     /// nullable).
     bool promote_nullability = true;
 
+    /// Allow a decimal to be unified with another decimal of the same
+    /// width, adjusting scale and precision as appropriate. May fail
+    /// if the adjustment is not possible.
+    bool promote_decimal = false;
+
+    /// Allow a decimal to be promoted to a float. The float type will
+    /// not itself be promoted (e.g. Decimal128 + Float32 = Float32).
+    bool promote_decimal_float = false;
+
+    /// Allow an integer of a given bit width to be promoted to a
+    /// float of an equal or greater bit width.

Review comment:
       Just to be clear, that would mean that int64 cannot be unified with float32? (since that bitwidth is not "equal or greater") Or that it would be unified to float64? 
   
   For reference, numpy promotes that specific case to float64.

##########
File path: cpp/src/arrow/type.h
##########
@@ -303,14 +303,73 @@ class ARROW_EXPORT Field : public detail::Fingerprintable {
   /// \brief Options that control the behavior of `MergeWith`.
   /// Options are to be added to allow type conversions, including integer
   /// widening, promotion from integer to float, or conversion to or from boolean.
-  struct MergeOptions {
+  struct ARROW_EXPORT MergeOptions : public util::ToStringOstreamable<MergeOptions> {
     /// If true, a Field of NullType can be unified with a Field of another type.
     /// The unified field will be of the other type and become nullable.
     /// Nullability will be promoted to the looser option (nullable if one is not
     /// nullable).
     bool promote_nullability = true;
 
+    /// Allow a decimal to be unified with another decimal of the same
+    /// width, adjusting scale and precision as appropriate. May fail
+    /// if the adjustment is not possible.
+    bool promote_decimal = false;
+
+    /// Allow a decimal to be promoted to a float. The float type will
+    /// not itself be promoted (e.g. Decimal128 + Float32 = Float32).
+    bool promote_decimal_float = false;
+
+    /// Allow an integer of a given bit width to be promoted to a
+    /// float of an equal or greater bit width.
+    bool promote_integer_float = false;
+
+    /// Allow an unsigned integer of a given bit width to be promoted
+    /// to a signed integer of the same bit width.

Review comment:
       .. or greater bit width?
   
   Eg in numpy uint32 + int64 -> int64




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #12000: ARROW-14705: [C++][Python] Handle more types in UnifySchemas

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #12000:
URL: https://github.com/apache/arrow/pull/12000#issuecomment-1002168092


   Corrected the doc comments.
   
   For concat_tables, this requires us to have access to Cast() inside table.cc, so we need ARROW-8891 or some #ifdef'ing to implement this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12000: ARROW-14705: [C++][Python] Handle more types in UnifySchemas

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12000:
URL: https://github.com/apache/arrow/pull/12000#discussion_r775957592



##########
File path: cpp/src/arrow/type.h
##########
@@ -303,14 +303,73 @@ class ARROW_EXPORT Field : public detail::Fingerprintable {
   /// \brief Options that control the behavior of `MergeWith`.
   /// Options are to be added to allow type conversions, including integer
   /// widening, promotion from integer to float, or conversion to or from boolean.
-  struct MergeOptions {
+  struct ARROW_EXPORT MergeOptions : public util::ToStringOstreamable<MergeOptions> {
     /// If true, a Field of NullType can be unified with a Field of another type.
     /// The unified field will be of the other type and become nullable.
     /// Nullability will be promoted to the looser option (nullable if one is not
     /// nullable).
     bool promote_nullability = true;
 
+    /// Allow a decimal to be unified with another decimal of the same
+    /// width, adjusting scale and precision as appropriate. May fail
+    /// if the adjustment is not possible.
+    bool promote_decimal = false;
+
+    /// Allow a decimal to be promoted to a float. The float type will
+    /// not itself be promoted (e.g. Decimal128 + Float32 = Float32).
+    bool promote_decimal_float = false;
+
+    /// Allow an integer of a given bit width to be promoted to a
+    /// float of an equal or greater bit width.

Review comment:
       Oh wait, I'm wrong (awkward). Yes, int64 + float32 = float64. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12000: ARROW-14705: [C++][Python] Handle more types in UnifySchemas

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12000:
URL: https://github.com/apache/arrow/pull/12000#discussion_r774614012



##########
File path: cpp/src/arrow/type.cc
##########
@@ -275,6 +256,429 @@ std::shared_ptr<Field> Field::WithNullable(const bool nullable) const {
   return std::make_shared<Field>(name_, type_, nullable, metadata_);
 }
 
+Field::MergeOptions Field::MergeOptions::Permissive() {
+  MergeOptions options = Defaults();
+  options.promote_nullability = true;
+  options.promote_numeric_width = true;
+  options.promote_integer_float = true;
+  options.promote_integer_decimal = true;
+  options.promote_decimal_float = true;
+  options.promote_date = true;
+  options.promote_time = true;
+  options.promote_duration = true;
+  options.promote_timestamp = true;
+  options.promote_nested = true;
+  options.promote_dictionary = true;
+  options.promote_integer_sign = true;
+  options.promote_large = true;
+  options.promote_binary = true;
+  return options;

Review comment:
       Whoops, I didn't update this after adding some of the additional options. I'll document what's not enabled as you suggest, thanks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12000: ARROW-14705: [C++][Python] Handle more types in UnifySchemas

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12000:
URL: https://github.com/apache/arrow/pull/12000#discussion_r774613715



##########
File path: cpp/src/arrow/type.h
##########
@@ -303,14 +303,73 @@ class ARROW_EXPORT Field : public detail::Fingerprintable {
   /// \brief Options that control the behavior of `MergeWith`.
   /// Options are to be added to allow type conversions, including integer
   /// widening, promotion from integer to float, or conversion to or from boolean.
-  struct MergeOptions {
+  struct ARROW_EXPORT MergeOptions : public util::ToStringOstreamable<MergeOptions> {
     /// If true, a Field of NullType can be unified with a Field of another type.
     /// The unified field will be of the other type and become nullable.
     /// Nullability will be promoted to the looser option (nullable if one is not
     /// nullable).
     bool promote_nullability = true;
 
+    /// Allow a decimal to be unified with another decimal of the same
+    /// width, adjusting scale and precision as appropriate. May fail
+    /// if the adjustment is not possible.
+    bool promote_decimal = false;
+
+    /// Allow a decimal to be promoted to a float. The float type will
+    /// not itself be promoted (e.g. Decimal128 + Float32 = Float32).
+    bool promote_decimal_float = false;
+
+    /// Allow an integer of a given bit width to be promoted to a
+    /// float of an equal or greater bit width.

Review comment:
       As implemented, you need also `promote_numeric_width` to unify int64 and float32 (to float64).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12000: ARROW-14705: [C++][Python] Handle more types in UnifySchemas

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12000:
URL: https://github.com/apache/arrow/pull/12000#issuecomment-998117030


   https://issues.apache.org/jira/browse/ARROW-14705


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #12000: ARROW-14705: [C++][Python] Handle more types in UnifySchemas

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #12000:
URL: https://github.com/apache/arrow/pull/12000#issuecomment-1008852544


   I've used some #ifdefs so that ConcatenateTables can cast columns as necessary now. I still have some TODOs listed up top before I can undraft this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org