You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "js8544 (via GitHub)" <gi...@apache.org> on 2023/06/10 08:06:23 UTC

[GitHub] [arrow] js8544 commented on a diff in pull request #36020: GH-32190: [C++][Compute] Implement cumulative prod, max and min functions

js8544 commented on code in PR #36020:
URL: https://github.com/apache/arrow/pull/36020#discussion_r1225201776


##########
cpp/src/arrow/compute/api_vector.h:
##########
@@ -210,21 +210,29 @@ class ARROW_EXPORT PartitionNthOptions : public FunctionOptions {
   NullPlacement null_placement;
 };
 
-/// \brief Options for cumulative sum function
-class ARROW_EXPORT CumulativeSumOptions : public FunctionOptions {
+/// \brief Options for cumulative functions
+/// \note Also aliased as CumulativeSumOptions for backward compatibility
+class ARROW_EXPORT CumulativeOptions : public FunctionOptions {
  public:
-  explicit CumulativeSumOptions(double start = 0, bool skip_nulls = false);
-  explicit CumulativeSumOptions(std::shared_ptr<Scalar> start, bool skip_nulls = false);
-  static constexpr char const kTypeName[] = "CumulativeSumOptions";
-  static CumulativeSumOptions Defaults() { return CumulativeSumOptions(); }
-
-  /// Optional starting value for cumulative operation computation
-  std::shared_ptr<Scalar> start;
+  explicit CumulativeOptions(bool skip_nulls = false);
+  explicit CumulativeOptions(double start, bool skip_nulls = false);
+  explicit CumulativeOptions(std::shared_ptr<Scalar> start, bool skip_nulls = false);
+  static constexpr char const kTypeName[] = "CumulativeOptions";
+  static CumulativeOptions Defaults() { return CumulativeOptions(); }
+
+  /// Optional starting value for cumulative operation computation, default depends on the
+  /// operation and input type.
+  /// - sum: 0
+  /// - prod: 1
+  /// - min: maximum of the input type
+  /// - max: minimum of the input type
+  std::optional<std::shared_ptr<Scalar>> start;

Review Comment:
   The current implementation defaults `start` to a DoubleScalar with value zero. This is undesirable because:
   1. An unnecessary `Cast` needs to happen if input type is not double.
   2. For min and max we can't determine the default start value before input type is known.
   So I changed this to an optional. If it is not set, the kernel will determine the start value at initialization. It's implemented with templates so there is zero cost.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org