You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/06/20 12:07:06 UTC
[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6729: Update documentation for creating User Defined Aggregates (AggregateUDF)

alamb commented on code in PR #6729:
URL: https://github.com/apache/arrow-datafusion/pull/6729#discussion_r1235168106


##########
datafusion/core/src/lib.rs:
##########
@@ -17,9 +17,9 @@
 #![warn(missing_docs, clippy::needless_borrow)]
 
 //! [DataFusion] is an extensible query engine written in Rust that
-//! uses [Apache Arrow] as its in-memory format. DataFusion's [use
-//! cases] include building very fast database and analytic systems,
-//! customized to particular workloads.
+//! uses [Apache Arrow] as its in-memory format. DataFusion's many [use

Review Comment:
   this was a drive by cleanup as I pretended to be a new user navigating to the AggregateUDF page



##########
datafusion/expr/src/function.rs:
##########
@@ -42,7 +42,7 @@ pub type ReturnTypeFunction =
 
 /// Factory that returns an accumulator for the given aggregate, given
 /// its return datatype.
-pub type AccumulatorFunctionImplementation =
+pub type AccumulatorFactoryFunction =

Review Comment:
   This was just misleading, so I changed the name



##########
datafusion/expr/src/accumulator.rs:
##########
@@ -21,49 +21,161 @@ use arrow::array::ArrayRef;
 use datafusion_common::{DataFusionError, Result, ScalarValue};
 use std::fmt::Debug;
 
-/// Accumulates an aggregate's state.
+/// Describes an aggregate functions's state.
 ///
-/// `Accumulator`s are stateful objects that lives throughout the
+/// `Accumulator`s are stateful objects that live throughout the
 /// evaluation of multiple rows and aggregate multiple values together
 /// into a final output aggregate.
 ///
 /// An accumulator knows how to:
-/// * update its state from inputs via `update_batch`
-/// * (optionally) retract an update to its state from given inputs via `retract_batch`
-/// * convert its internal state to a vector of aggregate values
-/// * update its state from multiple accumulators' states via `merge_batch`
-/// * compute the final value from its internal state via `evaluate`
+/// * update its state from inputs via [`update_batch`]
+///
+/// * compute the final value from its internal state via [`evaluate`]
+///
+/// * retract an update to its state from given inputs via
+/// [`retract_batch`] (when used as a window aggregate [window
+/// function])
+///
+/// * convert its internal state to a vector of aggregate values via
+/// [`state`] and combine the state from multiple accumulators'
+/// via [`merge_batch`], as part of efficient multi-phase grouping.
+///
+/// [`update_batch`]: Self::update_batch
+/// [`retract_batch`]: Self::retract_batch
+/// [`state`]: Self::state
+/// [`evaluate`]: Self::evaluate
+/// [`merge_batch`]: Self::merge_batch
+/// [window function]: https://en.wikipedia.org/wiki/Window_function_(SQL)
 pub trait Accumulator: Send + Sync + Debug {
-    /// Returns the partial intermediate state of the accumulator. This
-    /// partial state is serialied as `Arrays` and then combined with
-    /// other partial states from different instances of this
-    /// accumulator (that ran on different partitions, for
-    /// example).
+    /// Updates the accumulator's state from its input.

Review Comment:
   The trait is the same -- I did reorderd the methods to be better grouped together by use, but the actual methods are the same



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org