You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/06 18:17:38 UTC

[GitHub] [arrow] lidavidm opened a new pull request #10895: ARROW-13574: [C++] Add count_all and hash_count_all kernels

lidavidm opened a new pull request #10895:
URL: https://github.com/apache/arrow/pull/10895


   Pandas calls this 'size'. I opted not to extend ScalarAggregateOptions as the option wouldn't apply to any other kernel.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10895: ARROW-13574: [C++] Add count_all and hash_count_all kernels

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10895:
URL: https://github.com/apache/arrow/pull/10895#issuecomment-894435744


   https://issues.apache.org/jira/browse/ARROW-13574


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10895: ARROW-13574: [C++] Add count_all and hash_count_all kernels

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10895:
URL: https://github.com/apache/arrow/pull/10895#issuecomment-896158213


   Rebased, fixed conflicts, added CountOptions, added GLib/Python/R bindings for CountOptions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #10895: ARROW-13574: [C++] Add 'count all' option to count kernels

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #10895:
URL: https://github.com/apache/arrow/pull/10895


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #10895: ARROW-13574: [C++] Add count_all and hash_count_all kernels

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10895:
URL: https://github.com/apache/arrow/pull/10895#issuecomment-895376223


   Hmm... It's weird that `Count` takes `ScalarAggregateOptions`, yet 1) completely ignores `min_count` 2) has an unexpected interpretation of `skip_nulls`.
   
   Can we instead make `Count` take a `CountOptions` that would have a enum describing whether one wants to 1) count non-nulls 2) count nulls 3) count everything?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10895: ARROW-13574: [C++] Add 'count all' option to count kernels

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10895:
URL: https://github.com/apache/arrow/pull/10895#discussion_r686223277



##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -82,23 +82,31 @@ struct CountImpl : public ScalarAggregator {
 
   Status Finalize(KernelContext* ctx, Datum* out) override {
     const auto& state = checked_cast<const CountImpl&>(*ctx->state());
-    if (state.options.skip_nulls) {
-      *out = Datum(state.non_nulls);
-    } else {
-      *out = Datum(state.nulls);
+    switch (state.options.mode) {
+      case CountOptions::NON_NULL:
+        *out = Datum(state.non_nulls);
+        break;
+      case CountOptions::NULLS:
+        *out = Datum(state.nulls);
+        break;
+      case CountOptions::ALL:
+        *out = Datum(state.non_nulls + state.nulls);

Review comment:
       In this case, it's probably not necessary to popcount the null bitmap?

##########
File path: cpp/src/arrow/compute/api_aggregate.h
##########
@@ -53,6 +53,26 @@ class ARROW_EXPORT ScalarAggregateOptions : public FunctionOptions {
   uint32_t min_count;
 };
 
+/// \brief Control count aggregate kernel behavior.
+///
+/// By default, only non-null values are counted.
+class ARROW_EXPORT CountOptions : public FunctionOptions {
+ public:
+  enum CountMode {
+    /// Count only non-null values.
+    NON_NULL = 0,
+    /// Count only null values.
+    NULLS,

Review comment:
       Nit, but the plural / singular is a bit inconsistent. Perhaps ONLY_NULL vs ONLY_VALID (or NULLS vs NON_NULLS, or NULL_VALUES vs VALID_VALUES...)?

##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -562,7 +570,7 @@ const FunctionDoc count_doc{"Count the number of null / non-null values",
                             ("By default, only non-null values are counted.\n"
                              "This can be changed through ScalarAggregateOptions."),

Review comment:
       "CountOptions"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org