Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/14 18:55:54 UTC

[GitHub] [spark] amaliujia commented on a diff in pull request #39057: [SPARK-41513][SQL] Implement an accumulator to collect per mapper row count metrics

amaliujia commented on code in PR #39057:
URL: https://github.com/apache/spark/pull/39057#discussion_r1048850181


##########
core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala:
##########
@@ -513,3 +513,81 @@ class CollectionAccumulator[T] extends AccumulatorV2[T, java.util.List[T]] {
     getOrCreate.addAll(newValue)
   }
 }
+
+
+/**
+ * An [[AccumulatorV2 counter]] for collecting a list of (mapper id, row count).
+ *
+ * @since 3.4.0
+ */
+class MapperRowCounter extends AccumulatorV2[jl.Long, java.util.List[java.util.List[jl.Long]]] {
+
+  private var _agg: java.util.List[java.util.List[jl.Long]] = _

Review Comment:
   Which of the following do you have in mind?
   1. A flat list of integers in which every two consecutive elements form a (partition id, row count) pair?
   2. A list of integers in which the index is the mapper/partition id and the value is the row count?
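
To make the two layouts in the question above concrete, here is a minimal sketch. It assumes partition ids are small non-negative integers; `RowCountLayouts`, `flatPairs`, and `indexed` are hypothetical names used only for illustration and are not part of the PR.

```scala
import java.{lang => jl}
import java.util.{ArrayList => JArrayList, List => JList}

// Hypothetical helpers (not from the PR): they only illustrate the two layouts.
object RowCountLayouts {

  // Option 1: a flat list where consecutive elements form
  // (partition id, row count) pairs.
  def flatPairs(counts: Seq[(Long, Long)]): JList[jl.Long] = {
    val out = new JArrayList[jl.Long]()
    counts.foreach { case (id, n) =>
      out.add(jl.Long.valueOf(id))
      out.add(jl.Long.valueOf(n))
    }
    out
  }

  // Option 2: the list index is the partition id and the value is the row count.
  // Assumption: ids fit in an Int; partitions with no entry are filled with 0.
  def indexed(counts: Seq[(Long, Long)]): JList[jl.Long] = {
    val size = if (counts.isEmpty) 0 else counts.map(_._1).max.toInt + 1
    val out = new JArrayList[jl.Long]()
    (0 until size).foreach(_ => out.add(jl.Long.valueOf(0L)))
    counts.foreach { case (id, n) => out.set(id.toInt, jl.Long.valueOf(n)) }
    out
  }
}
```

For the input `Seq((0L, 10L), (2L, 5L))`, option 1 yields `[0, 10, 2, 5]` and option 2 yields `[10, 0, 5]`. Option 1 copes with sparse or unordered ids but needs pairwise decoding; option 2 gives direct lookup by id but cannot distinguish an absent partition from one that produced zero rows. The field quoted in the diff is a list of lists, which is a third shape distinct from both.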



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

