You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/06/16 06:50:13 UTC

[GitHub] [beam] robinyqiu commented on a change in pull request #11855: [BEAM-10005] | combinefn for ApproximateQuantiles and ApproximateUnique

robinyqiu commented on a change in pull request #11855:
URL: https://github.com/apache/beam/pull/11855#discussion_r440609700



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java
##########
@@ -54,7 +57,9 @@
  *     input.apply(HllCount.Init.forStrings().globally()).apply(HllCount.Extract.globally());
  * }</pre>
  *
- * For more details about using {@code HllCount} and the {@code zetasketch} extension module, see
+ * <p>{@link #combineFn} can also be used manually, with the {@link Combine} transform.

Review comment:
       Please add this new line of comment to the end of the block, because the next line is tied to the previous example.

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java
##########
@@ -54,12 +56,39 @@
 /**
  * {@code PTransform}s for getting an idea of a {@code PCollection}'s data distribution using
  * approximate {@code N}-tiles (e.g. quartiles, percentiles, etc.), either globally or per-key.
+ *
+ * <p>{@link #combineFn} can also be used manually, with the {@link Combine} transform.
  */
 public class ApproximateQuantiles {
   private ApproximateQuantiles() {
     // do not instantiate
   }
 
+  /**
+   * Returns a {@link CombineFn} that gives an idea of the distribution of a collection of values
+   * using approximate {@code N}-tiles.
+   *
+   * @param <T> the type of the elements in the input {@code PCollection}
+   * @param numQuantiles the number of elements in the resulting quantile values {@code List}
+   * @param compareFn the function to use to order the elements
+   */
+  public static <T, ComparatorT extends Comparator<T> & Serializable>
+      CombineFn<T, ?, List<T>> combineFn(int numQuantiles, ComparatorT compareFn) {
+    checkArgument(compareFn != null, "compareFn should not be null");

Review comment:
       Nit: you can use `checkNotNull` instead

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java
##########
@@ -108,6 +120,38 @@
     return new Globally<>(maximumEstimationError);
   }
 
+  /**
+   * Returns a {@link Combine.Globally} that gives a single value that is an estimate of the number
+   * of distinct elements in the input {@code PCollection}.
+   *
+   * @param <T> the type of the elements in the input {@code PCollection}
+   * @param sampleSize the number of entries in the statistical sample; the higher this number, the
+   *     more accurate the estimate will be; should be {@code >= 16}
+   * @param inputCoder the coder for {@code PCollection<T>} where the combineFn will be applied on.
+   * @throws IllegalArgumentException if the {@code sampleSize} argument is too small
+   */
+  public static <T> Combine.Globally<T, Long> combineFn(int sampleSize, Coder<T> inputCoder) {

Review comment:
       The `combineFn()` function here does not actually return the `combineFn` (as you did in `ApproximateQuantile`), please make a change here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org