You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/07/30 07:27:05 UTC

[GitHub] [beam] iindyk opened a new pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

iindyk opened a new pull request #12420:
URL: https://github.com/apache/beam/pull/12420


   Extending functionality of ApproximateQuantiles PTransform to calculate quantiles of a data stream with custom weights. In case of weighted quantiles, input PCollection is expected to contain tuples of elements with their weight. 
   Also adding add_inputs function to the corresponding CombineFn for efficient processing of batched inputs.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [*] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [*] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](htt
 ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_
 Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_P
 ostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/b
 eam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] pabloem commented on pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12420:
URL: https://github.com/apache/beam/pull/12420#issuecomment-673663914


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] iindyk commented on a change in pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
iindyk commented on a change in pull request #12420:
URL: https://github.com/apache/beam/pull/12420#discussion_r468937788



##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -398,8 +424,8 @@ class ApproximateQuantilesCombineFn(CombineFn):
   http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.6513&rep=rep1
   &type=pdf
 
-  The default error bound is (1 / N), though in practice the accuracy
-  tends to be much better.
+  The default error bound is (1 / N) for uniformly distributed data and 1e-2 for

Review comment:
       I think N rarely exceeds 100 and most of the time is <10, so I changed to 1e-2 to increase accuracy for the weighted case. This was done to reflect the fact that, although we guarantee this error bound, the bound itself is on the weight concentrated between the returned value and the actual quantile. If the weights are uneven, then there may potentially be a lot of values between those. 
   But looking at it again, I think you're right and it would make more sense to set it to min(1e-2, 1/N).

##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -263,30 +265,38 @@ class Globally(PTransform):
 
     Args:
       num_quantiles: number of elements in the resulting quantiles values list.
+      weighted: (optional) if set to True, the transform returns weighted
+        quantiles. The input PCollection is then expected to contain tuples of
+        input values with the corresponding weight.

Review comment:
       Done. Although I wasn't able to find a pattern for examples (like for DocTest). Let me know if I should change the format




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] pabloem commented on pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12420:
URL: https://github.com/apache/beam/pull/12420#issuecomment-673160703






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] pabloem merged pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
pabloem merged pull request #12420:
URL: https://github.com/apache/beam/pull/12420


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] pabloem commented on pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12420:
URL: https://github.com/apache/beam/pull/12420#issuecomment-673720471


   thanks @iindyk 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] iindyk commented on pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
iindyk commented on pull request #12420:
URL: https://github.com/apache/beam/pull/12420#issuecomment-673079130


   Was just looking for a way to do that, thanks a lot!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] iindyk commented on pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
iindyk commented on pull request #12420:
URL: https://github.com/apache/beam/pull/12420#issuecomment-671376299


   Friendly ping @pabloem 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] iindyk commented on pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
iindyk commented on pull request #12420:
URL: https://github.com/apache/beam/pull/12420#issuecomment-673745786


   Thanks for the review @pabloem


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] pabloem commented on pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12420:
URL: https://github.com/apache/beam/pull/12420#issuecomment-673078427


   the precommit tsts are currently broken on mainline, but also I think there's some docs issues in your pr. You can run `tox -e py38-docs` to run the docs tests


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] iindyk commented on pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
iindyk commented on pull request #12420:
URL: https://github.com/apache/beam/pull/12420#issuecomment-671634231


   Great, thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] pabloem commented on a change in pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #12420:
URL: https://github.com/apache/beam/pull/12420#discussion_r468063036



##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -263,30 +265,38 @@ class Globally(PTransform):
 
     Args:
       num_quantiles: number of elements in the resulting quantiles values list.
+      weighted: (optional) if set to True, the transform returns weighted
+        quantiles. The input PCollection is then expected to contain tuples of
+        input values with the corresponding weight.

Review comment:
       Perhaps under the ApproximateQuantiles transform pydoc, rather than on these args (so it'll work for perKey and globally)? Up to you.

##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -576,17 +633,34 @@ def _interpolate(self, i_buffers, count, step, offset):
     weighted_element = next(sorted_elem)
     current = weighted_element[1]
     j = 0
-    while j < count:
-      target = j * step + offset
-      j = j + 1
-      try:
-        while current <= target:
-          weighted_element = next(sorted_elem)
-          current = current + weighted_element[1]
-      except StopIteration:
-        pass
-      new_elements.append(weighted_element[0])
-    return new_elements
+    if self._weighted:

Review comment:
       I wonder if it's possible to do this without duplicating the code in both branches? I understand that separating branches may help performance, but it decreases on readability...  WDYT?

##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -263,30 +265,38 @@ class Globally(PTransform):
 
     Args:
       num_quantiles: number of elements in the resulting quantiles values list.
+      weighted: (optional) if set to True, the transform returns weighted
+        quantiles. The input PCollection is then expected to contain tuples of
+        input values with the corresponding weight.
       key: (optional) Key is  a mapping of elements to a comparable key, similar
         to the key argument of Python's sorting methods.
       reverse: (optional) whether to order things smallest to largest, rather
         than largest to smallest
     """
-    def __init__(self, num_quantiles, key=None, reverse=False):
+    def __init__(self, num_quantiles, weighted=False, key=None, reverse=False):

Review comment:
       Can we add weighted as the last argument? So we'll have a change that's backwards compatible

##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -263,30 +265,38 @@ class Globally(PTransform):
 
     Args:
       num_quantiles: number of elements in the resulting quantiles values list.
+      weighted: (optional) if set to True, the transform returns weighted
+        quantiles. The input PCollection is then expected to contain tuples of
+        input values with the corresponding weight.

Review comment:
       Perhaps show an example of this in a code snippet?

##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -398,8 +424,8 @@ class ApproximateQuantilesCombineFn(CombineFn):
   http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.6513&rep=rep1
   &type=pdf
 
-  The default error bound is (1 / N), though in practice the accuracy
-  tends to be much better.
+  The default error bound is (1 / N) for uniformly distributed data and 1e-2 for

Review comment:
       I don't know much math, so I wonder: How come the error bound for the approximation for weighted elements is not f(N)?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] iindyk edited a comment on pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
iindyk edited a comment on pull request #12420:
URL: https://github.com/apache/beam/pull/12420#issuecomment-666062605


   R: @pabloem


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] iindyk commented on a change in pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
iindyk commented on a change in pull request #12420:
URL: https://github.com/apache/beam/pull/12420#discussion_r468937924



##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -576,17 +633,34 @@ def _interpolate(self, i_buffers, count, step, offset):
     weighted_element = next(sorted_elem)
     current = weighted_element[1]
     j = 0
-    while j < count:
-      target = j * step + offset
-      j = j + 1
-      try:
-        while current <= target:
-          weighted_element = next(sorted_elem)
-          current = current + weighted_element[1]
-      except StopIteration:
-        pass
-      new_elements.append(weighted_element[0])
-    return new_elements
+    if self._weighted:

Review comment:
       I tried to avoid affecting the performance of the existing codepath, but after a closer look I think the overhead caused by the if in the loop is negligible compared to the time taken by the inner loop and append. Merged the branches.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] pabloem commented on pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12420:
URL: https://github.com/apache/beam/pull/12420#issuecomment-671485307


   Ah yes, I had started doing the review. I'll wrap it up today.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] iindyk commented on a change in pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
iindyk commented on a change in pull request #12420:
URL: https://github.com/apache/beam/pull/12420#discussion_r468937732



##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -263,30 +265,38 @@ class Globally(PTransform):
 
     Args:
       num_quantiles: number of elements in the resulting quantiles values list.
+      weighted: (optional) if set to True, the transform returns weighted
+        quantiles. The input PCollection is then expected to contain tuples of
+        input values with the corresponding weight.
       key: (optional) Key is  a mapping of elements to a comparable key, similar
         to the key argument of Python's sorting methods.
       reverse: (optional) whether to order things smallest to largest, rather
         than largest to smallest
     """
-    def __init__(self, num_quantiles, key=None, reverse=False):
+    def __init__(self, num_quantiles, weighted=False, key=None, reverse=False):

Review comment:
       Ah, yes, of course.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] iindyk commented on pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Posted by GitBox <gi...@apache.org>.
iindyk commented on pull request #12420:
URL: https://github.com/apache/beam/pull/12420#issuecomment-666062605


   R: @chrisgorgo 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org