Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/09/02 02:21:11 UTC

[GitHub] [beam] monicadsong opened a new pull request #12756: [BEAM-10824] Change hash function in ApproximateUniqueCombineFn

monicadsong opened a new pull request #12756:
URL: https://github.com/apache/beam/pull/12756


   Replace Python's built-in hash() function with hashlib's md5, which is deterministic.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485223986



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -41,13 +40,15 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.transforms.stats import ApproximateQuantilesCombineFn
+from apache_beam.transforms.stats import ApproximateUniqueCombineFn
 
 
 class ApproximateUniqueTest(unittest.TestCase):
-  """Unit tests for ApproximateUnique.Globally and ApproximateUnique.PerKey.
-  Hash() with Python3 is nondeterministic, so Approximation algorithm generates
-  different result each time and sometimes error rate is out of range, so add
-  retries for all tests who actually running approximation algorithm."""
+  """Unit tests for ApproximateUnique.Globally, ApproximateUnique.PerKey,
+  and ApproximateUniqueCombineFn.

Review comment:
       Can we parameterize the test to exercise both the md5 and mmh3 code paths? We could force md5 with a patch for test purposes when mmh3 is installed.







[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483158065



##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -192,7 +193,7 @@ def get_estimate(self):
     if len(self._sample_heap) < self._sample_size:
       return len(self._sample_heap)
     else:
-      sample_space_size = sys.maxsize - 1.0 * self._min_hash
+      sample_space_size = self._HASH_SPACE_SIZE - 1.0 * self._min_hash

Review comment:
       Ah, that was because Python's `hash()` can return negative values. So yes, I think we need to set `self._min_hash = 2**64`, given that all hash values are now non-negative.
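
To make the role of the hash space size concrete, here is a minimal sketch of a minimum-value estimator over unsigned 64-bit hashes. It is a simplified illustration, not Beam's `get_estimate` formula: `hash64`, `estimate_unique`, and `HASH_SPACE_SIZE` are hypothetical names, elements are hashed via `repr()` for convenience, and the estimate is the plain ratio `sample_size * H / (H - min_hash)`.

```python
import hashlib
import heapq

HASH_SPACE_SIZE = 2**64  # assumed: hashes are unsigned 64-bit integers


def hash64(element):
    """Deterministic hash from the first 8 bytes of an md5 digest."""
    digest = hashlib.md5(repr(element).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def estimate_unique(elements, sample_size):
    """Estimate the distinct count from the `sample_size` largest hashes."""
    heap = []  # min-heap of the largest distinct hashes seen so far
    kept = set()  # same values as the heap, for duplicate checks
    for element in elements:
        h = hash64(element)
        if h in kept:
            continue
        if len(heap) < sample_size:
            heapq.heappush(heap, h)
            kept.add(h)
        elif h > heap[0]:
            evicted = heapq.heapreplace(heap, h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < sample_size:
        # Fewer distinct hashes than the sample size: the count is exact.
        return len(heap)
    # The kept hashes occupy the top (H - min_hash) slice of the hash space,
    # so scale the sample size up by the inverse of that fraction.
    sample_space_size = HASH_SPACE_SIZE - heap[0]
    return round(sample_size * HASH_SPACE_SIZE / sample_space_size)
```

Since every hash lies in `[0, 2**64)`, `sample_space_size` is always positive, which is the property the signed range of the built-in `hash()` did not give.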







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.04%`.
   > The diff coverage is `25.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.26%   +0.04%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53783     +114     
   ==========================================
   + Hits        21587    21656      +69     
   - Misses      32082    32127      +45     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | [sdks/python/apache\_beam/transforms/core.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb3JlLnB5) | `39.22% <0.00%> (+0.94%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...fe05782](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `20.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.23%   +0.01%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53734      +65     
   ==========================================
   + Hits        21587    21620      +33     
   - Misses      32082    32114      +32     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <28.57%> (+0.05%)` | :arrow_up: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...4910b8e](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.04%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.26%   +0.04%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53840     +171     
   ==========================================
   + Hits        21587    21678      +91     
   - Misses      32082    32162      +80     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.06% <25.00%> (-0.27%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [setup.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2V0dXAucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | ... and [2 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...a83c0db](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] aaltay commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
aaltay commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485976987



##########
File path: sdks/python/container/base_image_requirements.txt
##########
@@ -57,6 +57,7 @@ google-cloud-datastore==1.7.4
 cython==0.29.13
 guppy==0.1.11;python_version<="2.7"
 guppy3==3.0.9;python_version>="3.5"
+mmh3>=2.5.1,<3.0

Review comment:
       For this file, you can pick a specific version instead of a range.







[GitHub] [beam] Hannah-Jiang commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
Hannah-Jiang commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483374887



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -160,57 +159,13 @@ def test_get_sample_size_from_est_error(self):
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.05) == 1600
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.01) == 40000
 
-  @unittest.skip(
-      'Skip it because hash function is not good enough. '
-      'TODO: BEAM-7654')
   def test_approximate_unique_global_by_sample_size(self):
     # test if estimation error with a given sample size is not greater than
     # expected max error.
     sample_size = 16
     max_err = 2 / math.sqrt(sample_size)
-    test_input = [
-        4,
-        34,
-        29,
-        46,
-        80,
-        66,
-        51,
-        81,
-        31,
-        9,
-        26,
-        36,
-        10,
-        41,
-        90,
-        35,
-        33,
-        19,
-        88,
-        86,
-        28,
-        93,
-        38,
-        76,
-        15,
-        87,
-        12,
-        39,
-        84,
-        13,
-        32,
-        49,
-        65,
-        100,
-        16,
-        27,
-        23,
-        30,
-        96,
-        54
-    ]
-
+    test_input = list(range(100))

Review comment:
       A fixed dataset was used because a random dataset cannot always pass the test, even with retries, and introduces flakiness. Does it always pass now?
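
A minimal sketch of why a deterministic hash answers this question: with a fixed input and a hash that does not change between interpreter runs, the estimate is the same on every run, so the test either always passes or always fails. The names `stable_hash` and `estimate_unique` are hypothetical, `repr()` stands in for a deterministic Beam coder, and the estimator is a simplified "k largest hashes" variant, not the exact formula in `stats.py`.

```python
import hashlib
import heapq

# Assumed hash range; matches the 2^63 - 1 value discussed in this thread.
HASH_SPACE_SIZE = 2**63 - 1


def stable_hash(value):
    """Hash a value to [0, 2^63 - 1] without Python's per-process hash seed."""
    # repr() is an illustrative stand-in for a deterministic coder.
    digest = hashlib.md5(repr(value).encode('utf-8')).digest()
    return int.from_bytes(digest[:8], 'big') & HASH_SPACE_SIZE


def estimate_unique(values, sample_size):
    """Simplified distinct-count estimate from the largest hashes seen."""
    # Keep the sample_size largest distinct hashes.
    largest = heapq.nlargest(sample_size, {stable_hash(v) for v in values})
    if len(largest) < sample_size:
        # Fewer distinct hashes than the sample size: the count is exact.
        return len(largest)
    min_hash = largest[-1]
    sample_space_size = HASH_SPACE_SIZE - 1.0 * min_hash
    # The sample occupies sample_space_size out of HASH_SPACE_SIZE, so scale up.
    return round(sample_size * HASH_SPACE_SIZE / sample_space_size)


first = estimate_unique(list(range(100)), 16)
second = estimate_unique(list(range(100)), 16)
print(first == second)  # the same input always yields the same estimate
```

Whether the estimate for a particular fixed input falls within the expected error bound still has to be checked once, but after that the result no longer varies between runs.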







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **increase** coverage by `0.02%`.
   > The diff coverage is `25.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.23%   40.26%   +0.02%     
   ==========================================
     Files         455      455              
     Lines       53729    53840     +111     
   ==========================================
   + Hits        21620    21678      +58     
   - Misses      32109    32162      +53     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.06% <25.00%> (-0.27%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [sdks/python/apache\_beam/pipeline.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcGlwZWxpbmUucHk=) | `24.23% <0.00%> (-0.05%)` | :arrow_down: |
   | [setup.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2V0dXAucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | [sdks/python/apache\_beam/transforms/core.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb3JlLnB5) | `39.22% <0.00%> (+0.94%)` | :arrow_up: |
   | [...s/python/apache\_beam/testing/synthetic\_pipeline.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9zeW50aGV0aWNfcGlwZWxpbmUucHk=) | `23.45% <0.00%> (+2.52%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...25bc7b4](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824, BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.22%   40.21%   -0.01%     
   ==========================================
     Files         454      454              
     Lines       53669    53681      +12     
   ==========================================
   + Hits        21587    21588       +1     
   - Misses      32082    32093      +11     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...175aaec](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] tvalentyn merged pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn merged pull request #12756:
URL: https://github.com/apache/beam/pull/12756









[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483215632



##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -192,7 +193,7 @@ def get_estimate(self):
     if len(self._sample_heap) < self._sample_size:
       return len(self._sample_heap)
     else:
-      sample_space_size = sys.maxsize - 1.0 * self._min_hash
+      sample_space_size = self._HASH_SPACE_SIZE - 1.0 * self._min_hash

Review comment:
       Modified it to 2^63 - 1 (which is also equal to sys.maxsize on 64-bit builds, which is why my tests were passing).
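
To illustrate the point about the constant, here is a small sketch under stated assumptions: `HASH_SPACE_SIZE` is a hypothetical module-level stand-in for the `_HASH_SPACE_SIZE` attribute in the diff, and `sample_space_size` only mirrors the single changed line. On 64-bit CPython builds `sys.maxsize` is 2^63 - 1, so both spellings give the same result there, while an explicit constant keeps the hash range independent of the platform.

```python
import sys

# Hypothetical stand-in for the `_HASH_SPACE_SIZE` attribute in the diff.
HASH_SPACE_SIZE = 2**63 - 1


def sample_space_size(min_hash, hash_space_size=HASH_SPACE_SIZE):
    """Mirror the changed line: width of the hash range above min_hash."""
    return hash_space_size - 1.0 * min_hash


# True on 64-bit CPython builds; 32-bit builds report 2**31 - 1 instead.
print(sys.maxsize == HASH_SPACE_SIZE)
```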







[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483215196



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -496,6 +483,62 @@ def test_approximate_unique_globally_by_error_with_skewed_data(self):
           equal_to([True]),
           label='assert:globally_by_error_with_skewed_data')
 
+  def test_approximate_unique_combine_fn_by_nondeterministic_coder(self):
+    # test if the combiner throws an error with a nondeterministic coder.
+    sample_size = 30
+    coder = coders.Base64PickleCoder()
+
+    with self.assertRaises(ValueError) as e:
+      _ = ApproximateUniqueCombineFn(sample_size, coder)
+
+    self.assertRegex(
+        e.exception.args[0],
+        'The key coder "Base64PickleCoder" '
+        'for ApproximateUniqueCombineFn is not deterministic.')
+
+  def test_approximate_unique_combine_fn_by_wrong_coder(self):

Review comment:
       done.







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824, BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.22%   40.21%   -0.01%     
   ==========================================
     Files         454      454              
     Lines       53669    53681      +12     
   ==========================================
   + Hits        21587    21588       +1     
   - Misses      32082    32093      +11     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...90ab1bf](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] aaltay commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
aaltay commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485860554



##########
File path: sdks/python/setup.py
##########
@@ -165,6 +165,7 @@ def get_version():
     'requests>=2.24.0,<3.0.0',
     'typing>=3.7.0,<3.8.0; python_full_version < "3.5.3"',
     'typing-extensions>=3.7.0,<3.8.0',
+    'mmh3>=2.5.1,<2.5.2',

Review comment:
       It is better to avoid adding a new dependency if possible, especially one with platform-specific differences. If we need to add it, it is better to make it optional, similar to snappy.







[GitHub] [beam] aaltay commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
aaltay commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485858771



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -41,13 +40,15 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.transforms.stats import ApproximateQuantilesCombineFn
+from apache_beam.transforms.stats import ApproximateUniqueCombineFn
 
 
 class ApproximateUniqueTest(unittest.TestCase):
-  """Unit tests for ApproximateUnique.Globally and ApproximateUnique.PerKey.
-  Hash() with Python3 is nondeterministic, so Approximation algorithm generates
-  different result each time and sometimes error rate is out of range, so add
-  retries for all tests who actually running approximation algorithm."""
+  """Unit tests for ApproximateUnique.Globally, ApproximateUnique.PerKey,
+  and ApproximateUniqueCombineFn.

Review comment:
       I agree with Hannah. HLL would be the right solution, and given that py3 has a better hash function, adding this dependency could be avoided.
   
   What is the specific issue we are addressing by this change?







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.22%   40.21%   -0.01%     
   ==========================================
     Files         454      454              
     Lines       53669    53681      +12     
   ==========================================
   + Hits        21587    21588       +1     
   - Misses      32082    32093      +11     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...a168792](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485862569



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -41,13 +40,15 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.transforms.stats import ApproximateQuantilesCombineFn
+from apache_beam.transforms.stats import ApproximateUniqueCombineFn
 
 
 class ApproximateUniqueTest(unittest.TestCase):
-  """Unit tests for ApproximateUnique.Globally and ApproximateUnique.PerKey.
-  Hash() with Python3 is nondeterministic, so Approximation algorithm generates
-  different result each time and sometimes error rate is out of range, so add
-  retries for all tests who actually running approximation algorithm."""
+  """Unit tests for ApproximateUnique.Globally, ApproximateUnique.PerKey,
+  and ApproximateUniqueCombineFn.

Review comment:
       py3 hash is non-deterministic, see BEAM-10824 and internal issue b/166646014 which has a bit more details. 
   We should not be using py3 hash function, and I support this change. The current change supports both md5 and mmh3, but we have a remaining question whether mmh3 should be a default dep (better performance than md5, but extra dependency), or optional (default to md5, give a warning recommendation to install mmh3, similar to snappy).
   
   HLL is not available for Beam Python yet, is it? 
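
The "optional, default to md5" variant described above could look roughly like the following sketch. It is not the code from this PR: `deterministic_hash` is a hypothetical helper, and truncating the md5 digest to 8 bytes and masking to 63 bits are illustrative choices. The only third-party call used is `mmh3.hash64`, which returns a pair of signed 64-bit integers.

```python
import hashlib

try:
    import mmh3  # optional third-party dependency; may not be installed
except ImportError:
    mmh3 = None


def deterministic_hash(encoded):
    """Map bytes to an integer in [0, 2^63 - 1], stable across processes."""
    if mmh3 is not None:
        # hash64 returns two signed 64-bit ints; use the first one.
        value = mmh3.hash64(encoded)[0]
    else:
        # Fallback: first 8 bytes of the md5 digest as a signed integer.
        digest = hashlib.md5(encoded).digest()
        value = int.from_bytes(digest[:8], 'big', signed=True)
    # Masking drops the sign bit, so the result is never negative.
    return value & (2**63 - 1)


print(deterministic_hash(b'beam') == deterministic_hash(b'beam'))
```

Unlike the built-in `hash()` on `str` and `bytes`, neither branch depends on `PYTHONHASHSEED`, so separate worker processes agree on the hash of the same encoded element. The two branches produce different values from each other, so every worker would need to take the same branch.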







[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485983860



##########
File path: sdks/python/container/license_scripts/dep_urls_py.yaml
##########
@@ -77,8 +77,6 @@ pip_dependencies:
     license: "https://raw.githubusercontent.com/mtth/hdfs/master/LICENSE"
   httplib2:
     license: "https://raw.githubusercontent.com/httplib2/httplib2/master/LICENSE"
-  mmh3:
-    license: "https://raw.githubusercontent.com/hajimes/mmh3/master/LICENSE"

Review comment:
       So the pip-licenses tool is able to automatically fetch the license for mmh3? If so, great! If not, we will need this line, since we will ship mmh3 in our containers.







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **increase** coverage by `41.80%`.
   > The diff coverage is `86.36%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #12756       +/-   ##
   ===========================================
   + Coverage   40.23%   82.04%   +41.80%     
   ===========================================
     Files         455      457        +2     
     Lines       53729    54086      +357     
   ===========================================
   + Hits        21620    44373    +22753     
   + Misses      32109     9713    -22396     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `87.39% <86.36%> (+60.06%)` | :arrow_up: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/utils/histogram.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaGlzdG9ncmFtLnB5) | `94.28% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_perf.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19wZXJmLnB5) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/coders/slow\_stream.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Nsb3dfc3RyZWFtLnB5) | `92.43% <0.00%> (+1.68%)` | :arrow_up: |
   | [sdks/python/apache\_beam/utils/profiler.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvcHJvZmlsZXIucHk=) | `32.11% <0.00%> (+1.83%)` | :arrow_up: |
   | [...on/apache\_beam/runners/direct/sdf\_direct\_runner.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3Qvc2RmX2RpcmVjdF9ydW5uZXIucHk=) | `36.21% <0.00%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/coders/coder\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVyX2ltcGwucHk=) | `95.25% <0.00%> (+2.64%)` | :arrow_up: |
   | ... and [274 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...fa23439](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.23%   +0.01%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53729      +60     
   ==========================================
   + Hits        21587    21620      +33     
   - Misses      32082    32109      +27     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...a017d6b](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **increase** coverage by `41.80%`.
   > The diff coverage is `86.36%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #12756       +/-   ##
   ===========================================
   + Coverage   40.23%   82.04%   +41.80%     
   ===========================================
     Files         455      457        +2     
     Lines       53729    54086      +357     
   ===========================================
   + Hits        21620    44375    +22755     
   + Misses      32109     9711    -22398     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `87.39% <86.36%> (+60.06%)` | :arrow_up: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_perf.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19wZXJmLnB5) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/utils/histogram.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaGlzdG9ncmFtLnB5) | `94.28% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/coders/slow\_stream.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Nsb3dfc3RyZWFtLnB5) | `92.43% <0.00%> (+1.68%)` | :arrow_up: |
   | [sdks/python/apache\_beam/utils/profiler.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvcHJvZmlsZXIucHk=) | `32.11% <0.00%> (+1.83%)` | :arrow_up: |
   | [...on/apache\_beam/runners/direct/sdf\_direct\_runner.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3Qvc2RmX2RpcmVjdF9ydW5uZXIucHk=) | `36.21% <0.00%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/coders/coder\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVyX2ltcGwucHk=) | `95.25% <0.00%> (+2.64%)` | :arrow_up: |
   | ... and [274 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...dadcf86](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] monicadsong commented on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685895179


   @tvalentyn: I can't seem to pass the codecov/patch report, but I have added unit tests for the combiner that should exercise the highlighted paths. Also, I think the other 3 failing tests are failing for other merged PRs as well.





[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485875491



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -41,13 +40,15 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.transforms.stats import ApproximateQuantilesCombineFn
+from apache_beam.transforms.stats import ApproximateUniqueCombineFn
 
 
 class ApproximateUniqueTest(unittest.TestCase):
-  """Unit tests for ApproximateUnique.Globally and ApproximateUnique.PerKey.
-  Hash() with Python3 is nondeterministic, so Approximation algorithm generates
-  different result each time and sometimes error rate is out of range, so add
-  retries for all tests who actually running approximation algorithm."""
+  """Unit tests for ApproximateUnique.Globally, ApproximateUnique.PerKey,
+  and ApproximateUniqueCombineFn.

Review comment:
       Sounds good, thanks! More info on the Python 3 hash here: https://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions
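
As a minimal sketch of the point under discussion: the built-in `hash()` of `str` and `bytes` is salted per interpreter process in Python 3 (unless `PYTHONHASHSEED` is fixed), while an MD5 digest of the encoded element is the same in every process. The helper name `stable_hash` and the choice to keep the first 8 bytes of the digest are assumptions made for illustration; they are not taken from the PR's code.

```python
import hashlib


def stable_hash(encoded: bytes) -> int:
    """Map already-encoded bytes to a 64-bit integer via MD5.

    Hypothetical helper: truncating to 8 bytes is an illustrative
    choice, not necessarily what ApproximateUniqueCombineFn does.
    """
    digest = hashlib.md5(encoded).digest()  # 16 bytes, process-independent
    return int.from_bytes(digest[:8], byteorder="big")


if __name__ == "__main__":
    # The same input always yields the same value, in any process.
    print(stable_hash(b"beam") == stable_hash(b"beam"))
    # By contrast, hash("beam") may differ between interpreter runs.
```

The input must itself be produced deterministically, which is why the test in this PR expects a `ValueError` for a non-deterministic coder.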







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.04%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.26%   +0.04%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53840     +171     
   ==========================================
   + Hits        21587    21678      +91     
   - Misses      32082    32162      +80     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.06% <25.00%> (-0.27%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [setup.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2V0dXAucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | ... and [2 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...a83c0db](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r482261286



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -496,6 +483,51 @@ def test_approximate_unique_globally_by_error_with_skewed_data(self):
           equal_to([True]),
           label='assert:globally_by_error_with_skewed_data')
 
+  def test_approximate_unique_combine_fn_by_nondeterministic_coder(self):
+    # test if the combiner throws an error with a nondeterministic coder.
+    sample_size = 30
+    coder = coders.Base64PickleCoder()
+
+    with self.assertRaises(ValueError) as e:
+      _ = ApproximateUniqueCombineFn(sample_size, coder)
+
+    self.assertRegex(
+        e.exception.args[0],
+        'The key coder "Base64PickleCoder" '
+        'for ApproximateUniqueCombineFn is not deterministic.')
+
+  def test_approximate_unique_combine_fn_add_values(self):

Review comment:
       nit: you could name this test_approximate_unique_combine_fn_adds_values_correctly instead of adding a comment.







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `20.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.23%   +0.01%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53734      +65     
   ==========================================
   + Hits        21587    21620      +33     
   - Misses      32082    32114      +32     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <28.57%> (+0.05%)` | :arrow_up: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...70c1b75](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] Hannah-Jiang commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
Hannah-Jiang commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485025528



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -160,57 +159,13 @@ def test_get_sample_size_from_est_error(self):
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.05) == 1600
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.01) == 40000
 
-  @unittest.skip(
-      'Skip it because hash function is not good enough. '
-      'TODO: BEAM-7654')
   def test_approximate_unique_global_by_sample_size(self):
     # test if estimation error with a given sample size is not greater than
     # expected max error.
     sample_size = 16
     max_err = 2 / math.sqrt(sample_size)
-    test_input = [
-        4,
-        34,
-        29,
-        46,
-        80,
-        66,
-        51,
-        81,
-        31,
-        9,
-        26,
-        36,
-        10,
-        41,
-        90,
-        35,
-        33,
-        19,
-        88,
-        86,
-        28,
-        93,
-        38,
-        76,
-        15,
-        87,
-        12,
-        39,
-        84,
-        13,
-        32,
-        49,
-        65,
-        100,
-        16,
-        27,
-        23,
-        30,
-        96,
-        54
-    ]
-
+    test_input = list(range(100))

Review comment:
       Ah, I misunderstood your comment above. Adding the seed looks good to me.
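
For reference, the numbers in the diff above are consistent with a simple relationship between sample size and expected maximum error: the test computes `max_err = 2 / math.sqrt(sample_size)`, and the asserted sample sizes (1600 for 0.05, 40000 for 0.01) match its inverse, `4 / err**2`. The sketch below only restates that arithmetic; the function names are hypothetical and this is not Beam's implementation of `_get_sample_size_from_est_error`.

```python
import math


def max_error(sample_size: int) -> float:
    """Expected maximum relative error for a given sample size."""
    return 2.0 / math.sqrt(sample_size)


def sample_size_for_error(est_err: float) -> int:
    """Invert max_error: the sample size whose max error is est_err.

    round() absorbs floating-point noise in est_err ** 2 (assumption:
    est_err is chosen so that 4 / est_err**2 is close to an integer).
    """
    return int(round(4.0 / est_err ** 2))


if __name__ == "__main__":
    print(max_error(16))                # the bound used by the test above
    print(sample_size_for_error(0.05))  # compare with the 1600 asserted above
    print(sample_size_for_error(0.01))  # compare with the 40000 asserted above
```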







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.22%   40.21%   -0.01%     
   ==========================================
     Files         454      454              
     Lines       53669    53681      +12     
   ==========================================
   + Hits        21587    21588       +1     
   - Misses      32082    32093      +11     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...2bb0088](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.22%   40.21%   -0.01%     
   ==========================================
     Files         454      454              
     Lines       53669    53681      +12     
   ==========================================
   + Hits        21587    21588       +1     
   - Misses      32082    32093      +11     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...5d77f3c](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **increase** coverage by `41.80%`.
   > The diff coverage is `86.36%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #12756       +/-   ##
   ===========================================
   + Coverage   40.23%   82.04%   +41.80%     
   ===========================================
     Files         455      457        +2     
     Lines       53729    54086      +357     
   ===========================================
   + Hits        21620    44375    +22755     
   + Misses      32109     9711    -22398     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `87.39% <86.36%> (+60.06%)` | :arrow_up: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/utils/histogram.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaGlzdG9ncmFtLnB5) | `94.28% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_perf.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19wZXJmLnB5) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/coders/slow\_stream.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Nsb3dfc3RyZWFtLnB5) | `92.43% <0.00%> (+1.68%)` | :arrow_up: |
   | [sdks/python/apache\_beam/utils/profiler.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvcHJvZmlsZXIucHk=) | `32.11% <0.00%> (+1.83%)` | :arrow_up: |
   | [...on/apache\_beam/runners/direct/sdf\_direct\_runner.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3Qvc2RmX2RpcmVjdF9ydW5uZXIucHk=) | `36.21% <0.00%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/coders/coder\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVyX2ltcGwucHk=) | `95.25% <0.00%> (+2.64%)` | :arrow_up: |
   | ... and [274 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...dadcf86](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.23%   +0.01%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53729      +60     
   ==========================================
   + Hits        21587    21620      +33     
   - Misses      32082    32109      +27     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...a017d6b](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] tvalentyn merged pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn merged pull request #12756:
URL: https://github.com/apache/beam/pull/12756


   





[GitHub] [beam] monicadsong commented on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-690811376


   thank you all for reviewing :)





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **decrease** coverage by `0.02%`.
   > The diff coverage is `22.72%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.23%   40.21%   -0.03%     
   ==========================================
     Files         455      457       +2     
     Lines       53729    54057     +328     
   ==========================================
   + Hits        21620    21738     +118     
   - Misses      32109    32319     +210     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `26.93% <22.72%> (-0.40%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [...ache\_beam/runners/interactive/recording\_manager.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9yZWNvcmRpbmdfbWFuYWdlci5weQ==) | `29.05% <0.00%> (-0.55%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery\_tools.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X3Rvb2xzLnB5) | `28.57% <0.00%> (-0.19%)` | :arrow_down: |
   | [...dks/python/apache\_beam/options/pipeline\_options.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy9waXBlbGluZV9vcHRpb25zLnB5) | `55.98% <0.00%> (-0.14%)` | :arrow_down: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [...apache\_beam/runners/dataflow/internal/apiclient.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kYXRhZmxvdy9pbnRlcm5hbC9hcGljbGllbnQucHk=) | `20.20% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | ... and [9 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...74645b1](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r484681387



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -160,57 +159,13 @@ def test_get_sample_size_from_est_error(self):
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.05) == 1600
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.01) == 40000
 
-  @unittest.skip(
-      'Skip it because hash function is not good enough. '
-      'TODO: BEAM-7654')
   def test_approximate_unique_global_by_sample_size(self):
     # test if estimation error with a given sample size is not greater than
     # expected max error.
     sample_size = 16
     max_err = 2 / math.sqrt(sample_size)
-    test_input = [
-        4,
-        34,
-        29,
-        46,
-        80,
-        66,
-        51,
-        81,
-        31,
-        9,
-        26,
-        36,
-        10,
-        41,
-        90,
-        35,
-        33,
-        19,
-        88,
-        86,
-        28,
-        93,
-        38,
-        76,
-        15,
-        87,
-        12,
-        39,
-        84,
-        13,
-        32,
-        49,
-        65,
-        100,
-        16,
-        27,
-        23,
-        30,
-        96,
-        54
-    ]
-
+    test_input = list(range(100))

Review comment:
       Hmm, not sure I understand... I believe that adding a random seed should eliminate all randomness in the generation and shuffling of the test data, so that every test run uses the same data?
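
To illustrate the point about seeding, here is a minimal sketch of reproducible test data. The helper name `make_test_input` and the seed value are hypothetical and are not taken from the PR; the sketch only assumes Python's standard `random` module.

```python
import random


def make_test_input(num_elements, seed):
    # Hypothetical helper, not part of the PR. A private Random instance
    # keeps the seed from affecting other tests that use the global RNG.
    rng = random.Random(seed)
    data = list(range(num_elements))
    rng.shuffle(data)
    return data


first = make_test_input(100, seed=0)
second = make_test_input(100, seed=0)

# Same seed, same order in every call and in every test run.
assert first == second
# Shuffling changes only the order, not the set of distinct values.
assert sorted(first) == list(range(100))
```

Seeding fixes only the input data. Whether the estimate itself is reproducible still depends on the hash function applied to each element.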







[GitHub] [beam] monicadsong closed pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong closed pull request #12756:
URL: https://github.com/apache/beam/pull/12756


   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **decrease** coverage by `0.02%`.
   > The diff coverage is `22.72%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.23%   40.21%   -0.03%     
   ==========================================
     Files         455      457       +2     
     Lines       53729    54057     +328     
   ==========================================
   + Hits        21620    21738     +118     
   - Misses      32109    32319     +210     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `26.93% <22.72%> (-0.40%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [...ache\_beam/runners/interactive/recording\_manager.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9yZWNvcmRpbmdfbWFuYWdlci5weQ==) | `29.05% <0.00%> (-0.55%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery\_tools.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X3Rvb2xzLnB5) | `28.57% <0.00%> (-0.19%)` | :arrow_down: |
   | [...dks/python/apache\_beam/options/pipeline\_options.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy9waXBlbGluZV9vcHRpb25zLnB5) | `55.98% <0.00%> (-0.14%)` | :arrow_down: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [...apache\_beam/runners/dataflow/internal/apiclient.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kYXRhZmxvdy9pbnRlcm5hbC9hcGljbGllbnQucHk=) | `20.20% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | ... and [9 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...59743f8](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] Hannah-Jiang commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
Hannah-Jiang commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485276327



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -41,13 +40,15 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.transforms.stats import ApproximateQuantilesCombineFn
+from apache_beam.transforms.stats import ApproximateUniqueCombineFn
 
 
 class ApproximateUniqueTest(unittest.TestCase):
-  """Unit tests for ApproximateUnique.Globally and ApproximateUnique.PerKey.
-  Hash() with Python3 is nondeterministic, so Approximation algorithm generates
-  different result each time and sometimes error rate is out of range, so add
-  retries for all tests who actually running approximation algorithm."""
+  """Unit tests for ApproximateUnique.Globally, ApproximateUnique.PerKey,
+  and ApproximateUniqueCombineFn.

Review comment:
       Here is some background on mmh3, in case it helps with the decision. I used mmh3 at first but then switched to the default hash function. mmh3 gives more accurate estimates, but it adds considerable complexity to the Dataflow import, and users who really care about accuracy should use HLL. In addition, the default hash function was improved in Python 3, so the performance difference was not big enough to justify mmh3 and the complexity it introduces. That is why mmh3 was not used; this is just for reference.
   I think it would be better to mention that HLL is supported by Beam.
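
   For readers weighing the "default hash function" option: in Python 3 the built-in `hash()` of a `str` is salted per interpreter process unless `PYTHONHASHSEED` is fixed, whereas a `hashlib` digest depends only on the input bytes. The sketch below shows the kind of md5-based replacement the PR description names. The helper name `stable_hash64` and the choice to keep the first 8 bytes are assumptions made for illustration, not code from the PR.

```python
import hashlib


def stable_hash64(value: str) -> int:
    """Hypothetical helper: a 64-bit hash that is identical in every process."""
    # md5 depends only on the input bytes, so there is no per-process salt.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    # Keep the first 8 of the 16 digest bytes as an unsigned integer.
    return int.from_bytes(digest[:8], "big")


# By contrast, hash("beam") can differ between two interpreter runs
# unless PYTHONHASHSEED is set to a fixed value.
print(stable_hash64("beam"))
```

   md5 is used here only as a deterministic mixing function, not for security.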







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **increase** coverage by `41.80%`.
   > The diff coverage is `86.36%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #12756       +/-   ##
   ===========================================
   + Coverage   40.23%   82.04%   +41.80%     
   ===========================================
     Files         455      457        +2     
     Lines       53729    54086      +357     
   ===========================================
   + Hits        21620    44375    +22755     
   + Misses      32109     9711    -22398     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `87.39% <86.36%> (+60.06%)` | :arrow_up: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/utils/histogram.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaGlzdG9ncmFtLnB5) | `94.28% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_perf.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19wZXJmLnB5) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/coders/slow\_stream.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Nsb3dfc3RyZWFtLnB5) | `92.43% <0.00%> (+1.68%)` | :arrow_up: |
   | [sdks/python/apache\_beam/utils/profiler.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvcHJvZmlsZXIucHk=) | `32.11% <0.00%> (+1.83%)` | :arrow_up: |
   | [...on/apache\_beam/runners/direct/sdf\_direct\_runner.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3Qvc2RmX2RpcmVjdF9ydW5uZXIucHk=) | `36.21% <0.00%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/coders/coder\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVyX2ltcGwucHk=) | `95.25% <0.00%> (+2.64%)` | :arrow_up: |
   | ... and [274 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...dadcf86](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.03%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.25%   +0.03%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53798     +129     
   ==========================================
   + Hits        21587    21659      +72     
   - Misses      32082    32139      +57     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.06% <25.00%> (-0.27%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [setup.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2V0dXAucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | ... and [1 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...8de1e4d](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] monicadsong closed pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong closed pull request #12756:
URL: https://github.com/apache/beam/pull/12756


   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.04%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.26%   +0.04%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53840     +171     
   ==========================================
   + Hits        21587    21678      +91     
   - Misses      32082    32162      +80     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.06% <25.00%> (-0.27%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [setup.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2V0dXAucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | ... and [2 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...a83c0db](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **decrease** coverage by `0.02%`.
   > The diff coverage is `22.72%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.23%   40.21%   -0.03%     
   ==========================================
     Files         455      457       +2     
     Lines       53729    54057     +328     
   ==========================================
   + Hits        21620    21738     +118     
   - Misses      32109    32319     +210     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `26.93% <22.72%> (-0.40%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [...ache\_beam/runners/interactive/recording\_manager.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9yZWNvcmRpbmdfbWFuYWdlci5weQ==) | `29.05% <0.00%> (-0.55%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery\_tools.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X3Rvb2xzLnB5) | `28.57% <0.00%> (-0.19%)` | :arrow_down: |
   | [...dks/python/apache\_beam/options/pipeline\_options.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy9waXBlbGluZV9vcHRpb25zLnB5) | `55.98% <0.00%> (-0.14%)` | :arrow_down: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [...apache\_beam/runners/dataflow/internal/apiclient.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kYXRhZmxvdy9pbnRlcm5hbC9hcGljbGllbnQucHk=) | `20.20% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | ... and [9 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...cc30e8a](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485860383



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -41,13 +40,15 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.transforms.stats import ApproximateQuantilesCombineFn
+from apache_beam.transforms.stats import ApproximateUniqueCombineFn
 
 
 class ApproximateUniqueTest(unittest.TestCase):
-  """Unit tests for ApproximateUnique.Globally and ApproximateUnique.PerKey.
-  Hash() with Python3 is nondeterministic, so Approximation algorithm generates
-  different result each time and sometimes error rate is out of range, so add
-  retries for all tests who actually running approximation algorithm."""
+  """Unit tests for ApproximateUnique.Globally, ApproximateUnique.PerKey,
+  and ApproximateUniqueCombineFn.

Review comment:
       Python's built-in `hash()` is nondeterministic across processes, so if the computation is distributed over different machines, the estimates become highly inaccurate.
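
   A small simulation can make this failure mode concrete. The sketch below uses a k-minimum-values style estimator as a stand-in (it is an assumption for illustration, not Beam's implementation), and the `salt` argument is a hypothetical substitute for a per-process hash seed. Two "workers" see the same 10,000 elements; when they hash with different salts, the merged result counts each element roughly twice.

```python
import hashlib
import heapq


def salted_hash(value, salt):
    """64-bit hash of value; `salt` imitates a per-process hash seed."""
    data = f"{salt}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def smallest_hashes(values, salt, k):
    """Per-worker partial result: the k smallest distinct hashes seen."""
    return heapq.nsmallest(k, {salted_hash(v, salt) for v in values})


def estimate(merged_hashes, k):
    """Estimate the distinct count from the merged partial results."""
    smallest = heapq.nsmallest(k, set(merged_hashes))
    if len(smallest) < k:
        return len(smallest)  # fewer than k distinct hashes: count is exact
    # The k-th smallest of n uniform 64-bit hashes sits near k/n of the range.
    return (k - 1) * 2**64 // smallest[-1]


elements = list(range(10_000))  # both workers see the same elements
K = 256

# Same salt on both workers: identical hashes collapse when merged.
same_seed = estimate(
    smallest_hashes(elements, "a", K) + smallest_hashes(elements, "a", K), K)
# Different salts: every element yields two unrelated hashes.
diff_seed = estimate(
    smallest_hashes(elements, "a", K) + smallest_hashes(elements, "b", K), K)

print("same seed:", same_seed)
print("different seeds:", diff_seed)
```

   With a shared salt the merge behaves like a single worker; with different salts the estimator has no way to tell that the two workers hashed the same elements.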







[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485855090



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -41,13 +40,15 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.transforms.stats import ApproximateQuantilesCombineFn
+from apache_beam.transforms.stats import ApproximateUniqueCombineFn
 
 
 class ApproximateUniqueTest(unittest.TestCase):
-  """Unit tests for ApproximateUnique.Globally and ApproximateUnique.PerKey.
-  Hash() with Python3 is nondeterministic, so Approximation algorithm generates
-  different result each time and sometimes error rate is out of range, so add
-  retries for all tests who actually running approximation algorithm."""
+  """Unit tests for ApproximateUnique.Globally, ApproximateUnique.PerKey,
+  and ApproximateUniqueCombineFn.

Review comment:
       We can still do the Dataflow import with a new dependency. AFAIK, we noticed that Google-internal tests for Beam on Windows were failing, so we thought that adding mmh3 would be a concern for all Beam Windows users. I no longer see Windows failures in the precommit tests running on this PR.
   







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **increase** coverage by `41.80%`.
   > The diff coverage is `86.36%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #12756       +/-   ##
   ===========================================
   + Coverage   40.23%   82.04%   +41.80%     
   ===========================================
     Files         455      457        +2     
     Lines       53729    54086      +357     
   ===========================================
   + Hits        21620    44375    +22755     
   + Misses      32109     9711    -22398     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `87.39% <86.36%> (+60.06%)` | :arrow_up: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_perf.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19wZXJmLnB5) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/utils/histogram.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaGlzdG9ncmFtLnB5) | `94.28% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/coders/slow\_stream.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Nsb3dfc3RyZWFtLnB5) | `92.43% <0.00%> (+1.68%)` | :arrow_up: |
   | [sdks/python/apache\_beam/utils/profiler.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvcHJvZmlsZXIucHk=) | `32.11% <0.00%> (+1.83%)` | :arrow_up: |
   | [...on/apache\_beam/runners/direct/sdf\_direct\_runner.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3Qvc2RmX2RpcmVjdF9ydW5uZXIucHk=) | `36.21% <0.00%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/coders/coder\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVyX2ltcGwucHk=) | `95.25% <0.00%> (+2.64%)` | :arrow_up: |
   | ... and [274 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...dadcf86](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   







[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483384943



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -160,57 +159,13 @@ def test_get_sample_size_from_est_error(self):
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.05) == 1600
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.01) == 40000
 
-  @unittest.skip(
-      'Skip it because hash function is not good enough. '
-      'TODO: BEAM-7654')
   def test_approximate_unique_global_by_sample_size(self):
     # test if estimation error with a given sample size is not greater than
     # expected max error.
     sample_size = 16
     max_err = 2 / math.sqrt(sample_size)
-    test_input = [
-        4,
-        34,
-        29,
-        46,
-        80,
-        66,
-        51,
-        81,
-        31,
-        9,
-        26,
-        36,
-        10,
-        41,
-        90,
-        35,
-        33,
-        19,
-        88,
-        86,
-        28,
-        93,
-        38,
-        76,
-        15,
-        87,
-        12,
-        39,
-        84,
-        13,
-        32,
-        49,
-        65,
-        100,
-        16,
-        27,
-        23,
-        30,
-        96,
-        54
-    ]
-
+    test_input = list(range(100))

Review comment:
       Yes, all tests always pass in a single run.
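
The assertions in the hunk above (an error of 0.05 maps to a sample size of 1600, and 0.01 maps to 40000) and the test's `max_err = 2 / math.sqrt(sample_size)` are consistent with the relation size = 4 / error². The sketch below works through that arithmetic. The function names are hypothetical, and the formula is inferred from the asserted values rather than copied from the Beam source.

```python
import math


def sample_size_from_error(est_err):
    """Assumed relation: smallest integer size with 2 / sqrt(size) <= est_err,
    i.e. ceil(4 / est_err**2)."""
    return int(math.ceil(4.0 / math.pow(est_err, 2.0)))


def max_error_from_sample_size(sample_size):
    """Inverse direction, as used by the test: max_err = 2 / sqrt(sample_size)."""
    return 2.0 / math.sqrt(sample_size)


if __name__ == "__main__":
    for err in (0.5, 0.05, 0.01):
        print(err, sample_size_from_error(err))
    print(max_error_from_sample_size(16))
```

With `sample_size = 16`, as in the test, the tolerated relative error is 0.5, so the test only checks that the estimate for 100 distinct values lands within 50% of the true count.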







[GitHub] [beam] monicadsong edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685920989


   Also, here are the accuracy stats for the hashlib md5 fingerprint for a dataset with 3B examples with expected error set to 0.01: 
   ![Screen Shot 2020-09-02 at 11 29 32 AM](https://user-images.githubusercontent.com/17239878/92022374-05c0c300-ed10-11ea-9bcf-9ce656bd4b6a.png)
   






[GitHub] [beam] aaltay commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
aaltay commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485984077



##########
File path: sdks/python/container/license_scripts/dep_urls_py.yaml
##########
@@ -77,8 +77,6 @@ pip_dependencies:
     license: "https://raw.githubusercontent.com/mtth/hdfs/master/LICENSE"
   httplib2:
     license: "https://raw.githubusercontent.com/httplib2/httplib2/master/LICENSE"
-  mmh3:

Review comment:
       Actually should we keep this since it will be in the container?







[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r486076366



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -41,355 +41,88 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.transforms.stats import ApproximateQuantilesCombineFn
+from apache_beam.transforms.stats import ApproximateUniqueCombineFn
 
+try:
+  import mmh3
+  mmh3_options = [(mmh3, ), (None, )]
+except ImportError:
+  mmh3_options = [(None, )]
 
-class ApproximateUniqueTest(unittest.TestCase):
-  """Unit tests for ApproximateUnique.Globally and ApproximateUnique.PerKey.
-  Hash() with Python3 is nondeterministic, so Approximation algorithm generates
-  different result each time and sometimes error rate is out of range, so add
-  retries for all tests who actually running approximation algorithm."""
-  def test_approximate_unique_global_by_invalid_size(self):
-    # test if the transformation throws an error as expected with an invalid
-    # small input size (< 16).
-    sample_size = 10
-    test_input = [random.randint(0, 1000) for _ in range(100)]
-
-    with self.assertRaises(ValueError) as e:
-      with TestPipeline() as pipeline:
-        _ = (
-            pipeline
-            | 'create' >> beam.Create(test_input)
-            |
-            'get_estimate' >> beam.ApproximateUnique.Globally(size=sample_size))
-
-    expected_msg = beam.ApproximateUnique._INPUT_SIZE_ERR_MSG % (sample_size)
-
-    assert e.exception.args[0] == expected_msg
-
-  def test_approximate_unique_global_by_invalid_type_size(self):
-    # test if the transformation throws an error as expected with an invalid
-    # type of input size (not int).
-    sample_size = 100.0
-    test_input = [random.randint(0, 1000) for _ in range(100)]
-
-    with self.assertRaises(ValueError) as e:
-      with TestPipeline() as pipeline:
-        _ = (
-            pipeline
-            | 'create' >> beam.Create(test_input)
-            |
-            'get_estimate' >> beam.ApproximateUnique.Globally(size=sample_size))
-
-    expected_msg = beam.ApproximateUnique._INPUT_SIZE_ERR_MSG % (sample_size)
-
-    assert e.exception.args[0] == expected_msg
-
-  def test_approximate_unique_global_by_invalid_small_error(self):
-    # test if the transformation throws an error as expected with an invalid
-    # small input error (< 0.01).
-    est_err = 0.0
-    test_input = [random.randint(0, 1000) for _ in range(100)]
-
-    with self.assertRaises(ValueError) as e:
-      with TestPipeline() as pipeline:
-        _ = (
-            pipeline
-            | 'create' >> beam.Create(test_input)
-            | 'get_estimate' >> beam.ApproximateUnique.Globally(error=est_err))
-
-    expected_msg = beam.ApproximateUnique._INPUT_ERROR_ERR_MSG % (est_err)
-
-    assert e.exception.args[0] == expected_msg
-
-  def test_approximate_unique_global_by_invalid_big_error(self):
-    # test if the transformation throws an error as expected with an invalid
-    # big input error (> 0.50).
-    est_err = 0.6
-    test_input = [random.randint(0, 1000) for _ in range(100)]
-
-    with self.assertRaises(ValueError) as e:
-      with TestPipeline() as pipeline:
-        _ = (
-            pipeline
-            | 'create' >> beam.Create(test_input)
-            | 'get_estimate' >> beam.ApproximateUnique.Globally(error=est_err))
-
-    expected_msg = beam.ApproximateUnique._INPUT_ERROR_ERR_MSG % (est_err)
-
-    assert e.exception.args[0] == expected_msg
-
-  def test_approximate_unique_global_by_invalid_no_input(self):
-    # test if the transformation throws an error as expected with no input.
-    test_input = [random.randint(0, 1000) for _ in range(100)]
-
-    with self.assertRaises(ValueError) as e:
-      with TestPipeline() as pipeline:
-        _ = (
-            pipeline
-            | 'create' >> beam.Create(test_input)
-            | 'get_estimate' >> beam.ApproximateUnique.Globally())
-
-    expected_msg = beam.ApproximateUnique._NO_VALUE_ERR_MSG
-    assert e.exception.args[0] == expected_msg
-
-  def test_approximate_unique_global_by_invalid_both_input(self):
-    # test if the transformation throws an error as expected with multi input.
-    test_input = [random.randint(0, 1000) for _ in range(100)]
-    est_err = 0.2
-    sample_size = 30
-
-    with self.assertRaises(ValueError) as e:
-      with TestPipeline() as pipeline:
-        _ = (
-            pipeline
-            | 'create' >> beam.Create(test_input)
-            | 'get_estimate' >> beam.ApproximateUnique.Globally(
-                size=sample_size, error=est_err))
-
-    expected_msg = beam.ApproximateUnique._MULTI_VALUE_ERR_MSG % (
-        sample_size, est_err)
-
-    assert e.exception.args[0] == expected_msg
-
-  def test_get_sample_size_from_est_error(self):
-    # test if get correct sample size from input error.
-    assert beam.ApproximateUnique._get_sample_size_from_est_error(0.5) == 16
-    assert beam.ApproximateUnique._get_sample_size_from_est_error(0.4) == 25
-    assert beam.ApproximateUnique._get_sample_size_from_est_error(0.2) == 100
-    assert beam.ApproximateUnique._get_sample_size_from_est_error(0.1) == 400
-    assert beam.ApproximateUnique._get_sample_size_from_est_error(0.05) == 1600
-    assert beam.ApproximateUnique._get_sample_size_from_est_error(0.01) == 40000
-
-  @unittest.skip(
-      'Skip it because hash function is not good enough. '
-      'TODO: BEAM-7654')
-  def test_approximate_unique_global_by_sample_size(self):
-    # test if estimation error with a given sample size is not greater than
-    # expected max error.
-    sample_size = 16
-    max_err = 2 / math.sqrt(sample_size)
-    test_input = [
-        4,
-        34,
-        29,
-        46,
-        80,
-        66,
-        51,
-        81,
-        31,
-        9,
-        26,
-        36,
-        10,
-        41,
-        90,
-        35,
-        33,
-        19,
-        88,
-        86,
-        28,
-        93,
-        38,
-        76,
-        15,
-        87,
-        12,
-        39,
-        84,
-        13,
-        32,
-        49,
-        65,
-        100,
-        16,
-        27,
-        23,
-        30,
-        96,
-        54
-    ]
-
-    actual_count = len(set(test_input))
-
-    with TestPipeline() as pipeline:
-      result = (
-          pipeline
-          | 'create' >> beam.Create(test_input)
-          | 'get_estimate' >> beam.ApproximateUnique.Globally(size=sample_size)
-          | 'compare' >> beam.FlatMap(
-              lambda x: [abs(x - actual_count) * 1.0 / actual_count <= max_err])
-      )
-
-      assert_that(result, equal_to([True]), label='assert:global_by_size')
-
-  @retry(reraise=True, stop=stop_after_attempt(5))
-  def test_approximate_unique_global_by_sample_size_with_duplicates(self):
-    # test if estimation error with a given sample size is not greater than
-    # expected max error with duplicated input.
-    sample_size = 30
-    max_err = 2 / math.sqrt(sample_size)
-    test_input = [10] * 50 + [20] * 50
-    actual_count = len(set(test_input))
-
-    with TestPipeline() as pipeline:
-      result = (
-          pipeline
-          | 'create' >> beam.Create(test_input)
-          | 'get_estimate' >> beam.ApproximateUnique.Globally(size=sample_size)
-          | 'compare' >> beam.FlatMap(
-              lambda x: [abs(x - actual_count) * 1.0 / actual_count <= max_err])
-      )
-
-      assert_that(
-          result,
-          equal_to([True]),
-          label='assert:global_by_size_with_duplicates')
-
-  @retry(reraise=True, stop=stop_after_attempt(5))
-  def test_approximate_unique_global_by_sample_size_with_small_population(self):
-    # test if estimation is exactly same to actual value when sample size is
-    # not smaller than population size (sample size > 100% of population).
-    sample_size = 31
-    test_input = [
-        144,
-        160,
-        229,
-        923,
-        390,
-        756,
-        674,
-        769,
-        145,
-        888,
-        809,
-        159,
-        222,
-        101,
-        943,
-        901,
-        876,
-        194,
-        232,
-        631,
-        221,
-        829,
-        965,
-        729,
-        35,
-        33,
-        115,
-        894,
-        827,
-        364
-    ]
-    actual_count = len(set(test_input))
 
-    with TestPipeline() as pipeline:
-      result = (
-          pipeline
-          | 'create' >> beam.Create(test_input)
-          | 'get_estimate' >> beam.ApproximateUnique.Globally(size=sample_size))
-
-      assert_that(
-          result,
-          equal_to([actual_count]),
-          label='assert:global_by_sample_size_with_small_population')
-
-  @unittest.skip(
-      'Skip because hash function is not good enough. '
-      'TODO: BEAM-7654')
-  def test_approximate_unique_global_by_error(self):
-    # test if estimation error from input error is not greater than input error.
-    est_err = 0.3
-    test_input = [
-        291,
-        371,
-        271,
-        126,
-        762,
-        391,
-        222,
-        565,
-        428,
-        786,
-        801,
-        867,
-        337,
-        690,
-        261,
-        436,
-        311,
-        568,
-        946,
-        722,
-        973,
-        386,
-        506,
-        546,
-        991,
-        450,
-        226,
-        889,
-        514,
-        693
-    ]
+@parameterized_class(('sys.modules[\'mmh3\']', ), mmh3_options)
+class ApproximateUniqueTest(unittest.TestCase):
+  """Unit tests for ApproximateUnique.Globally, ApproximateUnique.PerKey,
+  and ApproximateUniqueCombineFn.
+  """
+  random.seed(0)
+  sys.modules['mmh3'] = None

Review comment:
       that was deliberate, but it bothered me as well. done. 
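
For context on the `sys.modules['mmh3'] = None` line in the hunk above: placing `None` in `sys.modules` under a module's name makes a later `import` of that name raise `ImportError`, even when the package is installed. The sketch below shows one scoped way to get the same effect in a test, using `unittest.mock.patch.dict` so the entry is restored afterwards. The function names are hypothetical and this is not the approach the PR ended up with, only an illustration of the mechanism.

```python
import sys
from unittest import mock


def pick_hash_backend():
    """Report which backend an optional-import pattern would select."""
    try:
        import mmh3  # noqa: F401  (optional third-party dependency)
        return "mmh3"
    except ImportError:
        return "md5"


def pick_backend_with_mmh3_hidden():
    """Force the fallback path regardless of whether mmh3 is installed."""
    # A None entry in sys.modules makes `import mmh3` raise ImportError.
    # patch.dict removes or restores the entry when the block exits.
    with mock.patch.dict(sys.modules, {"mmh3": None}):
        return pick_hash_backend()


if __name__ == "__main__":
    print(pick_hash_backend())              # depends on the environment
    print(pick_backend_with_mmh3_hidden())
```

Scoping the override to a `with` block (or to `setUp`/`tearDown`) avoids leaving the modified `sys.modules` entry in place for unrelated tests that run later in the same process.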







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.04%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.26%   +0.04%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53840     +171     
   ==========================================
   + Hits        21587    21678      +91     
   - Misses      32082    32162      +80     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.06% <25.00%> (-0.27%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [setup.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2V0dXAucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | ... and [2 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...3256299](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r482266705



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -496,6 +483,51 @@ def test_approximate_unique_globally_by_error_with_skewed_data(self):
           equal_to([True]),
           label='assert:globally_by_error_with_skewed_data')
 
+  def test_approximate_unique_combine_fn_by_nondeterministic_coder(self):
+    # test if the combiner throws an error with a nondeterministic coder.
+    sample_size = 30
+    coder = coders.Base64PickleCoder()
+
+    with self.assertRaises(ValueError) as e:
+      _ = ApproximateUniqueCombineFn(sample_size, coder)
+
+    self.assertRegex(
+        e.exception.args[0],
+        'The key coder "Base64PickleCoder" '
+        'for ApproximateUniqueCombineFn is not deterministic.')
+
+  def test_approximate_unique_combine_fn_add_values(self):

Review comment:
       done.







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.03%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.25%   +0.03%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53798     +129     
   ==========================================
   + Hits        21587    21659      +72     
   - Misses      32082    32139      +57     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.06% <25.00%> (-0.27%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [setup.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2V0dXAucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | ... and [1 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...7d34751](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.23%   +0.01%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53729      +60     
   ==========================================
   + Hits        21587    21620      +33     
   - Misses      32082    32109      +27     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...a017d6b](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.04%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.26%   +0.04%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53840     +171     
   ==========================================
   + Hits        21587    21678      +91     
   - Misses      32082    32162      +80     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.06% <25.00%> (-0.27%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [setup.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2V0dXAucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | ... and [2 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...3256299](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485853260



##########
File path: sdks/python/setup.py
##########
@@ -165,6 +165,7 @@ def get_version():
     'requests>=2.24.0,<3.0.0',
     'typing>=3.7.0,<3.8.0; python_full_version < "3.5.3"',
     'typing-extensions>=3.7.0,<3.8.0',
+    'mmh3>=2.5.1,<2.5.2',

Review comment:
       (I thought I had already added this comment here, but for some reason I don't see it...)
   Let's make the upper bound more flexible, `mmh3>=2.5.1,<3.0`, or remove the obligatory dependency on mmh3, as we do for snappy. @aaltay, do you have a preference on this?
   For the record, Windows tests on the PR are passing. AFAIK, we previously did not add a dependency on mmh3 because we observed installation errors in Google-internal Windows tests.
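
The "optional dependency" alternative mentioned in this comment could look roughly like the sketch below. It is only an illustration: `stable_hash64` is a hypothetical helper name, and the fallback to `hashlib.md5` is an assumption taken from the PR description, not the code that was merged.

```python
import hashlib

try:
    import mmh3  # optional third-party dependency; may fail to install on some platforms
except ImportError:
    mmh3 = None


def stable_hash64(data: bytes) -> int:
    """Return a deterministic, non-negative 64-bit hash of `data` (hypothetical helper)."""
    if mmh3 is not None:
        # mmh3.hash64 returns two signed 64-bit ints; mask the first one to unsigned.
        return mmh3.hash64(data)[0] & 0xFFFFFFFFFFFFFFFF
    # Fallback: take the first 8 bytes of the MD5 digest as an unsigned integer.
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")
```

Note that the two branches produce different values for the same input, so a pipeline would need every worker to use the same backend for the hashes to be comparable.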







[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483159262



##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -192,7 +193,7 @@ def get_estimate(self):
     if len(self._sample_heap) < self._sample_size:
       return len(self._sample_heap)
     else:
-      sample_space_size = sys.maxsize - 1.0 * self._min_hash
+      sample_space_size = self._HASH_SPACE_SIZE - 1.0 * self._min_hash

Review comment:
       Valentyn, you're right, I forgot to update `self._HASH_SPACE_SIZE`. I think the original value was chosen because Python's `hash()` can return negative values, so the total hash space ranges over (-sys.maxsize, sys.maxsize).
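
The two properties of the built-in `hash()` that matter in this thread (it can be negative, and for strings it varies between interpreter processes) can be checked with a short stdlib-only sketch. The helper `hash_in_subprocess` is a hypothetical name used only for this illustration, and the behaviour shown is that of CPython.

```python
import os
import subprocess
import sys

# CPython hashes small integers to themselves, so a negative int has a negative hash.
negative_example = hash(-5)

# Width of a range running from -sys.maxsize to sys.maxsize, as discussed above.
signed_hash_space = 2.0 * sys.maxsize


def hash_in_subprocess(seed: int, text: str = "beam") -> int:
    """Return hash(text) as computed by a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

Calling `hash_in_subprocess` with two different seeds is expected to give two different values for the same string, which is why the built-in cannot be relied on when hashes from different workers are combined.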







[GitHub] [beam] monicadsong commented on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-690811376


   thank you all for reviewing :)





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **decrease** coverage by `0.02%`.
   > The diff coverage is `22.72%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.23%   40.21%   -0.03%     
   ==========================================
     Files         455      457       +2     
     Lines       53729    54057     +328     
   ==========================================
   + Hits        21620    21738     +118     
   - Misses      32109    32319     +210     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `26.93% <22.72%> (-0.40%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [...ache\_beam/runners/interactive/recording\_manager.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9yZWNvcmRpbmdfbWFuYWdlci5weQ==) | `29.05% <0.00%> (-0.55%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery\_tools.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X3Rvb2xzLnB5) | `28.57% <0.00%> (-0.19%)` | :arrow_down: |
   | [...dks/python/apache\_beam/options/pipeline\_options.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy9waXBlbGluZV9vcHRpb25zLnB5) | `55.98% <0.00%> (-0.14%)` | :arrow_down: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [...apache\_beam/runners/dataflow/internal/apiclient.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kYXRhZmxvdy9pbnRlcm5hbC9hcGljbGllbnQucHk=) | `20.20% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | ... and [9 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...fa23439](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483158065



##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -192,7 +193,7 @@ def get_estimate(self):
     if len(self._sample_heap) < self._sample_size:
       return len(self._sample_heap)
     else:
-      sample_space_size = sys.maxsize - 1.0 * self._min_hash
+      sample_space_size = self._HASH_SPACE_SIZE - 1.0 * self._min_hash

Review comment:
       Ah, that was because Python's `hash()` can return negative values. So yes, I think we need to set `self._min_hash = 2**64`, given that all hash values are now positive.
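
To illustrate the point above, here is a minimal sketch of a deterministic, non-negative 64-bit hash with a matching sentinel for the running minimum. The names `stable_hash64` and `HASH_SPACE_SIZE` are hypothetical and are not taken from `stats.py`; truncating an MD5 digest to 8 bytes is an assumption about one way to get a 64-bit value.

```python
import hashlib

HASH_SPACE_SIZE = 2**64  # assumed: size of an unsigned 64-bit hash space


def stable_hash64(value):
  """Return a deterministic, non-negative 64-bit hash (illustrative helper)."""
  digest = hashlib.md5(repr(value).encode('utf-8')).digest()
  # Read the first 8 bytes of the 16-byte MD5 digest as an unsigned integer.
  return int.from_bytes(digest[:8], byteorder='big')


# Sentinel that is strictly greater than any value stable_hash64 can return.
min_hash = HASH_SPACE_SIZE
for element in ['a', 'b', 'c']:
  min_hash = min(min_hash, stable_hash64(element))
```

Because every hash falls in `[0, 2**64)`, starting the minimum at `2**64` guarantees the first element always replaces the sentinel.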







[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483113006



##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -192,7 +193,7 @@ def get_estimate(self):
     if len(self._sample_heap) < self._sample_size:
       return len(self._sample_heap)
     else:
-      sample_space_size = sys.maxsize - 1.0 * self._min_hash
+      sample_space_size = self._HASH_SPACE_SIZE - 1.0 * self._min_hash

Review comment:
       I think we need to update the initialization of `self._min_hash`.
   
   @Hannah-Jiang do you by chance remember the logic behind setting
    `_HASH_SPACE_SIZE = 2.0 * sys.maxsize`
    `self._min_hash = sys.maxsize`?
   
   Shouldn't we initialize `self._min_hash` with `self._HASH_SPACE_SIZE`?
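
For reference, a small sketch of the arithmetic behind this question. It assumes a 64-bit CPython build, where `sys.maxsize` is `2**63 - 1`; the variable names are illustrative and do not come from `stats.py`.

```python
import sys

# On 64-bit CPython, sys.maxsize is 2**63 - 1, and hash() results fit in
# the signed range [-sys.maxsize - 1, sys.maxsize].
signed_max = sys.maxsize
signed_min = -sys.maxsize - 1

# Width of the signed range, i.e. the number of distinct values it holds.
signed_range_width = signed_max - signed_min + 1

# An unsigned 64-bit hash covers the same number of values but starts at 0.
unsigned_upper_bound = 2**64
```

Under that assumption, a sentinel of `sys.maxsize` is an upper bound for signed hashes but not for unsigned ones, which is why the initial value of the running minimum matters once the hash function changes.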







[GitHub] [beam] monicadsong commented on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-688975853


   PTAL! I've added an optional import for mmh3, since mmh3 is significantly faster than md5.
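
An optional import of this kind is commonly written as a try/except around the import, with a standard-library fallback. The sketch below is an assumption about the general pattern, not a copy of the PR's code; `hash64` is a hypothetical helper name.

```python
import hashlib

try:
  import mmh3  # optional third-party dependency
except ImportError:
  mmh3 = None


def hash64(data):
  """Hash bytes to a non-negative integer below 2**64 (illustrative helper)."""
  if mmh3 is not None:
    # mmh3.hash64 returns a pair of 64-bit integers; signed=False keeps
    # them non-negative. Only the first one is used here.
    return mmh3.hash64(data, seed=0, signed=False)[0]
  # Fallback: first 8 bytes of the MD5 digest, read as an unsigned integer.
  return int.from_bytes(hashlib.md5(data).digest()[:8], byteorder='big')
```

Both branches return values in the same range, so callers do not need to know which hash library is installed. The two branches do produce different hash values for the same input, so results are only comparable within one environment.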








[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485985426



##########
File path: sdks/python/container/license_scripts/dep_urls_py.yaml
##########
@@ -77,8 +77,6 @@ pip_dependencies:
     license: "https://raw.githubusercontent.com/mtth/hdfs/master/LICENSE"
   httplib2:
     license: "https://raw.githubusercontent.com/httplib2/httplib2/master/LICENSE"
-  mmh3:

Review comment:
       oh whoops, will add that back in







[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485982785



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -41,355 +41,88 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.transforms.stats import ApproximateQuantilesCombineFn
+from apache_beam.transforms.stats import ApproximateUniqueCombineFn
 
+try:
+  import mmh3
+  mmh3_options = [(mmh3, ), (None, )]
+except ImportError:
+  mmh3_options = [(None, )]
 
-class ApproximateUniqueTest(unittest.TestCase):
-  """Unit tests for ApproximateUnique.Globally and ApproximateUnique.PerKey.
-  Hash() with Python3 is nondeterministic, so Approximation algorithm generates
-  different result each time and sometimes error rate is out of range, so add
-  retries for all tests who actually running approximation algorithm."""
-  def test_approximate_unique_global_by_invalid_size(self):
-    # test if the transformation throws an error as expected with an invalid
-    # small input size (< 16).
-    sample_size = 10
-    test_input = [random.randint(0, 1000) for _ in range(100)]
-
-    with self.assertRaises(ValueError) as e:
-      with TestPipeline() as pipeline:
-        _ = (
-            pipeline
-            | 'create' >> beam.Create(test_input)
-            |
-            'get_estimate' >> beam.ApproximateUnique.Globally(size=sample_size))
-
-    expected_msg = beam.ApproximateUnique._INPUT_SIZE_ERR_MSG % (sample_size)
-
-    assert e.exception.args[0] == expected_msg
-
-  def test_approximate_unique_global_by_invalid_type_size(self):
-    # test if the transformation throws an error as expected with an invalid
-    # type of input size (not int).
-    sample_size = 100.0
-    test_input = [random.randint(0, 1000) for _ in range(100)]
-
-    with self.assertRaises(ValueError) as e:
-      with TestPipeline() as pipeline:
-        _ = (
-            pipeline
-            | 'create' >> beam.Create(test_input)
-            |
-            'get_estimate' >> beam.ApproximateUnique.Globally(size=sample_size))
-
-    expected_msg = beam.ApproximateUnique._INPUT_SIZE_ERR_MSG % (sample_size)
-
-    assert e.exception.args[0] == expected_msg
-
-  def test_approximate_unique_global_by_invalid_small_error(self):
-    # test if the transformation throws an error as expected with an invalid
-    # small input error (< 0.01).
-    est_err = 0.0
-    test_input = [random.randint(0, 1000) for _ in range(100)]
-
-    with self.assertRaises(ValueError) as e:
-      with TestPipeline() as pipeline:
-        _ = (
-            pipeline
-            | 'create' >> beam.Create(test_input)
-            | 'get_estimate' >> beam.ApproximateUnique.Globally(error=est_err))
-
-    expected_msg = beam.ApproximateUnique._INPUT_ERROR_ERR_MSG % (est_err)
-
-    assert e.exception.args[0] == expected_msg
-
-  def test_approximate_unique_global_by_invalid_big_error(self):
-    # test if the transformation throws an error as expected with an invalid
-    # big input error (> 0.50).
-    est_err = 0.6
-    test_input = [random.randint(0, 1000) for _ in range(100)]
-
-    with self.assertRaises(ValueError) as e:
-      with TestPipeline() as pipeline:
-        _ = (
-            pipeline
-            | 'create' >> beam.Create(test_input)
-            | 'get_estimate' >> beam.ApproximateUnique.Globally(error=est_err))
-
-    expected_msg = beam.ApproximateUnique._INPUT_ERROR_ERR_MSG % (est_err)
-
-    assert e.exception.args[0] == expected_msg
-
-  def test_approximate_unique_global_by_invalid_no_input(self):
-    # test if the transformation throws an error as expected with no input.
-    test_input = [random.randint(0, 1000) for _ in range(100)]
-
-    with self.assertRaises(ValueError) as e:
-      with TestPipeline() as pipeline:
-        _ = (
-            pipeline
-            | 'create' >> beam.Create(test_input)
-            | 'get_estimate' >> beam.ApproximateUnique.Globally())
-
-    expected_msg = beam.ApproximateUnique._NO_VALUE_ERR_MSG
-    assert e.exception.args[0] == expected_msg
-
-  def test_approximate_unique_global_by_invalid_both_input(self):
-    # test if the transformation throws an error as expected with multi input.
-    test_input = [random.randint(0, 1000) for _ in range(100)]
-    est_err = 0.2
-    sample_size = 30
-
-    with self.assertRaises(ValueError) as e:
-      with TestPipeline() as pipeline:
-        _ = (
-            pipeline
-            | 'create' >> beam.Create(test_input)
-            | 'get_estimate' >> beam.ApproximateUnique.Globally(
-                size=sample_size, error=est_err))
-
-    expected_msg = beam.ApproximateUnique._MULTI_VALUE_ERR_MSG % (
-        sample_size, est_err)
-
-    assert e.exception.args[0] == expected_msg
-
-  def test_get_sample_size_from_est_error(self):
-    # test if get correct sample size from input error.
-    assert beam.ApproximateUnique._get_sample_size_from_est_error(0.5) == 16
-    assert beam.ApproximateUnique._get_sample_size_from_est_error(0.4) == 25
-    assert beam.ApproximateUnique._get_sample_size_from_est_error(0.2) == 100
-    assert beam.ApproximateUnique._get_sample_size_from_est_error(0.1) == 400
-    assert beam.ApproximateUnique._get_sample_size_from_est_error(0.05) == 1600
-    assert beam.ApproximateUnique._get_sample_size_from_est_error(0.01) == 40000
-
-  @unittest.skip(
-      'Skip it because hash function is not good enough. '
-      'TODO: BEAM-7654')
-  def test_approximate_unique_global_by_sample_size(self):
-    # test if estimation error with a given sample size is not greater than
-    # expected max error.
-    sample_size = 16
-    max_err = 2 / math.sqrt(sample_size)
-    test_input = [
-        4,
-        34,
-        29,
-        46,
-        80,
-        66,
-        51,
-        81,
-        31,
-        9,
-        26,
-        36,
-        10,
-        41,
-        90,
-        35,
-        33,
-        19,
-        88,
-        86,
-        28,
-        93,
-        38,
-        76,
-        15,
-        87,
-        12,
-        39,
-        84,
-        13,
-        32,
-        49,
-        65,
-        100,
-        16,
-        27,
-        23,
-        30,
-        96,
-        54
-    ]
-
-    actual_count = len(set(test_input))
-
-    with TestPipeline() as pipeline:
-      result = (
-          pipeline
-          | 'create' >> beam.Create(test_input)
-          | 'get_estimate' >> beam.ApproximateUnique.Globally(size=sample_size)
-          | 'compare' >> beam.FlatMap(
-              lambda x: [abs(x - actual_count) * 1.0 / actual_count <= max_err])
-      )
-
-      assert_that(result, equal_to([True]), label='assert:global_by_size')
-
-  @retry(reraise=True, stop=stop_after_attempt(5))
-  def test_approximate_unique_global_by_sample_size_with_duplicates(self):
-    # test if estimation error with a given sample size is not greater than
-    # expected max error with duplicated input.
-    sample_size = 30
-    max_err = 2 / math.sqrt(sample_size)
-    test_input = [10] * 50 + [20] * 50
-    actual_count = len(set(test_input))
-
-    with TestPipeline() as pipeline:
-      result = (
-          pipeline
-          | 'create' >> beam.Create(test_input)
-          | 'get_estimate' >> beam.ApproximateUnique.Globally(size=sample_size)
-          | 'compare' >> beam.FlatMap(
-              lambda x: [abs(x - actual_count) * 1.0 / actual_count <= max_err])
-      )
-
-      assert_that(
-          result,
-          equal_to([True]),
-          label='assert:global_by_size_with_duplicates')
-
-  @retry(reraise=True, stop=stop_after_attempt(5))
-  def test_approximate_unique_global_by_sample_size_with_small_population(self):
-    # test if estimation is exactly same to actual value when sample size is
-    # not smaller than population size (sample size > 100% of population).
-    sample_size = 31
-    test_input = [
-        144,
-        160,
-        229,
-        923,
-        390,
-        756,
-        674,
-        769,
-        145,
-        888,
-        809,
-        159,
-        222,
-        101,
-        943,
-        901,
-        876,
-        194,
-        232,
-        631,
-        221,
-        829,
-        965,
-        729,
-        35,
-        33,
-        115,
-        894,
-        827,
-        364
-    ]
-    actual_count = len(set(test_input))
 
-    with TestPipeline() as pipeline:
-      result = (
-          pipeline
-          | 'create' >> beam.Create(test_input)
-          | 'get_estimate' >> beam.ApproximateUnique.Globally(size=sample_size))
-
-      assert_that(
-          result,
-          equal_to([actual_count]),
-          label='assert:global_by_sample_size_with_small_population')
-
-  @unittest.skip(
-      'Skip because hash function is not good enough. '
-      'TODO: BEAM-7654')
-  def test_approximate_unique_global_by_error(self):
-    # test if estimation error from input error is not greater than input error.
-    est_err = 0.3
-    test_input = [
-        291,
-        371,
-        271,
-        126,
-        762,
-        391,
-        222,
-        565,
-        428,
-        786,
-        801,
-        867,
-        337,
-        690,
-        261,
-        436,
-        311,
-        568,
-        946,
-        722,
-        973,
-        386,
-        506,
-        546,
-        991,
-        450,
-        226,
-        889,
-        514,
-        693
-    ]
+@parameterized_class(('sys.modules[\'mmh3\']', ), mmh3_options)
+class ApproximateUniqueTest(unittest.TestCase):
+  """Unit tests for ApproximateUnique.Globally, ApproximateUnique.PerKey,
+  and ApproximateUniqueCombineFn.
+  """
+  random.seed(0)
+  sys.modules['mmh3'] = None

Review comment:
       Is this line a left-over?
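
For context on the line in question: assigning `None` to an entry in `sys.modules` makes a later `import` of that name raise `ImportError`, which is one way a test can force a fallback code path. A minimal sketch, using a made-up module name rather than `mmh3` itself:

```python
import sys

# 'fake_fast_hash' is a hypothetical module name used only for illustration.
sys.modules['fake_fast_hash'] = None

try:
  import fake_fast_hash  # raises ImportError because the entry is None
  have_fast_hash = True
except ImportError:
  have_fast_hash = False

# Remove the sentinel so it does not leak into other code.
del sys.modules['fake_fast_hash']
```

Set unconditionally at class level, such an assignment would hide the real module from every test in the class, including the parameterized variant that expects it to be present.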







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [not ready for review] [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `20.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.23%   +0.01%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53734      +65     
   ==========================================
   + Hits        21587    21620      +33     
   - Misses      32082    32114      +32     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <28.57%> (+0.05%)` | :arrow_up: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...498a896](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   








[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.04%`.
   > The diff coverage is `25.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.26%   +0.04%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53783     +114     
   ==========================================
   + Hits        21587    21656      +69     
   - Misses      32082    32127      +45     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | [sdks/python/apache\_beam/transforms/core.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb3JlLnB5) | `39.22% <0.00%> (+0.94%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...3ee3de1](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] monicadsong edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685895179


   R: @tvalentyn -- I can't seem to pass the codecov/patch check, but I have added unit tests for the combiner that should cover the highlighted paths. Also, I think the other 3 failing tests are failing for other merged PRs as well.









[GitHub] [beam] tvalentyn commented on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-690807589


   LGTM, thanks a lot, @monicadsong!





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.23%   +0.01%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53729      +60     
   ==========================================
   + Hits        21587    21620      +33     
   - Misses      32082    32109      +27     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...32f6319](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485855090



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -41,13 +40,15 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.transforms.stats import ApproximateQuantilesCombineFn
+from apache_beam.transforms.stats import ApproximateUniqueCombineFn
 
 
 class ApproximateUniqueTest(unittest.TestCase):
-  """Unit tests for ApproximateUnique.Globally and ApproximateUnique.PerKey.
-  Hash() with Python3 is nondeterministic, so Approximation algorithm generates
-  different result each time and sometimes error rate is out of range, so add
-  retries for all tests who actually running approximation algorithm."""
+  """Unit tests for ApproximateUnique.Globally, ApproximateUnique.PerKey,
+  and ApproximateUniqueCombineFn.

Review comment:
       We can still do the Dataflow import with a new dependency. AFAIK, we noticed that Google-internal tests for Beam on Windows were failing, so we thought that adding mmh3 would be a concern for all Beam Windows users. I no longer see failures in the Windows precommit tests running on this PR.
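
For context on the determinism issue this PR addresses: Python 3 randomizes the built-in `hash()` of strings per process unless `PYTHONHASHSEED` is fixed, so an estimator built on it can give different results from run to run. The sketch below is not the code merged in this PR; it only illustrates, under assumptions, how a stable 64-bit hash could be derived from `hashlib.md5`, as the PR description suggests. The helper name `deterministic_hash` and the choice to keep the first 8 digest bytes are hypothetical.

```python
import hashlib
import struct


def deterministic_hash(element: str) -> int:
    """Hypothetical helper: map a string to a stable signed 64-bit integer.

    Unlike the built-in hash(), the result does not depend on the
    per-process hash seed, so it is identical across runs and workers.
    """
    digest = hashlib.md5(element.encode("utf-8")).digest()
    # Assumption for illustration: keep the first 8 of the 16 digest bytes
    # and interpret them as a little-endian signed 64-bit integer.
    return struct.unpack("<q", digest[:8])[0]


if __name__ == "__main__":
    # The same input always yields the same value, in any process.
    print(deterministic_hash("beam") == deterministic_hash("beam"))
```

A third-party hash such as mmh3 would be faster, but as the comment above notes, it adds a dependency that has to install cleanly on every supported platform; `hashlib` ships with the standard library.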
   







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `20.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.23%   +0.01%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53734      +65     
   ==========================================
   + Hits        21587    21620      +33     
   - Misses      32082    32114      +32     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <28.57%> (+0.05%)` | :arrow_up: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...194dba3](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.22%   40.21%   -0.01%     
   ==========================================
     Files         454      454              
     Lines       53669    53681      +12     
   ==========================================
   + Hits        21587    21588       +1     
   - Misses      32082    32093      +11     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...175aaec](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] tvalentyn merged pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn merged pull request #12756:
URL: https://github.com/apache/beam/pull/12756


   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **decrease** coverage by `0.02%`.
   > The diff coverage is `22.72%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.23%   40.21%   -0.03%     
   ==========================================
     Files         455      457       +2     
     Lines       53729    54057     +328     
   ==========================================
   + Hits        21620    21738     +118     
   - Misses      32109    32319     +210     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `26.93% <22.72%> (-0.40%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [...ache\_beam/runners/interactive/recording\_manager.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9yZWNvcmRpbmdfbWFuYWdlci5weQ==) | `29.05% <0.00%> (-0.55%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery\_tools.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X3Rvb2xzLnB5) | `28.57% <0.00%> (-0.19%)` | :arrow_down: |
   | [...dks/python/apache\_beam/options/pipeline\_options.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy9waXBlbGluZV9vcHRpb25zLnB5) | `55.98% <0.00%> (-0.14%)` | :arrow_down: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [...apache\_beam/runners/dataflow/internal/apiclient.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kYXRhZmxvdy9pbnRlcm5hbC9hcGljbGllbnQucHk=) | `20.20% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | ... and [9 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...ea1231a](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **increase** coverage by `41.80%`.
   > The diff coverage is `86.36%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #12756       +/-   ##
   ===========================================
   + Coverage   40.23%   82.04%   +41.80%     
   ===========================================
     Files         455      457        +2     
     Lines       53729    54086      +357     
   ===========================================
   + Hits        21620    44375    +22755     
   + Misses      32109     9711    -22398     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `87.39% <86.36%> (+60.06%)` | :arrow_up: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/utils/histogram.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaGlzdG9ncmFtLnB5) | `94.28% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_perf.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19wZXJmLnB5) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/coders/slow\_stream.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Nsb3dfc3RyZWFtLnB5) | `92.43% <0.00%> (+1.68%)` | :arrow_up: |
   | [sdks/python/apache\_beam/utils/profiler.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvcHJvZmlsZXIucHk=) | `32.11% <0.00%> (+1.83%)` | :arrow_up: |
   | [...on/apache\_beam/runners/direct/sdf\_direct\_runner.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3Qvc2RmX2RpcmVjdF9ydW5uZXIucHk=) | `36.21% <0.00%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/coders/coder\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVyX2ltcGwucHk=) | `95.25% <0.00%> (+2.64%)` | :arrow_up: |
   | ... and [274 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...dadcf86](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.03%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.25%   +0.03%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53798     +129     
   ==========================================
   + Hits        21587    21659      +72     
   - Misses      32082    32139      +57     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.06% <25.00%> (-0.27%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [setup.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2V0dXAucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | ... and [1 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...3256299](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] monicadsong commented on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685920989


   Also, here are the accuracy stats with the expected error set to 0.01:
   ![Screen Shot 2020-09-02 at 11 29 32 AM](https://user-images.githubusercontent.com/17239878/92022374-05c0c300-ed10-11ea-9bcf-9ce656bd4b6a.png)
   





[GitHub] [beam] monicadsong commented on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685288813


   R: @tvalentyn 
   Went with hashlib instead. 
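
A minimal sketch of why a `hashlib` digest is reproducible where the built-in `hash()` is not: for `str` and `bytes`, `hash()` is salted per interpreter process unless `PYTHONHASHSEED` is fixed, while an MD5 digest depends only on the input bytes. The helper name `deterministic_hash` and the choice to keep the first 8 bytes of the digest are assumptions made for illustration, not necessarily what the PR does.

```python
import hashlib


def deterministic_hash(value) -> int:
    """Map a str or bytes value to a 64-bit integer that is stable across runs.

    Hypothetical helper: the name and the 8-byte truncation are illustrative.
    """
    if isinstance(value, str):
        value = value.encode("utf-8")
    # The MD5 digest is a pure function of the input bytes, so every worker
    # process computes the same integer for the same element.
    return int.from_bytes(hashlib.md5(value).digest()[:8], "big")


if __name__ == "__main__":
    # Built-in hash() of a str may differ between interpreter runs;
    # the MD5-based value does not.
    print(hash("beam"), deterministic_hash("beam"))
```

Running the script twice should print a different first number each time (unless `PYTHONHASHSEED` is set) and the same second number.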





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.04%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.26%   +0.04%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53840     +171     
   ==========================================
   + Hits        21587    21678      +91     
   - Misses      32082    32162      +80     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.06% <25.00%> (-0.27%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [setup.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2V0dXAucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | ... and [2 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...3256299](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] monicadsong commented on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-686873008


   A benchmark on a 3B dataset showed accurate results. I also modified the unit test to check the expected error. The error for this algorithm is an expected error, not a maximum error, so certain inputs will exceed it.
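
The distinction between an expected error and a maximum error can be illustrated with a small experiment. The sketch below uses a generic k-minimum-values estimator, which is an assumption chosen for brevity and not Beam's actual `ApproximateUniqueCombineFn` code; the names `hash64`, `kmv_estimate`, and `relative_errors` are hypothetical. Varying the salt changes which hash values the inputs receive, so each trial gives a different error, and the largest error over the trials is generally above the average.

```python
import hashlib
import heapq


def hash64(value: str, salt: int) -> int:
    """Deterministic 64-bit hash; the salt simulates a different hash function."""
    data = f"{salt}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def kmv_estimate(values, k: int, salt: int = 0) -> float:
    """Estimate the number of distinct values from the k smallest hashes."""
    if k < 2:
        raise ValueError("k must be at least 2")
    hashes = {hash64(v, salt) for v in values}
    if len(hashes) <= k:
        # Fewer distinct hashes than the sample size: the count is exact.
        return float(len(hashes))
    kth_smallest = heapq.nsmallest(k, hashes)[-1]
    # Standard KMV estimator: (k - 1) divided by the normalized k-th minimum.
    return (k - 1) * 2.0**64 / kth_smallest


def relative_errors(n: int, k: int, trials: int):
    """Relative error of the estimate for n distinct inputs, one per salt."""
    values = [str(i) for i in range(n)]
    return [
        abs(kmv_estimate(values, k, salt) - n) / n for salt in range(trials)
    ]


if __name__ == "__main__":
    errors = relative_errors(n=5000, k=400, trials=20)
    print("mean relative error:", sum(errors) / len(errors))
    print("max relative error: ", max(errors))
```

A test that asserts every single input stays under the expected error would therefore fail intermittently; checking an average over many inputs, or allowing a looser bound, matches what the estimator promises.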





[GitHub] [beam] monicadsong commented on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-689070359


   Run Portable_Python PreCommit





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **increase** coverage by `41.80%`.
   > The diff coverage is `86.36%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #12756       +/-   ##
   ===========================================
   + Coverage   40.23%   82.04%   +41.80%     
   ===========================================
     Files         455      457        +2     
     Lines       53729    54086      +357     
   ===========================================
   + Hits        21620    44375    +22755     
   + Misses      32109     9711    -22398     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `87.39% <86.36%> (+60.06%)` | :arrow_up: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_perf.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19wZXJmLnB5) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/utils/histogram.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaGlzdG9ncmFtLnB5) | `94.28% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/coders/slow\_stream.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Nsb3dfc3RyZWFtLnB5) | `92.43% <0.00%> (+1.68%)` | :arrow_up: |
   | [sdks/python/apache\_beam/utils/profiler.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvcHJvZmlsZXIucHk=) | `32.11% <0.00%> (+1.83%)` | :arrow_up: |
   | [...on/apache\_beam/runners/direct/sdf\_direct\_runner.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3Qvc2RmX2RpcmVjdF9ydW5uZXIucHk=) | `36.21% <0.00%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/coders/coder\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVyX2ltcGwucHk=) | `95.25% <0.00%> (+2.64%)` | :arrow_up: |
   | ... and [274 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...dadcf86](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] aaltay commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
aaltay commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485870278



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -41,13 +40,15 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.transforms.stats import ApproximateQuantilesCombineFn
+from apache_beam.transforms.stats import ApproximateUniqueCombineFn
 
 
 class ApproximateUniqueTest(unittest.TestCase):
-  """Unit tests for ApproximateUnique.Globally and ApproximateUnique.PerKey.
-  Hash() with Python3 is nondeterministic, so Approximation algorithm generates
-  different result each time and sometimes error rate is out of range, so add
-  retries for all tests who actually running approximation algorithm."""
+  """Unit tests for ApproximateUnique.Globally, ApproximateUnique.PerKey,
+  and ApproximateUniqueCombineFn.

Review comment:
       I did not know that the Python 3 hash was nondeterministic. That is strange.
   
   In that case I agree with this change. Let's make it optional, so that the getting-started experience does not change and any actual execution environment can install it as needed (i.e., add it to https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt in this PR).







[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485218862



##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -46,6 +57,35 @@
 V = typing.TypeVar('V')
 
 
+def _default_hash_fn(value):
+  """Hash value using either murmurhash or md5 based on installation."""
+  if not _default_hash_fn.fn:
+    try:
+      import mmh3  # pylint: disable=import-error
+
+      def _mmh3_hash(value):
+        # mmh3.hash64 returns 2 64-bit unsigned integers
+        return mmh3.hash64(value, seed=0, signed=False)[0]
+
+      _default_hash_fn.fn = _mmh3_hash
+    except ImportError:
+      logging.warning(
+          'Couldn\'t find murmurhash so the implementation of '
+          'ApproximateUnique is not as fast as it could be.')

Review comment:
       `Couldn't find murmurhash. Install mmh3 for a faster implementation of ApproximateUnique.`
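
The optional-dependency pattern under review can be sketched as a standalone function. This is an illustration under stated assumptions, not the code that was merged: the public name `default_hash_fn`, the helper `_md5_hash`, and the 8-byte truncation of the MD5 digest are hypothetical, while the `mmh3.hash64` call and the warning text follow the quoted diff and the suggestion above.

```python
import hashlib
import logging

_LOGGER = logging.getLogger(__name__)


def _md5_hash(value: bytes) -> int:
    # Fallback: first 8 bytes of the MD5 digest as an unsigned 64-bit integer.
    return int.from_bytes(hashlib.md5(value).digest()[:8], "big")


def default_hash_fn(value: bytes) -> int:
    """Hash with murmurhash if mmh3 is installed, otherwise with MD5."""
    if default_hash_fn.fn is None:
        try:
            import mmh3  # optional third-party dependency

            def _mmh3_hash(v: bytes) -> int:
                # hash64 returns a pair of 64-bit integers; keep the first.
                return mmh3.hash64(v, seed=0, signed=False)[0]

            default_hash_fn.fn = _mmh3_hash
        except ImportError:
            _LOGGER.warning(
                "Couldn't find murmurhash. Install mmh3 for a faster "
                "implementation of ApproximateUnique.")
            default_hash_fn.fn = _md5_hash
    return default_hash_fn.fn(value)


# The selected implementation is cached on the function object, so the
# import attempt and the warning happen at most once per process.
default_hash_fn.fn = None
```

Either branch is deterministic across processes; the two branches produce different values from each other, so all workers in one job would need the same set of installed packages for their results to be mergeable.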







[GitHub] [beam] codecov[bot] commented on pull request #12756: [BEAM-10824, BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.22%   40.21%   -0.01%     
   ==========================================
     Files         454      454              
     Lines       53669    53681      +12     
   ==========================================
   + Hits        21587    21588       +1     
   - Misses      32082    32093      +11     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...90ab1bf](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.22%   40.21%   -0.01%     
   ==========================================
     Files         454      454              
     Lines       53669    53681      +12     
   ==========================================
   + Hits        21587    21588       +1     
   - Misses      32082    32093      +11     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...32f6319](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483385510



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -160,57 +159,13 @@ def test_get_sample_size_from_est_error(self):
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.05) == 1600
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.01) == 40000
 
-  @unittest.skip(
-      'Skip it because hash function is not good enough. '
-      'TODO: BEAM-7654')
   def test_approximate_unique_global_by_sample_size(self):
     # test if estimation error with a given sample size is not greater than
     # expected max error.
     sample_size = 16
     max_err = 2 / math.sqrt(sample_size)
-    test_input = [
-        4,
-        34,
-        29,
-        46,
-        80,
-        66,
-        51,
-        81,
-        31,
-        9,
-        26,
-        36,
-        10,
-        41,
-        90,
-        35,
-        33,
-        19,
-        88,
-        86,
-        28,
-        93,
-        38,
-        76,
-        15,
-        87,
-        12,
-        39,
-        84,
-        13,
-        32,
-        49,
-        65,
-        100,
-        16,
-        27,
-        23,
-        30,
-        96,
-        54
-    ]
-
+    test_input = list(range(100))

Review comment:
       But I will add a seed so that the shuffling of the test data is reproducible.
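
A minimal sketch of the seeding idea, assuming the test shuffles `list(range(100))` before using it as input. The helper name `make_test_input` and the seed value are illustrative and are not taken from the PR:

```python
import random


def make_test_input(n=100, seed=0):
  """Return 0..n-1 in a shuffled order that is identical on every run."""
  data = list(range(n))
  # A private Random instance keeps the fixed seed from affecting other tests.
  random.Random(seed).shuffle(data)
  return data
```

Using a dedicated `random.Random(seed)` rather than `random.seed(...)` leaves the global generator untouched, which may matter if other tests in the module rely on it.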







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.04%`.
   > The diff coverage is `25.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.26%   +0.04%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53783     +114     
   ==========================================
   + Hits        21587    21656      +69     
   - Misses      32082    32127      +45     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | [sdks/python/apache\_beam/transforms/core.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb3JlLnB5) | `39.22% <0.00%> (+0.94%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...3ee3de1](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **increase** coverage by `41.80%`.
   > The diff coverage is `86.36%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #12756       +/-   ##
   ===========================================
   + Coverage   40.23%   82.04%   +41.80%     
   ===========================================
     Files         455      457        +2     
     Lines       53729    54086      +357     
   ===========================================
   + Hits        21620    44373    +22753     
   + Misses      32109     9713    -22396     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `87.39% <86.36%> (+60.06%)` | :arrow_up: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/utils/histogram.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaGlzdG9ncmFtLnB5) | `94.28% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_perf.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19wZXJmLnB5) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/coders/slow\_stream.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Nsb3dfc3RyZWFtLnB5) | `92.43% <0.00%> (+1.68%)` | :arrow_up: |
   | [sdks/python/apache\_beam/utils/profiler.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvcHJvZmlsZXIucHk=) | `32.11% <0.00%> (+1.83%)` | :arrow_up: |
   | [...on/apache\_beam/runners/direct/sdf\_direct\_runner.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3Qvc2RmX2RpcmVjdF9ydW5uZXIucHk=) | `36.21% <0.00%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/coders/coder\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVyX2ltcGwucHk=) | `95.25% <0.00%> (+2.64%)` | :arrow_up: |
   | ... and [274 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...fa23439](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] monicadsong edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685288813


   R: @tvalentyn 
   Went with hashlib instead (it was the only Python hash function I could find that is available both as open source and within Google).





[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485862569



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -41,13 +40,15 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.transforms.stats import ApproximateQuantilesCombineFn
+from apache_beam.transforms.stats import ApproximateUniqueCombineFn
 
 
 class ApproximateUniqueTest(unittest.TestCase):
-  """Unit tests for ApproximateUnique.Globally and ApproximateUnique.PerKey.
-  Hash() with Python3 is nondeterministic, so Approximation algorithm generates
-  different result each time and sometimes error rate is out of range, so add
-  retries for all tests who actually running approximation algorithm."""
+  """Unit tests for ApproximateUnique.Globally, ApproximateUnique.PerKey,
+  and ApproximateUniqueCombineFn.

Review comment:
       The Python 3 built-in hash is non-deterministic; see BEAM-10824 and internal issue b/166646014, which has a few more details.
   We should not be using the Python 3 hash function, and I support this change. The current change supports both md5 and mmh3, but one question remains: should mmh3 be a default dependency (better performance than md5, at the cost of an extra dependency) or an optional one (default to md5 and emit a warning recommending that users install mmh3)?
   
   HLL is not available for Beam Python yet, is it? 
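
For illustration only, the "optional mmh3, default to md5" option could look roughly like the sketch below. It is not the code in this PR: the names `md5_hash64` and `get_hash_fn` are hypothetical, and truncating the md5 digest to 64 bits is an assumption made to keep the example small.

```python
import hashlib
import warnings

try:
  import mmh3  # optional third-party package; may not be installed
except ImportError:
  mmh3 = None


def md5_hash64(data):
  """Map bytes to an unsigned 64-bit integer using the first 8 bytes of md5."""
  return int.from_bytes(hashlib.md5(data).digest()[:8], 'big')


def get_hash_fn():
  """Prefer mmh3 when it is importable; otherwise warn and fall back to md5."""
  if mmh3 is not None:
    return lambda data: mmh3.hash64(data, signed=False)[0]
  warnings.warn('mmh3 is not installed; falling back to the slower md5 hash.')
  return md5_hash64
```

Both branches depend only on the input bytes, so the result is the same across interpreter runs, unlike the built-in `hash()` of `str` and `bytes` objects, which is randomized per process unless `PYTHONHASHSEED` is fixed.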







[GitHub] [beam] Hannah-Jiang commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
Hannah-Jiang commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483452975



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -160,57 +159,13 @@ def test_get_sample_size_from_est_error(self):
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.05) == 1600
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.01) == 40000
 
-  @unittest.skip(
-      'Skip it because hash function is not good enough. '
-      'TODO: BEAM-7654')
   def test_approximate_unique_global_by_sample_size(self):
     # test if estimation error with a given sample size is not greater than
     # expected max error.
     sample_size = 16
     max_err = 2 / math.sqrt(sample_size)
-    test_input = [
-        4,
-        34,
-        29,
-        46,
-        80,
-        66,
-        51,
-        81,
-        31,
-        9,
-        26,
-        36,
-        10,
-        41,
-        90,
-        35,
-        33,
-        19,
-        88,
-        86,
-        28,
-        93,
-        38,
-        76,
-        15,
-        87,
-        12,
-        39,
-        84,
-        13,
-        32,
-        49,
-        65,
-        100,
-        16,
-        27,
-        23,
-        30,
-        96,
-        54
-    ]
-
+    test_input = list(range(100))

Review comment:
       It’s better to run the test at least 100 times to check for flakiness. I’d recommend running it 1000 times.
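
One way to do such a repeated run locally is sketched below. The helper `count_failures` is hypothetical and not part of Beam; it assumes the check is a zero-argument callable that raises `AssertionError` when the estimate falls outside the expected error.

```python
def count_failures(check, runs=1000):
  """Call check() `runs` times and count the calls that raise AssertionError."""
  failures = 0
  for _ in range(runs):
    try:
      check()
    except AssertionError:
      failures += 1
  return failures
```

A non-zero count over 1000 runs would suggest the test is still flaky with the chosen hash function and input.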







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **increase** coverage by `0.02%`.
   > The diff coverage is `25.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.23%   40.26%   +0.02%     
   ==========================================
     Files         455      455              
     Lines       53729    53840     +111     
   ==========================================
   + Hits        21620    21678      +58     
   - Misses      32109    32162      +53     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.06% <25.00%> (-0.27%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [sdks/python/apache\_beam/pipeline.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcGlwZWxpbmUucHk=) | `24.23% <0.00%> (-0.05%)` | :arrow_down: |
   | [setup.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2V0dXAucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `23.36% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `28.92% <0.00%> (+0.24%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `43.73% <0.00%> (+0.76%)` | :arrow_up: |
   | [sdks/python/apache\_beam/transforms/core.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb3JlLnB5) | `39.22% <0.00%> (+0.94%)` | :arrow_up: |
   | [...s/python/apache\_beam/testing/synthetic\_pipeline.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9zeW50aGV0aWNfcGlwZWxpbmUucHk=) | `23.45% <0.00%> (+2.52%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...cc30e8a](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [not ready for review] [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `20.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   + Coverage   40.22%   40.23%   +0.01%     
   ==========================================
     Files         454      455       +1     
     Lines       53669    53734      +65     
   ==========================================
   + Hits        21587    21620      +33     
   - Misses      32082    32114      +32     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <28.57%> (+0.05%)` | :arrow_up: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `41.17% <0.00%> (-1.58%)` | :arrow_down: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/kinesis.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8va2luZXNpcy5weQ==) | `66.66% <0.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...3ee3de1](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483215304



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -496,6 +483,62 @@ def test_approximate_unique_globally_by_error_with_skewed_data(self):
           equal_to([True]),
           label='assert:globally_by_error_with_skewed_data')
 
+  def test_approximate_unique_combine_fn_by_nondeterministic_coder(self):

Review comment:
       done.







[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r485985580



##########
File path: sdks/python/container/license_scripts/dep_urls_py.yaml
##########
@@ -77,8 +77,6 @@ pip_dependencies:
     license: "https://raw.githubusercontent.com/mtth/hdfs/master/LICENSE"
   httplib2:
     license: "https://raw.githubusercontent.com/httplib2/httplib2/master/LICENSE"
-  mmh3:
-    license: "https://raw.githubusercontent.com/hajimes/mmh3/master/LICENSE"

Review comment:
       Hmm, probably not. I will add it back in.







[GitHub] [beam] tvalentyn commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483167308



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -496,6 +483,62 @@ def test_approximate_unique_globally_by_error_with_skewed_data(self):
           equal_to([True]),
           label='assert:globally_by_error_with_skewed_data')
 
+  def test_approximate_unique_combine_fn_by_nondeterministic_coder(self):
+    # test if the combiner throws an error with a nondeterministic coder.
+    sample_size = 30
+    coder = coders.Base64PickleCoder()
+
+    with self.assertRaises(ValueError) as e:
+      _ = ApproximateUniqueCombineFn(sample_size, coder)
+
+    self.assertRegex(
+        e.exception.args[0],
+        'The key coder "Base64PickleCoder" '
+        'for ApproximateUniqueCombineFn is not deterministic.')
+
+  def test_approximate_unique_combine_fn_by_wrong_coder(self):

Review comment:
       naming suggestion: 
   test_approximate_unique_combine_fn_requires_compatible_coder

##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -496,6 +483,62 @@ def test_approximate_unique_globally_by_error_with_skewed_data(self):
           equal_to([True]),
           label='assert:globally_by_error_with_skewed_data')
 
+  def test_approximate_unique_combine_fn_by_nondeterministic_coder(self):

Review comment:
       naming suggestion:
   test_approximate_unique_combine_fn_requires_deterministic_coder
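
To illustrate why a deterministic coder and hash matter here: Python's built-in `hash()` for `str` and `bytes` is salted per process unless `PYTHONHASHSEED` is fixed, so the same element can hash differently on different workers. The sketch below follows the hashlib md5 approach named in the pull request description and hashes already-encoded bytes instead. The helper name `stable_hash` and the 64-bit truncation are assumptions for illustration, not the code in `stats.py`.

```python
import hashlib


def stable_hash(encoded: bytes) -> int:
    """Map encoded bytes to a 64-bit integer that is the same in every process."""
    digest = hashlib.md5(encoded).digest()
    # Keep the first 8 bytes; the truncation width is an illustrative choice.
    return int.from_bytes(digest[:8], "big")


# A deterministic encoding (here: UTF-8) followed by a deterministic hash
# yields the same value for equal elements, regardless of the worker.
first = stable_hash("element".encode("utf-8"))
second = stable_hash("element".encode("utf-8"))
other = stable_hash("another element".encode("utf-8"))
print(first == second, first == other)
```

The same reasoning explains the coder check in the test above: if the coder can produce different bytes for equal elements, even a deterministic hash over those bytes will disagree across workers.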







[GitHub] [beam] monicadsong commented on a change in pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
monicadsong commented on a change in pull request #12756:
URL: https://github.com/apache/beam/pull/12756#discussion_r483384943



##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -160,57 +159,13 @@ def test_get_sample_size_from_est_error(self):
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.05) == 1600
     assert beam.ApproximateUnique._get_sample_size_from_est_error(0.01) == 40000
 
-  @unittest.skip(
-      'Skip it because hash function is not good enough. '
-      'TODO: BEAM-7654')
   def test_approximate_unique_global_by_sample_size(self):
     # test if estimation error with a given sample size is not greater than
     # expected max error.
     sample_size = 16
     max_err = 2 / math.sqrt(sample_size)
-    test_input = [
-        4,
-        34,
-        29,
-        46,
-        80,
-        66,
-        51,
-        81,
-        31,
-        9,
-        26,
-        36,
-        10,
-        41,
-        90,
-        35,
-        33,
-        19,
-        88,
-        86,
-        28,
-        93,
-        38,
-        76,
-        15,
-        87,
-        12,
-        39,
-        84,
-        13,
-        32,
-        49,
-        65,
-        100,
-        16,
-        27,
-        23,
-        30,
-        96,
-        54
-    ]
-
+    test_input = list(range(100))

Review comment:
       yes, all tests pass in a single run!
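
For reference, the numbers in this hunk are consistent with a simple relation between sample size and expected maximum error: `max_err = 2 / sqrt(sample_size)`, and so `sample_size = 4 / est_err**2`. The sketch below is inferred only from the assertions and the `max_err` expression shown in the test; the function names are hypothetical and Beam's internal implementation is not shown in this thread.

```python
import math


def max_error_for_sample_size(sample_size):
    """Expected maximum relative error for a given sample size."""
    return 2 / math.sqrt(sample_size)


def sample_size_for_error(est_err):
    """Smallest sample size whose expected maximum error is at most est_err."""
    return math.ceil(4.0 / est_err ** 2)


print(max_error_for_sample_size(16))
print(sample_size_for_error(0.05))
print(sample_size_for_error(0.01))
```

With `sample_size = 16` as in the test, the tolerated error is 0.5, which is why a small input such as `list(range(100))` is enough to exercise the bound.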







[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824, BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/1d25e2ebeb4a0f74278dbd0cfbaa00f36abd73dc?el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `22.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12756      +/-   ##
   ==========================================
   - Coverage   40.22%   40.21%   -0.01%     
   ==========================================
     Files         454      454              
     Lines       53669    53681      +12     
   ==========================================
   + Hits        21587    21588       +1     
   - Misses      32082    32093      +11     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `25.00% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `27.38% <33.33%> (+0.05%)` | :arrow_up: |
   | [.../runners/portability/fn\_api\_runner/translations.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3RyYW5zbGF0aW9ucy5weQ==) | `13.62% <0.00%> (-0.16%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...8a6fd58](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [beam] codecov[bot] edited a comment on pull request #12756: [BEAM-10824] [BEAM-7654] Change hash function in ApproximateUniqueCombineFn

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12756:
URL: https://github.com/apache/beam/pull/12756#issuecomment-685255075


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=h1) Report
   > Merging [#12756](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/f923da12e02e233d8c40b28d520c49886b1adca4?el=desc) will **increase** coverage by `41.80%`.
   > The diff coverage is `86.36%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12756/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #12756       +/-   ##
   ===========================================
   + Coverage   40.23%   82.04%   +41.80%     
   ===========================================
     Files         455      457        +2     
     Lines       53729    54086      +357     
   ===========================================
   + Hits        21620    44373    +22753     
   + Misses      32109     9713    -22396     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | `87.39% <86.36%> (+60.06%)` | :arrow_up: |
   | [sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==) | `0.00% <0.00%> (ø)` | |
   | [...he\_beam/testing/benchmarks/nexmark/nexmark\_perf.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19wZXJmLnB5) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/utils/histogram.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaGlzdG9ncmFtLnB5) | `94.28% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/coders/slow\_stream.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Nsb3dfc3RyZWFtLnB5) | `92.43% <0.00%> (+1.68%)` | :arrow_up: |
   | [sdks/python/apache\_beam/utils/profiler.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvcHJvZmlsZXIucHk=) | `32.11% <0.00%> (+1.83%)` | :arrow_up: |
   | [...on/apache\_beam/runners/direct/sdf\_direct\_runner.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3Qvc2RmX2RpcmVjdF9ydW5uZXIucHk=) | `36.21% <0.00%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/coders/coder\_impl.py](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVyX2ltcGwucHk=) | `95.25% <0.00%> (+2.64%)` | :arrow_up: |
   | ... and [274 more](https://codecov.io/gh/apache/beam/pull/12756/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=footer). Last update [31af8b1...fa23439](https://codecov.io/gh/apache/beam/pull/12756?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




