You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "damccorm (via GitHub)" <gi...@apache.org> on 2023/01/20 20:02:06 UTC

[GitHub] [beam] damccorm opened a new pull request, #25106: Move CombineValues override to non portable overrides

damccorm opened a new pull request, #25106:
URL: https://github.com/apache/beam/pull/25106

   Right now, running the following pipeline on Dataflow:
   
   ```
   import logging
   
   import apache_beam as beam
   from apache_beam.options.pipeline_options import DebugOptions
   
   def merge(vals):
       out = ""
       for v in vals:
           out += v
       return out
   
   def run():
       pipeline_options = DebugOptions() 
       pipeline_options.dataflow_job_file = "path/to/dump/proto"
   
       with beam.Pipeline(options=pipeline_options) as p:
           (p
               | beam.Create([("key1", "foo"), ("key2", "bar"), ("key1", "baz")])
               | beam.GroupByKey()
               | beam.CombineValues(merge)
           )
   
   
   if __name__ == '__main__':
       logging.getLogger().setLevel(logging.INFO)
       run()
   ```
   
   succeeds when `use_runner_v2` is not included as an experiment, but fails when it is. This is because the translation being done generates an invalid graph for the `use_runner_v2` case: a portable pipeline shouldn't be using the overridden combine here.
   
   This fixes the problem by moving the override to only occur when `use_runner_v2` is not included
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400640898

   stop reviewer notifications


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400530273

   Run Python Dataflow ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400567471

   Run Python 3.8 PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400585112

   Run Python 3.8 PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400585287

   Run Python Dataflow V2 ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400519065

   Run Python 3.8 PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400530083

   Run Python 3.8 PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400658184

   Run Python Dataflow ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400657789

   Run Python 3.8 PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on a diff in pull request #25106: Move CombineValues override to non portable overrides

Posted by "lukecwik (via GitHub)" <gi...@apache.org>.
lukecwik commented on code in PR #25106:
URL: https://github.com/apache/beam/pull/25106#discussion_r1084470132


##########
sdks/python/apache_beam/transforms/combiners_test.py:
##########
@@ -873,6 +873,24 @@ def test_with_input_types_decorator_violation(self):
         _ = pc | beam.CombineGlobally(self.fn)
 
 
+@pytest.mark.it_validatesrunner
+class CombineValuesTest(unittest.TestCase):
+  def test_combinevalues(self):

Review Comment:
   nit:
   ```suggestion
     def test_gbk_immediately_followed_by_combine(self):
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400787011

   Run Python Dataflow ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400786799

   Run Python Dataflow V2 ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400520266

   Run Python Dataflow ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1399150490

   R: @lukecwik 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1398892074

   Run Python 3.9 PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm merged pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm merged PR #25106:
URL: https://github.com/apache/beam/pull/25106


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400530465

   Run Python Dataflow V2 ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400657994

   Run Python Dataflow V2 ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on a diff in pull request #25106: Move CombineValues override to non portable overrides

Posted by "lukecwik (via GitHub)" <gi...@apache.org>.
lukecwik commented on code in PR #25106:
URL: https://github.com/apache/beam/pull/25106#discussion_r1084315923


##########
sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues_it_test.py:
##########
@@ -0,0 +1,39 @@
+import logging
+import pytest
+import unittest
+
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that
+from apache_beam.testing.util import equal_to
+
+from . import combinevalues
+
+class CombineValuesIT(unittest.TestCase):

Review Comment:
   I would suggest making this a validates runner test like:
   https://github.com/apache/beam/blob/897fffdeda59cc2f121dd502e569770a897fc4ac/sdks/python/apache_beam/transforms/ptransform_test.py#L702
   
   in ptransform_test.py
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400519867

   Run Python Dataflow V2 ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400585467

   Run Python Dataflow ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400567657

   Run Python Dataflow V2 ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25106: Move CombineValues override to non portable overrides

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25106:
URL: https://github.com/apache/beam/pull/25106#issuecomment-1400567826

   Run Python Dataflow ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org