You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "robertwb (via GitHub)" <gi...@apache.org> on 2023/04/25 22:28:52 UTC

[GitHub] [beam] robertwb opened a new pull request, #26423: Allow implicit flattening for yaml inputs.

robertwb opened a new pull request, #26423:
URL: https://github.com/apache/beam/pull/26423

   One can still explicitly use `Flatten` if desired.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on pull request #26423: Allow implicit flattening for yaml inputs.

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on PR #26423:
URL: https://github.com/apache/beam/pull/26423#issuecomment-1526286634

   Regarding type hints, this is mostly because the codebase predates PEP 484 (and Python 3). We have added type hints in some places, and do run mypy. It would probably be a good project to add type hints here (and it's probably self-contained enough that it wouldn't be a herculean project of fixing the rest of the SDK to do so).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on pull request #26423: Allow implicit flattening for yaml inputs.

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on PR #26423:
URL: https://github.com/apache/beam/pull/26423#issuecomment-1522501820

   R: @bzablocki


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #26423: Allow implicit flattening for yaml inputs.

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #26423:
URL: https://github.com/apache/beam/pull/26423#issuecomment-1522514674

   ## [Codecov](https://codecov.io/gh/apache/beam/pull/26423?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#26423](https://codecov.io/gh/apache/beam/pull/26423?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c7b363b) into [master](https://codecov.io/gh/apache/beam/commit/85d4276016a1e4388cc9dd431e8f0684eb09fd95?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (85d4276) will **increase** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #26423      +/-   ##
   ==========================================
   + Coverage   81.09%   81.11%   +0.01%     
   ==========================================
     Files         469      469              
     Lines       67199    67298      +99     
   ==========================================
   + Hits        54494    54587      +93     
   - Misses      12705    12711       +6     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `81.11% <100.00%> (+0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/26423?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/yaml/yaml\_provider.py](https://codecov.io/gh/apache/beam/pull/26423?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0veWFtbC95YW1sX3Byb3ZpZGVyLnB5) | `62.38% <100.00%> (-0.41%)` | :arrow_down: |
   | [sdks/python/apache\_beam/yaml/yaml\_transform.py](https://codecov.io/gh/apache/beam/pull/26423?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0veWFtbC95YW1sX3RyYW5zZm9ybS5weQ==) | `85.45% <100.00%> (+0.78%)` | :arrow_up: |
   
   ... and [10 files with indirect coverage changes](https://codecov.io/gh/apache/beam/pull/26423/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb merged pull request #26423: Allow implicit flattening for yaml inputs.

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb merged PR #26423:
URL: https://github.com/apache/beam/pull/26423


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on a diff in pull request #26423: Allow implicit flattening for yaml inputs.

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on code in PR #26423:
URL: https://github.com/apache/beam/pull/26423#discussion_r1179607394


##########
sdks/python/apache_beam/yaml/yaml_transform.py:
##########
@@ -250,29 +260,35 @@ def expand_transform(spec, scope):
 
 
 def expand_leaf_transform(spec, scope):
+  _LOGGER.info("Expanding %s ", identify_object(spec))
   spec = normalize_inputs_outputs(spec)
-  inputs_dict = {
-      key: scope.get_pcollection(value)
-      for (key, value) in spec['input'].items()
-  }
-  input_type = spec.get('input_type', 'default')
-  if input_type == 'list':
-    inputs = tuple(inputs_dict.values())
-  elif input_type == 'map':
-    inputs = inputs_dict
+  ptransform = scope.create_ptransform(spec)
+  transform_label = scope.unique_name(spec, ptransform)
+
+  if spec['type'] == 'Flatten':
+    # Avoid flattening before the flatten, just to make a nicer graph.
+    inputs = tuple(
+        scope.get_pcollection(input) for key,
+        value in spec['input'].items()
+        for input in ([value] if isinstance(value, str) else value))

Review Comment:
   Ah, yeah. Yapf has trouble with the commas in list comprehensions. Fixed by adding parentheses. 



##########
sdks/python/apache_beam/yaml/yaml_transform.py:
##########
@@ -123,6 +123,16 @@ def compute_all(self):
     for transform_id in self._transforms_by_uuid.keys():
       self.compute_outputs(transform_id)
 
+  def get_input(self, inputs, transform_label, key):
+    if isinstance(inputs, str):
+      return self.get_pcollection(inputs)
+    else:
+      return tuple(
+          self.get_pcollection(x) for x in
+          inputs) | f'{transform_label}-FlattenInputs[{key}]' >> beam.Flatten(
+              pipeline=self.root if isinstance(self.root, beam.Pipeline
+                                               ) else self.root.pipeline)

Review Comment:
   That pipeline statement is certainly unwieldy. Refactored a bit, hopefully it's clearer now. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #26423: Allow implicit flattening for yaml inputs.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26423:
URL: https://github.com/apache/beam/pull/26423#issuecomment-1522502502

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] bzablocki commented on a diff in pull request #26423: Allow implicit flattening for yaml inputs.

Posted by "bzablocki (via GitHub)" <gi...@apache.org>.
bzablocki commented on code in PR #26423:
URL: https://github.com/apache/beam/pull/26423#discussion_r1179485349


##########
sdks/python/apache_beam/yaml/yaml_transform.py:
##########
@@ -123,6 +123,16 @@ def compute_all(self):
     for transform_id in self._transforms_by_uuid.keys():
       self.compute_outputs(transform_id)
 
+  def get_input(self, inputs, transform_label, key):
+    if isinstance(inputs, str):
+      return self.get_pcollection(inputs)
+    else:
+      return tuple(
+          self.get_pcollection(x) for x in
+          inputs) | f'{transform_label}-FlattenInputs[{key}]' >> beam.Flatten(
+              pipeline=self.root if isinstance(self.root, beam.Pipeline
+                                               ) else self.root.pipeline)

Review Comment:
   It took me some time to understand what is happening here, but that is probably because I'm not that familiar with the Python's SDK, it will probably come with time ;) To get a better understanding on this piece of code, I rewrote it to this form: 
   ```suggestion
         transform_name = f'{transform_label}-FlattenInputs[{key}]'
         root = self.root if isinstance(self.root,
                                        beam.Pipeline) else self.root.pipeline
         inputs = (
             tuple(self.get_pcollection(x) for x in inputs)
             | transform_name >> beam.Flatten(pipeline=root))
         return inputs
   ```
   It is definitely longer and more verbose, so I leave the choice to you. 



##########
sdks/python/apache_beam/yaml/yaml_transform.py:
##########
@@ -250,29 +260,35 @@ def expand_transform(spec, scope):
 
 
 def expand_leaf_transform(spec, scope):
+  _LOGGER.info("Expanding %s ", identify_object(spec))
   spec = normalize_inputs_outputs(spec)
-  inputs_dict = {
-      key: scope.get_pcollection(value)
-      for (key, value) in spec['input'].items()
-  }
-  input_type = spec.get('input_type', 'default')
-  if input_type == 'list':
-    inputs = tuple(inputs_dict.values())
-  elif input_type == 'map':
-    inputs = inputs_dict
+  ptransform = scope.create_ptransform(spec)
+  transform_label = scope.unique_name(spec, ptransform)
+
+  if spec['type'] == 'Flatten':
+    # Avoid flattening before the flatten, just to make a nicer graph.
+    inputs = tuple(
+        scope.get_pcollection(input) for key,
+        value in spec['input'].items()
+        for input in ([value] if isinstance(value, str) else value))

Review Comment:
   Can we improve formatting for readability?
   ```suggestion
       inputs = tuple(
           scope.get_pcollection(input) 
           for key, value in spec['input'].items()
           for input in ([value] if isinstance(value, str) else value))
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org