You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "liferoad (via GitHub)" <gi...@apache.org> on 2023/03/07 03:22:09 UTC

[GitHub] [beam] liferoad opened a new pull request, #25743: Raise the Runtime error when DoFn.process uses both yield and return

liferoad opened a new pull request, #25743:
URL: https://github.com/apache/beam/pull/25743

   
   `addresses #123`
   
   Do not allow mixing yield and return statements in the process method for DoFn
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1130378092


##########
sdks/python/apache_beam/transforms/core_test.py:
##########
@@ -0,0 +1,63 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more

Review Comment:
   will do.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1142538738


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1436,7 +1436,8 @@ def _check_fn_use_yield_and_return(fn):
       if has_yield and has_return:
         return True
     return False
-  except TypeError:
+  except Exception as e:
+    _LOGGER.info(str(e))

Review Comment:
   switched to debug. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1131198177


##########
sdks/python/apache_beam/transforms/core_test.py:
##########
@@ -0,0 +1,63 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more

Review Comment:
   Please take a look at CHANGES.md.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1130154156


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1387,6 +1388,47 @@ def partition_for(self, element, num_partitions, *args, **kwargs):
     return self._fn(element, num_partitions, *args, **kwargs)
 
 
+def _get_function_body_without_inners(func):
+  source_lines = inspect.getsourcelines(func)[0]

Review Comment:
   I think analyzing bytecode (see `dis.get_instructions()`) would be more reliable than src analysis. Have you considered that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1136223203


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1387,6 +1388,56 @@ def partition_for(self, element, num_partitions, *args, **kwargs):
     return self._fn(element, num_partitions, *args, **kwargs)
 
 
+def _get_function_body_without_inners(func):
+  source_lines = inspect.getsourcelines(func)[0]
+  source_lines = dropwhile(lambda x: x.startswith("@"), source_lines)
+  def_line = next(source_lines).strip()
+  if def_line.startswith("def ") and def_line.endswith(":"):
+    first_line = next(source_lines)
+    indentation = len(first_line) - len(first_line.lstrip())
+    final_lines = [first_line[indentation:]]
+
+    skip_inner_def = False
+    if first_line[indentation:].startswith("def "):
+      skip_inner_def = True
+    for line in source_lines:
+      line_indentation = len(line) - len(line.lstrip())
+
+      if line[indentation:].startswith("def "):
+        skip_inner_def = True
+        continue
+
+      if skip_inner_def and line_indentation == indentation:
+        skip_inner_def = False
+
+      if skip_inner_def and line_indentation > indentation:
+        continue
+      final_lines.append(line[indentation:])
+
+    return "".join(final_lines)
+  else:
+    return def_line.rsplit(":")[-1].strip()
+
+
+def _check_fn_use_yield_and_return(fn):
+  if isinstance(fn, types.BuiltinFunctionType):
+    return False
+  try:
+    source_code = _get_function_body_without_inners(fn)
+    has_yield = False
+    has_return = False
+    for line in source_code.split("\n"):
+      if line.lstrip().startswith("yield"):

Review Comment:
   Good catch. I changed the code and updated the tests to capture these cases. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1136223412


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1427,6 +1478,16 @@ def __init__(self, fn, *args, **kwargs):
     if not isinstance(self.fn, DoFn):
       raise TypeError('ParDo must be called with a DoFn instance.')
 
+    # DoFn.process cannot allow both return and yield
+    if _check_fn_use_yield_and_return(self.fn.process):
+      _LOGGER.warning(
+          'The yield and return statements in the process method '

Review Comment:
   Sounds good. Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25743:
URL: https://github.com/apache/beam/pull/25743#issuecomment-1459191278

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @pabloem for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1131566457


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1387,6 +1388,47 @@ def partition_for(self, element, num_partitions, *args, **kwargs):
     return self._fn(element, num_partitions, *args, **kwargs)
 
 
+def _get_function_body_without_inners(func):
+  source_lines = inspect.getsourcelines(func)[0]

Review Comment:
   `total_yield` and `tax_return` could  trigger false positives here. I took a closer look, I am no longer sure unconditionally failing  the pipeline is the right thing to do here, as we are fighting against Python language rules. I think we can print a warning potentially, and update the documentation to recommend `yield from` instead of return for iterables. We should do the due diligence to make sure `yield from` achieves the desired result though to make sure there are no surprises.



##########
CHANGES.md:
##########
@@ -68,6 +68,7 @@
 
 ## Breaking Changes
 
+* Python SDK now does not allow mixing the yield and return statements in `DoFn.process()` ([#22969](https://github.com/apache/beam/issues/22969)). `yield` is recommended for emitting elements and `yield from` for iterators.

Review Comment:
   ```suggestion
   * Python SDK now does not allow mixing the yield and return statements in `DoFn.process()` ([#22969](https://github.com/apache/beam/issues/22969)). We recommend to use `yield`  for emitting individual elements and `yield from` for emitting the content of entire iterables.
   ```



##########
CHANGES.md:
##########
@@ -68,6 +68,7 @@
 
 ## Breaking Changes
 
+* Python SDK now does not allow mixing the yield and return statements in `DoFn.process()` ([#22969](https://github.com/apache/beam/issues/22969)). `yield` is recommended for emitting elements and `yield from` for iterators.

Review Comment:
   this suggestion should be added to Beam docs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1132911272


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1387,6 +1388,47 @@ def partition_for(self, element, num_partitions, *args, **kwargs):
     return self._fn(element, num_partitions, *args, **kwargs)
 
 
+def _get_function_body_without_inners(func):
+  source_lines = inspect.getsourcelines(func)[0]

Review Comment:
   By checking the lines, we can capture these cases now. I also updated the tests. To be safe, I changed to use the logging warning instead of raising RuntimeError. The doc was updated by https://github.com/apache/beam/pull/25060. I also updated the warning messages with the actionable suggestions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1132912439


##########
CHANGES.md:
##########
@@ -68,6 +68,7 @@
 
 ## Breaking Changes
 
+* Python SDK now does not allow mixing the yield and return statements in `DoFn.process()` ([#22969](https://github.com/apache/beam/issues/22969)). `yield` is recommended for emitting elements and `yield from` for iterators.

Review Comment:
   I added this to the warning message. The current doc indeed has this information here: https://beam.apache.org/documentation/programming-guide/#pardo



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1130154156


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1387,6 +1388,47 @@ def partition_for(self, element, num_partitions, *args, **kwargs):
     return self._fn(element, num_partitions, *args, **kwargs)
 
 
+def _get_function_body_without_inners(func):
+  source_lines = inspect.getsourcelines(func)[0]

Review Comment:
   I think analyzing bytecode (see `dis.get_instructions()`) would be more reliable that src analysis. Have you considered that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1127300944


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1387,6 +1387,11 @@ def partition_for(self, element, num_partitions, *args, **kwargs):
     return self._fn(element, num_partitions, *args, **kwargs)
 
 
+def _check_fn_use_yield_and_return(fn):
+  source_code = inspect.getsource(fn)
+  return " yield " in source_code and " return " in source_code

Review Comment:
   only check the function source code to check yield and return, which should cover most of cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on PR #25743:
URL: https://github.com/apache/beam/pull/25743#issuecomment-1457455088

   @tvalentyn PTAL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25743:
URL: https://github.com/apache/beam/pull/25743#issuecomment-1457516744

   Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1130377411


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1387,6 +1388,47 @@ def partition_for(self, element, num_partitions, *args, **kwargs):
     return self._fn(element, num_partitions, *args, **kwargs)
 
 
+def _get_function_body_without_inners(func):
+  source_lines = inspect.getsourcelines(func)[0]

Review Comment:
   I tried this as my first idea. It does not work:
   ```
   def bar():
       yield "bar"
   list(dis.get_instructions(bar))
   ```
   This generates:
   ```
   [Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval='bar', argrepr="'bar'", offset=0, starts_line=2, is_jump_target=False),
    Instruction(opname='YIELD_VALUE', opcode=86, arg=None, argval=None, argrepr='', offset=2, starts_line=None, is_jump_target=False),
    Instruction(opname='POP_TOP', opcode=1, arg=None, argval=None, argrepr='', offset=4, starts_line=None, is_jump_target=False),
    Instruction(opname='LOAD_CONST', opcode=100, arg=0, argval=None, argrepr='None', offset=6, starts_line=None, is_jump_target=False),
    Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=8, starts_line=None, is_jump_target=False)]
   ```
   `RETURN_VALUE` is always there no matter the function contains yield or return. So I decided to use this simple way by just checking the source code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1134740493


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1427,6 +1478,16 @@ def __init__(self, fn, *args, **kwargs):
     if not isinstance(self.fn, DoFn):
       raise TypeError('ParDo must be called with a DoFn instance.')
 
+    # DoFn.process cannot allow both return and yield
+    if _check_fn_use_yield_and_return(self.fn.process):
+      _LOGGER.warning(
+          'The yield and return statements in the process method '

Review Comment:
   I would rephrase the beginning as: `Using yield and return in the process method can lead to unexpected behavior, see: https://github.com/apache/beam/issues/22969.`



##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1387,6 +1388,56 @@ def partition_for(self, element, num_partitions, *args, **kwargs):
     return self._fn(element, num_partitions, *args, **kwargs)
 
 
+def _get_function_body_without_inners(func):
+  source_lines = inspect.getsourcelines(func)[0]
+  source_lines = dropwhile(lambda x: x.startswith("@"), source_lines)
+  def_line = next(source_lines).strip()
+  if def_line.startswith("def ") and def_line.endswith(":"):
+    first_line = next(source_lines)
+    indentation = len(first_line) - len(first_line.lstrip())
+    final_lines = [first_line[indentation:]]
+
+    skip_inner_def = False
+    if first_line[indentation:].startswith("def "):
+      skip_inner_def = True
+    for line in source_lines:
+      line_indentation = len(line) - len(line.lstrip())
+
+      if line[indentation:].startswith("def "):
+        skip_inner_def = True
+        continue
+
+      if skip_inner_def and line_indentation == indentation:
+        skip_inner_def = False
+
+      if skip_inner_def and line_indentation > indentation:
+        continue
+      final_lines.append(line[indentation:])
+
+    return "".join(final_lines)
+  else:
+    return def_line.rsplit(":")[-1].strip()
+
+
+def _check_fn_use_yield_and_return(fn):
+  if isinstance(fn, types.BuiltinFunctionType):
+    return False
+  try:
+    source_code = _get_function_body_without_inners(fn)
+    has_yield = False
+    has_return = False
+    for line in source_code.split("\n"):
+      if line.lstrip().startswith("yield"):

Review Comment:
   this can still trigger false-positives for variables like yield_elements and return_value (both occur in Beam). 
   
   consider checking for `startswith("yield(") or startswith("yield ") or startswith("return(") or startswith("return ")`



##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1387,6 +1388,47 @@ def partition_for(self, element, num_partitions, *args, **kwargs):
     return self._fn(element, num_partitions, *args, **kwargs)
 
 
+def _get_function_body_without_inners(func):
+  source_lines = inspect.getsourcelines(func)[0]

Review Comment:
   > The doc was updated by #25060
   
   we could recommend yield from instead of return for iterables, after checking that yield from works well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1141047640


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1387,6 +1388,47 @@ def partition_for(self, element, num_partitions, *args, **kwargs):
     return self._fn(element, num_partitions, *args, **kwargs)
 
 
+def _get_function_body_without_inners(func):
+  source_lines = inspect.getsourcelines(func)[0]

Review Comment:
   Done. and also log it with info.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1142302492


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1436,7 +1436,8 @@ def _check_fn_use_yield_and_return(fn):
       if has_yield and has_return:
         return True
     return False
-  except TypeError:
+  except Exception as e:
+    _LOGGER.info(str(e))

Review Comment:
   Sounds like this may cause extra noise in the logs to the users. I suggest _LOGGER.debug or a one-line hand-written message, or remove. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1132911411


##########
sdks/python/apache_beam/transforms/core_test.py:
##########
@@ -0,0 +1,63 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more

Review Comment:
   No need to change this now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #25743:
URL: https://github.com/apache/beam/pull/25743#issuecomment-1460432881

   # [Codecov](https://codecov.io/gh/apache/beam/pull/25743?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#25743](https://codecov.io/gh/apache/beam/pull/25743?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6712440) into [master](https://codecov.io/gh/apache/beam/commit/6452dc7982240819a763aaf9ff3efc4a01fc1d2b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6452dc7) will **increase** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #25743      +/-   ##
   ==========================================
   + Coverage   72.81%   72.82%   +0.01%     
   ==========================================
     Files         775      775              
     Lines      102928   102970      +42     
   ==========================================
   + Hits        74949    74991      +42     
     Misses      26524    26524              
     Partials     1455     1455              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `81.97% <100.00%> (+0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/25743?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/core.py](https://codecov.io/gh/apache/beam/pull/25743?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb3JlLnB5) | `93.09% <100.00%> (+0.17%)` | :arrow_up: |
   | [...thon/apache\_beam/runners/worker/sdk\_worker\_main.py](https://codecov.io/gh/apache/beam/pull/25743?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvc2RrX3dvcmtlcl9tYWluLnB5) | `76.87% <0.00%> (-0.77%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/localfilesystem.py](https://codecov.io/gh/apache/beam/pull/25743?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vbG9jYWxmaWxlc3lzdGVtLnB5) | `90.97% <0.00%> (-0.76%)` | :arrow_down: |
   | [...ks/python/apache\_beam/runners/worker/sdk\_worker.py](https://codecov.io/gh/apache/beam/pull/25743?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvc2RrX3dvcmtlci5weQ==) | `89.20% <0.00%> (-0.11%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/25743?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `94.33% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/25743?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `71.41% <0.00%> (+0.06%)` | :arrow_up: |
   | [...eam/runners/portability/fn\_api\_runner/execution.py](https://codecov.io/gh/apache/beam/pull/25743?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL2V4ZWN1dGlvbi5weQ==) | `93.13% <0.00%> (+0.63%)` | :arrow_up: |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1130159769


##########
sdks/python/apache_beam/transforms/core_test.py:
##########
@@ -0,0 +1,63 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more

Review Comment:
   I think it's worth adding a note to CHANGES.md in Breaking Changes section, with a suggestion how users should proceed if they are affected.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on PR #25743:
URL: https://github.com/apache/beam/pull/25743#issuecomment-1464759096

   Run PythonDocker PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on a diff in pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on code in PR #25743:
URL: https://github.com/apache/beam/pull/25743#discussion_r1140866962


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -1387,6 +1388,58 @@ def partition_for(self, element, num_partitions, *args, **kwargs):
     return self._fn(element, num_partitions, *args, **kwargs)
 
 
+def _get_function_body_without_inners(func):
+  source_lines = inspect.getsourcelines(func)[0]
+  source_lines = dropwhile(lambda x: x.startswith("@"), source_lines)
+  def_line = next(source_lines).strip()
+  if def_line.startswith("def ") and def_line.endswith(":"):
+    first_line = next(source_lines)
+    indentation = len(first_line) - len(first_line.lstrip())
+    final_lines = [first_line[indentation:]]
+
+    skip_inner_def = False
+    if first_line[indentation:].startswith("def "):
+      skip_inner_def = True
+    for line in source_lines:
+      line_indentation = len(line) - len(line.lstrip())
+
+      if line[indentation:].startswith("def "):
+        skip_inner_def = True
+        continue
+
+      if skip_inner_def and line_indentation == indentation:
+        skip_inner_def = False
+
+      if skip_inner_def and line_indentation > indentation:
+        continue
+      final_lines.append(line[indentation:])
+
+    return "".join(final_lines)
+  else:
+    return def_line.rsplit(":")[-1].strip()
+
+
+def _check_fn_use_yield_and_return(fn):
+  if isinstance(fn, types.BuiltinFunctionType):
+    return False
+  try:
+    source_code = _get_function_body_without_inners(fn)
+    has_yield = False
+    has_return = False
+    for line in source_code.split("\n"):
+      if line.lstrip().startswith("yield ") or line.lstrip().startswith(
+          "yield("):
+        has_yield = True
+      if line.lstrip().startswith("return ") or line.lstrip().startswith(
+          "return("):
+        has_return = True
+      if has_yield and has_return:
+        return True
+    return False
+  except TypeError:

Review Comment:
   I suggest broadening the except to include any exception just in case so as to not fail the job unnecessarily. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn merged pull request #25743: Raise the Runtime error when DoFn.process uses both yield and return

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn merged PR #25743:
URL: https://github.com/apache/beam/pull/25743


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org