Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/07/26 22:55:00 UTC

[GitHub] [beam] yifanmai commented on a change in pull request #15202: [BEAM-1833] Preserve inputs names at graph construction and through proto translation.

yifanmai commented on a change in pull request #15202:
URL: https://github.com/apache/beam/pull/15202#discussion_r676974351



##########
File path: sdks/python/apache_beam/pipeline.py
##########
@@ -437,11 +438,11 @@ def visit_transform(self, transform_node):
                 output_replacements[transform_node].append((tag, replacement))
 
         if replace_input:
-          new_input = [
-              input if not input in output_map else output_map[input]
-              for input in transform_node.inputs
-          ]
-          input_replacements[transform_node] = new_input
+          new_inputs = {
+              tag: input if not input in output_map else output_map[input]
+              for (tag, input) in transform_node.main_inputs.items()
+          }
+          input_replacements[transform_node] = new_inputs

Review comment:
       I can't leave a comment on line 274, but the type annotation for `input_replacements` there needs to change from `Dict[AppliedPTransform, Sequence[Union[pvalue.PBegin, pvalue.PCollection]]]` to `Dict[AppliedPTransform, Dict[str, Union[pvalue.PBegin, pvalue.PCollection]]]` (or `Mapping` instead of the inner `Dict`).
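
       For illustration, a minimal sketch of what the suggested annotation might look like. The three classes below are hypothetical stand-ins so the snippet runs without Beam installed; in `pipeline.py` they would be `AppliedPTransform`, `pvalue.PBegin` and `pvalue.PCollection`. The alias name `InputValue` is also an assumption made for readability.

```python
from typing import Dict, Mapping, Union


# Hypothetical stand-ins for the real Beam classes (assumption: only their
# identity matters for this annotation sketch).
class PBegin:
  pass


class PCollection:
  pass


class AppliedPTransform:
  pass


InputValue = Union[PBegin, PCollection]

# Old shape: Dict[AppliedPTransform, Sequence[InputValue]]
# New shape: each transform maps to its inputs keyed by string tag.
input_replacements: Dict[AppliedPTransform, Mapping[str, InputValue]] = {}

node = AppliedPTransform()
pcoll = PCollection()
input_replacements[node] = {'0': pcoll}
```

       Using `Mapping` for the inner type only promises read access, which may be enough if the replacements are never mutated after construction.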

##########
File path: sdks/python/apache_beam/pipeline.py
##########
@@ -670,15 +671,18 @@ def apply(
 
     pvalueish, inputs = transform._extract_input_pvalues(pvalueish)
     try:
-      inputs = tuple(inputs)
-      for leaf_input in inputs:
-        if not isinstance(leaf_input, pvalue.PValue):
-          raise TypeError
+      if not isinstance(inputs, dict):
+        inputs = {str(ix): input for (ix, input) in enumerate(inputs)}
     except TypeError:

Review comment:
       Delete the `except` branch; it is no longer needed because the `PValue` check is now done below.
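
       As a rough sketch of the conversion the added lines perform: a non-dict collection of inputs is turned into a dict keyed by the stringified position. `normalize_inputs` is a hypothetical helper name used only for this illustration; it is not part of the PR.

```python
def normalize_inputs(inputs):
  """Hypothetical helper mirroring the two added lines in the hunk above."""
  if not isinstance(inputs, dict):
    # Positional inputs get their index, as a string, for a tag.
    inputs = {str(ix): value for ix, value in enumerate(inputs)}
  return inputs


positional = normalize_inputs(['pcoll_a', 'pcoll_b'])
named = normalize_inputs({'left': 'pcoll_a', 'right': 'pcoll_b'})
```

       Inputs that are already a dict pass through unchanged, so named tags are preserved.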

##########
File path: sdks/python/apache_beam/transforms/ptransform.py
##########
@@ -253,7 +254,7 @@ def visit(self, node):
       return self.visit_nested(node)
 
 
-def get_named_nested_pvalues(pvalueish):
+def get_named_nested_pvalues(pvalueish, as_inputs=False):

Review comment:
       In the invocation of `get_named_nested_pvalues()` in `pipeline.py`, do we also need to stringify the tags (so that `None` becomes `'None'`)?
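
       To make the question concrete, a small sketch of what stringifying the tags would do. The `(tag, value)` pairs below are made-up sample data standing in for whatever `get_named_nested_pvalues()` yields; whether `pipeline.py` should apply this step is exactly the open question.

```python
# Hypothetical sample of (tag, value) pairs with mixed tag types.
named_pvalues = [(None, 'pcoll_a'), (0, 'pcoll_b'), ('side', 'pcoll_c')]

# Stringify every tag so the resulting dict has only str keys.
inputs = {str(tag): value for tag, value in named_pvalues}
```

       After this step the `None` tag becomes the string `'None'` and the integer `0` becomes `'0'`.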

##########
File path: sdks/python/apache_beam/pipeline_test.py
##########
@@ -972,6 +972,24 @@ def expand(self, p):
     for transform_id in runner_api_proto.components.transforms:
       self.assertRegex(transform_id, r'[a-zA-Z0-9-_]+')
 
+  def test_input_bames(self):

Review comment:
       Typo: this should be `test_input_names`.

##########
File path: sdks/python/apache_beam/transforms/ptransform.py
##########
@@ -262,16 +263,21 @@ def get_named_nested_pvalues(pvalueish):
     else:
       tagged_values = enumerate(pvalueish)
   elif isinstance(pvalueish, list):
+    if as_inputs:
+      yield None, pvalueish

Review comment:
       What's the rationale for this branch, i.e. yielding the whole `pvalueish` instead of enumerating it?



