You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/11 23:22:32 UTC

[GitHub] [beam] ryanthompson591 commented on a diff in pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements

ryanthompson591 commented on code in PR #22198:
URL: https://github.com/apache/beam/pull/22198#discussion_r918404835


##########
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:
##########
@@ -357,13 +357,15 @@ def infer_output_type(self, input_type):
         return np.int64
 
     with self.create_pipeline() as p:
-      res = (
+      pc = (
           p
           | beam.Create(np.array([1, 2, 3], dtype=np.int64)).with_output_types(
               np.int64)
-          | beam.ParDo(ArrayProduceDoFn())
-          | beam.ParDo(ArrayMultiplyDoFn())
-          | beam.Map(lambda x: x * 3))
+          | beam.ParDo(ArrayProduceDoFn()))
+
+      self.assertEqual(pc.element_type, np.int64)

Review Comment:
   This is up to you, my rule of thumb is smaller concise tests for unit tests.
   
   Like:
   def test_pardo_maintains_type_hint():
     with self.create_pipeline() as p:
         pc = (
             p
             | beam.Create(np.array([1, 2, 3], dtype=np.int64)).with_output_types(
                 np.int64)
             | beam.ParDo(ArrayProduceDoFn()))
   
         self.assertEqual(pc.element_type, np.int64)
   
   And then end the test there.



##########
sdks/python/apache_beam/transforms/batch_dofn_test.py:
##########
@@ -170,6 +173,29 @@ def process_batch(self, batch, *args, **kwargs):
     yield [element * 2 for element in batch]
 
 
+class MismatchedBatchProducingDoFn(beam.DoFn):
+  """A DoFn that produces batches from both process and process_batch, with
+  mismatched types. Should yield a construction time error when applied."""

Review Comment:
   took me three looks to realize the mismatch was one produces floats and the other ints. Maybe just slightly amend the comment like this:
   ...with mismatched types (one has floats the other ints)....



##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -708,6 +708,36 @@ def get_function_arguments(self, func):
 
   def default_type_hints(self):
     fn_type_hints = typehints.decorators.IOTypeHints.from_callable(self.process)
+    batch_fn_type_hints = typehints.decorators.IOTypeHints.from_callable(
+        self.process_batch)
+
+    if fn_type_hints is not None:

Review Comment:
   This nested logic is hard to read and a little messy and hard to follow.
   
   For example here and below we have "if fn_type_hints is not None:"
   
   I had to write this a little to really wrap my head around it. I don't know if any parts of my rewrite are useful.
   
   ```
   # default type hints can come from one of two places. The process type hints or the process batch type hints.
   # also the process might be set up to yield batches or to yield elements
   
   has_fn_type_hints = fn_type_hints is not None
   has_batch_type_hints = batch_fn_type_hints is not None
   yields_batches = self._process_yields_batches
   yeilds_elements = self._process_yields_elements
   
   if has_fn_type_hints:
     if yields_elements and has_batch_type_hints:
       # because both batch and element type hints are defined, raise an exception if they are not equal.
       self.validate_batch_and_fn_typehints_same()
       fn_type_hints = fn_type_hints.with_output_types_from(
               batch_fn_type_hints)  # TODO: comment why this is done.
     elif yields_batches:
       # Because the process method produces batches, don't use its output typehints.
       fn_type_hints = fn_type_hints.with_output_types_from(
               typehints.decorators.IOTypeHints.empty())
   elif yields_elements and has_batch_type_hints:
     # process_batch produces elements, grab its output typehint
     fn_type_hints = batch_fn_type_hints.with_input_types_from(
               typehints.decorators.IOTypeHints.empty())  # TODO comment why does this use with_input_types_from
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org