Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/14 00:58:54 UTC

[GitHub] [beam] TheNeuralBit opened a new pull request, #21844: Document and test overriding batch type inference

TheNeuralBit opened a new pull request, #21844:
URL: https://github.com/apache/beam/pull/21844

   Fixes #21652
   
   Some Batched DoFns (e.g. RunInference) will need to declare their input/output batch types dynamically based on some configuration. Technically a DoFn implementation should already be able to do this, but it's untested and undocumented. This PR simply documents the functions that need to be overridden (`get_input_batch_type`, `get_output_batch_type`), and adds tests verifying it's possible. 
   
   We also add new `_normalized` versions of these functions which are responsible for normalizing the typehints to Beam typehints. This allows users to return native typehints in their implementations if they prefer.
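The pattern described above can be sketched without a Beam installation. This is an illustrative, Beam-free sketch: `DynamicBatchDoFn` is a plain class standing in for a `beam.DoFn` subclass, and the toy `normalize` function stands in for Beam's `convert_to_beam_type` (which actually returns a Beam `TypeConstraint`):

```python
from typing import List


def normalize(hint):
    # Toy stand-in for Beam's convert_to_beam_type: renders the
    # typehint in a canonical string form instead of building a
    # real Beam TypeConstraint.
    return str(hint).replace('typing.', '')


class DynamicBatchDoFn:
    # Illustrative DoFn-like class that declares its batch types
    # dynamically instead of via annotations on process_batch.
    def process_batch(self, batch, *args, **kwargs):
        yield [element * 2 for element in batch]

    def get_input_batch_type(self, input_element_type):
        # Returning a native typehint is allowed; the _normalized
        # wrapper below converts it to the canonical form.
        return List[input_element_type]

    def _get_input_batch_type_normalized(self, input_element_type):
        return normalize(self.get_input_batch_type(input_element_type))


fn = DynamicBatchDoFn()
print(fn._get_input_batch_type_normalized(int))  # List[int]
```

The split mirrors the PR: the overridable `get_*_batch_type` methods may return native typehints, while the private `_normalized` wrappers handle conversion internally.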
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #21844: Document and test overriding batch type inference

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #21844:
URL: https://github.com/apache/beam/pull/21844#issuecomment-1154607687

   # [Codecov](https://codecov.io/gh/apache/beam/pull/21844?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#21844](https://codecov.io/gh/apache/beam/pull/21844?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b4a3c88) into [master](https://codecov.io/gh/apache/beam/commit/87a7dcc7251c41ce8d452f9fca76f908ecd3484d?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (87a7dcc) will **decrease** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #21844      +/-   ##
   ==========================================
   - Coverage   74.15%   74.14%   -0.02%     
   ==========================================
     Files         698      698              
     Lines       92411    92412       +1     
   ==========================================
   - Hits        68530    68519      -11     
   - Misses      22630    22642      +12     
     Partials     1251     1251              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `83.75% <100.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/21844?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/core.py](https://codecov.io/gh/apache/beam/pull/21844/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb3JlLnB5) | `92.40% <100.00%> (+0.02%)` | :arrow_up: |
   | [.../python/apache\_beam/testing/test\_stream\_service.py](https://codecov.io/gh/apache/beam/pull/21844/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy90ZXN0X3N0cmVhbV9zZXJ2aWNlLnB5) | `88.09% <0.00%> (-4.77%)` | :arrow_down: |
   | [...n/apache\_beam/ml/gcp/recommendations\_ai\_test\_it.py](https://codecov.io/gh/apache/beam/pull/21844/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvZ2NwL3JlY29tbWVuZGF0aW9uc19haV90ZXN0X2l0LnB5) | `73.46% <0.00%> (-2.05%)` | :arrow_down: |
   | [...che\_beam/runners/interactive/interactive\_runner.py](https://codecov.io/gh/apache/beam/pull/21844/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9pbnRlcmFjdGl2ZV9ydW5uZXIucHk=) | `90.06% <0.00%> (-1.33%)` | :arrow_down: |
   | [...eam/runners/portability/fn\_api\_runner/execution.py](https://codecov.io/gh/apache/beam/pull/21844/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL2V4ZWN1dGlvbi5weQ==) | `92.44% <0.00%> (-0.65%)` | :arrow_down: |
   | [...ks/python/apache\_beam/runners/worker/sdk\_worker.py](https://codecov.io/gh/apache/beam/pull/21844/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvc2RrX3dvcmtlci5weQ==) | `88.46% <0.00%> (-0.64%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/combiners.py](https://codecov.io/gh/apache/beam/pull/21844/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb21iaW5lcnMucHk=) | `93.05% <0.00%> (-0.39%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/21844/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `93.54% <0.00%> (-0.13%)` | :arrow_down: |
   | [...examples/inference/pytorch\_image\_classification.py](https://codecov.io/gh/apache/beam/pull/21844/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvaW5mZXJlbmNlL3B5dG9yY2hfaW1hZ2VfY2xhc3NpZmljYXRpb24ucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/ml/inference/api.py](https://codecov.io/gh/apache/beam/pull/21844/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvaW5mZXJlbmNlL2FwaS5weQ==) | | |
   | ... and [6 more](https://codecov.io/gh/apache/beam/pull/21844/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/21844?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/21844?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [87a7dcc...b4a3c88](https://codecov.io/gh/apache/beam/pull/21844?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [beam] TheNeuralBit commented on a diff in pull request #21844: Document and test overriding batch type inference

TheNeuralBit commented on code in PR #21844:
URL: https://github.com/apache/beam/pull/21844#discussion_r897112603


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -770,16 +794,32 @@ def _get_element_type_from_return_annotation(method, input_type):
           f"{method!r}, did you mean Iterator[{return_type}]?")
 
   def get_output_batch_type(
-      self, input_element_type) -> typing.Optional[TypeConstraint]:
+      self, input_element_type
+  ) -> typing.Optional[typing.Union[TypeConstraint, type]]:
+    """Determine the batch type produced by this DoFn's ``process_batch``
+    implementation and/or it's ``process`` implementation with
+    ``@yields_batch``.
+
+    The default implementation of this method observes the return type
+    annotations on ``process_batch`` and/or ``process``.  A Batched DoFn may
+    override this method if a dynamic approach is required.
+
+    Args:
+      input_element_type: The **element type** of the input PCollection this
+        DoFn is being applied to.
+
+    Returns:
+      ``None`` if this DoFn will never yield batches, a Beam typehint or

Review Comment:
   I want "Beam typehint or native typehint" as a unit to be the "else" clause. I updated the language to make that explicit instead of applying this. Thanks for pointing it out.



##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -737,7 +737,23 @@ def process_yields_batches(self) -> bool:
   def process_batch_yields_elements(self) -> bool:
     return getattr(self.process_batch, '_beam_yields_elements', False)
 
-  def get_input_batch_type(self) -> typing.Optional[TypeConstraint]:
+  def get_input_batch_type(
+      self, input_element_type
+  ) -> typing.Optional[typing.Union[TypeConstraint, type]]:
+    """Determine the batch type expected as input to process_batch.
+
+    The default implementation of ``get_input_batch_type`` simply observes the
+    input typehint for the first parameter of ``process_batch``. A Batched DoFn
+    may override this method if a dynamic approach is required.
+
+    Args:
+      input_element_type: The **element type** of the input PCollection this
+        DoFn is being applied to.
+
+    Returns:
+      ``None`` if this DoFn cannot accept batches, a Beam typehint or a native

Review Comment:
   Same here





[GitHub] [beam] TheNeuralBit commented on pull request #21844: Document and test overriding batch type inference

TheNeuralBit commented on PR #21844:
URL: https://github.com/apache/beam/pull/21844#issuecomment-1155502739

   Thanks @yeandy!




[GitHub] [beam] TheNeuralBit commented on a diff in pull request #21844: Document and test overriding batch type inference

TheNeuralBit commented on code in PR #21844:
URL: https://github.com/apache/beam/pull/21844#discussion_r897113202


##########
sdks/python/apache_beam/transforms/batch_dofn_test.py:
##########
@@ -46,6 +46,17 @@ def process_batch(self, batch: List[int], *args, **kwargs):
     yield [element * 2 for element in batch]
 
 
+class BatchDoFnOverrideTypeInference(beam.DoFn):
+  def process_batch(self, batch, *args, **kwargs):
+    yield [element * 2 for element in batch]
+
+  def get_input_batch_type(self, input_element_type):
+    return List[input_element_type]
+
+  def get_output_batch_type(self, input_element_type):
+    return List[input_element_type]

Review Comment:
   Yes, users may want to do that. This is fine as-is for our purposes since it's just a test.





[GitHub] [beam] yeandy commented on a diff in pull request #21844: Document and test overriding batch type inference

yeandy commented on code in PR #21844:
URL: https://github.com/apache/beam/pull/21844#discussion_r896688952


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -770,16 +794,32 @@ def _get_element_type_from_return_annotation(method, input_type):
           f"{method!r}, did you mean Iterator[{return_type}]?")
 
   def get_output_batch_type(
-      self, input_element_type) -> typing.Optional[TypeConstraint]:
+      self, input_element_type
+  ) -> typing.Optional[typing.Union[TypeConstraint, type]]:
+    """Determine the batch type produced by this DoFn's ``process_batch``
+    implementation and/or it's ``process`` implementation with

Review Comment:
   ```suggestion
       implementation and/or its ``process`` implementation with
   ```



##########
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:
##########
@@ -136,6 +136,37 @@ def test_batch_pardo(self):
 
       assert_that(res, equal_to([6, 12, 18]))
 
+  def test_batch_pardo_override_type_inference(self):
+

Review Comment:
   Remove extra line
   ```suggestion
   ```



##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -737,7 +737,23 @@ def process_yields_batches(self) -> bool:
   def process_batch_yields_elements(self) -> bool:
     return getattr(self.process_batch, '_beam_yields_elements', False)
 
-  def get_input_batch_type(self) -> typing.Optional[TypeConstraint]:
+  def get_input_batch_type(
+      self, input_element_type
+  ) -> typing.Optional[typing.Union[TypeConstraint, type]]:
+    """Determine the batch type expected as input to process_batch.
+
+    The default implementation of ``get_input_batch_type`` simply observes the
+    input typehint for the first parameter of ``process_batch``. A Batched DoFn
+    may override this method if a dynamic approach is required.
+
+    Args:
+      input_element_type: The **element type** of the input PCollection this
+        DoFn is being applied to.
+
+    Returns:
+      ``None`` if this DoFn cannot accept batches, a Beam typehint or a native

Review Comment:
   ```suggestion
         ``None`` if this DoFn cannot accept batches, a Beam typehint, or a native
   ```



##########
sdks/python/apache_beam/transforms/batch_dofn_test.py:
##########
@@ -46,6 +46,17 @@ def process_batch(self, batch: List[int], *args, **kwargs):
     yield [element * 2 for element in batch]
 
 
+class BatchDoFnOverrideTypeInference(beam.DoFn):
+  def process_batch(self, batch, *args, **kwargs):
+    yield [element * 2 for element in batch]
+
+  def get_input_batch_type(self, input_element_type):
+    return List[input_element_type]
+
+  def get_output_batch_type(self, input_element_type):
+    return List[input_element_type]

Review Comment:
   Would it make more sense for users to call `self.get_input_batch_type(input_element_type)` instead of repeating `List[input_element_type]`?
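   A minimal, Beam-free sketch of the delegation suggested here (the class is a plain object standing in for the `beam.DoFn` subclass in the test):

```python
from typing import List


class BatchDoFnOverrideTypeInference:
    # Stand-in for the beam.DoFn subclass in batch_dofn_test.py; only
    # the type-declaration methods matter for this sketch.
    def process_batch(self, batch, *args, **kwargs):
        yield [element * 2 for element in batch]

    def get_input_batch_type(self, input_element_type):
        return List[input_element_type]

    def get_output_batch_type(self, input_element_type):
        # Delegate rather than repeat List[input_element_type]: this
        # DoFn yields batches with the same representation it accepts.
        return self.get_input_batch_type(input_element_type)


fn = BatchDoFnOverrideTypeInference()
assert fn.get_output_batch_type(int) == List[int]
```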



##########
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:
##########
@@ -136,6 +136,37 @@ def test_batch_pardo(self):
 
       assert_that(res, equal_to([6, 12, 18]))
 
+  def test_batch_pardo_override_type_inference(self):
+
+    class ArrayMultiplyTransposedDoFn(beam.DoFn):

Review Comment:
   I might be misunderstanding something, but why is "Transposed" in the name?



##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -746,10 +762,18 @@ def get_input_batch_type(self) -> typing.Optional[TypeConstraint]:
       # TODO(BEAM-14340): Consider supporting an alternative (dynamic?) approach
       # for declaring input type
       raise TypeError(
-          f"{self.__class__.__name__}.process_batch() does not have a type "
-          "annotation on its first parameter. This is required for "
-          "process_batch implementations.")
-    return typehints.native_type_compatibility.convert_to_beam_type(input_type)
+          f"Either {self.__class__.__name__}.process_batch() must have a type "
+          f"annotation on its first parameter, or {self.__class__.__name__} "
+          "must override get_input_batch_type.")
+    return input_type
+
+  def _get_input_batch_type_normalized(self, input_element_type):
+    return typehints.native_type_compatibility.convert_to_beam_type(
+        self.get_input_batch_type(input_element_type))
+
+  def _get_output_batch_type_normalized(self, input_element_type):
+    return typehints.native_type_compatibility.convert_to_beam_type(
+        self.get_output_batch_type(input_element_type))

Review Comment:
   Why are these private functions? Is it because normalizing to Beam types isn't going to be a common op?



##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -770,16 +794,32 @@ def _get_element_type_from_return_annotation(method, input_type):
           f"{method!r}, did you mean Iterator[{return_type}]?")
 
   def get_output_batch_type(
-      self, input_element_type) -> typing.Optional[TypeConstraint]:
+      self, input_element_type
+  ) -> typing.Optional[typing.Union[TypeConstraint, type]]:
+    """Determine the batch type produced by this DoFn's ``process_batch``
+    implementation and/or it's ``process`` implementation with
+    ``@yields_batch``.
+
+    The default implementation of this method observes the return type
+    annotations on ``process_batch`` and/or ``process``.  A Batched DoFn may
+    override this method if a dynamic approach is required.
+
+    Args:
+      input_element_type: The **element type** of the input PCollection this
+        DoFn is being applied to.
+
+    Returns:
+      ``None`` if this DoFn will never yield batches, a Beam typehint or

Review Comment:
   ```suggestion
         ``None`` if this DoFn will never yield batches, a Beam typehint, or
   ```



##########
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:
##########
@@ -136,6 +136,37 @@ def test_batch_pardo(self):
 
       assert_that(res, equal_to([6, 12, 18]))
 
+  def test_batch_pardo_override_type_inference(self):
+
+    class ArrayMultiplyTransposedDoFn(beam.DoFn):
+

Review Comment:
   Remove extra line
   ```suggestion
   ```





[GitHub] [beam] TheNeuralBit commented on pull request #21844: Document and test overriding batch type inference

TheNeuralBit commented on PR #21844:
URL: https://github.com/apache/beam/pull/21844#issuecomment-1154598720

   R: @yeandy 




[GitHub] [beam] TheNeuralBit commented on a diff in pull request #21844: Document and test overriding batch type inference

TheNeuralBit commented on code in PR #21844:
URL: https://github.com/apache/beam/pull/21844#discussion_r897115982


##########
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:
##########
@@ -136,6 +136,37 @@ def test_batch_pardo(self):
 
       assert_that(res, equal_to([6, 12, 18]))
 
+  def test_batch_pardo_override_type_inference(self):
+

Review Comment:
   Done





[GitHub] [beam] TheNeuralBit commented on a diff in pull request #21844: Document and test overriding batch type inference

TheNeuralBit commented on code in PR #21844:
URL: https://github.com/apache/beam/pull/21844#discussion_r897102742


##########
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:
##########
@@ -136,6 +136,37 @@ def test_batch_pardo(self):
 
       assert_that(res, equal_to([6, 12, 18]))
 
+  def test_batch_pardo_override_type_inference(self):
+
+    class ArrayMultiplyTransposedDoFn(beam.DoFn):

Review Comment:
   Whoops. I was originally trying something fancy here but reverted it. I'll change it.





[GitHub] [beam] TheNeuralBit merged pull request #21844: Document and test overriding batch type inference

TheNeuralBit merged PR #21844:
URL: https://github.com/apache/beam/pull/21844




[GitHub] [beam] TheNeuralBit commented on a diff in pull request #21844: Document and test overriding batch type inference

TheNeuralBit commented on code in PR #21844:
URL: https://github.com/apache/beam/pull/21844#discussion_r897115826


##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -746,10 +762,18 @@ def get_input_batch_type(self) -> typing.Optional[TypeConstraint]:
       # TODO(BEAM-14340): Consider supporting an alternative (dynamic?) approach
       # for declaring input type
       raise TypeError(
-          f"{self.__class__.__name__}.process_batch() does not have a type "
-          "annotation on its first parameter. This is required for "
-          "process_batch implementations.")
-    return typehints.native_type_compatibility.convert_to_beam_type(input_type)
+          f"Either {self.__class__.__name__}.process_batch() must have a type "
+          f"annotation on its first parameter, or {self.__class__.__name__} "
+          "must override get_input_batch_type.")
+    return input_type
+
+  def _get_input_batch_type_normalized(self, input_element_type):
+    return typehints.native_type_compatibility.convert_to_beam_type(
+        self.get_input_batch_type(input_element_type))
+
+  def _get_output_batch_type_normalized(self, input_element_type):
+    return typehints.native_type_compatibility.convert_to_beam_type(
+        self.get_output_batch_type(input_element_type))

Review Comment:
   These are convenience functions I provided for our internal use; users shouldn't call them. Users shouldn't call the others (`get_{input,output}_batch_type`) either, but those are part of the public API since users can override them if they need to.
   
   Come to think of it, I should probably mark some other convenience functions we added as protected. I'll follow up with a PR for that.



##########
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:
##########
@@ -136,6 +136,37 @@ def test_batch_pardo(self):
 
       assert_that(res, equal_to([6, 12, 18]))
 
+  def test_batch_pardo_override_type_inference(self):
+
+    class ArrayMultiplyTransposedDoFn(beam.DoFn):
+

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

