
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/12/24 01:00:34 UTC

[GitHub] [beam] tvalentyn commented on pull request #13220: Optimizes extract_output for 1 element accumulator case.

tvalentyn commented on pull request #13220:
URL: https://github.com/apache/beam/pull/13220#issuecomment-750674335


   We need to revert this change. 
   
   Beam Combiners are associative and commutative reduction operators. It is reasonable to assume that the input `fn` to [`CombineFn.from_callable()`](https://github.com/apache/beam/blob/fbeb28e7d974c79a8edf00ac80c72fe0ef35f293/sdks/python/apache_beam/transforms/core.py#L980) must be an associative and commutative reduction. Being idempotent on single-element inputs is not a requirement. There can be potentially useful combiners that are not idempotent, for example a combiner that checks whether a particular element is in a PCollection, generated by `CombineFn.from_callable(lambda input_list: 'some_element' in input_list)`.
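
To make the example concrete, here is a minimal plain-Python sketch (no Beam involved) of the callable from the paragraph above. The name `contains_some_element` is made up for illustration; the point is only that applying the callable to a one-element list does not give back that element.

```python
# The example callable from the comment, written as a named function.
# `contains_some_element` is a hypothetical name used only for this sketch.
def contains_some_element(input_list):
    return 'some_element' in input_list


single = ['some_element']
result = contains_some_element(single)

# The reduction of a single-element input is a bool, not the element itself,
# so fn([x]) != x for this callable.
print(result)               # True
print(result == single[0])  # False
```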
   
   The Beam programming guide [1] also mentions this: 
   > When you apply a Combine transform, you must provide the function that contains the logic for combining the elements or values. The combining function should be commutative and associative, as the function is not necessarily invoked exactly once on all values with a given key. 
   
   The implementation of `CallableWrapperCombineFn`/`NoSideInputsCallableWrapperCombineFn` should guarantee that if the input fn is an associative and commutative reduction, the combiner produced by wrapping the fn is also an associative and commutative reduction with the same behavior.
   
   With this change, we no longer have this guarantee. Calls to [CallableWrapperCombineFn.add_input()](https://github.com/apache/beam/blob/fbeb28e7d974c79a8edf00ac80c72fe0ef35f293/sdks/python/apache_beam/transforms/core.py#L1056) append an element to an accumulator, but if there is only one accumulator with only one element, we assume it is already a combined value and don't call `self._fn()` (the intent of the optimization in this change). That assumption does not hold for reducers that are not idempotent on single-element inputs.
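
The difference can be reproduced with a simplified model. The sketch below is not Beam's actual code: `ToyCallableCombiner`, its `skip_single_element` flag, and the `run` helper are hypothetical stand-ins that only mimic the behavior described above (a list accumulator, `add_input` appending, and a shortcut that returns a lone element without calling the wrapped fn).

```python
def contains_some_element(input_list):
    return 'some_element' in input_list


class ToyCallableCombiner:
    """Simplified, hypothetical model of a callable-wrapping combiner."""

    def __init__(self, fn, skip_single_element):
        self._fn = fn
        # When True, mimic the shortcut described in the comment.
        self._skip_single_element = skip_single_element

    def create_accumulator(self):
        return []

    def add_input(self, accumulator, element):
        accumulator.append(element)
        return accumulator

    def extract_output(self, accumulator):
        if self._skip_single_element and len(accumulator) == 1:
            # Assumes the lone element is already a combined value.
            return accumulator[0]
        return self._fn(accumulator)


def run(combiner, elements):
    accumulator = combiner.create_accumulator()
    for element in elements:
        accumulator = combiner.add_input(accumulator, element)
    return combiner.extract_output(accumulator)


always_reduce = ToyCallableCombiner(contains_some_element, skip_single_element=False)
with_shortcut = ToyCallableCombiner(contains_some_element, skip_single_element=True)

print(run(always_reduce, ['some_element']))  # True
print(run(with_shortcut, ['some_element']))  # some_element
```

In this model the two variants agree on inputs with more than one element and diverge only on a single-element input, which is the case the comment is concerned with.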

