Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 16:22:26 UTC

[GitHub] [beam] kennknowles opened a new issue, #18121: Allow a DoFn to opt in to mutating its input

kennknowles opened a new issue, #18121:
URL: https://github.com/apache/beam/issues/18121

   Runners generally can't tell whether a DoFn mutates its inputs, but assuming by default that it does carries a significant performance cost from unnecessary copying (around sibling fusion, etc.). So instead the model forbids mutating inputs, and the Direct Runner validates this behavior. (See: http://beam.incubator.apache.org/contribute/design-principles/#make-efficient-things-easy-rather-than-make-easy-things-efficient)
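
One way a validating runner could catch such mutation is to snapshot each element before the call and compare afterwards. The sketch below is a rough illustration under stated assumptions, not the Direct Runner's actual code: `run_with_mutation_check` and `MutationError` are hypothetical names, and `pickle` stands in for a coder.

```python
import pickle
from typing import Any, Callable, Iterable, List


class MutationError(Exception):
    """Raised when a function changes the element it was given."""


def run_with_mutation_check(
    fn: Callable[[Any], Iterable[Any]], element: Any
) -> List[Any]:
    # Snapshot the element as bytes before the call. pickle stands in
    # for a coder here; that is an assumption of this sketch.
    before = pickle.dumps(element)
    outputs = list(fn(element))
    # Re-encode afterwards; differing bytes mean the input was changed.
    if pickle.dumps(element) != before:
        raise MutationError(f"{fn.__name__} mutated its input")
    return outputs


def add_key_in_place(element):
    element["new_key"] = "new_value"  # mutates the input
    yield element


def add_key_copy(element):
    yield {**element, "new_key": "new_value"}  # builds a new dict
```

Here `run_with_mutation_check(add_key_copy, {"a": 1})` returns normally, while the same call with `add_key_in_place` raises `MutationError`.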
   
   However, if users are processing a small number of large records by making incremental changes (for example, genomics use cases), the cost of the immutability requirement can be very large. As a workaround, users sometimes do suboptimal things (fusing ParDos by hand) or undefined things when they believe the immutability requirement is unnecessarily strict (adding no-op coders in places where they hope the runner won't be materializing things, mutating things anyway when they don't expect sibling fusion to happen, etc.).
   
   We should consider adding a signal (MutatingDoFn?) that users explicitly opt in to, declaring that their code may mutate inputs. The runner can then use this signal either to prevent optimizations that would break under mutation or to insert additional copies as needed so that optimizations preserve semantics.
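
To make the proposal concrete, here is a toy sketch of the "insert additional copies" option. Everything in it is hypothetical: `mutates_input` is an invented marker (Beam has no such decorator), and `run_siblings` is a stand-in for a runner feeding one element to several sibling functions.

```python
import copy
from typing import Any, Callable, Iterable, List

Fn = Callable[[Any], Iterable[Any]]


def mutates_input(fn: Fn) -> Fn:
    """Hypothetical opt-in marker: flags fn as possibly mutating its input."""
    fn.mutates_input = True
    return fn


def run_siblings(fns: List[Fn], element: Any) -> List[List[Any]]:
    """Toy 'runner' that feeds one element to several sibling functions."""
    results = []
    for fn in fns:
        # Copy only for functions that opted in; all others share the
        # original object and pay no copying cost.
        opted_in = getattr(fn, "mutates_input", False)
        arg = copy.deepcopy(element) if opted_in else element
        results.append(list(fn(arg)))
    return results


@mutates_input
def add_key(element):
    element["new_key"] = "new_value"
    yield element


def passthrough(element):
    yield element
```

With this, `run_siblings([add_key, passthrough], {"a": 1})` gives `add_key` a private copy to change, so `passthrough` still sees the unmodified element.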
   
   See this related user@ discussion:
   https://lists.apache.org/thread.html/f39689f54147117f3fc54c498eff1a20fa73f1be5b5cad5b6f816fd3@%3Cuser.beam.apache.org%3E
   
   Imported from Jira [BEAM-1164](https://issues.apache.org/jira/browse/BEAM-1164). Original Jira may contain additional context.
   Reported by: frances.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] cozos commented on issue #18121: Allow a DoFn to opt in to mutating its input

Posted by "cozos (via GitHub)" <gi...@apache.org>.
cozos commented on issue #18121:
URL: https://github.com/apache/beam/issues/18121#issuecomment-1524832762

   @kennknowles @francesperry 
   I'm trying to understand this better. Say a DoFn does something like this in Python:
   
   ```
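   from typing import Any, Dict, Iterator

   def process(self, element: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
       # Mutates the input element in place, then emits the same object.
       element["new_key"] = "new_value"
       yield element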
   ```
   
   Will bad things happen? Under what conditions? Is this documented anywhere in Beam? I can't find anything here: https://beam.apache.org/documentation/programming-guide/#pardo




[GitHub] [beam] kennknowles commented on issue #18121: Allow a DoFn to opt in to mutating its input

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #18121:
URL: https://github.com/apache/beam/issues/18121#issuecomment-1526026106

   Basically, if we did add this feature, it would be pretty much the same as a user making a copy of the element right away in their mutating DoFn. In some cases, if we knew that the element would not be reused, we could save that cost.
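
The user-side equivalent described here, copying the element right away, might look like the sketch below. `AddKeyWithCopy` is a hypothetical name and a plain class is used so the example runs without Beam installed; a real DoFn would subclass `apache_beam.DoFn`.

```python
import copy
from typing import Any, Dict, Iterator


class AddKeyWithCopy:
    """Stand-in for a DoFn; a real one would subclass apache_beam.DoFn."""

    def process(self, element: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
        # Copy first, so later changes never touch the object the
        # runner handed in (which other transforms may also receive).
        element = copy.deepcopy(element)
        element["new_key"] = "new_value"
        yield element
```

The deep copy is the cost in question: it is paid on every element, even when nothing else would have reused the original.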




[GitHub] [beam] kennknowles commented on issue #18121: Allow a DoFn to opt in to mutating its input

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #18121:
URL: https://github.com/apache/beam/issues/18121#issuecomment-1526024204

   The element can also be passed to other DoFns, so you end up sending incorrect data down other paths.
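
A small pure-Python illustration of that failure mode, assuming a fused stage hands the very same object to two sibling functions. The names `add_key`, `list_keys`, and `run_fused` are made up for this sketch, and no Beam runner is involved.

```python
from typing import Any, Dict, Iterator, List, Tuple


def add_key(element: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    element["new_key"] = "new_value"  # in-place mutation
    yield element


def list_keys(element: Dict[str, Any]) -> Iterator[List[str]]:
    yield sorted(element)  # sorted() over a dict yields its keys


def run_fused(element: Dict[str, Any]) -> Tuple[list, list]:
    """Toy fused stage: both siblings receive the very same object."""
    first = list(add_key(element))
    second = list(list_keys(element))
    return first, second
```

Calling `run_fused({"a": 1})` makes `list_keys` report `new_key` as well as `a`, even though it consumes the same upstream collection as `add_key` and should only ever see `a`. Whether this happens in a real pipeline depends on whether the runner shares the object between the two paths or serializes it in between.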

