Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/09/07 19:14:55 UTC

[GitHub] [beam] TheNeuralBit commented on issue #21440: Hook In Batching DoFn Apis to RunInference

TheNeuralBit commented on issue #21440:
URL: https://github.com/apache/beam/issues/21440#issuecomment-1239777760

   Spoke with @yeandy about this today. We discussed how to implement a pytorch BatchConverter (which should mostly be a copy-paste job from the numpy one; a rough sketch follows), and how to port RunInference over to using Batched DoFns while maintaining backward compatibility guarantees.
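   A minimal sketch of what that pytorch BatchConverter could look like, assuming the `BatchConverter` interface in `apache_beam/typehints/batch.py` (`produce_batch`, `explode_batch`, `combine_batches`, `get_length`). Registration and constructor details are elided here and may differ by Beam version:

   ```python
   import torch
   from apache_beam.typehints.batch import BatchConverter

   class PytorchTensorBatchConverter(BatchConverter):
     """Sketch only: converts between torch.Tensor elements and a
     stacked torch.Tensor batch with a leading batch dimension."""

     def produce_batch(self, elements):
       # Stack N element Tensors into one batched Tensor (new leading dim).
       return torch.stack(list(elements))

     def explode_batch(self, batch):
       # Yield each element Tensor, dropping the leading batch dimension.
       for i in range(batch.size(0)):
         yield batch[i]

     def combine_batches(self, batches):
       # Merge along the existing batch dimension (no new dim created).
       return torch.cat(batches, dim=0)

     def get_length(self, batch):
       return batch.size(0)

     def estimate_byte_size(self, batch):
       return batch.element_size() * batch.nelement()
   ```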
   
   One backward compatibility concern is each ModelHandler's public API. Each ModelHandler will likely need to add arguments letting users specify input/output typehints. We could make these new arguments backwards compatible by defining default values that preserve the existing behavior (e.g. in pytorch, the default batch input type would keep the current semantics of stacking individual Tensor elements into a single batched Tensor).
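   For illustration, one hypothetical shape this could take (the argument and method names below are illustrative, not an actual Beam API):

   ```python
   from typing import Optional
   import torch

   class PytorchModelHandlerSketch:
     """Sketch only: an opt-in typehint argument whose default
     preserves existing behavior for current callers."""

     def __init__(self, state_dict_path: str, *,
                  batch_input_type: Optional[type] = None):
       self._state_dict_path = state_dict_path
       # None means "use the default", so existing pipelines are unaffected.
       self._batch_input_type = batch_input_type or torch.Tensor

     def get_batch_input_type(self) -> type:
       # Could be queried by RunInference to set dynamic batch typehints.
       return self._batch_input_type
   ```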
   
   Another backward compatibility concern is changing the RunInference DoFn to implement process_batch while still supporting existing ModelHandler implementations. We could likely do this by augmenting the ModelHandler API such that the base RunInference can use either the conventional approach (BatchElements + process) or the Batched DoFn approach (process_batch with dynamic typehints).
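   A minimal sketch of that dual-path dispatch, assuming Beam's Batched DoFn hooks (`process_batch`, `get_input_batch_type`) and the hypothetical `get_batch_input_type` handler method from above; the dispatch logic is illustrative only:

   ```python
   import apache_beam as beam

   class _RunInferenceDoFnSketch(beam.DoFn):
     """Sketch only: serves both element-wise and batched paths."""

     def __init__(self, model_handler):
       self._model_handler = model_handler
       self._model = None

     def setup(self):
       self._model = self._model_handler.load_model()

     def process(self, batch):
       # Conventional path: `batch` is a list built by an upstream
       # BatchElements transform.
       yield from self._model_handler.run_inference(batch, self._model)

     def process_batch(self, batch):
       # Batched DoFn path: `batch` already arrives in the handler's
       # declared batch type (e.g. a stacked torch.Tensor).
       yield from self._model_handler.run_inference(batch, self._model)

     def get_input_batch_type(self, input_element_type):
       # Dynamic typehint: defer to handlers that opt in; returning None
       # means only the element-wise path is supported.
       getter = getattr(self._model_handler, 'get_batch_input_type', None)
       return getter() if getter else None
   ```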
   
   Another potential feature Andy raised in his [dev@ thread and doc](https://lists.apache.org/thread/rrjb4h451oyhygln87j6oq51hjy2r1tv) is enabling merging of already-batched inputs (e.g. np.concatenate rather than np.stack; the latter creates a new dimension, whereas the former concatenates across an existing one). Ultimately we should be able to leverage `combine_batches` for this:
   https://github.com/apache/beam/blob/0d937d4cd725965572d4720811fa2d6efaa8edf8/sdks/python/apache_beam/typehints/batch.py#L212-L213
   
   but some work still needs to be done there (e.g. we need a way for users to declare how big they'd like their batches to be).
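   To make the stack/concatenate distinction concrete:

   ```python
   import numpy as np

   elements = [np.zeros(3), np.zeros(3)]           # two unbatched elements
   batches = [np.zeros((4, 3)), np.zeros((2, 3))]  # two already-batched inputs

   print(np.stack(elements).shape)       # (2, 3): adds a new leading dimension
   print(np.concatenate(batches).shape)  # (6, 3): merges the existing dimension
   ```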
   

