Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/12 21:06:43 UTC

[GitHub] [beam] ryanthompson591 commented on issue #21894: Move validation of Sklearn inference_args to construction time

ryanthompson591 commented on issue #21894:
URL: https://github.com/apache/beam/issues/21894#issuecomment-1182501129

   Sorry to overthink this.
   
   Since inference_args is a side input to run_inference, it can't be validated when the ModelHandler is constructed, but I think we could validate it when the RunInference transform is created.
   
   ### Idea 1:
   
   Make a method in ModelHandler:
   ```
   def framework_expects_parameters(self):
     # Override this to return True if your framework expects extra
     # parameters (inference_args) in run_inference.
     return False
   ```
   Then base.py could perform the validation. The check would run in the runner at construction time rather than on the workers, so it would be very fast. It should also mean less code overall, since I saw NVIDIA copy this validation into their implementation.
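   
   A minimal sketch of what that construction-time check could look like in base.py (the helper name and error message are my assumptions, not existing Beam API):
   ```
   # Hypothetical check in base.py, run once in the runner at pipeline
   # construction time, not per bundle on the workers.
   def _check_inference_args(model_handler, inference_args):
     if inference_args and not model_handler.framework_expects_parameters():
       raise ValueError(
           'inference_args were provided to RunInference, but %s does not '
           'expect them.' % type(model_handler).__name__)
   ```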
   
   ### Idea 2
   Just get rid of this validation; passing parameters to a framework that can't use them could be a no-op. This is what I currently plan to do in TFX-BSL.
   
   Think of it this way: if we add more optional parameters in the future, it isn't feasible for every framework to constantly update its checks to reject named parameters it doesn't use or expect. In fact, a no-op should be the default assumption in these cases.
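   
   For example, a no-op handler could look like this (a sketch only; the class is hypothetical and the signature just mirrors the run_inference shape):
   ```
   # Hypothetical handler that accepts inference_args for API
   # compatibility but deliberately treats them as a no-op.
   class NoOpArgsModelHandler:
     def run_inference(self, batch, model, inference_args=None):
       del inference_args  # Unused by this framework; no-op by design.
       return model.predict(batch)
   ```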
   
   ### Idea 3
   Move this validation code into base.py. Since NVIDIA and future frameworks are copy-pasting this code anyway, why not make it available in one place?
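   
   Something like this shared helper (the name is hypothetical), which frameworks without extra-argument support could call at the top of run_inference instead of copy-pasting the check:
   ```
   # Hypothetical shared helper in base.py; frameworks that take no
   # extra arguments call this at the start of run_inference.
   def _validate_inference_args(inference_args):
     if inference_args:
       raise ValueError(
           'inference_args were provided, but this framework does not '
           'accept extra arguments to run_inference.')
   ```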
   
   ### Idea 4
   *my preference*
   
   Pass optional arguments through only when they are actually present:
   
   In base.py, _RunInferenceDoFn.process becomes:
   ```
     def process(self, batch, inference_args):
       start_time = _to_microseconds(self._clock.time_ns())
       extra_kwargs = {}
       if inference_args:
         # Only forward inference_args when the user actually supplied them.
         extra_kwargs['inference_args'] = inference_args
       result_generator = self._model_handler.run_inference(
           batch, self._model, **extra_kwargs)
       predictions = list(result_generator)
   ```
   
   This has the advantage that it scales to any number of optional named arguments, and implementations that don't declare the extra args will simply fail with an unexpected-keyword-argument TypeError, as expected.
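   
   For illustration, with a hypothetical handler that declares no extra parameters, the **extra_kwargs expansion above fails naturally:
   ```
   # Hypothetical handler with no extra parameters. Passing inference_args
   # through **extra_kwargs raises:
   # TypeError: run_inference() got an unexpected keyword argument
   # 'inference_args'
   class SimpleModelHandler:
     def run_inference(self, batch, model):
       return model.predict(batch)
   ```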
   
   @tvalentyn @yeandy any preference?

