You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/15 16:35:37 UTC

[GitHub] [beam] yeandy opened a new issue, #21894: Move validation of Sklearn inference_args to construction time

yeandy opened a new issue, #21894:
URL: https://github.com/apache/beam/issues/21894

   ### What would you like to happen?
   
   Sklearn RunInference, at execution time, verifies that no `inference_args` are passed. Make a change to shorten feedback loop by at doing verification at construction time.
   
   Discussion: https://github.com/apache/beam/pull/21806#discussion_r898128282
   
   Parent Issue: https://github.com/apache/beam/issues/21435
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: sdk-py-core


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] closed issue #21894: Move validation of Sklearn inference_args to construction time

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #21894: Move validation of Sklearn inference_args to construction time
URL: https://github.com/apache/beam/issues/21894


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on issue #21894: Move validation of Sklearn inference_args to construction time

Posted by GitBox <gi...@apache.org>.
yeandy commented on issue #21894:
URL: https://github.com/apache/beam/issues/21894#issuecomment-1183632885

   1. Seems like an extra friction point as it will be an additional thing users would have to learn and implement. Doable, but prefer not to do this over choice 2 (see below).
   2. I think a no-op would be the easiest to for others to use/understand. We can add addition documentation explaining how this parameter is optional to use.
   3. If it's moved into `base.py`, could you illustrate how would we make the logic different for the different frameworks? 
   4. Would this cause compatibility issues with the existing interface for the `run_inference` function since it expects `inference_args`? 
   ```
     def run_inference(
         self,
         batch: Sequence[ExampleT],
         model: ModelT,
         inference_args: Optional[Dict[str, Any]] = None) -> Iterable[PredictionT]:
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ryanthompson591 commented on issue #21894: Move validation of Sklearn inference_args to construction time

Posted by GitBox <gi...@apache.org>.
ryanthompson591 commented on issue #21894:
URL: https://github.com/apache/beam/issues/21894#issuecomment-1184825652

   .take-issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ryanthompson591 commented on issue #21894: Move validation of Sklearn inference_args to construction time

Posted by GitBox <gi...@apache.org>.
ryanthompson591 commented on issue #21894:
URL: https://github.com/apache/beam/issues/21894#issuecomment-1227409211

   .close-issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ryanthompson591 commented on issue #21894: Move validation of Sklearn inference_args to construction time

Posted by GitBox <gi...@apache.org>.
ryanthompson591 commented on issue #21894:
URL: https://github.com/apache/beam/issues/21894#issuecomment-1182501129

   Sorry to overthink this.
   
   Since this is a side input of run_inference, it's not possible to know at construction time of ModelHandler, but we could know when RunInference is created I think.
   
   ### Idea 1:
   
   Make a method in ModelHandler:
   ```
   def framework_expects_parameters():
     # Override this to be true if your framework expects parameters.
     return False
   ```
   Then in base.py we could validate.  This should happen in the runner instead of the worker and be very fast.  This should also be less code, since I saw nVidia copy this validation into their implementation.
   
   ### Idea 2
   Just get rid of this validation, passing in parameters to a framework that can't use them could be a no op.  Right now this is what I plan to do in TFX-BSL.  
   
   Think of it this way -- If we add more optional parameters in the future. It's not possible for all frameworks to be constantly updating to check that named parameters that they don't use and don't expect are not populated. In fact, no op should be the assumption in these cases.
   
   ### Idea 3
   Move this validation code into base.py... since nVidia and future frameworks are just copy-pasting this code why not make it available.
   
   ### Idea 4
   *my preference*
   
   pass in any optional arguments only if they are present:
   
   In base.py _RunInferenceDoFn becomes this
   ```
     def process(self, batch, inference_args):
       start_time = _to_microseconds(self._clock.time_ns())
       extra_kwargs = {}
       if inference_args:
        extra_kwargs[inference_args] = inference_args
       result_generator = self._model_handler.run_inference(
           batch, self._model, **extra_kwargs)
       predictions = list(result_generator)
   ```
   
   This has the advantage that it can scale to any amount of optional named arguments, and implementations that don't expect the extra args will just fail with an unexpected arg exception as expected.
   
   @tvalentyn @yeandy any preference?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org