Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 23:31:29 UTC

[GitHub] [beam] damccorm opened a new issue, #21437: Add Metrics to base class

damccorm opened a new issue, #21437:
URL: https://github.com/apache/beam/issues/21437

   Add a metric that will track prediction failures: `num_failed_inferences`.
   
   Add metrics that track data on side-loaded models: `num_model_loads`, `num_model_versions`, and `curr_model_version`.
   
   `curr_model_version` is less a metric and more of a static value. Part of this FR will be to figure out how that fits.
   
   Imported from Jira [BEAM-14043](https://issues.apache.org/jira/browse/BEAM-14043). Original Jira may contain additional context.
   Reported by: Ryan.Thompson.
   Subtask of issue #21435



[GitHub] [beam] damccorm commented on issue #21437: Add Metrics to base class

damccorm commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1247087175

   Actually, I think there could be value in adding `num_failed_inferences` before worrying about the side input model loading stuff. That would probably involve putting some error handling around our `run_inference` call:
   
   https://github.com/apache/beam/blob/2d4f61c93250fb41112a2535394e7328bc1fdf95/sdks/python/apache_beam/ml/inference/base.py#L418
   
   and having that update the metrics count. @BjornPrime as you free up, would you mind working on this? We probably can't totally close out the issue, but we can make a dent in it.
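   
   A minimal sketch of what that error handling might look like (hypothetical names: `_run_inference_counting_failures` and `failure_counter` are illustrative, not the current `base.py` code):
   
   ```python
   def _run_inference_counting_failures(model_handler, batch, model,
                                        inference_args, failure_counter):
       # Hypothetical helper: count the whole batch as failed before
       # re-raising (per-element attribution is framework-dependent,
       # see discussion later in this thread).
       try:
           return model_handler.run_inference(batch, model, inference_args)
       except Exception:
           failure_counter.inc(len(batch))
           raise
   ```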
   
   cc/ @yeandy 



[GitHub] [beam] rezarokni commented on issue #21437: Add Metrics to base class

rezarokni commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1262867983

   I think we should borrow from best-practice patterns in Beam I/Os and return errors, along with the error message, in the Prediction object.
   
   Agree a flag would be a nice user experience (a sketch of the options follows this list):
   
   Config 1 - Fail on any error
   
   If Config 1 is false:
   Config 2 - Retry the error n times (maybe n = 3?)
   Config 3 - Return the error message along with the data point in the Prediction object
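   
   A hypothetical configuration surface for those options (none of these flags exist in RunInference; the names are purely illustrative):
   
   ```python
   from dataclasses import dataclass
   
   @dataclass
   class InferenceErrorConfig:
       # Config 1: fail the pipeline on any inference error.
       fail_on_error: bool = True
       # Config 2: retry count, consulted only when fail_on_error is False.
       num_retries: int = 3
       # Config 3: attach the error message to the returned Prediction object.
       return_errors_in_result: bool = True
   ```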
   
   
   



[GitHub] [beam] BjornPrime commented on issue #21437: Add Metrics to base class

BjornPrime commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1271699066

   So, just to make sure I'm understanding: one failed inference doesn't prevent the rest of the batch from completing?



[GitHub] [beam] BjornPrime commented on issue #21437: Add Metrics to base class

BjornPrime commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1270552212

   I'm just imagining a situation where we want to update one metric and not all of the others. For example, we could have a method `update_metrics(metrics)`, where `metrics` is a dict with names of metrics as keys and numerical values that are interpreted according to the metric they're associated with. So if I run `update_metrics({"_num_failed_inferences": 2})`, it would increment `_num_failed_inferences` twice, but if the key were `_inference_batch_latency_micro_secs`, it would know to just set that equal to 2. And you could update both together with one call if you include both key-value pairs.
   Maybe that's too generic for our purposes. It depends on how likely we are to want to update specific metrics but not others. Valentyn has also pointed out that none of this is directly exposed, so if we need to change it later, we shouldn't end up breaking anybody else.
   I think we do need to consider how a failure affects the other metrics, though. For example, the inference counter basically just adds the number of items in the batch after the batch finishes, but if the batch is interrupted by failures, that number won't be accurate, will it? Or maybe I'm misunderstanding what's going on under the hood of `run_inference`.
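   
   A minimal sketch of that generic method (illustrative only: the real `_MetricsCollector` members live in `base.py`, and the counter-vs-distribution dispatch here is an assumption):
   
   ```python
   from apache_beam.metrics.metrics import Metrics
   
   class _MetricsCollector:
       def __init__(self, namespace: str):
           self._num_failed_inferences = Metrics.counter(
               namespace, 'num_failed_inferences')
           self._inference_batch_latency_micro_secs = Metrics.distribution(
               namespace, 'inference_batch_latency_micro_secs')
   
       def update_metrics(self, metrics: dict):
           for name, value in metrics.items():
               metric = getattr(self, name)
               if hasattr(metric, 'inc'):  # counter: increment by value
                   metric.inc(value)
               else:                       # distribution: record one sample
                   metric.update(value)
   ```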



[GitHub] [beam] BjornPrime commented on issue #21437: Add Metrics to base class

BjornPrime commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1247132410

   .take-issue



[GitHub] [beam] rezarokni commented on issue #21437: Add Metrics to base class

rezarokni commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1263838353

   SG, let's defer.



[GitHub] [beam] yeandy commented on issue #21437: Add Metrics to base class

yeandy commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1271521797

   I like this proposal, as it makes updating the metrics more flexible.
   
   > it would increment _num_failed_inferences twice
   
   To be clear, there are two types, `Metrics.counter()` and `Metrics.distribution()`: we "increment" `self._inference_counter` and "update" the distributions `self._inference_batch_latency_micro_secs`, `self._inference_request_batch_size`, and `self._inference_request_batch_byte_size`.
   
   So we should also have something like `increment_metrics()` for `_inference_counter` and `_failed_inference_counter` (for `_num_failed_inferences`).
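   
   For reference, the two metric types as exposed by the Beam Python SDK (a toy illustration, not code from `base.py`):
   
   ```python
   from apache_beam.metrics.metrics import Metrics
   
   failed = Metrics.counter('RunInference', 'num_failed_inferences')
   latency = Metrics.distribution(
       'RunInference', 'inference_batch_latency_micro_secs')
   
   failed.inc()          # counters are incremented
   latency.update(1500)  # distributions record individual sample values
   ```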
   
   > I think we do need to consider how a failure affects the other metrics though. For example, the inference counter basically just adds the number of items in the batch after the batch finishes, but if the batch is interrupted by failures, that number won't be accurate will it? Or maybe I'm misunderstanding what's going on under the hood of run_inference.
   
   That's the right interpretation. See my comment https://github.com/apache/beam/issues/21437#issuecomment-1268707869. I think the only thing we would want to update is `_num_failed_inferences`, with the number of elements in that batch that failed.
   



[GitHub] [beam] damccorm commented on issue #21437: Add Metrics to base class

damccorm commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1235544867

   This probably comes after #22970



[GitHub] [beam] yeandy commented on issue #21437: Add Metrics to base class

yeandy commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1272074679

   To my knowledge, particularly for scikit-learn and PyTorch, if the numpy array or torch tensor that we pass to the model contains even one invalid/malformed instance, then prediction for the entire batch will fail. But we may need to do some testing and more digging around to confirm, especially for TensorFlow.
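   
   A toy illustration of that batch-level failure mode with torch (one wrongly-shaped element means the batch can't even be assembled, let alone run through a model):
   
   ```python
   import torch
   
   good = torch.randn(3)
   bad = torch.randn(4)  # malformed: wrong feature length
   try:
       batch = torch.stack([good, bad])  # raises RuntimeError
   except RuntimeError as e:
       print(f'whole batch failed: {e}')
   ```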



[GitHub] [beam] damccorm commented on issue #21437: Add Metrics to base class

damccorm commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1263620466

   > I think we should borrow from best-practice patterns in Beam I/Os and return errors, along with the error message, in the Prediction object.
   
   +1, I think this is a good experience. It's worth noting that we don't force ModelHandlers to always return a `PredictionResult` (maybe we should?), but we can at least do this for our handlers. This logic will probably need to live in the ModelHandler anyway.
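   
   A hypothetical shape for carrying errors alongside results, per that I/O-style pattern (at the time of this thread, the SDK's `PredictionResult` is a NamedTuple of example and inference, with no error field):
   
   ```python
   from typing import Any, NamedTuple, Optional
   
   class PredictionResult(NamedTuple):
       example: Any
       inference: Any
       # Hypothetical addition: set instead of raising when the
       # inference for this element (or its batch) failed.
       error: Optional[Exception] = None
   ```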
   
   > Config 1 - Fail on any error
   
   I'm probably -1 on this; I'd prefer to avoid a proliferation of options and to be a little more opinionated here. Failing is generally a bad experience IMO: in streaming pipelines it will potentially cause the pipeline to get stuck, and in batch we fail the whole job. This is also a pretty trivial map operation in the next step if that's really what you want to do.
   
   > Config 2 - Retry the error n times (maybe n = 3?)
   
   Retries are more interesting to me; I guess my question there is: how often do models actually fail inference in a retryable way? I would anticipate most failures to be non-retryable, and am inclined to not add this flag either.
   
   -----------------------
   
   > We're working with a batch of data. If inference fails, do we consider that to be 1 num_failed_inferences? Or perhaps it should be renamed to num_failed_batch_inferences? It may depend on the framework, but is there a way to figure out which specific data points in a batch fail? In which case we can also track num_failed_inferences.
   
   IMO handling this at the batch level is fine; anything more complex is probably more trouble than it's worth.



[GitHub] [beam] damccorm commented on issue #21437: Add Metrics to base class

damccorm commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1263651307

   > I'd say most inference failures are due to mismatch in data shapes or data types. Retrying wouldn't help anyway since the user would need to go back and fix their pre-processing logic.
   
   Right, that's what I was thinking as well. If we get into remote inference we may want retries for network failures, but at a local level I don't think it makes much sense. ModelHandlers can also make that determination for themselves.



[GitHub] [beam] BjornPrime commented on issue #21437: Add Metrics to base class

BjornPrime commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1270204362

   This may be beyond the scope of my assignment, but looking at the issue holistically, I think we're going to want a more generic `update_metric()` method that can pick out and update particular attributes. If we're going to be adding more attributes to `_MetricsCollector`, I don't think we'll necessarily want to run the existing `update()` every time we change any given metric.



[GitHub] [beam] yeandy commented on issue #21437: Add Metrics to base class

yeandy commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1270465630

   Not necessarily out of scope to propose a modification to the current implementation. Can you clarify with an example of a more generic update?



[GitHub] [beam] yeandy commented on issue #21437: Add Metrics to base class

yeandy commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1262842580

   Several questions around the error handling:
   1. We're working with a batch of data. If inference fails, do we consider that to be 1 `num_failed_inferences`? Or perhaps it should be renamed to `num_failed_batch_inferences`? It may depend on the framework, but is there a way to figure out which specific data points in a batch fail? In which case we can also track `num_failed_inferences`.
   2. Do we want to do any retries for the inference call? Maybe we shouldn't, since the default behavior for a model inference failure is to throw an exception. But if we do want retries, how many times? If it succeeds in, say, 5 tries, do we swallow the error and skip incrementing the counter? Or do we still increment it?
   3. Or should the pipeline always fail?
   4. Should we give users a flag to choose between these behaviors?
   @rezarokni Do you have any thoughts on points 2-4?
   



[GitHub] [beam] damccorm commented on issue #21437: Add Metrics to base class

damccorm commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1263824209

   > Yes, future remote use cases are why I think retry is useful, as there can be many transient errors, from network issues to overloaded endpoints. As we are adding config now, perhaps we can have retry but set the default to 1
   
   My vote would be to defer it. I don't think it gets us anything right now, and we may decide to just bake in retries on network/server errors no matter what in the remote inference case (that's what I'd vote we do). If we do it now, we're stuck with that option regardless of whether it ends up being useful.



[GitHub] [beam] rezarokni commented on issue #21437: Add Metrics to base class

rezarokni commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1263819536

   _Right, that's what I was thinking as well. If we get into remote inference we may want retries for network failures, but at a local level I don't think it makes much sense. ModelHandlers can also make that determination for themselves_
   
   Yes, future remote use cases are why I think retry is useful, as there can be many transient errors, from network issues to overloaded endpoints. As we are adding config now, perhaps we can have retry but set the default to 1.



[GitHub] [beam] yeandy commented on issue #21437: Add Metrics to base class

yeandy commented on issue #21437:
URL: https://github.com/apache/beam/issues/21437#issuecomment-1263646517

   > Retries are more interesting to me; I guess my question there is: how often do models actually fail inference in a retryable way? I would anticipate most failures to be non-retryable, and am inclined to not add this flag either.
   
   I'd say most inference failures are due to a mismatch in data shapes or data types. Retrying wouldn't help anyway, since the user would need to go back and fix their pre-processing logic.
   
   > IMO handling this at the batch level is fine; anything more complex is probably more trouble than it's worth.
   
   +1
   

