Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/10/09 03:15:04 UTC

[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539804591
 
 
   @dongjoon-hyun Thanks for fixing this. 
   I have several questions on this.
   
   1. Short-lived metrics
   As Prometheus uses a pull model, how do you recommend people use these metrics for executors that get shut down immediately? Also, how will this work for a short-lived Spark application (e.g. one that runs for less than a single Prometheus scrape interval, which is usually 30s)?
   See this [blog](https://www.metricfire.com/prometheus-tutorials/prometheus-monitoring-101) about short-lived metrics in Prometheus.
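
   To make the concern concrete, here is a minimal sketch (not part of this PR) of how likely a pull-based scraper is to see a process at all. It assumes scrapes happen at a fixed interval and that the process starts at a uniformly random point relative to the scrape schedule. The function name and the 30s default are illustrative assumptions.

```python
def scrape_probability(lifetime_s: float, scrape_interval_s: float = 30.0) -> float:
    """Chance that at least one scrape lands inside the process lifetime.

    Assumption: scrapes occur every `scrape_interval_s` seconds and the
    process start time is uniformly distributed relative to that schedule.
    """
    if lifetime_s < 0 or scrape_interval_s <= 0:
        raise ValueError("lifetime must be >= 0 and scrape interval must be > 0")
    # A lifetime covering a full interval is always scraped at least once.
    return min(1.0, lifetime_s / scrape_interval_s)
```

   Under these assumptions, an executor that lives for 15s with a 30s scrape interval would be scraped only about half the time, and even a successful scrape captures a snapshot rather than the final values.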
   
   2. Cardinality
   It looks like you are using app_id as one of the labels, which will increase the cardinality of the Prometheus metrics. For more information about Prometheus's cardinality issue, see [here](https://www.robustperception.io/cardinality-is-key) as well as this [doc](https://prometheus.io/docs/practices/naming/#labels).
   
   Suppose a user uses a central Prometheus server to scrape their Spark applications with this PR. Each new Spark application will add N metrics, and assuming it has M workers on average, the number of new time series grows with both N and M for every application. This will cause a heavy load on a traditional Prometheus server. There are several solutions ([M3](https://eng.uber.com/m3/), [Cortex](https://www.cncf.io/blog/2018/12/18/cortex-a-multi-tenant-horizontally-scalable-prometheus-as-a-service/), [Thanos](https://improbable.io/blog/thanos-prometheus-at-scale)) that address this issue, but we should make the cardinality implications clear to users of these metrics.
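
   A rough back-of-the-envelope sketch of that growth, under the assumption that each of the N metrics is exposed once per worker and that the app_id label makes every application's series distinct. The function name and the example numbers are hypothetical, not measurements.

```python
def estimated_series(num_apps: int, metrics_per_worker: int, workers_per_app: int) -> int:
    """Estimate distinct time series created across applications.

    Assumption: every application contributes
    metrics_per_worker * workers_per_app series, and no series is
    reused across applications because app_id differs each time.
    """
    return num_apps * metrics_per_worker * workers_per_app


# Hypothetical example: 1000 applications, 50 metrics, 20 workers each.
total = estimated_series(num_apps=1000, metrics_per_worker=50, workers_per_app=20)
print(total)
```

   With these made-up inputs the estimate is one million series, most of which stop receiving samples once their application ends.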
   
