Posted to user@spark.apache.org by Dávid Szakállas <da...@gmail.com> on 2021/02/04 20:29:26 UTC

Exporting all Executor Metrics in Prometheus format in K8s cluster

I’ve been trying to set up monitoring for our Spark 3.0.1 cluster running in K8s. We use Prometheus as our monitoring system, and we need both executor and driver metrics. My initial approach was to use the following configuration to expose both sets of metrics on the Spark UI:

{
    'spark.ui.prometheus.enabled': 'true'
}

I was able to scrape http://<driver_hostname>:4040/metrics/prometheus/ for driver metrics and http://<driver_hostname>:4040/metrics/executors/prometheus/ for executor metrics. However, the executor metrics only contain those defined here: https://spark.apache.org/docs/latest/monitoring.html#executor-metrics, which is referred to as ExecutorSummary. What I would like instead is to get all metrics from the Executor instance of the metrics system: https://spark.apache.org/docs/latest/monitoring.html#component-instance--executor.
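
For reference, the two scrape targets mentioned above can be sketched in a few lines of Python. This is only an illustration: the helper names (metrics_urls, fetch) and the example hostname are made up, and port 4040 is simply the UI port used in the URLs above.

```python
from urllib.request import urlopen


def metrics_urls(driver_host, ui_port=4040):
    """Build the two Spark UI scrape URLs quoted in this post.

    driver_host and ui_port are placeholders; 4040 is the port used above.
    """
    base = f"http://{driver_host}:{ui_port}"
    return {
        "driver": f"{base}/metrics/prometheus/",
        "executors": f"{base}/metrics/executors/prometheus/",
    }


def fetch(url, timeout=5.0):
    """Fetch one endpoint and return its body as text (needs a live driver)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # "my-driver-svc" is a hypothetical hostname, used only to print the URLs.
    for name, url in metrics_urls("my-driver-svc").items():
        print(name, url)
```

Calling fetch on either URL only works from somewhere that can reach the driver's UI port, for example from inside the cluster.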

I am not sure if these are available on the driver at all, so I’ve been thinking of scraping the executors directly instead. It seems PrometheusServlet is meant for this purpose; however, the executors aren't running web servers. I also can't find a configuration setting to open up a port on the executor container so that it can be scraped. What I have in mind right now is writing a custom sink that exports the metrics in the Prometheus format to a local file, and running a sidecar container with an nginx that serves that static file. The nginx endpoint can then be scraped by Prometheus. Am I overcomplicating this? Is there a simpler approach?
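
To make the file-export half of that idea concrete, here is a rough sketch in Python. It is not a Spark sink (a real one would be a JVM class plugged into the metrics system); it only illustrates rendering name/value pairs as untyped samples in the Prometheus text format and replacing the served file atomically, so the sidecar never serves a half-written file. The function names and the sample metric names are made up for illustration.

```python
import os
import re
import tempfile

# Prometheus metric names may only contain [a-zA-Z0-9_:] and must not
# start with a digit; everything else is replaced with "_".
_INVALID = re.compile(r"[^a-zA-Z0-9_:]")


def to_prometheus_text(metrics):
    """Render a {name: number} mapping as untyped Prometheus samples."""
    lines = []
    for name, value in sorted(metrics.items()):
        safe = _INVALID.sub("_", name)
        if not safe or safe[0].isdigit():
            safe = "_" + safe
        lines.append(f"{safe} {float(value)}")
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temp file in the same directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        # mkstemp creates the file as 0600; relax it so a web server
        # running as a different user can read it.
        os.chmod(tmp, 0o644)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    sample = {"executor.bytesRead": 10, "jvm.heap-used": 2.5}
    out = os.path.join(tempfile.mkdtemp(), "metrics.prom")
    write_atomically(out, to_prometheus_text(sample))
    print(open(out).read())
```

The temp file has to live in the same directory as the target so that the rename stays on one filesystem, which is what makes the swap atomic. With a sidecar, that directory would be a volume shared between the two containers.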

Thanks,
David Szakallas