Posted to user@spark.apache.org by Christine Gong <ch...@gmail.com> on 2020/09/28 07:21:50 UTC

[Spark Prometheus Metrics] How to add my own metrics in spark streaming job?

What should I do to expose my own custom Prometheus metrics for a cluster-mode
Spark streaming job?

I want to run a Spark streaming job that reads from Kafka, does some
calculations, and exposes metrics to Prometheus on localhost port 9111, as in
https://github.com/jaegertracing/jaeger-analytics-java/blob/master/spark/src/main/java/io/jaegertracing/analytics/spark/SparkRunner.java#L47
Is it possible to have the Prometheus endpoint available on the executors? I tried
both an EMR cluster and k8s, and only local mode works (the metrics are
available on the driver's port 9111 only).
It looks like the Prometheus servlet sink is my best option? Any advice would
be much appreciated!!
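
For reference, the metric is registered and exposed roughly like this (a
minimal sketch following the linked SparkRunner; it assumes the Prometheus
Java simpleclient, and the metric name is only illustrative):

import io.prometheus.client.Counter;
import io.prometheus.client.exporter.HTTPServer;
import java.io.IOException;

public class JobMetrics {
    // Illustrative custom metric, incremented inside the streaming logic.
    public static final Counter PROCESSED = Counter.build()
            .name("events_processed_total")
            .help("Kafka records processed.")
            .register();

    // Exposes the default registry on port 9111, like the linked SparkRunner does.
    public static HTTPServer start() throws IOException {
        return new HTTPServer(9111);
    }
}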

Thanks,
Christine

Re: [Spark Prometheus Metrics] How to add my own metrics in spark streaming job?

Posted by christinegong <ch...@gmail.com>.
Hi,
In the Spark job, the metrics are exported to a Prometheus HTTP server on
localhost, to be scraped later by the Prometheus service
(https://github.com/prometheus/client_java#http). The problem is that when I
ssh to the EMR instances themselves, I can only see the metrics (e.g. with
curl localhost:9111) on the driver, and only in local mode. If I run the Spark
job in cluster mode, localhost:9111 on the driver still answers curl but
returns no data, and on the executors I cannot curl it at all. The same thing
happens in Kubernetes, where I also checked whether the metrics are there by
exec'ing into the containers.
Hope that makes the question clearer.
Prometheus does support the Pushgateway, but that is meant for short-lived
batch jobs, so I am not sure it is applicable to my long-running streaming job.
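
For completeness, pushing from the tasks would look roughly like this (a
sketch that assumes the simpleclient Pushgateway module and a reachable
Pushgateway host; the host and metric names are placeholders, and I have not
validated this for a streaming job):

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Counter;
import io.prometheus.client.exporter.PushGateway;

public class ExecutorPush {
    // Called from a task, e.g. inside foreachPartition, with that partition's count.
    public static void pushPartitionCount(long count) throws Exception {
        CollectorRegistry registry = new CollectorRegistry();
        Counter processed = Counter.build()
                .name("partition_records_total")
                .help("Records processed by one task.")
                .register(registry);
        processed.inc(count);
        // Push to a Pushgateway that the executors can reach over the network.
        new PushGateway("pushgateway:9091").pushAdd(registry, "spark_streaming_job");
    }
}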
Thanks! 






Re: [Spark Prometheus Metrics] How to add my own metrics in spark streaming job?

Posted by Artemis User <ar...@dtechspace.com>.
I am confused by your question.  Are you running the Spark cluster
on AWS EMR and trying to output the result to a Prometheus instance
running on your localhost?  Isn't your localhost behind a firewall
and not accessible from AWS?  What does "have Prometheus available
in executors" mean?  Apparently you need a Prometheus instance
running on AWS so that your EMR cluster can reach it easily.

Directing Spark output/sink to Prometheus would be difficult.  The ideal
integration scenario would be to write a custom Spark connector that
uses the Prometheus client library to get your Spark processing
results directly into Prometheus.  Hope this helps...
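
As an illustration, one way to use the client library from the application is
to update metrics inside a foreachBatch callback, which runs on the driver, so
only the driver has to be reachable by Prometheus (a sketch assuming
Structured Streaming and the Prometheus Java client; the broker, topic, and
metric names are placeholders):

import io.prometheus.client.Counter;
import io.prometheus.client.exporter.HTTPServer;
import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaToPrometheus {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-to-prometheus")
                .getOrCreate();

        // The counter and the scrape endpoint live on the driver only.
        Counter rowsProcessed = Counter.build()
                .name("rows_processed_total")
                .help("Rows processed per micro-batch.")
                .register();
        HTTPServer scrapeEndpoint = new HTTPServer(9111);

        // Assumes the spark-sql-kafka connector is on the classpath.
        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")
                .option("subscribe", "events")
                .load();

        stream.writeStream()
                .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (batch, batchId) ->
                        // count() runs on the executors; the increment happens on the driver.
                        rowsProcessed.inc(batch.count()))
                .start()
                .awaitTermination();
    }
}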

-- ND


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org