Posted to user@spark.apache.org by Anil Dasari <ad...@guidewire.com> on 2021/12/20 06:01:50 UTC

Spark 3.0 plugins

Hello everyone,

I was going through the Apache Spark Performance Monitoring in Spark 3.0 talk <https://www.youtube.com/watch?v=WFXzoRalwSg> and wanted to collect I/O metrics for my Spark application.
I couldn't find any built-in Spark 3.0 plugins for I/O metrics, like those in https://github.com/cerndb/SparkPlugins, in the Spark 3 documentation. Does the Spark 3 distribution bundle any built-in I/O metrics plugins? Thanks in advance.

Regards,
Anil


RE: Spark 3.0 plugins

Posted by Luca Canali <lu...@cern.ch>.
Hi Anil,

To recap: Apache Spark plugins are an interface and configuration that allow you to inject code at executor start-up and, among other things, provide a hook into the Spark metrics system. This makes it possible to extend metrics collection beyond what is available in Apache Spark.
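
A minimal sketch of such a plugin, assuming Spark 3.0's `org.apache.spark.api.plugin` API and Linux executors. The class name `HypotheticalIOMetricsPlugin`, the helper `procIoField` and the metric name `readBytes` are illustrative only; they are not part of Spark or of the SparkPlugins repository. The executor-side component runs once at executor start-up and registers a Dropwizard gauge that reads the executor process's `read_bytes` counter from `/proc/self/io` (a per-JVM value, not per task).

```scala
import java.util.{Map => JMap}

import scala.io.Source

import com.codahale.metrics.Gauge
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// Hypothetical example plugin: exposes one Linux I/O counter per executor.
class HypotheticalIOMetricsPlugin extends SparkPlugin {

  // No driver-side component is needed in this sketch; Spark accepts null here.
  override def driverPlugin(): DriverPlugin = null

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    // Called once when the executor starts.
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      // Metrics registered in the plugin's registry are reported by
      // Spark's metrics system to whatever sinks are configured.
      ctx.metricRegistry().register("readBytes", new Gauge[Long] {
        override def getValue(): Long = HypotheticalIOMetricsPlugin.procIoField("read_bytes")
      })
    }
  }
}

object HypotheticalIOMetricsPlugin {
  // Assumption: Linux, where /proc/self/io lists per-process I/O counters
  // as "name: value" lines. Returns -1 if the file or field is unavailable.
  def procIoField(field: String): Long =
    try {
      val src = Source.fromFile("/proc/self/io")
      try {
        src.getLines()
          .map(_.split(":\\s*", 2))
          .collectFirst { case Array(k, v) if k == field => v.trim.toLong }
          .getOrElse(-1L)
      } finally src.close()
    } catch {
      case _: java.io.IOException => -1L
    }
}
```

Such a plugin would be enabled with `--conf spark.plugins=HypotheticalIOMetricsPlugin`, with its jar shipped to the executors (for example via `--jars`); in a real project the class would live in a package. Reading `/proc` keeps the sketch dependency-free, at the cost of being Linux-only.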

Instrumenting parts of a Spark workload with plugins gives more flexibility than instrumentation committed to the Apache Spark codebase: only users who want it need to activate it, and they can tune the configuration for their own environment, which would not necessarily suit every use of Apache Spark.

The repository https://github.com/cerndb/SparkPlugins that you mentioned contains a few Spark plugins that I developed and found useful, including plugins for measuring some I/O metrics.

At present this is “third-party code”: you are most welcome to use it, but it is not part of the Apache Spark project. I’d say it may end up there, perhaps as a set of examples, if more people find this type of instrumentation useful.
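
On the application side, using a plugin like these comes down to configuration. The following is a sketch, assuming Spark 3.x, of a `SparkConf` that loads a plugin through `spark.plugins` and routes all metrics, including plugin gauges, to Spark's built-in `ConsoleSink` every 10 seconds. `ioMetricsConf` is a hypothetical helper, and the plugin class name you pass in is a placeholder for whichever plugin class you actually deploy.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper: builds a configuration that loads the given
// SparkPlugin class(es) and prints all metrics to stdout periodically.
def ioMetricsConf(pluginClasses: String): SparkConf =
  new SparkConf()
    .setAppName("io-metrics-demo")
    // Comma-separated list of SparkPlugin implementations to load.
    .set("spark.plugins", pluginClasses)
    // "*" applies the sink to every metrics instance (driver, executors, ...).
    .set("spark.metrics.conf.*.sink.console.class", "org.apache.spark.metrics.sink.ConsoleSink")
    .set("spark.metrics.conf.*.sink.console.period", "10")
    .set("spark.metrics.conf.*.sink.console.unit", "seconds")
```

The resulting conf would be passed to `SparkSession.builder().config(ioMetricsConf("..."))` before the session is created, since plugins are loaded when the driver and executors start. The console sink is only convenient for a quick check; a dashboard setup would typically use a sink such as Graphite instead.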

In your mail you referenced the DATA+AI Summit talk “What is New with Apache Spark Performance Monitoring in Spark 3.0” <https://databricks.com/session_eu20/what-is-new-with-apache-spark-performance-monitoring-in-spark-3-0>. You can also find additional work on this in the DATA+AI Summit 2021 talk “Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins” <https://databricks.com/session_na21/monitor-apache-spark-3-on-kubernetes-using-metrics-and-plugins>.

Best,

Luca