Posted to issues@spark.apache.org by "Luca Canali (JIRA)" <ji...@apache.org> on 2019/06/18 07:18:00 UTC

[jira] [Created] (SPARK-28091) Extend Spark metrics system with executor plugin metrics

Luca Canali created SPARK-28091:
-----------------------------------

             Summary: Extend Spark metrics system with executor plugin metrics
                 Key: SPARK-28091
                 URL: https://issues.apache.org/jira/browse/SPARK-28091
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: Luca Canali


This proposes to improve Spark instrumentation by adding a hook for Spark executor plugin metrics to the Spark metrics system, which is implemented with the Dropwizard/Codahale library.
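To illustrate the idea, below is a minimal sketch of what such a hook could look like. The trait name ExecutorMetricsPlugin and its registerMetrics signature are hypothetical placeholders for discussion, not the proposed final API:

{code:scala}
// Minimal sketch of the proposed hook. The trait name and method signature
// are hypothetical, for illustration only; the actual API is part of the
// WIP discussed in this issue.
import com.codahale.metrics.{Gauge, MetricRegistry}

trait ExecutorMetricsPlugin {
  // Called once per executor so the plugin can register its metrics.
  def registerMetrics(metricRegistry: MetricRegistry): Unit
}

class ExampleMetricsPlugin extends ExecutorMetricsPlugin {
  override def registerMetrics(metricRegistry: MetricRegistry): Unit = {
    // Register a simple gauge; a real plugin would measure something useful.
    metricRegistry.register("examplePlugin.currentTimeMillis", new Gauge[Long] {
      override def getValue: Long = System.currentTimeMillis()
    })
  }
}
{code}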

Context: The Spark metrics system provides a large variety of metrics (see also SPARK-26890) that are useful for monitoring and troubleshooting Spark workloads. A typical workflow is to sink the metrics to a storage system and build dashboards on top of it.

Improvement: The original goal of this work was to add instrumentation for S3 filesystem access metrics for Spark jobs. Currently, ExecutorSource instruments HDFS and local filesystem metrics. Rather than extending the code there, we propose to add a metrics plugin system, which is more flexible and of more general use.

Advantages:
 * The metrics plugin system makes it easy to implement instrumentation for S3 access by Spark jobs.
 * The metrics plugin system allows for easy extension of how Spark collects HDFS-related workload metrics. This is currently done using the Hadoop FileSystem getAllStatistics method, which is deprecated in recent versions of Hadoop; recent versions of the Hadoop FileSystem API recommend getGlobalStorageStatistics instead, which also provides several additional metrics. getGlobalStorageStatistics is not available in Hadoop 2.7 (it was introduced in Hadoop 2.8). A metrics plugin for Spark would give users an easy way to “opt in” to such new API calls when they deploy suitable Hadoop versions; see the sketch after this list.
 * We also have the use case of adding Hadoop filesystem monitoring for a custom Hadoop-compliant filesystem in use in our organization (EOS, accessed via the XRootD protocol). The metrics plugin infrastructure makes this easy to do, and others may have similar use cases.
 * More generally, this approach makes it straightforward to plug filesystem and other metrics into the Spark monitoring system. Future work on plugin implementations can extend monitoring to measure usage of external resources (OS, filesystem, network, accelerator cards, etc.) that might not normally be considered general enough for inclusion in Apache Spark code, but that can nevertheless be useful for specialized use cases, tests, or troubleshooting.
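The second point can be made concrete with a sketch. FileSystem.getGlobalStorageStatistics() is a real Hadoop 2.8+ API; the ExecutorMetricsPlugin trait it plugs into is the hypothetical interface from the earlier sketch, not a committed API:

{code:scala}
// Hedged sketch: publish Hadoop global storage statistics as Dropwizard
// gauges. ExecutorMetricsPlugin is the hypothetical trait sketched above;
// the Hadoop calls (getGlobalStorageStatistics, getLongStatistics, getLong)
// exist as of Hadoop 2.8.
import scala.collection.JavaConverters._
import com.codahale.metrics.{Gauge, MetricRegistry}
import org.apache.hadoop.fs.FileSystem

class HadoopStorageStatsPlugin extends ExecutorMetricsPlugin {
  override def registerMetrics(metricRegistry: MetricRegistry): Unit = {
    // One gauge per (scheme, statistic), e.g. "s3a.bytesRead". Only
    // filesystems already initialized at registration time are covered.
    FileSystem.getGlobalStorageStatistics.iterator().asScala.foreach { stats =>
      stats.getLongStatistics.asScala.foreach { stat =>
        val name = MetricRegistry.name(stats.getScheme, stat.getName)
        metricRegistry.register(name, new Gauge[Long] {
          // Gauges are read lazily, so each read returns the current value.
          override def getValue: Long =
            Option(stats.getLong(stat.getName)).map(_.longValue).getOrElse(0L)
        })
      }
    }
  }
}
{code}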

Implementation:

The proposed implementation is currently a WIP, open for comments and improvements. It is based on the Executor Plugin work of SPARK-24918 and builds on recent work on extending Spark executor metrics, such as SPARK-25228.
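For reference, SPARK-24918 introduced an ExecutorPlugin interface with init() and shutdown() hooks, loaded on executors via the spark.executor.plugins configuration. A sketch of how a metrics plugin could build on it follows; since the wiring to the executor's MetricRegistry is exactly the WIP part of this proposal, the sketch creates its own registry purely for illustration:

{code:scala}
// Sketch building on the ExecutorPlugin interface from SPARK-24918.
// init()/shutdown() exist on that interface; how the plugin would obtain
// the executor's MetricRegistry is the open question in this proposal.
import com.codahale.metrics.MetricRegistry
import org.apache.spark.ExecutorPlugin

class MyInstrumentationPlugin extends ExecutorPlugin {
  // In the actual proposal this would be provided by Spark's metrics
  // system; a local registry is used here only to keep the sketch runnable.
  private val registry = new MetricRegistry()

  override def init(): Unit = {
    // Count plugin initializations; a real plugin would register gauges,
    // timers, etc. for the resource it monitors.
    registry.counter("myPlugin.initCalls").inc()
  }

  override def shutdown(): Unit = {
    // Release any resources acquired in init().
  }
}
{code}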

Tests and examples:

So far this has been tested manually, running Spark on YARN and K8S clusters, in particular for monitoring S3 and for extending HDFS instrumentation with the Hadoop FileSystem getGlobalStorageStatistics metrics. An example executor metrics plugin and the code used for testing are available.
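For reference, a plugin of this kind would be enabled through the spark.executor.plugins mechanism of SPARK-24918; the application code and class name below are hypothetical:

{code:scala}
// Hypothetical usage: the plugin class must be on the executor classpath,
// and the class name here is illustrative.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("metrics-plugin-demo")
  .set("spark.executor.plugins", "com.example.monitoring.MyInstrumentationPlugin")
{code}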


