Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2016/07/27 17:14:20 UTC

[jira] [Resolved] (SPARK-5847) Allow for configuring MetricsSystem's use of app ID to namespace all metrics

     [ https://issues.apache.org/jira/browse/SPARK-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin resolved SPARK-5847.
-----------------------------------
       Resolution: Fixed
         Assignee: Mark Grover
    Fix Version/s: 2.1.0

> Allow for configuring MetricsSystem's use of app ID to namespace all metrics
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-5847
>                 URL: https://issues.apache.org/jira/browse/SPARK-5847
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.2.1
>            Reporter: Ryan Williams
>            Assignee: Mark Grover
>            Priority: Minor
>             Fix For: 2.1.0
>
>
> {{MetricsSystem}} [currently prepends the app ID to all metrics|https://github.com/apache/spark/blob/c51ab37faddf4ede23243058dfb388e74a192552/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L131].
> When reading Spark metrics in Graphite, I've found this to not always be desirable. Graphite is designed to track a mostly-unchanging set of metrics over time; it allocates large zeroed-out files for each metric it sees, and [by default rate-limits itself from creating many of these|https://github.com/graphite-project/carbon/blob/79158ffde5949b4056eb7fdb5e9b6b583fe21ea4/conf/carbon.conf.example#L61-L68].
> App-ID namespacing means that Graphite allocates disk space for every "metric" of every job it sees, when in reality some metrics represent the same quantity across jobs (e.g. driver JVM stats).
> Some common Spark usage flows would be better served by namespacing metrics by e.g. {{spark.app.name}}, so that successive runs of a given job share the same metrics from a storage perspective, and so that aspects of a job's performance can be monitored over time, across many runs.
> There's likely no one-size-fits-all solution here, so I'd propose letting the metrics config file specify whether metrics are namespaced by {{spark.app.id}}, {{spark.app.name}}, or some other config param.
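For reference, the change shipped in 2.1.0 exposes this choice through a Spark configuration property, spark.metrics.namespace, which defaults to the app ID. A minimal sketch of switching to app-name namespacing, assuming the standard spark-defaults.conf mechanism (the Graphite paths and job name shown are illustrative, not taken from this issue):

    # spark-defaults.conf: namespace metrics by application name instead of
    # the default application ID, so successive runs of the same job report
    # to the same Graphite paths.
    spark.metrics.namespace    ${spark.app.name}

    # Illustrative Graphite paths for a driver JVM heap metric:
    #   default (app ID):        app-20160727171420-0001.driver.jvm.heap.used
    #   app-name namespacing:    MyEtlJob.driver.jvm.heap.used

With app-name namespacing, Graphite allocates one set of whisper files per job name rather than one per application run.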



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org