You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by steveloughran <gi...@git.apache.org> on 2015/11/09 19:31:05 UTC

[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

GitHub user steveloughran opened a pull request:

    https://github.com/apache/spark/pull/9571

    [SPARK-11373] [CORE] WiP Add metrics to the History Server and providers

    
    This adds metrics to the history server, with the `FsHistoryProvider` metering its load, performance and reliability.
    
    The HistoryServer sets up the codahale metrics for the Web under metrics/ with metrics/metrics behind metrics, metrics/health any health probes and metrics/threads a thread dump. There's currently no attempt to  hook up JMX, etc. The Web servlets are the ones tests can easily hit and don't need infrastructure, so are the good initial first step.
    
    It then passes the metrics and health registries down to the providers in a `ApplicationHistoryBinding` case class, via a new method
    
        def start(binding: ApplicationHistoryBinding): Unit
    
    The base class has implementation so that all existing providers will still link properly; the base implementation currently checks and fails
     the use of a binding case class is also to ensure that if new binding information were added in future, existing implementations would still link.
    
    The `FsHistoryProvider` implements the `start()` method, registering two counters and two timers.
    
    1. Number of update attempts and number of failed updates —and the same for app UI loads.
    2. Time for updates and app UI loads.
    
    Points of note
    
    * Why not use Spark's `MetricsSystem`? I did start off with that, but it needs a `SparkContext` to run off, which the server doesn't have. Ideally that would be way to go, as it would support all the spark conf -based metrics setup. Someone who understands the `MetricsSystem` would need to get involved here as would make for a more complex patch. In `FsHistoryProvider` the registry information is all kept in a `Source` subclass for ease of future migration to `MetricsSystem`.
    * Why the extra `HealthRegistry`? It's a nice way of allowing providers to indicate (possibly transient) health problems for monitoring tools/clients to hit. For the FS provider it could maybe flag when there hadn't been any successful update for a specified time period. (that could also be indicated by having a counter of "seconds since last update" and let monitoring tools monitor the counter value and act on it). Access control problems to the directory is something else which may be considered a liveness problem: it won't get better without human intervention
    * The `FsHistoryProvider.start()` method should really take the thread start code from from class constructor's `initialize()` method. This would ensure that incomplete classes don't get called by spawned threads, and makes it possible for test-time subclasses to skip thread startup. I've not attempted to do that in this patch.
    * No tests for this yet. Hitting the three metrics servlets in the HistoryServer is the obvious route; the JSON payload of the metrics can be parsed and scanned for relevant counters too. 
    * Part of the patch for `HistoryServerSuite` removes the call to `HistoryServer.initialize()` the `before` clause. That was a duplicate call, one which hit the re-entrancy tests on the provider & registry. As well as cutting it, `HistoryServer.initialize()` has been made idempotent. That should not be needed -but it will eliminate the problem arising again.
    
    Once the SPARK-1537 YARN timeline server history provider is committed, then I'll add metrics support there too. The YARN timeline provider would:
    
    1. Add timers of REST operations as well as playback load times, which can count network delays as well as JSON deserialization overhead. 
    2. Add a health check for connectivity too: the timeline server would be unhealthy if connections to the timeline server were either blocking or failing. And again, if there were security/auth problems, they'd be considered non-recoverable.
    3. Move thread launch under the `start()` method, with some test subclasses disabling thread launch.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/spark feature/SPARK-11373

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9571.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9571
    
----
commit 7ab9e3a3d2fdbcc11663176703874c573cfeafb8
Author: Steve Loughran <st...@hortonworks.com>
Date:   2015-11-09T16:32:33Z

    [SPARK-11373] First pass at adding metrics to the history server, with the FsHistoryProvider counting
    1. Number of update attempts and number of failed updates
    2. Time for updates and app UI loads
    
    The HistoryServer sets up the codahale metrics for the Web under metrics/ with metrics/metrics behind metrics, metrics/health any health probes and metrics/threads a thread dump.

commit cb7cddb0443d4d8fac7d63e604fe303f5f57d5cd
Author: Steve Loughran <st...@hortonworks.com>
Date:   2015-11-09T18:18:51Z

    [SPARK-11373] tests and review; found and fixed a re-entrancy in HistoryServerSuite which was causing problems with counter registration

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r73623757
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -226,6 +259,135 @@ class HistoryServer(
     }
     
     /**
    + * An abstract implementation of the metrics [[Source]] trait with some common operations.
    + */
    +private[history] abstract class HistoryMetricSource(val prefix: String) extends Source {
    +  override val metricRegistry = new MetricRegistry()
    +
    +  /**
    +   * Register a sequence of metrics
    +   * @param metrics sequence of metrics to register
    +   */
    +  def register(metrics: Seq[(String, Metric)]): Unit = {
    +    metrics.foreach { case (name, metric) =>
    +      metricRegistry.register(fullname(name), metric)
    +    }
    +  }
    +
    +  /**
    +   * Create the full name of a metric by prepending the prefix to the name
    +   * @param name short name
    +   * @return the full name to use in registration
    +   */
    +  def fullname(name: String): String = {
    +    MetricRegistry.name(prefix, name)
    +  }
    +
    +  /**
    +   * Dump the counters and gauges.
    +   * @return a string for logging and diagnostics -not for parsing by machines.
    +   */
    +  override def toString: String = {
    +    val sb = new StringBuilder(s"Metrics for $sourceName:\n")
    +    sb.append("  Counters\n")
    +    metricRegistry.getCounters.asScala.foreach { entry =>
    --- End diff --
    
    Maybe use `case (name, counter) =>`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    @steveloughran you still have not addressed my comment about `appUiLoadTimer`. The metric is still being registered even though it doesn't make any sense to the user.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-180390047
  
    **[Test build #50822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50822/consoleFull)** for PR 9571 at commit [`3ceeb9e`](https://github.com/apache/spark/commit/3ceeb9ef2cd34d5d5eeccacd9e49477c133f0ee0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r73623780
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -226,6 +259,135 @@ class HistoryServer(
     }
     
     /**
    + * An abstract implementation of the metrics [[Source]] trait with some common operations.
    + */
    +private[history] abstract class HistoryMetricSource(val prefix: String) extends Source {
    +  override val metricRegistry = new MetricRegistry()
    +
    +  /**
    +   * Register a sequence of metrics
    +   * @param metrics sequence of metrics to register
    +   */
    +  def register(metrics: Seq[(String, Metric)]): Unit = {
    +    metrics.foreach { case (name, metric) =>
    +      metricRegistry.register(fullname(name), metric)
    +    }
    +  }
    +
    +  /**
    +   * Create the full name of a metric by prepending the prefix to the name
    +   * @param name short name
    +   * @return the full name to use in registration
    +   */
    +  def fullname(name: String): String = {
    +    MetricRegistry.name(prefix, name)
    +  }
    +
    +  /**
    +   * Dump the counters and gauges.
    +   * @return a string for logging and diagnostics -not for parsing by machines.
    +   */
    +  override def toString: String = {
    +    val sb = new StringBuilder(s"Metrics for $sourceName:\n")
    +    sb.append("  Counters\n")
    +    metricRegistry.getCounters.asScala.foreach { entry =>
    +        sb.append("    ").append(entry._1).append(" = ").append(entry._2.getCount).append('\n')
    +    }
    +    sb.append("  Gauges\n")
    +    metricRegistry.getGauges.asScala.foreach { entry =>
    --- End diff --
    
    Same.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-218733235
  
    Build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-155153349
  
    **[Test build #45386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45386/consoleFull)** for PR 9571 at commit [`cb7cddb`](https://github.com/apache/spark/commit/cb7cddb0443d4d8fac7d63e604fe303f5f57d5cd).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r76851430
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -664,6 +707,116 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to process an event in the history merge operation. */
    +  val historyEventMergeTime = new LambdaLongGauge(() =>
    +    average(historyTotalMergeTime.getCount, historyEventCount.getCount))
    +
    +  /** Number of events replayed. */
    +  val appUIEventCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  private val clock = new SystemClock
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new TimestampGauge(clock)
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new TimestampGauge(clock)
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    +
    +  /** Total load time of all App UIs. */
    +  val appUITotalLoadTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val appUIEventReplayTime = new LambdaLongGauge(() =>
    +    average(appUITotalLoadTime.getCount, appUIEventCount.getCount))
    +
    +  register(Seq(
    +    ("history.merge.event.count", historyEventCount),
    +    ("history.merge.event.time", historyEventMergeTime),
    --- End diff --
    
    Actually, sorry, not this one but `historyMergeTimer` below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r76515026
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -225,14 +274,26 @@ class HistoryServer(
       }
     }
     
    +
    --- End diff --
    
    got it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-155150767
  
    **[Test build #45386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45386/consoleFull)** for PR 9571 at commit [`cb7cddb`](https://github.com/apache/spark/commit/cb7cddb0443d4d8fac7d63e604fe303f5f57d5cd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64586/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by CharlesYeh <gi...@git.apache.org>.
Github user CharlesYeh commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-155863566
  
    I'm assuming you didn't make a HistorySource in HistoryServer (what MasterSource in Master does) because it requires the MetricsSystem.
    
    Where does MetricsSystem require a SparkContext? It looks like it just takes SparkConf which HistoryServer has.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #62319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62319/consoleFull)** for PR 9571 at commit [`3971241`](https://github.com/apache/spark/commit/39712419d463bface2027fb7888f2c40f26fa412).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r66614402
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -667,6 +700,90 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Metrics integration: the various counters of activity
    +   * Time a closure, returning its output.
    +   * @param timer timer
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      timeCtx.close()
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider) extends Source {
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Metrics registry. */
    +  override val metricRegistry = new MetricRegistry()
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new Timestamp()
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new Timestamp()
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    --- End diff --
    
    good point. That'd be a nice throughput. all that's needed is to count the #of events loaded and divide them for a playback rate, some events/second or nanos/event. I'd prefer the latter for any microbenchmarking; the events/second metric probably better for some monitoring display.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-168317081
  
    **[Test build #48567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48567/consoleFull)** for PR 9571 at commit [`06e5289`](https://github.com/apache/spark/commit/06e5289fae86b3d12102bf018cc301da9964e0d0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-214871280
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r76972524
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -664,6 +707,116 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to process an event in the history merge operation. */
    +  val historyEventMergeTime = new LambdaLongGauge(() =>
    +    average(historyTotalMergeTime.getCount, historyEventCount.getCount))
    +
    +  /** Number of events replayed. */
    +  val appUIEventCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  private val clock = new SystemClock
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new TimestampGauge(clock)
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new TimestampGauge(clock)
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    +
    +  /** Total load time of all App UIs. */
    +  val appUITotalLoadTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val appUIEventReplayTime = new LambdaLongGauge(() =>
    +    average(appUITotalLoadTime.getCount, appUIEventCount.getCount))
    +
    +  register(Seq(
    +    ("history.merge.event.count", historyEventCount),
    +    ("history.merge.event.time", historyEventMergeTime),
    --- End diff --
    
    what's actually being displayed is the average time/event, so it is independent of history size, it more measures system performance during replay.
    
    Even so, it may not be that useful, so I can cut it.
    
    What may actually be more interesting as a remotely accessible metric is a 0/1 boolean indicating whether or not replay has completed. There currently doesn't seem to be any way to ask the history server if all histories have loaded: the UI comes up while replay is in progress, meaning that after startup/restart, a query of a history may fail not because the app doesn' t have history, but because the replay hasn't completed yet. Exposing the state to management tools allows functional tests & other scripts to block until the history is fully loaded


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r111913717
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala ---
    @@ -410,34 +409,25 @@ private[history] class CacheMetrics(prefix: String) extends Source {
         ("update.triggered.count", updateTriggeredCount))
     
       /** all metrics, including timers */
    -  private val allMetrics = counters ++ Seq(
    +  private val allMetrics: Seq[(String, Metric with Counting)] = counters ++ Seq(
    --- End diff --
    
    Either I was just being explicit about what came in, or the IDE decided to get involved. Removed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r76851220
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -664,6 +707,116 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to process an event in the history merge operation. */
    +  val historyEventMergeTime = new LambdaLongGauge(() =>
    +    average(historyTotalMergeTime.getCount, historyEventCount.getCount))
    +
    +  /** Number of events replayed. */
    +  val appUIEventCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  private val clock = new SystemClock
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new TimestampGauge(clock)
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new TimestampGauge(clock)
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    +
    +  /** Total load time of all App UIs. */
    +  val appUITotalLoadTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val appUIEventReplayTime = new LambdaLongGauge(() =>
    +    average(appUITotalLoadTime.getCount, appUIEventCount.getCount))
    +
    +  register(Seq(
    +    ("history.merge.event.count", historyEventCount),
    +    ("history.merge.event.time", historyEventMergeTime),
    --- End diff --
    
    Similarly, this metric suffers from the same issue as `appUiLoadTimer` and probably shouldn't be exposed. Sorry should have pointed this out earlier.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #63864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63864/consoleFull)** for PR 9571 at commit [`745d532`](https://github.com/apache/spark/commit/745d532b67b64149e86d2fdf0464cab616023eac).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #62318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62318/consoleFull)** for PR 9571 at commit [`6e7f466`](https://github.com/apache/spark/commit/6e7f466da0681af2c40a6842ed6c542ca44c7bd5).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r67325987
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
    @@ -110,3 +127,87 @@ private[history] abstract class ApplicationHistoryProvider {
       def writeEventLogs(appId: String, attemptId: Option[String], zipStream: ZipOutputStream): Unit
     
     }
    +
    +/**
    + * A simple counter of events.
    + * There is no concurrency support here: all events must come in sequentially.
    + */
    +private[history] class EventCountListener extends SparkListener {
    --- End diff --
    
    I was sure there was something to do this, I just couldn't find it when I looked. will do


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62318/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r66204561
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -114,28 +123,45 @@ class HistoryServer(
        * this UI with the event logs in the provided base directory.
        */
       def initialize() {
    -    attachPage(new HistoryPage(this))
    +    if (!initialized.getAndSet(true)) {
    --- End diff --
    
    I don't remember now, but it did happen.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r65786474
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -278,6 +303,9 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
        * applications that haven't been updated since last time the logs were checked.
        */
       private[history] def checkForLogs(): Unit = {
    +    metrics.updateCount.inc()
    +    metrics.updateLastAttempted.touch()
    +    time(metrics.updateTimer) {
         try {
    --- End diff --
    
    Don't you need to indent this block of code now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #75704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75704/testReport)** for PR 9571 at commit [`ec1f2d7`](https://github.com/apache/spark/commit/ec1f2d7f8743ce6de3e83f2f9a82f1c940c8be52).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64359/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-155148739
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63864/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    sorry, didn't see that one. Will fix


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r77030612
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -664,6 +707,116 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to process an event in the history merge operation. */
    +  val historyEventMergeTime = new LambdaLongGauge(() =>
    +    average(historyTotalMergeTime.getCount, historyEventCount.getCount))
    +
    +  /** Number of events replayed. */
    +  val appUIEventCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  private val clock = new SystemClock
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new TimestampGauge(clock)
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new TimestampGauge(clock)
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    +
    +  /** Total load time of all App UIs. */
    +  val appUITotalLoadTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val appUIEventReplayTime = new LambdaLongGauge(() =>
    +    average(appUITotalLoadTime.getCount, appUIEventCount.getCount))
    +
    +  register(Seq(
    +    ("history.merge.event.count", historyEventCount),
    +    ("history.merge.event.time", historyEventMergeTime),
    --- End diff --
    
    > I think the total load time does need to be measured
    
    That one is fine. I'm talking about the one that measures each app separately (`history.merge.timer`); that one is not useful to the user since every app is different.
    
    > a remotely accessible metric is a 0/1 boolean indicating whether or not replay has completed
    
    That's good to expose, not just as a metric but also in the UI itself, if it's not currently.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-181380440
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-219019969
  
    **[Test build #58563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58563/consoleFull)** for PR 9571 at commit [`64de389`](https://github.com/apache/spark/commit/64de38964c05b514e921c15962be41e92519e98f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60296/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #75708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75708/testReport)** for PR 9571 at commit [`8903dcf`](https://github.com/apache/spark/commit/8903dcfe2b927c8fc3fed9df3e9939670a016944).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #73695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73695/testReport)** for PR 9571 at commit [`d8ae876`](https://github.com/apache/spark/commit/d8ae876505de9599480929905f88612dcbc3905b).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #64359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64359/consoleFull)** for PR 9571 at commit [`745d532`](https://github.com/apache/spark/commit/745d532b67b64149e86d2fdf0464cab616023eac).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-218706904
  
    **[Test build #58471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58471/consoleFull)** for PR 9571 at commit [`a5d1f4c`](https://github.com/apache/spark/commit/a5d1f4c3d1d7076ad7c65343d9c33beaab71afbe).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r74962863
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -226,6 +259,135 @@ class HistoryServer(
     }
     
     /**
    + * An abstract implementation of the metrics [[Source]] trait with some common operations.
    + */
    +private[history] abstract class HistoryMetricSource(val prefix: String) extends Source {
    +  override val metricRegistry = new MetricRegistry()
    +
    +  /**
    +   * Register a sequence of metrics
    +   * @param metrics sequence of metrics to register
    +   */
    +  def register(metrics: Seq[(String, Metric)]): Unit = {
    +    metrics.foreach { case (name, metric) =>
    +      metricRegistry.register(fullname(name), metric)
    +    }
    +  }
    +
    +  /**
    +   * Create the full name of a metric by prepending the prefix to the name
    +   * @param name short name
    +   * @return the full name to use in registration
    +   */
    +  def fullname(name: String): String = {
    +    MetricRegistry.name(prefix, name)
    +  }
    +
    +  /**
    +   * Dump the counters and gauges.
    +   * @return a string for logging and diagnostics -not for parsing by machines.
    +   */
    +  override def toString: String = {
    +    val sb = new StringBuilder(s"Metrics for $sourceName:\n")
    +    sb.append("  Counters\n")
    +    metricRegistry.getCounters.asScala.foreach { entry =>
    --- End diff --
    
    done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r73624457
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -226,6 +259,135 @@ class HistoryServer(
     }
     
     /**
    + * An abstract implementation of the metrics [[Source]] trait with some common operations.
    + */
    +private[history] abstract class HistoryMetricSource(val prefix: String) extends Source {
    --- End diff --
    
    Can you put this class (and the new gauges you're adding) in a new source file? They're not specific to `HistoryServer` so it's better to have them in a separate place.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r77193230
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -664,6 +707,116 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to process an event in the history merge operation. */
    +  val historyEventMergeTime = new LambdaLongGauge(() =>
    +    average(historyTotalMergeTime.getCount, historyEventCount.getCount))
    +
    +  /** Number of events replayed. */
    +  val appUIEventCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  private val clock = new SystemClock
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new TimestampGauge(clock)
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new TimestampGauge(clock)
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    +
    +  /** Total load time of all App UIs. */
    +  val appUITotalLoadTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val appUIEventReplayTime = new LambdaLongGauge(() =>
    +    average(appUITotalLoadTime.getCount, appUIEventCount.getCount))
    +
    +  register(Seq(
    +    ("history.merge.event.count", historyEventCount),
    +    ("history.merge.event.time", historyEventMergeTime),
    --- End diff --
    
    UI would be good, though it should be some separate work: it'd need to change the history provider API to add some progress check. 
    
    FWIW the ATS history provider simply adds this as a string in the list of key-value pairs a provider can publish. Ugly and not cross-provider.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62320/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-219020128
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58563/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test failures timeout related; unlikely to be due to this patch
    ```
    Test Result (2 failures / +2)
    org.apache.spark.sql.hive.HiveSparkSubmitSuite.dir
    org.apache.spark.sql.hive.HiveSparkSubmitSuite.dir
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r76097567
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -667,6 +700,90 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Metrics integration: the various counters of activity
    +   * Time a closure, returning its output.
    +   * @param timer timer
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      timeCtx.close()
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider) extends Source {
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Metrics registry. */
    +  override val metricRegistry = new MetricRegistry()
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new Timestamp()
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new Timestamp()
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    --- End diff --
    
    Ping. I see you use this in `time()` to update the total time, but do you need to register this metric below?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #64586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64586/consoleFull)** for PR 9571 at commit [`d39996a`](https://github.com/apache/spark/commit/d39996aaabcc0de873bf410eeed6448d6d259c42).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r65786593
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -667,6 +700,90 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Metrics integration: the various counters of activity
    +   * Time a closure, returning its output.
    +   * @param timer timer
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      timeCtx.close()
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider) extends Source {
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Metrics registry. */
    +  override val metricRegistry = new MetricRegistry()
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new Timestamp()
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new Timestamp()
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    --- End diff --
    
    Given that different apps can have very different event logs, I wonder how useful this metric really is? Perhaps some extra processing needs to be done (e.g. events processed per sec or something)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-181380445
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50921/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-214871011
  
    **[Test build #57010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57010/consoleFull)** for PR 9571 at commit [`a5d1f4c`](https://github.com/apache/spark/commit/a5d1f4c3d1d7076ad7c65343d9c33beaab71afbe).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r73624337
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -667,6 +710,123 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val historyEventMergeTime = new LambdaLongGauge(() =>
    +    average(historyTotalMergeTime.getCount, historyEventCount.getCount))
    +
    +  /** Number of events replayed. */
    +  val appUIEventCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new Timestamp()
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new Timestamp()
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    +
    +  /** Total load time of all App UIs. */
    +  val appUITotalLoadTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val appUIEventReplayTime = new LambdaLongGauge(() =>
    +    average(appUITotalLoadTime.getCount, appUIEventCount.getCount))
    +
    +  private val countersAndGauges = Seq(
    +    ("history.merge.event.count", historyEventCount),
    +    ("history.merge.event.time", historyEventMergeTime),
    +    ("history.merge.duration", historyTotalMergeTime),
    +    ("update.count", updateCount),
    +    ("update.failure.count", updateFailureCount),
    +    ("update.last.attempted", updateLastAttempted),
    +    ("update.last.succeeded", updateLastSucceeded),
    +    ("appui.load.count", appUILoadCount),
    +    ("appui.load.duration", appUITotalLoadTime),
    +    ("appui.load.failure.count", appUILoadFailureCount),
    +    ("appui.load.not-found.count", appUILoadNotFoundCount),
    +    ("appui.event.count", appUIEventCount),
    +    ("appui.event.replay.time", appUIEventReplayTime)
    +  )
    +
    +  private val timers = Seq (
    +    ("update.timer", updateTimer),
    +    ("history.merge.timer", historyMergeTimer),
    +    ("appui.load.timer", appUiLoadTimer))
    +
    +  val allMetrics = countersAndGauges ++ timers
    --- End diff --
    
    You don't seem to need these lists outside of the context of calling `register`, so can you instead inline the list into the call to avoid having these fields?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r67242259
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -278,55 +304,63 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
        * applications that haven't been updated since last time the logs were checked.
        */
       private[history] def checkForLogs(): Unit = {
    -    try {
    -      val newLastScanTime = getNewLastScanTime()
    -      logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
    -      val statusList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq)
    -        .getOrElse(Seq[FileStatus]())
    -      // scan for modified applications, replay and merge them
    -      val logInfos: Seq[FileStatus] = statusList
    -        .filter { entry =>
    -          try {
    -            val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
    -            !entry.isDirectory() && prevFileSize < entry.getLen()
    -          } catch {
    -            case e: AccessControlException =>
    -              // Do not use "logInfo" since these messages can get pretty noisy if printed on
    -              // every poll.
    -              logDebug(s"No permission to read $entry, ignoring.")
    -              false
    +    metrics.updateCount.inc()
    +    metrics.updateLastAttempted.touch()
    +    time(metrics.updateTimer) {
    +      try {
    +        val newLastScanTime = getNewLastScanTime()
    +        logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
    +        val statusList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq)
    +          .getOrElse(Seq[FileStatus]())
    +        // scan for modified applications, replay and merge them
    +        val logInfos: Seq[FileStatus] = statusList
    +          .filter { entry =>
    +            try {
    +              val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
    +              !entry.isDirectory() && prevFileSize < entry.getLen()
    +            } catch {
    +              case e: AccessControlException =>
    +                // Do not use "logInfo" since these messages can get pretty noisy if printed on
    +                // every poll.
    +                logDebug(s"No permission to read $entry, ignoring.")
    +                false
    +            }
               }
    +          .flatMap { entry => Some(entry) }
    +          .sortWith { case (entry1, entry2) =>
    +            entry1.getModificationTime() >= entry2.getModificationTime()
             }
    -        .flatMap { entry => Some(entry) }
    -        .sortWith { case (entry1, entry2) =>
    -          entry1.getModificationTime() >= entry2.getModificationTime()
    -      }
     
    -      if (logInfos.nonEmpty) {
    -        logDebug(s"New/updated attempts found: ${logInfos.size} ${logInfos.map(_.getPath)}")
    -      }
    -      logInfos.map { file =>
    -          replayExecutor.submit(new Runnable {
    -            override def run(): Unit = mergeApplicationListing(file)
    -          })
    +        if (logInfos.nonEmpty) {
    +          logDebug(s"New/updated attempts found: ${logInfos.size} ${logInfos.map(_.getPath)}")
             }
    -        .foreach { task =>
    -          try {
    -            // Wait for all tasks to finish. This makes sure that checkForLogs
    -            // is not scheduled again while some tasks are already running in
    -            // the replayExecutor.
    -            task.get()
    -          } catch {
    -            case e: InterruptedException =>
    -              throw e
    -            case e: Exception =>
    -              logError("Exception while merging application listings", e)
    +        logInfos.map { file =>
    +            replayExecutor.submit(new Runnable {
    +              override def run(): Unit = time(metrics.mergeApplicationListingTimer) {
    --- End diff --
    
    I commented later on the event count metric update; but this would be more useful if you knew the event count related to it, and if it were more targeted (i.e. just the replay, not including the time taken to merge the internal lists).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r65786382
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -114,28 +123,45 @@ class HistoryServer(
        * this UI with the event logs in the provided base directory.
        */
       def initialize() {
    -    attachPage(new HistoryPage(this))
    +    if (!initialized.getAndSet(true)) {
    --- End diff --
    
    In which cases is this method called twice?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    A few minor things left.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-219001777
  
    **[Test build #58566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58566/consoleFull)** for PR 9571 at commit [`1b276e4`](https://github.com/apache/spark/commit/1b276e441dabab2ac20a98c03de01975c95feb72).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-218733236
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58471/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-183332469
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64531/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r111934303
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -729,6 +778,116 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to process an event in the history merge operation. */
    +  val historyEventMergeTime = new LambdaLongGauge(() =>
    +    average(historyTotalMergeTime.getCount, historyEventCount.getCount))
    +
    +  /** Number of events replayed. */
    +  val appUIEventCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  private val clock = new SystemClock
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new TimestampGauge(clock)
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new TimestampGauge(clock)
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    +
    +  /** Total load time of all App UIs. */
    +  val appUITotalLoadTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val appUIEventReplayTime = new LambdaLongGauge(() =>
    +    average(appUITotalLoadTime.getCount, appUIEventCount.getCount))
    +
    +  register(Seq(
    +    ("history.merge.event.count", historyEventCount),
    +    ("history.merge.event.time", historyEventMergeTime),
    +    ("history.merge.duration", historyTotalMergeTime),
    +    ("update.count", updateCount),
    +    ("update.failure.count", updateFailureCount),
    +    ("update.last.attempted", updateLastAttempted),
    +    ("update.last.succeeded", updateLastSucceeded),
    +    ("appui.load.count", appUILoadCount),
    +    ("appui.load.duration", appUITotalLoadTime),
    +    ("appui.load.failure.count", appUILoadFailureCount),
    +    ("appui.load.not-found.count", appUILoadNotFoundCount),
    +    ("appui.event.count", appUIEventCount),
    +    ("appui.event.replay.time", appUIEventReplayTime),
    +    ("update.timer", updateTimer),
    +    ("history.merge.timer", historyMergeTimer)))
    --- End diff --
    
    cut


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Style police. FWIW I think the lines that failed were already >100 chars, it was just they got indented slightly more.
    ```
    Scalastyle checks failed at following occurrences:
    [error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:275: File line length exceeds 100 characters
    [error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:283: File line length exceeds 100 characters
    [error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:284: File line length exceeds 100 characters
    [error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:498: File line length exceeds 100 characters
    ```
    will fix


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r111831786
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -729,6 +778,116 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to process an event in the history merge operation. */
    +  val historyEventMergeTime = new LambdaLongGauge(() =>
    +    average(historyTotalMergeTime.getCount, historyEventCount.getCount))
    +
    +  /** Number of events replayed. */
    +  val appUIEventCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  private val clock = new SystemClock
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new TimestampGauge(clock)
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new TimestampGauge(clock)
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    +
    +  /** Total load time of all App UIs. */
    +  val appUITotalLoadTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val appUIEventReplayTime = new LambdaLongGauge(() =>
    +    average(appUITotalLoadTime.getCount, appUIEventCount.getCount))
    +
    +  register(Seq(
    +    ("history.merge.event.count", historyEventCount),
    +    ("history.merge.event.time", historyEventMergeTime),
    +    ("history.merge.duration", historyTotalMergeTime),
    +    ("update.count", updateCount),
    +    ("update.failure.count", updateFailureCount),
    +    ("update.last.attempted", updateLastAttempted),
    +    ("update.last.succeeded", updateLastSucceeded),
    +    ("appui.load.count", appUILoadCount),
    +    ("appui.load.duration", appUITotalLoadTime),
    +    ("appui.load.failure.count", appUILoadFailureCount),
    +    ("appui.load.not-found.count", appUILoadNotFoundCount),
    +    ("appui.event.count", appUIEventCount),
    +    ("appui.event.replay.time", appUIEventReplayTime),
    +    ("update.timer", updateTimer),
    +    ("history.merge.timer", historyMergeTimer)))
    --- End diff --
    
    This times is still in this list, why? It's not useful. (This is the timer I was talking about in my previous comment.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-219022129
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58566/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r73624479
  
    --- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala ---
    @@ -204,6 +204,8 @@ private[spark] class MetricsSystem private (
           }
         }
       }
    +
    +  private[spark] def getMetricRegistry(): MetricRegistry = { registry }
    --- End diff --
    
    nit: braces are not necessary


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-163357569
  
    **[Test build #47440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47440/consoleFull)** for PR 9571 at commit [`2e04803`](https://github.com/apache/spark/commit/2e04803528c9820ac1e47a8470d7e34cdd98faff).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-168331321
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-183332471
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51188/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Line lengths fixed, tests all happy.
    
    @vanzin \u2014any chance of adding this to your review list?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-155153352
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-175779755
  
    **[Test build #50202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50202/consoleFull)** for PR 9571 at commit [`984100d`](https://github.com/apache/spark/commit/984100d39f9ba137a9ca4dc2a06ce6393f21ad6a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r73624345
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -667,6 +710,123 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val historyEventMergeTime = new LambdaLongGauge(() =>
    +    average(historyTotalMergeTime.getCount, historyEventCount.getCount))
    +
    +  /** Number of events replayed. */
    +  val appUIEventCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new Timestamp()
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new Timestamp()
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    +
    +  /** Total load time of all App UIs. */
    +  val appUITotalLoadTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val appUIEventReplayTime = new LambdaLongGauge(() =>
    +    average(appUITotalLoadTime.getCount, appUIEventCount.getCount))
    +
    +  private val countersAndGauges = Seq(
    +    ("history.merge.event.count", historyEventCount),
    +    ("history.merge.event.time", historyEventMergeTime),
    +    ("history.merge.duration", historyTotalMergeTime),
    +    ("update.count", updateCount),
    +    ("update.failure.count", updateFailureCount),
    +    ("update.last.attempted", updateLastAttempted),
    +    ("update.last.succeeded", updateLastSucceeded),
    +    ("appui.load.count", appUILoadCount),
    +    ("appui.load.duration", appUITotalLoadTime),
    +    ("appui.load.failure.count", appUILoadFailureCount),
    +    ("appui.load.not-found.count", appUILoadNotFoundCount),
    +    ("appui.event.count", appUIEventCount),
    +    ("appui.event.replay.time", appUIEventReplayTime)
    +  )
    +
    +  private val timers = Seq (
    --- End diff --
    
    nit: no space before `(`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62321/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-155169298
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r111830700
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
    @@ -99,6 +104,19 @@ private[history] abstract class ApplicationHistoryProvider {
       }
     
       /**
    +   * Bind to the History Server: threads should be started here; exceptions may be raised
    +   * Start the provider: threads should be started here; exceptions may be raised
    +   * if the history provider cannot be started.
    +   * The base implementation contains a re-entrancy check and should
    --- End diff --
    
    This makes this interface awkward. Why can't the `HistoryServer` keep track of that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #64531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64531/consoleFull)** for PR 9571 at commit [`d39996a`](https://github.com/apache/spark/commit/d39996aaabcc0de873bf410eeed6448d6d259c42).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #64531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64531/consoleFull)** for PR 9571 at commit [`d39996a`](https://github.com/apache/spark/commit/d39996aaabcc0de873bf410eeed6448d6d259c42).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r74910734
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -667,6 +710,123 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    --- End diff --
    
    replaced with " Average time to process an event in the history merge operation."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r73623442
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -667,6 +710,123 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    --- End diff --
    
    Comment looks wrong.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r67326183
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -395,7 +429,8 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
       private def mergeApplicationListing(fileStatus: FileStatus): Unit = {
         val newAttempts = try {
             val bus = new ReplayListenerBus()
    -        val res = replay(fileStatus, bus)
    +        val (res, count) = replay(fileStatus, bus)
    +        metrics.eventReplayedCount.inc(count)
    --- End diff --
    
    I will measure them both then, and add some time-to-load metric


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #73695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73695/testReport)** for PR 9571 at commit [`d8ae876`](https://github.com/apache/spark/commit/d8ae876505de9599480929905f88612dcbc3905b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #75708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75708/testReport)** for PR 9571 at commit [`8903dcf`](https://github.com/apache/spark/commit/8903dcfe2b927c8fc3fed9df3e9939670a016944).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #62318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62318/consoleFull)** for PR 9571 at commit [`6e7f466`](https://github.com/apache/spark/commit/6e7f466da0681af2c40a6842ed6c542ca44c7bd5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75704/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r74962913
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -226,6 +259,135 @@ class HistoryServer(
     }
     
     /**
    + * An abstract implementation of the metrics [[Source]] trait with some common operations.
    + */
    +private[history] abstract class HistoryMetricSource(val prefix: String) extends Source {
    +  override val metricRegistry = new MetricRegistry()
    +
    +  /**
    +   * Register a sequence of metrics
    +   * @param metrics sequence of metrics to register
    +   */
    +  def register(metrics: Seq[(String, Metric)]): Unit = {
    +    metrics.foreach { case (name, metric) =>
    +      metricRegistry.register(fullname(name), metric)
    +    }
    +  }
    +
    +  /**
    +   * Create the full name of a metric by prepending the prefix to the name
    +   * @param name short name
    +   * @return the full name to use in registration
    +   */
    +  def fullname(name: String): String = {
    +    MetricRegistry.name(prefix, name)
    +  }
    +
    +  /**
    +   * Dump the counters and gauges.
    +   * @return a string for logging and diagnostics -not for parsing by machines.
    +   */
    +  override def toString: String = {
    +    val sb = new StringBuilder(s"Metrics for $sourceName:\n")
    +    sb.append("  Counters\n")
    +    metricRegistry.getCounters.asScala.foreach { entry =>
    +        sb.append("    ").append(entry._1).append(" = ").append(entry._2.getCount).append('\n')
    +    }
    +    sb.append("  Gauges\n")
    +    metricRegistry.getGauges.asScala.foreach { entry =>
    --- End diff --
    
    done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r73624122
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -667,6 +700,90 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Metrics integration: the various counters of activity
    +   * Time a closure, returning its output.
    +   * @param timer timer
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      timeCtx.close()
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider) extends Source {
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Metrics registry. */
    +  override val metricRegistry = new MetricRegistry()
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new Timestamp()
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new Timestamp()
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    --- End diff --
    
    I see you added the more useful `appUIEventReplayTime`, so is this metric needed? I don't think a lot of useful information, if any, can be derived from it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64651/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #75704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75704/testReport)** for PR 9571 at commit [`ec1f2d7`](https://github.com/apache/spark/commit/ec1f2d7f8743ce6de3e83f2f9a82f1c940c8be52).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
GitHub user steveloughran reopened a pull request:

    https://github.com/apache/spark/pull/9571

    [SPARK-11373] [CORE] Add metrics to the History Server and FsHistoryProvider

    This adds metrics to the history server, with the `FsHistoryProvider` metering its load, performance and reliability.
    
    see [SPARK-11373](https://issues.apache.org/jira/browse/SPARK-11373)
    
    The `HistoryServer` sets up the codahale metrics for the Web under metrics/ with metrics/metrics behind metrics, metrics/health any health probes and metrics/threads a thread dump. There's currently no attempt to  hook up JMX, etc. The Web servlets are the ones tests can easily hit and don't need infrastructure, so are the good initial first step.
    
    It then passes the metrics and health registries down to the providers in a `ApplicationHistoryBinding` case class, via a new method
    
        def start(binding: ApplicationHistoryBinding): Unit
    
    The base class has implementation so that all existing providers will still link properly; the base implementation currently checks and fails
     the use of a binding case class is also to ensure that if new binding information were added in future, existing implementations would still link.
    
    The `FsHistoryProvider` implements the `start()` method, registering two counters and two timers.
    
    1. Number of update attempts and number of failed updates \u2014and the same for app UI loads.
    2. Time for updates and app UI loads.
    
    Points of note
    
    * Why not use Spark's `MetricsSystem`? I did start off with that, but it needs a `SparkContext` to run off, which the server doesn't have. Ideally that would be way to go, as it would support all the spark conf -based metrics setup. Someone who understands the `MetricsSystem` would need to get involved here as would make for a more complex patch. In `FsHistoryProvider` the registry information is all kept in a `Source` subclass for ease of future migration to `MetricsSystem`.
    * Why the extra `HealthRegistry`? It's a nice way of allowing providers to indicate (possibly transient) health problems for monitoring tools/clients to hit. For the FS provider it could maybe flag when there hadn't been any successful update for a specified time period. (that could also be indicated by having a counter of "seconds since last update" and let monitoring tools monitor the counter value and act on it). Access control problems to the directory is something else which may be considered a liveness problem: it won't get better without human intervention
    * The `FsHistoryProvider.start()` method should really take the thread start code from from class constructor's `initialize()` method. This would ensure that incomplete classes don't get called by spawned threads, and makes it possible for test-time subclasses to skip thread startup. I've not attempted to do that in this patch.
    * No tests for this yet. Hitting the three metrics servlets in the HistoryServer is the obvious route; the JSON payload of the metrics can be parsed and scanned for relevant counters too. 
    * Part of the patch for `HistoryServerSuite` removes the call to `HistoryServer.initialize()` the `before` clause. That was a duplicate call, one which hit the re-entrancy tests on the provider & registry. As well as cutting it, `HistoryServer.initialize()` has been made idempotent. That should not be needed -but it will eliminate the problem arising again.
    
    Once the SPARK-1537 YARN timeline server history provider is committed, then I'll add metrics support there too. The YARN timeline provider would:
    
    1. Add timers of REST operations as well as playback load times, which can count network delays as well as JSON deserialization overhead. 
    2. Add a health check for connectivity too: the timeline server would be unhealthy if connections to the timeline server were either blocking or failing. And again, if there were security/auth problems, they'd be considered non-recoverable.
    3. Move thread launch under the `start()` method, with some test subclasses disabling thread launch.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/spark feature/SPARK-11373

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9571.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9571
    
----
commit 2ae734fd358c5d33e75f92837c2e414dc620454a
Author: Steve Loughran <st...@hortonworks.com>
Date:   2015-11-09T16:32:33Z

    [SPARK-11373] First pass at adding metrics to the history server, with the FsHistoryProvider counting
    1. Number of update attempts and number of failed updates
    2. Time for updates and app UI loads
    
    The HistoryServer sets up the codahale metrics for the Web under metrics/ with metrics/metrics behind metrics, metrics/health any health probes and metrics/threads a thread dump.

commit 31777992e151b66bfe53836d9de1e00ccc743a1e
Author: Steve Loughran <st...@hortonworks.com>
Date:   2015-11-09T18:18:51Z

    [SPARK-11373] tests and review; found and fixed a re-entrancy in HistoryServerSuite which was causing problems with counter registration

commit 2d3066fbb09638626a11e3f68f88c4f0ecca591a
Author: Steve Loughran <st...@hortonworks.com>
Date:   2015-11-09T19:26:30Z

    [SPARK-11373] scala-style javadoc tuning in FsHistoryProvider

commit 8c2f0e35202d8382313847c2432c4f8dd050e41f
Author: Steve Loughran <st...@hortonworks.com>
Date:   2015-11-26T21:18:38Z

    SPARK-11373: move to MetricsSystem, though retaining/expanding health checks, which aren't part of it

commit 64fc88c054226fd419152bfa76f4877de2672134
Author: Steve Loughran <st...@hortonworks.com>
Date:   2015-11-26T21:33:28Z

    scalacheck

commit 69c334bca42d9d26b0b8d9576853ddd390965c85
Author: Steve Loughran <st...@hortonworks.com>
Date:   2015-11-26T21:57:42Z

    SPARK-11373: cut out the notion of binding information; simplify provider launch

commit d9ba0d2a33e458d89f7430915208798abacfe75d
Author: Steve Loughran <st...@hortonworks.com>
Date:   2015-12-09T18:06:53Z

    SPARK-11373 cut the health checks out

commit 8324cccc371ceb63918daa2fc64c7c665204deeb
Author: Steve Loughran <st...@hortonworks.com>
Date:   2016-02-05T14:43:00Z

    [SPARK-11373] tail end of rebase operation

commit b1d3baa2674d210a64820574be8955769bf50ee3
Author: Steve Loughran <st...@hortonworks.com>
Date:   2016-02-08T12:37:40Z

    [SPARK-11373] scalastyle and import ordering

commit c68823e787184675bae42cf83b0b794ca251ca7e
Author: Steve Loughran <st...@hortonworks.com>
Date:   2016-02-12T13:18:40Z

    SPARK-11373 finish review of merge with trunk; add new Timestamp gauge and set to time of update/update success. Makes update time visible for tests/users/ops

commit d16a3e9f757caf88c4e0595807b7b6e452646cf4
Author: Steve Loughran <st...@apache.org>
Date:   2016-04-26T17:50:44Z

    [SPARK-11373] finish rebasing to master, correct tightened style checks

commit ecd0ceed776e0264cd3e846dbaaf6151c12640b6
Author: Steve Loughran <st...@apache.org>
Date:   2016-06-10T14:10:46Z

    [SPARK-11373] Address review comments and add a new counter of events played back during merge and U load

commit f086242e4c9ac75b828c34cae650415196119e10
Author: Steve Loughran <st...@apache.org>
Date:   2016-07-14T13:29:38Z

    [SPARK-11373] more metrics, all sources have prefixes, common features pulled up into HistoryMetricSource, including registration and prefix support. The existing test case "incomplete apps get refreshed" has been extended to check for metrics in the listings, alongside some specific probes for values in the fs history provider (loads, load time average > 0)

commit 0cf532ec34e328e71b3495c64e0e15ed13a254b2
Author: Steve Loughran <st...@apache.org>
Date:   2016-07-14T13:35:20Z

    [SPARK-11373] add a check that before there's been any events loaded, the average load time is "0" and not a division by zero error. This validates the logic in the relevant lambda-expression

commit c8b4912dccd7bbff1c3457592227b07de8c7e9da
Author: Steve Loughran <st...@apache.org>
Date:   2016-07-14T13:48:50Z

    [SPARK-11373] style check in the javadocs

commit aa1aada8c898e773dc8af88f580a906113e95da1
Author: Steve Loughran <st...@apache.org>
Date:   2016-07-14T13:57:18Z

    [SPARK-11373] IDE had mysteriously re-ordered imports in the same line. Interesting, and not in a good way

commit 745d532b67b64149e86d2fdf0464cab616023eac
Author: Steve Loughran <st...@apache.org>
Date:   2016-08-16T18:50:19Z

    [SPARK-11373] address latest comments
    -nits
    -recommended minor changes
    +pull out lambda gauge and HistoryMetricSource into their own file, HistoryMetricSource.scala. Added tests to go with this, and made the toString call robust against failing gauges
    +fixed a scaladoc warning in HistoryServer

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #64651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64651/consoleFull)** for PR 9571 at commit [`1c750b5`](https://github.com/apache/spark/commit/1c750b50cca7e17ebe5950592732ff4ceae7cf2b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    This patch adds separate average values of the load times vs merge times per event; this shows ~2x difference in replay from load in the test case.,
    
    These `.time` gauges are little lambda expressions evaluated whenever the gauge value is extracted; they divide the total load/merge durations by the event count. The Timer metrics don't provide enough data here, because they support various decaying reservoirs/windows for their time values, not whole-life-of-app durations.
    
    There's some base class metric support for registration (everything is prefixed) and testability. There's now tests in `HistoryServerSuite."incomplete apps get refreshed"` for metrics being in the dumped list, and for specific values, especially averages (that they increase, that they don't trigger division-by-0 exceptions before there have been any loads)
    
    This is what the metrics look like after the tests (from a log of the toString value). 
    
    The `.last.attempted` values are `System.currentTimeMillis` timestamps of the operations. I'd considered a "time-since" gauge, but after some offline discussion with Allen Wittenauer, went for the absolute values; I'll leave to to the management tooling to work out elapsed times
    from absolute values if they want to use that for alerts or UIs.
    
    ```
    16/07/14 14:17:03.837 ScalaTest-main-running-HistoryServerSuite INFO HistoryServerSuite: Metrics:
    Metrics for history:
      Counters
      Gauges
    
    Metrics for history.fs:
      Counters
        history.provider.appui.event.count = 103
        history.provider.appui.load.count = 6
        history.provider.appui.load.duration = 38244482
        history.provider.appui.load.failure.count = 0
        history.provider.appui.load.not-found.count = 0
        history.provider.history.merge.duration = 16138193
        history.provider.history.merge.event.count = 83
        history.provider.update.count = 5
        history.provider.update.failure.count = 0
      Gauges
        history.provider.appui.event.replay.time = 371305
        history.provider.history.merge.event.time = 194436
        history.provider.update.last.attempted = 1468502223726
        history.provider.update.last.succeeded = 1468502223000
    
    Metrics for application.cache:
      Counters
        history.cache.eviction.count = 3
        history.cache.load.count = 4
        history.cache.lookup.count = 58
        history.cache.lookup.failure.count = 0
        history.cache.update.probe.count = 57
        history.cache.update.triggered.count = 3
      Gauges
    ```
    
    One thing to consider: gauges of the number of complete and incomplete applications? I know the REST UI gives this, but only indirectly (you call, you count the size of the lists). Doing it in the metrics provides something that could be monitored or probed in tests without making REST calls.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-218733095
  
    **[Test build #58471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58471/consoleFull)** for PR 9571 at commit [`a5d1f4c`](https://github.com/apache/spark/commit/a5d1f4c3d1d7076ad7c65343d9c33beaab71afbe).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by CharlesYeh <gi...@git.apache.org>.
Github user CharlesYeh commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-155695451
  
    Thanks! I'll look into this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r73623629
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -114,28 +123,45 @@ class HistoryServer(
        * this UI with the event logs in the provided base directory.
        */
       def initialize() {
    -    attachPage(new HistoryPage(this))
    +    if (!initialized.getAndSet(true)) {
    --- End diff --
    
    If that's the case, `initialize()` should be made private and its calls removed from the tests.
    
    (I've seen this pattern of calling an `init()` method from constructors in Scala when the initialization code requires local variables, which otherwise would have to be fields in the class...)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #62320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62320/consoleFull)** for PR 9571 at commit [`0b8db8b`](https://github.com/apache/spark/commit/0b8db8b285bfcc73e861a92303206f670ea40de1).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #62321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62321/consoleFull)** for PR 9571 at commit [`4b2046f`](https://github.com/apache/spark/commit/4b2046fd47d58bc5814f6271aa1d67a6cdd0918c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-155148705
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #64359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64359/consoleFull)** for PR 9571 at commit [`745d532`](https://github.com/apache/spark/commit/745d532b67b64149e86d2fdf0464cab616023eac).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #62319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62319/consoleFull)** for PR 9571 at commit [`3971241`](https://github.com/apache/spark/commit/39712419d463bface2027fb7888f2c40f26fa412).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-219021936
  
    **[Test build #58566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58566/consoleFull)** for PR 9571 at commit [`1b276e4`](https://github.com/apache/spark/commit/1b276e441dabab2ac20a98c03de01975c95feb72).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-180390375
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #60296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60296/consoleFull)** for PR 9571 at commit [`5eb5360`](https://github.com/apache/spark/commit/5eb536023c0d363728911487942d47fec2e99fb7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    I'm going to close this PR and start one based on a reapplication of this patch onto master; gets rid of all the merge pain and is intended to be more minimal. The latest comments of this one will be applied


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    One other metric set I'm thinking of relates to a JIRA on app UIs not being visible: making the time of last scan a metric, both as an epoch time and diff from current time. That would let a management tool keep track of the scan history, maybe even show alerts if things are out of date. At the very least, if there are support calls, *and someone were to add the coda hale metrics servlets* then users could point their browsers at the metrics page and get a lump of JSON to paste into JIRAs


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-175780323
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50202/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75708/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #63864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63864/consoleFull)** for PR 9571 at commit [`745d532`](https://github.com/apache/spark/commit/745d532b67b64149e86d2fdf0464cab616023eac).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r67240845
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -278,55 +304,63 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
        * applications that haven't been updated since last time the logs were checked.
        */
       private[history] def checkForLogs(): Unit = {
    -    try {
    -      val newLastScanTime = getNewLastScanTime()
    -      logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
    -      val statusList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq)
    -        .getOrElse(Seq[FileStatus]())
    -      // scan for modified applications, replay and merge them
    -      val logInfos: Seq[FileStatus] = statusList
    -        .filter { entry =>
    -          try {
    -            val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
    -            !entry.isDirectory() && prevFileSize < entry.getLen()
    -          } catch {
    -            case e: AccessControlException =>
    -              // Do not use "logInfo" since these messages can get pretty noisy if printed on
    -              // every poll.
    -              logDebug(s"No permission to read $entry, ignoring.")
    -              false
    +    metrics.updateCount.inc()
    +    metrics.updateLastAttempted.touch()
    +    time(metrics.updateTimer) {
    +      try {
    +        val newLastScanTime = getNewLastScanTime()
    +        logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
    +        val statusList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq)
    +          .getOrElse(Seq[FileStatus]())
    +        // scan for modified applications, replay and merge them
    +        val logInfos: Seq[FileStatus] = statusList
    +          .filter { entry =>
    +            try {
    +              val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
    +              !entry.isDirectory() && prevFileSize < entry.getLen()
    +            } catch {
    +              case e: AccessControlException =>
    +                // Do not use "logInfo" since these messages can get pretty noisy if printed on
    +                // every poll.
    +                logDebug(s"No permission to read $entry, ignoring.")
    +                false
    +            }
               }
    +          .flatMap { entry => Some(entry) }
    +          .sortWith { case (entry1, entry2) =>
    +            entry1.getModificationTime() >= entry2.getModificationTime()
             }
    -        .flatMap { entry => Some(entry) }
    -        .sortWith { case (entry1, entry2) =>
    -          entry1.getModificationTime() >= entry2.getModificationTime()
    -      }
     
    -      if (logInfos.nonEmpty) {
    -        logDebug(s"New/updated attempts found: ${logInfos.size} ${logInfos.map(_.getPath)}")
    -      }
    -      logInfos.map { file =>
    -          replayExecutor.submit(new Runnable {
    -            override def run(): Unit = mergeApplicationListing(file)
    -          })
    +        if (logInfos.nonEmpty) {
    +          logDebug(s"New/updated attempts found: ${logInfos.size} ${logInfos.map(_.getPath)}")
             }
    -        .foreach { task =>
    -          try {
    -            // Wait for all tasks to finish. This makes sure that checkForLogs
    -            // is not scheduled again while some tasks are already running in
    -            // the replayExecutor.
    -            task.get()
    -          } catch {
    -            case e: InterruptedException =>
    -              throw e
    -            case e: Exception =>
    -              logError("Exception while merging application listings", e)
    +        logInfos.map { file =>
    +            replayExecutor.submit(new Runnable {
    +              override def run(): Unit = time(metrics.mergeApplicationListingTimer) {
    --- End diff --
    
    This metric contains the time to replay an app event log, which can vary a lot per application; same issue with the UI load time issue I pointed out before. That makes this metric not very useful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-155165525
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-169013506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48768/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r67242040
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -395,7 +429,8 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
       private def mergeApplicationListing(fileStatus: FileStatus): Unit = {
         val newAttempts = try {
             val bus = new ReplayListenerBus()
    -        val res = replay(fileStatus, bus)
    +        val (res, count) = replay(fileStatus, bus)
    +        metrics.eventReplayedCount.inc(count)
    --- End diff --
    
    You're using this metric in two different contexts: while creating the listing, and while building the UI.
    
    That means there's no good way to get a useful metric out of it, since you don't know how many events correspond to each and you're measuring the times for each separately.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r111830465
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala ---
    @@ -410,34 +409,25 @@ private[history] class CacheMetrics(prefix: String) extends Source {
         ("update.triggered.count", updateTriggeredCount))
     
       /** all metrics, including timers */
    -  private val allMetrics = counters ++ Seq(
    +  private val allMetrics: Seq[(String, Metric with Counting)] = counters ++ Seq(
    --- End diff --
    
    Is declaring the type useful here in some way?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-181379643
  
    **[Test build #50921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50921/consoleFull)** for PR 9571 at commit [`889978c`](https://github.com/apache/spark/commit/889978c52a9efa29e287d6e6d4c57f1fa9717442).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r76773046
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -667,6 +700,90 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Metrics integration: the various counters of activity
    +   * Time a closure, returning its output.
    +   * @param timer timer
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      timeCtx.close()
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider) extends Source {
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Metrics registry. */
    +  override val metricRegistry = new MetricRegistry()
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new Timestamp()
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new Timestamp()
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    --- End diff --
    
    I've cut the metric


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-168988730
  
    **[Test build #48768 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48768/consoleFull)** for PR 9571 at commit [`d6fa568`](https://github.com/apache/spark/commit/d6fa568fab72a2c4d57ecfcd304d000379534990).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-175780315
  
    **[Test build #50202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50202/consoleFull)** for PR 9571 at commit [`984100d`](https://github.com/apache/spark/commit/984100d39f9ba137a9ca4dc2a06ce6393f21ad6a).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-163358148
  
    latest branch cut out health checks. They'd be nice, but as they are useful across all of spark, it's probably best to wait for something central to go in and pick it up, rather than try and squeeze something in here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r67239990
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
    @@ -17,9 +17,12 @@
     
     package org.apache.spark.deploy.history
     
    +import java.util.concurrent.atomic.AtomicBoolean
     import java.util.zip.ZipOutputStream
     
     import org.apache.spark.SparkException
    +import org.apache.spark.metrics.source.Source
    +import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerApplicationStart, SparkListenerBlockManagerAdded, SparkListenerBlockManagerRemoved, SparkListenerBlockUpdated, SparkListenerEnvironmentUpdate, SparkListenerEvent, SparkListenerExecutorAdded, SparkListenerExecutorMetricsUpdate, SparkListenerExecutorRemoved, SparkListenerJobEnd, SparkListenerJobStart, SparkListenerStageCompleted, SparkListenerStageSubmitted, SparkListenerTaskEnd, SparkListenerTaskGettingResult, SparkListenerTaskStart, SparkListenerUnpersistRDD}
    --- End diff --
    
    At this point I'd just import `org.apache.spark.scheduler._`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r66613829
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -278,6 +303,9 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
        * applications that haven't been updated since last time the logs were checked.
        */
       private[history] def checkForLogs(): Unit = {
    +    metrics.updateCount.inc()
    +    metrics.updateLastAttempted.touch()
    +    time(metrics.updateTimer) {
         try {
    --- End diff --
    
    done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-175780322
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-180390371
  
    **[Test build #50822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50822/consoleFull)** for PR 9571 at commit [`3ceeb9e`](https://github.com/apache/spark/commit/3ceeb9ef2cd34d5d5eeccacd9e49477c133f0ee0).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Could you close this PR please? @steveloughran 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-169013292
  
    **[Test build #48768 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48768/consoleFull)** for PR 9571 at commit [`d6fa568`](https://github.com/apache/spark/commit/d6fa568fab72a2c4d57ecfcd304d000379534990).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran closed the pull request at:

    https://github.com/apache/spark/pull/9571


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-218999605
  
    **[Test build #58563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58563/consoleFull)** for PR 9571 at commit [`64de389`](https://github.com/apache/spark/commit/64de38964c05b514e921c15962be41e92519e98f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r73623900
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -226,6 +259,135 @@ class HistoryServer(
     }
     
     /**
    + * An abstract implementation of the metrics [[Source]] trait with some common operations.
    + */
    +private[history] abstract class HistoryMetricSource(val prefix: String) extends Source {
    +  override val metricRegistry = new MetricRegistry()
    +
    +  /**
    +   * Register a sequence of metrics
    +   * @param metrics sequence of metrics to register
    +   */
    +  def register(metrics: Seq[(String, Metric)]): Unit = {
    +    metrics.foreach { case (name, metric) =>
    +      metricRegistry.register(fullname(name), metric)
    +    }
    +  }
    +
    +  /**
    +   * Create the full name of a metric by prepending the prefix to the name
    +   * @param name short name
    +   * @return the full name to use in registration
    +   */
    +  def fullname(name: String): String = {
    +    MetricRegistry.name(prefix, name)
    +  }
    +
    +  /**
    +   * Dump the counters and gauges.
    +   * @return a string for logging and diagnostics -not for parsing by machines.
    +   */
    +  override def toString: String = {
    +    val sb = new StringBuilder(s"Metrics for $sourceName:\n")
    +    sb.append("  Counters\n")
    +    metricRegistry.getCounters.asScala.foreach { entry =>
    +        sb.append("    ").append(entry._1).append(" = ").append(entry._2.getCount).append('\n')
    +    }
    +    sb.append("  Gauges\n")
    +    metricRegistry.getGauges.asScala.foreach { entry =>
    +      sb.append("    ").append(entry._1).append(" = ").append(entry._2.getValue).append('\n')
    +    }
    +    sb.toString()
    +  }
    +
    +  /**
    +   * Get a named counter.
    +   * @param counterName name of the counter
    +   * @return the counter, if found
    +   */
    +  def getCounter(counterName: String): Option[Counter] = {
    +    Option(metricRegistry.getCounters(new MetricFilter {
    +      def matches(name: String, metric: Metric): Boolean = name == counterName
    +    }).get(counterName))
    +  }
    +
    +  /**
    +   * Get a gauge of an unknown numeric type.
    +   * @param gaugeName name of the gauge
    +   * @return gauge, if found
    +   */
    +  def getGauge(gaugeName: String): Option[Gauge[_]] = {
    +    Option(metricRegistry.getGauges(new MetricFilter {
    +      def matches(name: String, metric: Metric): Boolean = name == gaugeName
    +    }).get(gaugeName))
    +  }
    +
    +  /**
    +   * Get a Long gauge.
    +   * @param gaugeName name of the gauge
    +   * @return gauge, if found
    +   * @throws ClassCastException if the gauge is found but of the wrong type
    +   */
    +  def getLongGauge(gaugeName: String): Option[Gauge[Long]] = {
    +    Option(metricRegistry.getGauges(new MetricFilter {
    --- End diff --
    
    Can you call `getGauge` and cast instead of copy & pasting the code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Updated patch. Addresses indentation, found and eliminated one more call to {{initialize()}} outside of constructor.
    
    Adds a whole new counter, `event.replay.count`, which counts the number of events replayed during load or attempt merge. I'm not trying to be clever about deriving playback rate or anything, as it could get complex with the merge/update listings being included. However, including those playback events could be useful in identifying why startup was slow, or why updates are taking a while to surface.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #64651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64651/consoleFull)** for PR 9571 at commit [`1c750b5`](https://github.com/apache/spark/commit/1c750b50cca7e17ebe5950592732ff4ceae7cf2b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Looks ok but there's one comment you haven't addressed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #62320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62320/consoleFull)** for PR 9571 at commit [`0b8db8b`](https://github.com/apache/spark/commit/0b8db8b285bfcc73e861a92303206f670ea40de1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r74910796
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -667,6 +710,123 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val historyEventMergeTime = new LambdaLongGauge(() =>
    +    average(historyTotalMergeTime.getCount, historyEventCount.getCount))
    +
    +  /** Number of events replayed. */
    +  val appUIEventCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new Timestamp()
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new Timestamp()
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    +
    +  /** Total load time of all App UIs. */
    +  val appUITotalLoadTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val appUIEventReplayTime = new LambdaLongGauge(() =>
    +    average(appUITotalLoadTime.getCount, appUIEventCount.getCount))
    +
    +  private val countersAndGauges = Seq(
    +    ("history.merge.event.count", historyEventCount),
    +    ("history.merge.event.time", historyEventMergeTime),
    +    ("history.merge.duration", historyTotalMergeTime),
    +    ("update.count", updateCount),
    +    ("update.failure.count", updateFailureCount),
    +    ("update.last.attempted", updateLastAttempted),
    +    ("update.last.succeeded", updateLastSucceeded),
    +    ("appui.load.count", appUILoadCount),
    +    ("appui.load.duration", appUITotalLoadTime),
    +    ("appui.load.failure.count", appUILoadFailureCount),
    +    ("appui.load.not-found.count", appUILoadNotFoundCount),
    +    ("appui.event.count", appUIEventCount),
    +    ("appui.event.replay.time", appUIEventReplayTime)
    +  )
    +
    +  private val timers = Seq (
    --- End diff --
    
    fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r111914872
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
    @@ -99,6 +104,19 @@ private[history] abstract class ApplicationHistoryProvider {
       }
     
       /**
    +   * Bind to the History Server: threads should be started here; exceptions may be raised
    +   * Start the provider: threads should be started here; exceptions may be raised
    +   * if the history provider cannot be started.
    +   * The base implementation contains a re-entrancy check and should
    --- End diff --
    
    All the work on Yarn service model, and that of SmartFrog before it, have given me a fear of startup logic. I'll see about doing it there though


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r111942037
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -310,77 +338,87 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
        * applications that haven't been updated since last time the logs were checked.
        */
       private[history] def checkForLogs(): Unit = {
    -    try {
    -      val newLastScanTime = getNewLastScanTime()
    -      logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
    -      val statusList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq)
    -        .getOrElse(Seq[FileStatus]())
    -      // scan for modified applications, replay and merge them
    -      val logInfos: Seq[FileStatus] = statusList
    -        .filter { entry =>
    -          try {
    -            val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
    -            !entry.isDirectory() &&
    -              // FsHistoryProvider generates a hidden file which can't be read.  Accidentally
    -              // reading a garbage file is safe, but we would log an error which can be scary to
    -              // the end-user.
    -              !entry.getPath().getName().startsWith(".") &&
    -              prevFileSize < entry.getLen()
    -          } catch {
    -            case e: AccessControlException =>
    -              // Do not use "logInfo" since these messages can get pretty noisy if printed on
    -              // every poll.
    -              logDebug(s"No permission to read $entry, ignoring.")
    -              false
    +    metrics.updateCount.inc()
    +    metrics.updateLastAttempted.touch()
    +    time(metrics.updateTimer) {
    +      try {
    +        val newLastScanTime = getNewLastScanTime()
    +        logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
    +        val statusList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq)
    +          .getOrElse(Seq[FileStatus]())
    +        // scan for modified applications, replay and merge them
    +        val logInfos: Seq[FileStatus] = statusList
    +          .filter { entry =>
    +            try {
    +              val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
    +              !entry.isDirectory() &&
    +                // FsHistoryProvider generates a hidden file which can't be read.  Accidentally
    +                // reading a garbage file is safe, but we would log an error which can be scary to
    +                // the end-user.
    +                !entry.getPath().getName().startsWith(".") &&
    +                prevFileSize < entry.getLen()
    +            } catch {
    +              case e: AccessControlException =>
    +                // Do not use "logInfo" since these messages can get pretty noisy if printed on
    +                // every poll.
    +                logDebug(s"No permission to read $entry, ignoring.")
    +                false
    +            }
    +          }
    +          .flatMap { entry => Some(entry) }
    +          .sortWith { case (entry1, entry2) =>
    +            entry1.getModificationTime() >= entry2.getModificationTime()
               }
    -        }
    -        .flatMap { entry => Some(entry) }
    -        .sortWith { case (entry1, entry2) =>
    -          entry1.getModificationTime() >= entry2.getModificationTime()
    -      }
     
           if (logInfos.nonEmpty) {
             logDebug(s"New/updated attempts found: ${logInfos.size} ${logInfos.map(_.getPath)}")
           }
     
    -      var tasks = mutable.ListBuffer[Future[_]]()
    -
    -      try {
    -        for (file <- logInfos) {
    -          tasks += replayExecutor.submit(new Runnable {
    -            override def run(): Unit = mergeApplicationListing(file)
    -          })
    -        }
    -      } catch {
    -        // let the iteration over logInfos break, since an exception on
    -        // replayExecutor.submit (..) indicates the ExecutorService is unable
    -        // to take any more submissions at this time
    -
    -        case e: Exception =>
    -          logError(s"Exception while submitting event log for replay", e)
    -      }
    -
    -      pendingReplayTasksCount.addAndGet(tasks.size)
    +        var tasks = mutable.ListBuffer[Future[_]]()
     
    -      tasks.foreach { task =>
             try {
    -          // Wait for all tasks to finish. This makes sure that checkForLogs
    -          // is not scheduled again while some tasks are already running in
    -          // the replayExecutor.
    -          task.get()
    +          for (file <- logInfos) {
    +            tasks += replayExecutor.submit(new Runnable {
    +              override def run(): Unit =
    +                time(metrics.historyMergeTimer, Some(metrics.historyTotalMergeTime)) {
    +                  mergeApplicationListing(file)
    +              }
    +            })
    +          }
             } catch {
    -          case e: InterruptedException =>
    -            throw e
    +          // let the iteration over logInfos break, since an exception on
    +          // replayExecutor.submit (..) indicates the ExecutorService is unable
    +          // to take any more submissions at this time
    +
               case e: Exception =>
    -            logError("Exception while merging application listings", e)
    -        } finally {
    -          pendingReplayTasksCount.decrementAndGet()
    +            logError(s"Exception while submitting event log for replay", e)
             }
    -      }
     
    -      lastScanTime.set(newLastScanTime)
    -    } catch {
    -      case e: Exception => logError("Exception in checking for event log updates", e)
    +        pendingReplayTasksCount.addAndGet(tasks.size)
    +
    +        tasks.foreach { task =>
    +          try {
    +            // Wait for all tasks to finish. This makes sure that checkForLogs
    +            // is not scheduled again while some tasks are already running in
    +            // the replayExecutor.
    +            task.get()
    +          } catch {
    +            case e: InterruptedException =>
    +              throw e
    +            case e: Exception =>
    +              logError("Exception while merging application listings", e)
    +          } finally {
    +            pendingReplayTasksCount.decrementAndGet()
    +          }
    +        }
    +
    +        lastScanTime.set(newLastScanTime)
    +        metrics.updateLastSucceeded.setValue(newLastScanTime)
    +      } catch {
    +        case e: Exception => logError(
    --- End diff --
    
    done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-183332022
  
    **[Test build #51188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51188/consoleFull)** for PR 9571 at commit [`de1769b`](https://github.com/apache/spark/commit/de1769b85b7a9ac3fc160eaf7d465176e95d7100).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-183332463
  
    **[Test build #51188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51188/consoleFull)** for PR 9571 at commit [`de1769b`](https://github.com/apache/spark/commit/de1769b85b7a9ac3fc160eaf7d465176e95d7100).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-155900033
  
    ...I thought it did. If I'm wrong, then yes, all you need is a context. Once that's in there, the HistorySource should be what's needed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] WiP Add metrics to the Hi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-155165489
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-168331324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48567/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran closed the pull request at:

    https://github.com/apache/spark/pull/9571


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r66612954
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -114,28 +123,45 @@ class HistoryServer(
        * this UI with the event logs in the provided base directory.
        */
       def initialize() {
    -    attachPage(new HistoryPage(this))
    +    if (!initialized.getAndSet(true)) {
    --- End diff --
    
    It's in `HistoryServerSuite`
    
    ```scala
    server = new HistoryServer(conf, provider, securityManager, 18080)
    server.initialize()
    server.bind()
    ```
    
    issue here is the HS constructor already calls `initialize()`; some of the tests call it twice.
    
    One option here would be move that `initialize()` out of the constructor of HS; that'd mean changing `main()`, but produce something better for subclassing and testing anyway. Even so, making `initialize()`  non-reentrant can only be a good thing, given that threads are spawned off, metrics registered, etc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r76952864
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -664,6 +707,116 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to process an event in the history merge operation. */
    +  val historyEventMergeTime = new LambdaLongGauge(() =>
    +    average(historyTotalMergeTime.getCount, historyEventCount.getCount))
    +
    +  /** Number of events replayed. */
    +  val appUIEventCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  private val clock = new SystemClock
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new TimestampGauge(clock)
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new TimestampGauge(clock)
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    +
    +  /** Total load time of all App UIs. */
    +  val appUITotalLoadTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val appUIEventReplayTime = new LambdaLongGauge(() =>
    +    average(appUITotalLoadTime.getCount, appUIEventCount.getCount))
    +
    +  register(Seq(
    +    ("history.merge.event.count", historyEventCount),
    +    ("history.merge.event.time", historyEventMergeTime),
    --- End diff --
    
    No problem, though I think the total load time does need to be measured and logged somewhere. I'll make sure that happens.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r74960995
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -226,6 +259,135 @@ class HistoryServer(
     }
     
     /**
    + * An abstract implementation of the metrics [[Source]] trait with some common operations.
    + */
    +private[history] abstract class HistoryMetricSource(val prefix: String) extends Source {
    --- End diff --
    
    done; also made the`Timestamp` gauge take a Spark `Clock` for easier use/testing, plus some other standalone tests. Separate source file -> isolated tests.
    
    These gauges are actually generic to any metrics, and the `LambdaLongGauge` gauge quite a nice one to use in a lot of places; if accompanied with `LambdaIntGauge` they could be used as a more functional equivalent to the various gauge subclasses throughout the code. I could put them somewhere more generic if you think its useful (and even, in a different patch, move the existing gauges over to them)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-214827077
  
    **[Test build #57010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57010/consoleFull)** for PR 9571 at commit [`a5d1f4c`](https://github.com/apache/spark/commit/a5d1f4c3d1d7076ad7c65343d9c33beaab71afbe).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r74981677
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -226,6 +259,135 @@ class HistoryServer(
     }
     
     /**
    + * An abstract implementation of the metrics [[Source]] trait with some common operations.
    + */
    +private[history] abstract class HistoryMetricSource(val prefix: String) extends Source {
    --- End diff --
    
    We can move the classes to a different module when it becomes necessary.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    > I was secretly hoping you'd just give up on this patch, since it will generate a lot of conflicts with the code I'm working on in parallel..
    
    No. Sorry
    
    I do suspect the reviewers of my other outstanding patches are of the same mind "if we ignore him, he'll go away". which is why I'm effectively giving up filing new patches and bug reports related to Spark.
    
    However, that doesn't mean that I'm going to stop updating and reminding people of my current set of patches. It's easier to accept them and hope I'll go away afterwards. After all, if you'd merged this in earlier: no merge conflicts. Better all round.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62319/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r111831406
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -310,77 +338,87 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
        * applications that haven't been updated since last time the logs were checked.
        */
       private[history] def checkForLogs(): Unit = {
    -    try {
    -      val newLastScanTime = getNewLastScanTime()
    -      logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
    -      val statusList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq)
    -        .getOrElse(Seq[FileStatus]())
    -      // scan for modified applications, replay and merge them
    -      val logInfos: Seq[FileStatus] = statusList
    -        .filter { entry =>
    -          try {
    -            val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
    -            !entry.isDirectory() &&
    -              // FsHistoryProvider generates a hidden file which can't be read.  Accidentally
    -              // reading a garbage file is safe, but we would log an error which can be scary to
    -              // the end-user.
    -              !entry.getPath().getName().startsWith(".") &&
    -              prevFileSize < entry.getLen()
    -          } catch {
    -            case e: AccessControlException =>
    -              // Do not use "logInfo" since these messages can get pretty noisy if printed on
    -              // every poll.
    -              logDebug(s"No permission to read $entry, ignoring.")
    -              false
    +    metrics.updateCount.inc()
    +    metrics.updateLastAttempted.touch()
    +    time(metrics.updateTimer) {
    +      try {
    +        val newLastScanTime = getNewLastScanTime()
    +        logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
    +        val statusList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq)
    +          .getOrElse(Seq[FileStatus]())
    +        // scan for modified applications, replay and merge them
    +        val logInfos: Seq[FileStatus] = statusList
    +          .filter { entry =>
    +            try {
    +              val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
    +              !entry.isDirectory() &&
    +                // FsHistoryProvider generates a hidden file which can't be read.  Accidentally
    +                // reading a garbage file is safe, but we would log an error which can be scary to
    +                // the end-user.
    +                !entry.getPath().getName().startsWith(".") &&
    +                prevFileSize < entry.getLen()
    +            } catch {
    +              case e: AccessControlException =>
    +                // Do not use "logInfo" since these messages can get pretty noisy if printed on
    +                // every poll.
    +                logDebug(s"No permission to read $entry, ignoring.")
    +                false
    +            }
    +          }
    +          .flatMap { entry => Some(entry) }
    +          .sortWith { case (entry1, entry2) =>
    +            entry1.getModificationTime() >= entry2.getModificationTime()
               }
    -        }
    -        .flatMap { entry => Some(entry) }
    -        .sortWith { case (entry1, entry2) =>
    -          entry1.getModificationTime() >= entry2.getModificationTime()
    -      }
     
           if (logInfos.nonEmpty) {
             logDebug(s"New/updated attempts found: ${logInfos.size} ${logInfos.map(_.getPath)}")
           }
     
    -      var tasks = mutable.ListBuffer[Future[_]]()
    -
    -      try {
    -        for (file <- logInfos) {
    -          tasks += replayExecutor.submit(new Runnable {
    -            override def run(): Unit = mergeApplicationListing(file)
    -          })
    -        }
    -      } catch {
    -        // let the iteration over logInfos break, since an exception on
    -        // replayExecutor.submit (..) indicates the ExecutorService is unable
    -        // to take any more submissions at this time
    -
    -        case e: Exception =>
    -          logError(s"Exception while submitting event log for replay", e)
    -      }
    -
    -      pendingReplayTasksCount.addAndGet(tasks.size)
    +        var tasks = mutable.ListBuffer[Future[_]]()
     
    -      tasks.foreach { task =>
             try {
    -          // Wait for all tasks to finish. This makes sure that checkForLogs
    -          // is not scheduled again while some tasks are already running in
    -          // the replayExecutor.
    -          task.get()
    +          for (file <- logInfos) {
    +            tasks += replayExecutor.submit(new Runnable {
    +              override def run(): Unit =
    +                time(metrics.historyMergeTimer, Some(metrics.historyTotalMergeTime)) {
    +                  mergeApplicationListing(file)
    +              }
    +            })
    +          }
             } catch {
    -          case e: InterruptedException =>
    -            throw e
    +          // let the iteration over logInfos break, since an exception on
    +          // replayExecutor.submit (..) indicates the ExecutorService is unable
    +          // to take any more submissions at this time
    +
               case e: Exception =>
    -            logError("Exception while merging application listings", e)
    -        } finally {
    -          pendingReplayTasksCount.decrementAndGet()
    +            logError(s"Exception while submitting event log for replay", e)
             }
    -      }
     
    -      lastScanTime.set(newLastScanTime)
    -    } catch {
    -      case e: Exception => logError("Exception in checking for event log updates", e)
    +        pendingReplayTasksCount.addAndGet(tasks.size)
    +
    +        tasks.foreach { task =>
    +          try {
    +            // Wait for all tasks to finish. This makes sure that checkForLogs
    +            // is not scheduled again while some tasks are already running in
    +            // the replayExecutor.
    +            task.get()
    +          } catch {
    +            case e: InterruptedException =>
    +              throw e
    +            case e: Exception =>
    +              logError("Exception while merging application listings", e)
    +          } finally {
    +            pendingReplayTasksCount.decrementAndGet()
    +          }
    +        }
    +
    +        lastScanTime.set(newLastScanTime)
    +        metrics.updateLastSucceeded.setValue(newLastScanTime)
    +      } catch {
    +        case e: Exception => logError(
    --- End diff --
    
    `logError` goes in the next line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-180390376
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50822/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-214871285
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57010/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r74910950
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -667,6 +710,123 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             prevFileSize < latest.fileSize
         }
       }
    +
    +  /**
    +   * Time a closure, returning its output.
    +   * The timer is updated with the duration, and if a counter is supplied, it's count
    +   * is incremented by the duration.
    +   * @param timer timer
    +   * @param counter counter: an optional counter of the duration
    +   * @param fn function
    +   * @tparam T type of return value of time
    +   * @return the result of the function.
    +   */
    +  private def time[T](timer: Timer, counter: Option[Counter] = None)(fn: => T): T = {
    +    val timeCtx = timer.time()
    +    try {
    +      fn
    +    } finally {
    +      val duration = timeCtx.stop()
    +      counter.foreach(_.inc(duration))
    +    }
    +  }
    +}
    +
    +/**
    + * Metrics integration: the various counters of activity.
    + */
    +private[history] class FsHistoryProviderMetrics(owner: FsHistoryProvider, prefix: String)
    +    extends HistoryMetricSource(prefix) {
    +
    +  /**
    +   * Function to return an average; if the count is 0, so is the average.
    +   * @param value value to average
    +   * @param count event count to divide by
    +   * @return the average, or 0 if the counter is itself 0
    +   */
    +  private def average(value: Long, count: Long): Long = {
    +    if (count> 0) value / count else 0
    +  }
    +
    +  override val sourceName = "history.fs"
    +
    +  private val name = MetricRegistry.name(sourceName)
    +
    +  /** Number of updates. */
    +  val updateCount = new Counter()
    +
    +  /** Number of update failures. */
    +  val updateFailureCount = new Counter()
    +
    +  /** Number of events replayed as listing merge. */
    +  val historyEventCount = new Counter()
    +
    +  /** Timer of listing merges. */
    +  val historyMergeTimer = new Timer()
    +
    +  /** Total time to merge all histories. */
    +  val historyTotalMergeTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val historyEventMergeTime = new LambdaLongGauge(() =>
    +    average(historyTotalMergeTime.getCount, historyEventCount.getCount))
    +
    +  /** Number of events replayed. */
    +  val appUIEventCount = new Counter()
    +
    +  /** Update duration timer. */
    +  val updateTimer = new Timer()
    +
    +  /** Time the last update was attempted. */
    +  val updateLastAttempted = new Timestamp()
    +
    +  /** Time the last update succeded. */
    +  val updateLastSucceeded = new Timestamp()
    +
    +  /** Number of App UI load operations. */
    +  val appUILoadCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to a load/parse/replay problem. */
    +  val appUILoadFailureCount = new Counter()
    +
    +  /** Number of App UI load operations that failed due to an unknown file. */
    +  val appUILoadNotFoundCount = new Counter()
    +
    +  /** Statistics on time to load app UIs. */
    +  val appUiLoadTimer = new Timer()
    +
    +  /** Total load time of all App UIs. */
    +  val appUITotalLoadTime = new Counter()
    +
    +  /** Average time to load a single event in the App UI */
    +  val appUIEventReplayTime = new LambdaLongGauge(() =>
    +    average(appUITotalLoadTime.getCount, appUIEventCount.getCount))
    +
    +  private val countersAndGauges = Seq(
    +    ("history.merge.event.count", historyEventCount),
    +    ("history.merge.event.time", historyEventMergeTime),
    +    ("history.merge.duration", historyTotalMergeTime),
    +    ("update.count", updateCount),
    +    ("update.failure.count", updateFailureCount),
    +    ("update.last.attempted", updateLastAttempted),
    +    ("update.last.succeeded", updateLastSucceeded),
    +    ("appui.load.count", appUILoadCount),
    +    ("appui.load.duration", appUITotalLoadTime),
    +    ("appui.load.failure.count", appUILoadFailureCount),
    +    ("appui.load.not-found.count", appUILoadNotFoundCount),
    +    ("appui.event.count", appUIEventCount),
    +    ("appui.event.replay.time", appUIEventReplayTime)
    +  )
    +
    +  private val timers = Seq (
    +    ("update.timer", updateTimer),
    +    ("history.merge.timer", historyMergeTimer),
    +    ("appui.load.timer", appUiLoadTimer))
    +
    +  val allMetrics = countersAndGauges ++ timers
    --- End diff --
    
    ok. merged them all in. I'd split them by type, but like you say: not needed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r67240208
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
    @@ -110,3 +127,87 @@ private[history] abstract class ApplicationHistoryProvider {
       def writeEventLogs(appId: String, attemptId: Option[String], zipStream: ZipOutputStream): Unit
     
     }
    +
    +/**
    + * A simple counter of events.
    + * There is no concurrency support here: all events must come in sequentially.
    + */
    +private[history] class EventCountListener extends SparkListener {
    --- End diff --
    
    Isn't it easier to extend `SparkFirehoseListener` if all you're doing is counting?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-219022128
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #60296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60296/consoleFull)** for PR 9571 at commit [`5eb5360`](https://github.com/apache/spark/commit/5eb536023c0d363728911487942d47fec2e99fb7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73695/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #62321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62321/consoleFull)** for PR 9571 at commit [`4b2046f`](https://github.com/apache/spark/commit/4b2046fd47d58bc5814f6271aa1d67a6cdd0918c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r76098002
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -225,14 +274,26 @@ class HistoryServer(
       }
     }
     
    +
    --- End diff --
    
    nit: too many blank lines.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-168330976
  
    **[Test build #48567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48567/consoleFull)** for PR 9571 at commit [`06e5289`](https://github.com/apache/spark/commit/06e5289fae86b3d12102bf018cc301da9964e0d0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    > if you'd merged this in earlier: no merge conflicts
    
    Well, if you're going to follow that route, if you had addressed feedback earlier, the patch would have been merged long ago...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-169013504
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r74963037
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -226,6 +259,135 @@ class HistoryServer(
     }
     
     /**
    + * An abstract implementation of the metrics [[Source]] trait with some common operations.
    + */
    +private[history] abstract class HistoryMetricSource(val prefix: String) extends Source {
    +  override val metricRegistry = new MetricRegistry()
    +
    +  /**
    +   * Register a sequence of metrics
    +   * @param metrics sequence of metrics to register
    +   */
    +  def register(metrics: Seq[(String, Metric)]): Unit = {
    +    metrics.foreach { case (name, metric) =>
    +      metricRegistry.register(fullname(name), metric)
    +    }
    +  }
    +
    +  /**
    +   * Create the full name of a metric by prepending the prefix to the name
    +   * @param name short name
    +   * @return the full name to use in registration
    +   */
    +  def fullname(name: String): String = {
    +    MetricRegistry.name(prefix, name)
    +  }
    +
    +  /**
    +   * Dump the counters and gauges.
    +   * @return a string for logging and diagnostics -not for parsing by machines.
    +   */
    +  override def toString: String = {
    +    val sb = new StringBuilder(s"Metrics for $sourceName:\n")
    +    sb.append("  Counters\n")
    +    metricRegistry.getCounters.asScala.foreach { entry =>
    +        sb.append("    ").append(entry._1).append(" = ").append(entry._2.getCount).append('\n')
    +    }
    +    sb.append("  Gauges\n")
    +    metricRegistry.getGauges.asScala.foreach { entry =>
    +      sb.append("    ").append(entry._1).append(" = ").append(entry._2.getValue).append('\n')
    +    }
    +    sb.toString()
    +  }
    +
    +  /**
    +   * Get a named counter.
    +   * @param counterName name of the counter
    +   * @return the counter, if found
    +   */
    +  def getCounter(counterName: String): Option[Counter] = {
    +    Option(metricRegistry.getCounters(new MetricFilter {
    +      def matches(name: String, metric: Metric): Boolean = name == counterName
    +    }).get(counterName))
    +  }
    +
    +  /**
    +   * Get a gauge of an unknown numeric type.
    +   * @param gaugeName name of the gauge
    +   * @return gauge, if found
    +   */
    +  def getGauge(gaugeName: String): Option[Gauge[_]] = {
    +    Option(metricRegistry.getGauges(new MetricFilter {
    +      def matches(name: String, metric: Metric): Boolean = name == gaugeName
    +    }).get(gaugeName))
    +  }
    +
    +  /**
    +   * Get a Long gauge.
    +   * @param gaugeName name of the gauge
    +   * @return gauge, if found
    +   * @throws ClassCastException if the gauge is found but of the wrong type
    +   */
    +  def getLongGauge(gaugeName: String): Option[Gauge[Long]] = {
    +    Option(metricRegistry.getGauges(new MetricFilter {
    --- End diff --
    
    done, &  cleaned up all the filtering to have a single `MetricByName` filter.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/9571
  
    **[Test build #64586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64586/consoleFull)** for PR 9571 at commit [`d39996a`](https://github.com/apache/spark/commit/d39996aaabcc0de873bf410eeed6448d6d259c42).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-219020127
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9571#discussion_r74911975
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
    @@ -114,28 +123,45 @@ class HistoryServer(
        * this UI with the event logs in the provided base directory.
        */
       def initialize() {
    -    attachPage(new HistoryPage(this))
    +    if (!initialized.getAndSet(true)) {
    --- End diff --
    
    ...Pulled it from the test suite, so the check is superflous, and cut it...
    
    Looking at this more, I think the metrics setup should be moved to `bind()`, rather than `initialize()`, precisely because the metrics setup may be doing things in terms of thread startup, and you don't want that to happen so early in class construction. Doing that and the testing/review to make sure that the later-metrics-init won't break assumptions elsewhere.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11373] [CORE] Add metrics to the Histor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9571#issuecomment-181380436
  
    **[Test build #50921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50921/consoleFull)** for PR 9571 at commit [`889978c`](https://github.com/apache/spark/commit/889978c52a9efa29e287d6e6d4c57f1fa9717442).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org