Posted to issues@spark.apache.org by "Eric Vandenberg (JIRA)" <ji...@apache.org> on 2017/08/01 22:23:01 UTC

[jira] [Created] (SPARK-21598) Collect usability/events information from Spark History Server

Eric Vandenberg created SPARK-21598:
---------------------------------------

             Summary: Collect usability/events information from Spark History Server
                 Key: SPARK-21598
                 URL: https://issues.apache.org/jira/browse/SPARK-21598
             Project: Spark
          Issue Type: Improvement
          Components: Scheduler
    Affects Versions: 2.0.2
            Reporter: Eric Vandenberg
            Priority: Minor


The Spark History Server currently has no way to collect usability/performance information about its main activity: loading and replaying history files.  We'd like to collect this information to monitor, target, and measure improvements in the Spark debugging experience (via History Server usage).  Once available, these usability events could be analyzed using other analytics tools.

The event info to collect:
    case class SparkHistoryReplayEvent(
        logPath: String,
        logCompressionType: String,
        logReplayException: String, // if an error
        logReplayAction: String, // user replay, vs checkForLogs replay
        logCompleteFlag: Boolean,
        logFileSize: Long,
        logFileSizeUncompressed: Long,
        logLastModifiedTimestamp: Long,
        logCreationTimestamp: Long,
        logJobId: Long,
        logNumEvents: Int,
        logNumStages: Int,
        logNumTasks: Int,
        logReplayDurationMillis: Long
    )

The main Spark engine has a SparkListenerInterface through which all compute engine events are broadcast.  It probably doesn't make sense to reuse this abstraction for broadcasting Spark History Server events, since the two kinds of "events" are neither related to nor compatible with one another.  Also note that the metrics registry collects history caching metrics but doesn't provide the kind of information listed above.

The proposal here is to add some basic event listener infrastructure to capture History Server activity events.  This would work similarly to the SparkListener infrastructure and could be configured in a similar manner, e.g. spark.history.listeners=MyHistoryListenerClass.
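
To make the proposal concrete, here is a rough sketch of what such a listener mechanism might look like. None of this exists in Spark: HistoryServerListener, HistoryListenerBus, onReplay and fromConf are hypothetical names used only for illustration, and the choice to isolate listener failures from the replay path is an assumption, not a settled design.

```scala
import scala.util.control.NonFatal
import java.util.concurrent.CopyOnWriteArrayList

// Event payload, with the fields listed in this issue.
case class SparkHistoryReplayEvent(
    logPath: String,
    logCompressionType: String,
    logReplayException: String, // assumption: empty when replay succeeded
    logReplayAction: String,    // user replay vs. checkForLogs replay
    logCompleteFlag: Boolean,
    logFileSize: Long,
    logFileSizeUncompressed: Long,
    logLastModifiedTimestamp: Long,
    logCreationTimestamp: Long,
    logJobId: Long,
    logNumEvents: Int,
    logNumStages: Int,
    logNumTasks: Int,
    logReplayDurationMillis: Long)

// Hypothetical listener interface, kept separate from SparkListenerInterface.
trait HistoryServerListener {
  def onReplay(event: SparkHistoryReplayEvent): Unit
}

// Hypothetical bus that fans each event out to all registered listeners.
class HistoryListenerBus {
  private val listeners = new CopyOnWriteArrayList[HistoryServerListener]()

  def addListener(listener: HistoryServerListener): Unit = {
    listeners.add(listener)
  }

  def post(event: SparkHistoryReplayEvent): Unit = {
    val it = listeners.iterator()
    while (it.hasNext) {
      val listener = it.next()
      // A misbehaving listener should not break replay or other listeners.
      try listener.onReplay(event)
      catch {
        case NonFatal(e) =>
          System.err.println(s"History listener failed: ${e.getMessage}")
      }
    }
  }
}

object HistoryListenerBus {
  // Builds a bus from a comma-separated list of class names, e.g. the value
  // of the proposed spark.history.listeners setting. Each class is assumed
  // to have a public no-argument constructor.
  def fromConf(classNames: String): HistoryListenerBus = {
    val bus = new HistoryListenerBus
    classNames.split(",").map(_.trim).filter(_.nonEmpty).foreach { name =>
      val instance = Class.forName(name).getDeclaredConstructor().newInstance()
      bus.addListener(instance.asInstanceOf[HistoryServerListener])
    }
    bus
  }
}
```

Under these assumptions, the History Server would call post(...) once per replay, and a deployment would supply its own listener class to forward the events to whatever analytics tooling it uses.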

Open to feedback / suggestions / comments on the approach or alternatives.

cc: [~vanzin] [~cloud_fan] [~ajbozarth] [~jiangxb1987]




