Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:04:31 UTC

[jira] [Updated] (SPARK-21598) Collect usability/events information from Spark History Server

     [ https://issues.apache.org/jira/browse/SPARK-21598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-21598:
---------------------------------
    Labels: bulk-closed  (was: )

> Collect usability/events information from Spark History Server
> --------------------------------------------------------------
>
>                 Key: SPARK-21598
>                 URL: https://issues.apache.org/jira/browse/SPARK-21598
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 2.0.2
>            Reporter: Eric Vandenberg
>            Priority: Minor
>              Labels: bulk-closed
>
> The Spark History Server doesn't currently have a way to collect usability/performance information about its main activity: loading and replaying history files. We'd like to collect this information to monitor, target and measure improvements in the Spark debugging experience (via History Server usage). Once available, these usability events could be analyzed with other analytics tools.
> The event info to collect:
>     case class SparkHistoryReplayEvent(
>         logPath: String,
>         logCompressionType: String,
>         logReplayException: String, // if an error
>         logReplayAction: String, // user replay vs. checkForLogs replay
>         logCompleteFlag: Boolean,
>         logFileSize: Long,
>         logFileSizeUncompressed: Long,
>         logLastModifiedTimestamp: Long,
>         logCreationTimestamp: Long,
>         logJobId: Long,
>         logNumEvents: Int,
>         logNumStages: Int,
>         logNumTasks: Int,
>         logReplayDurationMillis: Long
>     )
> The main Spark engine has a SparkListenerInterface through which all compute engine events are broadcast. It probably doesn't make sense to reuse this abstraction for broadcasting History Server events, since those events are unrelated to, and incompatible with, the compute engine events. Also note that the metrics registry collects history caching metrics but doesn't provide the kind of information listed above.
> The proposal is to add basic event listener infrastructure to capture History Server activity events. This would work similarly to the SparkListener infrastructure and could be configured in a similar manner, e.g. spark.history.listeners=MyHistoryListenerClass.
> Open to feedback / suggestions / comments on the approach or alternatives.
> cc: [~vanzin] [~cloud_fan] [~ajbozarth] [~jiangxb1987]
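A minimal Scala sketch of the listener infrastructure described in the issue, assuming a listener trait with a single callback per replay. `HistoryListener`, `HistoryListenerBus`, `CollectingListener` and `ReplayTiming` are hypothetical names introduced here for illustration; they are not Spark APIs. The sketch also assumes an empty `logReplayException` means the replay succeeded, and it passes listener instances directly instead of loading them by class name from a setting such as the proposed `spark.history.listeners`.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Event fields as listed in the issue (a proposal, not an existing Spark type).
case class SparkHistoryReplayEvent(
    logPath: String,
    logCompressionType: String,
    logReplayException: String, // assumption: "" when the replay succeeded
    logReplayAction: String,    // user replay vs. checkForLogs replay
    logCompleteFlag: Boolean,
    logFileSize: Long,
    logFileSizeUncompressed: Long,
    logLastModifiedTimestamp: Long,
    logCreationTimestamp: Long,
    logJobId: Long,
    logNumEvents: Int,
    logNumStages: Int,
    logNumTasks: Int,
    logReplayDurationMillis: Long)

// Hypothetical listener interface, similar in spirit to SparkListener.
trait HistoryListener {
  def onHistoryReplay(event: SparkHistoryReplayEvent): Unit
}

// Hypothetical bus: delivers each event to every listener and isolates
// listener failures so one faulty listener cannot break the others.
class HistoryListenerBus(listeners: Seq[HistoryListener]) {
  def post(event: SparkHistoryReplayEvent): Unit =
    listeners.foreach { l =>
      try l.onHistoryReplay(event)
      catch {
        case NonFatal(e) =>
          System.err.println(s"History listener ${l.getClass.getName} failed: $e")
      }
    }
}

// Hypothetical listener that simply records every event it receives.
class CollectingListener extends HistoryListener {
  val events = ArrayBuffer.empty[SparkHistoryReplayEvent]
  override def onHistoryReplay(event: SparkHistoryReplayEvent): Unit = events += event
}

object ReplayTiming {
  // Runs `replay`, measures its wall-clock duration, and records any
  // non-fatal exception message in the event instead of rethrowing it.
  def timed(base: SparkHistoryReplayEvent)(replay: => Unit): SparkHistoryReplayEvent = {
    val start = System.nanoTime()
    val error = try { replay; "" } catch { case NonFatal(e) => e.toString }
    val millis = (System.nanoTime() - start) / 1000000L
    base.copy(logReplayException = error, logReplayDurationMillis = millis)
  }
}
```

Catching non-fatal listener exceptions mirrors how Spark's own listener bus logs, rather than propagates, errors thrown by listeners. Loading listeners by class name from a comma-separated setting would follow the precedent of `spark.extraListeners` for SparkListener.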



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
