Posted to issues@spark.apache.org by "Eric Vandenberg (JIRA)" <ji...@apache.org> on 2017/08/01 22:23:01 UTC
[jira] [Created] (SPARK-21598) Collect usability/events information from Spark History Server
Eric Vandenberg created SPARK-21598:
---------------------------------------
Summary: Collect usability/events information from Spark History Server
Key: SPARK-21598
URL: https://issues.apache.org/jira/browse/SPARK-21598
Project: Spark
Issue Type: Improvement
Components: Scheduler
Affects Versions: 2.0.2
Reporter: Eric Vandenberg
Priority: Minor
The Spark History Server currently has no way to collect usability/performance data on its main activity, the loading/replay of history files. We'd like to collect this information to monitor, target, and measure improvements in the Spark debugging experience (via history server usage). Once available, these usability events could be analyzed using other analytics tools.
The event info to collect:
SparkHistoryReplayEvent(
  logPath: String,
  logCompressionType: String,
  logReplayException: String, // if an error
  logReplayAction: String, // user replay, vs checkForLogs replay
  logCompleteFlag: Boolean,
  logFileSize: Long,
  logFileSizeUncompressed: Long,
  logLastModifiedTimestamp: Long,
  logCreationTimestamp: Long,
  logJobId: Long,
  logNumEvents: Int,
  logNumStages: Int,
  logNumTasks: Int,
  logReplayDurationMillis: Long
)
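As a rough illustration, the event above could be written as a Scala case class, with a small helper that times a replay and captures any failure. This is a sketch only: the field names come from the list above, but the `ReplayTiming.timed` helper, the empty-string convention for "no error", and the example action values are assumptions made for illustration and are not part of Spark.

```scala
import scala.util.control.NonFatal

// Field names follow the issue text; everything else here is assumed.
final case class SparkHistoryReplayEvent(
    logPath: String,
    logCompressionType: String,
    logReplayException: String, // assumed convention: empty when the replay succeeded
    logReplayAction: String,    // assumed values, e.g. "user" or "checkForLogs"
    logCompleteFlag: Boolean,
    logFileSize: Long,
    logFileSizeUncompressed: Long,
    logLastModifiedTimestamp: Long,
    logCreationTimestamp: Long,
    logJobId: Long,
    logNumEvents: Int,
    logNumStages: Int,
    logNumTasks: Int,
    logReplayDurationMillis: Long)

object ReplayTiming {
  // Hypothetical helper: runs `replay` and returns
  // (elapsed milliseconds, exception text or "" on success).
  def timed(replay: () => Unit): (Long, String) = {
    val start = System.nanoTime()
    val error =
      try { replay(); "" }
      catch { case NonFatal(e) => e.toString }
    val elapsedMillis = (System.nanoTime() - start) / 1000000L
    (elapsedMillis, error)
  }
}
```

The two values returned by `timed` would feed `logReplayDurationMillis` and `logReplayException`; the remaining fields would have to come from the log file's metadata and the replayed events.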
The main Spark engine has a SparkListenerInterface through which all compute engine events are broadcast. It probably doesn't make sense to reuse this abstraction for broadcasting Spark History Server events, since the two kinds of "events" are neither related to nor compatible with one another. Also note that the metrics registry collects history caching metrics but doesn't provide the kind of information listed above.
The proposal here is to add some basic event listener infrastructure to capture history server activity events. This would work similarly to the SparkListener infrastructure and could be configured in a similar manner, e.g. spark.history.listeners=MyHistoryListenerClass.
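One possible shape for such listener infrastructure is sketched below. None of these types exist in Spark: `HistoryListener`, `HistoryListenerBus`, and `fromConfig` are hypothetical names, the event is trimmed to two fields to keep the example short, and treating the config value as a comma-separated list of class names with no-argument constructors is an assumption.

```scala
import java.util.concurrent.CopyOnWriteArrayList
import scala.util.control.NonFatal

// Trimmed stand-in for the proposed event (two of its fields only).
final case class SparkHistoryReplayEvent(logPath: String, logReplayDurationMillis: Long)

// Hypothetical listener interface for history server activity.
trait HistoryListener {
  def onReplay(event: SparkHistoryReplayEvent): Unit
}

// Hypothetical bus that delivers each event to every registered listener.
final class HistoryListenerBus {
  private val listeners = new CopyOnWriteArrayList[HistoryListener]()

  def addListener(listener: HistoryListener): Unit = {
    listeners.add(listener)
    ()
  }

  def post(event: SparkHistoryReplayEvent): Unit = {
    val it = listeners.iterator()
    while (it.hasNext) {
      val listener = it.next()
      // A failing listener is reported and skipped so it cannot break the others.
      try listener.onReplay(event)
      catch { case NonFatal(e) => System.err.println(s"History listener failed: $e") }
    }
  }
}

object HistoryListenerBus {
  // Builds a bus from a config value such as the proposed spark.history.listeners.
  // Assumes comma-separated class names, each with a public no-arg constructor.
  def fromConfig(classNames: String): HistoryListenerBus = {
    val bus = new HistoryListenerBus
    classNames.split(",").map(_.trim).filter(_.nonEmpty).foreach { name =>
      val instance = Class.forName(name).getDeclaredConstructor().newInstance()
      bus.addListener(instance.asInstanceOf[HistoryListener])
    }
    bus
  }
}
```

With something like this in place, the history server would call `post` once per replay, and a deployment could forward events to its analytics tooling by supplying its own `HistoryListener` implementation.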
Open to feedback / suggestions / comments on the approach or alternatives.
cc: [~vanzin] [~cloud_fan] [~ajbozarth] [~jiangxb1987]