Posted to issues@spark.apache.org by "Kay Ousterhout (JIRA)" <ji...@apache.org> on 2014/11/11 20:32:35 UTC

[jira] [Commented] (SPARK-3682) Add helpful warnings to the UI

    [ https://issues.apache.org/jira/browse/SPARK-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206891#comment-14206891 ] 

Kay Ousterhout commented on SPARK-3682:
---------------------------------------

Some of the metrics you mentioned are among the additional metrics that are hidden by default. As part of this work, it might be nice to automatically show a hidden metric when warning the user that its value is problematic.
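A minimal sketch of that idea, in Python. The metric names and the split between default and hidden columns are hypothetical, not the Spark UI's actual labels. A hidden metric stays hidden unless some warning flags it, in which case it is appended to the visible columns.

```python
# Hypothetical metric names; the real Spark UI uses its own labels.
DEFAULT_VISIBLE = ["duration", "input_size", "shuffle_read", "shuffle_write"]
HIDDEN_BY_DEFAULT = ["gc_time", "scheduler_delay", "deserialization_time", "memory_spilled"]


def metrics_to_show(flagged):
    """Return the metric columns to display for a stage.

    `flagged` is the set of metric names that some warning has marked as
    problematic. A hidden metric is revealed only when flagged, so the user
    sees the value behind the warning without enabling extra columns.
    Flagged metrics that are already visible are not duplicated.
    """
    revealed = [m for m in HIDDEN_BY_DEFAULT if m in flagged]
    return DEFAULT_VISIBLE + revealed


print(metrics_to_show({"gc_time"}))
# ['duration', 'input_size', 'shuffle_read', 'shuffle_write', 'gc_time']
```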

> Add helpful warnings to the UI
> ------------------------------
>
>                 Key: SPARK-3682
>                 URL: https://issues.apache.org/jira/browse/SPARK-3682
>             Project: Spark
>          Issue Type: New Feature
>          Components: Web UI
>    Affects Versions: 1.1.0
>            Reporter: Sandy Ryza
>         Attachments: SPARK-3682Design.pdf
>
>
> Spark has a zillion configuration options and a zillion different things that can go wrong with a job. Improvements like incremental metrics, better metrics, and the proposed Spark replay debugger provide more insight into what's going on under the covers. However, it's difficult for non-advanced users to synthesize this information and understand where to direct their attention. It would be helpful to have a central location in the UI where users could find indications of why an app or job is failing or performing poorly.
> Some helpful messages that we could provide:
> * Warn that the tasks in a particular stage are spending a long time in GC.
> * Warn that spark.shuffle.memoryFraction does not fit inside the young generation.
> * Warn that tasks in a particular stage are very short, and that the number of partitions should probably be decreased.
> * Warn that tasks in a particular stage are spilling a lot, and that the number of partitions should probably be increased.
> * Warn that a cached RDD that gets a lot of use does not fit in memory, and a lot of time is being spent recomputing it.
> To start, two kinds of warnings would probably be most helpful:
> * Warnings at the app level that report on misconfigurations and on issues with the general health of executors.
> * Warnings at the job level that indicate why a job might be performing slowly.
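A rough Python sketch of how a few of the proposed checks might be prototyped. Everything here is an assumption for illustration: the `TaskSummary` record and its field names are hypothetical, the thresholds are arbitrary, and the young-generation check assumes a generational collector sized by `-XX:NewRatio`, under which the young generation is heap / (NewRatio + 1).

```python
from dataclasses import dataclass
from typing import List

# Illustrative thresholds; none of these values come from the issue.
GC_FRACTION_THRESHOLD = 0.1      # warn if >10% of task time is spent in GC
SHORT_TASK_MS = 100              # warn if the median task runs under 100 ms
SPILL_BYTES_THRESHOLD = 1 << 30  # warn if a stage spills more than 1 GiB


@dataclass
class TaskSummary:
    # Hypothetical per-task record; field names are not Spark's own.
    duration_ms: int
    gc_time_ms: int
    spilled_bytes: int


def stage_warnings(stage_id: int, tasks: List[TaskSummary]) -> List[str]:
    """Job-level checks: GC time, very short tasks, heavy spilling."""
    if not tasks:
        return []
    warnings = []
    total_ms = sum(t.duration_ms for t in tasks)
    gc_ms = sum(t.gc_time_ms for t in tasks)
    if total_ms > 0 and gc_ms / total_ms > GC_FRACTION_THRESHOLD:
        warnings.append(f"Stage {stage_id}: tasks spend {gc_ms / total_ms:.0%} of their time in GC.")
    durations = sorted(t.duration_ms for t in tasks)
    median_ms = durations[len(durations) // 2]  # upper median for even counts
    if median_ms < SHORT_TASK_MS:
        warnings.append(f"Stage {stage_id}: median task takes {median_ms} ms; consider fewer partitions.")
    spilled = sum(t.spilled_bytes for t in tasks)
    if spilled > SPILL_BYTES_THRESHOLD:
        warnings.append(f"Stage {stage_id}: tasks spilled {spilled / (1 << 30):.1f} GiB; consider more partitions.")
    return warnings


def shuffle_fraction_fits_young_gen(heap_bytes: int, shuffle_memory_fraction: float, new_ratio: int = 2) -> bool:
    """App-level check: does heap * spark.shuffle.memoryFraction fit in the young generation?"""
    young_gen_bytes = heap_bytes / (new_ratio + 1)  # assumes NewRatio-based sizing
    return heap_bytes * shuffle_memory_fraction <= young_gen_bytes


tasks = [TaskSummary(50, 20, 0), TaskSummary(60, 10, 0), TaskSummary(40, 5, 0)]
for w in stage_warnings(3, tasks):
    print(w)
# Stage 3: tasks spend 23% of their time in GC.
# Stage 3: median task takes 50 ms; consider fewer partitions.
```

Keeping each check as a plain function over summarized metrics would let the same rules feed both the app-level and job-level views; a real implementation would read from Spark's own listener data rather than this made-up record.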


