Posted to issues@spark.apache.org by "Alex Bozarth (JIRA)" <ji...@apache.org> on 2018/01/26 22:49:00 UTC

[jira] [Commented] (SPARK-23237) Add UI / endpoint for threaddumps for executors with active tasks

    [ https://issues.apache.org/jira/browse/SPARK-23237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341730#comment-16341730 ] 

Alex Bozarth commented on SPARK-23237:
--------------------------------------

Unlike the related task, I'm not sure about this one. I see the need as you described it, but, as you also noted, it would be difficult to implement. I'm willing to look at a PR for this, but I wouldn't hold out much hope of convincing me.

> Add UI / endpoint for threaddumps for executors with active tasks
> -----------------------------------------------------------------
>
>                 Key: SPARK-23237
>                 URL: https://issues.apache.org/jira/browse/SPARK-23237
>             Project: Spark
>          Issue Type: New Feature
>          Components: Web UI
>    Affects Versions: 2.3.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> Frequently, when there are a handful of straggler tasks, users want to know what is going on in the executors running those stragglers.  Currently, that is a bit of a pain to do: you have to go to the page for your active stage, find the task, figure out which executor it's on, then go to the executors page, and get the thread dump.  Or maybe you just go to the executors page, find the executor with an active task, and then click on that, but that doesn't work if you've got multiple stages running.
> Users could figure this out by extracting the info from the stage REST endpoint, but it's such a common thing to do that we should make it easy.
> I realize that figuring out a good way to do this is a little tricky.  We don't want to make it easy to end up pulling thread dumps from 1000 executors back to the driver.  So we've got to come up with a reasonable heuristic for choosing which executors to poll.  And we've also got to find a suitable place to put this.
> My suggestion is that the stage page always has a link to the thread dump for the *one* executor with the longest running task.  And there would be a corresponding endpoint in the REST API with the same info, maybe at {{/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/slowestTaskThreadDump}}.
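
The heuristic suggested in the description (pick the *one* executor with the longest running task) could be sketched roughly as follows. This is only an illustration, not Spark code: the function name `slowest_running_executor` and the task fields `status`, `executorId`, and `launchTimeMs` are assumptions chosen for the example, and the real stage REST payload may use different names and time formats.

```python
from typing import Iterable, Mapping, Optional


def slowest_running_executor(tasks: Iterable[Mapping]) -> Optional[str]:
    """Return the id of the executor hosting the longest-running active task.

    Each task is assumed (hypothetically) to carry:
      - "status":       task state, "RUNNING" for active tasks
      - "executorId":   id of the executor the task runs on
      - "launchTimeMs": launch time in epoch milliseconds
    Returns None when no task is currently running.
    """
    running = [t for t in tasks if t.get("status") == "RUNNING"]
    if not running:
        return None
    # Among active tasks, the one launched earliest has been running longest.
    slowest = min(running, key=lambda t: t["launchTimeMs"])
    return slowest["executorId"]


# Hypothetical sample data standing in for one stage attempt's task list.
sample_tasks = [
    {"taskId": 1, "executorId": "3", "status": "SUCCESS", "launchTimeMs": 1000},
    {"taskId": 2, "executorId": "7", "status": "RUNNING", "launchTimeMs": 2000},
    {"taskId": 3, "executorId": "5", "status": "RUNNING", "launchTimeMs": 1500},
]
chosen = slowest_running_executor(sample_tasks)
```

Because only a single executor id is returned, a caller would request at most one thread dump per stage attempt, which is consistent with the concern about not pulling dumps from many executors back to the driver.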



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
