Posted to yarn-issues@hadoop.apache.org by "Siddharth Ahuja (Jira)" <ji...@apache.org> on 2020/08/25 07:44:00 UTC

[jira] [Comment Edited] (YARN-1806) webUI update to allow end users to request thread dump

    [ https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183819#comment-17183819 ] 

Siddharth Ahuja edited comment on YARN-1806 at 8/25/20, 7:43 AM:
-----------------------------------------------------------------

This JIRA adds a "*Jstack*" button to the ResourceManager Web UI's individual application page, reached via RM Web UI -> Applications -> click on <app_id> (so the breadcrumb is {{Home / Applications / App [app_id] / Jstack}}). The button triggers thread dumps for the running YARN containers of the currently running application attempt. Each thread dump is captured in the stdout logs of the selected container and displayed as-is by querying the NodeManager on which that container runs.

This feature implements two panels. The first panel contains two drop-downs. The first drop-down lists the currently running app attempt id and a "None" option (similar to the "Logs" functionality). Once the attempt is selected, the second drop-down appears in the same panel, listing the currently running containers for that application attempt.

Once you select a container id from the second drop-down, another panel opens just below (again, similar to the "Logs" functionality). Its header shows the selected attempt id and container id, and its body shows the container's stdout logs, which contain the thread dump triggered when the container was selected.

The following API calls are made:

API calls made when the Jstack button is clicked:
	1. http://<rm>:8088/ws/v1/cluster/apps/<app_id> -> Gets application info (e.g. the app state) from the RM.
	2. http://<rm>:8088/ws/v1/cluster/apps/<app_id>/appattempts -> Gets application attempt info from the RM, e.g. the app attempt state, to check whether the attempt is RUNNING ([YARN-10381|https://issues.apache.org/jira/browse/YARN-10381]).

If the application is not RUNNING, an error is displayed, based on the info from call 1 above.
If the application is RUNNING, we check the application attempts info for this app (there can be more than one app attempt) and display the id of the RUNNING attempt only. This is based on the info from call 2 above.
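The decision made with calls 1 and 2 can be sketched in Python roughly as follows. This is a minimal, hypothetical sketch rather than the UI code from the patch: the helper names {{rm_app_urls}} and {{running_attempt_id}} are made up, and the JSON shapes (an {{app}} object with a {{state}} field, and an {{appAttempts.appAttempt}} list whose entries carry {{appAttemptId}} and {{appAttemptState}}) are assumptions about the RM responses.

```python
import json
from typing import Optional, Tuple


def rm_app_urls(rm_host: str, app_id: str) -> Tuple[str, str]:
    """URLs for call 1 (app info) and call 2 (app attempts) on port 8088."""
    base = f"http://{rm_host}:8088/ws/v1/cluster/apps/{app_id}"
    return base, base + "/appattempts"


def running_attempt_id(app_json: str, attempts_json: str) -> Optional[str]:
    """Hypothetical helper: id of the RUNNING attempt, or None.

    Assumed response shapes (not taken from the patch):
      call 1: {"app": {"state": ...}}
      call 2: {"appAttempts": {"appAttempt": [{"appAttemptId": ..., "appAttemptState": ...}]}}
    """
    # Call 1: if the app itself is not RUNNING, the UI shows an error instead.
    if json.loads(app_json)["app"].get("state") != "RUNNING":
        return None
    # Call 2: there may be several attempts; only the RUNNING one is offered.
    attempts = json.loads(attempts_json)["appAttempts"]["appAttempt"]
    for attempt in attempts:
        if attempt.get("appAttemptState") == "RUNNING":
            return attempt.get("appAttemptId")
    return None
```

Returning None for both "app not RUNNING" and "no RUNNING attempt" keeps the sketch short; a real UI would distinguish the two to show the error described above.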

API calls made when the app attempt is selected from the drop-down:
	3. http://<rm>:8088/ws/v1/cluster/apps/<app_id>/appattempts/<appattempt_id>/containers -> Gets the list of running containers for the currently running app attempt from the RM.
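Call 3 and the population of the second drop-down could look something like the Python sketch below. The helper names are hypothetical, and the response shape is an assumption: the sketch accepts either {{{"containers": {"container": [...]}}}} or a bare {{{"container": [...]}}}, with each entry carrying a {{containerId}} field.

```python
import json
from typing import List


def containers_url(rm_host: str, app_id: str, attempt_id: str) -> str:
    """Call 3: containers of the given (running) app attempt."""
    return (f"http://{rm_host}:8088/ws/v1/cluster/apps/{app_id}"
            f"/appattempts/{attempt_id}/containers")


def container_ids(containers_json: str) -> List[str]:
    """Hypothetical helper: container ids for the second drop-down.

    Assumption: the body looks like {"containers": {"container": [...]}} or
    {"container": [...]}, each entry carrying a "containerId" field.
    """
    doc = json.loads(containers_json)
    doc = doc.get("containers") or doc  # unwrap the optional outer object
    items = doc.get("container") or []
    if isinstance(items, dict):  # a lone container may be serialized as an object
        items = [items]
    return [c["containerId"] for c in items]
```

Parsing defensively like this is a hedge against JSON serializers that emit a single-element list as a plain object.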

API calls made when the container is selected from the drop-down:
	4. http://<rm>:8088/ws/v1/cluster/containers/<container_id>/signal/OUTPUT_THREAD_DUMP?user.name=<logged_in_user> -> Asks the RM (which eventually reaches the NM through the NM heartbeat) to send a SIGQUIT signal to the process of the selected container ([YARN-8693|https://issues.apache.org/jira/browse/YARN-8693]). This is essentially a kill -3, which generates a thread dump that is captured in the container's stdout logs.
	5. http://<nm>:8042/ws/v1/node/containerlogs/<container_id>/stdout -> Fetches, from the NM running the selected container, the container's stdout logs, which contain the thread dump produced by the call above.
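The last two steps (call 4 to the RM, then the NM stdout logs call) can be sketched in Python as below. This is a hypothetical illustration only: it builds the requests without sending them, the helper names are made up, and the use of HTTP POST for the signal endpoint is an assumption. The "Full thread dump" banner check applies to HotSpot-based JVMs.

```python
import urllib.parse
import urllib.request


def thread_dump_request(rm_host: str, container_id: str,
                        user: str) -> urllib.request.Request:
    """Build (but do not send) call 4: ask the RM to signal the container."""
    query = urllib.parse.urlencode({"user.name": user})
    url = (f"http://{rm_host}:8088/ws/v1/cluster/containers/{container_id}"
           f"/signal/OUTPUT_THREAD_DUMP?{query}")
    # Assumption: the signal endpoint is invoked with HTTP POST.
    return urllib.request.Request(url, method="POST")


def stdout_logs_url(nm_host: str, container_id: str) -> str:
    """NM endpoint serving the container's stdout, where the dump lands."""
    return (f"http://{nm_host}:8042/ws/v1/node/containerlogs/"
            f"{container_id}/stdout")


def has_thread_dump(stdout_text: str) -> bool:
    # HotSpot JVMs start a SIGQUIT thread dump with this banner.
    return "Full thread dump" in stdout_text
```

Because the RM only reaches the NM on a later heartbeat, the dump may not be in stdout yet when the logs are first fetched; a caller could re-fetch the stdout URL until {{has_thread_dump()}} returns True.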



> webUI update to allow end users to request thread dump
> ------------------------------------------------------
>
>                 Key: YARN-1806
>                 URL: https://issues.apache.org/jira/browse/YARN-1806
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ming Ma
>            Assignee: Siddharth Ahuja
>            Priority: Major
>
> Both the individual container page and the containers page will support this. After the end user clicks the request link, they can follow it to the stdout page to see the thread dump content.


