Posted to yarn-issues@hadoop.apache.org by "Chen Ge (JIRA)" <ji...@apache.org> on 2016/07/13 04:50:20 UTC

[jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

    [ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374343#comment-15374343 ] 

Chen Ge commented on YARN-4091:
-------------------------------

Hi all,

Based on the "YARN-4091.preliminary.1.patch" I uploaded above, here are brief descriptions of the newly added classes and the test REST API.

Newly Added Classes:
ActivityManager:
	A class to store node or application allocations. It mainly provides operations to start, add, update and finish an allocation.
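The start/add/update/finish lifecycle can be illustrated with a minimal sketch. It assumes that activities are plain strings keyed by node ID. ActivityManagerSketch and its method names are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an allocation lifecycle store; not the patch's API.
// Activities are plain strings here to keep the example small.
class ActivityManagerSketch {
    // Allocation currently being recorded, keyed by node ID.
    private final Map<String, List<String>> inProgress = new HashMap<>();
    // Completed allocations per node ID, oldest first.
    private final Map<String, List<List<String>>> finished = new HashMap<>();

    // Start recording a new allocation for a node heartbeat.
    void startAllocation(String nodeId) {
        inProgress.put(nodeId, new ArrayList<String>());
    }

    // Append an activity; ignored when no allocation was started.
    void addActivity(String nodeId, String activity) {
        List<String> current = inProgress.get(nodeId);
        if (current != null) {
            current.add(activity);
        }
    }

    // Overwrite the latest activity, e.g. when its state changes.
    void updateLastActivity(String nodeId, String activity) {
        List<String> current = inProgress.get(nodeId);
        if (current != null && !current.isEmpty()) {
            current.set(current.size() - 1, activity);
        }
    }

    // Close the allocation and move it to the finished store.
    void finishAllocation(String nodeId) {
        List<String> current = inProgress.remove(nodeId);
        if (current != null) {
            finished.computeIfAbsent(nodeId, k -> new ArrayList<List<String>>()).add(current);
        }
    }

    // Finished allocations for a node, or an empty list.
    List<List<String>> getFinished(String nodeId) {
        List<List<String>> done = finished.get(nodeId);
        return done == null ? new ArrayList<List<String>>() : done;
    }
}
```

Because in-progress and finished allocations live in separate maps, an addActivity call that arrives after finishAllocation is simply dropped.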

NodeAllocation:
	It contains the information for one allocation in a node heartbeat. Detailed allocation activities are first stored as "AllocationActivity" operations and then transformed into a tree structure. The tree starts at the root queue and ends at a leaf queue, an application or a container allocation.
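The record-then-transform step can be sketched as follows. The sketch assumes that each recorded operation carries its own name and its parent's name, and that names are unique within one allocation. RecordedOp, AllocTreeNode and NodeAllocationSketch are hypothetical names, not classes from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One flat activity operation as recorded during a heartbeat (hypothetical).
class RecordedOp {
    final String name;       // queue path, application ID or container ID
    final String parentName; // null for the root queue
    final String state;      // e.g. "ACCEPTED", "SKIPPED", "ALLOCATED"

    RecordedOp(String name, String parentName, String state) {
        this.name = name;
        this.parentName = parentName;
        this.state = state;
    }
}

// Tree node built from the flat operations (hypothetical).
class AllocTreeNode {
    final String name;
    final String state;
    final List<AllocTreeNode> children = new ArrayList<>();

    AllocTreeNode(String name, String state) {
        this.name = name;
        this.state = state;
    }
}

// Hypothetical sketch of "store operations first, build the tree later".
class NodeAllocationSketch {
    private final List<RecordedOp> ops = new ArrayList<>();

    void record(RecordedOp op) {
        ops.add(op);
    }

    // Two passes, so operations may be recorded in any order:
    // create every node first, then link each one to its parent.
    AllocTreeNode buildTree() {
        Map<String, AllocTreeNode> byName = new HashMap<>();
        for (RecordedOp op : ops) {
            byName.put(op.name, new AllocTreeNode(op.name, op.state));
        }
        AllocTreeNode root = null;
        for (RecordedOp op : ops) {
            AllocTreeNode node = byName.get(op.name);
            if (op.parentName == null) {
                root = node;
            } else if (byName.containsKey(op.parentName)) {
                byName.get(op.parentName).children.add(node);
            }
        }
        return root;
    }
}
```

The two-pass build avoids relying on the scheduler visiting parents before children. That ordering is not specified above.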

AllocationActivity:
	It records one activity operation in an allocation, classified as a queue, application or container activity. It also records the state, diagnostic and priority.
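A minimal sketch of such a record follows. The three-way classification comes from the description above. The enum, the field names and the describe() helper are assumptions for illustration, not the patch's AllocationActivity.

```java
// Hypothetical classification of an activity operation.
enum ActivityLevelSketch { QUEUE, APPLICATION, CONTAINER }

// Hypothetical sketch of one recorded activity; not the patch's AllocationActivity.
class AllocationActivitySketch {
    final ActivityLevelSketch level;
    final String name;       // queue path, application ID or container ID
    final String state;      // outcome, e.g. "ACCEPTED" or "SKIPPED"
    final String diagnostic; // reason for the state; empty if none
    final int priority;      // request priority the activity applies to

    AllocationActivitySketch(ActivityLevelSketch level, String name,
                             String state, String diagnostic, int priority) {
        this.level = level;
        this.name = name;
        this.state = state;
        this.diagnostic = diagnostic;
        this.priority = priority;
    }

    // One-line summary suitable for a debug log.
    String describe() {
        String reason = diagnostic.isEmpty() ? "" : " (" + diagnostic + ")";
        return level + " " + name + " -> " + state + reason + " priority=" + priority;
    }
}
```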

ActivityNode:
	It represents a tree node in the "NodeAllocation" tree structure. Each node may represent a queue, an application or a container in the allocation activity. A node may have child nodes if allocation successfully proceeded to the next level.
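To show how such a tree could be inspected while debugging, here is a sketch of a node that renders its subtree with one indentation level per tree level. ActivityNodeSketch, addChild and render are hypothetical names, not the patch's ActivityNode API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node; not the patch's ActivityNode API.
class ActivityNodeSketch {
    final String name;   // queue path, application ID or container ID
    final String state;  // outcome at this level, e.g. "ACCEPTED" or "SKIPPED"
    final List<ActivityNodeSketch> children = new ArrayList<>();

    ActivityNodeSketch(String name, String state) {
        this.name = name;
        this.state = state;
    }

    // Attach a child and return it, so deeper levels can be added from it.
    ActivityNodeSketch addChild(ActivityNodeSketch child) {
        children.add(child);
        return child;
    }

    // Render the subtree, indenting two spaces per level.
    String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(name).append(" [").append(state).append("]\n");
        for (ActivityNodeSketch child : children) {
            child.render(sb, depth + 1);
        }
    }
}
```

In this sketch, a skipped queue such as root.b stays a leaf. That matches the rule that children appear only when allocation moved on to the next level.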

ActivityDiagnosticConstant:
	Collection of diagnostics.

ActivityState:
	Collection of activity operation states.

AllocationState:
	Collection of allocation final states.

AllocationActivityType:
	Collection of activity operation types.

AppAllocation:
	It contains allocation information for one application within a period of time. Each application allocation may have several allocation attempts.
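One way to keep an application's allocation attempts and select those inside a time window is sketched below. Timestamps are passed in explicitly so the example is deterministic. AppAllocationSketch, Attempt and attemptsWithin are hypothetical. A 3 s window, the REST default mentioned below, would be windowMs = 3000.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of per-application attempt history; not the patch's AppAllocation.
class AppAllocationSketch {
    // One allocation attempt for the application.
    static final class Attempt {
        final long timestampMs; // when the attempt happened
        final String nodeId;    // node whose heartbeat triggered it
        final String result;    // e.g. "ALLOCATED" or "SKIPPED"

        Attempt(long timestampMs, String nodeId, String result) {
            this.timestampMs = timestampMs;
            this.nodeId = nodeId;
            this.result = result;
        }
    }

    final String appId;
    private final List<Attempt> attempts = new ArrayList<>();

    AppAllocationSketch(String appId) {
        this.appId = appId;
    }

    void addAttempt(long timestampMs, String nodeId, String result) {
        attempts.add(new Attempt(timestampMs, nodeId, result));
    }

    // Attempts with startMs <= timestamp <= startMs + windowMs, in recorded order.
    List<Attempt> attemptsWithin(long startMs, long windowMs) {
        List<Attempt> out = new ArrayList<>();
        long endMs = startMs + windowMs;
        for (Attempt a : attempts) {
            if (a.timestampMs >= startMs && a.timestampMs <= endMs) {
                out.add(a);
            }
        }
        return out;
    }
}
```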

ActivitiesInfo:
	DAO object to display node allocation activity.

NodeAllocationInfo:
	DAO object to display each node allocation in node heartbeat.

ActivityNodeInfo:
	DAO object to display node information in the allocation tree. It corresponds to the "ActivityNode" class.

AppActivitiesInfo:
	DAO object to display application activity.

AppAllocationInfo:
	DAO object to display application allocation detailed information.
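The DAO classes separate what is displayed from the scheduler's internal state. A minimal sketch of that idea copies an internal tree node into an immutable, display-only object with a small JSON renderer. InternalNode, ActivityNodeInfoSketch and toJson are hypothetical. The JSON layout is an assumption and not the patch's actual REST output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical internal node, standing in for the scheduler-side tree.
class InternalNode {
    final String name;
    final String state;
    final List<InternalNode> children = new ArrayList<>();

    InternalNode(String name, String state) {
        this.name = name;
        this.state = state;
    }
}

// Hypothetical DAO: an immutable copy of an InternalNode subtree for display.
final class ActivityNodeInfoSketch {
    final String name;
    final String state;
    final List<ActivityNodeInfoSketch> children;

    ActivityNodeInfoSketch(InternalNode node) {
        this.name = node.name;
        this.state = node.state;
        List<ActivityNodeInfoSketch> copies = new ArrayList<>();
        for (InternalNode child : node.children) {
            copies.add(new ActivityNodeInfoSketch(child));
        }
        this.children = Collections.unmodifiableList(copies);
    }

    // Minimal JSON rendering; escapes only quotes and backslashes.
    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":\"").append(escape(name))
          .append("\",\"state\":\"").append(escape(state))
          .append("\",\"children\":[");
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(children.get(i).toJson());
        }
        return sb.append("]}").toString();
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Because the DAO copies the tree, later changes to the scheduler-side nodes do not alter a response that is already being rendered.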


Test REST API:
	Look at the next node's activities (default):
	http://localhost:18088/ws/v1/cluster/scheduler/activities

	Look only at a specific node:
	http://localhost:18088/ws/v1/cluster/scheduler/activities?nodeId=node-87:75
	or, without the port number:
	http://localhost:18088/ws/v1/cluster/scheduler/activities?nodeId=node-87

	Look at activities for a specific application within a period of time (set with maxTime; 3 s by default):
	http://localhost:18088/ws/v1/cluster/scheduler/app-activities?appId=application_1468198570845_0022
	http://localhost:18088/ws/v1/cluster/scheduler/app-activities?appId=application_1468198570845_0022&maxTime=5.2
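To make the optional parameters explicit, here is a sketch of a helper that builds the test URLs listed above. The paths and parameter names (nodeId, appId, maxTime) are copied from those examples and may change after this preliminary patch. ActivitiesUrlSketch is a hypothetical name. Values are appended without encoding, on the assumption that node and application IDs are URL-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds the test REST URLs for the preliminary patch.
// Values are appended as-is, assuming node and application IDs are URL-safe.
class ActivitiesUrlSketch {
    private final String rmWebAddress; // e.g. "http://localhost:18088"

    ActivitiesUrlSketch(String rmWebAddress) {
        this.rmWebAddress = rmWebAddress;
    }

    // nodeId: null for the default (next node), "host:port", or just "host".
    String nodeActivities(String nodeId) {
        Map<String, String> query = new LinkedHashMap<>();
        if (nodeId != null) {
            query.put("nodeId", nodeId);
        }
        return build("/ws/v1/cluster/scheduler/activities", query);
    }

    // maxTime: null to rely on the server-side default (3 s).
    String appActivities(String appId, Double maxTime) {
        Map<String, String> query = new LinkedHashMap<>();
        query.put("appId", appId);
        if (maxTime != null) {
            query.put("maxTime", String.valueOf(maxTime));
        }
        return build("/ws/v1/cluster/scheduler/app-activities", query);
    }

    // Joins path and query parameters in insertion order.
    private String build(String path, Map<String, String> query) {
        StringBuilder sb = new StringBuilder(rmWebAddress).append(path);
        char separator = '?';
        for (Map.Entry<String, String> e : query.entrySet()) {
            sb.append(separator).append(e.getKey()).append('=').append(e.getValue());
            separator = '&';
        }
        return sb.toString();
    }
}
```

Each resulting URL can then be fetched with any HTTP client, such as curl.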


Test class:
	TestRMWebServicesCapacitySched.java
	org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched#testActivityJSON
	org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched#testAppActivityJSON

Thanks for reviewing. Please feel free to suggest any improvements.

> Improvement: Introduce more debug/diagnostics information to detail out scheduler activity
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Chen Ge
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf, YARN-4091-design-doc-v1.pdf, YARN-4091.preliminary.1.patch
>
>
> As schedulers are improved with various new capabilities, more configurations that tune the schedulers start to take actions such as limiting container assignment to an application or introducing a delay before allocating a container.
> No clear information is passed from the scheduler to the outside world in these scenarios. This makes debugging much harder.
> This ticket is an effort to introduce more defined states in the various parts of the scheduler where it skips or rejects container assignment, activates an application, etc. Such information will help users understand what is happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve on this as we discuss.


