Posted to yarn-issues@hadoop.apache.org by "Chen Ge (JIRA)" <ji...@apache.org> on 2016/08/01 17:14:20 UTC

[jira] [Comment Edited] (YARN-4091) Add REST API to retrieve scheduler activity

    [ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402442#comment-15402442 ] 

Chen Ge edited comment on YARN-4091 at 8/1/16 5:14 PM:
-------------------------------------------------------

Thanks [~Sunil G] for the tests and comments. I have modified the patch based on your suggestions.

Items 1, 2, 3, and 6 have been addressed.
For 4, *finalAllocationState* is the final state for one allocation. An allocation may fail to assign any containers because of queue issues, in which case it never reaches the application level. If we change the name to *finalAppAllocationState*, it will not properly describe a condition that relates only to the queue.
For 5, we also think *allocationState* is meaningful. If it is accepted, the allocation process has successfully moved on to the next level. If it fails at the queue level, we need a state to indicate that, because the process does not always reach the application level.
For 7 and 9, adding this information would be helpful, but it would require substantial changes to the current implementation and may need further code optimization. I am afraid I cannot complete it in the limited time available. I believe there will be more thoughts and improvements in the future.
For 8, it is missing because the second app is not added to the application allocation list during the node heartbeat. When the AM resource has not been successfully allocated, there is no activity for the app in the node heartbeat, so there is nothing to record for it.
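
To make the points on 4 and 5 concrete, here is a rough Java sketch of the idea. It is not code from the patch: the class, the method, the field layout, and the queue and app names are all hypothetical, and only the names *allocationState* and *finalAllocationState* and the accepted/skipped/rejected outcomes come from this discussion. It only illustrates why the final state cannot be application-specific: an allocation rejected at the queue level ends without recording any application-level activity.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these types are hypothetical and are NOT the
// classes used in the YARN-4091 patch.
public class AllocationStateSketch {

    // Outcome recorded at a single level (queue or application).
    enum AllocationState { ACCEPTED, SKIPPED, REJECTED }

    // One recorded activity: the level, the entity name, and its state.
    static final class Activity {
        final String level;
        final String name;
        final AllocationState allocationState;

        Activity(String level, String name, AllocationState allocationState) {
            this.level = level;
            this.name = name;
            this.allocationState = allocationState;
        }
    }

    // Result of one allocation attempt triggered by a node heartbeat.
    static final class AllocationResult {
        final List<Activity> activities = new ArrayList<>();
        AllocationState finalAllocationState;
    }

    // Assumed two-level flow: the queue is checked first, and the application
    // is considered only if the queue accepts the allocation.
    static AllocationResult allocate(String queue, boolean queueHasHeadroom,
                                     String app, boolean appCanUseNode) {
        AllocationResult result = new AllocationResult();
        if (!queueHasHeadroom) {
            // Rejected at the queue level: the application level is never
            // reached, so the final state describes the queue only.
            result.activities.add(
                new Activity("queue", queue, AllocationState.REJECTED));
            result.finalAllocationState = AllocationState.REJECTED;
            return result;
        }
        // ACCEPTED here means the process moves on to the next level.
        result.activities.add(
            new Activity("queue", queue, AllocationState.ACCEPTED));
        AllocationState appState =
            appCanUseNode ? AllocationState.ACCEPTED : AllocationState.SKIPPED;
        result.activities.add(new Activity("app", app, appState));
        result.finalAllocationState = appState;
        return result;
    }

    public static void main(String[] args) {
        AllocationResult r = allocate("root.a", false, "app_1", true);
        System.out.println("final state: " + r.finalAllocationState
            + ", recorded activities: " + r.activities.size());
    }
}
```

In this sketch a queue-level rejection yields a single queue activity and no app activity, which is the case where a name like *finalAppAllocationState* would be misleading.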

Thanks again for the detailed tests!



> Add REST API to retrieve scheduler activity
> -------------------------------------------
>
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Chen Ge
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf, SchedulerActivityManager-TestReport v2.pdf, SchedulerActivityManager-TestReport.pdf, YARN-4091-design-doc-v1.pdf, YARN-4091.1.patch, YARN-4091.2.patch, YARN-4091.3.patch, YARN-4091.4.patch, YARN-4091.5.patch, YARN-4091.5.patch, YARN-4091.preliminary.1.patch, app_activities.json, node_activities.json
>
>
> As schedulers are improved with various new capabilities, more of the configurations that tune them start to take effect, for example limiting container assignment to an application or introducing a delay before allocating a container.
> No clear information is passed from the scheduler to the outside world in these scenarios, which makes debugging much harder.
> This ticket is an effort to introduce better-defined states at the points in the scheduler where it skips or rejects a container assignment, activates an application, etc. Such information will help users understand what is happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve on this as we discuss.



