
Posted to common-dev@hadoop.apache.org by "Mac Yang (JIRA)" <ji...@apache.org> on 2008/10/15 03:36:44 UTC

[jira] Created: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Capacity Scheduler to provide a scheduler history log to record actions taken and why
-------------------------------------------------------------------------------------

                 Key: HADOOP-4413
                 URL: https://issues.apache.org/jira/browse/HADOOP-4413
             Project: Hadoop Core
          Issue Type: Improvement
          Components: contrib/capacity-sched
            Reporter: Mac Yang


It would be very useful if the capacity scheduler could provide a log that records the decisions made and actions taken by the scheduler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4413:
--------------------------------

    Attachment: 4413.2.patch

Attaching a patch (4413.2.patch) that does the following:
* uses the same log format as JobHistory
* adds JobID parameters to some of the event methods
* updates the Scheduler documentation
* modifies log4j.properties to support logging to a separate file



> Capacity Scheduler to provide a scheduler history log to record actions taken and why
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4413
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4413
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Mac Yang
>         Attachments: 4413.1.patch, 4413.2.patch
>
>
> It would be very useful if the capacity scheduler could provide a log that records the decisions made and actions taken by the scheduler.


[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666245#action_12666245 ] 

Hemanth Yamijala commented on HADOOP-4413:
------------------------------------------

Vivek, if we want to correlate events about a job from two disparate logs - the JT logs and the capacity scheduler logs - there must be some key that ties them together, right? I thought that should be the job ID or, in the case of tasks, the task ID. These fields should be present in both logs. Am I missing something here?
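
A rough sketch of the kind of post-processing this would allow, assuming both logs carry the job ID as a JOBID="..." field. The class name, the field name, and the line format are hypothetical and only illustrate the join on a shared key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processing step: merge lines from two logs by job ID.
// Assumes every relevant line carries a JOBID="..." field (an assumption).
final class JobEventCorrelator {
    private static final Pattern JOB_ID = Pattern.compile("JOBID=\"([^\"]+)\"");

    private JobEventCorrelator() {}

    // Returns job ID -> lines mentioning that job, JT lines first, then scheduler lines.
    static Map<String, List<String>> groupByJob(List<String> jtLines, List<String> schedLines) {
        Map<String, List<String>> byJob = new LinkedHashMap<>();
        collect(jtLines, "JT", byJob);
        collect(schedLines, "SCHED", byJob);
        return byJob;
    }

    private static void collect(List<String> lines, String source,
                                Map<String, List<String>> byJob) {
        for (String line : lines) {
            Matcher m = JOB_ID.matcher(line);
            if (m.find()) {
                byJob.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                     .add(source + ": " + line);
            }
            // Lines without a job ID cannot be tied to a job and are skipped.
        }
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> jt = Arrays.asList("Job JOBID=\"job_1\" STATUS=\"SUBMITTED\"");
        List<String> sched = Arrays.asList("FoundMapTask JOBID=\"job_1\" TASKID=\"task_1\"");
        System.out.println(groupByJob(jt, sched));
    }
}
```

Without the job ID in the scheduler's lines, the second call to collect would have nothing to match on, which is the concern raised above.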

bq. Plus, we don't want too many changes to CapacitySchedulerInstrumentation - it acts like an interface.

Agreed. +1. So, let's leave the scheduler instance in.

bq. Again, I sense that what all we want to capture will become clearer once we run this thing and start analyzing life cycle events. I've tried to capture whatever I thought would be important. But feel free to suggest other events.

I can see this is going to be an ongoing effort, so your argument about adding new events as the need arises seems valid. Let's not think about new events for now.

For the events defined now, the general approach I would take is to include more information rather than less, so that it leaves our options open. Again, because this is an interface, we may not want to change it frequently as needs arise. So, I think there are only two changes I would suggest:

foundMapTask: include job ID and task ID
blockOnHighMemJob: include job ID

Rest look fine.


[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665684#action_12665684 ] 

Hemanth Yamijala commented on HADOOP-4413:
------------------------------------------

Vivek, I reviewed the patch (from the patch file) and have a few comments:

- Can you explain why you need the scheduler object in the instrumentation classes? It seems like the dependency should be the other way round, and I also couldn't see where it is being used.
- For many of the APIs defined, it seems to make sense to include some more information like which job and which task are affected. This will allow us to consolidate information by task or job and give better information.
- I did not see events related to job lifecycle - like when it was submitted, initialized, scheduled, completed etc. I think this is required, no?
- At the same time, I don't see how the lookingFor*Task events are useful. Can you explain a use case?
- For creating an instance of the CapacitySchedulerInstrumentation, you can use the ReflectionUtils API. Not entirely sure, but this may mean you need a default constructor with a setter for the Scheduler config object, and if necessary the scheduler object as well.
- The API toFullPropertyName is made private, but is being used by TestQueueCapacities. Either the code must be duplicated in the test method, or it should be left package-private.
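
A rough sketch of the "default constructor with a setter" shape mentioned in the ReflectionUtils point above. It uses plain java.lang.reflect rather than Hadoop's ReflectionUtils, and all class and method names here are hypothetical stand-ins, not the classes in the patch.

```java
import java.lang.reflect.Constructor;

// Hypothetical base class: a no-argument constructor plus setters, so an
// implementation can be created reflectively and configured afterwards.
class PluggableInstrumentation {
    private Object schedulerConf;   // stands in for the scheduler config object
    private Object scheduler;       // optional; stands in for the scheduler itself

    void setSchedulerConf(Object conf) { this.schedulerConf = conf; }
    void setScheduler(Object scheduler) { this.scheduler = scheduler; }
    Object getSchedulerConf() { return schedulerConf; }
    Object getScheduler() { return scheduler; }
}

// Hypothetical implementation that would be chosen at runtime.
class PluggableLogInstrumentation extends PluggableInstrumentation {}

final class InstrumentationFactory {
    private InstrumentationFactory() {}

    // Plain java.lang.reflect is used here; this is not Hadoop's ReflectionUtils.
    static <T extends PluggableInstrumentation> T create(Class<T> cls, Object conf) {
        try {
            Constructor<T> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            T inst = ctor.newInstance();
            // Dependencies are injected after construction, not through the constructor.
            inst.setSchedulerConf(conf);
            return inst;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }
}
```

With this shape, the scheduler object is only passed through setScheduler when an implementation actually needs it, which keeps the dependency optional.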

Apart from these, I also had a discussion with Mac and the Chukwa team, and we thought of two things that would really help integration with Chukwa:
- Like Mac indicated above, it would be good if the event log was more formatted than it is now.
- For the time series data, which is being captured via the setQueueStats, it was felt that this could be very easily done via a MetricsContext, and the advantage is that there is automatic integration with Chukwa (as it is doing this for all other parts of Hadoop - like DFS, Mapred etc). Please look at o.a.h.ipc.metrics.RpcMetrics for a simple example of how to use it.


[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665527#action_12665527 ] 

Hemanth Yamijala commented on HADOOP-4413:
------------------------------------------

Vivek, the patch doesn't apply cleanly to trunk anymore. Can you please regenerate it?


[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Mac Yang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663894#action_12663894 ] 

Mac Yang commented on HADOOP-4413:
----------------------------------

When logging an event, would it make sense to use something like <event type> key="value" type of structure (similar to the JobHistory)?
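
A minimal sketch of what such a line might look like, assuming an event-type token followed by KEY="value" pairs. The class name, event name, keys, and the escaping rule are all hypothetical; this illustrates the structure and is not the JobHistory format itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: formats a scheduler event as
//   <event type> KEY="value" KEY="value" ...
// The escaping rule below is an assumption made for this sketch.
final class SchedulerEventLine {
    private SchedulerEventLine() {}

    static String format(String eventType, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder(eventType);
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(' ')
              .append(e.getKey())
              .append("=\"")
              .append(escape(e.getValue()))
              .append('"');
        }
        return sb.toString();
    }

    // Backslash-escape backslashes and double quotes so values stay parseable.
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Insertion-ordered map keeps the fields in a predictable order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("QUEUE", "default");
        fields.put("JOBID", "job_0001");   // made-up ID, for illustration only
        System.out.println(format("BlockOnHighMemJob", fields));
    }
}
```

A fixed "type, then key/value pairs" layout means a parser only needs one rule per line, whatever the event.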


[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639816#action_12639816 ] 

Vinod K V commented on HADOOP-4413:
-----------------------------------

All scheduler logging currently goes to the same log as the JobTracker. Maybe we should move the scheduler logging to a new log file. I already see scheduler logging statements and JT logging statements getting drowned in each other, and at debug level the situation only gets worse. But one big advantage of a single log file is that it gives the timeline of events occurring in the scheduler relative to events happening in the JT.


[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666251#action_12666251 ] 

Hemanth Yamijala commented on HADOOP-4413:
------------------------------------------

I am in two minds about setQueueStats, which captures time series data. On the one hand, a log file based approach to capturing this data would help us move faster on this JIRA. On the other hand, if the Chukwa team cannot consume it at all, we might end up having to change the implementation anyway.

So, Mac, what would you suggest? For the time being, would it be OK to go with the log based approach as presently defined, and then change it over time to use the Metrics API?


[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666061#action_12666061 ] 

Vivek Ratan commented on HADOOP-4413:
-------------------------------------

@Mac:
bq. I think either way we will want to be able to correlate the job life cycle events with the scheduler events.
Absolutely. That's why I kept the Job IDs out of the methods of CapacitySchedulerInstrumentation. If we can't synchronize the scheduler's events with the jobs' events, we can look at modifying these methods. We're logging, or collecting, a lot of information. The key is to see how to parse this information to present a unified life cycle view - for a job, for a queue, etc.

@hemanth:
bq. The other two classes are using it, and so they need it. We could add it when required, no ?
ChukwaTTInstru doesn't use the TaskTracker member variable, though TaskTrackerMetricsInst does. The Scheduler member variable seems useful (for future classes) and logical to be in CapacitySchedulerInstrumentation. Plus, we don't want too many changes to CapacitySchedulerInstrumentation - it acts like an interface. 

bq. I think some of the information is not captured by the jobtracker instrumentation at a job level - memory based blocking for instance, also our initialization logic is different.
We capture memory based blocking through CapacitySchedulerInstrumentation.blockOnHighMemJob. Does that need a job parameter? Maybe not. Maybe we only care to know about how many times we blocked. If we also want to know on which job we blocked, we can add a job parameter. 
Do we want to capture events in job initialization? I'm not sure. On one hand, job initialization is an internal thing - it's not an external facing event. I see CapacitySchedulerInstrumentation as capturing the external events of the scheduler, events that are familiar to a user or to Ops. If a job's running, I know it's initialized. If I want to detect how well my initialization routine is running, I'd use log files for that. However, if we feel the need to capture and track job initialization events, we can add them. I just didn't see a need. But if you do, it would be great if you can suggest what methods to add to capture initialization of jobs.

bq. Essentially, if we could work a little bit on what kind of information we want captured, it might help us better
I think we have, at least to get started. There's a listing of what we want to capture at the beginning of this Jira. I think we're covering all of that. Do you feel we're missing something?  Again, I sense that what all we want to capture will become clearer once we run this thing and start analyzing life cycle events. I've tried to capture whatever I thought would be important. But feel free to suggest other events. 




[jira] Resolved: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala resolved HADOOP-4413.
--------------------------------------

    Resolution: Duplicate

This issue and the patch will no longer apply to the current version of the capacity scheduler because there have been serious changes - like removal of pre-emption etc. I have filed HADOOP-5930 to start afresh, so that it will be easier to track. Closing this bug as a duplicate.


[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665694#action_12665694 ] 

Vivek Ratan commented on HADOOP-4413:
-------------------------------------

bq. Vivek, the patch doesn't apply cleanly to trunk anymore. Can you please regenerate a new patch ?
I've been working on a new patch. Should be out soon. 

bq. Can you explain why you need the scheduler object in the instrumentation classes ?
You're right - it is not used currently, but it could very well be in the future. It seemed fair that the instrumentation class for the Scheduler have access to the Scheduler object, as the former decides what to log or capture. The same model is followed by JobTrackerInstrumentation and TaskTrackerInstrumentation, and I was being consistent.

bq. For many of the APIs defined, it seems to make sense to include some more information like which job and which task are affected.
bq. I did not see events related to job lifecycle - like when it was submitted, initialized, scheduled, completed etc
Remember that there is a JobTrackerInstrumentation class that captures job related information - jobs added/removed, tasks launched, etc. This is information independent of schedulers. CapacitySchedulerInstrumentation is instrumenting just the capacity scheduler, so I've included events that mimic the scheduler's logic. I didn't see it very useful to include job-specific information in the scheduler events, especially as that is captured by JobTrackerInstrumentation , but we can add that in later if we feel it's important for our analysis. 

bq. At the same time, I don't see how the lookingFor*Task events are useful.
Part of the Capacity Scheduler logic is to determine whether to assign a TT a map or reduce task, if a TT can accept both. The lookingFor*Task events capture this logic decision.

bq. For creating an instance of the CapacitySchedulerInstrumentation, you can use the ReflectionUtils API. ..
I'm following the same logic that is used to create the other instrumentation classes - JobTrackerInstrumentation and TaskTrackerInstrumentation.

bq. The API toFullPropertyName is made private, but is being used by TestQueueCapacities. Either the code must be duplicated in the test method, or it should be left package private.
Yes, I caught this yesterday too. Not sure why I made the change, but it's not needed. I'll remove it. 

bq. Like Mac indicated above, it would be good if the event log was more formatted than it is now.
Agreed. Am looking into that. 



[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Mac Yang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640032#action_12640032 ] 

Mac Yang commented on HADOOP-4413:
----------------------------------

For the scheduler log entries that are meant for human consumption, there are definitely pros and cons as to whether to put them in the JobTracker log. However, in a fashion similar to the job log history, it would be very nice if the scheduler can output structured events and metrics into a separate file so that it's easier to parse.


[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665410#action_12665410 ] 

Vivek Ratan commented on HADOOP-4413:
-------------------------------------

bq. When logging an event, would it make sense to use something like <event type> key="value" type of structure (similar to the JobHistory)?

Sure. You could re-use parsing scripts, and you don't invent yet another format. Though you'd have to keep track of any changes to the JobHistory format. 


[jira] Updated: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4413:
--------------------------------

    Attachment: 4413.1.patch

Attaching the first patch (4413.1.patch), to get feedback on the approach.

* Much like _JobTrackerInstrumentation_, I've defined a _CapacitySchedulerInstrumentation_ class that defines the events and data we want to capture for the Capacity Scheduler. 
* There is a single 'implementation' of this class, _CapacitySchedulerLogInst_, that writes stuff to a log file. We may have implementations in the future that interact with Chukwa directly.
* The time series data is captured the same way the scheduler UI does. The Capacity Scheduler provides an object whose toString() method generates all the data that needs to be captured. This is the same object used by the UI. A thread in _CapacitySchedulerLogInst_ periodically writes this data to a log file. The default period is 5 seconds, but can be overridden through the capacity scheduler's configuration.
* Events are written to the log file right away. If this proves to be expensive, we can buffer them up (in a simple linked list of strings, perhaps) and write them periodically as well. These events capture the main scheduler decisions. 
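
The structure described in the list above can be sketched roughly as follows. This is not the code in the patch: the class names, the method signatures, and the use of a Supplier and Consumer in place of the scheduler's stats object and the log file are all assumptions made to keep the example self-contained.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for the instrumentation class: each event is a no-op
// unless an implementation overrides it. Method signatures are assumptions.
class InstrumentationSketch {
    void foundMapTask(String queue) {}
    void blockOnHighMemJob(String queue) {}
    void start() {}
    void stop() {}
}

// Hypothetical log-writing implementation.
class LogInstrumentationSketch extends InstrumentationSketch {
    private final Supplier<Object> queueStats;  // stands in for the object the UI also uses
    private final Consumer<String> sink;        // stands in for the log file; must be thread-safe
    private final long periodSeconds;
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    LogInstrumentationSketch(Supplier<Object> queueStats, Consumer<String> sink,
                             long periodSeconds) {
        this.queueStats = queueStats;
        this.sink = sink;
        this.periodSeconds = periodSeconds;
    }

    // Events are written right away, on the caller's thread.
    @Override
    void foundMapTask(String queue) {
        sink.accept("FoundMapTask QUEUE=\"" + queue + "\"");
    }

    @Override
    void blockOnHighMemJob(String queue) {
        sink.accept("BlockOnHighMemJob QUEUE=\"" + queue + "\"");
    }

    // One time-series sample: whatever the stats object's toString() produces.
    void writeSnapshot() {
        sink.accept("QueueStats " + queueStats.get());
    }

    // Starts the periodic writer; the first sample is written after one period.
    @Override
    void start() {
        timer.scheduleAtFixedRate(this::writeSnapshot, periodSeconds, periodSeconds,
                                  TimeUnit.SECONDS);
    }

    @Override
    void stop() {
        timer.shutdownNow();
    }
}
```

A caller would construct the implementation with the desired period (5 in the default case described above), call start() once, invoke the event methods from the scheduling code, and call stop() on shutdown. Buffering events instead of writing them immediately would only change the bodies of the event methods.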

To be done: 
* update documentation on capacity scheduler configuration
* log4j settings to log to a separate file
* make sure we're capturing all relevant events

Feedback welcome.


[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Mac Yang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643056#action_12643056 ] 

Mac Yang commented on HADOOP-4413:
----------------------------------

The following list is from a meeting with Arun, Owen, Runping, and Vivek.

- time series data on the queue for SLA: (maps, reduces) * (running, pending, guaranteed capacity)
- % of queue for the top N users
- # of sec after capacity = quota
- preemption event
- priority inversion (only user limit)
- timer expiration event
- idle slots
- record the "why" when taking an action


[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665900#action_12665900 ] 

Hemanth Yamijala commented on HADOOP-4413:
------------------------------------------

bq. It seemed fair that the instrumentation class for the Scheduler have access to the Scheduler object, as the former decides what to log or capture. The same model is followed by JobTrackerInstrumentation and TaskTrackerInstrumentation, and I was being consistent.

The other two classes are using it, and so they need it. We could add it when required, no ?

bq. CapacitySchedulerInstrumentation is instrumenting just the capacity scheduler, so I've included events that mimic the scheduler's logic. I didn't see it very useful to include job-specific information in the scheduler events, especially as that is captured by JobTrackerInstrumentation , but we can add that in later if we feel it's important for our analysis.

I think some of the information is not captured by the jobtracker instrumentation at a job level - memory based blocking for instance, also our initialization logic is different. If we want to fine tune our initialization configuration we may want to look at details like how many jobs have we initialized and how many of them wait until they actually are scheduled. This will help us tune the number of jobs to initialize per user.

Essentially, if we could work a little bit on what kind of information we want captured, it might help us better.


[jira] Commented: (HADOOP-4413) Capacity Scheduler to provide a scheduler history log to record actions taken and why

Posted by "Mac Yang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665906#action_12665906 ] 

Mac Yang commented on HADOOP-4413:
----------------------------------


bq. Remember that there is a JobTrackerInstrumentation class that captures job related information - jobs added/removed, tasks launched, etc. This is information independent of schedulers. CapacitySchedulerInstrumentation is instrumenting just the capacity scheduler, so I've included events that mimic the scheduler's logic. I didn't see it very useful to include job-specific information in the scheduler events, especially as that is captured by JobTrackerInstrumentation , but we can add that in later if we feel it's important for our analysis. 

I think either way we will want to be able to correlate the job life cycle events with the scheduler events. If we leave the job related events to JobTrackerInstrumentation then we should make sure that the scheduler events contain the necessary information so we can correlate them easily during post processing.
