Posted to common-dev@hadoop.apache.org by "Matei Zaharia (JIRA)" <ji...@apache.org> on 2008/08/09 02:10:44 UTC

[jira] Created: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Decide how to integrate scheduler info into CLI and job tracker web page
------------------------------------------------------------------------

                 Key: HADOOP-3930
                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
             Project: Hadoop Core
          Issue Type: Improvement
            Reporter: Matei Zaharia
            Priority: Minor


We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
* A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
* Some sorting order for jobs - maybe a method to sort a list of jobs.

Let's figure out what the best way to do this is and implement it in the existing schedulers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631653#action_12631653 ] 

Owen O'Malley commented on HADOOP-3930:
---------------------------------------

I think we need a separate class for the queue command. Otherwise, users can mix up the commands, for example:

{quote}
hadoop queue -kill job_00001
{quote}

which would be confusing. With separate classes, each command supports only the sub-commands that make sense for it.
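
A rough sketch of that separation, assuming a small hypothetical base class (`CliCommand`) and illustrative sub-command names; none of these names come from the patch. Each command class declares the sub-commands it accepts, so `queue -kill` is rejected instead of reaching job logic:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: each CLI entry point lists the sub-commands it accepts.
abstract class CliCommand {
    private final String name;
    private final Set<String> subCommands;

    CliCommand(String name, String... subCommands) {
        this.name = name;
        this.subCommands = new HashSet<String>(Arrays.asList(subCommands));
    }

    /** Returns a description of what would run, or throws for unsupported sub-commands. */
    String dispatch(String subCommand, String argument) {
        if (!subCommands.contains(subCommand)) {
            throw new IllegalArgumentException(
                "'" + name + "' does not support " + subCommand);
        }
        return name + " " + subCommand + " " + argument;
    }
}

// Sub-command names below are illustrative; the thread does not fix the flag set.
class JobCommand extends CliCommand {
    JobCommand() { super("job", "-list", "-kill", "-status"); }
}

class QueueCommand extends CliCommand {
    QueueCommand() { super("queue", "-list", "-info"); }
}
```

With this shape, `new QueueCommand().dispatch("-kill", "job_00001")` throws an `IllegalArgumentException`, while the same call on `JobCommand` is accepted.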

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, mockup.JPG
>
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Collection<JobInProgress> getJobs(String queueName) -- returns the list of jobs in a given queue, sorted by a scheduler-specific order (the order it wants to run them in / schedule the next task in / etc).
> * public List<String> getQueues();
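
To make the proposal above concrete, here is a minimal sketch of the four methods gathered into one interface, with a toy single-queue implementation. `JobStub`, `SchedulerInfoProvider` and `SingleQueueScheduler` are hypothetical names used only for illustration; `JobStub` stands in for `JobInProgress`, and ordering by a numeric priority is an assumption of this example, not something the proposal prescribes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for JobInProgress, reduced to what the example needs.
class JobStub {
    final String id;
    final int priority; // assumption: lower value runs earlier
    JobStub(String id, int priority) { this.id = id; this.priority = priority; }
}

// The four proposed methods, collected in one hypothetical interface.
interface SchedulerInfoProvider {
    Map<String, String> getSchedulingInfo(JobStub job);
    Map<String, String> getSchedulingInfo(String queue);
    Collection<JobStub> getJobs(String queueName);
    List<String> getQueues();
}

// Toy scheduler with a single queue, ordering jobs by priority.
class SingleQueueScheduler implements SchedulerInfoProvider {
    private final List<JobStub> jobs = new ArrayList<JobStub>();

    void addJob(JobStub job) { jobs.add(job); }

    public Map<String, String> getSchedulingInfo(JobStub job) {
        Map<String, String> info = new LinkedHashMap<String, String>();
        info.put("Priority", Integer.toString(job.priority));
        return info;
    }

    public Map<String, String> getSchedulingInfo(String queue) {
        Map<String, String> info = new LinkedHashMap<String, String>();
        info.put("Jobs", Integer.toString(jobs.size()));
        return info;
    }

    // Returns a sorted copy so callers see the scheduler's order.
    public Collection<JobStub> getJobs(String queueName) {
        List<JobStub> sorted = new ArrayList<JobStub>(jobs);
        Collections.sort(sorted, new Comparator<JobStub>() {
            public int compare(JobStub a, JobStub b) {
                return Integer.compare(a.priority, b.priority);
            }
        });
        return sorted;
    }

    public List<String> getQueues() {
        return Collections.singletonList("default");
    }
}
```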


[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: 3930-1.patch

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Priority: Minor
>         Attachments: 3930-1.patch


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626182#action_12626182 ] 

Owen O'Malley commented on HADOOP-3930:
---------------------------------------

QueueInfo should have private fields and getters.

JobSubmissionProtocol is not public and therefore the JobClient needs identical methods.
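
A minimal sketch of both points, using hypothetical names (`QueueInfoSketch`, `SubmissionProtocolSketch`, `ClientSketch`) and illustrative fields rather than the ones in the patch. The info class keeps its fields private behind getters, and the public client repeats the protocol method because outside callers cannot reach a non-public protocol interface directly:

```java
// Hypothetical shape; field names are illustrative, not taken from the patch.
class QueueInfoSketch {
    private final String queueName;
    private final String schedulingInfo;

    QueueInfoSketch(String queueName, String schedulingInfo) {
        this.queueName = queueName;
        this.schedulingInfo = schedulingInfo;
    }

    String getQueueName() { return queueName; }
    String getSchedulingInfo() { return schedulingInfo; }
}

// Stand-in for the non-public protocol interface.
interface SubmissionProtocolSketch {
    QueueInfoSketch getQueueInfo(String queueName);
}

// Stand-in for the public client: it mirrors the protocol method and delegates.
class ClientSketch {
    private final SubmissionProtocolSketch protocol;

    ClientSketch(SubmissionProtocolSketch protocol) { this.protocol = protocol; }

    public QueueInfoSketch getQueueInfo(String queueName) {
        return protocol.getQueueInfo(queueName);
    }
}
```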

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG


[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: HADOOP-3930-10.patch

Attaching a patch with the following changes:

- Changed _JobQueueClient_ from public to default (package-level) access.
- Renamed the methods in _JobSubmissionProtocol_ to _getQueues()_, _getQueueInfo(queueName)_ and _getJobsFromQueue(queueName)_.
- Mirrored these methods in _JobClient_.
- Changed _JobQueueClient_ to query job queues through the public methods in _JobClient_.
- Added javadoc for all public classes and public methods introduced by the patch.
- Removed the map that stored _JobQueueInfo_ in _QueueManager_.
- Changed the type of SchedulingInfo in _JobQueueInfo_ to String.
- _QueueManager_ now constructs _JobQueueInfo_ on the fly when requested.


> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-10.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, HADOOP-3930-7.patch, HADOOP-3930-8.patch, HADOOP-3930-9.patch, mockup.JPG


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622440#action_12622440 ] 

Vivek Ratan commented on HADOOP-3930:
-------------------------------------

bq. Regarding the comparator, I made it that ...it would also be possible to return the whole job list and filter it afterwards - which one is easier?

I don't think a Comparator is the right abstraction here. There is a difference between filtering and reordering. A Comparator is probably needed for the latter, but not for filtering. The Scheduler imposes an ordering on the jobs. A caller may choose to see (filter) only some of those jobs, but the ordering is determined by the Scheduler. I think you need a method like: 
{code}
Collection<JobInProgress> getJobs(String queueName)
{code}

Users can filter this collection as they see fit. 
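
As a small illustration of filtering without reordering, here is a sketch in which the caller walks the scheduler's collection once and keeps matching jobs in the order they arrived. `QueuedJob` and `JobListFilter` are hypothetical stand-ins, and filtering by user is just one example criterion:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Stand-in for JobInProgress with only the fields this example needs.
class QueuedJob {
    final String id;
    final String user;
    QueuedJob(String id, String user) { this.id = id; this.user = user; }
}

class JobListFilter {
    /** Keeps only the given user's jobs, in the order the scheduler returned them. */
    static List<QueuedJob> byUser(Collection<QueuedJob> scheduledOrder, String user) {
        List<QueuedJob> result = new ArrayList<QueuedJob>();
        for (QueuedJob job : scheduledOrder) {
            if (job.user.equals(user)) {
                result.add(job);
            }
        }
        return result;
    }
}
```

No Comparator is involved: the relative order of the surviving jobs is exactly the one the scheduler chose.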


> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.
> * public List<String> getQueues();


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625294#action_12625294 ] 

Hemanth Yamijala commented on HADOOP-3930:
------------------------------------------

A couple of points if we are moving towards Owen's proposal:

- In Sreekanth's comment above, he mentions that the attached patch adds one column for each entry in the Map returned by the {{get...SchedulingInfo()}} APIs. The other option, of course, is to display all scheduling info in a single column. 
-- The advantage of the multi-column approach is purely usability and aesthetics (with a single column, schedulers that have per-queue scheduling info would show only a name and the scheduling info as a string, which would look quite odd). It would also make changes to the UI easier, IMHO.
-- The advantage of the single-column approach is simplicity for the current implementation.
- I personally prefer multi-column, but am willing to go with consensus.
- If we go with the multi-column approach, though, building scheduling information out of a {{toString}} API becomes harder.

- If we do go with Owen's approach, I think we might also need:
{code}
class TaskScheduler {
  Object getSchedulerInfo(String queueName);
}
{code}
to handle cases where the scheduler has queue-specific info.
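
To compare the two UI options side by side, here is a rough sketch with a hypothetical `SchedulingInfoRenderer`; the row format and the "-" placeholder are assumptions of this example. The multi-column variant needs structured key-value pairs, while the single-column variant only needs whatever `toString()` yields:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SchedulingInfoRenderer {
    /** Multi-column: one column per key, header row first; missing values show as "-". */
    static List<String> multiColumn(Map<String, Map<String, String>> infoByQueue) {
        // Collect every key any queue reports, preserving first-seen order.
        Set<String> columns = new LinkedHashSet<String>();
        for (Map<String, String> info : infoByQueue.values()) {
            columns.addAll(info.keySet());
        }
        List<String> rows = new ArrayList<String>();
        StringBuilder header = new StringBuilder("Queue");
        for (String column : columns) {
            header.append(" | ").append(column);
        }
        rows.add(header.toString());
        for (Map.Entry<String, Map<String, String>> entry : infoByQueue.entrySet()) {
            StringBuilder row = new StringBuilder(entry.getKey());
            for (String column : columns) {
                String value = entry.getValue().get(column);
                row.append(" | ").append(value == null ? "-" : value);
            }
            rows.add(row.toString());
        }
        return rows;
    }

    /** Single-column: the scheduler's info object is shown via toString(). */
    static String singleColumn(String queue, Object schedulerInfo) {
        return queue + " | " + String.valueOf(schedulerInfo);
    }
}
```

The sketch shows why a {{toString}}-based API fits the single-column layout but not the multi-column one: an opaque string cannot be split back into columns without extra parsing conventions.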

Please try to vote on your preferred UI approach, if any, so we can move this forward.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG


[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: HADOOP-3930-7.patch

Attaching a patch with the following modifications to the previous patch:
- Added a new class, _JobQueueClient_, which implements the command-line interface methods for JobQueue-related operations.
- Moved _percentageGraph_ from the namenode's JspHelper class to ServletUtil.
- Removed the dependency on _JspHelper_ conf from the mapred JSP pages.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, HADOOP-3930-7.patch, mockup.JPG


[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: mockup.JPG

UI Mockup with default JobQueueTaskScheduler

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG


[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-3930:
-------------------------------------

    Status: Open  (was: Patch Available)

Sreekanth selected Submit Patch by mistake instead of attaching the patch. Canceling it on his behalf, as he has not been added to the list of contributors yet.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623595#action_12623595 ] 

dhruba borthakur commented on HADOOP-3930:
------------------------------------------

The jobs page is actually useful to users who have submitted jobs. The scheduler information is typically important for cluster administrators. Does it make sense to show the scheduler information on a separate page rather than on the page that lists all users' jobs? Of course, power users might want to see scheduling information sometimes.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG


[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3930:
------------------------------------

    Release Note: Changed TaskScheduler to expose API for Web UI and Command Line Tool.  (was: Changes to TaskScheduler to expose API which Web UI and Command Line Tool can use)

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-10.patch, HADOOP-3930-11.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, HADOOP-3930-7.patch, HADOOP-3930-8.patch, HADOOP-3930-9.patch, mockup.JPG


[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: HADOOP-3930-3.patch

Attaching a new patch with the following changes:

Modified _CapacityTaskScheduler_ to use the new API for the web UI and command line interface.

Added a test case that exercises the new methods introduced in _JobSubmissionProtocol_.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, mockup.JPG


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622167#action_12622167 ] 

Hemanth Yamijala commented on HADOOP-3930:
------------------------------------------

We probably also need a way to get all the configured queues. Something like:

- public List<String> getQueues();

By default, this would return the single default queue that exists today in the jobtracker. Does that make sense?
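
A minimal sketch of that default behaviour, assuming a hypothetical base class. {{SketchScheduler}}, {{MultiQueueSketchScheduler}} and the queue name "default" are illustrative only and are not taken from any patch on this issue:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a scheduler base class; not the real TaskScheduler.
abstract class SketchScheduler {
  // Assumed name for the single queue the jobtracker has today.
  static final String DEFAULT_QUEUE = "default";

  // Schedulers with no notion of queues inherit this and report one queue.
  public List<String> getQueues() {
    return Collections.singletonList(DEFAULT_QUEUE);
  }
}

// A scheduler that is configured with several queues overrides the default.
class MultiQueueSketchScheduler extends SketchScheduler {
  private final List<String> queues;

  MultiQueueSketchScheduler(List<String> queues) {
    this.queues = Collections.unmodifiableList(queues);
  }

  @Override
  public List<String> getQueues() {
    return queues;
  }
}
```

With this shape, callers such as the web UI never need to special-case a scheduler that has no queues of its own.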

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626154#action_12626154 ] 

Hemanth Yamijala commented on HADOOP-3930:
------------------------------------------

One thing we haven't discussed so far is the changes to the framework needed to aid the CLI.

Showing scheduling information related to a job seems easy. We can augment {{JobStatus}} to contain a {{String schedulerInfo}}.

For showing the queue related information, one approach could be as follows:
{code}
public class QueueInfo implements Writable {
  String queueName;
  String schedulerInfo;
  ...
}

public interface JobSubmissionProtocol {
  ...
  QueueInfo[] getQueues();
  QueueInfo getQueueInfo(String queue);
  JobStatus[] getJobs(String queue);  
}
{code}

These APIs are similar to the job-related APIs, like {{getAllJobs(), getJobStatus(JobID), and getMap/ReduceTaskReports(JobID)}}. Still, I am a little worried about adding these to {{JobSubmissionProtocol}}, since {{getQueues() and getQueueInfo()}} don't relate to jobs directly. The alternative, though, seems to be to define a new protocol that has this info. Open to comments on which is better.
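
To make the wire format concrete, here is a rough sketch of how such a {{QueueInfo}} could serialize its two fields. It assumes a local {{SketchWritable}} interface with the same two methods as Hadoop's {{Writable}}, so that it compiles without Hadoop on the classpath; {{SketchQueueInfo}} and {{roundTrip}} are hypothetical names used only for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two-method shape as Hadoop's Writable.
interface SketchWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical QueueInfo carrying only the two fields named in the proposal.
class SketchQueueInfo implements SketchWritable {
  String queueName = "";
  String schedulerInfo = "";

  SketchQueueInfo() {}

  SketchQueueInfo(String queueName, String schedulerInfo) {
    this.queueName = queueName;
    this.schedulerInfo = schedulerInfo;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(queueName);
    out.writeUTF(schedulerInfo);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    queueName = in.readUTF();
    schedulerInfo = in.readUTF();
  }

  // Serializes to bytes and reads a fresh copy back, as an RPC layer would.
  static SketchQueueInfo roundTrip(SketchQueueInfo original) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      original.write(new DataOutputStream(bytes));
      SketchQueueInfo copy = new SketchQueueInfo();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Since the scheduler info travels as a plain string, the client side needs no knowledge of scheduler-specific types, whichever protocol ends up carrying it.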


> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG
>
>

[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631433#action_12631433 ] 

Hemanth Yamijala commented on HADOOP-3930:
------------------------------------------

bq. Left jobsToComplete as is.
I was thinking of something like:
{code}
private JobStatus[] getJobStatus(Collection<JobInProgress> jips, boolean onlyRunning) {
  // ..
  if (onlyRunning) {
    // consider only jobs which are running or prep.
  }
}
{code}

Would that work?

Regarding tests, taking a cue from APIs like {{getAllJobs}}, I think it is OK to provide wrapper APIs around the queue-info-related methods. These could be package private so that the test case can access them directly. So, something like:
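
Spelled out a little further, the idea might look like the sketch below. It is only an illustration under simplifying assumptions: {{Job}} and {{State}} are hypothetical stand-ins for {{JobInProgress}} and the job run states, and the helper returns job ids rather than real {{JobStatus}} objects:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class JobStatusSketch {
  // Simplified run states for the sketch.
  enum State { PREP, RUNNING, SUCCEEDED, FAILED }

  // Hypothetical stand-in for JobInProgress.
  static final class Job {
    final String id;
    final State state;

    Job(String id, State state) {
      this.id = id;
      this.state = state;
    }
  }

  // Single conversion helper shared by all callers; returns job ids
  // in place of JobStatus objects to keep the sketch small.
  static List<String> getJobStatus(Collection<Job> jobs, boolean onlyRunning) {
    List<String> result = new ArrayList<>();
    for (Job job : jobs) {
      boolean active = job.state == State.RUNNING || job.state == State.PREP;
      if (onlyRunning && !active) {
        continue; // skip finished jobs when only running/prep jobs are wanted
      }
      result.add(job.id);
    }
    return result;
  }

  // Both public entry points delegate to the same helper.
  static List<String> getAllJobs(Collection<Job> jobs) {
    return getJobStatus(jobs, false);
  }

  static List<String> jobsToComplete(Collection<Job> jobs) {
    return getJobStatus(jobs, true);
  }
}
```

A new field in the status would then need to be added in one place only.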
{code}
JobQueueInfo[] getJobQueueInfos() { return jobSubmitClient.getJobQueueInfos(); }
private void displayQueueList() {
  JobQueueInfo[] queues = getJobQueueInfos();
  //
}
{code}

Agree with the rest of your explanations.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, mockup.JPG
>
>

[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621865#action_12621865 ] 

Vivek Ratan commented on HADOOP-3930:
-------------------------------------

I think we need to first decide whether queues are explicit in this API or not. The problem with making queues explicit in the API is that every scheduler will have to support one, or at least a default one. But that's not so bad, IMO. 

getSchedulingInfo() should really return key-value pairs for queues, not for jobs. In the HADOOP-3445 scheduler, for example, we need to display scheduling information associated with a queue - its capacity (both 'guaranteed' and 'allocated'), how many unique users have submitted jobs, how many tasks are running, how many are waiting, etc. This information is per queue, and doesn't make sense per job. I'd much rather have getSchedulingInfo() take in a queue name as a parameter, if we make queues explicit. In fact, I don't see what kind of scheduling information you'd associate with a job. Matei, do you have examples of what getSchedulingInfo would return for jobs? 

Similarly, getJobComparator() makes more sense when applied to a queue. In 3445, jobs are ordered per queue, and there is no global ordering. Furthermore, doesn't it make more sense to get a sorted collection of jobs, per queue, back from the scheduler, rather than a Comparator? Or are you imagining the UI and CLI to maintain a list of jobs all the time and then apply the comparator periodically? 
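
To illustrate the alternative, here is a small sketch of a scheduler that hands back an already sorted list per queue and keeps its ordering private. Everything in it is assumed for illustration: {{PerQueueOrderingSketch}}, its {{Job}} type, and the ordering rule (higher priority first, then earlier start time) are not what 3445 actually does:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PerQueueOrderingSketch {
  // Hypothetical stand-in for JobInProgress.
  static final class Job {
    final String id;
    final int priority;
    final long startTime;

    Job(String id, int priority, long startTime) {
      this.id = id;
      this.priority = priority;
      this.startTime = startTime;
    }
  }

  // Assumed ordering: higher priority first, then earlier start time.
  private static final Comparator<Job> ORDER =
      Comparator.comparingInt((Job j) -> -j.priority)
          .thenComparingLong(j -> j.startTime);

  private final Map<String, List<Job>> jobsByQueue = new HashMap<>();

  void submit(String queueName, Job job) {
    jobsByQueue.computeIfAbsent(queueName, q -> new ArrayList<>()).add(job);
  }

  // Returns a sorted copy for one queue; there is no global ordering.
  List<Job> getJobs(String queueName) {
    List<Job> sorted = new ArrayList<>(
        jobsByQueue.getOrDefault(queueName, Collections.<Job>emptyList()));
    sorted.sort(ORDER);
    return sorted;
  }
}
```

The UI and CLI then just render the list they receive, and never need to hold jobs themselves or re-apply a comparator.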



> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI.
> * public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624292#action_12624292 ] 

Hemanth Yamijala commented on HADOOP-3930:
------------------------------------------

bq. Also, as we are discussing in HADOOP-3698, it may be that queues being acknowledged as part of the mapred system, might not be in the scheduler API. I think we should wait for a consensus on that before moving forward on this issue

I must clarify that I was talking only about the {{List<String> getQueues()}} API, and not the rest of the methods proposed here. Sorry for any confusion caused.

As mentioned [here|https://issues.apache.org/jira/browse/HADOOP-3698?focusedCommentId=12624270#action_12624270], we are now considering adding this to a new class called {{QueueManager}}.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG
>
>

[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: HADOOP-3930-6.patch

Refactored _getJobsToComplete_, _getAllJobs_ and _getAllJobs(queue)_ in JobTracker.

Introduced two new package-level methods for getting _JobQueueInfo_ from JobClient and used them in the test case.

Fixed the "No newline at end of file" message.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, mockup.JPG
>
>

[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: HADOOP-3930-4.patch

Updated the _JobQueueInfo_ class because of a NullPointerException being thrown for the default scheduler, pointed out by Hemanth.

Fixed a findbugs warning in _LimitTasksPerJobTaskScheduler_ regarding incorrect synchronization.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, mockup.JPG
>
>

[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624884#action_12624884 ] 

Owen O'Malley commented on HADOOP-3930:
---------------------------------------

After thinking about this for a bit, I think a more natural interface for getSchedulingInfo would be:

{code}
class JobInProgress {
  ...
  Object getSchedulerInfo() { ... }
  void setSchedulerInfo(Object info) { ... }
}
{code}

The scheduler can then add its own information directly into the JobInProgress. Clearly each scheduler would have its own type for scheduler info. The framework would use the scheduler info's toString() method to generate the string for the user. Thoughts?
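
A rough sketch of how that could fit together, under stated assumptions: {{Job}} stands in for {{JobInProgress}}, {{ShareInfo}} is an invented scheduler-specific type, and {{display}} is a hypothetical framework helper that falls back to "N/A" when no info has been set:

```java
class SchedulerInfoSketch {
  // Hypothetical stand-in for JobInProgress with the two proposed accessors.
  static final class Job {
    private Object schedulerInfo;

    Object getSchedulerInfo() {
      return schedulerInfo;
    }

    void setSchedulerInfo(Object info) {
      this.schedulerInfo = info;
    }
  }

  // Invented scheduler-specific type; each scheduler would define its own.
  static final class ShareInfo {
    final int runningTasks;
    final int share;

    ShareInfo(int runningTasks, int share) {
      this.runningTasks = runningTasks;
      this.share = share;
    }

    @Override
    public String toString() {
      return "running=" + runningTasks + ", share=" + share;
    }
  }

  // What the framework would do: it only ever sees an Object and a String.
  static String display(Job job) {
    Object info = job.getSchedulerInfo();
    return info == null ? "N/A" : info.toString();
  }
}
```

The framework stays independent of every scheduler's types, at the cost of the scheduler having to cast its own object back when it reads it.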

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG
>
>

[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631463#action_12631463 ] 

Doug Cutting commented on HADOOP-3930:
--------------------------------------

> we should make the APIs public and not package private.

That's the subject of HADOOP-3822.  Let's not introduce new public APIs in this issue.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, mockup.JPG
>
>

[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: HADOOP-3930-9.patch

Fixing the start time issue, which was missed out while refactoring the code.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, HADOOP-3930-7.patch, HADOOP-3930-8.patch, HADOOP-3930-9.patch, mockup.JPG
>
>

[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: HADOOP-3930-5.patch

Made modifications according to the comments. I have also given reasons why some things are left as they were in the previous version of the patch.

{quote}
JobTracker:
    - There's code being repeated in getAllJobs(), getAllJobs(String queue) and jobsToComplete. I think it should be factored out so changes to one of the methods (for e.g. to return a new field) need not be duplicated.
{quote} 
Code repetition for converting a collection of _JobInProgress_ to an array of _JobStatus_ has been removed. Modified getAllJobs and getAllJobs(Queue). Left jobsToComplete as is.

{quote}
JobQueueInfo:
    - schedulingInfo stored here is a stringified version. I think it should be declared a String and get/set should deal with strings. The caller should basically call with actualObject.toString(). This makes it similar to JobStatus.
{quote}
The reason we store an object and pass only a String over the wire is that the scheduling information is set only once. The underlying object is then updated by the respective TaskScheduler, and we call toString() when passing it over the wire. This way we avoid constantly updating the scheduling information in the queue manager. For an example, see _CapacityTaskScheduler_.
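
As an illustration of that pattern (not the code in the patch), the sketch below registers a mutable object once and stringifies it only when the value is read for the client. {{LiveQueueInfoSketch}}, {{QueueState}} and the "N/A" fallback are assumptions made for the example:

```java
import java.util.concurrent.atomic.AtomicInteger;

class LiveQueueInfoSketch {
  // Hypothetical mutable state owned and updated by a scheduler.
  static final class QueueState {
    private final AtomicInteger runningTasks = new AtomicInteger();

    void taskStarted() {
      runningTasks.incrementAndGet();
    }

    @Override
    public String toString() {
      return "running tasks: " + runningTasks.get();
    }
  }

  // Hypothetical queue info holder kept by the queue manager.
  static final class QueueInfo {
    private final String name;
    private Object schedulingInfo;

    QueueInfo(String name) {
      this.name = name;
    }

    String getName() {
      return name;
    }

    // Called once by the scheduler; the reference is kept, not a snapshot.
    void setSchedulingInfo(Object info) {
      this.schedulingInfo = info;
    }

    // Called when building the value sent to clients; stringifies lazily.
    String getSchedulingInfo() {
      return schedulingInfo == null ? "N/A" : schedulingInfo.toString();
    }
  }
}
```

Each read reflects the scheduler's latest state without the scheduler ever calling back into the queue manager.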

{quote}
JspUtil:
    - This is including JspHelper which is a class from the NameNode package. I don't think it is a good idea for a MapRed class to depend on this, however I understand this has always been this way. Maybe we should file a new JIRA to fix it.
{quote}
It is using JspHelper from that package to generate the percentage graph. Maybe that method should be moved into the ServletUtil class in the core util package.
{quote}
CapacityTaskScheduler:
- Does not need supportsPriority as a separate field in the SchedulingInfo class. You can pick it up from one of the QueueSchedulingInfo objects.
{quote}
Whether a queue supports priority is stored by the JobQueueManager in the capacity scheduler. The queue scheduling information object does not record whether a particular queue supports priority, which is why there is a separate field.
{quote}
TestJobQueueInformation:
    - I think you can use JobClient, instead of directly dealing with JobSubmissionProtocol and having to duplicate the methods for createRPCProxy etc.
{quote}

The reason I am not using JobClient directly is that calling it would invoke the display methods, and we would then have to parse the job client's output to test for equality. Moreover, all the newly defined display methods are private. If it is really required, I can make them public and change the test to parse the display string and check equality.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, mockup.JPG
>
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Collection<JobInProgress> getJobs(String queueName) -- returns the list of jobs in a given queue, sorted by a scheduler-specific order (the order it wants to run them in / schedule the next task in / etc).
> * public List<String> getQueues();


[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-3930:
----------------------------------

    Description: 
We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
* A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
* Some sorting order for jobs - maybe a method to sort a list of jobs.

Let's figure out what the best way to do this is and implement it in the existing schedulers.

My first-order proposal at an API: Augment the TaskScheduler with

* public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
* public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
* public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.

  was:
We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
* A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
* Some sorting order for jobs - maybe a method to sort a list of jobs.

Let's figure out what the best way to do this is and implement it in the existing schedulers.

My first-order proposal at an API: Augment the TaskScheduler with

* public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI.
* public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.


Added getSchedulingInfo(queue).
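
As a rough illustration of how a CLI might consume the per-queue map, here is a sketch. QueueInfoSource is a hypothetical stand-in for the scheduler hook, and the keys "capacity" and "running" are made up; only the getSchedulingInfo(String queue) shape comes from the proposal above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueueInfoColumnsSketch {
    /** Hypothetical scheduler hook mirroring getSchedulingInfo(String queue). */
    interface QueueInfoSource {
        Map<String, String> getSchedulingInfo(String queue);
    }

    /** Renders one row per queue and one tab-separated column per key. */
    static String render(List<String> queues, QueueInfoSource source) {
        StringBuilder out = new StringBuilder();
        for (String queue : queues) {
            out.append(queue);
            for (Map.Entry<String, String> e : source.getSchedulingInfo(queue).entrySet()) {
                out.append('\t').append(e.getKey()).append('=').append(e.getValue());
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        QueueInfoSource source = queue -> {
            // LinkedHashMap keeps insertion order, so columns come out in a stable order.
            Map<String, String> info = new LinkedHashMap<>();
            info.put("capacity", "50%");
            info.put("running", "3");
            return info;
        };
        System.out.print(render(Arrays.asList("default", "research"), source));
    }
}
```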

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631728#action_12631728 ] 

Hemanth Yamijala commented on HADOOP-3930:
------------------------------------------

One nit. Previously, the {{JobTracker}} set the start time in the {{JobStatus}} for all the jobs. This is missing from the refactored code, so the client shows the start time as 0.

Other than that, it looks good to me.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, HADOOP-3930-7.patch, HADOOP-3930-8.patch, mockup.JPG
>
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Collection<JobInProgress> getJobs(String queueName) -- returns the list of jobs in a given queue, sorted by a scheduler-specific order (the order it wants to run them in / schedule the next task in / etc).
> * public List<String> getQueues();


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633338#action_12633338 ] 

Hudson commented on HADOOP-3930:
--------------------------------

Integrated in Hadoop-trunk #611 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/611/])

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-10.patch, HADOOP-3930-11.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, HADOOP-3930-7.patch, HADOOP-3930-8.patch, HADOOP-3930-9.patch, mockup.JPG
>
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Collection<JobInProgress> getJobs(String queueName) -- returns the list of jobs in a given queue, sorted by a scheduler-specific order (the order it wants to run them in / schedule the next task in / etc).
> * public List<String> getQueues();


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622253#action_12622253 ] 

Matei Zaharia commented on HADOOP-3930:
---------------------------------------

Okay, I added that too.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.
> * public List<String> getQueues();


[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-3930:
----------------------------------

    Description: 
We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
* A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
* Some sorting order for jobs - maybe a method to sort a list of jobs.

Let's figure out what the best way to do this is and implement it in the existing schedulers.

My first-order proposal at an API: Augment the TaskScheduler with

* public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
* public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
* public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.
* public List<String> getQueues();

  was:
We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
* A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
* Some sorting order for jobs - maybe a method to sort a list of jobs.

Let's figure out what the best way to do this is and implement it in the existing schedulers.

My first-order proposal at an API: Augment the TaskScheduler with

* public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
* public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
* public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.


> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.
> * public List<String> getQueues();


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624262#action_12624262 ] 

Hemanth Yamijala commented on HADOOP-3930:
------------------------------------------

Regarding where the scheduler information should be - there are two types of scheduler information:
- Job specific - which should be with the jobs
- System wide - which can be either with the jobs or on a different page, as Dhruba points out. I am fine with either, but I am leaning more towards having it on the jobs page because:
-- Then all scheduler information is on one page
-- As Dhruba agrees, power users might still want to see scheduling information.

Also, as we are discussing in HADOOP-3698, queues may end up being acknowledged as part of the mapred system, in which case they might not be in the scheduler API. I think we should wait for a consensus on that before moving forward on this issue.



> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG
>
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Collection<JobInProgress> getJobs(String queueName) -- returns the list of jobs in a given queue, sorted by a scheduler-specific order (the order it wants to run them in / schedule the next task in / etc).
> * public List<String> getQueues();


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631458#action_12631458 ] 

Hemanth Yamijala commented on HADOOP-3930:
------------------------------------------

I looked at the other changes. They seem fine to me, except that jobqueue_details.jsp ends with the following stray line: {{Hadoop, 2008. \ No newline at end of file}}




> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, mockup.JPG
>
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Collection<JobInProgress> getJobs(String queueName) -- returns the list of jobs in a given queue, sorted by a scheduler-specific order (the order it wants to run them in / schedule the next task in / etc).
> * public List<String> getQueues();


[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-3930:
-------------------------------------

    Fix Version/s: 0.19.0

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, mockup.JPG
>
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Collection<JobInProgress> getJobs(String queueName) -- returns the list of jobs in a given queue, sorted by a scheduler-specific order (the order it wants to run them in / schedule the next task in / etc).
> * public List<String> getQueues();


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624895#action_12624895 ] 

Matei Zaharia commented on HADOOP-3930:
---------------------------------------

That's a good idea, and should make the schedulers more efficient as well.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG
>
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Collection<JobInProgress> getJobs(String queueName) -- returns the list of jobs in a given queue, sorted by a scheduler-specific order (the order it wants to run them in / schedule the next task in / etc).
> * public List<String> getQueues();


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624818#action_12624818 ] 

Vinod Kumar Vavilapalli commented on HADOOP-3930:
-------------------------------------------------

Some comments about the patch (independent of where the scheduler information is displayed):
- _If_ job-specific information turns out not to be common across schedulers, we can give {{getSchedulingInfo(JobInProgress job)}} a default implementation in TaskScheduler that returns null.
- Not every scheduler will care about the order in which clients of the scheduling information display it. Either {{getQueueSchedulingParameterList()}} and {{getJobSchedulingParameterList()}} can have default implementations that return the keys of the corresponding SchedulingInfo maps, or we can remove these methods altogether and treat scheduler information the way we treat the job list: let the scheduler give out information in the order that it imposes.
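
A sketch of what such defaults could look like follows. SchedulerBase is a hypothetical stand-in for TaskScheduler, the job is identified by a plain String instead of JobInProgress to keep the example self-contained, and passing the queue name to getQueueSchedulingParameterList is an assumption of this sketch, not the signature in the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DefaultSchedulingInfoSketch {
    /** Hypothetical base class standing in for TaskScheduler. */
    static abstract class SchedulerBase {
        /** Default: the scheduler has no job-specific information to show. */
        Map<String, String> getSchedulingInfo(String jobId) {
            return null;
        }

        /** Default: no queue-level information either. */
        Map<String, String> getQueueSchedulingInfo(String queue) {
            return Collections.emptyMap();
        }

        /** Default column order: the key order of the info map itself. */
        List<String> getQueueSchedulingParameterList(String queue) {
            return new ArrayList<>(getQueueSchedulingInfo(queue).keySet());
        }
    }

    /** A scheduler that only overrides what it cares about. */
    static class CapacityLikeScheduler extends SchedulerBase {
        @Override
        Map<String, String> getQueueSchedulingInfo(String queue) {
            Map<String, String> info = new LinkedHashMap<>(); // preserves insertion order
            info.put("capacity", "50%");
            info.put("running", "3");
            return info;
        }
    }

    public static void main(String[] args) {
        SchedulerBase scheduler = new CapacityLikeScheduler();
        System.out.println(scheduler.getQueueSchedulingParameterList("default"));
        System.out.println(scheduler.getSchedulingInfo("job_0001"));
    }
}
```

With defaults like these, callers of the job-level method must handle null, which is one argument for the second option of dropping the ordering methods and letting the scheduler impose the order.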

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG
>
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Collection<JobInProgress> getJobs(String queueName) -- returns the list of jobs in a given queue, sorted by a scheduler-specific order (the order it wants to run them in / schedule the next task in / etc).
> * public List<String> getQueues();


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626042#action_12626042 ] 

Hemanth Yamijala commented on HADOOP-3930:
------------------------------------------

I had an offline conversation with Owen and we came to the following proposal:

- While displaying one column per queue / job scheduling attribute would make the UI more usable, in the interest of simplicity we are proposing to display the information as a single string in a single column.
- This information would be available via a {{toString()}} API on the SchedulerInfo object proposed by Owen above.
-- One of the most important reasons for doing it this way is that the information also needs to be consumed by the CLI, so it has to be transferred over the wire. Passing something like a map is going to be tricky for the framework.
-- Also, as seen from the discussions above, additional APIs to determine the column order etc. become unnecessary if we assume the scheduler will format the string in the scheduler info as it pleases. This makes the API simpler.
- Regarding getting the scheduler info per queue, Owen proposed adding this to the {{QueueManager}} class being discussed in HADOOP-3698. Something like:
{code}
class QueueManager {
  public void setSchedulingInfo(String queue, Object queueInfo);
  public Object getSchedulingInfo(String queue);
}
{code}
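
A minimal sketch of how the single-string proposal could fit together, assuming the queue manager simply stores an opaque object per queue and the UI/CLI only ever calls {{toString()}} on it. The class names {{QueueManagerSketch}} and {{CapacityInfoSketch}}, the {{displayString}} helper and the "N/A" placeholder are hypothetical and not part of any patch on this issue:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the QueueManager discussed in HADOOP-3698;
// only the two proposed methods plus a display helper are modeled.
// Not thread-safe; a real implementation would need synchronization.
class QueueManagerSketch {
  // Queue name -> scheduler-specific object. The framework never inspects it.
  private final Map<String, Object> schedulingInfo = new HashMap<>();

  public void setSchedulingInfo(String queue, Object queueInfo) {
    schedulingInfo.put(queue, queueInfo);
  }

  public Object getSchedulingInfo(String queue) {
    return schedulingInfo.get(queue);
  }

  // Hypothetical helper: the text shown in the single column of the
  // web UI or CLI. Formatting is left entirely to the scheduler's toString().
  public String displayString(String queue) {
    Object info = schedulingInfo.get(queue);
    return info == null ? "N/A" : info.toString();
  }
}

// Hypothetical scheduler-side object; the fields are illustrative only.
class CapacityInfoSketch {
  private final int guaranteedSlots;
  private final int runningTasks;

  CapacityInfoSketch(int guaranteedSlots, int runningTasks) {
    this.guaranteedSlots = guaranteedSlots;
    this.runningTasks = runningTasks;
  }

  @Override
  public String toString() {
    return "guaranteed=" + guaranteedSlots + ", running=" + runningTasks;
  }
}
```

Because only a string ever leaves the scheduler, nothing scheduler-specific has to be serialized for the CLI, which is the main argument made above.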



> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG

[jira] Issue Comment Edited: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623575#action_12623575 ] 

sreekanth edited comment on HADOOP-3930 at 8/18/08 11:42 PM:
--------------------------------------------------------------------------

Attaching a patch that adds APIs to the TaskScheduler to expose its scheduling information.

Added the following methods to TaskScheduler:

Map<String,String> getSchedulingInfo(JobInProgress job) - Returns map containing scheduling information related to a particular job
Map<String,String> getSchedulingInfo(String queueName ) - Returns map containing scheduling information related to a particular queue.
Collection<JobInProgress> getJobs(String queue) - Returns a list of jobs for particular queue
List<String> getQueues()  - Returns all the queues which scheduler uses.

List<String> getQueueSchedulingParameterList() - Returns ordered List of the scheduling parameters related to queues.
List<String> getJobSchedulingParameterList() - Returns ordered List of the scheduling parameters related to a particular Job

The above two methods were introduced to determine the order in which the columns of a table are generated by the web UI.

A new method was introduced in JobTracker:
TaskScheduler getTaskScheduler() - Returns the instance of task scheduler which is used by JobTracker.

JobQueueTaskScheduler and LimitTasksPerJobTaskScheduler have been modified to implement the new APIs and expose scheduling information.

Changes made in jobtracker.jsp:

Created a new section called Scheduler information that builds a table dynamically to display the scheduler information for the queues the scheduler holds. The order of the columns is determined by the value returned from getQueueSchedulingParameterList().
Created sections in the Job Table generation to display the scheduling information for a particular job. The order of the columns is determined by the value returned from getJobSchedulingParameterList().

If a scheduler returns null from getQueueSchedulingParameterList(), the Scheduler information section is not displayed in jobtracker.jsp.
If a scheduler returns null from getSchedulingInfo(JobInProgress job), no new section is added to the Job Table.


Any thoughts on improving the above approach?
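
To illustrate why the ordered parameter lists are needed alongside the maps, here is a rough sketch of how a page could turn a scheduling-info map into a table row. It assumes the map's iteration order is unspecified, so the column order comes only from the list. {{SchedulingTableSketch}}, its method names, the " | " separator and the sample keys are hypothetical and are not taken from the attached patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the patch: renders scheduling info
// using the column order supplied by the scheduler.
class SchedulingTableSketch {

  // Builds one row: values are emitted in the order of the parameter
  // list, never in map iteration order. Missing keys become empty cells.
  static List<String> row(List<String> orderedParams, Map<String, String> info) {
    List<String> cells = new ArrayList<>();
    for (String param : orderedParams) {
      String value = info.get(param);
      cells.add(value == null ? "" : value);
    }
    return cells;
  }

  // Returns a header line and a value line, or null when the scheduler
  // exposes nothing, mirroring the "section is not displayed" rule.
  static String render(List<String> orderedParams, Map<String, String> info) {
    if (orderedParams == null || info == null) {
      return null;
    }
    return String.join(" | ", orderedParams) + "\n"
        + String.join(" | ", row(orderedParams, info));
  }
}
```

For example, with the order {{[fairShare, weight]}} and a map holding {{weight=2.0}} and {{fairShare=5}}, the value line would read {{5 | 2.0}} regardless of how the map happens to iterate.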

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG

[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-3930:
----------------------------------

    Description: 
We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
* A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
* Some sorting order for jobs - maybe a method to sort a list of jobs.

Let's figure out what the best way to do this is and implement it in the existing schedulers.

My first-order proposal at an API: Augment the TaskScheduler with

* public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
* public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
* public Collection<JobInProgress> getJobs(String queueName) -- returns the list of jobs in a given queue, sorted by a scheduler-specific order (the order it wants to run them in / schedule the next task in / etc).
* public List<String> getQueues();

  was:
We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
* A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
* Some sorting order for jobs - maybe a method to sort a list of jobs.

Let's figure out what the best way to do this is and implement it in the existing schedulers.

My first-order proposal at an API: Augment the TaskScheduler with

* public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
* public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
* public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.
* public List<String> getQueues();


> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>            Priority: Minor

[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621877#action_12621877 ] 

Matei Zaharia commented on HADOOP-3930:
---------------------------------------

Making queues explicit makes sense for the purposes of getSchedulingInfo then. As for what it should do when applied to a job, in the fair scheduler at least we can have it show the job's fair share of map slots / reduce slots and its weight in the fair sharing calculations. This was useful both for debugging and for letting administrators understand the effects of putting jobs in a particular pool, changing their priority, etc.

Regarding the comparator, I chose that because Owen/Sameer/Arun also wanted to be able to compare a subset of the jobs, for example to filter jobs by user or something of that sort. With a comparator, you choose your subset as you wish and then sort it. (In all this I'm assuming that the JobTracker or JobQueueManager knows the full list of jobs and can therefore filter it.) However, it would also be possible to return the whole job list and filter it afterwards. Which one is easier?
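
The comparator option can be sketched roughly as follows, assuming the caller already holds the full job list. {{JobSketch}} is a hypothetical stand-in for JobInProgress, and the ordering shown (higher priority first, then earlier start time) is only an assumed example of a scheduler-specific order:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical minimal job record; the real JobInProgress is far richer.
class JobSketch {
  final String id;
  final String user;
  final int priority;
  final long startTime;

  JobSketch(String id, String user, int priority, long startTime) {
    this.id = id;
    this.user = user;
    this.priority = priority;
    this.startTime = startTime;
  }
}

class JobOrderingSketch {
  // Stand-in for the proposed TaskScheduler.getJobComparator().
  // Assumed order: higher priority first, ties broken by earlier start.
  static Comparator<JobSketch> getJobComparator() {
    return Comparator.comparingInt((JobSketch j) -> j.priority)
        .reversed()
        .thenComparingLong(j -> j.startTime);
  }

  // The caller picks any subset it likes (here: one user's jobs) and
  // then sorts that subset with the scheduler's comparator.
  static List<JobSketch> jobsForUser(List<JobSketch> all, String user) {
    List<JobSketch> subset = new ArrayList<>();
    for (JobSketch job : all) {
      if (job.user.equals(user)) {
        subset.add(job);
      }
    }
    subset.sort(getJobComparator());
    return subset;
  }
}
```

The alternative raised above, returning an already sorted list and filtering afterwards, would drop the comparator and keep only the filtering loop, since filtering preserves the relative order of the remaining jobs.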

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI.
> * public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631721#action_12631721 ] 

Sreekanth Ramakrishnan commented on HADOOP-3930:
------------------------------------------------

In addition to the above changes, the following has also been modified according to Hemanth's comments:

- Modified _bin/hadoop_ usage to display information about the newly introduced option (./hadoop queue).


> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, HADOOP-3930-7.patch, HADOOP-3930-8.patch, mockup.JPG

[jira] Assigned: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala reassigned HADOOP-3930:
----------------------------------------

    Assignee: Sreekanth Ramakrishnan

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG

[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: HADOOP-3930-8.patch

Incorporating comments from Hemanth:

- Add javadoc for _JobQueueClient_
- Refactor common code to a new class rather than using static methods in _JobClient_
- Apache license header for _JobQueueClient_

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, HADOOP-3930-7.patch, HADOOP-3930-8.patch, mockup.JPG

[jira] Resolved: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-3930.
-----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

I just committed this. Thanks, Sreekanth.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-10.patch, HADOOP-3930-11.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, HADOOP-3930-7.patch, HADOOP-3930-8.patch, HADOOP-3930-9.patch, mockup.JPG

[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631876#action_12631876 ] 

Owen O'Malley commented on HADOOP-3930:
---------------------------------------

This is getting close, but I have a few suggestions.

When I asked you to split the queue queries out of JobClient, I didn't think about the API. I think the API is better off in JobClient, with JobQueueClient containing only the main() that supports the CLI commands. JobQueueClient shouldn't be a public class, because otherwise it ends up in the public API. So API access is still through JobClient, and JobQueueClient implements Tool, etc.

Let's rename JobSubmissionProtocol.getJobQueueInfos to getQueues(). getJobQueueInfo should be getQueueInfo(queueName).

The methods in JobQueueClient should be public, moved to JobClient, and renamed:
{quote}
getAllQueueSchedulingInfo -> JobClient.getQueues()
getAllJobs -> JobClient.getJobsFromQueue(queueName)
getQueueSchedulingInfo -> JobClient.getQueueInfo(queueName)
{quote}

mapred.JSPUtil should *not* be public.

Several of the new public API classes and methods are missing javadoc.

JobQueueInfo.schedulerInfo should be a string, rather than an object. Since the serialization forces it to be a string, it should just be typed/stored that way. The QueueManager should probably have a map like:
{code}
  Map<String, Object> schedulerInfo; // map from queue name to scheduler specific object
{code}
and just create the JobQueueInfo when the JobSubmissionProtocol methods are called. The constructor should take the two strings; don't bother with setSchedulerInfo.
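
One way to read this suggestion, as a sketch only: the server side keeps the scheduler-specific objects in a map and converts them to strings at the moment a protocol method is called. {{JobQueueInfoSketch}} and {{QueueInfoSourceSketch}} are hypothetical simplifications, and the empty-string fallback for a missing scheduler object is an assumption:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simplification of JobQueueInfo: two strings, set once in
// the constructor, with no setter for the scheduler info.
class JobQueueInfoSketch {
  private final String queueName;
  private final String schedulingInfo;

  JobQueueInfoSketch(String queueName, String schedulingInfo) {
    this.queueName = queueName;
    this.schedulingInfo = schedulingInfo;
  }

  String getQueueName() { return queueName; }
  String getSchedulingInfo() { return schedulingInfo; }
}

// Hypothetical stand-in for the part of QueueManager that holds the map.
class QueueInfoSourceSketch {
  // Map from queue name to scheduler-specific object.
  private final Map<String, Object> schedulerInfo = new TreeMap<>();

  void setSchedulerInfo(String queueName, Object info) {
    schedulerInfo.put(queueName, info);
  }

  // Mirrors getQueueInfo(queueName): the string form is produced here,
  // at call time, so it reflects the object's current state.
  JobQueueInfoSketch getQueueInfo(String queueName) {
    if (!schedulerInfo.containsKey(queueName)) {
      return null; // unknown queue
    }
    Object info = schedulerInfo.get(queueName);
    return new JobQueueInfoSketch(queueName, info == null ? "" : info.toString());
  }

  // Mirrors getQueues(): one freshly built info object per known queue,
  // in sorted queue-name order because the backing map is a TreeMap.
  List<JobQueueInfoSketch> getQueues() {
    List<JobQueueInfoSketch> result = new ArrayList<>();
    for (String queueName : schedulerInfo.keySet()) {
      result.add(getQueueInfo(queueName));
    }
    return result;
  }
}
```

Typing the field as a string matches what the wire format forces anyway, while the scheduler stays free to keep a richer object on the server side.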

I'm not very happy with ClientUtil. It seems like a weak abstraction. Is it really necessary, especially if you fold back into JobClient?




> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, HADOOP-3930-7.patch, HADOOP-3930-8.patch, HADOOP-3930-9.patch, mockup.JPG

[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632263#action_12632263 ] 

Sreekanth Ramakrishnan commented on HADOOP-3930:
------------------------------------------------

I am pasting inline the output of _ant test-patch_ on my local machine.
{noformat}
     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.

{noformat}

_ant test-core_ and _ant test-patch_ did not generate any build failures on today's trunk on my local machine.


> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-10.patch, HADOOP-3930-11.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, HADOOP-3930-7.patch, HADOOP-3930-8.patch, HADOOP-3930-9.patch, mockup.JPG
>
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Collection<JobInProgress> getJobs(String queueName) -- returns the list of jobs in a given queue, sorted by a scheduler-specific order (the order it wants to run them in / schedule the next task in / etc).
> * public List<String> getQueues();


[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: HADOOP-3930-2.patch

Attaching a patch with the following changes, made in response to Owen's and Hemanth's comments:
Added the following methods to _JobSubmissionProtocol_:

{code}
public JobQueueInfo getJobQueueInfo(String queue);
public JobQueueInfo[] getJobQueueInfos();
public JobStatus[] getAllJobs(String queue);
{code}

Added a new method to _TaskScheduler_:

{code}
public abstract Collection<JobInProgress> getJobs(String queueName);
{code}

Added a new class, _JobQueueInfo_, to encapsulate the scheduling information related to job queues.
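
For readers following along, here is a minimal sketch of what a class like this could look like. It is not the code from the patch: the name JobQueueInfoSketch, its two fields, and the writeUTF/readUTF serialization are assumptions made for illustration, and it uses only java.io rather than Hadoop's Writable interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the JobQueueInfo class mentioned above.
class JobQueueInfoSketch {
  private String queueName = "";
  private String schedulingInfo = "";

  JobQueueInfoSketch() {}

  JobQueueInfoSketch(String queueName, String schedulingInfo) {
    this.queueName = queueName; // assumed non-null in this sketch
    setSchedulingInfo(schedulingInfo);
  }

  String getQueueName() { return queueName; }

  String getSchedulingInfo() { return schedulingInfo; }

  // A null value is normalized to "" so that writeUTF never receives null.
  void setSchedulingInfo(String info) {
    this.schedulingInfo = (info == null) ? "" : info;
  }

  // Serializes both fields in a fixed order.
  void write(DataOutput out) throws IOException {
    out.writeUTF(queueName);
    out.writeUTF(schedulingInfo);
  }

  // Reads the fields back in the same order they were written.
  void readFields(DataInput in) throws IOException {
    queueName = in.readUTF();
    schedulingInfo = in.readUTF();
  }

  // Helper for the illustration: serialize to bytes, then deserialize a copy.
  static JobQueueInfoSketch roundTrip(JobQueueInfoSketch info) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      info.write(new DataOutputStream(bytes));
      JobQueueInfoSketch copy = new JobQueueInfoSketch();
      copy.readFields(
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Storing the scheduling information as a plain string keeps the wire format independent of whatever object a particular scheduler uses internally.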

Added a new JSP page, _jobqueue_details.jsp_, to display queue details and the list of jobs held by the queue, along with the queue scheduling information.

Refactored job table generation into a new class, _org.apache.hadoop.mapred.JSPUtil_.

Added new command-line options to _JobClient.java_.

The patch currently has no test cases attached; I will attach them soon.


> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, mockup.JPG

[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-3930:
----------------------------------

    Description: 
We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
* A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
* Some sorting order for jobs - maybe a method to sort a list of jobs.

Let's figure out what the best way to do this is and implement it in the existing schedulers.

My first-order proposal at an API: Augment the TaskScheduler with

* public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI.
* public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.

  was:
We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
* A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
* Some sorting order for jobs - maybe a method to sort a list of jobs.

Let's figure out what the best way to do this is and implement it in the existing schedulers.


> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI.
> * public Comparator<JobInProgress> getJobComparator() -- returns a comparator that can be used to determine the order in which jobs will be run, for sorting the jobs in the CLI.


[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631340#action_12631340 ] 

Hemanth Yamijala commented on HADOOP-3930:
------------------------------------------

JobTracker:
- getAllJobs: if the scheduler returns null, it should return an empty JobStatus array.
- There's code being repeated in getAllJobs(), getAllJobs(String queue) and jobsToComplete. I think it should be factored out so that a change to one of the methods (e.g. to return a new field) need not be duplicated.

JobQueueInfo:
- schedulingInfo stored here is a stringified version. I think it should be declared a String and get/set should deal with strings. The caller should basically call with actualObject.toString(). This makes it similar to JobStatus.
- In JobStatus, we are using Text.readString whereas in JobQueueInfo, we are using readUTF. I think in similar cases elsewhere we use the UTF versions. Similar comments for the write APIs.

JSPUtil:
- This is including JspHelper which is a class from the NameNode package. I don't think it is a good idea for a MapRed class to depend on this, however I understand this has always been this way. Maybe we should file a new JIRA to fix it.

JobSubmissionProtocol:
- Include HADOOP JIRA number in the comment related to version field.

JobClient:
- Usage prints: [-queueinfo <job-queue-name> [-showJobs] - this is missing a closing ']'
- Return code should be set to 0 when the command syntax is found to be correct.
- Since scheduler information is set to empty, it can never be null. I think in any case, it should print something like: 
{code}
Queue Name: default
Scheduling Information: N/A
{code}
- The line "Job List for the queue ::" needs a newline. Also, I think it can just read "Job list:"

jobqueue_details.jsp:
- Needs a backlink to the main jobtracker page
- Needs a link to Hadoop web page - like in other pages.

jobtracker.jsp:
- The scheduling info column is not being split into rows. The generated HTML looks fine, but the rows still do not show up. Can you please check?

CapacityTaskScheduler:
- Does not need supportsPriority as a separate field in the SchedulingInfo class. You can pick it up from one of the QueueSchedulingInfo objects.
- guaranteedCapacity actual must be split between reduce and map slots. Currently, only the value for the map is being displayed.
- Number of reclaimed resources is an internal variable and does not need to be displayed.
- Rename getQSI to getQueueSchedulingInfo

TestJobQueueInformation:
- I think you can use JobClient, instead of directly dealing with JobSubmissionProtocol and having to duplicate the methods for createRPCProxy etc.
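
The first two JobTracker points above could be addressed together with one shared helper. The following is only a sketch under stated assumptions: JobStatusSketch, JobInProgressSketch and JobStatusHelper are hypothetical stand-ins, not the real Hadoop classes, and the real conversion in JobTracker may carry more fields.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for JobStatus.
class JobStatusSketch {
  final String jobId;

  JobStatusSketch(String jobId) {
    this.jobId = jobId;
  }
}

// Hypothetical stand-in for JobInProgress.
class JobInProgressSketch {
  final String jobId;

  JobInProgressSketch(String jobId) {
    this.jobId = jobId;
  }

  JobStatusSketch getStatus() {
    return new JobStatusSketch(jobId);
  }
}

class JobStatusHelper {
  // Single conversion routine that each job-listing method could call,
  // so a new status field only has to be handled in one place.
  static JobStatusSketch[] toStatusArray(Collection<JobInProgressSketch> jobs) {
    if (jobs == null) {
      // A scheduler that returns null yields an empty array, never null.
      return new JobStatusSketch[0];
    }
    List<JobStatusSketch> statuses = new ArrayList<>(jobs.size());
    for (JobInProgressSketch job : jobs) {
      statuses.add(job.getStatus());
    }
    return statuses.toArray(new JobStatusSketch[0]);
  }
}
```

With a helper like this, callers can iterate over the result without a null check.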


> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, mockup.JPG

[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624894#action_12624894 ] 

Owen O'Malley commented on HADOOP-3930:
---------------------------------------

The framework would also need to handle null values in the scheduler info. Also note that this will replace the map in all of the schedulers that looks like:

{code}
Map<JobInProgress, JobInfo> infos = ...
{code}

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Minor
>         Attachments: 3930-1.patch, mockup.JPG

[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Affects Version/s: 0.17.2
         Release Note: Changes to TaskScheduler to expose API which Web UI and Command Line Tool can use
               Status: Patch Available  (was: Open)

Attaching a patch that adds APIs to TaskScheduler to expose its scheduling information.

Added the following methods to TaskScheduler:

Map<String,String> getSchedulingInfo(JobInProgress job) - Returns a map containing scheduling information for a particular job.
Map<String,String> getSchedulingInfo(String queueName) - Returns a map containing scheduling information for a particular queue.
Collection<JobInProgress> getJobs(String queue) - Returns the list of jobs for a particular queue.
List<String> getQueues() - Returns all the queues that the scheduler uses.

List<String> getQueueSchedulingParameterList() - Returns an ordered list of the scheduling parameters related to queues.
List<String> getJobSchedulingParameterList() - Returns an ordered list of the scheduling parameters related to a particular job.

The last two methods were introduced to determine the order in which the web UI generates the table columns.

A new method was introduced in JobTracker:
TaskScheduler getTaskScheduler() - Returns the task scheduler instance used by the JobTracker.

JobQueueTaskScheduler and LimitTasksPerJobTaskScheduler have been modified to implement the new APIs and expose scheduling information.

Made the following changes in jobtracker.jsp:

Created a new section called Scheduler information, which builds a table dynamically to display the scheduler information for the queues that the scheduler holds. The column order is determined by the value returned from getQueueSchedulingParameterList().
Created sections in the job table generation to display scheduling information for each job. The column order is determined by the value returned from getJobSchedulingParameterList().

If a scheduler returns null from getQueueSchedulingParameterList(), the Scheduler information section is not displayed in jobtracker.jsp.
If a scheduler returns null from getSchedulingInfo(JobInProgress job), no new section is added to the job table.
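
To illustrate how an ordered parameter list can drive the column layout, here is a minimal sketch. It is not the JSP code from the patch: SchedulingTableSketch and buildRow are hypothetical names, and the "N/A" placeholder for a missing key is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper showing column ordering; not part of the patch.
class SchedulingTableSketch {
  // Returns the cell values for one table row, ordered by columnOrder.
  // An empty list means "render no scheduling columns at all".
  static List<String> buildRow(List<String> columnOrder, Map<String, String> info) {
    List<String> row = new ArrayList<>();
    if (columnOrder == null || info == null) {
      // The scheduler exposes nothing, so the section is skipped.
      return row;
    }
    for (String key : columnOrder) {
      String value = info.get(key);
      // Assumption: a missing or null value is shown as "N/A".
      row.add(value == null ? "N/A" : value);
    }
    return row;
  }
}
```

Because the row is built by walking the ordered list rather than the map, the columns come out in the same order for every row regardless of the map implementation.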

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.17.2
>            Reporter: Matei Zaharia
>            Priority: Minor

[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631449#action_12631449 ] 

Hemanth Yamijala commented on HADOOP-3930:
------------------------------------------

Actually going over some of the comments above, I see this comment from Owen:

bq. JobSubmissionProtocol is not public and therefore the JobClient needs identical methods.

So, this agrees with what I've proposed above. In fact, we should make the APIs public and not package private.
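
The shape being described is plain delegation: a public client class that mirrors the methods of a non-public protocol. A minimal sketch follows, with the caveat that QueueProtocolSketch, QueueClientSketch and getQueueNames are hypothetical names chosen for illustration and are not the signatures in the patch.

```java
// Hypothetical stand-in for the non-public RPC interface
// (the role JobSubmissionProtocol plays in this discussion).
interface QueueProtocolSketch {
  String[] getQueueNames();
}

// Hypothetical stand-in for JobClient: its public methods mirror
// the protocol and delegate to it.
class QueueClientSketch {
  private final QueueProtocolSketch protocol;

  QueueClientSketch(QueueProtocolSketch protocol) {
    this.protocol = protocol;
  }

  // Public wrapper, so callers never need the protocol type itself.
  public String[] getQueueNames() {
    String[] names = protocol.getQueueNames();
    return (names == null) ? new String[0] : names;
  }
}
```

Tests written against the client class then exercise the same path that end users take.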

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, mockup.JPG

[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-3930:
----------------------------------

          Component/s: mapred
             Priority: Major  (was: Minor)
    Affects Version/s:     (was: 0.17.2)
                       0.19.0

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, mockup.JPG

[jira] Updated: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-3930:
-------------------------------------------

    Attachment: HADOOP-3930-11.patch

Added the Apache license header to _TestJobQueueInformation_ and _JSPUtil_, per Hemanth's comment.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.0
>
>         Attachments: 3930-1.patch, HADOOP-3930-10.patch, HADOOP-3930-11.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, HADOOP-3930-5.patch, HADOOP-3930-6.patch, HADOOP-3930-7.patch, HADOOP-3930-8.patch, HADOOP-3930-9.patch, mockup.JPG

[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622443#action_12622443 ] 

Matei Zaharia commented on HADOOP-3930:
---------------------------------------

Makes sense, I've changed it to that.

> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>            Priority: Minor