Posted to common-dev@hadoop.apache.org by "Hemanth Yamijala (JIRA)" <ji...@apache.org> on 2008/09/16 13:18:45 UTC
[jira] Commented: (HADOOP-3930) Decide how to integrate scheduler info into CLI and job tracker web page

    [ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631340#action_12631340 ] 

Hemanth Yamijala commented on HADOOP-3930:
------------------------------------------

JobTracker:
- getAllJobs: if the scheduler returns null, it should return an empty JobStatus array.
- The same code is repeated in getAllJobs(), getAllJobs(String queue) and jobsToComplete. I think it should be factored out so that a change to one of the methods (e.g. returning a new field) need not be duplicated.
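
A rough sketch of the kind of shared helper I have in mind for the two JobTracker points above. It is not taken from the patch: JobStatusSketch, JobInProgressSketch, JobListingSketch and toStatusArray are hypothetical stand-ins so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for JobStatus; the real class carries many more fields.
class JobStatusSketch {
    final String jobId;
    final String schedulingInfo;

    JobStatusSketch(String jobId, String schedulingInfo) {
        this.jobId = jobId;
        this.schedulingInfo = schedulingInfo;
    }
}

// Hypothetical stand-in for JobInProgress.
class JobInProgressSketch {
    private final String jobId;
    private final String schedulingInfo;

    JobInProgressSketch(String jobId, String schedulingInfo) {
        this.jobId = jobId;
        this.schedulingInfo = schedulingInfo;
    }

    JobStatusSketch getStatus() {
        return new JobStatusSketch(jobId, schedulingInfo);
    }
}

class JobListingSketch {
    // Single place where a collection of jobs becomes a status array.
    // A null collection (e.g. a scheduler that returns null) maps to an
    // empty array, so callers never have to null-check the result.
    static JobStatusSketch[] toStatusArray(Collection<JobInProgressSketch> jobs) {
        if (jobs == null) {
            return new JobStatusSketch[0];
        }
        List<JobStatusSketch> statuses = new ArrayList<JobStatusSketch>();
        for (JobInProgressSketch job : jobs) {
            statuses.add(job.getStatus());
        }
        return statuses.toArray(new JobStatusSketch[0]);
    }
}
```

With something like this, each of the listing methods would only decide which collection of jobs to pass in, and a new field would be added in one place.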

JobQueueInfo:
- The schedulingInfo stored here is a stringified version. I think it should be declared as a String, and get/set should deal with strings. The caller should simply pass actualObject.toString(). This makes it similar to JobStatus.
- In JobStatus, we are using Text.readString whereas in JobQueueInfo, we are using readUTF. I think in similar cases elsewhere we use the UTF versions. Similar comments for the write APIs.
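
To illustrate the two JobQueueInfo points above, here is a minimal sketch that keeps schedulingInfo as a plain String and serializes it with the JDK's writeUTF/readUTF. QueueInfoSketch and roundTrip are hypothetical names, and the class only mimics the write/readFields shape without depending on Hadoop; whichever read/write pair is chosen, the point is that both classes use the same one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for JobQueueInfo.
class QueueInfoSketch {
    private String queueName = "";
    private String schedulingInfo = "";

    String getQueueName() { return queueName; }
    void setQueueName(String queueName) { this.queueName = queueName; }

    String getSchedulingInfo() { return schedulingInfo; }

    // The caller passes actualObject.toString(); null is stored as empty
    // so that writeUTF never sees a null value.
    void setSchedulingInfo(String schedulingInfo) {
        this.schedulingInfo = (schedulingInfo == null) ? "" : schedulingInfo;
    }

    void write(DataOutput out) throws IOException {
        out.writeUTF(queueName);
        out.writeUTF(schedulingInfo);
    }

    void readFields(DataInput in) throws IOException {
        queueName = in.readUTF();
        schedulingInfo = in.readUTF();
    }

    // Serializes and deserializes in memory, to check that write and
    // readFields agree on the field order and encoding.
    static QueueInfoSketch roundTrip(QueueInfoSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            QueueInfoSketch copy = new QueueInfoSketch();
            copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```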

JspUtil:
- This includes JspHelper, which is a class from the NameNode package. I don't think it is a good idea for a MapRed class to depend on this, though I understand it has always been this way. Maybe we should file a new JIRA to fix it.

JobSubmissionProtocol:
- Include HADOOP JIRA number in the comment related to version field.

JobClient:
- Usage prints: [-queueinfo <job-queue-name> [-showJobs] - this is missing a closing ']'
- Return code should be set to 0 when the command syntax is found to be correct.
- Since scheduler information is set to empty, it can never be null. I think in any case, it should print something like: 
{code}
Queue Name: default
Scheduling Information: N/A
{code}
- The line "Job List for the queue ::" needs a newline. Also, I think it can just read "Job list:"
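
A small sketch of the output handling suggested above for JobClient, assuming an empty (or null) scheduling information string should be shown as N/A. QueueInfoPrinterSketch and format are hypothetical names, not code from the patch.

```java
class QueueInfoPrinterSketch {
    // Builds the two-line queue summary. An empty, blank or null
    // scheduling information string is displayed as "N/A".
    static String format(String queueName, String schedulingInfo) {
        String info = (schedulingInfo == null || schedulingInfo.trim().isEmpty())
            ? "N/A"
            : schedulingInfo;
        return "Queue Name: " + queueName + "\n"
            + "Scheduling Information: " + info + "\n";
    }
}
```

Calling format("default", "") would produce the two lines shown in the example output above.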

jobqueue_details.jsp:
- Needs a backlink to the main jobtracker page
- Needs a link to Hadoop web page - like in other pages.

jobtracker.jsp:
- The scheduling info column is not being split into rows. The generated HTML looks fine, but the rows still do not show up. Can you please check?

CapacityTaskScheduler:
- Does not need supportsPriority as a separate field in the SchedulingInfo class. You can pick it up from one of the QueueSchedulingInfo objects.
- The actual guaranteedCapacity value must be split between map and reduce slots. Currently, only the value for map slots is being displayed.
- Number of reclaimed resources is an internal variable and does not need to be displayed.
- Rename getQSI to getQueueSchedulingInfo

TestJobQueueInformation:
- I think you can use JobClient, instead of directly dealing with JobSubmissionProtocol and having to duplicate the methods for createRPCProxy etc.


> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3930
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: 3930-1.patch, HADOOP-3930-2.patch, HADOOP-3930-3.patch, HADOOP-3930-4.patch, mockup.JPG
>
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to provide info to display on the JobTracker web interface and in the CLI. The main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and in the CLI - something as simple as a single string, or a map<string, int> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of jobs.
> * public Map<String, String> getSchedulingInfo(String queue) -- returns key-value pairs which are displayed in columns on the web UI or the CLI for the list of queues.
> * public Collection<JobInProgress> getJobs(String queueName) -- returns the list of jobs in a given queue, sorted by a scheduler-specific order (the order it wants to run them in / schedule the next task in / etc).
> * public List<String> getQueues();

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.