Posted to mapreduce-issues@hadoop.apache.org by "Ahmed Radwan (JIRA)" <ji...@apache.org> on 2012/07/19 10:34:35 UTC
[jira] [Commented] (MAPREDUCE-4346) Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418165#comment-13418165 ] 

Ahmed Radwan commented on MAPREDUCE-4346:
-----------------------------------------

Thanks Arun,

Sorry, it took me some time to get back to this. Based on your reference above to MR2, I discussed it further with tucu offline. I wanted to reassess the situation and better understand the difficulty you are referring to.

In MR2, we already have getAllJobs(), which returns all jobs in any state. To support the new refined version (similar to the MR1 version proposed here), we have two options for filtering this list:

* 1) Client-side filtering: The new implementation will just call getAllJobs(), and the JobClient will filter the returned list. This option only provides the required compatibility; it does not remove the overhead we discussed earlier, so I wouldn't prefer it.

* 2) Resource Manager filtering: Currently, getAllJobs() in MR2 uses the TypeConverter to convert the whole list of returned jobs from List<ApplicationReport> to JobStatus[] to be compatible with MR1. So, to be able to filter this list and avoid doing this type conversion on the server side, we can have the JobClient do this conversion before sending the request to the resource manager.
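
To make option 1 concrete, here is a minimal, self-contained sketch of filtering a full job listing by state. It does not use the Hadoop classes: Job, State, and this getAllJobs() are hypothetical stand-ins for illustration only, and the state names are just assumed to resemble the MR1 job states. The full list is still fetched before filtering, which is exactly the overhead noted above; option 2 would apply an equivalent predicate before the list leaves the resource manager.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class JobFilterSketch {
    // Hypothetical job states; not the actual Hadoop JobStatus constants.
    enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    // Hypothetical, simplified stand-in for a job status entry.
    static final class Job {
        final String id;
        final State state;

        Job(String id, State state) {
            this.id = id;
            this.state = state;
        }
    }

    // Stand-in for the unfiltered listing: returns every job in any state.
    static List<Job> getAllJobs() {
        return Arrays.asList(
            new Job("job_1", State.RUNNING),
            new Job("job_2", State.SUCCEEDED),
            new Job("job_3", State.FAILED),
            new Job("job_4", State.RUNNING));
    }

    // Client-side filtering: keep only the jobs whose state was requested.
    static List<Job> filterByState(List<Job> jobs, Set<State> wanted) {
        List<Job> result = new ArrayList<>();
        for (Job job : jobs) {
            if (wanted.contains(job.state)) {
                result.add(job);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Job> running = filterByState(getAllJobs(), EnumSet.of(State.RUNNING));
        for (Job job : running) {
            System.out.println(job.id + " " + job.state);
        }
    }
}
```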

Independent of this refined version of getAllJobs(), more work needs to be done in MR2 in this context, such as:

* Dealing with retiredJobs and separately getting them from the history server (if requested).
* Dealing with different application types, since it doesn't make sense for an MR client to get statuses for distributed shell jobs or for other applications submitted by other types of clients.
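
As a rough illustration of these two points, the sketch below keeps only applications of an assumed MapReduce type and appends retired jobs only when they are requested. Everything in it is hypothetical: the App class, the "MAPREDUCE" type label, and the retired-job list standing in for a history server lookup are assumptions for illustration, not existing MR2 APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MrJobListingSketch {
    // Assumed label for MapReduce applications; hypothetical.
    static final String MR_TYPE = "MAPREDUCE";

    // Hypothetical, simplified stand-in for an application entry.
    static final class App {
        final String id;
        final String type;

        App(String id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    // Keeps only MapReduce applications, then optionally appends retired jobs
    // (which a real client would fetch separately, e.g. from a history server).
    static List<String> listJobIds(List<App> activeApps,
                                   List<String> retiredJobIds,
                                   boolean includeRetired) {
        List<String> ids = new ArrayList<>();
        for (App app : activeApps) {
            if (MR_TYPE.equals(app.type)) {
                ids.add(app.id);
            }
        }
        if (includeRetired) {
            ids.addAll(retiredJobIds);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<App> apps = Arrays.asList(
            new App("app_1", "MAPREDUCE"),
            new App("app_2", "DISTRIBUTED_SHELL"));
        List<String> retired = Arrays.asList("job_old");
        System.out.println(listJobIds(apps, retired, true));
    }
}
```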

I'll file a separate jira for these MR2 changes, and will work on a patch for it. Please let me know if you have any comments or considerations for this route.
                
> Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient
> --------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4346
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4346
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
>         Attachments: MAPREDUCE-4346.patch, MAPREDUCE-4346_rev2.patch, MAPREDUCE-4346_rev3.patch, MAPREDUCE-4346_rev4.patch
>
>
> The current implementation for JobTracker.getAllJobs() returns all submitted jobs in any state, in addition to retired jobs. This list can be long and represents an unneeded overhead especially in the case of clients only interested in jobs in specific state(s). 
> It is beneficial to include a refined version where only jobs having specific statuses are returned and retired jobs are optional to include. 
> I'll be uploading an initial patch momentarily.
