Posted to mapreduce-issues@hadoop.apache.org by "Amar Kamat (Created) (JIRA)" <ji...@apache.org> on 2012/01/31 06:23:10 UTC

[jira] [Created] (MAPREDUCE-3769) [Gridmix] Improve the way job monitor maintains running jobs

[Gridmix] Improve the way job monitor maintains running jobs
------------------------------------------------------------

                 Key: MAPREDUCE-3769
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3769
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: contrib/gridmix
    Affects Versions: 0.24.0
            Reporter: Amar Kamat
             Fix For: 0.23.1, 0.24.0


Gridmix maintains a list (L) of running jobs via {{JobMonitor}}. As soon as a job is submitted, a handle for that job is cached inside the {{JobMonitor}}. The {{JobMonitor}} does the following in a thread:
{code}
1. remove the first job in the list L, say j
2. if j is complete:
     goto #1
   else:
     add j to the end of the list L
     sleep for 5 seconds
     goto #1
{code}
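
For illustration only, the loop above can be sketched in Java roughly as follows. This is not the actual {{JobMonitor}} code: the class name {{PollingLoopSketch}}, the generic job type, and the completion predicate are all hypothetical stand-ins, and the sleep interval is a constructor parameter so the sketch can be exercised without waiting.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/** Hypothetical sketch of the polling loop described above; not Gridmix's JobMonitor. */
class PollingLoopSketch<J> {
  private final Deque<J> running = new ArrayDeque<>(); // plays the role of the list L
  private final Predicate<J> isComplete;               // assumed completion check
  private final long sleepMillis;                      // 5000 in the description above

  PollingLoopSketch(Predicate<J> isComplete, long sleepMillis) {
    this.isComplete = isComplete;
    this.sleepMillis = sleepMillis;
  }

  /** Caches a handle for a newly submitted job at the end of the list. */
  void add(J job) {
    running.addLast(job);
  }

  int size() {
    return running.size();
  }

  /** Runs steps 1 and 2 once. Returns true if a completed job was dropped. */
  boolean step() {
    J j = running.pollFirst();   // step 1: remove the first job
    if (j == null) {
      return false;              // nothing to monitor
    }
    if (isComplete.test(j)) {
      return true;               // step 2, "complete" branch: drop j
    }
    running.addLast(j);          // step 2, "else" branch: re-queue j at the tail
    try {
      Thread.sleep(sleepMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    }
    return false;
  }
}
```

Under this reading of the loop, every unfinished job that is visited costs one sleep, so a completed job sitting behind k unfinished jobs would stay in the list for roughly 5k seconds before it is dropped.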

Gridmix STRESS mode logic uses the list L to compute the cluster load. It iterates over the map/reduce progress of every job in L to figure out the pending+running task count. We need to investigate and optimize the {{JobMonitor}} algorithm so that the number of completed jobs remaining in L is kept to a minimum. Polling the map and reduce task progress of a completed job is expensive, as it incurs an additional RPC to the JobHistory server.
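
One possible direction, sketched below under stated assumptions, is to sweep completed jobs out of the list before summing the load, instead of retiring at most one job per monitor iteration. The names {{LoadSketch}}, {{JobHandle}}, {{pendingAndRunningTasks}} and {{fixed}} are hypothetical and do not correspond to Gridmix classes, and the sketch assumes the completion check is cheaper than polling task progress, which would need to be verified.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: prune completed jobs before computing the load. */
class LoadSketch {
  /** Assumed minimal view of a cached job handle. */
  interface JobHandle {
    boolean isComplete();
    int pendingAndRunningTasks();
  }

  /** Test helper: a handle with a fixed state. */
  static JobHandle fixed(final boolean complete, final int tasks) {
    return new JobHandle() {
      @Override public boolean isComplete() { return complete; }
      @Override public int pendingAndRunningTasks() { return tasks; }
    };
  }

  /**
   * Removes completed jobs from the list, then sums the pending+running
   * task count over the jobs that remain. Progress is never polled for
   * a job that was already found to be complete.
   */
  static int pruneAndCountLoad(List<JobHandle> running) {
    running.removeIf(JobHandle::isComplete);
    int load = 0;
    for (JobHandle j : running) {
      load += j.pendingAndRunningTasks();
    }
    return load;
  }

  public static void main(String[] args) {
    List<JobHandle> jobs = new ArrayList<>();
    jobs.add(fixed(false, 3));
    jobs.add(fixed(true, 0));
    jobs.add(fixed(false, 4));
    System.out.println(pruneAndCountLoad(jobs) + " tasks across " + jobs.size() + " jobs");
  }
}
```

The list passed in must be mutable, since completed handles are removed in place.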

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-3769) [Gridmix] Improve the way job monitor maintains running jobs

Posted by "Arun C Murthy (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-3769:
-------------------------------------

    Fix Version/s:     (was: 0.23.1)
                   0.24.0
    
> [Gridmix] Improve the way job monitor maintains running jobs
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-3769
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3769
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/gridmix
>    Affects Versions: 0.24.0
>            Reporter: Amar Kamat
>              Labels: gridmix, job-monitor
>             Fix For: 0.24.0


[jira] [Updated] (MAPREDUCE-3769) [Gridmix] Improve the way job monitor maintains running jobs

Posted by "Amar Kamat (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-3769:
----------------------------------

    Priority: Minor  (was: Major)

We have worked around this issue by enhancing other parts of Gridmix and making the monitor multi-threaded. See MAPREDUCE-1687 and MAPREDUCE-3787 for more details. Since there is still scope for future enhancements, I will keep this ticket open but lower its priority.
                
> [Gridmix] Improve the way job monitor maintains running jobs
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-3769
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3769
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/gridmix
>    Affects Versions: 0.24.0
>            Reporter: Amar Kamat
>            Priority: Minor
>              Labels: gridmix, job-monitor
>             Fix For: 0.24.0
