Posted to mapreduce-issues@hadoop.apache.org by "Bhallamudi Venkata Siva Kamesh (Created) (JIRA)" <ji...@apache.org> on 2011/10/13 16:28:12 UTC

[jira] [Created] (MAPREDUCE-3178) Capacity Schedular shows incorrect cluster information in the RM logs

Capacity Schedular shows incorrect cluster information in the RM logs
---------------------------------------------------------------------

                 Key: MAPREDUCE-3178
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3178
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/capacity-sched
    Affects Versions: 0.24.0
            Reporter: Bhallamudi Venkata Siva Kamesh


When an NM is stopped and then quickly restarted, CS logs incorrect information about clusterResource.

I encountered this issue in pseudo-distributed cluster mode. The steps to reproduce are:

1) Start the YARN cluster
2) Stop an NM and start it again right away

One NM should then be running in the cluster. However, as I observed, the RM marks a stopped NM as dead only after the default expiry interval has elapsed since the NM actually became unavailable.

If you restart the NM before this interval elapses, ResourceTracker throws an IOException; however, CS still adds the NM's capacity to clusterResource.

Once the interval has elapsed and the RM detects the NM as dead, the RM removes the NM and subtracts the NM's capacity from the cluster capacity.

Eventually no NM is running in the cluster, yet the reported cluster capacity still equals one NM's (default) capacity.
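
The accounting described above can be traced with plain arithmetic. The following is only an illustrative sketch: the class and method names are hypothetical and are not Hadoop code, and the 8192 MB figure is an assumed NM capacity chosen for the example.

```java
// Illustrative only: hypothetical names, not Hadoop classes.
class CapacityTrace {
    /** Returns {registeredNodes, clusterMemoryMb} after a stop and quick restart. */
    static int[] replay(int nmMemoryMb) {
        int registeredNodes = 0;
        int clusterMemoryMb = 0;

        // 1) Initial registration: the NM joins and its capacity is added.
        registeredNodes += 1;
        clusterMemoryMb += nmMemoryMb;

        // 2) The NM is stopped, but the RM has not noticed yet: nothing changes.

        // 3) Quick restart: the registration is rejected as a duplicate, so no
        //    node is added, yet the capacity has already been added again.
        clusterMemoryMb += nmMemoryMb;

        // 4) The expiry interval elapses: the RM removes the original NM entry
        //    and subtracts its capacity once.
        registeredNodes -= 1;
        clusterMemoryMb -= nmMemoryMb;

        return new int[] {registeredNodes, clusterMemoryMb};
    }

    public static void main(String[] args) {
        int[] result = replay(8192); // assumed NM capacity in MB
        System.out.println("nodes=" + result[0] + ", clusterMemoryMb=" + result[1]);
    }
}
```

With these assumptions the trace ends with zero registered nodes but 8192 MB of cluster capacity, which matches the symptom reported here.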

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (MAPREDUCE-3178) Capacity Schedular shows incorrect cluster information in the RM logs

Posted by "Arun C Murthy (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reassigned MAPREDUCE-3178:
----------------------------------------

    Assignee: Bhallamudi Venkata Siva Kamesh

Bala, this is a good bug to fix. Can you please add a unit test? Thanks.
                

[jira] [Commented] (MAPREDUCE-3178) Capacity Schedular shows incorrect cluster information in the RM logs

Posted by "Devaraj K (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132376#comment-13132376 ] 

Devaraj K commented on MAPREDUCE-3178:
--------------------------------------

Hi Kamesh/Arun,

{code}
+        int time = conf.getInt(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
+            YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS);
+        String msg = "Duplicate registration from the node!";
+        LOG.info(msg + " Waiting " + time + " ms, for registration.");
+        try {
+          Thread.sleep(time);
+        } catch (InterruptedException e) {
+        }
{code}

I don't think it is a good idea to make the registration process sleep for YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS when a node manager goes down and comes back up before the expiry interval has elapsed. By default this value is 10 minutes. During this time the node manager will not be able to serve any requests.

I have seen the same issue with every scheduler, not only CS, and noted it in MAPREDUCE-3070; I am trying to solve it as part of that issue.
https://issues.apache.org/jira/browse/MAPREDUCE-3070?focusedCommentId=13125711&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13125711
                

[jira] [Commented] (MAPREDUCE-3178) Capacity Schedular shows incorrect cluster information in the RM logs

Posted by "Bhallamudi Venkata Siva Kamesh (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132311#comment-13132311 ] 

Bhallamudi Venkata Siva Kamesh commented on MAPREDUCE-3178:
-----------------------------------------------------------

Thanks, Arun. I will provide a complete patch.
                

[jira] [Updated] (MAPREDUCE-3178) Capacity Schedular shows incorrect cluster information in the RM logs

Posted by "Arun C Murthy (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-3178:
-------------------------------------

    Priority: Blocker  (was: Major)
    

[jira] [Resolved] (MAPREDUCE-3178) Capacity Schedular shows incorrect cluster information in the RM logs

Posted by "Arun C Murthy (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved MAPREDUCE-3178.
--------------------------------------

    Resolution: Duplicate

Will be fixed via MAPREDUCE-2775
                

[jira] [Updated] (MAPREDUCE-3178) Capacity Schedular shows incorrect cluster information in the RM logs

Posted by "Mahadev konar (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated MAPREDUCE-3178:
-------------------------------------

          Component/s:     (was: contrib/capacity-sched)
                       mrv2
    Affects Version/s:     (was: 0.24.0)
                       0.23.0
    

[jira] [Updated] (MAPREDUCE-3178) Capacity Schedular shows incorrect cluster information in the RM logs

Posted by "Bhallamudi Venkata Siva Kamesh (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bhallamudi Venkata Siva Kamesh updated MAPREDUCE-3178:
------------------------------------------------------

    Attachment: MAPREDUCE-3178.patch

I have attached a solution as a patch. It can be validated.
                

[jira] [Commented] (MAPREDUCE-3178) Capacity Schedular shows incorrect cluster information in the RM logs

Posted by "Bhallamudi Venkata Siva Kamesh (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126615#comment-13126615 ] 

Bhallamudi Venkata Siva Kamesh commented on MAPREDUCE-3178:
-----------------------------------------------------------

When the NM is started again, the following object is created as part of its registration:

{code}
      RMNode rmNode = new RMNodeImpl(nodeId, rmContext, host, cmPort,
          httpPort, resolve(host), capability);
{code}

The above constructor call internally invokes {code} context.getDispatcher().getEventHandler().handle(new NodeAddedSchedulerEvent(this)); {code}, which in turn calls the CS#addNode() method. There, this node's capability is added to clusterResource a second time. Meanwhile, the following code throws an exception:

{code}
      if (this.rmContext.getRMNodes().putIfAbsent(nodeId, rmNode) != null) {
        throw new IOException("Duplicate registration from the node!");
      }
{code}
                

[jira] [Commented] (MAPREDUCE-3178) Capacity Schedular shows incorrect cluster information in the RM logs

Posted by "Arun C Murthy (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132483#comment-13132483 ] 

Arun C Murthy commented on MAPREDUCE-3178:
------------------------------------------

I missed the 'Thread.sleep'. That isn't something we should ever do. We should just do the check early and throw; there is no need for the Thread.sleep.
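
As a rough illustration of "check early and throw", here is a minimal sketch. It is not the actual ResourceTrackerService code: EarlyCheckTracker and its methods are hypothetical names, and the scheduler's clusterResource is modelled as a single memory counter. The only point it shows is that the duplicate check (putIfAbsent) runs before any capacity is added, so a rejected registration leaves the total untouched.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the RM-side bookkeeping; not Hadoop code.
class EarlyCheckTracker {
    private final ConcurrentMap<String, Integer> nodes = new ConcurrentHashMap<>();
    private final AtomicInteger clusterMemoryMb = new AtomicInteger();

    /** Registers a node; rejects duplicates before touching the capacity. */
    void register(String nodeId, int memoryMb) throws IOException {
        // Check first: putIfAbsent returns the previous value if the id is known.
        if (nodes.putIfAbsent(nodeId, memoryMb) != null) {
            throw new IOException("Duplicate registration from the node!");
        }
        // Side effect only after the check has passed.
        clusterMemoryMb.addAndGet(memoryMb);
    }

    /** Removes an expired node and subtracts exactly what it contributed. */
    void expire(String nodeId) {
        Integer memoryMb = nodes.remove(nodeId);
        if (memoryMb != null) {
            clusterMemoryMb.addAndGet(-memoryMb);
        }
    }

    int clusterMemoryMb() { return clusterMemoryMb.get(); }

    int nodeCount() { return nodes.size(); }

    public static void main(String[] args) {
        EarlyCheckTracker tracker = new EarlyCheckTracker();
        try {
            tracker.register("nm1", 4096);
            tracker.register("nm1", 4096); // quick restart: rejected
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        tracker.expire("nm1");
        System.out.println("nodes=" + tracker.nodeCount()
            + ", clusterMemoryMb=" + tracker.clusterMemoryMb());
    }
}
```

In this sketch the restarted NM is still rejected until the old entry expires, but the capacity total stays consistent with the set of registered nodes, and nothing sleeps.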
                