Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2009/03/20 21:26:50 UTC

[jira] Created: (HADOOP-5548) Observed negative running maps on the job tracker

Observed negative running maps on the job tracker
-------------------------------------------------

                 Key: HADOOP-5548
                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
             Project: Hadoop Core
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: Owen O'Malley


We saw in both the web/ui and cli tools:

{{
Cluster Summary (Heap Size is 11.7 GB/13.37 GB)

Maps  Reduces Total       Nodes  Map Task  Reduce Task  Avg.     Blacklisted 
              Submissions        Capacity   Capacity   Tasks/Node Nodes
-971  0       133         434     1736        1736      8.00        0
}}



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-5548:
----------------------------------

    Description: 
We saw in both the web/ui and cli tools:

{noformat}
Cluster Summary (Heap Size is 11.7 GB/13.37 GB)

Maps  Reduces Total       Nodes  Map Task  Reduce Task  Avg.     Blacklisted 
              Submissions        Capacity   Capacity   Tasks/Node Nodes
-971  0       133         434     1736        1736      8.00        0
{noformat}



  was:
We saw in both the web/ui and cli tools:

{{
Cluster Summary (Heap Size is 11.7 GB/13.37 GB)

Maps  Reduces Total       Nodes  Map Task  Reduce Task  Avg.     Blacklisted 
              Submissions        Capacity   Capacity   Tasks/Node Nodes
-971  0       133         434     1736        1736      8.00        0
}}




> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696479#action_12696479 ] 

Amar Kamat commented on HADOOP-5548:
------------------------------------

Looked at the patch. It looks like there is no other way to do this: there are cyclic calls from jobtracker->jobinprogress->taskinprogress->jobtracker, and there is currently no documentation indicating this. For now we can go with Amareshwari's patch as a quick fix. Later on in another jira we can get rid of these implicit assumptions. +1.
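The call cycle described above can be illustrated with a minimal Java sketch. The classes below (CycleJobTracker, CycleJobInProgress, CycleTaskInProgress, CycleRecoveryManager) are hypothetical, heavily simplified stand-ins, not Hadoop's real classes or the committed patch. The sketch assumes the normal JobTracker path takes the JobTracker monitor before the JobInProgress monitor. A caller such as recovery code that entered the JobInProgress without first holding the JobTracker lock would take the two monitors in the opposite order once the TaskInProgress calls back into the JobTracker.

```java
// Hypothetical, simplified stand-ins for the cycle
// JobTracker -> JobInProgress -> TaskInProgress -> JobTracker.
final class CycleJobTracker {
  private int callbacks;

  // The TaskInProgress calls back into the JobTracker; this needs the JT monitor.
  synchronized void onTaskUpdate() { callbacks++; }

  synchronized int callbackCount() { return callbacks; }
}

final class CycleTaskInProgress {
  private final CycleJobTracker jobTracker;

  CycleTaskInProgress(CycleJobTracker jobTracker) { this.jobTracker = jobTracker; }

  void updateStatus() { jobTracker.onTaskUpdate(); } // TIP -> JT
}

final class CycleJobInProgress {
  private final CycleJobTracker jobTracker;
  private final CycleTaskInProgress tip;
  boolean jobTrackerLockHeldOnEntry; // records whether the JT lock was taken first

  CycleJobInProgress(CycleJobTracker jobTracker, CycleTaskInProgress tip) {
    this.jobTracker = jobTracker;
    this.tip = tip;
  }

  // JIP -> TIP, while holding the JIP monitor.
  synchronized void updateTaskStatus() {
    jobTrackerLockHeldOnEntry = Thread.holdsLock(jobTracker);
    tip.updateStatus();
  }
}

final class CycleRecoveryManager {
  // Take the JobTracker monitor first, then the JobInProgress monitor (JT -> JIP).
  // The TIP -> JT callback then re-enters a monitor this thread already owns,
  // which is allowed because Java monitors are reentrant.
  static void recover(CycleJobTracker jt, CycleJobInProgress jip) {
    synchronized (jt) {
      jip.updateTaskStatus();
    }
  }

  public static void main(String[] args) {
    CycleJobTracker jt = new CycleJobTracker();
    CycleJobInProgress jip = new CycleJobInProgress(jt, new CycleTaskInProgress(jt));
    recover(jt, jip);
    System.out.println(jip.jobTrackerLockHeldOnEntry + " " + jt.callbackCount()); // true 1
  }
}
```

Taking the locks in one fixed order is what prevents a lock-order inversion between threads; reentrancy only ensures that the callback does not block the thread that already holds the JobTracker lock.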

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548-2.txt, patch-5548.txt


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12685413#action_12685413 ] 

Devaraj Das commented on HADOOP-5548:
-------------------------------------

The recovery manager processes one job at a time. During the recovery, the updateTaskTrackerStatus would create dummy TaskTrackerStatus objects with 0 map/reduce slots for the trackers. Also, the IPC server is not up until the recovery manager returns. I don't think the negative value for maps in the cluster summary has anything to do with the recovery process. This has been observed and reported before in HADOOP-5231.

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696510#action_12696510 ] 

Hadoop QA commented on HADOOP-5548:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12404804/patch-5548-2.txt
  against trunk revision 762509.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/159/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/159/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/159/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/159/console

This message is automatically generated.

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548-2.txt, patch-5548.txt


[jira] Updated: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-5548:
--------------------------------------------

    Fix Version/s: 0.20.0
                   0.19.2
           Status: Patch Available  (was: Open)

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548.txt


[jira] Updated: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-5548:
--------------------------------------------

    Status: Patch Available  (was: Open)

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548-2.txt, patch-5548.txt


[jira] Updated: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-5548:
--------------------------------------------

    Attachment: patch-5548.txt

Patch fixing negative numbers in the cluster summary and adding synchronization for JobTracker methods in RecoveryManager.

Negative numbers are reproducible on a cluster by running the MRReliability test, 95% of the time. The patch has been tested with 4 runs of MRReliability on 100-node clusters, and we no longer see negative numbers.

test-patch result:
{noformat}
     [exec]
     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec]
{noformat}
It is difficult to write a unit test for this.

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>         Attachments: patch-5548.txt


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688276#action_12688276 ] 

Hemanth Yamijala commented on HADOOP-5548:
------------------------------------------

Just FYI, the fair scheduler does not use the cluster status to find the total running maps / reduces. This is a different problem. But fortunately, this issue will not result in underutilization when used with the fair scheduler.

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker


[jira] Updated: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-5548:
--------------------------------------------

    Attachment: patch-5548-1.txt

Patch incorporating Devaraj's comments for synchronization.

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548.txt


[jira] Updated: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated HADOOP-5548:
-----------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.19.2)
     Release Note: Adds synchronization for JobTracker methods in RecoveryManager.
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks Amareshwari!

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548-2.txt, patch-5548.txt


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688096#action_12688096 ] 

Owen O'Malley commented on HADOOP-5548:
---------------------------------------

It should still lock the JobTracker. Calling the method without the lock violates the method's assumption, and that could easily be broken during later maintenance.

We still need an explanation of what is going on.
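A minimal Java sketch of the locking point above, assuming the method in question has the implicit precondition "caller must hold the JobTracker lock". The method checks that precondition with Thread.holdsLock, and the recovery-side caller takes the lock explicitly instead of relying on no other thread being active. LockedJobTracker, adjustRunningMaps, and LockingRecoveryManager are hypothetical names used only for illustration.

```java
// Hypothetical sketch: a method that relies on the caller holding the
// JobTracker lock makes that precondition explicit.
final class LockedJobTracker {
  private int runningMaps;

  // Precondition: the caller holds this object's monitor.
  void adjustRunningMaps(int delta) {
    if (!Thread.holdsLock(this)) {
      throw new IllegalStateException("caller must hold the JobTracker lock");
    }
    runningMaps += delta;
  }

  synchronized int runningMaps() { return runningMaps; }
}

final class LockingRecoveryManager {
  private final LockedJobTracker jobTracker;

  LockingRecoveryManager(LockedJobTracker jobTracker) { this.jobTracker = jobTracker; }

  // Lock explicitly even if no other thread can reach the JobTracker while
  // recovery runs today; that property may not survive later changes.
  void recoverTask(int delta) {
    synchronized (jobTracker) {
      jobTracker.adjustRunningMaps(delta);
    }
  }

  public static void main(String[] args) {
    LockedJobTracker jt = new LockedJobTracker();
    new LockingRecoveryManager(jt).recoverTask(2);
    System.out.println(jt.runningMaps()); // 2
  }
}
```

With the precondition checked at runtime, a later change that drops the lock fails fast instead of silently corrupting shared counters.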

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Priority: Blocker


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696055#action_12696055 ] 

Sharad Agarwal commented on HADOOP-5548:
----------------------------------------

Unfortunately, the patch no longer applies to trunk.

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548.txt


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688271#action_12688271 ] 

Amareshwari Sriramadasu commented on HADOOP-5548:
-------------------------------------------------

As Devaraj pointed out, the problem is not with JobTracker restart.
In the JobTracker, TaskTrackerStatus is cached in {{taskTrackers}} and is supposed to be read-only. But it is passed to the updateTaskStatuses() method, in which the task reports (TaskStatus objects) are passed to JobInProgress. In JobInProgress.updateTaskStatus() and tip.updateStatus(), the TaskStatus object gets modified.
The code in TaskInProgress that modifies the TaskStatus reference:
{code}
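    if (!isCleanupAttempt(taskid)) {
      // Stores the reported TaskStatus object itself, not a copy.
      taskStatuses.put(taskid, status);
    } else {
      // Cleanup attempt: mutates the TaskStatus already stored for this attempt,
      // which may be the same object another tracker's cached status refers to.
      taskStatuses.get(taskid).statusUpdate(status.getRunState(),
        status.getProgress(), status.getStateString(), status.getPhase(),
        status.getFinishTime());
    }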
{code}

This can make the total count negative in the following scenario:
Tracker1 reports that task *t_0* is KILLED_UNCLEAN.
Tracker2 is given the cleanup attempt for t_0.
Tracker2 reports that it is running the cleanup attempt for t_0. The TIP updates the entry in taskStatuses, which still holds the TaskStatus object from tracker1's status.
The JT then calculates the total count as if the task were run by both trackers, leading to negative totals.

Cloning the TaskStatus object before passing it to the JIP looks like the correct solution. Thoughts?
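The scenario above can be reproduced with a small Java model. It is a sketch under simplifying assumptions, not Hadoop's code: Status stands in for TaskStatus, Jt stands in for a JobTracker that keeps an incremental running-map total (subtract a tracker's previous report, add its new one), and the tipStatuses map stands in for TaskInProgress.taskStatuses. With a shared object, the in-place update for the cleanup attempt also rewrites tracker1's cached report, so a later subtraction removes a running task that was never added. With a copy, the total returns to zero.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the scenario; not Hadoop's real classes.
final class NegativeCountSketch {
  enum RunState { RUNNING, KILLED_UNCLEAN }

  // Stand-in for TaskStatus: mutable, like the object the TIP updates in place.
  static final class Status {
    RunState state;
    Status(RunState state) { this.state = state; }
    Status copy() { return new Status(state); }
  }

  // Stand-in for the JobTracker: caches each tracker's last report and keeps
  // an incremental total (subtract the old report, add the new one).
  static final class Jt {
    private final Map<String, List<Status>> lastReport = new HashMap<>();
    int totalRunningMaps;

    private static int running(List<Status> report) {
      int n = 0;
      for (Status s : report) {
        if (s.state == RunState.RUNNING) n++;
      }
      return n;
    }

    void heartbeat(String tracker, List<Status> report) {
      List<Status> old = lastReport.put(tracker, report);
      if (old != null) totalRunningMaps -= running(old);
      totalRunningMaps += running(report);
    }
  }

  // Returns the running-map total after every task is gone (correct answer: 0).
  static int simulate(boolean cloneForJip) {
    Jt jt = new Jt();
    Map<String, Status> tipStatuses = new HashMap<>(); // stand-in for TIP.taskStatuses

    Status fromTracker1 = new Status(RunState.KILLED_UNCLEAN);
    jt.heartbeat("tracker1", List.of(fromTracker1));          // total 0
    tipStatuses.put("t_0", cloneForJip ? fromTracker1.copy() : fromTracker1);

    Status cleanup = new Status(RunState.RUNNING);            // cleanup attempt on tracker2
    jt.heartbeat("tracker2", List.of(cleanup));               // total 1
    tipStatuses.get("t_0").state = cleanup.state;             // in-place update, like statusUpdate()

    jt.heartbeat("tracker1", List.of());  // subtracts tracker1's cached report
    jt.heartbeat("tracker2", List.of());  // cleanup attempt finished
    return jt.totalRunningMaps;
  }

  public static void main(String[] args) {
    System.out.println("shared TaskStatus: " + simulate(false)); // shared TaskStatus: -1
    System.out.println("cloned TaskStatus: " + simulate(true));  // cloned TaskStatus: 0
  }
}
```

Each repetition of the shared-object case leaks one more decrement, which is consistent with a large negative total such as -971 accumulating over time on a busy cluster.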

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696572#action_12696572 ] 

Nigel Daley commented on HADOOP-5548:
-------------------------------------

{quote}
Later on in another jira we can get rid of these implicit assumptions
{quote}

Amar, please file this Jira now so the issue doesn't get lost.

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548-2.txt, patch-5548.txt


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696039#action_12696039 ] 

Amar Kamat commented on HADOOP-5548:
------------------------------------

+1. Changes look fine to me.

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548.txt


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695240#action_12695240 ] 

Amareshwari Sriramadasu commented on HADOOP-5548:
-------------------------------------------------

Hudson is not able to run core tests for some reason. All core tests passed on my machine.
The patch adds synchronization to RecoveryManager. The existing tests TestJobTrackerRestart and TestRecoveryManager cover the code changes.

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548.txt


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688637#action_12688637 ] 

Amar Kamat commented on HADOOP-5548:
------------------------------------

Changes look good to me. 

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548.txt


[jira] Updated: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-5548:
--------------------------------------------

    Status: Open  (was: Patch Available)

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548.txt


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696672#action_12696672 ] 

Hudson commented on HADOOP-5548:
--------------------------------

Integrated in Hadoop-trunk #800 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/800/])
    . Add synchronization for JobTracker methods in RecoveryManager. Contributed by Amareshwari Sriramadasu.


> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548-2.txt, patch-5548.txt


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695237#action_12695237 ] 

Hadoop QA commented on HADOOP-5548:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12404501/patch-5548-1.txt
  against trunk revision 761482.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/106/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/106/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/106/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/106/console

This message is automatically generated.

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548.txt


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12684016#action_12684016 ] 

Owen O'Malley commented on HADOOP-5548:
---------------------------------------

My current guess is that the JobTracker went through recovery with running jobs. JobTracker.updateTaskTrackerStatus assumes that its caller holds the JobTracker lock, but when I look at its callers, the call tree from RecoveryManager.recover never locks the JobTracker.
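To make the suspected failure concrete, here is a minimal Java sketch. It assumes the counter bookkeeping works roughly like this: a per-tracker status update subtracts the tracker's previous map count from a cluster-wide total and adds the new count. The class and method names (`TrackerBookkeeping`, `readOld`, `apply`, `updateTrackerStatus`) are hypothetical and are not Hadoop's API. The read and the write are split into two steps only so that a bad interleaving can be replayed deterministically. If two unlocked callers (for example, a heartbeat and the recovery path) both read the same old count, they subtract it twice and drive the total negative.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of cluster-wide map-count bookkeeping.
// Not Hadoop code: names and structure are illustrative only.
class TrackerBookkeeping {
  private final Map<String, Integer> runningMapsByTracker = new HashMap<>();
  private int totalMaps = 0;

  // Step 1 of an update: look up the tracker's previously reported count.
  int readOld(String tracker) {
    return runningMapsByTracker.getOrDefault(tracker, 0);
  }

  // Step 2 of an update: swap the old count for the new one in the total.
  // Callers are assumed to hold the lock, mirroring the precondition
  // described for JobTracker.updateTaskTrackerStatus.
  void apply(String tracker, int oldCount, int newCount) {
    totalMaps -= oldCount;
    totalMaps += newCount;
    runningMapsByTracker.put(tracker, newCount);
  }

  // Safe entry point: read and apply happen atomically under one monitor.
  synchronized void updateTrackerStatus(String tracker, int newCount) {
    apply(tracker, readOld(tracker), newCount);
  }

  synchronized int totalMaps() {
    return totalMaps;
  }

  public static void main(String[] args) {
    // Replay an unlocked interleaving: both callers see the old count (4)
    // before either one writes, so the same 4 maps are subtracted twice.
    TrackerBookkeeping racy = new TrackerBookkeeping();
    racy.updateTrackerStatus("tt1", 4);
    int seenByHeartbeat = racy.readOld("tt1");
    int seenByRecovery = racy.readOld("tt1");
    racy.apply("tt1", seenByHeartbeat, 0);
    racy.apply("tt1", seenByRecovery, 0);
    System.out.println("unlocked interleaving: totalMaps = " + racy.totalMaps());

    // The same updates, each performed atomically under the lock.
    TrackerBookkeeping locked = new TrackerBookkeeping();
    locked.updateTrackerStatus("tt1", 4);
    locked.updateTrackerStatus("tt1", 0);
    locked.updateTrackerStatus("tt1", 0);
    System.out.println("locked updates: totalMaps = " + locked.totalMaps());
  }
}
```

Running `main` prints -4 for the replayed unlocked interleaving and 0 for the locked updates. The replay is single-threaded on purpose, because a real multithreaded reproduction would fail only intermittently.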

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>


[jira] Assigned: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu reassigned HADOOP-5548:
-----------------------------------------------

    Assignee: Amareshwari Sriramadasu

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker


[jira] Updated: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-5548:
----------------------------------

    Priority: Blocker  (was: Major)

Forgot to mark this as a blocker.

The number of running + pending tasks is used to determine the right level of loading for the cluster. If it is negative, the cluster will be under-utilized. 
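The following is a rough sketch of why a negative count starves the cluster. It assumes the scheduler scales each tracker's usable slots by a load factor of (running + pending) / cluster map capacity, as the comment above describes. The method name `allowedMapSlots` and the exact formula are hypothetical and are not the actual Hadoop scheduler code. The capacity figures come from the cluster summary (1736 map slots across 434 nodes, or 4 per tracker). The 500 pending maps are made up for illustration.

```java
// Hypothetical sketch of load-based slot allocation; not the actual
// Hadoop scheduler code.
class LoadFactorSketch {
  // Returns how many new map tasks a tracker may be given.
  static int allowedMapSlots(int runningMaps, int pendingMaps,
                             int clusterMapCapacity, int trackerMapCapacity,
                             int trackerRunningMaps) {
    int outstanding = runningMaps + pendingMaps;
    double loadFactor = clusterMapCapacity > 0
        ? (double) outstanding / clusterMapCapacity
        : 0.0;
    // Scale this tracker's capacity by the cluster-wide load, capped at its slot count.
    int trackerCurrentCapacity = Math.min(
        (int) Math.ceil(loadFactor * trackerMapCapacity), trackerMapCapacity);
    // Never hand out a negative number of slots.
    return Math.max(0, trackerCurrentCapacity - trackerRunningMaps);
  }

  public static void main(String[] args) {
    int clusterCapacity = 1736; // from the cluster summary
    int perTracker = 4;         // 1736 slots / 434 nodes
    int pending = 500;          // hypothetical
    System.out.println("running=0:    "
        + allowedMapSlots(0, pending, clusterCapacity, perTracker, 0));
    System.out.println("running=-971: "
        + allowedMapSlots(-971, pending, clusterCapacity, perTracker, 0));
  }
}
```

Under these assumptions, an idle tracker is offered 2 map slots when the running count is 0. When the running count is -971, the same tracker is offered 0 slots, even though 500 maps are waiting.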

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Priority: Blocker


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696427#action_12696427 ] 

Amareshwari Sriramadasu commented on HADOOP-5548:
-------------------------------------------------

test-patch and ant test passed on my machine.
The same patch applies to branch 0.20 as well.

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548-2.txt, patch-5548.txt


[jira] Updated: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-5548:
--------------------------------

    Status: Open  (was: Patch Available)

This patch doesn't look like the right fix with respect to the JobTracker locking in the JobRecovery methods. For example, where the patch calls updateTaskTrackerStatus, it locks only the taskTrackers object, but it should lock the JobTracker and trackerExpiryQueue as well. This is in line with keeping future maintenance in mind, as Owen pointed out.
For now, I will commit the portion of the patch that handles the negative map/reduce counts as part of HADOOP-5231.
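The following Java sketch shows the locking discipline Devaraj describes. It assumes the heartbeat path acquires the same three monitors in the same order: the JobTracker, then taskTrackers, then trackerExpiryQueue. `RecoveryLockingSketch` and `recoverTracker` are hypothetical names, and `this` stands in for the JobTracker. The `Thread.holdsLock` checks are an illustrative way to make the "caller must hold the lock" precondition explicit. The original patch is not known to include them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of the locking discipline described above; the field
// names echo the comment, but this is not the JobTracker implementation.
class RecoveryLockingSketch {
  private final Map<String, Integer> taskTrackers = new HashMap<>();
  private final TreeSet<String> trackerExpiryQueue = new TreeSet<>();
  private int totalMaps = 0;

  // Precondition: caller holds this object (standing in for the JobTracker),
  // taskTrackers, and trackerExpiryQueue. Checking it makes violations fail fast.
  void updateTaskTrackerStatus(String tracker, int runningMaps) {
    if (!Thread.holdsLock(this) || !Thread.holdsLock(taskTrackers)
        || !Thread.holdsLock(trackerExpiryQueue)) {
      throw new IllegalStateException("caller must hold all three locks");
    }
    Integer old = taskTrackers.put(tracker, runningMaps);
    if (old != null) {
      totalMaps -= old;
    }
    totalMaps += runningMaps;
    trackerExpiryQueue.add(tracker);
  }

  // Recovery path: acquire the locks in one fixed order, outermost first.
  // Assumes the heartbeat path uses the same order, so the two cannot deadlock.
  void recoverTracker(String tracker, int runningMaps) {
    synchronized (this) {
      synchronized (taskTrackers) {
        synchronized (trackerExpiryQueue) {
          updateTaskTrackerStatus(tracker, runningMaps);
        }
      }
    }
  }

  synchronized int totalMaps() {
    return totalMaps;
  }

  public static void main(String[] args) {
    RecoveryLockingSketch jt = new RecoveryLockingSketch();
    jt.recoverTracker("tt1", 4);
    jt.recoverTracker("tt1", 1);
    System.out.println("totalMaps = " + jt.totalMaps());
    try {
      jt.updateTaskTrackerStatus("tt2", 3); // unlocked call, like the old recovery path
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Every path must acquire the monitors in the same outermost-first order to avoid a lock-ordering deadlock. Taking only taskTrackers, as the patch did, leaves the JobTracker-level state unprotected.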

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548.txt


[jira] Updated: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-5548:
--------------------------------------------

    Attachment: patch-5548-2.txt

Updated the patch to apply against trunk.
The patch also performs the JIP (JobInProgress) and TIP (TaskInProgress) status updates under the JobTracker lock.



> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548-2.txt, patch-5548.txt


[jira] Updated: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-5548:
--------------------------------------------

    Status: Patch Available  (was: Open)

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548-1.txt, patch-5548.txt


[jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688649#action_12688649 ] 

Amareshwari Sriramadasu commented on HADOOP-5548:
-------------------------------------------------

All unit tests passed on my machine. I also ran the Sort benchmark with the patch.

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: patch-5548.txt