Posted to common-dev@hadoop.apache.org by "Suresh Srinivas (JIRA)" <ji...@apache.org> on 2009/04/10 00:07:13 UTC

[jira] Created: (HADOOP-5650) Namenode log that indicates why it is not leaving safemode may be confusing

Namenode log that indicates why it is not leaving safemode may be confusing
---------------------------------------------------------------------------

                 Key: HADOOP-5650
                 URL: https://issues.apache.org/jira/browse/HADOOP-5650
             Project: Hadoop Core
          Issue Type: Bug
            Reporter: Suresh Srinivas
            Assignee: Suresh Srinivas
            Priority: Minor


A namenode with a large number of data blocks is set up with dfs.safemode.threshold.pct set to 1.0. With a small number of unreported blocks, the namenode prints the following as the reason for not leaving safe mode:
{{The ratio of reported blocks 1.0000 has not reached the threshold 1.0000}}

With a large number of blocks, the precision used for printing the log may not show the difference between the actual ratio of safe blocks to total blocks and the configured threshold. Printing the number of blocks instead of the ratio will improve clarity.
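
To illustrate the precision problem, here is a minimal sketch. The class name, the helper formatRatio, and the block counts are hypothetical and are not taken from the namenode source; the sketch only assumes the ratio is printed with four decimal places, as in the message above.

```java
import java.util.Locale;

final class SafeModeRatioDemo {
    // Hypothetical helper: formats safe/total with four decimal places,
    // matching the precision seen in the log line quoted in this issue.
    static String formatRatio(long blockSafe, long blockTotal) {
        double ratio = blockTotal == 0 ? 1.0 : (double) blockSafe / blockTotal;
        return String.format(Locale.ROOT, "%.4f", ratio);
    }

    public static void main(String[] args) {
        long blockTotal = 10_000_000L;    // assumed cluster size, for illustration
        long blockSafe = blockTotal - 5;  // five blocks not yet reported
        double threshold = 1.0;           // dfs.safemode.threshold.pct

        // The ratio 0.9999995 is below the threshold, yet both values print as 1.0000.
        System.out.println("ratio:     " + formatRatio(blockSafe, blockTotal));
        System.out.println("threshold: " + String.format(Locale.ROOT, "%.4f", threshold));

        // The block count makes the gap visible.
        System.out.println("missing:   " + (blockTotal - blockSafe));
    }
}
```

With these assumed numbers the ratio and the threshold print identically, while the count of missing blocks shows directly why safe mode has not been left.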

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5650) Namenode log that indicates why it is not leaving safemode may be confusing

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HADOOP-5650:
------------------------------------

    Attachment: 5650.patch

Incorporating changes suggested by Nicholas




[jira] Updated: (HADOOP-5650) Namenode log that indicates why it is not leaving safemode may be confusing

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-5650:
-------------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0
           Status: Resolved  (was: Patch Available)

I have committed this.  Thanks, Suresh!




[jira] Commented: (HADOOP-5650) Namenode log that indicates why it is not leaving safemode may be confusing

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699940#action_12699940 ] 

Hadoop QA commented on HADOOP-5650:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12405479/5650.patch
  against trunk revision 765713.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/202/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/202/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/202/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/202/console

This message is automatically generated.




[jira] Updated: (HADOOP-5650) Namenode log that indicates why it is not leaving safemode may be confusing

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-5650:
-------------------------------------------

    Component/s: dfs




[jira] Updated: (HADOOP-5650) Namenode log that indicates why it is not leaving safemode may be confusing

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HADOOP-5650:
------------------------------------

    Attachment: 5650.patch

The text in the Namenode UI, the logs, and {{SafeModeException}} is changed as follows:
* Old: The ratio of reported blocks 0.0000 has not reached the threshold 1.0000. Safe mode will be turned off automatically.
* New: The reported blocks 0 needs additional 5176 blocks to reach the threshold 1.0000 of total blocks 5176. Safe mode will be turned off automatically.

* Old: The ratio of reported blocks 2.0000 has reached the threshold 1.0000. Safe mode will be turned off automatically in 29 seconds.
* New: The reported blocks 5176 has reached the threshold 1.0000 of total blocks 5176. Safe mode will be turned off automatically in 29 seconds.
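
The new wording can be sketched as follows. This is an illustration only, not the code in 5650.patch: the class and method names are hypothetical, and the only things taken from this comment are the message texts themselves.

```java
import java.util.Locale;

final class SafeModeMessageDemo {
    // Hypothetical helper that builds the first sentence of the new message
    // from block counts instead of a ratio.
    static String tipMessage(long blockSafe, long blockThreshold,
                             long blockTotal, double threshold) {
        String thresholdText = String.format(Locale.ROOT, "%.4f", threshold);
        if (blockSafe < blockThreshold) {
            // Threshold not reached: report how many more blocks are needed.
            return "The reported blocks " + blockSafe
                + " needs additional " + (blockThreshold - blockSafe)
                + " blocks to reach the threshold " + thresholdText
                + " of total blocks " + blockTotal + ".";
        }
        // Threshold reached: report the counts that satisfied it.
        return "The reported blocks " + blockSafe
            + " has reached the threshold " + thresholdText
            + " of total blocks " + blockTotal + ".";
    }

    public static void main(String[] args) {
        System.out.println(tipMessage(0, 5176, 5176, 1.0));
        System.out.println(tipMessage(5176, 5176, 5176, 1.0));
    }
}
```

Because the counts are integers, a single unreported block always changes the printed text, regardless of how many blocks the namenode holds.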






[jira] Commented: (HADOOP-5650) Namenode log that indicates why it is not leaving safemode may be confusing

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698199#action_12698199 ] 

dhruba borthakur commented on HADOOP-5650:
------------------------------------------

This fix will be immensely helpful!




[jira] Commented: (HADOOP-5650) Namenode log that indicates why it is not leaving safemode may be confusing

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698992#action_12698992 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-5650:
------------------------------------------------

+1 patch looks good.

A nit: it seems to me that the following is clearer.
{code}
    boolean needEnter() {
      return threshold != 0 && blockSafe < blockThreshold;
    }
{code}
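
For context, a self-contained sketch of where such a predicate could sit. Only needEnter() comes from the comment above; the class name, the constructor, and the way blockThreshold is derived from threshold and the total block count are assumptions made for illustration.

```java
final class SafeModeThresholdDemo {
    final double threshold;    // configured fraction, e.g. 1.0
    final int blockThreshold;  // number of safe blocks required
    int blockSafe;             // number of safe blocks reported so far

    SafeModeThresholdDemo(double threshold, int blockTotal) {
        this.threshold = threshold;
        // Assumption: the required count is the fraction of the total, truncated.
        this.blockThreshold = (int) (blockTotal * threshold);
    }

    // As suggested in the review: compare integer counts, not a ratio.
    boolean needEnter() {
        return threshold != 0 && blockSafe < blockThreshold;
    }

    public static void main(String[] args) {
        SafeModeThresholdDemo info = new SafeModeThresholdDemo(1.0, 5176);
        info.blockSafe = 5175;
        System.out.println(info.needEnter()); // one block short
        info.blockSafe = 5176;
        System.out.println(info.needEnter()); // threshold met
    }
}
```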




[jira] Updated: (HADOOP-5650) Namenode log that indicates why it is not leaving safemode may be confusing

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HADOOP-5650:
------------------------------------

    Hadoop Flags: [Reviewed]
          Status: Patch Available  (was: Open)




[jira] Commented: (HADOOP-5650) Namenode log that indicates why it is not leaving safemode may be confusing

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699950#action_12699950 ] 

Suresh Srinivas commented on HADOOP-5650:
-----------------------------------------

No unit tests were added since this is mainly a change in how the message is printed in the logs and UI. The test cases below also failed in earlier Hudson runs for other patches (for example, for 5589):
org.apache.hadoop.mapred.TestQueueCapacities.testSingleQueue
org.apache.hadoop.mapred.TestQueueCapacities.testSingleQueueMultipleJobs
org.apache.hadoop.mapred.TestQueueCapacities.testMultipleQueues
org.apache.hadoop.mapred.TestTaskFail.testWithDFS
org.apache.hadoop.mapred.TestMRServerPorts.testJobTrackerPorts
org.apache.hadoop.mapred.TestMRServerPorts.testTaskTrackerPorts





[jira] Commented: (HADOOP-5650) Namenode log that indicates why it is not leaving safemode may be confusing

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700723#action_12700723 ] 

Hudson commented on HADOOP-5650:
--------------------------------

Integrated in Hadoop-trunk #811 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/811/])
    . Fix safemode messages in the Namenode log.  Contributed by Suresh Srinivas


