Posted to common-dev@hadoop.apache.org by "Jakob Homan (JIRA)" <ji...@apache.org> on 2009/05/07 21:38:45 UTC

[jira] Commented: (HADOOP-5777) ResolutionMonitor dies on an exception

    [ https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707061#action_12707061 ] 

Jakob Homan commented on HADOOP-5777:
-------------------------------------

Hairong and I determined that the issue was caused by a race condition: many nodes with the same storage ID were registering at the same time (they came from cloned drives, which should not normally happen), and the ResolutionMonitor was not properly synchronized.  The network location of the affected node gets reset to UNRESOLVED (the empty string, "") before the node is passed to add, which causes the substring call to fail.
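
To illustrate the failure mode, here is a minimal standalone sketch. It is not the actual NetworkTopology code; the class and method names (EmptyLocationDemo, firstComponentUnchecked, firstComponentChecked) are hypothetical. It only shows that stripping a leading separator from an empty location string throws StringIndexOutOfBoundsException, and what a defensive check could look like.

```java
// Hypothetical illustration only; not taken from Hadoop's NetworkTopology.
class EmptyLocationDemo {

    // Returns the first path component of a location such as "/rack1/node3".
    // Assumes the location starts with '/', which does not hold for "".
    static String firstComponentUnchecked(String location) {
        String rest = location.substring(1); // throws when location is ""
        int sep = rest.indexOf('/');
        return sep == -1 ? rest : rest.substring(0, sep);
    }

    // Same lookup, but rejects unresolved or malformed locations up front
    // with a descriptive error instead of an index failure.
    static String firstComponentChecked(String location) {
        if (location == null || location.isEmpty() || location.charAt(0) != '/') {
            throw new IllegalArgumentException(
                "unresolved or malformed location: '" + location + "'");
        }
        return firstComponentUnchecked(location);
    }

    public static void main(String[] args) {
        System.out.println(firstComponentUnchecked("/rack1/node3"));
        try {
            firstComponentUnchecked(""); // the UNRESOLVED case
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unchecked call failed: " + e.getClass().getName());
        }
        try {
            firstComponentChecked("");
        } catch (IllegalArgumentException e) {
            System.out.println("checked call rejected input: " + e.getMessage());
        }
    }
}
```

An uncaught exception of this kind inside a thread's run() method terminates that thread, which matches the symptom in the report below.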

Since the ResolutionMonitor has now been removed, this is not worth fixing, so I will close the issue as Won't Fix.

> ResolutionMonitor dies on an exception
> --------------------------------------
>
>                 Key: HADOOP-5777
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5777
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Hairong Kuang
>            Assignee: Jakob Homan
>
> One of our dfs clusters went into an unhealthy state, where many datanodes have non-zero bytes but no rack information. It turned out the ResolutionMonitor thread died on an exception. Here is the stack trace of the exception that caused the problem:
> ERROR org.apache.hadoop.fs.FSNamesystem: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>         at java.lang.String.substring(String.java:1938)
>         at java.lang.String.substring(String.java:1905)
>         at org.apache.hadoop.net.NetworkTopology$InnerNode.getNextAncestorName(NetworkTopology.java:119)
>         at org.apache.hadoop.net.NetworkTopology$InnerNode.add(NetworkTopology.java:153)
>         at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:329)
>         at org.apache.hadoop.dfs.FSNamesystem$ResolutionMonitor.run(FSNamesystem.java:1885)
>         at java.lang.Thread.run(Thread.java:619)
