Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2009/05/06 02:24:30 UTC
[jira] Created: (HADOOP-5777) ResolutionMointor dies on an exception
ResolutionMointor dies on an exception
--------------------------------------
Key: HADOOP-5777
URL: https://issues.apache.org/jira/browse/HADOOP-5777
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Affects Versions: 0.18.3
Reporter: Hairong Kuang
One of our dfs clusters went into an unhealthy state, where many datanodes have non-zero bytes but no rack information. It turned out the ResolutionMoinitor thread dies on an exception. Here is the stack trace of the exception that caused the problem:
ERROR org.apache.hadoop.fs.FSNamesystem: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1938)
at java.lang.String.substring(String.java:1905)
at org.apache.hadoop.net.NetworkTopology$InnerNode.getNextAncestorName(NetworkTopology.java:119)
at org.apache.hadoop.net.NetworkTopology$InnerNode.add(NetworkTopology.java:153)
at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:329)
at org.apache.hadoop.dfs.FSNamesystem$ResolutionMonitor.run(FSNamesystem.java:1885)
at java.lang.Thread.run(Thread.java:619)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-5777) ResolutionMonitor dies on an exception
Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jakob Homan resolved HADOOP-5777.
---------------------------------
Resolution: Won't Fix
Closing issue.
[jira] Updated: (HADOOP-5777) ResolutionMonitor dies on an exception
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-5777:
----------------------------------
Description:
One of our dfs clusters went into an unhealthy state, where many datanodes have non-zero bytes but no rack information. It turned out the ResolutionMonitor thread died on an exception. Here is the stack trace of the exception that caused the problem:
ERROR org.apache.hadoop.fs.FSNamesystem: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1938)
at java.lang.String.substring(String.java:1905)
at org.apache.hadoop.net.NetworkTopology$InnerNode.getNextAncestorName(NetworkTopology.java:119)
at org.apache.hadoop.net.NetworkTopology$InnerNode.add(NetworkTopology.java:153)
at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:329)
at org.apache.hadoop.dfs.FSNamesystem$ResolutionMonitor.run(FSNamesystem.java:1885)
at java.lang.Thread.run(Thread.java:619)
was:
One of our dfs clusters went into an unhealthy state, where many datanodes have non-zero bytes but no rack information. It turned out the ResolutionMoinitor thread dies on an exception. Here is the stack trace of the exception that caused the problem:
ERROR org.apache.hadoop.fs.FSNamesystem: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1938)
at java.lang.String.substring(String.java:1905)
at org.apache.hadoop.net.NetworkTopology$InnerNode.getNextAncestorName(NetworkTopology.java:119)
at org.apache.hadoop.net.NetworkTopology$InnerNode.add(NetworkTopology.java:153)
at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:329)
at org.apache.hadoop.dfs.FSNamesystem$ResolutionMonitor.run(FSNamesystem.java:1885)
at java.lang.Thread.run(Thread.java:619)
Affects Version/s: (was: 0.18.3)
0.18.0
Summary: ResolutionMonitor dies on an exception (was: ResolutionMointor dies on an exception)
[jira] Commented: (HADOOP-5777) ResolutionMonitor dies on an exception
Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707061#action_12707061 ]
Jakob Homan commented on HADOOP-5777:
-------------------------------------
Hairong and I determined that the issue was caused by a race condition: many nodes with the same storage ID registered at the same time (they came from cloned drives, which should not normally happen), and the ResolutionMonitor was not properly synchronized. As a result, the network location for a particular node was reset to UNRESOLVED (the empty string, "") before the node was passed to add, which caused the substring call to fail.
Since the ResolutionMonitor has now been removed, it is not worth fixing; I will close this as Won't Fix.
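The failure mode described above can be illustrated with a minimal sketch. It is not the Hadoop source: the class EmptyLocationDemo, the helper stripParentPrefix, and the parent path "/" are hypothetical stand-ins, assuming only that the topology code strips a parent path prefix from a node's network location with substring.

```java
class EmptyLocationDemo {
    // Stand-in for the UNRESOLVED marker (empty string) mentioned above.
    static final String UNRESOLVED = "";

    // Assumed, simplified model of stripping a parent path prefix from a
    // node's network location. Not the actual NetworkTopology method.
    static String stripParentPrefix(String location, String parentPath) {
        return location.substring(parentPath.length());
    }

    // Reports whether the unguarded substring call throws for this location.
    static boolean failsOn(String location, String parentPath) {
        try {
            stripParentPrefix(location, parentPath);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A resolved location is longer than the parent path, so it is fine.
        System.out.println("resolved location fails:   " + failsOn("/rack1", "/"));
        // An empty location is shorter than the parent path, so substring throws.
        System.out.println("unresolved location fails: " + failsOn(UNRESOLVED, "/"));
    }
}
```

In this sketch the begin index (the length of the parent path) exceeds the length of the empty location, which is the condition under which String.substring throws. The exact exception message depends on the Java version.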
[jira] Assigned: (HADOOP-5777) ResolutionMonitor dies on an exception
Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jakob Homan reassigned HADOOP-5777:
-----------------------------------
Assignee: Jakob Homan
[jira] Commented: (HADOOP-5777) ResolutionMointor dies on an exception
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706269#action_12706269 ]
Hairong Kuang commented on HADOOP-5777:
---------------------------------------
ResolutionMonitor was removed by HADOOP-3620 in 0.19, so it may not be worth fixing the dead thread problem. But it is still worth figuring out what caused the StringIndexOutOfBoundsException.