Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2007/06/12 19:04:35 UTC
[jira] Commented: (HADOOP-1486) ReplicationMonitor thread goes away
[ https://issues.apache.org/jira/browse/HADOOP-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503926 ]
dhruba borthakur commented on HADOOP-1486:
------------------------------------------
The ReplicationMonitor thread hit a RuntimeException, as described in HADOOP-1232. The Namenode server threads do not catch RuntimeExceptions. The real fix is to find the cause of HADOOP-1232, but there are a few additional things we can do in this issue:
1. Make the namenode exit when a system thread encounters a RuntimeException. Have another daemon monitor the HDFS processes and restart them if they die.
2. Make the namenode enter safemode when a system thread encounters a RuntimeException.
3. Make the namenode exit when a system thread encounters a RuntimeException, and leave it down until an administrator intervenes manually.
I prefer option 1.
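A minimal Java sketch of option 1 follows. It assumes each namenode system thread's run loop can be wrapped in a guard that catches the RuntimeException instead of letting it silently end the thread. `SystemThreadGuard` and its `onFatal` callback are hypothetical names, not Hadoop APIs. The demo passes a recording callback rather than the exiting one, so it can run without killing the JVM.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/**
 * Hypothetical guard for a namenode system thread (e.g. ReplicationMonitor).
 * Instead of letting an unchecked exception end the thread unnoticed, it
 * reports the failure to a policy callback.
 */
public class SystemThreadGuard implements Runnable {
    private final String name;
    private final Runnable body;              // the thread's original run loop
    private final Consumer<Throwable> onFatal; // what to do when the loop dies

    public SystemThreadGuard(String name, Runnable body, Consumer<Throwable> onFatal) {
        this.name = name;
        this.body = body;
        this.onFatal = onFatal;
    }

    @Override
    public void run() {
        try {
            body.run();
        } catch (RuntimeException e) {
            // Without this catch the thread would simply terminate, leaving the
            // namenode up but with no replication work being scheduled.
            System.err.println(name + " died: " + e);
            onFatal.accept(e);
        }
    }

    /** Option 1 policy (assumed): exit so an external watchdog can restart the process. */
    public static Consumer<Throwable> exitProcess(int status) {
        return t -> System.exit(status);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        // Stand-in for the failing loop from the stack trace in this issue.
        Runnable monitor = () -> {
            throw new IllegalArgumentException("Unexpected non-existing data node");
        };
        // A recording policy is used here; a real namenode would pass exitProcess(1).
        Thread t = new Thread(new SystemThreadGuard("ReplicationMonitor", monitor, seen::set));
        t.start();
        t.join();
        System.out.println("policy invoked with: " + seen.get().getClass().getSimpleName());
    }
}
```

Swapping the callback for one that puts the namenode into safemode would give option 2 instead. Java's standard `Thread.setUncaughtExceptionHandler` is an alternative hook for the same purpose that does not require wrapping the loop.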
> ReplicationMonitor thread goes away
> ------------------------------------
>
> Key: HADOOP-1486
> URL: https://issues.apache.org/jira/browse/HADOOP-1486
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.12.3
> Reporter: Koji Noguchi
> Fix For: 0.14.0
>
>
> Saw many over/under replicated blocks in fsck output.
> .out file showed
> Exception in thread "org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor@2785982c" java.lang.IllegalArgumentException: Unexpected non-existing data node: /99.9.99.0/99.9.99.42:99999
> at org.apache.hadoop.net.NetworkTopology.checkArgument(NetworkTopology.java:379)
> at org.apache.hadoop.net.NetworkTopology.isOnSameRack(NetworkTopology.java:424)
> at org.apache.hadoop.dfs.FSNamesystem$ReplicationTargetChooser.chooseTarget(FSNamesystem.java:2853)
> at org.apache.hadoop.dfs.FSNamesystem$ReplicationTargetChooser.chooseTarget(FSNamesystem.java:2816)
> at org.apache.hadoop.dfs.FSNamesystem.pendingTransfers(FSNamesystem.java:2658)
> at org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1774)
> at org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1723)
> at java.lang.Thread.run(Thread.java:619)
> (same as HADOOP-1232)
> And, jstack showed no ReplicationMonitor thread.
--
This message is automatically generated by JIRA.