Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2009/05/06 02:24:30 UTC

[jira] Created: (HADOOP-5777) ResolutionMonitor dies on an exception

ResolutionMonitor dies on an exception
--------------------------------------

                 Key: HADOOP-5777
                 URL: https://issues.apache.org/jira/browse/HADOOP-5777
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.18.3
            Reporter: Hairong Kuang


One of our dfs clusters went into an unhealthy state, where many datanodes have non-zero bytes but no rack information. It turned out the ResolutionMonitor thread died on an exception. Here is the stack trace of the exception that caused the problem:

ERROR org.apache.hadoop.fs.FSNamesystem: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(String.java:1938)
        at java.lang.String.substring(String.java:1905)
        at org.apache.hadoop.net.NetworkTopology$InnerNode.getNextAncestorName(NetworkTopology.java:119)
        at org.apache.hadoop.net.NetworkTopology$InnerNode.add(NetworkTopology.java:153)
        at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:329)
        at org.apache.hadoop.dfs.FSNamesystem$ResolutionMonitor.run(FSNamesystem.java:1885)
        at java.lang.Thread.run(Thread.java:619)



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HADOOP-5777) ResolutionMonitor dies on an exception

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan resolved HADOOP-5777.
---------------------------------

    Resolution: Won't Fix

Closing issue.



[jira] Updated: (HADOOP-5777) ResolutionMonitor dies on an exception

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5777:
----------------------------------

          Description: 
One of our dfs clusters went into an unhealthy state, where many datanodes have non-zero bytes but no rack information. It turned out the ResolutionMonitor thread died on an exception. Here is the stack trace of the exception that caused the problem:

ERROR org.apache.hadoop.fs.FSNamesystem: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(String.java:1938)
        at java.lang.String.substring(String.java:1905)
        at org.apache.hadoop.net.NetworkTopology$InnerNode.getNextAncestorName(NetworkTopology.java:119)
        at org.apache.hadoop.net.NetworkTopology$InnerNode.add(NetworkTopology.java:153)
        at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:329)
        at org.apache.hadoop.dfs.FSNamesystem$ResolutionMonitor.run(FSNamesystem.java:1885)
        at java.lang.Thread.run(Thread.java:619)



  was:
One of our dfs clusters went into an unhealthy state, where many datanodes have non-zero bytes but no rack information. It turned out the ResolutionMoinitor thread dies on an exception. Here is the stack trace of the exception that caused the problem:

ERROR org.apache.hadoop.fs.FSNamesystem: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(String.java:1938)
        at java.lang.String.substring(String.java:1905)
        at org.apache.hadoop.net.NetworkTopology$InnerNode.getNextAncestorName(NetworkTopology.java:119)
        at org.apache.hadoop.net.NetworkTopology$InnerNode.add(NetworkTopology.java:153)
        at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:329)
        at org.apache.hadoop.dfs.FSNamesystem$ResolutionMonitor.run(FSNamesystem.java:1885)
        at java.lang.Thread.run(Thread.java:619)



    Affects Version/s:     (was: 0.18.3)
                       0.18.0
              Summary: ResolutionMonitor dies on an exception  (was: ResolutionMointor dies on an exception)



[jira] Commented: (HADOOP-5777) ResolutionMonitor dies on an exception

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707061#action_12707061 ] 

Jakob Homan commented on HADOOP-5777:
-------------------------------------

Hairong and I determined that the issue was caused by a race condition: many nodes with the same storage ID registered at the same time (they came from cloned drives, which should not normally happen), and the ResolutionMonitor was not properly synchronized. The network location for a particular node is reset to UNRESOLVED (the empty string, "") before being passed to add, which causes the substring call to fail.

Since the ResolutionMonitor has now been removed, it is not worth fixing, so I will close this as Won't Fix.
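
To illustrate the failure mode described above, here is a minimal sketch of how an empty network location can make a substring call throw. It is not the Hadoop implementation: the class EmptyLocationDemo and the helpers nextAncestorName and nextAncestorNameChecked are hypothetical names made up for this example, and the path layout ("/dc1/rack1") is an assumption.

```java
class EmptyLocationDemo {
    // Hypothetical helper, not the Hadoop code: returns the first path
    // component of `location` that lies below `parentPath`.
    static String nextAncestorName(String parentPath, String location) {
        // Skip the parent's path and the separator that follows it.
        // For an empty location this begin index is past the end of the string.
        String rest = location.substring(parentPath.length() + 1);
        int sep = rest.indexOf('/');
        return sep == -1 ? rest : rest.substring(0, sep);
    }

    // Same helper with an explicit guard for an unresolved (empty) location.
    static String nextAncestorNameChecked(String parentPath, String location) {
        if (location.length() <= parentPath.length()) {
            throw new IllegalArgumentException(
                "unresolved or too-short location: \"" + location + "\"");
        }
        return nextAncestorName(parentPath, location);
    }

    public static void main(String[] args) {
        // A resolved location works as expected.
        System.out.println(nextAncestorName("", "/dc1/rack1"));

        // An unresolved (empty) location makes substring throw.
        try {
            nextAncestorName("", "");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded call failed: " + e.getMessage());
        }

        // The guarded variant reports the bad input explicitly instead.
        try {
            nextAncestorNameChecked("", "");
        } catch (IllegalArgumentException e) {
            System.out.println("guarded call rejected input: " + e.getMessage());
        }
    }
}
```

The exact exception message depends on the JVM version, so the sketch only relies on the exception type. The guard is one possible defensive choice; it does not address the missing synchronization that allowed the location to be reset in the first place.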



[jira] Assigned: (HADOOP-5777) ResolutionMonitor dies on an exception

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan reassigned HADOOP-5777:
-----------------------------------

    Assignee: Jakob Homan



[jira] Commented: (HADOOP-5777) ResolutionMonitor dies on an exception

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706269#action_12706269 ] 

Hairong Kuang commented on HADOOP-5777:
---------------------------------------

ResolutionMonitor was removed by HADOOP-3620 in 0.19, so it may not be worth fixing the dead thread problem. But it is still worth figuring out what caused the StringIndexOutOfBoundsException.
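
For reference, the dead thread problem can be sketched as follows: a background loop that lets a RuntimeException escape terminates its thread, while a loop that catches the failure per item keeps running. This is only an illustration under assumptions, not the ResolutionMonitor code: the class ResilientMonitorDemo and the methods firstComponent and drain are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ResilientMonitorDemo {
    // Hypothetical per-node step: returns the first path component of a
    // location such as "/dc1/rack1". Throws on an empty location.
    static String firstComponent(String location) {
        String rest = location.substring(1);
        int sep = rest.indexOf('/');
        return sep == -1 ? rest : rest.substring(0, sep);
    }

    // Processes every pending location. A failure on one entry is recorded
    // and skipped, so the loop (and the thread running it) survives.
    static List<String> drain(List<String> pending, List<String> failed) {
        List<String> resolved = new ArrayList<>();
        for (String location : pending) {
            try {
                resolved.add(firstComponent(location));
            } catch (RuntimeException e) {
                failed.add(location);
            }
        }
        return resolved;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> pending = Arrays.asList("/dc1/rack1", "", "/dc2/rack7");
        List<String> failed = new ArrayList<>();
        // Run the loop on a separate thread, as a monitor would.
        Thread monitor = new Thread(() ->
            System.out.println("resolved=" + drain(pending, failed)
                + " failedCount=" + failed.size()));
        monitor.start();
        monitor.join();
    }
}
```

Catching the exception per item trades fail-fast behavior for availability: the bad entry still needs to be logged or retried, otherwise the underlying cause stays hidden.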
