You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Bryan A. P. Pendleton" <bp...@geekdom.net> on 2007/01/17 19:58:27 UTC

Bug in today's DFS code?

I'm not sure what to make of it, but restarting my cluster running today's
checkout of code gives me a terrible state. Some random fraction of my
datanodes seem to be showing up as having no blocks, probably because of a
NPE during the IPC processing block reports. My logs are riddled with
sequences like these:

2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.addStoredBlock: Redundant addStoredBlock request received for
blk_-2540546879629459821 on hostXX:50010
2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.addStoredBlock: Redundant addStoredBlock request received for
blk_-2088395555965573137 on hostXX:50010
2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.addStoredBlock: Redundant addStoredBlock request received for
blk_-1801448787921913203 on hostXX:50010
2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.addStoredBlock: Redundant addStoredBlock request received for
blk_-1732709916964926315 on hostXX:50010
2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.addStoredBlock: Redundant addStoredBlock request received for
blk_-1582025333970910026 on hostXX:50010
2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.addStoredBlock: Redundant addStoredBlock request received for
blk_-869004184333473970 on hostXX:50010
2007-01-17 10:55:39,716 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 9000 call error: java.io.IOException:
java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException


-- 
Bryan A. P. Pendleton
Ph: (877) geek-1-bp

Re: Bug in today's DFS code?

Posted by "Bryan A. P. Pendleton" <bp...@geekdom.net>.
Oh, of course, nodes that have the error are logging this once or twice a
second:

2007-01-16 03:02:53,239 WARN org.apache.hadoop.dfs.DataNode:
org.apache.hadoop.ipc.RemoteException: java.io.IOException:
java.lang.NullPointerException

        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:258)


I'm ill equipped to live-debug IPC problems, and the logging of exceptions
isn't exceptionally helpful at this point. Why does this logging of the
RemoteException, for instance, only show that it occured at the line that
error handling happens in Client.java? The actual point where the exception
is thrown is lost, making debugging a whole lot harder.


On 1/17/07, Bryan A. P. Pendleton <bp...@geekdom.net> wrote:
>
> I'm not sure what to make of it, but restarting my cluster running today's
> checkout of code gives me a terrible state. Some random fraction of my
> datanodes seem to be showing up as having no blocks, probably because of a
> NPE during the IPC processing block reports. My logs are riddled with
> sequences like these:
>
> 2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: Redundant addStoredBlock request received for
> blk_-2540546879629459821 on hostXX:50010
> 2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: Redundant addStoredBlock request received for
> blk_-2088395555965573137 on hostXX:50010
> 2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange : BLOCK*
> NameSystem.addStoredBlock: Redundant addStoredBlock request received for
> blk_-1801448787921913203 on hostXX:50010
> 2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.addStoredBlock : Redundant addStoredBlock request received for
> blk_-1732709916964926315 on hostXX:50010
> 2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: Redundant addStoredBlock request received for
> blk_-1582025333970910026 on hostXX:50010
> 2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: Redundant addStoredBlock request received for
> blk_-869004184333473970 on hostXX:50010
> 2007-01-17 10:55:39,716 INFO org.apache.hadoop.ipc.Server : IPC Server
> handler 6 on 9000 call error: java.io.IOException:
> java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
>
>

-- 
Bryan A. P. Pendleton
Ph: (877) geek-1-bp

Re: Bug in today's DFS code?

Posted by Doug Cutting <cu...@apache.org>.
Bryan A. P. Pendleton wrote:
> I'm not sure what to make of it, but restarting my cluster running today's
> checkout of code gives me a terrible state. Some random fraction of my
> datanodes seem to be showing up as having no blocks, probably because of a
> NPE during the IPC processing block reports. My logs are riddled with
> sequences like these:
> 
> 2007-01-17 10:55:39,716 WARN org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: Redundant addStoredBlock request received for
> blk_-2540546879629459821 on hostXX:50010

FYI, a bug has been filed to track this:

https://issues.apache.org/jira/browse/HADOOP-898

Doug