Posted to common-user@hadoop.apache.org by Barry Haddow <bh...@inf.ed.ac.uk> on 2008/01/30 11:48:37 UTC

datanode errors during nutch crawl

Hi

I'm trying to set up a Nutch crawl using Hadoop, but the crawl usually stops 
at depth 0 (occasionally it gets as far as depth 1). It should continue to 
depth 3.
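
For what it's worth, I'm launching the crawl with the standard crawl command, 
something like the following (the URL directory and output directory names 
here are just placeholders for my actual paths):

  bin/nutch crawl urls -dir crawl -depth 3 -topN 50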

I think the problem may be in Hadoop, since I'm seeing various errors in the 
datanode log files, such as:

2008-01-30 10:27:51,487 WARN  dfs.DataNode - Failed to transfer 
blk_3160625876530276979 to 129.215.164.52:51010 got java.net.SocketException: 
Connection reset

I can telnet to this IP/port, so I don't think it's firewalled.
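(The check was just a plain telnet to the address from the log message above, 
e.g.

  telnet 129.215.164.52 51010

and the connection opens fine.)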

Also:
2008-01-30 10:27:17,157 ERROR dfs.DataNode - DataXceiver: java.io.IOException: 
Block blk_-3070006959369401863 has already been started (though not 
completed), and thus cannot be created.
2008-01-30 10:27:56,217 ERROR dfs.DataNode - DataXceiver: java.io.IOException: 
Block blk_-712543843244766261 is valid, and cannot be written to.
2008-01-30 10:34:59,510 ERROR dfs.DataNode - DataXceiver: java.io.EOFException

I assume these errors are indicative of some problem in my Hadoop 
configuration, but I can't see what.
I'm using Hadoop 0.15.0, as distributed with Nutch 2008-01-25.
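
In case it helps, my hadoop-site.xml overrides are along these lines (the 
hostname "master" and the values shown are placeholders rather than my exact 
settings):

  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>master:9000</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>master:9001</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
  </configuration>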

Any suggestions?
thanks and regards
Barry