Posted to common-user@hadoop.apache.org by Barry Haddow <bh...@inf.ed.ac.uk> on 2008/01/30 11:48:37 UTC
datanode errors during nutch crawl
Hi
I'm trying to set up a Nutch crawl using Hadoop, but the crawl usually stops
at depth 0 (occasionally it reaches depth 1), when it should continue to
depth 3.
I think the problem may be in Hadoop itself, since I'm seeing various errors
in the datanode log files, such as:
2008-01-30 10:27:51,487 WARN dfs.DataNode - Failed to transfer
blk_3160625876530276979 to 129.215.164.52:51010 got java.net.SocketException:
Connection reset
I can telnet to this IP/port, so I don't think it's firewalled.
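(For anyone scripting the same check across a cluster: here is a minimal sketch of what the telnet test verifies. Note it only confirms the TCP handshake succeeds; a "Connection reset" like the one in the log above happens mid-transfer, so an open port doesn't rule it out. The host/port below are just the ones from the log.)

```python
import socket

def probe(host, port, timeout=5.0):
    """Attempt a TCP connect, like `telnet host port`, and report the outcome.

    A successful connect only proves something is listening; it does not
    rule out resets later in the stream (as in the DataNode transfer error).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        return "refused"   # port reachable but nothing listening
    except socket.timeout:
        return "timeout"   # packets likely dropped (firewall?)
    except OSError as e:
        return f"error: {e}"
    finally:
        s.close()

# e.g. probe("129.215.164.52", 51010) for the datanode in the log above
```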
Also:
2008-01-30 10:27:17,157 ERROR dfs.DataNode - DataXceiver: java.io.IOException:
Block blk_-3070006959369401863 has already been started (though not
completed), and thus cannot be created.
2008-01-30 10:27:56,217 ERROR dfs.DataNode - DataXceiver: java.io.IOException:
Block blk_-712543843244766261 is valid, and cannot be written to.
2008-01-30 10:34:59,510 ERROR dfs.DataNode - DataXceiver: java.io.EOFException
I assume these errors indicate some problem in my Hadoop configuration, but I
can't see what it is.
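(Before digging into the config, it may be worth running the stock HDFS diagnostics to see whether the namenode thinks any datanodes are dead or any blocks are under-replicated; these commands need a running cluster, so they are a sketch rather than something to copy blindly:)

```shell
# Summarize datanode status: capacity, live/dead nodes, last contact times.
bin/hadoop dfsadmin -report

# Walk the namespace and flag missing, corrupt, or under-replicated blocks.
bin/hadoop fsck / -files -blocks -locations
```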
I'm using Hadoop 0.15.0, as distributed with the Nutch build of 2008-01-25.
Any suggestions?
Thanks and regards,
Barry