Posted to user@nutch.apache.org by AJ Chen <aj...@web2express.org> on 2010/09/27 00:19:38 UTC
updatedb fails
I fetched 3 segments and then ran updatedb with those 3 segments. The updatedb job
completes, but the crawldb is not updated (verified by checking URLs in the crawldb).
The lock file and the temp directory are still left in the crawldb directory.
Apparently updatedb stops before the merge is done. There is only one error
message:
2010-09-26 16:02:48,122 WARN mapred.TaskTracker - Error running child
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:226)
at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:67)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:1678)
at java.io.FilterInputStream.close(FilterInputStream.java:155)
at org.apache.hadoop.io.SequenceFile$Reader.close(SequenceFile.java:1584)
at org.apache.hadoop.mapred.SequenceFileRecordReader.close(SequenceFileRecordReader.java:125)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:198)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:362)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
What causes updatedb to stop?
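In the meantime, the leftover lock file and temp directory have to be cleared by hand before updatedb can be retried. A minimal sketch, simulated on the local filesystem; the crawldb path and the numeric temp-dir name here are assumptions (Nutch names its temp dir with a random number inside the crawldb, and its lock file ".locked"), and on HDFS the removals would be done with `hadoop fs -rmr` instead:

```shell
# Assumed crawldb location -- substitute your own.
CRAWLDB=./crawl/crawldb

# Simulate the leftovers from a failed updatedb run:
mkdir -p "$CRAWLDB"
touch "$CRAWLDB/.locked"            # stale lock left by the failed job
mkdir -p "$CRAWLDB/1285542000000"   # leftover temp dir (name is illustrative)

# Clean up so the next updatedb can take the lock again:
rm -f "$CRAWLDB/.locked"
rm -rf "$CRAWLDB/1285542000000"
ls -A "$CRAWLDB"                    # prints nothing: crawldb dir is clean
```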
The updatedb job status follows:
map() completion: 1.0
reduce() completion: 1.0
Counters: 18
Job Counters
Launched reduce tasks=8
Launched map tasks=171
Data-local map tasks=171
FileSystemCounters
FILE_BYTES_READ=31437553119
HDFS_BYTES_READ=17803591396
FILE_BYTES_WRITTEN=47532638653
HDFS_BYTES_WRITTEN=7460484375
Map-Reduce Framework
Reduce input groups=45926460
Combine output records=0
Map input records=139824153
Reduce shuffle bytes=16103425228
Reduce output records=45926460
Spilled Records=409880576
Map output bytes=15813496752
Map input bytes=17803440460
Combine input records=0
Map output records=137962810
Reduce input records=137962810
thanks
aj
--
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
twitter @web2express
Palo Alto, CA, USA
Re: updatedb fails
Posted by AJ Chen <aj...@web2express.org>.
I found that a similar "Filesystem closed" error has been reported several times
before on the nutch/hadoop mailing lists, but I could not find any explanation or
fix. Does anyone have an idea what might cause "Filesystem closed" during a mapred
operation? Note that my hadoop cluster has been running successfully for some
time; this fatal error started occurring only recently.
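One mechanism consistent with this message, though only a guess for this particular job: Hadoop's FileSystem.get() returns a cached instance shared by everything in the JVM that asks for the same URI and user, so if any one component calls close() on it, every other holder of the same handle starts failing with exactly this IOException("Filesystem closed"). The sketch below is plain Java with no Hadoop dependency; all class names are made up to demonstrate the shared-handle pitfall, not Hadoop's actual code:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cached DFS client. Once closed, every
// later call fails with the same message seen in the stack trace above.
class SharedClient implements Closeable {
    private boolean open = true;
    void checkOpen() throws IOException {
        if (!open) throw new IOException("Filesystem closed");
    }
    String read() throws IOException { checkOpen(); return "data"; }
    @Override public void close() { open = false; }
}

// Analogue of the FileSystem cache: the same key yields the SAME instance.
class ClientCache {
    private static final Map<String, SharedClient> CACHE = new HashMap<>();
    static SharedClient get(String key) {
        return CACHE.computeIfAbsent(key, k -> new SharedClient());
    }
}

public class FsCacheDemo {
    public static void main(String[] args) {
        SharedClient a = ClientCache.get("hdfs://nn");
        SharedClient b = ClientCache.get("hdfs://nn"); // same object as `a`
        a.close(); // one task closes the shared handle...
        try {
            b.read(); // ...and an unrelated task now fails
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The fix, under this assumption, is to find whatever is calling close() on the shared FileSystem and let the JVM shutdown hook close it instead.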
-aj
On Sun, Sep 26, 2010 at 3:19 PM, AJ Chen <aj...@web2express.org> wrote:
> [snip -- full original message quoted above]