Posted to user@nutch.apache.org by AJ Chen <aj...@web2express.org> on 2010/09/27 00:19:38 UTC

updatedb fails

I fetched 3 segments and then ran updatedb on those 3 segments. The updatedb
job completes, but the crawldb is not updated (verified by checking URLs in
the crawldb). The lock file and temp directory are still sitting in the
crawldb directory, so apparently updatedb stops before the final merge/install
step is done (a sketch of that lock-then-install pattern is below). There is
only one error message:

2010-09-26 16:02:48,122 WARN  mapred.TaskTracker - Error running child
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:226)
        at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:67)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:1678)
        at java.io.FilterInputStream.close(FilterInputStream.java:155)
        at org.apache.hadoop.io.SequenceFile$Reader.close(SequenceFile.java:1584)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.close(SequenceFileRecordReader.java:125)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:198)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:362)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

What causes updatedb to stop?
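
For context on why the lock file and temp directory get left behind: Nutch's
updatedb follows a lock-then-install pattern. It creates a lock file in the
crawldb directory, lets the MapReduce job write the merged db into a temporary
directory, and only swaps the temp output in over the current db (and removes
the lock) once the job has succeeded. Here is a minimal sketch of that pattern
against the Hadoop FileSystem API; the class name and the exact file names
(".locked", "current", the temp suffix) are illustrative, not necessarily
Nutch's internals:

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LockThenInstall {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path crawlDb = new Path("crawl/crawldb");
    Path lock = new Path(crawlDb, ".locked");
    // hypothetical temp dir name; the real one is randomized similarly
    Path tmp = new Path(crawlDb, "crawldb-" + Math.abs(new Random().nextInt()));

    // take the lock before starting the job
    if (!fs.createNewFile(lock)) {
      throw new IOException("crawldb " + crawlDb + " is already locked");
    }

    runUpdateJob(conf, tmp);  // the MapReduce job writes the merged db into tmp

    // install step: swap tmp in over "current", then release the lock;
    // if the job fails before this point, the lock file and tmp dir remain
    Path current = new Path(crawlDb, "current");
    Path old = new Path(crawlDb, "old");
    if (fs.exists(old)) fs.delete(old, true);
    if (fs.exists(current)) fs.rename(current, old);
    fs.rename(tmp, current);
    fs.delete(lock, false);
  }

  // placeholder for the actual CrawlDb update job submission
  private static void runUpdateJob(Configuration conf, Path out) throws IOException {}
}

If updatedb dies before the install step, the leftover lock file and temp
directory have to be removed by hand before the next run can take the lock
again.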

The updatedb job status is here:
map() completion: 1.0
reduce() completion: 1.0
Counters: 18
        Job Counters
                Launched reduce tasks=8
                Launched map tasks=171
                Data-local map tasks=171
        FileSystemCounters
                FILE_BYTES_READ=31437553119
                HDFS_BYTES_READ=17803591396
                FILE_BYTES_WRITTEN=47532638653
                HDFS_BYTES_WRITTEN=7460484375
        Map-Reduce Framework
                Reduce input groups=45926460
                Combine output records=0
                Map input records=139824153
                Reduce shuffle bytes=16103425228
                Reduce output records=45926460
                Spilled Records=409880576
                Map output bytes=15813496752
                Map input bytes=17803440460
                Combine input records=0
                Map output records=137962810
                Reduce input records=137962810

thanks
aj
-- 
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
twitter @web2express
Palo Alto, CA, USA

Re: updatedb fails

Posted by AJ Chen <aj...@web2express.org>.
I found that a similar "Filesystem closed" error has been reported several
times before on the Nutch and Hadoop mailing lists, but I could not find any
explanation or fix. Does anyone have an idea what might cause "Filesystem
closed" during a mapred operation? Note that my Hadoop cluster had been
running successfully for some time; this fatal error only started occurring
recently.
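
One well-known Hadoop behavior that can produce exactly this error, offered as
a hypothesis rather than a confirmed diagnosis for this cluster:
FileSystem.get(conf) returns a cached instance that is shared by all code in
the same JVM talking to the same filesystem URI. If anything (a plugin, a
mapper/reducer cleanup method, a shutdown hook) calls close() on it, the one
shared instance is closed for everyone, and every later HDFS call in that JVM
fails in DFSClient.checkOpen with "java.io.IOException: Filesystem closed",
which is exactly the top frame of the stack trace above. A minimal
demonstration of the sharing (the class name is illustrative, and the failing
call assumes the default filesystem is HDFS):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCacheDemo {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();

    // FileSystem.get() caches instances per scheme/authority,
    // so both references point at the same object.
    FileSystem a = FileSystem.get(conf);
    FileSystem b = FileSystem.get(conf);
    System.out.println("same instance: " + (a == b));  // prints true

    a.close();  // closes the one shared instance for the whole JVM

    // on HDFS this now throws java.io.IOException: Filesystem closed
    b.exists(new Path("/"));
  }
}

If that is what is happening here, the fix is to find and remove the close()
call; Hadoop closes cached FileSystem instances itself in a JVM shutdown hook,
so user code normally should not close them.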
-aj
