Posted to user@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/07/05 14:06:15 UTC
Re: Nutch CrawlDbReader -stats gives EOFException error on hadoop
Hi Viksit,
It's a known issue now: https://issues.apache.org/jira/browse/NUTCH-1029
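
For anyone hitting this before a fix ships: the stack trace points at SequenceFileOutputFormat.getReaders() failing inside SequenceFile$Reader.init(), which is what happens when it tries to open a file in the job output directory that is not a sequence file — typically the empty _SUCCESS marker that newer Hadoop versions write on job completion. Deleting that marker, or setting mapreduce.fileoutputcommitter.marksuccessfuljobs to false, usually works around it. The underlying remedy is to skip Hadoop bookkeeping files before opening readers; a minimal sketch of such a filter in plain Java (class and method names are mine for illustration, not the actual Nutch patch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: filter out Hadoop bookkeeping files (e.g. the
// empty _SUCCESS marker, _logs, .crc checksums) so only real data
// files are handed to SequenceFile readers.
public class DataFileFilter {

    // Hadoop convention: hidden/bookkeeping files start with '_' or '.'.
    public static boolean isDataFile(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    public static List<String> keepDataFiles(List<String> names) {
        List<String> result = new ArrayList<String>();
        for (String name : names) {
            if (isDataFile(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> out = keepDataFiles(Arrays.asList(
            "part-00000", "part-00001", "_SUCCESS", "_logs", ".part-00000.crc"));
        System.out.println(out); // prints [part-00000, part-00001]
    }
}
```

In Hadoop itself the same idea is expressed as a PathFilter passed to FileSystem.listStatus(); the sketch above just shows the naming rule.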
Cheers,
On Thursday 12 May 2011 22:10:12 Viksit Gaur wrote:
> Hi all,
>
> When I try to run Nutch's CrawlDbReader to get stats for my crawl
> database via Hadoop, I get the error below.
>
> Is this a known issue?
>
> Thanks,
> Viksit
>
>
> sudo -u hdfs hadoop jar /opt/nutch-build/build/nutch-1.2.job org.apache.nutch.crawl.CrawlDbReader /crawl/crawl-dir-1305167589/crawldb -stats
> 11/05/12 19:48:08 INFO crawl.CrawlDbReader: CrawlDb statistics start: /crawl/crawl-dir-1305167589/crawldb
> 11/05/12 19:48:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 11/05/12 19:48:09 INFO mapred.FileInputFormat: Total input paths to process : 10
> 11/05/12 19:48:09 INFO mapred.JobClient: Running job: job_201105120113_0202
> 11/05/12 19:48:10 INFO mapred.JobClient: map 0% reduce 0%
> 11/05/12 19:48:18 INFO mapred.JobClient: map 10% reduce 0%
> 11/05/12 19:48:19 INFO mapred.JobClient: map 20% reduce 0%
> 11/05/12 19:48:20 INFO mapred.JobClient: map 30% reduce 0%
> 11/05/12 19:48:23 INFO mapred.JobClient: map 40% reduce 0%
> 11/05/12 19:48:24 INFO mapred.JobClient: map 50% reduce 0%
> 11/05/12 19:48:25 INFO mapred.JobClient: map 60% reduce 0%
> 11/05/12 19:48:27 INFO mapred.JobClient: map 70% reduce 0%
> 11/05/12 19:48:28 INFO mapred.JobClient: map 80% reduce 0%
> 11/05/12 19:48:30 INFO mapred.JobClient: map 90% reduce 0%
> 11/05/12 19:48:31 INFO mapred.JobClient: map 100% reduce 0%
> 11/05/12 19:52:22 INFO mapred.JobClient: map 100% reduce 3%
> 11/05/12 19:52:23 INFO mapred.JobClient: map 100% reduce 10%
> 11/05/12 19:52:38 INFO mapred.JobClient: map 100% reduce 13%
> 11/05/12 19:52:39 INFO mapred.JobClient: map 100% reduce 20%
> 11/05/12 19:52:48 INFO mapred.JobClient: map 100% reduce 30%
> 11/05/12 19:53:01 INFO mapred.JobClient: map 100% reduce 33%
> 11/05/12 19:53:02 INFO mapred.JobClient: map 100% reduce 40%
> 11/05/12 19:53:20 INFO mapred.JobClient: map 100% reduce 43%
> 11/05/12 19:53:21 INFO mapred.JobClient: map 100% reduce 50%
> 11/05/12 19:53:36 INFO mapred.JobClient: map 100% reduce 53%
> 11/05/12 19:53:38 INFO mapred.JobClient: map 100% reduce 60%
> 11/05/12 19:53:44 INFO mapred.JobClient: map 100% reduce 63%
> 11/05/12 19:53:46 INFO mapred.JobClient: map 100% reduce 70%
> 11/05/12 19:53:54 INFO mapred.JobClient: map 100% reduce 73%
> 11/05/12 19:53:55 INFO mapred.JobClient: map 100% reduce 80%
> 11/05/12 19:53:57 INFO mapred.JobClient: map 100% reduce 90%
> 11/05/12 19:54:05 INFO mapred.JobClient: map 100% reduce 100%
> 11/05/12 19:54:07 INFO mapred.JobClient: Job complete: job_201105120113_0202
> 11/05/12 19:54:07 INFO mapred.JobClient: Counters: 23
> 11/05/12 19:54:07 INFO mapred.JobClient: Job Counters
> 11/05/12 19:54:07 INFO mapred.JobClient: Launched reduce tasks=10
> 11/05/12 19:54:07 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=46180
> 11/05/12 19:54:07 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
> 11/05/12 19:54:07 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
> 11/05/12 19:54:07 INFO mapred.JobClient: Launched map tasks=10
> 11/05/12 19:54:07 INFO mapred.JobClient: Data-local map tasks=10
> 11/05/12 19:54:07 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=87373
> 11/05/12 19:54:07 INFO mapred.JobClient: FileSystemCounters
> 11/05/12 19:54:07 INFO mapred.JobClient: FILE_BYTES_READ=34517
> 11/05/12 19:54:07 INFO mapred.JobClient: HDFS_BYTES_READ=111602383
> 11/05/12 19:54:07 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1395398
> 11/05/12 19:54:07 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1871
> 11/05/12 19:54:07 INFO mapred.JobClient: Map-Reduce Framework
> 11/05/12 19:54:07 INFO mapred.JobClient: Reduce input groups=49
> 11/05/12 19:54:07 INFO mapred.JobClient: Combine output records=219
> 11/05/12 19:54:07 INFO mapred.JobClient: Map input records=808925
> 11/05/12 19:54:07 INFO mapred.JobClient: Reduce shuffle bytes=3161
> 11/05/12 19:54:07 INFO mapred.JobClient: Reduce output records=49
> 11/05/12 19:54:07 INFO mapred.JobClient: Spilled Records=657
> 11/05/12 19:54:07 INFO mapred.JobClient: Map output bytes=42873025
> 11/05/12 19:54:07 INFO mapred.JobClient: Map input bytes=111599813
> 11/05/12 19:54:07 INFO mapred.JobClient: Combine input records=3235700
> 11/05/12 19:54:07 INFO mapred.JobClient: Map output records=3235700
> 11/05/12 19:54:07 INFO mapred.JobClient: SPLIT_RAW_BYTES=1710
> 11/05/12 19:54:07 INFO mapred.JobClient: Reduce input records=219
> Exception in thread "main" java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at java.io.DataInputStream.readFully(DataInputStream.java:152)
> at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1465)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1437)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
> at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:89)
> at org.apache.nutch.crawl.CrawlDbReader.processStatJob(CrawlDbReader.java:320)
> at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:502)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350