Posted to user@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/07/05 14:06:15 UTC

Re: Nutch CrawlDbReader -stats gives EOFException error on hadoop

Hi Viksit,

It's a known issue now: https://issues.apache.org/jira/browse/NUTCH-1029
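
Until that's fixed, one possible workaround is to read the stats output yourself
and skip anything in the job's output directory that is not a "part-*"
SequenceFile. Below is a minimal, untested sketch: it assumes the EOFException
comes from a non-SequenceFile (e.g. a _SUCCESS marker or a _logs directory)
being opened by SequenceFileOutputFormat.getReaders(), and the class name,
helper method and key/value types are illustrative, not actual Nutch code.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PartFileStatsReader {

  // Open SequenceFile readers only for "part-*" files, skipping markers
  // such as _SUCCESS or _logs that are not SequenceFiles.
  static void readPartFiles(Configuration conf, Path outDir) throws IOException {
    FileSystem fs = outDir.getFileSystem(conf);
    for (FileStatus status : fs.listStatus(outDir)) {
      Path p = status.getPath();
      if (status.isDir() || !p.getName().startsWith("part-")) {
        continue; // ignore _SUCCESS, _logs and subdirectories
      }
      SequenceFile.Reader reader = new SequenceFile.Reader(fs, p, conf);
      try {
        // Assumed key/value types for the -stats output; adjust if they differ.
        Text key = new Text();
        LongWritable value = new LongWritable();
        while (reader.next(key, value)) {
          System.out.println(key + "\t" + value);
        }
      } finally {
        reader.close();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    readPartFiles(new Configuration(), new Path(args[0]));
  }
}

The point is only to show that filtering out non-part files avoids handing
something like _SUCCESS to SequenceFile$Reader.init, which is where the
EOFException in your trace is thrown.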

Cheers,


On Thursday 12 May 2011 22:10:12 Viksit Gaur wrote:
> Hi all,
> 
> When trying to run Nutch's CrawlDbReader to get stats for my crawl
> database, I get the following error when invoking it via Hadoop.
> 
> Is this a known issue?
> 
> Thanks,
> Viksit
> 
> 
> sudo -u hdfs hadoop jar /opt/nutch-build/build/nutch-1.2.job org.apache.nutch.crawl.CrawlDbReader /crawl/crawl-dir-1305167589/crawldb -stats
> 
> 11/05/12 19:48:08 INFO crawl.CrawlDbReader: CrawlDb statistics start: /crawl/crawl-dir-1305167589/crawldb
> 11/05/12 19:48:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 11/05/12 19:48:09 INFO mapred.FileInputFormat: Total input paths to process : 10
> 11/05/12 19:48:09 INFO mapred.JobClient: Running job: job_201105120113_0202
> 11/05/12 19:48:10 INFO mapred.JobClient:  map 0% reduce 0%
> 11/05/12 19:48:18 INFO mapred.JobClient:  map 10% reduce 0%
> 11/05/12 19:48:19 INFO mapred.JobClient:  map 20% reduce 0%
> 11/05/12 19:48:20 INFO mapred.JobClient:  map 30% reduce 0%
> 11/05/12 19:48:23 INFO mapred.JobClient:  map 40% reduce 0%
> 11/05/12 19:48:24 INFO mapred.JobClient:  map 50% reduce 0%
> 11/05/12 19:48:25 INFO mapred.JobClient:  map 60% reduce 0%
> 11/05/12 19:48:27 INFO mapred.JobClient:  map 70% reduce 0%
> 11/05/12 19:48:28 INFO mapred.JobClient:  map 80% reduce 0%
> 11/05/12 19:48:30 INFO mapred.JobClient:  map 90% reduce 0%
> 11/05/12 19:48:31 INFO mapred.JobClient:  map 100% reduce 0%
> 11/05/12 19:52:22 INFO mapred.JobClient:  map 100% reduce 3%
> 11/05/12 19:52:23 INFO mapred.JobClient:  map 100% reduce 10%
> 11/05/12 19:52:38 INFO mapred.JobClient:  map 100% reduce 13%
> 11/05/12 19:52:39 INFO mapred.JobClient:  map 100% reduce 20%
> 11/05/12 19:52:48 INFO mapred.JobClient:  map 100% reduce 30%
> 11/05/12 19:53:01 INFO mapred.JobClient:  map 100% reduce 33%
> 11/05/12 19:53:02 INFO mapred.JobClient:  map 100% reduce 40%
> 11/05/12 19:53:20 INFO mapred.JobClient:  map 100% reduce 43%
> 11/05/12 19:53:21 INFO mapred.JobClient:  map 100% reduce 50%
> 11/05/12 19:53:36 INFO mapred.JobClient:  map 100% reduce 53%
> 11/05/12 19:53:38 INFO mapred.JobClient:  map 100% reduce 60%
> 11/05/12 19:53:44 INFO mapred.JobClient:  map 100% reduce 63%
> 11/05/12 19:53:46 INFO mapred.JobClient:  map 100% reduce 70%
> 11/05/12 19:53:54 INFO mapred.JobClient:  map 100% reduce 73%
> 11/05/12 19:53:55 INFO mapred.JobClient:  map 100% reduce 80%
> 11/05/12 19:53:57 INFO mapred.JobClient:  map 100% reduce 90%
> 11/05/12 19:54:05 INFO mapred.JobClient:  map 100% reduce 100%
> 11/05/12 19:54:07 INFO mapred.JobClient: Job complete: job_201105120113_0202
> 11/05/12 19:54:07 INFO mapred.JobClient: Counters: 23
> 11/05/12 19:54:07 INFO mapred.JobClient:   Job Counters
> 11/05/12 19:54:07 INFO mapred.JobClient:     Launched reduce tasks=10
> 11/05/12 19:54:07 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=46180
> 11/05/12 19:54:07 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 11/05/12 19:54:07 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 11/05/12 19:54:07 INFO mapred.JobClient:     Launched map tasks=10
> 11/05/12 19:54:07 INFO mapred.JobClient:     Data-local map tasks=10
> 11/05/12 19:54:07 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=87373
> 11/05/12 19:54:07 INFO mapred.JobClient:   FileSystemCounters
> 11/05/12 19:54:07 INFO mapred.JobClient:     FILE_BYTES_READ=34517
> 11/05/12 19:54:07 INFO mapred.JobClient:     HDFS_BYTES_READ=111602383
> 11/05/12 19:54:07 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1395398
> 11/05/12 19:54:07 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1871
> 11/05/12 19:54:07 INFO mapred.JobClient:   Map-Reduce Framework
> 11/05/12 19:54:07 INFO mapred.JobClient:     Reduce input groups=49
> 11/05/12 19:54:07 INFO mapred.JobClient:     Combine output records=219
> 11/05/12 19:54:07 INFO mapred.JobClient:     Map input records=808925
> 11/05/12 19:54:07 INFO mapred.JobClient:     Reduce shuffle bytes=3161
> 11/05/12 19:54:07 INFO mapred.JobClient:     Reduce output records=49
> 11/05/12 19:54:07 INFO mapred.JobClient:     Spilled Records=657
> 11/05/12 19:54:07 INFO mapred.JobClient:     Map output bytes=42873025
> 11/05/12 19:54:07 INFO mapred.JobClient:     Map input bytes=111599813
> 11/05/12 19:54:07 INFO mapred.JobClient:     Combine input records=3235700
> 11/05/12 19:54:07 INFO mapred.JobClient:     Map output records=3235700
> 11/05/12 19:54:07 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1710
> 11/05/12 19:54:07 INFO mapred.JobClient:     Reduce input records=219
> Exception in thread "main" java.io.EOFException
> 	at java.io.DataInputStream.readFully(DataInputStream.java:180)
> 	at java.io.DataInputStream.readFully(DataInputStream.java:152)
> 	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1465)
> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1437)
> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
> 	at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:89)
> 	at org.apache.nutch.crawl.CrawlDbReader.processStatJob(CrawlDbReader.java:320)
> 	at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:502)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350