Posted to user@nutch.apache.org by Kai_testing Middleton <ka...@yahoo.com> on 2007/07/31 02:40:39 UTC

hung threads - NullPointerException in getPos(FSDataInputStream.java:87)

Are hung threads natural?

I ran a crawl:
nohup time nutch crawl /usr/tmp/urls.txt -dir /usr/tmp/86sites -threads 200 -depth 10 -topN 103103

It ran for a few hours, after which I noticed that it seemed hung:

fetching http://www.mediarights.org/film/the_rules_of_the_game.php
fetch of http://www.hollywood.com/MyHollywood/AddRating/2/3612623/1.5 failed with: Http code=500, url=http://www.hollywood.com/MyHollywood/AddRating/2/3612623/1.5
Aborting with 46 hung threads.
java.lang.NullPointerException
at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
fetcher caught:java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
fetcher caught:java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)

lather, rinse, repeat
.
.
.
one final:
java.lang.NullPointerException


After that the console output made no further progress (though I didn't wait long).

hadoop.log, however, seemed to keep going:

2007-07-30 15:21:05,106 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - java.lang.NullPointerException
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2007-07-30 16:16:02,932 INFO  fetcher.Fetcher - Fetcher: done
2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: starting
2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: db: /usr/tmp/86sites/crawldb
2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: segments: [/usr/tmp/86sites/segments/20070730124436]
2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: additions allowed: true
2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: URL normalizing: true
2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: URL filtering: true
2007-07-30 16:16:02,993 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.
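
A quick way to check how often the fetcher threads hit the exception, and whether the job moved on afterwards, is to summarize hadoop.log instead of reading it by eye. The following is only a sketch: the function names are made up for illustration, and the log path is passed in by the caller because its location depends on the installation.

```shell
# Count the lines where a fetcher thread logged the caught NPE.
# $1 is the path to hadoop.log (location depends on your install).
count_fetcher_npes() {
  grep -c 'fetcher caught:java.lang.NullPointerException' "$1"
}

# Print the most recent INFO line, to see whether the job
# carried on past the fetch step (e.g. "Fetcher: done").
last_info_line() {
  grep ' INFO ' "$1" | tail -n 1
}
```

Usage would look like `count_fetcher_npes logs/hadoop.log` followed by `last_info_line logs/hadoop.log`, where `logs/hadoop.log` is an assumed path. In the excerpt above, the last FATAL entry is at 15:21:16 and the next INFO entry ("Fetcher: done") is at 16:16:02.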







Re: hung threads - NullPointerException in getPos(FSDataInputStream.java:87)

Posted by LE QuocAnh <qu...@gmail.com>.
You are running more threads than your internet bandwidth can support =>
content == null or contentType == null
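
If the thread count is the problem, the obvious experiment is to rerun the same crawl with a lower `-threads` value. The wrapper below is a sketch only: the function name is hypothetical, the paths and the `-depth`/`-topN` values are copied from the original command, and nothing in this thread confirms that a lower thread count makes the NullPointerException go away.

```shell
# Rerun the crawl from the original post with a caller-chosen thread count.
# $1 is the number of fetcher threads (the original run used 200).
crawl_with_threads() {
  nutch crawl /usr/tmp/urls.txt -dir /usr/tmp/86sites \
    -threads "$1" -depth 10 -topN 103103
}
```

For example, `crawl_with_threads 20` would try a tenth of the original thread count; 20 is an arbitrary starting point, not a recommended value.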

2007/7/31, Kai_testing Middleton <ka...@yahoo.com>:
>
> Are hung threads natural?
>



