Posted to user@nutch.apache.org by Daniel Varela Santoalla <dv...@ecmwf.int> on 2006/07/28 18:18:05 UTC
"unknown protocol" and some other problems in 0.8.
Hello
The extract of "hadoop.log" below shows three problems I'm seeing
with the freshly downloaded 0.8 version.
- Outlink extraction fails on the links in some PDF files with a
MalformedURLException ("unknown protocol"). This didn't happen
with 0.7 using the same version of PDFBox.
- I get "Aborting with hung threads" from time to time.
- I also get a NullPointerException here and there.
I'm using Java 1.5 on Linux.
Regards
Daniel
2006-07-28 16:49:04,799 ERROR parse.OutlinkExtractor - getOutlinks
java.net.MalformedURLException: unknown protocol: roles
at java.net.URL.<init>(URL.java:574)
at java.net.URL.<init>(URL.java:464)
at java.net.URL.<init>(URL.java:413)
at org.apache.nutch.net.BasicUrlNormalizer.normalize(BasicUrlNormalizer.java:78)
at org.apache.nutch.parse.Outlink.<init>(Outlink.java:35)
at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:111)
at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:70)
at org.apache.nutch.parse.pdf.PdfParser.getParse(PdfParser.java:150)
at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:82)
at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:276)
at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:152)
2006-07-28 16:53:12,686 WARN fetcher.Fetcher - Aborting with 7 hung threads.
2006-07-28 16:53:54,871 FATAL fetcher.Fetcher - java.lang.NullPointerException
2006-07-28 16:53:54,871 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:205)
2006-07-28 16:53:54,871 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:248)
2006-07-28 16:53:54,871 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:462)
2006-07-28 16:53:54,871 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.getPos(SequenceFileRecordReader.java:68)
2006-07-28 16:53:54,872 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:115)
2006-07-28 16:53:54,872 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:114)
2006-07-28 16:53:54,872 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2006-07-28 16:54:02,805 INFO fetcher.Fetcher - Fetcher: done
2006-07-28 16:54:02,805 INFO crawl.CrawlDb - CrawlDb update: starting
2006-07-28 16:54:02,805 INFO crawl.CrawlDb - CrawlDb update: db: crawl-20060728164608/crawldb
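
The first error can probably be reproduced outside Nutch. Below is a minimal sketch, assuming the text extracted from the PDF contained something that looks like a URL with the scheme "roles" (the string "roles:administrator" is hypothetical; the log does not show the actual text). As far as I can tell, java.net.URL rejects any scheme for which it has no protocol handler, which would explain the exception at the top of the trace. The class and method names here are made up for illustration and are not part of Nutch.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Illustrative only: not Nutch code.
class UnknownProtocolDemo {

    // Returns the exception message if java.net.URL rejects the string,
    // or null if the string parses as a URL.
    static String tryParse(String candidate) {
        try {
            new URL(candidate);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical text picked up as a link; "roles" has no handler.
        System.out.println(tryParse("roles:administrator"));
        // An ordinary http URL is accepted, so this prints null.
        System.out.println(tryParse("http://example.org/doc.pdf"));
    }
}
```

If that is what happens, the message for the first call should match the "unknown protocol: roles" line in the log, and the problem would be in what gets recognised as a link rather than in PDFBox itself.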
--
Daniel Varela Santoalla
European Centre for Medium-Range Weather Forecasts (ECMWF)