Posted to common-user@hadoop.apache.org by Jason Stubblefield <ja...@gmail.com> on 2007/06/12 21:59:46 UTC
(Unknown)
Hi
I am having a problem with the nutch-0.9 fetcher. During the fetch
process I get the following messages in my hadoop.log:
2007-06-12 12:23:25,892 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2007-06-12 12:23:25,892 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2007-06-12 12:23:25,892 INFO plugin.PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-06-12 12:23:25,892 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-06-12 12:23:25,905 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
2007-06-12 12:23:25,905 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2007-06-12 12:23:25,905 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-06-12 12:23:25,905 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2007-06-12 12:23:25,990 WARN regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
This is the last message before the process uses 100% of the system
resources. It never exits or gives any other errors.
I am using the local file system on a single machine without
map-reduce. I have tried several configurations, including JDK 5 and
JDK 6, with the same result. I have had success crawling a different
list of URLs with the exact same settings on the same machine.
~Jason
Jason Stubblefield
jason.stubby@gmail.com
Please enjoy one of my web properties:
http://www.geothingy.com/
http://www.fivemushrooms.com/
http://www.wikitourist.com/