Posted to user@nutch.apache.org by Kai_testing Middleton <ka...@yahoo.com> on 2007/08/01 01:02:20 UTC

fetching stops for one hour

I logged in to check on my crawl this morning and noticed that fetching seemed
frozen. The console output was showing an exception. I had seen that yesterday
too, but today I decided to let it run. When I logged back in a while later, it
had recovered. My first login was fortuitous, because an inspection of the whole
hadoop.log file revealed only this one gap:

2007-07-31 07:47:10,003 WARN  regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
2007-07-31 08:47:59,528 INFO  fetcher.Fetcher - Fetcher: done

(A bigger chunk of the log surrounding this gap follows.)

That's a gap of one hour and 49 seconds. What would cause Nutch to freeze up
like that for a whole hour?
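Spotting a gap like this by eye in a long hadoop.log is tedious. Below is a minimal Java sketch, not part of Nutch, that does the same inspection mechanically. It assumes every log entry starts with a timestamp in the `yyyy-MM-dd HH:mm:ss,SSS` form seen in the excerpts here, and it treats lines without one (stack-trace frames, wrapped text) as continuations of the previous entry. The class name `LogGapFinder` and the 10-minute default threshold are illustrative choices only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

/** Reports silent stretches in a log4j-style hadoop.log (illustrative sketch, not part of Nutch). */
public class LogGapFinder {
    // Assumed timestamp prefix, as in "2007-07-31 07:47:10,003".
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final int TS_LEN = "2007-07-31 07:47:10,003".length();

    /** One silent stretch: the last entry before it, the first entry after it, and its length. */
    public static final class Gap {
        public final String before;
        public final String after;
        public final Duration length;

        Gap(String before, String after, Duration length) {
            this.before = before;
            this.after = after;
            this.length = length;
        }
    }

    /** Returns the timestamp that starts the line, or null for continuation lines. */
    public static LocalDateTime parseTimestamp(String line) {
        if (line.length() < TS_LEN) {
            return null;
        }
        try {
            return LocalDateTime.parse(line.substring(0, TS_LEN), TS);
        } catch (DateTimeParseException e) {
            return null; // stack-trace or wrapped line: no timestamp of its own
        }
    }

    /** Returns every pair of consecutive timestamped entries more than {@code threshold} apart. */
    public static List<Gap> findGaps(List<String> lines, Duration threshold) {
        List<Gap> gaps = new ArrayList<>();
        LocalDateTime prevTs = null;
        String prevLine = null;
        for (String line : lines) {
            LocalDateTime ts = parseTimestamp(line);
            if (ts == null) {
                continue;
            }
            if (prevTs != null) {
                Duration d = Duration.between(prevTs, ts);
                if (d.compareTo(threshold) > 0) {
                    gaps.add(new Gap(prevLine, line, d));
                }
            }
            prevTs = ts;
            prevLine = line;
        }
        return gaps;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java LogGapFinder <hadoop.log> [threshold-minutes]");
            System.exit(1);
        }
        long minutes = args.length > 1 ? Long.parseLong(args[1]) : 10;
        // ISO-8859-1 maps every byte to a char, so odd bytes in logged URLs cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (Gap g : findGaps(lines, Duration.ofMinutes(minutes))) {
            System.out.printf("%d s of silence%n  before: %s%n  after:  %s%n",
                    g.length.getSeconds(), g.before, g.after);
        }
    }
}
```

Compile with `javac LogGapFinder.java` and run `java LogGapFinder <path-to-hadoop.log> [minutes]`. With the default threshold it should flag the stretch between 07:47:10 and 08:47:59; a lower threshold would also surface the shorter pauses between jobs.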

Here's the command I used to run the crawl:

$ nohup time nutch crawl /usr/tmp/urls.txt -dir /usr/tmp/85sites -threads 20 \
    -depth 10 -topN 103103

I'm using the nightly build nutch-2007-06-27_06-52-44.

On the recommendation of LE QuocAnh, in response to my problem from yesterday, I
decreased the number of threads from 200 to 20:
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg08985.html

Here is a larger chunk of hadoop.log:

2007-07-31 07:43:01,332 INFO  fetcher.Fetcher - fetching http://blogs.ign.com/sng-ign/
2007-07-31 07:43:01,459 INFO  fetcher.Fetcher - fetching http://www.mediarights.org/film/the_rules_of_the_game.php
2007-07-31 07:45:07,869 WARN  parse.ParseUtil - No suitable parser found when trying to parse content http://www.fest21.com/_textimage/image/1185865273 of type image/png
2007-07-31 07:45:07,869 WARN  fetcher.Fetcher - Error parsing: http://www.fest21.com/_textimage/image/1185865273: org.apache.nutch.parse.ParseException: parser not found for contentType=image/png url=http://www.fest21.com/_textimage/image/1185865273
	at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:76)
	at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:309)
	at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:154)

2007-07-31 07:46:28,395 WARN  parse.ParseUtil - No suitable parser found when trying to parse content http://www.fest21.com/_textimage/image/1185864507 of type image/png
2007-07-31 07:46:28,395 WARN  fetcher.Fetcher - Error parsing: http://www.fest21.com/_textimage/image/1185864507: org.apache.nutch.parse.ParseException: parser not found for contentType=image/png url=http://www.fest21.com/_textimage/image/1185864507
	at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:76)
	at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:309)
	at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:154)

2007-07-31 07:47:08,977 INFO  plugin.PluginRepository - Plugins: looking in:
/usr/local/nutch-2007-06-27_06-52-44/plugins
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository - Plugin Auto-activation
mode: [true]
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser
(lib-nekohtml)
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository - 	Site Query Filter
(query-site)
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository - 	Basic URL Normalizer
(urlnormalizer-basic)
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository - 	Html Parse Plug-in
(parse-html)
2007-07-31 07:47:09,274 INFO  plugin.PluginRepository - 	Pass-through URL
Normalizer (urlnormalizer-pass)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Regex URL Filter
Framework (lib-regex-filter)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Feed Parse/Index/Query
Plug-in (feed)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Basic Indexing Filter
(index-basic)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Basic Summarizer
Plug-in (summary-basic)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Text Parse Plug-in
(parse-text)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	JavaScript Parser
(parse-js)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Basic Query Filter
(query-basic)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Regex URL Filter
(urlfilter-regex)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	HTTP Framework
(lib-http)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	XML Libraries
(lib-xml)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	URL Query Filter
(query-url)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Regex URL Normalizer
(urlnormalizer-regex)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Http Protocol Plug-in
(protocol-http)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	the nutch core
extension points (nutch-extensionpoints)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in
(scoring-opic)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - Registered
Extension-Points:
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Nutch Content Parser
(org.apache.nutch.parse.Parser)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2007-07-31 07:47:09,275 INFO  plugin.PluginRepository - 	Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2007-07-31 07:47:09,276 INFO  plugin.PluginRepository - 	Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2007-07-31 07:47:10,003 WARN  regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
2007-07-31 08:47:59,528 INFO  fetcher.Fetcher - Fetcher: done
2007-07-31 08:47:59,530 INFO  crawl.CrawlDb - CrawlDb update: starting
2007-07-31 08:47:59,530 INFO  crawl.CrawlDb - CrawlDb update: db: /usr/tmp/85sites/crawldb
2007-07-31 08:47:59,530 INFO  crawl.CrawlDb - CrawlDb update: segments: [/usr/tmp/85sites/segments/20070731002418]
2007-07-31 08:47:59,530 INFO  crawl.CrawlDb - CrawlDb update: additions allowed: true
2007-07-31 08:47:59,530 INFO  crawl.CrawlDb - CrawlDb update: URL normalizing: true
2007-07-31 08:47:59,530 INFO  crawl.CrawlDb - CrawlDb update: URL filtering: true
2007-07-31 08:47:59,559 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.
2007-07-31 08:48:08,626 INFO  plugin.PluginRepository - Plugins: looking in:
/usr/local/nutch-2007-06-27_06-52-44/plugins
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - Plugin Auto-activation
mode: [true]
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser
(lib-nekohtml)
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - 	Site Query Filter
(query-site)
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - 	Basic URL Normalizer
(urlnormalizer-basic)
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - 	Html Parse Plug-in
(parse-html)
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - 	Pass-through URL
Normalizer (urlnormalizer-pass)
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - 	Regex URL Filter
Framework (lib-regex-filter)
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - 	Feed Parse/Index/Query
Plug-in (feed)
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - 	Basic Indexing Filter
(index-basic)
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - 	Basic Summarizer
Plug-in (summary-basic)
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - 	Text Parse Plug-in
(parse-text)
2007-07-31 08:48:08,713 INFO  plugin.PluginRepository - 	JavaScript Parser
(parse-js)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Basic Query Filter
(query-basic)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Regex URL Filter
(urlfilter-regex)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	HTTP Framework
(lib-http)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	XML Libraries
(lib-xml)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	URL Query Filter
(query-url)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Regex URL Normalizer
(urlnormalizer-regex)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Http Protocol Plug-in
(protocol-http)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	the nutch core
extension points (nutch-extensionpoints)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in
(scoring-opic)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - Registered
Extension-Points:
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Nutch Content Parser
(org.apache.nutch.parse.Parser)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2007-07-31 08:48:08,714 INFO  plugin.PluginRepository - 	Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2007-07-31 08:48:08,737 WARN  regex.RegexURLNormalizer - can't find rules for
scope 'crawldb', using default
2007-07-31 08:48:21,597 INFO  plugin.PluginRepository - Plugins: looking in:
/usr/local/nutch-2007-06-27_06-52-44/plugins
2007-07-31 08:48:21,682 INFO  plugin.PluginRepository - Plugin Auto-activation
mode: [true]
2007-07-31 08:48:21,682 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-31 08:48:21,682 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser
(lib-nekohtml)
2007-07-31 08:48:21,682 INFO  plugin.PluginRepository - 	Site Query Filter
(query-site)
2007-07-31 08:48:21,682 INFO  plugin.PluginRepository - 	Basic URL Normalizer
(urlnormalizer-basic)
2007-07-31 08:48:21,682 INFO  plugin.PluginRepository - 	Html Parse Plug-in
(parse-html)
2007-07-31 08:48:21,682 INFO  plugin.PluginRepository - 	Pass-through URL
Normalizer (urlnormalizer-pass)
2007-07-31 08:48:21,682 INFO  plugin.PluginRepository - 	Regex URL Filter
Framework (lib-regex-filter)
2007-07-31 08:48:21,682 INFO  plugin.PluginRepository - 	Feed Parse/Index/Query
Plug-in (feed)
2007-07-31 08:48:21,682 INFO  plugin.PluginRepository - 	Basic Indexing Filter
(index-basic)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Basic Summarizer
Plug-in (summary-basic)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Text Parse Plug-in
(parse-text)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	JavaScript Parser
(parse-js)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Basic Query Filter
(query-basic)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Regex URL Filter
(urlfilter-regex)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	HTTP Framework
(lib-http)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	XML Libraries
(lib-xml)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	URL Query Filter
(query-url)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Regex URL Normalizer
(urlnormalizer-regex)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Http Protocol Plug-in
(protocol-http)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	the nutch core
extension points (nutch-extensionpoints)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in
(scoring-opic)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - Registered
Extension-Points:
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Nutch Content Parser
(org.apache.nutch.parse.Parser)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2007-07-31 08:48:21,683 INFO  plugin.PluginRepository - 	Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2007-07-31 08:48:21,706 WARN  regex.RegexURLNormalizer - can't find rules for
scope 'crawldb', using default
2007-07-31 08:48:34,141 INFO  plugin.PluginRepository - Plugins: looking in:
/usr/local/nutch-2007-06-27_06-52-44/plugins
2007-07-31 08:48:34,239 INFO  plugin.PluginRepository - Plugin Auto-activation
mode: [true]
2007-07-31 08:48:34,239 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-31 08:48:34,239 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser
(lib-nekohtml)
2007-07-31 08:48:34,239 INFO  plugin.PluginRepository - 	Site Query Filter
(query-site)
2007-07-31 08:48:34,239 INFO  plugin.PluginRepository - 	Basic URL Normalizer
(urlnormalizer-basic)
2007-07-31 08:48:34,239 INFO  plugin.PluginRepository - 	Html Parse Plug-in
(parse-html)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Pass-through URL
Normalizer (urlnormalizer-pass)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Regex URL Filter
Framework (lib-regex-filter)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Feed Parse/Index/Query
Plug-in (feed)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Basic Indexing Filter
(index-basic)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Basic Summarizer
Plug-in (summary-basic)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Text Parse Plug-in
(parse-text)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	JavaScript Parser
(parse-js)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Basic Query Filter
(query-basic)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Regex URL Filter
(urlfilter-regex)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	HTTP Framework
(lib-http)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	XML Libraries
(lib-xml)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	URL Query Filter
(query-url)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Regex URL Normalizer
(urlnormalizer-regex)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Http Protocol Plug-in
(protocol-http)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	the nutch core
extension points (nutch-extensionpoints)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in
(scoring-opic)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - Registered
Extension-Points:
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Nutch Content Parser
(org.apache.nutch.parse.Parser)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2007-07-31 08:48:34,240 INFO  plugin.PluginRepository - 	Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2007-07-31 08:48:34,264 WARN  regex.RegexURLNormalizer - can't find rules for
scope 'crawldb', using default
2007-07-31 08:50:03,544 INFO  plugin.PluginRepository - Plugins: looking in:
/usr/local/nutch-2007-06-27_06-52-44/plugins
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - Plugin Auto-activation
mode: [true]
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser
(lib-nekohtml)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	Site Query Filter
(query-site)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	Basic URL Normalizer
(urlnormalizer-basic)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	Html Parse Plug-in
(parse-html)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	Pass-through URL
Normalizer (urlnormalizer-pass)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	Regex URL Filter
Framework (lib-regex-filter)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	Feed Parse/Index/Query
Plug-in (feed)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	Basic Indexing Filter
(index-basic)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	Basic Summarizer
Plug-in (summary-basic)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	Text Parse Plug-in
(parse-text)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	JavaScript Parser
(parse-js)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	Basic Query Filter
(query-basic)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	Regex URL Filter
(urlfilter-regex)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	HTTP Framework
(lib-http)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	XML Libraries
(lib-xml)
2007-07-31 08:50:03,632 INFO  plugin.PluginRepository - 	URL Query Filter
(query-url)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Regex URL Normalizer
(urlnormalizer-regex)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Http Protocol Plug-in
(protocol-http)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	the nutch core
extension points (nutch-extensionpoints)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in
(scoring-opic)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - Registered
Extension-Points:
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Nutch Content Parser
(org.apache.nutch.parse.Parser)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2007-07-31 08:50:03,633 INFO  plugin.PluginRepository - 	Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2007-07-31 08:50:03,656 WARN  regex.RegexURLNormalizer - can't find rules for
scope 'crawldb', using default
2007-07-31 08:51:07,967 INFO  plugin.PluginRepository - Plugins: looking in:
/usr/local/nutch-2007-06-27_06-52-44/plugins
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - Plugin Auto-activation
mode: [true]
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser
(lib-nekohtml)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Site Query Filter
(query-site)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Basic URL Normalizer
(urlnormalizer-basic)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Html Parse Plug-in
(parse-html)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Pass-through URL
Normalizer (urlnormalizer-pass)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Regex URL Filter
Framework (lib-regex-filter)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Feed Parse/Index/Query
Plug-in (feed)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Basic Indexing Filter
(index-basic)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Basic Summarizer
Plug-in (summary-basic)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Text Parse Plug-in
(parse-text)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	JavaScript Parser
(parse-js)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Basic Query Filter
(query-basic)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Regex URL Filter
(urlfilter-regex)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	HTTP Framework
(lib-http)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	XML Libraries
(lib-xml)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	URL Query Filter
(query-url)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Regex URL Normalizer
(urlnormalizer-regex)
2007-07-31 08:51:08,063 INFO  plugin.PluginRepository - 	Http Protocol Plug-in
(protocol-http)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	the nutch core
extension points (nutch-extensionpoints)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in
(scoring-opic)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - Registered
Extension-Points:
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	Nutch Content Parser
(org.apache.nutch.parse.Parser)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2007-07-31 08:51:08,064 INFO  plugin.PluginRepository - 	Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2007-07-31 08:51:08,086 WARN  regex.RegexURLNormalizer - can't find rules for
scope 'crawldb', using default
2007-07-31 08:52:07,809 INFO  plugin.PluginRepository - Plugins: looking in:
/usr/local/nutch-2007-06-27_06-52-44/plugins
2007-07-31 08:52:07,899 INFO  plugin.PluginRepository - Plugin Auto-activation
mode: [true]
2007-07-31 08:52:07,899 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-31 08:52:07,899 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser
(lib-nekohtml)
2007-07-31 08:52:07,899 INFO  plugin.PluginRepository - 	Site Query Filter
(query-site)
2007-07-31 08:52:07,899 INFO  plugin.PluginRepository - 	Basic URL Normalizer
(urlnormalizer-basic)
2007-07-31 08:52:07,899 INFO  plugin.PluginRepository - 	Html Parse Plug-in
(parse-html)
2007-07-31 08:52:07,899 INFO  plugin.PluginRepository - 	Pass-through URL
Normalizer (urlnormalizer-pass)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Regex URL Filter
Framework (lib-regex-filter)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Feed Parse/Index/Query
Plug-in (feed)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Basic Indexing Filter
(index-basic)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Basic Summarizer
Plug-in (summary-basic)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Text Parse Plug-in
(parse-text)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	JavaScript Parser
(parse-js)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Basic Query Filter
(query-basic)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Regex URL Filter
(urlfilter-regex)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	HTTP Framework
(lib-http)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	XML Libraries
(lib-xml)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	URL Query Filter
(query-url)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Regex URL Normalizer
(urlnormalizer-regex)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Http Protocol Plug-in
(protocol-http)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	the nutch core
extension points (nutch-extensionpoints)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in
(scoring-opic)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - Registered
Extension-Points:
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Nutch Content Parser
(org.apache.nutch.parse.Parser)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2007-07-31 08:52:07,900 INFO  plugin.PluginRepository - 	Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2007-07-31 08:52:07,923 WARN  regex.RegexURLNormalizer - can't find rules for
scope 'crawldb', using default
2007-07-31 08:53:25,575 INFO  plugin.PluginRepository - Plugins: looking in:
/usr/local/nutch-2007-06-27_06-52-44/plugins
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - Plugin Auto-activation
mode: [true]
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - Registered Plugins:
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser
(lib-nekohtml)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Site Query Filter
(query-site)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Basic URL Normalizer
(urlnormalizer-basic)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Html Parse Plug-in
(parse-html)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Pass-through URL
Normalizer (urlnormalizer-pass)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Regex URL Filter
Framework (lib-regex-filter)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Feed Parse/Index/Query
Plug-in (feed)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Basic Indexing Filter
(index-basic)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Basic Summarizer
Plug-in (summary-basic)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Text Parse Plug-in
(parse-text)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	JavaScript Parser
(parse-js)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Basic Query Filter
(query-basic)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Regex URL Filter
(urlfilter-regex)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	HTTP Framework
(lib-http)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	XML Libraries
(lib-xml)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	URL Query Filter
(query-url)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Regex URL Normalizer
(urlnormalizer-regex)
2007-07-31 08:53:25,670 INFO  plugin.PluginRepository - 	Http Protocol Plug-in
(protocol-http)