Posted to user@nutch.apache.org by Tim Benke <ze...@fusemail.com> on 2007/01/11 15:16:20 UTC

Nutch in Eclipse: No input directories specified

Hi,

Thanks to these guides, I was able to get Nutch running in Eclipse:
http://wiki.media-style.com/display/nutchDocu/use+eclipse+to+debug+nutch
http://wiki.apache.org/nutch/RunNutchInEclipse

However, when I run the crawl I get this exception:
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xmlfinal: hadoop-site.xml
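The message has two parts: the fixed text "No input directories specified in:" followed by a dump of the job's configuration resources. As a rough illustration, assuming that Hadoop 0.9's input-format code raises this error when a job's list of input paths is empty (rather than when a listed directory is missing on disk), the check can be sketched as follows. InputCheckSketch and listInputPaths are hypothetical names; this is not Hadoop's JobConf or InputFormatBase API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a job configuration.
// It models only the list of input paths, not Hadoop's real JobConf.
public class InputCheckSketch {
    private final List<String> inputPaths = new ArrayList<>();
    private final String description;

    public InputCheckSketch(String description) {
        this.description = description;
    }

    public void addInputPath(String path) {
        inputPaths.add(path);
    }

    // Fails when no input path was ever added to the job, which is the
    // situation the "No input directories specified" message describes.
    public List<String> listInputPaths() throws IOException {
        if (inputPaths.isEmpty()) {
            throw new IOException("No input directories specified in: " + description);
        }
        return inputPaths;
    }

    public static void main(String[] args) {
        InputCheckSketch empty =
            new InputCheckSketch("Configuration: defaults: hadoop-default.xml");
        try {
            empty.listInputPaths();
        } catch (IOException e) {
            // Prints the same style of message as the exception above.
            System.out.println(e.getMessage());
        }
    }
}
```

If that assumption holds, the message points at a job step that was started with zero input paths, not necessarily at a missing urls directory.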

Arguments in Eclipse:
Program arguments:
urls -dir crawl -depth 3 -topN 50

VM arguments:
-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
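To make the split between the two kinds of arguments concrete, here is a hypothetical sketch (not Nutch's actual Crawl.java) of how the program arguments map onto the values the crawler logs at startup (crawl started in, rootUrlDir, threads, depth, topN), while the -D VM arguments are read as Java system properties. The class name CrawlArgsSketch and the defaults for dir, depth and topN are placeholders; threads = 10 is taken from the log, since no -threads option was passed.

```java
// Hypothetical sketch: maps Crawl-style program arguments to named settings.
// Not Nutch's Crawl.java; the defaults below are placeholders for illustration.
public class CrawlArgsSketch {
    String rootUrlDir;
    String dir = "crawl";        // placeholder default
    int threads = 10;            // the log shows "threads = 10" when -threads is absent
    int depth = 5;               // placeholder default
    long topN = Long.MAX_VALUE;  // placeholder: no limit

    static CrawlArgsSketch parse(String[] args) {
        CrawlArgsSketch s = new CrawlArgsSketch();
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            if ("-dir".equals(a)) {
                s.dir = args[++i];
            } else if ("-threads".equals(a)) {
                s.threads = Integer.parseInt(args[++i]);
            } else if ("-depth".equals(a)) {
                s.depth = Integer.parseInt(args[++i]);
            } else if ("-topN".equals(a)) {
                s.topN = Long.parseLong(args[++i]);
            } else if (a.startsWith("-")) {
                throw new IllegalArgumentException("unknown option: " + a);
            } else {
                s.rootUrlDir = a;  // bare argument = directory of seed URL files
            }
        }
        if (s.rootUrlDir == null) {
            throw new IllegalArgumentException("missing url dir");
        }
        return s;
    }

    public static void main(String[] args) {
        // The program arguments from the Eclipse launch configuration.
        CrawlArgsSketch s = parse("urls -dir crawl -depth 3 -topN 50".split(" "));
        System.out.println("crawl started in: " + s.dir);
        System.out.println("rootUrlDir = " + s.rootUrlDir);
        System.out.println("threads = " + s.threads);
        System.out.println("depth = " + s.depth);
        System.out.println("topN = " + s.topN);
        // VM arguments arrive as system properties, not as program arguments.
        System.out.println("hadoop.log.dir = " + System.getProperty("hadoop.log.dir"));
        System.out.println("hadoop.log.file = " + System.getProperty("hadoop.log.file"));
    }
}
```

Launched with the VM arguments above, the last two lines print logs and hadoop.log; without them, System.getProperty returns null for both.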

The environment variables NUTCH_JAVA_HOME and JAVA_HOME are set.
The seed file urls/nutch contains:
http://lucene.apache.org/nutch/
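Because a JVM launched from Eclipse resolves relative paths such as urls against the run configuration's working directory, and only sees the environment variables Eclipse passes to it, a quick pre-flight check can confirm what the crawl will actually read. The sketch below is a hypothetical helper (SeedCheck and countSeedUrls are illustrative names, not part of Nutch) that resolves the seed directory against user.dir, counts non-blank lines in its files, and prints whether JAVA_HOME and NUTCH_JAVA_HOME are visible to the launched JVM.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check; not part of Nutch.
public class SeedCheck {

    // Counts non-blank lines across the regular files directly inside seedDir.
    static int countSeedUrls(Path seedDir) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(seedDir)) {
            for (Path f : files) {
                if (!Files.isRegularFile(f)) {
                    continue;
                }
                for (String line : Files.readAllLines(f, StandardCharsets.UTF_8)) {
                    if (!line.trim().isEmpty()) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Relative paths resolve against the JVM's working directory (user.dir).
        Path seedDir = Paths.get(args.length > 0 ? args[0] : "urls").toAbsolutePath();
        System.out.println("working dir: " + System.getProperty("user.dir"));
        System.out.println("seed dir:    " + seedDir);
        // Shows which variables the Eclipse-launched JVM actually inherited.
        for (String name : new String[] {"JAVA_HOME", "NUTCH_JAVA_HOME"}) {
            System.out.println(name + " = " + System.getenv(name));
        }
        if (!Files.isDirectory(seedDir)) {
            System.out.println("seed dir not found");
            return;
        }
        System.out.println("seed URLs:   " + countSeedUrls(seedDir));
    }
}
```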

I really hope someone can help me with this; I need Nutch for my
bachelor thesis.

Regards,

Tim Benke

The complete log is:

2007-01-11 14:03:29,831 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:29,940 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
2007-01-11 14:03:30,003 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
2007-01-11 14:03:30,018 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:30,018 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(89)) - crawl
started in: crawl
2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(90)) -
rootUrlDir = urls
2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(91)) -
threads = 10
2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(92)) - depth = 3
2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(94)) - topN = 50
2007-01-11 14:03:30,097 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:30,112 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
2007-01-11 14:03:30,128 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(135))
- Injector: starting
2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(136))
- Injector: crawlDb: crawl/crawldb
2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(137))
- Injector: urlDir: urls
2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(147))
- Injector: Converting injected urls to crawl db entries.
2007-01-11 14:03:30,175 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:30,175 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
2007-01-11 14:03:30,190 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
2007-01-11 14:03:30,206 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:30,206 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:30,425 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:30,425 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
2007-01-11 14:03:30,440 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
2007-01-11 14:03:30,440 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:30,456 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:30,456 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:30,472 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:30,487 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:30,503 INFO  conf.Configuration
(Configuration.java:loadResource(504)) - parsing
/tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
2007-01-11 14:03:30,518 INFO  mapred.JobClient
(JobClient.java:runJob(370)) - Running job: job_qo4f9q
2007-01-11 14:03:30,534 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:30,534 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:30,534 INFO  conf.Configuration
(Configuration.java:loadResource(504)) - parsing
/tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
2007-01-11 14:03:30,565 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:30,643 INFO  mapred.MapTask (MapTask.java:run(155)) -
opened part-0.out
2007-01-11 14:03:30,675 INFO  plugin.PluginRepository
(PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
C:\wkspc\nutch_trunk\tmpBuild\src\plugin
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
mode: [true]
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(310)) - Registered Plugins:
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Creative Commons
Plugins (creativecommons)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Site Query Filter
(query-site)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
Plug-in (protocol-httpclient)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
(parse-html)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
(parse-pdf)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
(parse-msexcel)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     JavaScript Parser
(parse-js)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     URL Query Filter
(query-url)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
(parse-swf)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
(protocol-ftp)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
(analysis-fr)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
(parse-mp3)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
(parse-zip)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Online Search Results
Clustering using Carrot2's Lingo component (clustering-carrot2)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
(urlfilter-suffix)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
Parser/Indexer/Querier (microformats-reltag)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
(parse-rtf)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Language Identification
Parser/Filter (language-identifier)
2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
(parse-msword)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
(parse-text)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
(analysis-de)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
(urlnormalizer-regex)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
Parse Plug-in (parse-oo)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
(urlfilter-automaton)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
Summary Plug-in (summary-lucene)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Subcollection indexing
and query filter (subcollection)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Regex URL Filter
Framework (lib-regex-filter)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Lucene Analysers
(lib-lucene-analyzers)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
(index-basic)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic Summarizer
Plug-in (summary-basic)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Regex URL Filter
(urlfilter-regex)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
(parse-ext)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
(protocol-http)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     the nutch core
extension points (nutch-extensionpoints)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     More Indexing Filter
(index-more)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     More Query Filter
(query-more)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
(lib-nekohtml)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
(urlfilter-prefix)
2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
Plug-in (parse-mspowerpoint)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
(urlnormalizer-basic)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Pass-through URL
Normalizer (urlnormalizer-pass)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
Client (lib-commons-httpclient)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
(protocol-file)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
To Access Microsoft Format Files (lib-jakarta-poi)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic Query Filter
(query-basic)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Parse MS Documents
Framework (lib-parsems)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
(parse-rss)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
(scoring-opic)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
(org.apache.nutch.parse.Parser)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2007-01-11 14:03:31,065 INFO  conf.Configuration
(Configuration.java:getConfResourceAsReader(441)) - found resource
suffix-urlfilter.txt at
file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
2007-01-11 14:03:31,065 INFO  conf.Configuration
(Configuration.java:getConfResourceAsReader(441)) - found resource
automaton-urlfilter.txt at
file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
2007-01-11 14:03:31,456 INFO  conf.Configuration
(Configuration.java:getConfResourceAsReader(441)) - found resource
crawl-urlfilter.txt at
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
2007-01-11 14:03:31,472 INFO  conf.Configuration
(Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
not found
2007-01-11 14:03:31,487 WARN  regex.RegexURLNormalizer
(RegexURLNormalizer.java:regexNormalize(159)) - can't find rules for
scope 'inject', using default
2007-01-11 14:03:31,487 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:progress(169)) - C:/wkspc/nutch_trunk/urls/nutch:0+33
2007-01-11 14:03:31,503 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:31,503 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:31,503 INFO  conf.Configuration
(Configuration.java:loadResource(504)) - parsing
/tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
2007-01-11 14:03:31,518 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:31,534 INFO  mapred.JobClient
(JobClient.java:runJob(385)) -  map 100% reduce 0%
2007-01-11 14:03:31,753 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:progress(169)) - reduce > reduce
2007-01-11 14:03:32,534 INFO  mapred.JobClient
(JobClient.java:runJob(401)) - Job complete: job_qo4f9q
2007-01-11 14:03:32,534 INFO  crawl.Injector (Injector.java:inject(163))
- Injector: Merging injected urls into crawl db.
2007-01-11 14:03:32,534 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:32,534 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
2007-01-11 14:03:32,534 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
2007-01-11 14:03:32,550 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:32,550 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:32,581 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:32,597 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
2007-01-11 14:03:32,597 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
2007-01-11 14:03:32,597 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:32,612 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:32,612 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:32,628 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:32,628 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:32,628 INFO  conf.Configuration
(Configuration.java:loadResource(504)) - parsing
/tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
2007-01-11 14:03:32,628 INFO  mapred.JobClient
(JobClient.java:runJob(370)) - Running job: job_xiod9g
2007-01-11 14:03:32,643 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:32,643 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:32,643 INFO  conf.Configuration
(Configuration.java:loadResource(504)) - parsing
/tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
2007-01-11 14:03:32,643 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:32,675 INFO  mapred.MapTask (MapTask.java:run(155)) -
opened part-0.out
2007-01-11 14:03:32,675 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:progress(169)) -
C:/tmp/hadoop-tbenke/mapred/temp/inject-temp-2045807797/part-00000:0+82
2007-01-11 14:03:32,690 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:32,706 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:32,706 INFO  conf.Configuration
(Configuration.java:loadResource(504)) - parsing
/tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
2007-01-11 14:03:32,706 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:32,722 INFO  plugin.PluginRepository
(PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
C:\wkspc\nutch_trunk\tmpBuild\src\plugin
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
mode: [true]
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(310)) - Registered Plugins:
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Creative Commons
Plugins (creativecommons)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Site Query Filter
(query-site)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
Plug-in (protocol-httpclient)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
(parse-html)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
(parse-pdf)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
(parse-msexcel)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     JavaScript Parser
(parse-js)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     URL Query Filter
(query-url)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
(parse-swf)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
(protocol-ftp)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
(analysis-fr)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
(parse-mp3)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
(parse-zip)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Online Search Results
Clustering using Carrot2's Lingo component (clustering-carrot2)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
(urlfilter-suffix)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
Parser/Indexer/Querier (microformats-reltag)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
(parse-rtf)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Language Identification
Parser/Filter (language-identifier)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
(parse-msword)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
(parse-text)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
(analysis-de)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
(urlnormalizer-regex)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
Parse Plug-in (parse-oo)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
(urlfilter-automaton)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
Summary Plug-in (summary-lucene)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Subcollection indexing
and query filter (subcollection)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Regex URL Filter
Framework (lib-regex-filter)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Lucene Analysers
(lib-lucene-analyzers)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
(index-basic)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic Summarizer
Plug-in (summary-basic)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Regex URL Filter
(urlfilter-regex)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
(parse-ext)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
(protocol-http)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     the nutch core
extension points (nutch-extensionpoints)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     More Indexing Filter
(index-more)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     More Query Filter
(query-more)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
(lib-nekohtml)
2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
(urlfilter-prefix)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
Plug-in (parse-mspowerpoint)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
(urlnormalizer-basic)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Pass-through URL
Normalizer (urlnormalizer-pass)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
Client (lib-commons-httpclient)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
(protocol-file)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
To Access Microsoft Format Files (lib-jakarta-poi)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic Query Filter
(query-basic)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Parse MS Documents
Framework (lib-parsems)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
(parse-rss)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
(scoring-opic)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
(org.apache.nutch.parse.Parser)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2007-01-11 14:03:33,143 WARN  util.NativeCodeLoader
(NativeCodeLoader.java:<clinit>(50)) - Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
2007-01-11 14:03:33,175 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:progress(169)) - reduce > reduce
2007-01-11 14:03:33,628 INFO  mapred.JobClient
(JobClient.java:runJob(401)) - Job complete: job_xiod9g
2007-01-11 14:03:33,659 INFO  crawl.Injector (Injector.java:inject(173))
- Injector: done
2007-01-11 14:03:34,659 INFO  crawl.Generator
(Generator.java:generate(371)) - Generator: Selecting best-scoring urls
due for fetch.
2007-01-11 14:03:34,659 INFO  crawl.Generator
(Generator.java:generate(372)) - Generator: starting
2007-01-11 14:03:34,659 INFO  crawl.Generator
(Generator.java:generate(373)) - Generator: segment:
crawl/segments/20070111140334
2007-01-11 14:03:34,659 INFO  crawl.Generator
(Generator.java:generate(374)) - Generator: filtering: false
2007-01-11 14:03:34,659 INFO  crawl.Generator
(Generator.java:generate(376)) - Generator: topN: 50
2007-01-11 14:03:34,659 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:34,659 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
2007-01-11 14:03:34,675 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
2007-01-11 14:03:34,675 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:34,675 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:34,675 INFO  crawl.Generator
(Generator.java:generate(388)) - Generator: jobtracker is 'local',
generating exactly one partition.
2007-01-11 14:03:34,706 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:34,722 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
2007-01-11 14:03:34,722 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
2007-01-11 14:03:34,737 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:34,737 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:34,737 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:34,737 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:34,753 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:34,753 INFO  conf.Configuration
(Configuration.java:loadResource(504)) - parsing
/tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
2007-01-11 14:03:34,753 INFO  mapred.JobClient
(JobClient.java:runJob(370)) - Running job: job_m7h3ig
2007-01-11 14:03:34,753 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:34,768 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:34,768 INFO  conf.Configuration
(Configuration.java:loadResource(504)) - parsing
/tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
2007-01-11 14:03:34,784 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:34,784 INFO  mapred.MapTask (MapTask.java:run(155)) -
opened part-0.out
2007-01-11 14:03:34,784 INFO  plugin.PluginRepository
(PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
C:\wkspc\nutch_trunk\tmpBuild\src\plugin
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
mode: [true]
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(310)) - Registered Plugins:
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Creative Commons
Plugins (creativecommons)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Site Query Filter
(query-site)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
Plug-in (protocol-httpclient)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
(parse-html)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
(parse-pdf)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
(parse-msexcel)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     JavaScript Parser
(parse-js)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     URL Query Filter
(query-url)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
(parse-swf)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
(protocol-ftp)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
(analysis-fr)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
(parse-mp3)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
(parse-zip)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Online Search Results
Clustering using Carrot2's Lingo component (clustering-carrot2)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
(urlfilter-suffix)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
Parser/Indexer/Querier (microformats-reltag)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
(parse-rtf)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Language Identification
Parser/Filter (language-identifier)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
(parse-msword)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
(parse-text)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
(analysis-de)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
(urlnormalizer-regex)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
Parse Plug-in (parse-oo)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
(urlfilter-automaton)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
Summary Plug-in (summary-lucene)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Subcollection indexing
and query filter (subcollection)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Regex URL Filter
Framework (lib-regex-filter)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Lucene Analysers
(lib-lucene-analyzers)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
(index-basic)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic Summarizer
Plug-in (summary-basic)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Regex URL Filter
(urlfilter-regex)
2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
(parse-ext)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
(protocol-http)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     the nutch core
extension points (nutch-extensionpoints)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     More Indexing Filter
(index-more)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     More Query Filter
(query-more)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
(lib-nekohtml)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
(urlfilter-prefix)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
Plug-in (parse-mspowerpoint)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
(urlnormalizer-basic)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Pass-through URL
Normalizer (urlnormalizer-pass)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
Client (lib-commons-httpclient)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
(protocol-file)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
To Access Microsoft Format Files (lib-jakarta-poi)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic Query Filter
(query-basic)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Parse MS Documents
Framework (lib-parsems)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
(parse-rss)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
(scoring-opic)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
(org.apache.nutch.parse.Parser)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2007-01-11 14:03:35,018 INFO  conf.Configuration
(Configuration.java:getConfResourceAsReader(441)) - found resource
suffix-urlfilter.txt at
file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
2007-01-11 14:03:35,018 INFO  conf.Configuration
(Configuration.java:getConfResourceAsReader(441)) - found resource
automaton-urlfilter.txt at
file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
2007-01-11 14:03:35,128 INFO  conf.Configuration
(Configuration.java:getConfResourceAsReader(441)) - found resource
crawl-urlfilter.txt at
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
2007-01-11 14:03:35,128 INFO  conf.Configuration
(Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
not found
2007-01-11 14:03:35,143 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:progress(169)) -
C:/wkspc/nutch_trunk/crawl/crawldb/current/part-00000/data:0+125
2007-01-11 14:03:35,159 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:35,175 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:35,175 INFO  conf.Configuration
(Configuration.java:loadResource(504)) - parsing
/tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
2007-01-11 14:03:35,175 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:35,190 INFO  plugin.PluginRepository
(PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
C:\wkspc\nutch_trunk\tmpBuild\src\plugin
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
mode: [true]
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(310)) - Registered Plugins:
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Creative Commons
Plugins (creativecommons)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Site Query Filter
(query-site)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
Plug-in (protocol-httpclient)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
(parse-html)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
(parse-pdf)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
(parse-msexcel)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     JavaScript Parser
(parse-js)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     URL Query Filter
(query-url)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
(parse-swf)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
(protocol-ftp)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
(analysis-fr)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
(parse-mp3)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
(parse-zip)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Online Search Results
Clustering using Carrot2's Lingo component (clustering-carrot2)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
(urlfilter-suffix)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
Parser/Indexer/Querier (microformats-reltag)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
(parse-rtf)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Language Identification
Parser/Filter (language-identifier)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
(parse-msword)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
(parse-text)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
(analysis-de)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
(urlnormalizer-regex)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
Parse Plug-in (parse-oo)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
(urlfilter-automaton)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
Summary Plug-in (summary-lucene)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Subcollection indexing
and query filter (subcollection)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Regex URL Filter
Framework (lib-regex-filter)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Lucene Analysers
(lib-lucene-analyzers)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
(index-basic)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic Summarizer
Plug-in (summary-basic)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Regex URL Filter
(urlfilter-regex)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
(parse-ext)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
(protocol-http)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     the nutch core
extension points (nutch-extensionpoints)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     More Indexing Filter
(index-more)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     More Query Filter
(query-more)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
(lib-nekohtml)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
(urlfilter-prefix)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
Plug-in (parse-mspowerpoint)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
(urlnormalizer-basic)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Pass-through URL
Normalizer (urlnormalizer-pass)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
Client (lib-commons-httpclient)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
(protocol-file)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
To Access Microsoft Format Files (lib-jakarta-poi)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Basic Query Filter
(query-basic)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     Parse MS Documents
Framework (lib-parsems)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
(parse-rss)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
(scoring-opic)
2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
(org.apache.nutch.parse.Parser)
2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
(PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2007-01-11 14:03:35,409 INFO  conf.Configuration
(Configuration.java:getConfResourceAsReader(441)) - found resource
suffix-urlfilter.txt at
file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
2007-01-11 14:03:35,409 INFO  conf.Configuration
(Configuration.java:getConfResourceAsReader(441)) - found resource
automaton-urlfilter.txt at
file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
2007-01-11 14:03:35,519 INFO  conf.Configuration
(Configuration.java:getConfResourceAsReader(441)) - found resource
crawl-urlfilter.txt at
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
2007-01-11 14:03:35,519 INFO  conf.Configuration
(Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
not found
2007-01-11 14:03:35,706 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:progress(169)) - reduce > reduce
2007-01-11 14:03:35,753 INFO  mapred.JobClient
(JobClient.java:runJob(401)) - Job complete: job_m7h3ig
2007-01-11 14:03:35,753 WARN  crawl.Generator
(Generator.java:generate(419)) - Generator: 0 records selected for
fetching, exiting ...
2007-01-11 14:03:35,753 INFO  crawl.Crawl (Crawl.java:main(121)) -
Stopping at depth=0 - no more URLs to fetch.
2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(219)) -
LinkDb: starting
2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(220)) -
LinkDb: linkdb: crawl/linkdb
2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(221)) -
LinkDb: URL normalize: true
2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(222)) -
LinkDb: URL filter: true
2007-01-11 14:03:35,769 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:35,769 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
2007-01-11 14:03:35,784 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
2007-01-11 14:03:35,784 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:35,784 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:35,800 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:35,800 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
2007-01-11 14:03:35,815 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
2007-01-11 14:03:35,815 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:35,815 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:35,815 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:35,831 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-11 14:03:35,831 INFO  conf.Configuration
(Configuration.java:loadResource(495)) - parsing
jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-11 14:03:35,847 INFO  conf.Configuration
(Configuration.java:loadResource(504)) - parsing
/tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xml
2007-01-11 14:03:35,847 INFO  mapred.JobClient
(JobClient.java:runJob(370)) - Running job: job_kumfin
2007-01-11 14:03:35,847 WARN  mapred.LocalJobRunner
(LocalJobRunner.java:run(147)) - job_kumfin
java.io.IOException: No input directories specified in: Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xmlfinal:
hadoop-site.xml
    at org.apache.hadoop.mapred.InputFormatBase.listPaths(InputFormatBase.java:99)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listPaths(SequenceFileInputFormat.java:39)
    at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:119)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:93)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:209)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:131)
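Reading the log above, the Generator reports "0 records selected for fetching", Crawl stops at depth=0, and only after that does LinkDb.invert fail. So LinkDb is apparently being run with no segment directories as input, which is what "No input directories specified" points to. Below is a minimal sketch (not part of Nutch) of a pre-check you could run to confirm that reading. The class name SegmentCheck and the default path crawl/segments (derived from the "-dir crawl" argument) are assumptions for illustration. If it finds no segments, the real problem is upstream of LinkDb. A frequent cause is that the seed URL in urls/ does not pass the rules in the crawl-urlfilter.txt that the log shows being loaded.

```java
import java.io.File;

/**
 * Hypothetical helper, not part of Nutch: checks whether the crawl's
 * segments directory contains at least one segment subdirectory, i.e.
 * whether the Generator produced anything for LinkDb to invert.
 */
public class SegmentCheck {

    static boolean hasSegments(File segmentsDir) {
        // listFiles() returns null if the directory does not exist at all
        File[] entries = segmentsDir.listFiles(File::isDirectory);
        return entries != null && entries.length > 0;
    }

    public static void main(String[] args) {
        // Assumed default: the segments dir under "-dir crawl"
        File segments = new File(args.length > 0 ? args[0] : "crawl/segments");
        if (hasSegments(segments)) {
            System.out.println("segments found in " + segments + ", LinkDb has input");
        } else {
            System.out.println("no segment directories in " + segments
                + ": the Generator probably selected 0 URLs,"
                + " so check urls/ and crawl-urlfilter.txt");
        }
    }
}
```

Run it with the segments path as its only argument, for example "java SegmentCheck crawl/segments", from the same working directory Eclipse uses for the crawl.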


Re: nutch-0.8 bundle for eclipse

Posted by Renaud Richardet <re...@oslutions.com>.
I will try ;-)

Cheers,
Renaud


jian chen wrote:
> Hi, Renaud,
>
> Thanks for the info; this is very useful stuff, especially for using
> Eclipse to develop Java apps.
>
> Is it possible to keep this going for future releases of Nutch?
>
> Cheers,
>
> Jian
> www.hongandjian.com
>
> On 1/15/07, Renaud Richardet <ren@oslutions.com> wrote:
>
>     Hello,
>
>     It seems like many people have questions about running Nutch in
>     Eclipse, so here's a bundled version of Nutch-0.8 that can be imported
>     into Eclipse. It should get you up to speed very quickly. I tested it
>     on Ubuntu and WinXP. Please let me know if you find any configuration
>     problems.
>
>     http://www.oslutions.com/ren/nutch/nutch-0.8-eclipse.tar.gz (*nix)
>     http://www.oslutions.com/ren/nutch/nutch-0.8-eclipse.zip (windows)
>
>     Requirements:
>     Eclipse 3.2
>     Java 1.4 or higher, tested with 1.5
>
>     Import project into Eclipse:
>     From the "File" menu, select "Import...", then select "General",
>     "Existing Project into Workspace" and click "Next >".
>     Click "Browse..." next to "Select Root directory" and navigate to
>     the root directory of the unpacked bundle. Click "Open".
>     Click "Finish" and the Package Explorer will show the project.
>
>     Configure:
>     Change the value CHANGE<ME in the file conf\nutch-site.xml
>     NUTCH WILL NOT RUN OTHERWISE
>
>     Run it:
>     Crawl: menu "Run", "Run..." then double click on "Crawl" on the
>     left list
>     Search: menu "Run", "Run..." then double click on "SearchBean"
>     By default, Nutch is set up to crawl http://www.cnn.com and
>     http://www.nytimes.com/
>
>     More infos:
>     see README-FIRST.txt
>     http://lucene.apache.org/nutch/tutorial.html
>     http://wiki.apache.org/nutch/RunNutchInEclipse
>
>     HTH,
>     Renaud
>


Re: nutch-0.8 bundle for eclipse

Posted by jian chen <ch...@gmail.com>.
Hi, Renaud,

Thanks for the info, this is very useful stuff. Especially for using Eclipse
to develop java apps.

Is it possible to keep this going for future releases of Nutch?

Cheers,

Jian
www.hongandjian.com


nutch-0.8 bundle for eclipse

Posted by Renaud Richardet <re...@oslutions.com>.
Hello,

Many people seem to have questions about running Nutch in 
Eclipse, so here's a bundled version of Nutch-0.8 that can be imported 
into Eclipse. It should get you up to speed very quickly. I tested it on 
Ubuntu and WinXP. Please let me know if you find any configuration problems.

http://www.oslutions.com/ren/nutch/nutch-0.8-eclipse.tar.gz (*nix)
http://www.oslutions.com/ren/nutch/nutch-0.8-eclipse.zip (windows)

Requirements:
Eclipse 3.2
Java 1.4 or higher, tested with 1.5

Import project into Eclipse:
From the "File" menu, select "Import...", then "General" > "Existing 
Project into Workspace", and click "Next >".
Click "Browse..." next to "Select Root directory" and navigate to the 
root of the unpacked bundle. Click "Open".
Click "Finish"; the Package Explorer will show the project.

Configure:
Change the value CHANGE<ME in the file conf\nutch-site.xml
NUTCH WILL NOT RUN OTHERWISE
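
The message does not say which property holds the CHANGE<ME placeholder, so the following is only a sketch. It is a small standalone Java class (hypothetical, not shipped with the bundle) that parses a Hadoop/Nutch-style configuration file, meaning <property> elements with <name> and <value> children as in nutch-default.xml, and lists the properties whose value still contains a marker string. The default marker "CHANGE" is an assumption based on the placeholder text.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical helper, not part of the Nutch bundle: lists properties in a
 * Hadoop/Nutch-style configuration file whose value still contains a marker.
 */
public class PlaceholderCheck {

    /** Names of <property> entries whose <value> contains marker. */
    public static List<String> findPlaceholders(InputStream in, String marker) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration: " + e.getMessage(), e);
        }
        List<String> hits = new ArrayList<String>();
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String value = childText(prop, "value");
            if (value != null && value.indexOf(marker) >= 0) {
                hits.add(childText(prop, "name"));
            }
        }
        return hits;
    }

    /** Trimmed text of the first descendant element with this tag, or null if absent. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "conf/nutch-site.xml";
        String marker = args.length > 1 ? args[1] : "CHANGE"; // assumed placeholder prefix
        InputStream in = new FileInputStream(path);
        try {
            List<String> hits = findPlaceholders(in, marker);
            System.out.println(hits.isEmpty()
                    ? "No placeholder values left in " + path
                    : "Still to edit in " + path + ": " + hits);
        } finally {
            in.close();
        }
    }
}
```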

Run it:
Crawl: menu "Run", "Run..." then double click on "Crawl" on the left list
Search: menu "Run", "Run..." then double click on "SearchBean"
By default, Nutch is set up to crawl http://www.cnn.com and 
http://www.nytimes.com/
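
Whether a seed URL such as the two above survives injection depends on the URL filter rules in crawl-urlfilter.txt. The comments in the stock crawl-urlfilter.txt describe the format: each non-comment line is a regular expression prefixed by + or -, the first matching pattern decides, and a URL that matches no pattern is ignored. The class below is a hypothetical re-implementation of only those semantics. It is not Nutch's RegexURLFilter plugin, whose details may differ, but it is enough to check a rule file against seed URLs outside Nutch:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical stand-alone evaluator for "+regex" / "-regex" URL filter rules:
 * the first rule whose pattern is found in the URL decides, and a URL that
 * matches no rule is rejected. Not Nutch's own RegexURLFilter.
 */
public class UrlFilterCheck {

    private final List<Boolean> include = new ArrayList<Boolean>();
    private final List<Pattern> patterns = new ArrayList<Pattern>();

    public UrlFilterCheck(List<String> ruleLines) {
        for (String raw : ruleLines) {
            String line = raw.trim();
            if (line.length() == 0 || line.startsWith("#")) {
                continue; // blank line or comment
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                continue; // this sketch ignores lines it does not understand
            }
            include.add(sign == '+');
            patterns.add(Pattern.compile(line.substring(1)));
        }
    }

    /** True if the first matching rule is a '+' rule; false if it is '-' or nothing matches. */
    public boolean accepts(String url) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(url).find()) {
                return include.get(i);
            }
        }
        return false;
    }

    /** Usage: java UrlFilterCheck crawl-urlfilter.txt url [url ...] */
    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        try {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                lines.add(line);
            }
        } finally {
            in.close();
        }
        UrlFilterCheck filter = new UrlFilterCheck(lines);
        for (int i = 1; i < args.length; i++) {
            System.out.println((filter.accepts(args[i]) ? "ACCEPT " : "REJECT ") + args[i]);
        }
    }
}
```

Running it over your crawl-urlfilter.txt with each seed URL shows whether the seeds would survive the filter under these assumed semantics.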

More info:
see README-FIRST.txt
http://lucene.apache.org/nutch/tutorial.html
http://wiki.apache.org/nutch/RunNutchInEclipse

HTH,
Renaud

-- 
renaud richardet                           +1 617 230 9112
renaud <at> oslutions.com         http://www.oslutions.com


Re: nutch in eclipse, No input directories specified

Posted by Dennis Kubes <nu...@dragonflymc.com>.
Please post a copy of your nutch-site.xml file and your .classpath file 
inline in your message (you can't attach files to the list).
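
The .classpath matters because, judging by log lines such as "found resource crawl-urlfilter.txt at file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt", Nutch looks its configuration files up by name on the classpath, so whichever copy comes first wins. The hypothetical class below (not part of Nutch) asks the class loader for every copy of a few resource names taken from this thread and prints them in lookup order. With the usual parent-first delegation, the first one listed is the copy a getResource() call returns.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

/**
 * Hypothetical diagnostic, not part of Nutch: shows every classpath location
 * that provides a given resource, in the order the class loader returns them.
 */
public class WhichConf {

    /** All locations of the named resource visible to loader, in lookup order. */
    public static List<URL> locate(ClassLoader loader, String name) {
        List<URL> found = new ArrayList<URL>();
        try {
            Enumeration<URL> urls = loader.getResources(name);
            while (urls.hasMoreElements()) {
                found.add(urls.nextElement());
            }
        } catch (IOException e) {
            throw new IllegalStateException("Cannot search classpath for " + name, e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Resource names taken from this thread; pass others as arguments.
        String[] names = args.length > 0 ? args
                : new String[] {"nutch-default.xml", "nutch-site.xml", "crawl-urlfilter.txt"};
        ClassLoader loader = WhichConf.class.getClassLoader();
        for (int i = 0; i < names.length; i++) {
            List<URL> hits = locate(loader, names[i]);
            if (hits.isEmpty()) {
                System.out.println(names[i] + ": NOT on classpath");
            } else {
                System.out.println(names[i] + ": first found at " + hits.get(0));
                for (int j = 1; j < hits.size(); j++) {
                    System.out.println("    also (shadowed): " + hits.get(j));
                }
            }
        }
    }
}
```

Launching it with the same classpath as the Crawl launch configuration shows whether, for example, the edited conf/ copies or the tmpBuild copies are the ones being picked up.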

Dennis Kubes

Tim Benke wrote:
> Guys, please help me with this. I have been trying to get it running for more 
> than a week and I have no idea what else to try...
> 
>  > On Thu, 2007-01-11 at 15:16 +0100, Tim Benke wrote:
>  >
>  >> Hi,
>  >>
>  >> thanks to these guides, I was able to get nutch into eclipse;
>  >> http://wiki.media-style.com/display/nutchDocu/use+eclipse+to+debug+nutch
>  >> http://wiki.apache.org/nutch/RunNutchInEclipse
>  >>
>  >> I get the exception:
>  >> java.io.IOException: No input directories specified in: Configuration:
>  >> defaults: hadoop-default.xml , mapred-default.xml ,
>  >> /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xmlfinal:
>  >> hadoop-site.xml
>  >>
>  >>
>  >
> 
> Thorsten Scherler wrote:
>  > Hmm, not sure, but the above sounds like you have skipped the step
>  > "add the folder "conf" to the classpath (scroll down the list and
>  > right-click on "conf". This step is necessary)".
> 
> I tried that; the same exception is thrown, but some of the INFO log
> messages are no longer printed.
> I suspect the problem has to do with reading or evaluating the 
> urls file. Everything works fine with the same urls file on the command line.
> File urls/nutch:
> http://lucene.apache.org/nutch/
> 
> In Eclipse, urls/nutch contains the same URL.
> 
> Program arguments in Eclipse:
> urls -dir crawl -depth 3 -topN 50
> 
> 
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> topN = 50
> Injector: starting
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: Selecting best-scoring urls due for fetch.
> Generator: starting
> Generator: segment: crawl/segments/20070111170258
> Generator: filtering: false
> Generator: topN: 50
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> LinkDb: starting
> LinkDb: linkdb: crawl/linkdb
> LinkDb: URL normalize: true
> LinkDb: URL filter: true
> Exception in thread "main" java.io.IOException: Job failed!
>    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
>    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
>    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:209)
>    at org.apache.nutch.crawl.Crawl.main(Crawl.java:131)
> 
> Command line:
> $ ./bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> topN = 50
> Injector: starting
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: starting
> Generator: segment: crawl/segments/20070111165009
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: crawl/segments/20070111165009
> Fetcher: threads: 10
> fetching http://lucene.apache.org/nutch/
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: crawl/crawldb
> CrawlDb update: segment: crawl/segments/20070111165009
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: starting
> ...
> 
> 
>>> arguments in eclipse:
>>> to the program:
>>> urls -dir crawl -depth 3 -topN 50
>>>
>>> to the vm:
>>> -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
>>>
>>> environment variables NUTCH_JAVA_HOME, JAVA_HOME are set.
>>> file urls/nutch:
>>> http://lucene.apache.org/nutch/
>>>
>>> I really hope someone can help me with this, I need nutch for my
>>> bachelor thesis.
>>>
>>> regards,
>>>
>>> Tim Benke
>>>
>>> the complete log is:
>>>
>>> 2007-01-11 14:03:29,831 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:29,940 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>>> 2007-01-11 14:03:30,003 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>>> 2007-01-11 14:03:30,018 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:30,018 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(89)) - crawl
>>> started in: crawl
>>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(90)) -
>>> rootUrlDir = urls
>>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(91)) -
>>> threads = 10
>>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(92)) - 
>>> depth = 3
>>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(94)) - 
>>> topN = 50
>>> 2007-01-11 14:03:30,097 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:30,112 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>>> 2007-01-11 14:03:30,128 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>>> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(135))
>>> - Injector: starting
>>> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(136))
>>> - Injector: crawlDb: crawl/crawldb
>>> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(137))
>>> - Injector: urlDir: urls
>>> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(147))
>>> - Injector: Converting injected urls to crawl db entries.
>>> 2007-01-11 14:03:30,175 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:30,175 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>>> 2007-01-11 14:03:30,190 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>>> 2007-01-11 14:03:30,206 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:30,206 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:30,425 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:30,425 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>>> 2007-01-11 14:03:30,440 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>>> 2007-01-11 14:03:30,440 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:30,456 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:30,456 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:30,472 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:30,487 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:30,503 INFO  conf.Configuration
>>> (Configuration.java:loadResource(504)) - parsing
>>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
>>> 2007-01-11 14:03:30,518 INFO  mapred.JobClient
>>> (JobClient.java:runJob(370)) - Running job: job_qo4f9q
>>> 2007-01-11 14:03:30,534 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:30,534 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:30,534 INFO  conf.Configuration
>>> (Configuration.java:loadResource(504)) - parsing
>>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
>>> 2007-01-11 14:03:30,565 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:30,643 INFO  mapred.MapTask (MapTask.java:run(155)) -
>>> opened part-0.out
>>> 2007-01-11 14:03:30,675 INFO  plugin.PluginRepository
>>> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
>>> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
>>> mode: [true]
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Creative Commons
>>> Plugins (creativecommons)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
>>> (query-site)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
>>> Plug-in (protocol-httpclient)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
>>> (parse-html)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
>>> (parse-pdf)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
>>> (parse-msexcel)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
>>> (parse-js)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
>>> (query-url)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
>>> (parse-swf)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in 
>>> (ontology)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
>>> (protocol-ftp)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
>>> (analysis-fr)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
>>> (parse-mp3)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
>>> (parse-zip)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Online Search Results
>>> Clustering using Carrot2's Lingo component (clustering-carrot2)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
>>> (urlfilter-suffix)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
>>> Parser/Indexer/Querier (microformats-reltag)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
>>> (parse-rtf)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Language Identification
>>> Parser/Filter (language-identifier)
>>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
>>> (parse-msword)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
>>> (parse-text)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
>>> (analysis-de)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
>>> (urlnormalizer-regex)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
>>> Parse Plug-in (parse-oo)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
>>> (urlfilter-automaton)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
>>> Summary Plug-in (summary-lucene)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
>>> and query filter (subcollection)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>>> Framework (lib-regex-filter)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
>>> (lib-lucene-analyzers)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
>>> (index-basic)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
>>> Plug-in (summary-basic)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>>> (urlfilter-regex)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     HTTP Framework 
>>> (lib-http)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
>>> (parse-ext)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
>>> (protocol-http)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     the nutch core
>>> extension points (nutch-extensionpoints)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
>>> (index-more)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     More Query Filter
>>> (query-more)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
>>> (lib-nekohtml)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
>>> (urlfilter-prefix)
>>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
>>> Plug-in (parse-mspowerpoint)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
>>> (urlnormalizer-basic)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
>>> Normalizer (urlnormalizer-pass)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
>>> Client (lib-commons-httpclient)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
>>> (protocol-file)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
>>> To Access Microsoft Format Files (lib-jakarta-poi)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
>>> (query-basic)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
>>> Framework (lib-parsems)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
>>> (parse-rss)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
>>> (scoring-opic)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(320)) - Registered 
>>> Extension-Points:
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
>>> (org.apache.nutch.searcher.Summarizer)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
>>> (org.apache.nutch.scoring.ScoringFilter)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
>>> (org.apache.nutch.protocol.Protocol)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
>>> (org.apache.nutch.net.URLNormalizer)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
>>> (org.apache.nutch.net.URLFilter)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
>>> (org.apache.nutch.parse.HtmlParseFilter)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
>>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
>>> (org.apache.nutch.indexer.IndexingFilter)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
>>> (org.apache.nutch.parse.Parser)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
>>> (org.apache.nutch.ontology.Ontology)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
>>> (org.apache.nutch.analysis.NutchAnalyzer)
>>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
>>> (org.apache.nutch.searcher.QueryFilter)
>>> 2007-01-11 14:03:31,065 INFO  conf.Configuration
>>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>>> suffix-urlfilter.txt at
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
>>> 2007-01-11 14:03:31,065 INFO  conf.Configuration
>>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>>> automaton-urlfilter.txt at
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
>>> 2007-01-11 14:03:31,456 INFO  conf.Configuration
>>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>>> crawl-urlfilter.txt at
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
>>> 2007-01-11 14:03:31,472 INFO  conf.Configuration
>>> (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
>>> not found
>>> 2007-01-11 14:03:31,487 WARN  regex.RegexURLNormalizer
>>> (RegexURLNormalizer.java:regexNormalize(159)) - can't find rules for
>>> scope 'inject', using default
>>> 2007-01-11 14:03:31,487 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:progress(169)) - 
>>> C:/wkspc/nutch_trunk/urls/nutch:0+33
>>> 2007-01-11 14:03:31,503 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:31,503 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:31,503 INFO  conf.Configuration
>>> (Configuration.java:loadResource(504)) - parsing
>>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
>>> 2007-01-11 14:03:31,518 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:31,534 INFO  mapred.JobClient
>>> (JobClient.java:runJob(385)) -  map 100% reduce 0%
>>> 2007-01-11 14:03:31,753 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:progress(169)) - reduce > reduce
>>> 2007-01-11 14:03:32,534 INFO  mapred.JobClient
>>> (JobClient.java:runJob(401)) - Job complete: job_qo4f9q
>>> 2007-01-11 14:03:32,534 INFO  crawl.Injector (Injector.java:inject(163))
>>> - Injector: Merging injected urls into crawl db.
>>> 2007-01-11 14:03:32,534 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:32,534 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>>> 2007-01-11 14:03:32,534 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>>> 2007-01-11 14:03:32,550 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:32,550 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:32,581 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:32,597 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>>> 2007-01-11 14:03:32,597 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>>> 2007-01-11 14:03:32,597 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:32,612 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:32,612 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:32,628 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:32,628 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:32,628 INFO  conf.Configuration
>>> (Configuration.java:loadResource(504)) - parsing
>>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
>>> 2007-01-11 14:03:32,628 INFO  mapred.JobClient
>>> (JobClient.java:runJob(370)) - Running job: job_xiod9g
>>> 2007-01-11 14:03:32,643 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:32,643 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:32,643 INFO  conf.Configuration
>>> (Configuration.java:loadResource(504)) - parsing
>>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
>>> 2007-01-11 14:03:32,643 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:32,675 INFO  mapred.MapTask (MapTask.java:run(155)) -
>>> opened part-0.out
>>> 2007-01-11 14:03:32,675 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:progress(169)) -
>>> C:/tmp/hadoop-tbenke/mapred/temp/inject-temp-2045807797/part-00000:0+82
>>> 2007-01-11 14:03:32,690 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:32,706 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:32,706 INFO  conf.Configuration
>>> (Configuration.java:loadResource(504)) - parsing
>>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
>>> 2007-01-11 14:03:32,706 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:32,722 INFO  plugin.PluginRepository
>>> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
>>> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
>>> mode: [true]
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Creative Commons
>>> Plugins (creativecommons)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
>>> (query-site)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
>>> Plug-in (protocol-httpclient)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
>>> (parse-html)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
>>> (parse-pdf)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
>>> (parse-msexcel)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
>>> (parse-js)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
>>> (query-url)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
>>> (parse-swf)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in 
>>> (ontology)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
>>> (protocol-ftp)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
>>> (analysis-fr)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
>>> (parse-mp3)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
>>> (parse-zip)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Online Search Results
>>> Clustering using Carrot2's Lingo component (clustering-carrot2)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
>>> (urlfilter-suffix)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
>>> Parser/Indexer/Querier (microformats-reltag)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
>>> (parse-rtf)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Language Identification
>>> Parser/Filter (language-identifier)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
>>> (parse-msword)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
>>> (parse-text)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
>>> (analysis-de)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
>>> (urlnormalizer-regex)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
>>> Parse Plug-in (parse-oo)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
>>> (urlfilter-automaton)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
>>> Summary Plug-in (summary-lucene)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
>>> and query filter (subcollection)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>>> Framework (lib-regex-filter)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
>>> (lib-lucene-analyzers)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
>>> (index-basic)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
>>> Plug-in (summary-basic)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>>> (urlfilter-regex)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     HTTP Framework 
>>> (lib-http)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
>>> (parse-ext)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
>>> (protocol-http)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     the nutch core
>>> extension points (nutch-extensionpoints)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
>>> (index-more)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     More Query Filter
>>> (query-more)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
>>> (lib-nekohtml)
>>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
>>> (urlfilter-prefix)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
>>> Plug-in (parse-mspowerpoint)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
>>> (urlnormalizer-basic)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
>>> Normalizer (urlnormalizer-pass)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
>>> Client (lib-commons-httpclient)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
>>> (protocol-file)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
>>> To Access Microsoft Format Files (lib-jakarta-poi)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
>>> (query-basic)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
>>> Framework (lib-parsems)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
>>> (parse-rss)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
>>> (scoring-opic)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(320)) - Registered 
>>> Extension-Points:
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
>>> (org.apache.nutch.searcher.Summarizer)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
>>> (org.apache.nutch.scoring.ScoringFilter)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
>>> (org.apache.nutch.protocol.Protocol)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
>>> (org.apache.nutch.net.URLNormalizer)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
>>> (org.apache.nutch.net.URLFilter)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
>>> (org.apache.nutch.parse.HtmlParseFilter)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
>>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
>>> (org.apache.nutch.indexer.IndexingFilter)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
>>> (org.apache.nutch.parse.Parser)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
>>> (org.apache.nutch.ontology.Ontology)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
>>> (org.apache.nutch.analysis.NutchAnalyzer)
>>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
>>> (org.apache.nutch.searcher.QueryFilter)
>>> 2007-01-11 14:03:33,143 WARN  util.NativeCodeLoader
>>> (NativeCodeLoader.java:<clinit>(50)) - Unable to load native-hadoop
>>> library for your platform... using builtin-java classes where applicable
>>> 2007-01-11 14:03:33,175 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:progress(169)) - reduce > reduce
>>> 2007-01-11 14:03:33,628 INFO  mapred.JobClient
>>> (JobClient.java:runJob(401)) - Job complete: job_xiod9g
>>> 2007-01-11 14:03:33,659 INFO  crawl.Injector (Injector.java:inject(173))
>>> - Injector: done
>>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>>> (Generator.java:generate(371)) - Generator: Selecting best-scoring urls
>>> due for fetch.
>>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>>> (Generator.java:generate(372)) - Generator: starting
>>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>>> (Generator.java:generate(373)) - Generator: segment:
>>> crawl/segments/20070111140334
>>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>>> (Generator.java:generate(374)) - Generator: filtering: false
>>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>>> (Generator.java:generate(376)) - Generator: topN: 50
>>> 2007-01-11 14:03:34,659 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:34,659 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>>> 2007-01-11 14:03:34,675 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>>> 2007-01-11 14:03:34,675 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:34,675 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:34,675 INFO  crawl.Generator
>>> (Generator.java:generate(388)) - Generator: jobtracker is 'local',
>>> generating exactly one partition.
>>> 2007-01-11 14:03:34,706 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:34,722 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>>> 2007-01-11 14:03:34,722 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>>> 2007-01-11 14:03:34,737 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:34,737 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:34,737 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:34,737 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:34,753 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:34,753 INFO  conf.Configuration
>>> (Configuration.java:loadResource(504)) - parsing
>>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
>>> 2007-01-11 14:03:34,753 INFO  mapred.JobClient
>>> (JobClient.java:runJob(370)) - Running job: job_m7h3ig
>>> 2007-01-11 14:03:34,753 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:34,768 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:34,768 INFO  conf.Configuration
>>> (Configuration.java:loadResource(504)) - parsing
>>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
>>> 2007-01-11 14:03:34,784 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:34,784 INFO  mapred.MapTask (MapTask.java:run(155)) -
>>> opened part-0.out
>>> 2007-01-11 14:03:34,784 INFO  plugin.PluginRepository
>>> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
>>> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
>>> mode: [true]
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Creative Commons
>>> Plugins (creativecommons)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
>>> (query-site)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
>>> Plug-in (protocol-httpclient)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
>>> (parse-html)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
>>> (parse-pdf)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
>>> (parse-msexcel)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
>>> (parse-js)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
>>> (query-url)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
>>> (parse-swf)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in 
>>> (ontology)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
>>> (protocol-ftp)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
>>> (analysis-fr)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
>>> (parse-mp3)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
>>> (parse-zip)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Online Search Results
>>> Clustering using Carrot2's Lingo component (clustering-carrot2)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
>>> (urlfilter-suffix)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
>>> Parser/Indexer/Querier (microformats-reltag)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
>>> (parse-rtf)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Language Identification
>>> Parser/Filter (language-identifier)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
>>> (parse-msword)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
>>> (parse-text)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
>>> (analysis-de)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
>>> (urlnormalizer-regex)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
>>> Parse Plug-in (parse-oo)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
>>> (urlfilter-automaton)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
>>> Summary Plug-in (summary-lucene)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
>>> and query filter (subcollection)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>>> Framework (lib-regex-filter)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
>>> (lib-lucene-analyzers)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
>>> (index-basic)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
>>> Plug-in (summary-basic)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>>> (urlfilter-regex)
>>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     HTTP Framework 
>>> (lib-http)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
>>> (parse-ext)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
>>> (protocol-http)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     the nutch core
>>> extension points (nutch-extensionpoints)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
>>> (index-more)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     More Query Filter
>>> (query-more)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
>>> (lib-nekohtml)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
>>> (urlfilter-prefix)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
>>> Plug-in (parse-mspowerpoint)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
>>> (urlnormalizer-basic)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
>>> Normalizer (urlnormalizer-pass)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
>>> Client (lib-commons-httpclient)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
>>> (protocol-file)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
>>> To Access Microsoft Format Files (lib-jakarta-poi)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
>>> (query-basic)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
>>> Framework (lib-parsems)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
>>> (parse-rss)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
>>> (scoring-opic)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(320)) - Registered 
>>> Extension-Points:
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
>>> (org.apache.nutch.searcher.Summarizer)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
>>> (org.apache.nutch.scoring.ScoringFilter)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
>>> (org.apache.nutch.protocol.Protocol)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
>>> (org.apache.nutch.net.URLNormalizer)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
>>> (org.apache.nutch.net.URLFilter)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
>>> (org.apache.nutch.parse.HtmlParseFilter)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
>>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
>>> (org.apache.nutch.indexer.IndexingFilter)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
>>> (org.apache.nutch.parse.Parser)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
>>> (org.apache.nutch.ontology.Ontology)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
>>> (org.apache.nutch.analysis.NutchAnalyzer)
>>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
>>> (org.apache.nutch.searcher.QueryFilter)
>>> 2007-01-11 14:03:35,018 INFO  conf.Configuration
>>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>>> suffix-urlfilter.txt at
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
>>> 2007-01-11 14:03:35,018 INFO  conf.Configuration
>>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>>> automaton-urlfilter.txt at
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
>>> 2007-01-11 14:03:35,128 INFO  conf.Configuration
>>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>>> crawl-urlfilter.txt at
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
>>> 2007-01-11 14:03:35,128 INFO  conf.Configuration
>>> (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
>>> not found
>>> 2007-01-11 14:03:35,143 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:progress(169)) -
>>> C:/wkspc/nutch_trunk/crawl/crawldb/current/part-00000/data:0+125
>>> 2007-01-11 14:03:35,159 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:35,175 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:35,175 INFO  conf.Configuration
>>> (Configuration.java:loadResource(504)) - parsing
>>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
>>> 2007-01-11 14:03:35,175 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:35,190 INFO  plugin.PluginRepository
>>> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
>>> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
>>> mode: [true]
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Creative Commons
>>> Plugins (creativecommons)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
>>> (query-site)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
>>> Plug-in (protocol-httpclient)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
>>> (parse-html)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
>>> (parse-pdf)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
>>> (parse-msexcel)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
>>> (parse-js)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
>>> (query-url)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
>>> (parse-swf)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in 
>>> (ontology)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
>>> (protocol-ftp)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
>>> (analysis-fr)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
>>> (parse-mp3)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
>>> (parse-zip)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Online Search Results
>>> Clustering using Carrot2's Lingo component (clustering-carrot2)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
>>> (urlfilter-suffix)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
>>> Parser/Indexer/Querier (microformats-reltag)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
>>> (parse-rtf)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Language Identification
>>> Parser/Filter (language-identifier)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
>>> (parse-msword)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
>>> (parse-text)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
>>> (analysis-de)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
>>> (urlnormalizer-regex)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
>>> Parse Plug-in (parse-oo)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
>>> (urlfilter-automaton)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
>>> Summary Plug-in (summary-lucene)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
>>> and query filter (subcollection)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>>> Framework (lib-regex-filter)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
>>> (lib-lucene-analyzers)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
>>> (index-basic)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
>>> Plug-in (summary-basic)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>>> (urlfilter-regex)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     HTTP Framework 
>>> (lib-http)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
>>> (parse-ext)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
>>> (protocol-http)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     the nutch core
>>> extension points (nutch-extensionpoints)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
>>> (index-more)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     More Query Filter
>>> (query-more)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
>>> (lib-nekohtml)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
>>> (urlfilter-prefix)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
>>> Plug-in (parse-mspowerpoint)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
>>> (urlnormalizer-basic)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
>>> Normalizer (urlnormalizer-pass)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
>>> Client (lib-commons-httpclient)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
>>> (protocol-file)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
>>> To Access Microsoft Format Files (lib-jakarta-poi)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
>>> (query-basic)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
>>> Framework (lib-parsems)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
>>> (parse-rss)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
>>> (scoring-opic)
>>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(320)) - Registered 
>>> Extension-Points:
>>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
>>> (org.apache.nutch.searcher.Summarizer)
>>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
>>> (org.apache.nutch.scoring.ScoringFilter)
>>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
>>> (org.apache.nutch.protocol.Protocol)
>>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
>>> (org.apache.nutch.net.URLNormalizer)
>>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
>>> (org.apache.nutch.net.URLFilter)
>>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
>>> (org.apache.nutch.parse.HtmlParseFilter)
>>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
>>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
>>> (org.apache.nutch.indexer.IndexingFilter)
>>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
>>> (org.apache.nutch.parse.Parser)
>>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
>>> (org.apache.nutch.ontology.Ontology)
>>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
>>> (org.apache.nutch.analysis.NutchAnalyzer)
>>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>>> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
>>> (org.apache.nutch.searcher.QueryFilter)
>>> 2007-01-11 14:03:35,409 INFO  conf.Configuration
>>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>>> suffix-urlfilter.txt at
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
>>> 2007-01-11 14:03:35,409 INFO  conf.Configuration
>>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>>> automaton-urlfilter.txt at
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
>>> 2007-01-11 14:03:35,519 INFO  conf.Configuration
>>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>>> crawl-urlfilter.txt at
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
>>> 2007-01-11 14:03:35,519 INFO  conf.Configuration
>>> (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
>>> not found
>>> 2007-01-11 14:03:35,706 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:progress(169)) - reduce > reduce
>>> 2007-01-11 14:03:35,753 INFO  mapred.JobClient
>>> (JobClient.java:runJob(401)) - Job complete: job_m7h3ig
>>> 2007-01-11 14:03:35,753 WARN  crawl.Generator
>>> (Generator.java:generate(419)) - Generator: 0 records selected for
>>> fetching, exiting ...
>>> 2007-01-11 14:03:35,753 INFO  crawl.Crawl (Crawl.java:main(121)) -
>>> Stopping at depth=0 - no more URLs to fetch.
>>> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(219)) -
>>> LinkDb: starting
>>> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(220)) -
>>> LinkDb: linkdb: crawl/linkdb
>>> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(221)) -
>>> LinkDb: URL normalize: true
>>> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(222)) -
>>> LinkDb: URL filter: true
>>> 2007-01-11 14:03:35,769 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:35,769 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>>> 2007-01-11 14:03:35,784 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>>> 2007-01-11 14:03:35,784 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:35,784 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:35,800 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:35,800 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>>> 2007-01-11 14:03:35,815 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>>> 2007-01-11 14:03:35,815 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:35,815 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:35,815 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:35,831 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>>> 2007-01-11 14:03:35,831 INFO  conf.Configuration
>>> (Configuration.java:loadResource(495)) - parsing
>>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>>> 2007-01-11 14:03:35,847 INFO  conf.Configuration
>>> (Configuration.java:loadResource(504)) - parsing
>>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xml
>>> 2007-01-11 14:03:35,847 INFO  mapred.JobClient
>>> (JobClient.java:runJob(370)) - Running job: job_kumfin
>>> 2007-01-11 14:03:35,847 WARN  mapred.LocalJobRunner
>>> (LocalJobRunner.java:run(147)) - job_kumfin
>>> java.io.IOException: No input directories specified in: Configuration:
>>> defaults: hadoop-default.xml , mapred-default.xml ,
>>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xmlfinal:
>>> hadoop-site.xml
>>>     at org.apache.hadoop.mapred.InputFormatBase.listPaths(InputFormatBase.java:99)
>>>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listPaths(SequenceFileInputFormat.java:39)
>>>     at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:119)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:93)
>>> Exception in thread "main" java.io.IOException: Job failed!
>>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
>>>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
>>>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:209)
>>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:131)

Re: nutch in eclipse, No input directories specified

Posted by Tim Benke <ze...@fusemail.com>.
Guys, please help me with this. I have been trying to get it running for
more than a week and I have no idea what else to try...

 > On Thu, 2007-01-11 at 15:16 +0100, Tim Benke wrote:
 >
 >> Hi,
 >>
 >> thanks to these guides, I was able to get nutch into eclipse;
 >> http://wiki.media-style.com/display/nutchDocu/use+eclipse+to+debug+nutch
 >> http://wiki.apache.org/nutch/RunNutchInEclipse
 >>
 >> I get the exception:
 >> java.io.IOException: No input directories specified in: Configuration:
 >> defaults: hadoop-default.xml , mapred-default.xml ,
 >> /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xmlfinal:
 >> hadoop-site.xml

Thorsten Scherler wrote:
 > Hmm, not sure, but the above sounds like you have skipped this step:
 > "add the folder "conf" to the classpath (scroll down the list and
 > right-click on "conf". This step is necessary)"

I tried that. The same exception is thrown, but some of the INFO log
messages no longer appear.
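
To check whether the conf folder really ends up on the launch classpath,
a small standalone class can ask the class loader where it finds each
configuration file. This is a minimal diagnostic sketch, not part of Nutch
or Hadoop: the class name ClasspathResourceCheck is hypothetical, the
resource names come from the log and exception in this thread, and
nutch-site.xml is included only on the assumption that a site override is
in use. Run it with the same classpath as the Eclipse launch configuration.

```java
import java.net.URL;

// Hypothetical diagnostic class, not part of Nutch or Hadoop.
public class ClasspathResourceCheck {

    // Names taken from the log and exception in this thread; nutch-site.xml is an assumption.
    static final String[] RESOURCES = {
        "hadoop-default.xml",
        "hadoop-site.xml",
        "nutch-default.xml",
        "nutch-site.xml",
        "crawl-tool.xml",
        "crawl-urlfilter.txt"
    };

    /** Returns where the class loader finds the resource, or null if it is not on the classpath. */
    static URL locate(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathResourceCheck.class.getClassLoader();
        }
        return loader.getResource(name);
    }

    public static void main(String[] args) {
        for (String name : RESOURCES) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT FOUND" : url.toString()));
        }
    }
}
```

getResource only reports the first match in classpath order, so if a file
is printed under tmpBuild rather than conf, the tmpBuild copy is the one
being picked up.
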
I suspect the problem has to do with reading or evaluating the urls
file. Everything works fine with the same urls file on the command line.
file urls/nutch:
http://lucene.apache.org/nutch/

In Eclipse, urls/nutch contains the same URL.
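
The log shows the Injector reading C:/wkspc/nutch_trunk/urls/nutch, yet the
Generator still selects 0 records, so one thing worth ruling out is a URL
filter rejecting the seed. The sketch below evaluates a URL against rules in
the "+regex" / "-regex" line format of crawl-urlfilter.txt. It is a
hypothetical re-implementation for diagnosis only and assumes first-match-wins
semantics, substring (find) matching, and rejection when no rule matches; the
class, methods, and sample rule are illustrative. Paste in the
crawl-urlfilter.txt that the log reports loading from tmpBuild, which may not
be the same file as the one edited under conf.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical re-implementation for diagnosis only; it does not call Nutch's URL filter plugins.
public class SeedUrlFilterCheck {

    /** One rule line: '+' accepts, '-' rejects, followed by a Java regex. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    /** Parses rule text, skipping blank lines and '#' comments. */
    static List<Rule> parseRules(String text) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    /** Assumed semantics: the first rule whose regex is found in the URL decides; no match rejects. */
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative rules only; replace with the crawl-urlfilter.txt that is actually loaded.
        String rulesText = String.join("\n",
            "# accept hosts under apache.org",
            "+^http://([a-z0-9]*\\.)*apache.org/",
            "# skip everything else",
            "-.");
        List<Rule> rules = parseRules(rulesText);
        String seed = args.length > 0 ? args[0] : "http://lucene.apache.org/nutch/";
        System.out.println(seed + " -> " + (accepts(rules, seed) ? "accepted" : "rejected"));
    }
}
```

For example, if the loaded file still had an unedited placeholder accept rule
such as +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/ followed by -., the seed
http://lucene.apache.org/nutch/ would be rejected under these assumptions.
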

Program arguments in Eclipse:
urls -dir crawl -depth 3 -topN 50

Eclipse output:
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
topN = 50
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20070111170258
Generator: filtering: false
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
LinkDb: starting
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:209)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:131)
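
Reading the Eclipse log in order, the IOException from LinkDb looks like a
downstream symptom: the Generator selects 0 records, the crawl stops at
depth 0, and LinkDb.invert then appears to run with no segment to read. A
hypothetical pre-check (plain java.io.File, not a Nutch API) that lists the
segment directories before inverting links could look like this:

```java
import java.io.File;

// Hypothetical pre-check, not a Nutch API.
public class SegmentPrecheck {

    /** Lists segment subdirectories; a missing or unreadable directory yields an empty array. */
    static File[] listSegments(File segmentsDir) {
        File[] dirs = segmentsDir.listFiles(File::isDirectory);
        return dirs == null ? new File[0] : dirs;
    }

    public static void main(String[] args) {
        File segmentsDir = new File(args.length > 0 ? args[0] : "crawl/segments");
        File[] segments = listSegments(segmentsDir);
        if (segments.length == 0) {
            // Nothing was generated, so there is nothing to invert; stop instead of starting the LinkDb job.
            System.out.println("No segments in " + segmentsDir.getAbsolutePath()
                + "; skipping link inversion.");
            return;
        }
        for (File segment : segments) {
            System.out.println("segment: " + segment.getPath());
        }
    }
}
```

This only makes the failure clearer; the real question remains why the
Generator selects nothing in the Eclipse run.
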

Command line:
$ ./bin/nutch crawl urls -dir crawl -depth 3 -topN 50
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
topN = 50
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: crawl/segments/20070111165009
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawl/segments/20070111165009
Fetcher: threads: 10
fetching http://lucene.apache.org/nutch/
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawl/crawldb
CrawlDb update: segment: crawl/segments/20070111165009
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: starting
...
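
Since urls and crawl are passed as relative paths, the Eclipse and
command-line runs may resolve them against different working directories and
therefore use different crawl directories. A minimal sketch (the class name
LaunchContextReport is hypothetical) that prints what the JVM sees, assuming
relative paths are resolved against user.dir:

```java
import java.io.File;

// Hypothetical helper for comparing the Eclipse and command-line launches.
public class LaunchContextReport {

    /** Resolves a path the way java.io.File does: relative paths against user.dir. */
    static File resolve(String path) {
        return new File(path).getAbsoluteFile();
    }

    public static void main(String[] args) {
        System.out.println("user.dir = " + System.getProperty("user.dir"));
        for (String path : new String[] {"urls", "crawl", "crawl/segments"}) {
            File f = resolve(path);
            System.out.println(path + " -> " + f + (f.exists() ? " (exists)" : " (missing)"));
        }
    }
}
```

Running it once from the Eclipse launch configuration and once from the
shell makes any difference in the resolved directories visible.
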


>> arguments in eclipse:
>> to the program:
>> urls -dir crawl -depth 3 -topN 50
>>
>> to the vm:
>> -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
>>
>> environment variables NUTCH_JAVA_HOME, JAVA_HOME are set.
>> file urls/nutch:
>> http://lucene.apache.org/nutch/
>>
>> I really hope someone can help me with this, I need nutch for my
>> bachelor thesis.
>>
>> regards,
>>
>> Tim Benke
>>
>> the complete log is:
>>
>> 2007-01-11 14:03:29,831 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:29,940 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:30,003 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:30,018 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,018 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(89)) - crawl
>> started in: crawl
>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(90)) -
>> rootUrlDir = urls
>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(91)) -
>> threads = 10
>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(92)) - depth = 3
>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(94)) - topN = 50
>> 2007-01-11 14:03:30,097 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:30,112 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:30,128 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(135))
>> - Injector: starting
>> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(136))
>> - Injector: crawlDb: crawl/crawldb
>> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(137))
>> - Injector: urlDir: urls
>> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(147))
>> - Injector: Converting injected urls to crawl db entries.
>> 2007-01-11 14:03:30,175 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:30,175 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:30,190 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:30,206 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,206 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,425 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:30,425 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:30,440 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:30,440 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,456 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,456 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,472 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:30,487 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,503 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
>> 2007-01-11 14:03:30,518 INFO  mapred.JobClient
>> (JobClient.java:runJob(370)) - Running job: job_qo4f9q
>> 2007-01-11 14:03:30,534 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:30,534 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,534 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
>> 2007-01-11 14:03:30,565 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,643 INFO  mapred.MapTask (MapTask.java:run(155)) -
>> opened part-0.out
>> 2007-01-11 14:03:30,675 INFO  plugin.PluginRepository
>> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
>> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
>> mode: [true]
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Creative Commons
>> Plugins (creativecommons)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
>> (query-site)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
>> Plug-in (protocol-httpclient)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
>> (parse-html)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
>> (parse-pdf)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
>> (parse-msexcel)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
>> (parse-js)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
>> (query-url)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
>> (parse-swf)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
>> (protocol-ftp)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
>> (analysis-fr)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
>> (parse-mp3)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
>> (parse-zip)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Online Search Results
>> Clustering using Carrot2's Lingo component (clustering-carrot2)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
>> (urlfilter-suffix)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
>> Parser/Indexer/Querier (microformats-reltag)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
>> (parse-rtf)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Language Identification
>> Parser/Filter (language-identifier)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
>> (parse-msword)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
>> (parse-text)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
>> (analysis-de)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
>> (urlnormalizer-regex)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
>> Parse Plug-in (parse-oo)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
>> (urlfilter-automaton)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
>> Summary Plug-in (summary-lucene)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
>> and query filter (subcollection)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> Framework (lib-regex-filter)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
>> (lib-lucene-analyzers)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
>> (index-basic)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
>> Plug-in (summary-basic)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> (urlfilter-regex)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
>> (parse-ext)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
>> (protocol-http)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     the nutch core
>> extension points (nutch-extensionpoints)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
>> (index-more)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Query Filter
>> (query-more)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
>> (lib-nekohtml)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
>> (urlfilter-prefix)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
>> Plug-in (parse-mspowerpoint)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
>> (urlnormalizer-basic)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
>> Normalizer (urlnormalizer-pass)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
>> Client (lib-commons-httpclient)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
>> (protocol-file)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
>> To Access Microsoft Format Files (lib-jakarta-poi)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
>> (query-basic)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
>> Framework (lib-parsems)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
>> (parse-rss)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
>> (scoring-opic)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
>> (org.apache.nutch.searcher.Summarizer)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
>> (org.apache.nutch.scoring.ScoringFilter)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
>> (org.apache.nutch.protocol.Protocol)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
>> (org.apache.nutch.net.URLNormalizer)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
>> (org.apache.nutch.net.URLFilter)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
>> (org.apache.nutch.parse.HtmlParseFilter)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
>> (org.apache.nutch.indexer.IndexingFilter)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
>> (org.apache.nutch.parse.Parser)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
>> (org.apache.nutch.ontology.Ontology)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
>> (org.apache.nutch.analysis.NutchAnalyzer)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
>> (org.apache.nutch.searcher.QueryFilter)
>> 2007-01-11 14:03:31,065 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> suffix-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
>> 2007-01-11 14:03:31,065 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> automaton-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
>> 2007-01-11 14:03:31,456 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> crawl-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
>> 2007-01-11 14:03:31,472 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
>> not found
>> 2007-01-11 14:03:31,487 WARN  regex.RegexURLNormalizer
>> (RegexURLNormalizer.java:regexNormalize(159)) - can't find rules for
>> scope 'inject', using default
>> 2007-01-11 14:03:31,487 INFO  mapred.LocalJobRunner
>> (LocalJobRunner.java:progress(169)) - C:/wkspc/nutch_trunk/urls/nutch:0+33
>> 2007-01-11 14:03:31,503 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:31,503 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:31,503 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
>> 2007-01-11 14:03:31,518 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:31,534 INFO  mapred.JobClient
>> (JobClient.java:runJob(385)) -  map 100% reduce 0%
>> 2007-01-11 14:03:31,753 INFO  mapred.LocalJobRunner
>> (LocalJobRunner.java:progress(169)) - reduce > reduce
>> 2007-01-11 14:03:32,534 INFO  mapred.JobClient
>> (JobClient.java:runJob(401)) - Job complete: job_qo4f9q
>> 2007-01-11 14:03:32,534 INFO  crawl.Injector (Injector.java:inject(163))
>> - Injector: Merging injected urls into crawl db.
>> 2007-01-11 14:03:32,534 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:32,534 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:32,534 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:32,550 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,550 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,581 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:32,597 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:32,597 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:32,597 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,612 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,612 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,628 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:32,628 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,628 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
>> 2007-01-11 14:03:32,628 INFO  mapred.JobClient
>> (JobClient.java:runJob(370)) - Running job: job_xiod9g
>> 2007-01-11 14:03:32,643 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:32,643 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,643 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
>> 2007-01-11 14:03:32,643 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,675 INFO  mapred.MapTask (MapTask.java:run(155)) -
>> opened part-0.out
>> 2007-01-11 14:03:32,675 INFO  mapred.LocalJobRunner
>> (LocalJobRunner.java:progress(169)) -
>> C:/tmp/hadoop-tbenke/mapred/temp/inject-temp-2045807797/part-00000:0+82
>> 2007-01-11 14:03:32,690 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:32,706 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,706 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
>> 2007-01-11 14:03:32,706 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,722 INFO  plugin.PluginRepository
>> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
>> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
>> mode: [true]
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Creative Commons
>> Plugins (creativecommons)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
>> (query-site)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
>> Plug-in (protocol-httpclient)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
>> (parse-html)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
>> (parse-pdf)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
>> (parse-msexcel)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
>> (parse-js)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
>> (query-url)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
>> (parse-swf)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
>> (protocol-ftp)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
>> (analysis-fr)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
>> (parse-mp3)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
>> (parse-zip)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Online Search Results
>> Clustering using Carrot2's Lingo component (clustering-carrot2)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
>> (urlfilter-suffix)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
>> Parser/Indexer/Querier (microformats-reltag)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
>> (parse-rtf)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Language Identification
>> Parser/Filter (language-identifier)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
>> (parse-msword)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
>> (parse-text)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
>> (analysis-de)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
>> (urlnormalizer-regex)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
>> Parse Plug-in (parse-oo)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
>> (urlfilter-automaton)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
>> Summary Plug-in (summary-lucene)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
>> and query filter (subcollection)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> Framework (lib-regex-filter)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
>> (lib-lucene-analyzers)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
>> (index-basic)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
>> Plug-in (summary-basic)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> (urlfilter-regex)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
>> (parse-ext)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
>> (protocol-http)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     the nutch core
>> extension points (nutch-extensionpoints)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
>> (index-more)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Query Filter
>> (query-more)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
>> (lib-nekohtml)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
>> (urlfilter-prefix)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
>> Plug-in (parse-mspowerpoint)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
>> (urlnormalizer-basic)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
>> Normalizer (urlnormalizer-pass)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
>> Client (lib-commons-httpclient)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
>> (protocol-file)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
>> To Access Microsoft Format Files (lib-jakarta-poi)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
>> (query-basic)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
>> Framework (lib-parsems)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
>> (parse-rss)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
>> (scoring-opic)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
>> (org.apache.nutch.searcher.Summarizer)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
>> (org.apache.nutch.scoring.ScoringFilter)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
>> (org.apache.nutch.protocol.Protocol)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
>> (org.apache.nutch.net.URLNormalizer)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
>> (org.apache.nutch.net.URLFilter)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
>> (org.apache.nutch.parse.HtmlParseFilter)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
>> (org.apache.nutch.indexer.IndexingFilter)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
>> (org.apache.nutch.parse.Parser)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
>> (org.apache.nutch.ontology.Ontology)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
>> (org.apache.nutch.analysis.NutchAnalyzer)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
>> (org.apache.nutch.searcher.QueryFilter)
>> 2007-01-11 14:03:33,143 WARN  util.NativeCodeLoader
>> (NativeCodeLoader.java:<clinit>(50)) - Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> 2007-01-11 14:03:33,175 INFO  mapred.LocalJobRunner
>> (LocalJobRunner.java:progress(169)) - reduce > reduce
>> 2007-01-11 14:03:33,628 INFO  mapred.JobClient
>> (JobClient.java:runJob(401)) - Job complete: job_xiod9g
>> 2007-01-11 14:03:33,659 INFO  crawl.Injector (Injector.java:inject(173))
>> - Injector: done
>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>> (Generator.java:generate(371)) - Generator: Selecting best-scoring urls
>> due for fetch.
>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>> (Generator.java:generate(372)) - Generator: starting
>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>> (Generator.java:generate(373)) - Generator: segment:
>> crawl/segments/20070111140334
>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>> (Generator.java:generate(374)) - Generator: filtering: false
>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>> (Generator.java:generate(376)) - Generator: topN: 50
>> 2007-01-11 14:03:34,659 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:34,659 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:34,675 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:34,675 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,675 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,675 INFO  crawl.Generator
>> (Generator.java:generate(388)) - Generator: jobtracker is 'local',
>> generating exactly one partition.
>> 2007-01-11 14:03:34,706 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:34,722 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:34,722 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:34,737 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,737 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,737 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,737 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:34,753 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,753 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
>> 2007-01-11 14:03:34,753 INFO  mapred.JobClient
>> (JobClient.java:runJob(370)) - Running job: job_m7h3ig
>> 2007-01-11 14:03:34,753 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:34,768 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,768 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
>> 2007-01-11 14:03:34,784 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,784 INFO  mapred.MapTask (MapTask.java:run(155)) -
>> opened part-0.out
>> 2007-01-11 14:03:34,784 INFO  plugin.PluginRepository
>> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
>> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
>> mode: [true]
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Creative Commons
>> Plugins (creativecommons)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
>> (query-site)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
>> Plug-in (protocol-httpclient)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
>> (parse-html)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
>> (parse-pdf)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
>> (parse-msexcel)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
>> (parse-js)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
>> (query-url)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
>> (parse-swf)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
>> (protocol-ftp)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
>> (analysis-fr)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
>> (parse-mp3)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
>> (parse-zip)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Online Search Results
>> Clustering using Carrot2's Lingo component (clustering-carrot2)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
>> (urlfilter-suffix)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
>> Parser/Indexer/Querier (microformats-reltag)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
>> (parse-rtf)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Language Identification
>> Parser/Filter (language-identifier)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
>> (parse-msword)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
>> (parse-text)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
>> (analysis-de)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
>> (urlnormalizer-regex)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
>> Parse Plug-in (parse-oo)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
>> (urlfilter-automaton)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
>> Summary Plug-in (summary-lucene)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
>> and query filter (subcollection)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> Framework (lib-regex-filter)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
>> (lib-lucene-analyzers)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
>> (index-basic)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
>> Plug-in (summary-basic)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> (urlfilter-regex)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
>> (parse-ext)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
>> (protocol-http)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     the nutch core
>> extension points (nutch-extensionpoints)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
>> (index-more)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Query Filter
>> (query-more)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
>> (lib-nekohtml)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
>> (urlfilter-prefix)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
>> Plug-in (parse-mspowerpoint)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
>> (urlnormalizer-basic)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
>> Normalizer (urlnormalizer-pass)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
>> Client (lib-commons-httpclient)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
>> (protocol-file)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
>> To Access Microsoft Format Files (lib-jakarta-poi)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
>> (query-basic)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
>> Framework (lib-parsems)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
>> (parse-rss)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
>> (scoring-opic)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
>> (org.apache.nutch.searcher.Summarizer)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
>> (org.apache.nutch.scoring.ScoringFilter)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
>> (org.apache.nutch.protocol.Protocol)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
>> (org.apache.nutch.net.URLNormalizer)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
>> (org.apache.nutch.net.URLFilter)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
>> (org.apache.nutch.parse.HtmlParseFilter)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
>> (org.apache.nutch.indexer.IndexingFilter)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
>> (org.apache.nutch.parse.Parser)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
>> (org.apache.nutch.ontology.Ontology)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
>> (org.apache.nutch.analysis.NutchAnalyzer)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
>> (org.apache.nutch.searcher.QueryFilter)
>> 2007-01-11 14:03:35,018 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> suffix-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
>> 2007-01-11 14:03:35,018 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> automaton-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
>> 2007-01-11 14:03:35,128 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> crawl-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
>> 2007-01-11 14:03:35,128 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
>> not found
>> 2007-01-11 14:03:35,143 INFO  mapred.LocalJobRunner
>> (LocalJobRunner.java:progress(169)) -
>> C:/wkspc/nutch_trunk/crawl/crawldb/current/part-00000/data:0+125
>> 2007-01-11 14:03:35,159 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:35,175 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,175 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
>> 2007-01-11 14:03:35,175 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,190 INFO  plugin.PluginRepository
>> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
>> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
>> mode: [true]
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Creative Commons
>> Plugins (creativecommons)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
>> (query-site)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
>> Plug-in (protocol-httpclient)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
>> (parse-html)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
>> (parse-pdf)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
>> (parse-msexcel)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
>> (parse-js)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
>> (query-url)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
>> (parse-swf)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
>> (protocol-ftp)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
>> (analysis-fr)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
>> (parse-mp3)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
>> (parse-zip)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Online Search Results
>> Clustering using Carrot2's Lingo component (clustering-carrot2)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
>> (urlfilter-suffix)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
>> Parser/Indexer/Querier (microformats-reltag)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
>> (parse-rtf)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Language Identification
>> Parser/Filter (language-identifier)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
>> (parse-msword)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
>> (parse-text)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
>> (analysis-de)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
>> (urlnormalizer-regex)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
>> Parse Plug-in (parse-oo)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
>> (urlfilter-automaton)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
>> Summary Plug-in (summary-lucene)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
>> and query filter (subcollection)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> Framework (lib-regex-filter)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
>> (lib-lucene-analyzers)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
>> (index-basic)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
>> Plug-in (summary-basic)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> (urlfilter-regex)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
>> (parse-ext)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
>> (protocol-http)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     the nutch core
>> extension points (nutch-extensionpoints)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
>> (index-more)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Query Filter
>> (query-more)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
>> (lib-nekohtml)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
>> (urlfilter-prefix)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
>> Plug-in (parse-mspowerpoint)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
>> (urlnormalizer-basic)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
>> Normalizer (urlnormalizer-pass)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
>> Client (lib-commons-httpclient)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
>> (protocol-file)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
>> To Access Microsoft Format Files (lib-jakarta-poi)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
>> (query-basic)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
>> Framework (lib-parsems)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
>> (parse-rss)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
>> (scoring-opic)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
>> (org.apache.nutch.searcher.Summarizer)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
>> (org.apache.nutch.scoring.ScoringFilter)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
>> (org.apache.nutch.protocol.Protocol)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
>> (org.apache.nutch.net.URLNormalizer)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
>> (org.apache.nutch.net.URLFilter)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
>> (org.apache.nutch.parse.HtmlParseFilter)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
>> (org.apache.nutch.indexer.IndexingFilter)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
>> (org.apache.nutch.parse.Parser)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
>> (org.apache.nutch.ontology.Ontology)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
>> (org.apache.nutch.analysis.NutchAnalyzer)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
>> (org.apache.nutch.searcher.QueryFilter)
>> 2007-01-11 14:03:35,409 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> suffix-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
>> 2007-01-11 14:03:35,409 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> automaton-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
>> 2007-01-11 14:03:35,519 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> crawl-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
>> 2007-01-11 14:03:35,519 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
>> not found
>> 2007-01-11 14:03:35,706 INFO  mapred.LocalJobRunner
>> (LocalJobRunner.java:progress(169)) - reduce > reduce
>> 2007-01-11 14:03:35,753 INFO  mapred.JobClient
>> (JobClient.java:runJob(401)) - Job complete: job_m7h3ig
>> 2007-01-11 14:03:35,753 WARN  crawl.Generator
>> (Generator.java:generate(419)) - Generator: 0 records selected for
>> fetching, exiting ...
>> 2007-01-11 14:03:35,753 INFO  crawl.Crawl (Crawl.java:main(121)) -
>> Stopping at depth=0 - no more URLs to fetch.
>> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(219)) -
>> LinkDb: starting
>> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(220)) -
>> LinkDb: linkdb: crawl/linkdb
>> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(221)) -
>> LinkDb: URL normalize: true
>> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(222)) -
>> LinkDb: URL filter: true
>> 2007-01-11 14:03:35,769 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:35,769 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:35,784 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:35,784 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,784 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,800 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:35,800 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:35,815 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:35,815 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,815 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,815 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,831 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:35,831 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,847 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xml
>> 2007-01-11 14:03:35,847 INFO  mapred.JobClient
>> (JobClient.java:runJob(370)) - Running job: job_kumfin
>> 2007-01-11 14:03:35,847 WARN  mapred.LocalJobRunner
>> (LocalJobRunner.java:run(147)) - job_kumfin
>> java.io.IOException: No input directories specified in: Configuration:
>> defaults: hadoop-default.xml , mapred-default.xml ,
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xmlfinal:
>> hadoop-site.xml
>>     at
>> org.apache.hadoop.mapred.InputFormatBase.listPaths(InputFormatBase.java:99)
>>     at
>> org.apache.hadoop.mapred.SequenceFileInputFormat.listPaths(SequenceFileInputFormat.java:39)
>>     at
>> org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:119)
>>     at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:93)
>> Exception in thread "main" java.io.IOException: Job failed!
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
>>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
>>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:209)
>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:131)
>>
>>     
>
>   


Re: nutch in eclipse, No input directories specified

Posted by Tim Benke <ze...@fusemail.com>.
Thorsten Scherler wrote:
> On Thu, 2007-01-11 at 15:16 +0100, Tim Benke wrote:
>   
>> Hi,
>>
>> thanks to these guides, I was able to get nutch into eclipse;
>> http://wiki.media-style.com/display/nutchDocu/use+eclipse+to+debug+nutch
>> http://wiki.apache.org/nutch/RunNutchInEclipse
>>
>> I get the exception:
>> java.io.IOException: No input directories specified in: Configuration:
>> defaults: hadoop-default.xml , mapred-default.xml ,
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xmlfinal:
>> hadoop-site.xml
>>
>>     
>
> Hmm, not sure, but the above sounds as if you have not done this step:
> "add the folder "conf" to the classpath (scroll down the list and
> right-click on "conf". This step is necessary)"
>
> HTH
> salu2
>
>   
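Configuration files such as nutch-default.xml, crawl-tool.xml and crawl-urlfilter.txt are looked up on the class path. Log lines like "found resource crawl-urlfilter.txt at file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt" therefore show which copy the Eclipse launch really uses, and that copy may differ from the one in conf/. The following is a minimal sketch, run with the same launch configuration, that asks the class loader directly. The class name ConfOnClasspath is hypothetical, and it only approximates the lookup rather than reproducing Hadoop's Configuration code.

```java
import java.net.URL;

/**
 * Hypothetical diagnostic: prints which copy of each config resource the
 * class loader resolves, i.e. which folder wins on the Eclipse classpath.
 * This approximates, but does not reproduce, Hadoop's Configuration lookup.
 */
final class ConfOnClasspath {

    /** Returns the URL the class loader resolves for name, or null if absent. */
    static URL locate(String name) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ConfOnClasspath.class.getClassLoader();
        }
        return cl.getResource(name);
    }

    public static void main(String[] args) {
        String[] names = {
            "hadoop-site.xml", "nutch-default.xml", "nutch-site.xml",
            "crawl-tool.xml", "crawl-urlfilter.txt"
        };
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT FOUND" : url));
        }
    }
}
```

If the printed crawl-urlfilter.txt is the tmpBuild copy, that is the file to compare with the one the command-line run uses.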
I tried adding conf to the classpath. The same exception is still thrown,
but some of the INFO log messages are now omitted.
I suspect the problem has to do with the urls file, because everything
works fine with the same urls file on the command line.

In Eclipse, the file urls/nutch contains the URL.
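Both runs pass relative paths (urls, crawl), which Java resolves against the JVM working directory; in Eclipse that directory comes from the launch configuration (Arguments tab, "Working directory"). A minimal sketch for confirming that the Eclipse launch sees the same seed file as the shell is shown below. SeedCheck is a hypothetical helper, and it simply collects non-blank lines; it does not reproduce how Nutch's Injector parses the file.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: shows where "urls" resolves and what it contains. */
final class SeedCheck {

    /** Collects the trimmed, non-blank lines of every regular file directly inside urlDir. */
    static List<String> readSeeds(File urlDir) throws IOException {
        List<String> seeds = new ArrayList<>();
        File[] files = urlDir.listFiles(File::isFile);
        if (files == null) {            // directory does not exist relative to user.dir
            return seeds;
        }
        for (File f : files) {
            for (String line : Files.readAllLines(f.toPath(), StandardCharsets.UTF_8)) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    seeds.add(trimmed);
                }
            }
        }
        return seeds;
    }

    public static void main(String[] args) throws IOException {
        File urlDir = new File(args.length > 0 ? args[0] : "urls");
        // Relative paths resolve against the working directory of the launch.
        System.out.println("working dir: " + System.getProperty("user.dir"));
        System.out.println("seed dir:    " + urlDir.getAbsolutePath());
        List<String> seeds = readSeeds(urlDir);
        System.out.println(seeds.size() + " seed line(s): " + seeds);
    }
}
```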

Program arguments in Eclipse:
urls -dir crawl -depth 3 -topN 50
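For reference, here is how these program arguments map onto the values Crawl logs at startup (rootUrlDir = urls, threads = 10, depth = 3, topN = 50). This is a hypothetical parser written to mirror that log output, not Nutch's own code. The depth and topN defaults are assumptions; threads = 10 is what the log reports when -threads is omitted.

```java
/**
 * Hypothetical parser mirroring the values Crawl logs for
 * "urls -dir crawl -depth 3 -topN 50". Defaults for depth and topN are
 * assumptions; threads = 10 matches the logged value when -threads is absent.
 */
final class CrawlArgs {
    String rootUrlDir;            // first non-option argument
    String dir;                   // value of -dir
    int threads = 10;             // value of -threads (log shows 10 when omitted)
    int depth = 5;                // value of -depth (assumed default)
    long topN = Long.MAX_VALUE;   // value of -topN (assumed default: unlimited)

    /** Parses the arguments; a flag without a following value throws ArrayIndexOutOfBoundsException. */
    static CrawlArgs parse(String[] args) {
        CrawlArgs a = new CrawlArgs();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-dir":     a.dir = args[++i]; break;
                case "-threads": a.threads = Integer.parseInt(args[++i]); break;
                case "-depth":   a.depth = Integer.parseInt(args[++i]); break;
                case "-topN":    a.topN = Long.parseLong(args[++i]); break;
                default:         a.rootUrlDir = args[i];
            }
        }
        return a;
    }

    public static void main(String[] args) {
        CrawlArgs a = parse("urls -dir crawl -depth 3 -topN 50".split(" "));
        System.out.println("crawl started in: " + a.dir);
        System.out.println("rootUrlDir = " + a.rootUrlDir);
        System.out.println("threads = " + a.threads);
        System.out.println("depth = " + a.depth);
        System.out.println("topN = " + a.topN);
    }
}
```

Since the parsed values match in both runs, the difference between Eclipse and the shell has to lie elsewhere (working directory, classpath, or configuration files).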

Output in Eclipse:
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
topN = 50
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20070111170258
Generator: filtering: false
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
LinkDb: starting
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:209)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:131)
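The trace suggests a simple sequence. The Generator selected 0 records, so no segment directory was written under crawl/segments, and LinkDb.invert then started a job with no input paths, which appears consistent with the "No input directories specified" IOException raised from InputFormatBase.listPaths. The following minimal pre-flight sketch lists the segment directories before inverting. SegmentCheck is a hypothetical class, not part of Nutch, and the default path crawl/segments is taken from the -dir crawl argument.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-flight check: lists the segment directories a LinkDb
 * inversion would read. If the Generator selected 0 records, no segment
 * is created and the list is empty.
 */
final class SegmentCheck {

    /** Returns the sub-directories of segmentsDir, or an empty list if there are none. */
    static List<File> listSegments(File segmentsDir) {
        List<File> result = new ArrayList<>();
        File[] children = segmentsDir.listFiles(File::isDirectory);
        if (children == null) {          // directory missing or unreadable
            return result;
        }
        for (File child : children) {
            result.add(child);
        }
        return result;
    }

    public static void main(String[] args) {
        File segmentsDir = new File(args.length > 0 ? args[0] : "crawl/segments");
        List<File> segments = listSegments(segmentsDir);
        if (segments.isEmpty()) {
            System.err.println("No segments under " + segmentsDir.getAbsolutePath()
                + ": nothing was generated, so there is nothing to invert.");
            System.exit(1);
        }
        for (File s : segments) {
            System.out.println("segment: " + s.getPath());
        }
    }
}
```

In other words, the LinkDb failure is probably a symptom; the question to answer is why the Generator found nothing to fetch in the Eclipse run.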

Output on the command line:
$ ./bin/nutch crawl urls -dir crawl -depth 3 -topN 50
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
topN = 50
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: crawl/segments/20070111165009
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawl/segments/20070111165009
Fetcher: threads: 10
fetching http://lucene.apache.org/nutch/
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawl/crawldb
CrawlDb update: segment: crawl/segments/20070111165009
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: starting
...
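One possible explanation for the difference is not confirmed anywhere in this thread. The Eclipse run loads crawl-urlfilter.txt from C:/wkspc/nutch_trunk/tmpBuild/, which may not be the copy edited for the command-line run. If that copy rejects http://lucene.apache.org/nutch/ when the seed is injected (assuming the Injector applies the URL filters to seeds), the crawl db ends up with nothing for the Generator to select. The following is a simplified, hypothetical re-implementation of the rule format described in the template's comments: "+regex" accepts, "-regex" rejects, the first matching rule decides, and a URL matching no rule is rejected. It is not Nutch's RegexURLFilter class. The placeholder rule in the test assumes the file still contains the template's MY.DOMAIN.NAME line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified evaluator for crawl-urlfilter.txt-style rules:
 * '+' accepts, '-' rejects, the first rule whose regex is found in the URL
 * wins, '#' lines are comments, and a URL matching no rule is rejected.
 */
final class UrlFilterCheck {

    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    UrlFilterCheck(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;                               // blank line or comment
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                continue;                               // this sketch skips unknown lines
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns the decision of the first rule whose pattern is found in url. */
    boolean accepts(String url) {
        for (Rule r : rules) {
            if (r.pattern.matcher(url).find()) {
                return r.accept;
            }
        }
        return false;                                   // no rule matched: rejected
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: UrlFilterCheck <crawl-urlfilter.txt> <url>");
            System.exit(2);
        }
        UrlFilterCheck filter = new UrlFilterCheck(
            Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        System.out.println(args[1] + (filter.accepts(args[1]) ? " -> accepted" : " -> REJECTED"));
    }
}
```

Running it once against tmpBuild/crawl-urlfilter.txt and once against conf/crawl-urlfilter.txt with http://lucene.apache.org/nutch/ would show whether the two copies disagree about the seed.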


>> arguments in eclipse:
>> to the program:
>> urls -dir crawl -depth 3 -topN 50
>>
>> to the vm:
>> -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
>>
>> environment variables NUTCH_JAVA_HOME, JAVA_HOME are set.
>> file urls/nutch:
>> http://lucene.apache.org/nutch/
>>
>> I really hope someone can help me with this; I need nutch for my
>> bachelor thesis.
>>
>> regards,
>>
>> Tim Benke
>>
>> the complete log is:
>>
>> 2007-01-11 14:03:29,831 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:29,940 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:30,003 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:30,018 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,018 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(89)) - crawl
>> started in: crawl
>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(90)) -
>> rootUrlDir = urls
>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(91)) -
>> threads = 10
>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(92)) - depth = 3
>> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(94)) - topN = 50
>> 2007-01-11 14:03:30,097 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:30,112 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:30,128 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(135))
>> - Injector: starting
>> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(136))
>> - Injector: crawlDb: crawl/crawldb
>> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(137))
>> - Injector: urlDir: urls
>> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(147))
>> - Injector: Converting injected urls to crawl db entries.
>> 2007-01-11 14:03:30,175 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:30,175 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:30,190 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:30,206 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,206 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,425 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:30,425 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:30,440 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:30,440 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,456 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,456 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,472 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:30,487 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,503 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
>> 2007-01-11 14:03:30,518 INFO  mapred.JobClient
>> (JobClient.java:runJob(370)) - Running job: job_qo4f9q
>> 2007-01-11 14:03:30,534 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:30,534 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,534 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
>> 2007-01-11 14:03:30,565 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:30,643 INFO  mapred.MapTask (MapTask.java:run(155)) -
>> opened part-0.out
>> 2007-01-11 14:03:30,675 INFO  plugin.PluginRepository
>> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
>> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
>> mode: [true]
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Creative Commons
>> Plugins (creativecommons)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
>> (query-site)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
>> Plug-in (protocol-httpclient)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
>> (parse-html)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
>> (parse-pdf)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
>> (parse-msexcel)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
>> (parse-js)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
>> (query-url)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
>> (parse-swf)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
>> (protocol-ftp)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
>> (analysis-fr)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
>> (parse-mp3)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
>> (parse-zip)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Online Search Results
>> Clustering using Carrot2's Lingo component (clustering-carrot2)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
>> (urlfilter-suffix)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
>> Parser/Indexer/Querier (microformats-reltag)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
>> (parse-rtf)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Language Identification
>> Parser/Filter (language-identifier)
>> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
>> (parse-msword)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
>> (parse-text)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
>> (analysis-de)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
>> (urlnormalizer-regex)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
>> Parse Plug-in (parse-oo)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
>> (urlfilter-automaton)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
>> Summary Plug-in (summary-lucene)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
>> and query filter (subcollection)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> Framework (lib-regex-filter)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
>> (lib-lucene-analyzers)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
>> (index-basic)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
>> Plug-in (summary-basic)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> (urlfilter-regex)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
>> (parse-ext)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
>> (protocol-http)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     the nutch core
>> extension points (nutch-extensionpoints)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
>> (index-more)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Query Filter
>> (query-more)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
>> (lib-nekohtml)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
>> (urlfilter-prefix)
>> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
>> Plug-in (parse-mspowerpoint)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
>> (urlnormalizer-basic)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
>> Normalizer (urlnormalizer-pass)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
>> Client (lib-commons-httpclient)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
>> (protocol-file)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
>> To Access Microsoft Format Files (lib-jakarta-poi)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
>> (query-basic)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
>> Framework (lib-parsems)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
>> (parse-rss)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
>> (scoring-opic)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
>> (org.apache.nutch.searcher.Summarizer)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
>> (org.apache.nutch.scoring.ScoringFilter)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
>> (org.apache.nutch.protocol.Protocol)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
>> (org.apache.nutch.net.URLNormalizer)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
>> (org.apache.nutch.net.URLFilter)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
>> (org.apache.nutch.parse.HtmlParseFilter)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
>> (org.apache.nutch.indexer.IndexingFilter)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
>> (org.apache.nutch.parse.Parser)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
>> (org.apache.nutch.ontology.Ontology)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
>> (org.apache.nutch.analysis.NutchAnalyzer)
>> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
>> (org.apache.nutch.searcher.QueryFilter)
>> 2007-01-11 14:03:31,065 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> suffix-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
>> 2007-01-11 14:03:31,065 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> automaton-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
>> 2007-01-11 14:03:31,456 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> crawl-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
>> 2007-01-11 14:03:31,472 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
>> not found
>> 2007-01-11 14:03:31,487 WARN  regex.RegexURLNormalizer
>> (RegexURLNormalizer.java:regexNormalize(159)) - can't find rules for
>> scope 'inject', using default
>> 2007-01-11 14:03:31,487 INFO  mapred.LocalJobRunner
>> (LocalJobRunner.java:progress(169)) - C:/wkspc/nutch_trunk/urls/nutch:0+33
>> 2007-01-11 14:03:31,503 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:31,503 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:31,503 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
>> 2007-01-11 14:03:31,518 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:31,534 INFO  mapred.JobClient
>> (JobClient.java:runJob(385)) -  map 100% reduce 0%
>> 2007-01-11 14:03:31,753 INFO  mapred.LocalJobRunner
>> (LocalJobRunner.java:progress(169)) - reduce > reduce
>> 2007-01-11 14:03:32,534 INFO  mapred.JobClient
>> (JobClient.java:runJob(401)) - Job complete: job_qo4f9q
>> 2007-01-11 14:03:32,534 INFO  crawl.Injector (Injector.java:inject(163))
>> - Injector: Merging injected urls into crawl db.
>> 2007-01-11 14:03:32,534 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:32,534 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:32,534 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:32,550 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,550 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,581 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:32,597 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:32,597 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:32,597 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,612 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,612 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,628 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:32,628 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,628 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
>> 2007-01-11 14:03:32,628 INFO  mapred.JobClient
>> (JobClient.java:runJob(370)) - Running job: job_xiod9g
>> 2007-01-11 14:03:32,643 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:32,643 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,643 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
>> 2007-01-11 14:03:32,643 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,675 INFO  mapred.MapTask (MapTask.java:run(155)) -
>> opened part-0.out
>> 2007-01-11 14:03:32,675 INFO  mapred.LocalJobRunner
>> (LocalJobRunner.java:progress(169)) -
>> C:/tmp/hadoop-tbenke/mapred/temp/inject-temp-2045807797/part-00000:0+82
>> 2007-01-11 14:03:32,690 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:32,706 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,706 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
>> 2007-01-11 14:03:32,706 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:32,722 INFO  plugin.PluginRepository
>> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
>> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
>> mode: [true]
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Creative Commons
>> Plugins (creativecommons)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
>> (query-site)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
>> Plug-in (protocol-httpclient)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
>> (parse-html)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
>> (parse-pdf)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
>> (parse-msexcel)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
>> (parse-js)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
>> (query-url)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
>> (parse-swf)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
>> (protocol-ftp)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
>> (analysis-fr)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
>> (parse-mp3)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
>> (parse-zip)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Online Search Results
>> Clustering using Carrot2's Lingo component (clustering-carrot2)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
>> (urlfilter-suffix)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
>> Parser/Indexer/Querier (microformats-reltag)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
>> (parse-rtf)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Language Identification
>> Parser/Filter (language-identifier)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
>> (parse-msword)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
>> (parse-text)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
>> (analysis-de)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
>> (urlnormalizer-regex)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
>> Parse Plug-in (parse-oo)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
>> (urlfilter-automaton)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
>> Summary Plug-in (summary-lucene)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
>> and query filter (subcollection)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> Framework (lib-regex-filter)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
>> (lib-lucene-analyzers)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
>> (index-basic)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
>> Plug-in (summary-basic)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> (urlfilter-regex)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
>> (parse-ext)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
>> (protocol-http)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     the nutch core
>> extension points (nutch-extensionpoints)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
>> (index-more)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Query Filter
>> (query-more)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
>> (lib-nekohtml)
>> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
>> (urlfilter-prefix)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
>> Plug-in (parse-mspowerpoint)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
>> (urlnormalizer-basic)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
>> Normalizer (urlnormalizer-pass)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
>> Client (lib-commons-httpclient)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
>> (protocol-file)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
>> To Access Microsoft Format Files (lib-jakarta-poi)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
>> (query-basic)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
>> Framework (lib-parsems)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
>> (parse-rss)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
>> (scoring-opic)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
>> (org.apache.nutch.searcher.Summarizer)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
>> (org.apache.nutch.scoring.ScoringFilter)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
>> (org.apache.nutch.protocol.Protocol)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
>> (org.apache.nutch.net.URLNormalizer)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
>> (org.apache.nutch.net.URLFilter)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
>> (org.apache.nutch.parse.HtmlParseFilter)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
>> (org.apache.nutch.indexer.IndexingFilter)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
>> (org.apache.nutch.parse.Parser)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
>> (org.apache.nutch.ontology.Ontology)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
>> (org.apache.nutch.analysis.NutchAnalyzer)
>> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
>> (org.apache.nutch.searcher.QueryFilter)
>> 2007-01-11 14:03:33,143 WARN  util.NativeCodeLoader
>> (NativeCodeLoader.java:<clinit>(50)) - Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> 2007-01-11 14:03:33,175 INFO  mapred.LocalJobRunner
>> (LocalJobRunner.java:progress(169)) - reduce > reduce
>> 2007-01-11 14:03:33,628 INFO  mapred.JobClient
>> (JobClient.java:runJob(401)) - Job complete: job_xiod9g
>> 2007-01-11 14:03:33,659 INFO  crawl.Injector (Injector.java:inject(173))
>> - Injector: done
>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>> (Generator.java:generate(371)) - Generator: Selecting best-scoring urls
>> due for fetch.
>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>> (Generator.java:generate(372)) - Generator: starting
>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>> (Generator.java:generate(373)) - Generator: segment:
>> crawl/segments/20070111140334
>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>> (Generator.java:generate(374)) - Generator: filtering: false
>> 2007-01-11 14:03:34,659 INFO  crawl.Generator
>> (Generator.java:generate(376)) - Generator: topN: 50
>> 2007-01-11 14:03:34,659 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:34,659 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:34,675 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:34,675 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,675 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,675 INFO  crawl.Generator
>> (Generator.java:generate(388)) - Generator: jobtracker is 'local',
>> generating exactly one partition.
>> 2007-01-11 14:03:34,706 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:34,722 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:34,722 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:34,737 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,737 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,737 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,737 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:34,753 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,753 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
>> 2007-01-11 14:03:34,753 INFO  mapred.JobClient
>> (JobClient.java:runJob(370)) - Running job: job_m7h3ig
>> 2007-01-11 14:03:34,753 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:34,768 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,768 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
>> 2007-01-11 14:03:34,784 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:34,784 INFO  mapred.MapTask (MapTask.java:run(155)) -
>> opened part-0.out
>> 2007-01-11 14:03:34,784 INFO  plugin.PluginRepository
>> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
>> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
>> mode: [true]
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Creative Commons
>> Plugins (creativecommons)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
>> (query-site)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
>> Plug-in (protocol-httpclient)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
>> (parse-html)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
>> (parse-pdf)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
>> (parse-msexcel)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
>> (parse-js)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
>> (query-url)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
>> (parse-swf)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
>> (protocol-ftp)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
>> (analysis-fr)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
>> (parse-mp3)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
>> (parse-zip)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Online Search Results
>> Clustering using Carrot2's Lingo component (clustering-carrot2)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
>> (urlfilter-suffix)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
>> Parser/Indexer/Querier (microformats-reltag)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
>> (parse-rtf)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Language Identification
>> Parser/Filter (language-identifier)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
>> (parse-msword)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
>> (parse-text)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
>> (analysis-de)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
>> (urlnormalizer-regex)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
>> Parse Plug-in (parse-oo)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
>> (urlfilter-automaton)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
>> Summary Plug-in (summary-lucene)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
>> and query filter (subcollection)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> Framework (lib-regex-filter)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
>> (lib-lucene-analyzers)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
>> (index-basic)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
>> Plug-in (summary-basic)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> (urlfilter-regex)
>> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
>> (parse-ext)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
>> (protocol-http)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     the nutch core
>> extension points (nutch-extensionpoints)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
>> (index-more)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Query Filter
>> (query-more)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
>> (lib-nekohtml)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
>> (urlfilter-prefix)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
>> Plug-in (parse-mspowerpoint)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
>> (urlnormalizer-basic)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
>> Normalizer (urlnormalizer-pass)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
>> Client (lib-commons-httpclient)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
>> (protocol-file)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
>> To Access Microsoft Format Files (lib-jakarta-poi)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
>> (query-basic)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
>> Framework (lib-parsems)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
>> (parse-rss)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
>> (scoring-opic)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
>> (org.apache.nutch.searcher.Summarizer)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
>> (org.apache.nutch.scoring.ScoringFilter)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
>> (org.apache.nutch.protocol.Protocol)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
>> (org.apache.nutch.net.URLNormalizer)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
>> (org.apache.nutch.net.URLFilter)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
>> (org.apache.nutch.parse.HtmlParseFilter)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
>> (org.apache.nutch.indexer.IndexingFilter)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
>> (org.apache.nutch.parse.Parser)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
>> (org.apache.nutch.ontology.Ontology)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
>> (org.apache.nutch.analysis.NutchAnalyzer)
>> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
>> (org.apache.nutch.searcher.QueryFilter)
>> 2007-01-11 14:03:35,018 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> suffix-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
>> 2007-01-11 14:03:35,018 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> automaton-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
>> 2007-01-11 14:03:35,128 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> crawl-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
>> 2007-01-11 14:03:35,128 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
>> not found
>> 2007-01-11 14:03:35,143 INFO  mapred.LocalJobRunner
>> (LocalJobRunner.java:progress(169)) -
>> C:/wkspc/nutch_trunk/crawl/crawldb/current/part-00000/data:0+125
>> 2007-01-11 14:03:35,159 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:35,175 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,175 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
>> 2007-01-11 14:03:35,175 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,190 INFO  plugin.PluginRepository
>> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
>> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
>> mode: [true]
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Creative Commons
>> Plugins (creativecommons)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
>> (query-site)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
>> Plug-in (protocol-httpclient)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
>> (parse-html)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
>> (parse-pdf)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
>> (parse-msexcel)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
>> (parse-js)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
>> (query-url)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
>> (parse-swf)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
>> (protocol-ftp)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
>> (analysis-fr)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
>> (parse-mp3)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
>> (parse-zip)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Online Search Results
>> Clustering using Carrot2's Lingo component (clustering-carrot2)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
>> (urlfilter-suffix)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
>> Parser/Indexer/Querier (microformats-reltag)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
>> (parse-rtf)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Language Identification
>> Parser/Filter (language-identifier)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
>> (parse-msword)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
>> (parse-text)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
>> (analysis-de)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
>> (urlnormalizer-regex)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
>> Parse Plug-in (parse-oo)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
>> (urlfilter-automaton)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
>> Summary Plug-in (summary-lucene)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
>> and query filter (subcollection)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> Framework (lib-regex-filter)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
>> (lib-lucene-analyzers)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
>> (index-basic)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
>> Plug-in (summary-basic)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
>> (urlfilter-regex)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
>> (parse-ext)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
>> (protocol-http)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     the nutch core
>> extension points (nutch-extensionpoints)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
>> (index-more)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     More Query Filter
>> (query-more)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
>> (lib-nekohtml)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
>> (urlfilter-prefix)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
>> Plug-in (parse-mspowerpoint)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
>> (urlnormalizer-basic)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
>> Normalizer (urlnormalizer-pass)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
>> Client (lib-commons-httpclient)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
>> (protocol-file)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
>> To Access Microsoft Format Files (lib-jakarta-poi)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
>> (query-basic)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
>> Framework (lib-parsems)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
>> (parse-rss)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
>> (scoring-opic)
>> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
>> (org.apache.nutch.searcher.Summarizer)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
>> (org.apache.nutch.scoring.ScoringFilter)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
>> (org.apache.nutch.protocol.Protocol)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
>> (org.apache.nutch.net.URLNormalizer)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
>> (org.apache.nutch.net.URLFilter)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
>> (org.apache.nutch.parse.HtmlParseFilter)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
>> (org.apache.nutch.indexer.IndexingFilter)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
>> (org.apache.nutch.parse.Parser)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
>> (org.apache.nutch.ontology.Ontology)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
>> (org.apache.nutch.analysis.NutchAnalyzer)
>> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
>> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
>> (org.apache.nutch.searcher.QueryFilter)
>> 2007-01-11 14:03:35,409 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> suffix-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
>> 2007-01-11 14:03:35,409 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> automaton-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
>> 2007-01-11 14:03:35,519 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(441)) - found resource
>> crawl-urlfilter.txt at
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
>> 2007-01-11 14:03:35,519 INFO  conf.Configuration
>> (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
>> not found
>> 2007-01-11 14:03:35,706 INFO  mapred.LocalJobRunner
>> (LocalJobRunner.java:progress(169)) - reduce > reduce
>> 2007-01-11 14:03:35,753 INFO  mapred.JobClient
>> (JobClient.java:runJob(401)) - Job complete: job_m7h3ig
>> 2007-01-11 14:03:35,753 WARN  crawl.Generator
>> (Generator.java:generate(419)) - Generator: 0 records selected for
>> fetching, exiting ...
>> 2007-01-11 14:03:35,753 INFO  crawl.Crawl (Crawl.java:main(121)) -
>> Stopping at depth=0 - no more URLs to fetch.
>> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(219)) -
>> LinkDb: starting
>> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(220)) -
>> LinkDb: linkdb: crawl/linkdb
>> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(221)) -
>> LinkDb: URL normalize: true
>> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(222)) -
>> LinkDb: URL filter: true
>> 2007-01-11 14:03:35,769 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:35,769 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:35,784 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:35,784 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,784 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,800 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:35,800 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
>> 2007-01-11 14:03:35,815 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
>> 2007-01-11 14:03:35,815 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,815 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,815 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,831 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
>> 2007-01-11 14:03:35,831 INFO  conf.Configuration
>> (Configuration.java:loadResource(495)) - parsing
>> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
>> 2007-01-11 14:03:35,847 INFO  conf.Configuration
>> (Configuration.java:loadResource(504)) - parsing
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xml
>> 2007-01-11 14:03:35,847 INFO  mapred.JobClient
>> (JobClient.java:runJob(370)) - Running job: job_kumfin
>> 2007-01-11 14:03:35,847 WARN  mapred.LocalJobRunner
>> (LocalJobRunner.java:run(147)) - job_kumfin
>> java.io.IOException: No input directories specified in: Configuration:
>> defaults: hadoop-default.xml , mapred-default.xml ,
>> /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xmlfinal:
>> hadoop-site.xml
>>     at
>> org.apache.hadoop.mapred.InputFormatBase.listPaths(InputFormatBase.java:99)
>>     at
>> org.apache.hadoop.mapred.SequenceFileInputFormat.listPaths(SequenceFileInputFormat.java:39)
>>     at
>> org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:119)
>>     at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:93)
>> Exception in thread "main" java.io.IOException: Job failed!
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
>>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
>>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:209)
>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:131)
>>
>>     
>
>   
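The end of the log above shows the steps that lead up to the exception. The Generator reports "0 records selected for fetching", the crawl stops at depth=0, and the next job, LinkDb, fails in InputFormatBase.listPaths with "No input directories specified". That pattern suggests the link-inversion job was given an empty list of segments. Below is a minimal Java sketch of a pre-flight check for that condition. SegmentCheck is a hypothetical helper, not part of Nutch. The crawl/segments location is an assumption based on the "-dir crawl" argument.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check, NOT part of Nutch: it only illustrates that a
// link-inversion step needs at least one segment directory as input.
public class SegmentCheck {

    // Returns the immediate subdirectories of segmentsDir (each one is treated as a segment).
    // A missing or unreadable directory yields an empty list.
    static List<File> listSegments(File segmentsDir) {
        List<File> result = new ArrayList<>();
        File[] children = segmentsDir.listFiles();
        if (children == null) {
            return result;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                result.add(child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed layout: "-dir crawl" places segments under crawl/segments.
        File segmentsDir = new File(args.length > 0 ? args[0] : "crawl/segments");
        List<File> segments = listSegments(segmentsDir);
        if (segments.isEmpty()) {
            System.out.println("No segments under " + segmentsDir
                + ": a LinkDb job started now would have no input directories.");
        } else {
            System.out.println(segments.size() + " segment(s) found under " + segmentsDir);
        }
    }
}
```

Run it with the segments path as the only argument, e.g. `java SegmentCheck crawl/segments`. If it finds nothing, look at why the Generator selected no URLs, not at the LinkDb job itself.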


Re: nutch in eclipse, No input directories specified

Posted by Thorsten Scherler <th...@juntadeandalucia.es>.
On Thu, 2007-01-11 at 15:16 +0100, Tim Benke wrote:
> Hi,
> 
> thanks to these guides, I was able to get nutch into eclipse;
> http://wiki.media-style.com/display/nutchDocu/use+eclipse+to+debug+nutch
> http://wiki.apache.org/nutch/RunNutchInEclipse
> 
> I get the exception:
> java.io.IOException: No input directories specified in: Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xmlfinal:
> hadoop-site.xml
> 

Hmm, I'm not sure, but the above sounds as if you have not done this step from the guide:
'add the folder "conf" to the classpath (scroll down the list and
right-click on "conf". This step is necessary)'
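You can test the classpath step quoted above from inside Eclipse. Hadoop's Configuration looks up files such as nutch-default.xml and hadoop-site.xml by name on the classpath. If the conf folder is missing from the run configuration, those lookups can fail or can find a different copy. Below is a minimal sketch that asks the class loader where each file resolves. ConfCheck is a hypothetical helper, not part of Nutch or Hadoop. The resource list is taken from names that appear in the log and the exception message, and is an assumption about which files matter here. Launch it with the same classpath as the Crawl run configuration.

```java
import java.net.URL;

// Hypothetical diagnostic, NOT part of Nutch or Hadoop: prints where each
// configuration resource resolves on the current classpath.
public class ConfCheck {

    // Assumed list of resources, taken from names seen in the log above.
    static final String[] RESOURCES = {
        "nutch-default.xml", "nutch-site.xml", "hadoop-site.xml", "crawl-urlfilter.txt"
    };

    // Returns the URL where the resource was found, or null if it is not on the classpath.
    static URL locate(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ConfCheck.class.getClassLoader();
        }
        return loader.getResource(name);
    }

    public static void main(String[] args) {
        for (String name : RESOURCES) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url));
        }
    }
}
```

If a file prints "NOT on classpath", or resolves somewhere other than the folder you edited, the run configuration's classpath is the first thing to fix.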

HTH
regards

> arguments in eclipse:
> to the program:
> urls -dir crawl -depth 3 -topN 50
> 
> to the vm:
> -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
> 
> environment variables NUTCH_JAVA_HOME, JAVA_HOME are set.
> file urls/nutch:
> http://lucene.apache.org/nutch/
> 
> I really hope someone can help me with this, I need nutch for my
> bachelor thesis.
> 
> regards,
> 
> Tim Benke
> 
> the complete log is:
> 
> 2007-01-11 14:03:29,831 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:29,940 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
> 2007-01-11 14:03:30,003 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
> 2007-01-11 14:03:30,018 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:30,018 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(89)) - crawl
> started in: crawl
> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(90)) -
> rootUrlDir = urls
> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(91)) -
> threads = 10
> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(92)) - depth = 3
> 2007-01-11 14:03:30,034 INFO  crawl.Crawl (Crawl.java:main(94)) - topN = 50
> 2007-01-11 14:03:30,097 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:30,112 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
> 2007-01-11 14:03:30,128 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(135))
> - Injector: starting
> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(136))
> - Injector: crawlDb: crawl/crawldb
> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(137))
> - Injector: urlDir: urls
> 2007-01-11 14:03:30,159 INFO  crawl.Injector (Injector.java:inject(147))
> - Injector: Converting injected urls to crawl db entries.
> 2007-01-11 14:03:30,175 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:30,175 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
> 2007-01-11 14:03:30,190 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
> 2007-01-11 14:03:30,206 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:30,206 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:30,425 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:30,425 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
> 2007-01-11 14:03:30,440 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
> 2007-01-11 14:03:30,440 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:30,456 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:30,456 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:30,472 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:30,487 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:30,503 INFO  conf.Configuration
> (Configuration.java:loadResource(504)) - parsing
> /tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
> 2007-01-11 14:03:30,518 INFO  mapred.JobClient
> (JobClient.java:runJob(370)) - Running job: job_qo4f9q
> 2007-01-11 14:03:30,534 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:30,534 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:30,534 INFO  conf.Configuration
> (Configuration.java:loadResource(504)) - parsing
> /tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
> 2007-01-11 14:03:30,565 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:30,643 INFO  mapred.MapTask (MapTask.java:run(155)) -
> opened part-0.out
> 2007-01-11 14:03:30,675 INFO  plugin.PluginRepository
> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
> mode: [true]
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Creative Commons
> Plugins (creativecommons)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
> (query-site)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
> Plug-in (protocol-httpclient)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
> (parse-html)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
> (parse-pdf)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
> (parse-msexcel)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
> (parse-js)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
> (query-url)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
> (parse-swf)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
> (protocol-ftp)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
> (analysis-fr)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
> (parse-mp3)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
> (parse-zip)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Online Search Results
> Clustering using Carrot2's Lingo component (clustering-carrot2)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
> (urlfilter-suffix)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
> Parser/Indexer/Querier (microformats-reltag)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
> (parse-rtf)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Language Identification
> Parser/Filter (language-identifier)
> 2007-01-11 14:03:30,987 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
> (parse-msword)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
> (parse-text)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
> (analysis-de)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
> (urlnormalizer-regex)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
> Parse Plug-in (parse-oo)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
> (urlfilter-automaton)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
> Summary Plug-in (summary-lucene)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
> and query filter (subcollection)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
> Framework (lib-regex-filter)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
> (lib-lucene-analyzers)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
> (index-basic)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
> Plug-in (summary-basic)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
> (urlfilter-regex)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
> (parse-ext)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
> (protocol-http)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     the nutch core
> extension points (nutch-extensionpoints)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
> (index-more)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     More Query Filter
> (query-more)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
> (lib-nekohtml)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
> (urlfilter-prefix)
> 2007-01-11 14:03:31,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
> Plug-in (parse-mspowerpoint)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
> (urlnormalizer-basic)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
> Normalizer (urlnormalizer-pass)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
> Client (lib-commons-httpclient)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
> (protocol-file)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
> To Access Microsoft Format Files (lib-jakarta-poi)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
> (query-basic)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
> Framework (lib-parsems)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
> (parse-rss)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
> (scoring-opic)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
> (org.apache.nutch.searcher.Summarizer)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
> (org.apache.nutch.scoring.ScoringFilter)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
> (org.apache.nutch.protocol.Protocol)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
> (org.apache.nutch.net.URLNormalizer)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
> (org.apache.nutch.net.URLFilter)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
> (org.apache.nutch.parse.HtmlParseFilter)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
> (org.apache.nutch.indexer.IndexingFilter)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
> (org.apache.nutch.parse.Parser)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
> (org.apache.nutch.ontology.Ontology)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
> (org.apache.nutch.analysis.NutchAnalyzer)
> 2007-01-11 14:03:31,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
> (org.apache.nutch.searcher.QueryFilter)
> 2007-01-11 14:03:31,065 INFO  conf.Configuration
> (Configuration.java:getConfResourceAsReader(441)) - found resource
> suffix-urlfilter.txt at
> file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
> 2007-01-11 14:03:31,065 INFO  conf.Configuration
> (Configuration.java:getConfResourceAsReader(441)) - found resource
> automaton-urlfilter.txt at
> file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
> 2007-01-11 14:03:31,456 INFO  conf.Configuration
> (Configuration.java:getConfResourceAsReader(441)) - found resource
> crawl-urlfilter.txt at
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
> 2007-01-11 14:03:31,472 INFO  conf.Configuration
> (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
> not found
> 2007-01-11 14:03:31,487 WARN  regex.RegexURLNormalizer
> (RegexURLNormalizer.java:regexNormalize(159)) - can't find rules for
> scope 'inject', using default
> 2007-01-11 14:03:31,487 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:progress(169)) - C:/wkspc/nutch_trunk/urls/nutch:0+33
> 2007-01-11 14:03:31,503 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:31,503 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:31,503 INFO  conf.Configuration
> (Configuration.java:loadResource(504)) - parsing
> /tmp/hadoop-tbenke/mapred/local/localRunner/job_qo4f9q.xml
> 2007-01-11 14:03:31,518 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:31,534 INFO  mapred.JobClient
> (JobClient.java:runJob(385)) -  map 100% reduce 0%
> 2007-01-11 14:03:31,753 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:progress(169)) - reduce > reduce
> 2007-01-11 14:03:32,534 INFO  mapred.JobClient
> (JobClient.java:runJob(401)) - Job complete: job_qo4f9q
> 2007-01-11 14:03:32,534 INFO  crawl.Injector (Injector.java:inject(163))
> - Injector: Merging injected urls into crawl db.
> 2007-01-11 14:03:32,534 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:32,534 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
> 2007-01-11 14:03:32,534 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
> 2007-01-11 14:03:32,550 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:32,550 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:32,581 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:32,597 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
> 2007-01-11 14:03:32,597 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
> 2007-01-11 14:03:32,597 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:32,612 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:32,612 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:32,628 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:32,628 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:32,628 INFO  conf.Configuration
> (Configuration.java:loadResource(504)) - parsing
> /tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
> 2007-01-11 14:03:32,628 INFO  mapred.JobClient
> (JobClient.java:runJob(370)) - Running job: job_xiod9g
> 2007-01-11 14:03:32,643 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:32,643 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:32,643 INFO  conf.Configuration
> (Configuration.java:loadResource(504)) - parsing
> /tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
> 2007-01-11 14:03:32,643 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:32,675 INFO  mapred.MapTask (MapTask.java:run(155)) -
> opened part-0.out
> 2007-01-11 14:03:32,675 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:progress(169)) -
> C:/tmp/hadoop-tbenke/mapred/temp/inject-temp-2045807797/part-00000:0+82
> 2007-01-11 14:03:32,690 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:32,706 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:32,706 INFO  conf.Configuration
> (Configuration.java:loadResource(504)) - parsing
> /tmp/hadoop-tbenke/mapred/local/localRunner/job_xiod9g.xml
> 2007-01-11 14:03:32,706 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:32,722 INFO  plugin.PluginRepository
> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
> mode: [true]
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Creative Commons
> Plugins (creativecommons)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
> (query-site)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
> Plug-in (protocol-httpclient)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
> (parse-html)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
> (parse-pdf)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
> (parse-msexcel)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
> (parse-js)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
> (query-url)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
> (parse-swf)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
> (protocol-ftp)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
> (analysis-fr)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
> (parse-mp3)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
> (parse-zip)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Online Search Results
> Clustering using Carrot2's Lingo component (clustering-carrot2)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
> (urlfilter-suffix)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
> Parser/Indexer/Querier (microformats-reltag)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
> (parse-rtf)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Language Identification
> Parser/Filter (language-identifier)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
> (parse-msword)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
> (parse-text)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
> (analysis-de)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
> (urlnormalizer-regex)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
> Parse Plug-in (parse-oo)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
> (urlfilter-automaton)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
> Summary Plug-in (summary-lucene)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
> and query filter (subcollection)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
> Framework (lib-regex-filter)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
> (lib-lucene-analyzers)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
> (index-basic)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
> Plug-in (summary-basic)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
> (urlfilter-regex)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
> (parse-ext)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
> (protocol-http)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     the nutch core
> extension points (nutch-extensionpoints)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
> (index-more)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     More Query Filter
> (query-more)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
> (lib-nekohtml)
> 2007-01-11 14:03:33,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
> (urlfilter-prefix)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
> Plug-in (parse-mspowerpoint)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
> (urlnormalizer-basic)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
> Normalizer (urlnormalizer-pass)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
> Client (lib-commons-httpclient)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
> (protocol-file)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
> To Access Microsoft Format Files (lib-jakarta-poi)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
> (query-basic)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
> Framework (lib-parsems)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
> (parse-rss)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
> (scoring-opic)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
> (org.apache.nutch.searcher.Summarizer)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
> (org.apache.nutch.scoring.ScoringFilter)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
> (org.apache.nutch.protocol.Protocol)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
> (org.apache.nutch.net.URLNormalizer)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
> (org.apache.nutch.net.URLFilter)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
> (org.apache.nutch.parse.HtmlParseFilter)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
> (org.apache.nutch.indexer.IndexingFilter)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
> (org.apache.nutch.parse.Parser)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
> (org.apache.nutch.ontology.Ontology)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
> (org.apache.nutch.analysis.NutchAnalyzer)
> 2007-01-11 14:03:33,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
> (org.apache.nutch.searcher.QueryFilter)
> 2007-01-11 14:03:33,143 WARN  util.NativeCodeLoader
> (NativeCodeLoader.java:<clinit>(50)) - Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 2007-01-11 14:03:33,175 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:progress(169)) - reduce > reduce
> 2007-01-11 14:03:33,628 INFO  mapred.JobClient
> (JobClient.java:runJob(401)) - Job complete: job_xiod9g
> 2007-01-11 14:03:33,659 INFO  crawl.Injector (Injector.java:inject(173))
> - Injector: done
> 2007-01-11 14:03:34,659 INFO  crawl.Generator
> (Generator.java:generate(371)) - Generator: Selecting best-scoring urls
> due for fetch.
> 2007-01-11 14:03:34,659 INFO  crawl.Generator
> (Generator.java:generate(372)) - Generator: starting
> 2007-01-11 14:03:34,659 INFO  crawl.Generator
> (Generator.java:generate(373)) - Generator: segment:
> crawl/segments/20070111140334
> 2007-01-11 14:03:34,659 INFO  crawl.Generator
> (Generator.java:generate(374)) - Generator: filtering: false
> 2007-01-11 14:03:34,659 INFO  crawl.Generator
> (Generator.java:generate(376)) - Generator: topN: 50
> 2007-01-11 14:03:34,659 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:34,659 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
> 2007-01-11 14:03:34,675 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
> 2007-01-11 14:03:34,675 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:34,675 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:34,675 INFO  crawl.Generator
> (Generator.java:generate(388)) - Generator: jobtracker is 'local',
> generating exactly one partition.
> 2007-01-11 14:03:34,706 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:34,722 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
> 2007-01-11 14:03:34,722 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
> 2007-01-11 14:03:34,737 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:34,737 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:34,737 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:34,737 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:34,753 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:34,753 INFO  conf.Configuration
> (Configuration.java:loadResource(504)) - parsing
> /tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
> 2007-01-11 14:03:34,753 INFO  mapred.JobClient
> (JobClient.java:runJob(370)) - Running job: job_m7h3ig
> 2007-01-11 14:03:34,753 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:34,768 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:34,768 INFO  conf.Configuration
> (Configuration.java:loadResource(504)) - parsing
> /tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
> 2007-01-11 14:03:34,784 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:34,784 INFO  mapred.MapTask (MapTask.java:run(155)) -
> opened part-0.out
> 2007-01-11 14:03:34,784 INFO  plugin.PluginRepository
> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
> mode: [true]
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Creative Commons
> Plugins (creativecommons)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
> (query-site)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
> Plug-in (protocol-httpclient)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
> (parse-html)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
> (parse-pdf)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
> (parse-msexcel)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
> (parse-js)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
> (query-url)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
> (parse-swf)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
> (protocol-ftp)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
> (analysis-fr)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
> (parse-mp3)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
> (parse-zip)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Online Search Results
> Clustering using Carrot2's Lingo component (clustering-carrot2)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
> (urlfilter-suffix)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
> Parser/Indexer/Querier (microformats-reltag)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
> (parse-rtf)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Language Identification
> Parser/Filter (language-identifier)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
> (parse-msword)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
> (parse-text)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
> (analysis-de)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
> (urlnormalizer-regex)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
> Parse Plug-in (parse-oo)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
> (urlfilter-automaton)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
> Summary Plug-in (summary-lucene)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
> and query filter (subcollection)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
> Framework (lib-regex-filter)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
> (lib-lucene-analyzers)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
> (index-basic)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
> Plug-in (summary-basic)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
> (urlfilter-regex)
> 2007-01-11 14:03:35,003 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
> (parse-ext)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
> (protocol-http)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     the nutch core
> extension points (nutch-extensionpoints)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
> (index-more)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     More Query Filter
> (query-more)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
> (lib-nekohtml)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
> (urlfilter-prefix)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
> Plug-in (parse-mspowerpoint)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
> (urlnormalizer-basic)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
> Normalizer (urlnormalizer-pass)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
> Client (lib-commons-httpclient)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
> (protocol-file)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
> To Access Microsoft Format Files (lib-jakarta-poi)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
> (query-basic)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
> Framework (lib-parsems)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
> (parse-rss)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
> (scoring-opic)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
> (org.apache.nutch.searcher.Summarizer)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
> (org.apache.nutch.scoring.ScoringFilter)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
> (org.apache.nutch.protocol.Protocol)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
> (org.apache.nutch.net.URLNormalizer)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
> (org.apache.nutch.net.URLFilter)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
> (org.apache.nutch.parse.HtmlParseFilter)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
> (org.apache.nutch.indexer.IndexingFilter)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
> (org.apache.nutch.parse.Parser)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
> (org.apache.nutch.ontology.Ontology)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
> (org.apache.nutch.analysis.NutchAnalyzer)
> 2007-01-11 14:03:35,018 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
> (org.apache.nutch.searcher.QueryFilter)
> 2007-01-11 14:03:35,018 INFO  conf.Configuration
> (Configuration.java:getConfResourceAsReader(441)) - found resource
> suffix-urlfilter.txt at
> file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
> 2007-01-11 14:03:35,018 INFO  conf.Configuration
> (Configuration.java:getConfResourceAsReader(441)) - found resource
> automaton-urlfilter.txt at
> file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
> 2007-01-11 14:03:35,128 INFO  conf.Configuration
> (Configuration.java:getConfResourceAsReader(441)) - found resource
> crawl-urlfilter.txt at
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
> 2007-01-11 14:03:35,128 INFO  conf.Configuration
> (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
> not found
> 2007-01-11 14:03:35,143 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:progress(169)) -
> C:/wkspc/nutch_trunk/crawl/crawldb/current/part-00000/data:0+125
> 2007-01-11 14:03:35,159 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:35,175 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:35,175 INFO  conf.Configuration
> (Configuration.java:loadResource(504)) - parsing
> /tmp/hadoop-tbenke/mapred/local/localRunner/job_m7h3ig.xml
> 2007-01-11 14:03:35,175 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:35,190 INFO  plugin.PluginRepository
> (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in:
> C:\wkspc\nutch_trunk\tmpBuild\src\plugin
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation
> mode: [true]
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(310)) - Registered Plugins:
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Creative Commons
> Plugins (creativecommons)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Site Query Filter
> (query-site)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Http / Https Protocol
> Plug-in (protocol-httpclient)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Html Parse Plug-in
> (parse-html)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Pdf Parse Plug-in
> (parse-pdf)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MSExcel Parse Plug-in
> (parse-msexcel)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     JavaScript Parser
> (parse-js)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     URL Query Filter
> (query-url)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     SWF Parse Plug-in
> (parse-swf)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Log4j (lib-log4j)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Ontology Plug-in (ontology)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Ftp Protocol Plug-in
> (protocol-ftp)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     French Analysis Plug-in
> (analysis-fr)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MP3 Parse Plug-in
> (parse-mp3)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Zip Parse Plug-in
> (parse-zip)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Online Search Results
> Clustering using Carrot2's Lingo component (clustering-carrot2)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Suffix URL Filter
> (urlfilter-suffix)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Rel-Tag microformat
> Parser/Indexer/Querier (microformats-reltag)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     RTF Parse Plug-in
> (parse-rtf)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Language Identification
> Parser/Filter (language-identifier)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MSWord Parse Plug-in
> (parse-msword)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Text Parse Plug-in
> (parse-text)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     German Analysis Plug-in
> (analysis-de)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Regex URL Normalizer
> (urlnormalizer-regex)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     OpenOffice/OpenDocument
> Parse Plug-in (parse-oo)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Automaton URL Filter
> (urlfilter-automaton)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Lucene Highlighter
> Summary Plug-in (summary-lucene)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Subcollection indexing
> and query filter (subcollection)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
> Framework (lib-regex-filter)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Lucene Analysers
> (lib-lucene-analyzers)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic Indexing Filter
> (index-basic)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic Summarizer
> Plug-in (summary-basic)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Regex URL Filter
> (urlfilter-regex)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     HTTP Framework (lib-http)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     External Parser Plug-in
> (parse-ext)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Http Protocol Plug-in
> (protocol-http)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     the nutch core
> extension points (nutch-extensionpoints)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     More Indexing Filter
> (index-more)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     More Query Filter
> (query-more)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     CyberNeko HTML Parser
> (lib-nekohtml)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Prefix URL Filter
> (urlfilter-prefix)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     MSPowerPoint Parse
> Plug-in (parse-mspowerpoint)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic URL Normalizer
> (urlnormalizer-basic)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Pass-through URL
> Normalizer (urlnormalizer-pass)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Jakarta Commons HTTP
> Client (lib-commons-httpclient)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     File Protocol Plug-in
> (protocol-file)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Jakarta POI - Java API
> To Access Microsoft Format Files (lib-jakarta-poi)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Basic Query Filter
> (query-basic)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     XML Libraries (lib-xml)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     Parse MS Documents
> Framework (lib-parsems)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     RSS Parse Plug-in
> (parse-rss)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(316)) -     OPIC Scoring Plug-in
> (scoring-opic)
> 2007-01-11 14:03:35,394 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Summarizer
> (org.apache.nutch.searcher.Summarizer)
> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Scoring
> (org.apache.nutch.scoring.ScoringFilter)
> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Protocol
> (org.apache.nutch.protocol.Protocol)
> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch URL Normalizer
> (org.apache.nutch.net.URLNormalizer)
> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch URL Filter
> (org.apache.nutch.net.URLFilter)
> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     HTML Parse Filter
> (org.apache.nutch.parse.HtmlParseFilter)
> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Online Search
> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Indexing Filter
> (org.apache.nutch.indexer.IndexingFilter)
> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Content Parser
> (org.apache.nutch.parse.Parser)
> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Ontology Model Loader
> (org.apache.nutch.ontology.Ontology)
> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Analysis
> (org.apache.nutch.analysis.NutchAnalyzer)
> 2007-01-11 14:03:35,409 INFO  plugin.PluginRepository
> (PluginRepository.java:displayStatus(325)) -     Nutch Query Filter
> (org.apache.nutch.searcher.QueryFilter)
> 2007-01-11 14:03:35,409 INFO  conf.Configuration
> (Configuration.java:getConfResourceAsReader(441)) - found resource
> suffix-urlfilter.txt at
> file:/C:/wkspc/nutch_trunk/tmpBuild/suffix-urlfilter.txt
> 2007-01-11 14:03:35,409 INFO  conf.Configuration
> (Configuration.java:getConfResourceAsReader(441)) - found resource
> automaton-urlfilter.txt at
> file:/C:/wkspc/nutch_trunk/tmpBuild/automaton-urlfilter.txt
> 2007-01-11 14:03:35,519 INFO  conf.Configuration
> (Configuration.java:getConfResourceAsReader(441)) - found resource
> crawl-urlfilter.txt at
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-urlfilter.txt
> 2007-01-11 14:03:35,519 INFO  conf.Configuration
> (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt
> not found
> 2007-01-11 14:03:35,706 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:progress(169)) - reduce > reduce
> 2007-01-11 14:03:35,753 INFO  mapred.JobClient
> (JobClient.java:runJob(401)) - Job complete: job_m7h3ig
> 2007-01-11 14:03:35,753 WARN  crawl.Generator
> (Generator.java:generate(419)) - Generator: 0 records selected for
> fetching, exiting ...
> 2007-01-11 14:03:35,753 INFO  crawl.Crawl (Crawl.java:main(121)) -
> Stopping at depth=0 - no more URLs to fetch.
> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(219)) -
> LinkDb: starting
> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(220)) -
> LinkDb: linkdb: crawl/linkdb
> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(221)) -
> LinkDb: URL normalize: true
> 2007-01-11 14:03:35,769 INFO  crawl.LinkDb (LinkDb.java:invert(222)) -
> LinkDb: URL filter: true
> 2007-01-11 14:03:35,769 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:35,769 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
> 2007-01-11 14:03:35,784 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
> 2007-01-11 14:03:35,784 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:35,784 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:35,800 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:35,800 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/nutch-default.xml
> 2007-01-11 14:03:35,815 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> file:/C:/wkspc/nutch_trunk/tmpBuild/crawl-tool.xml
> 2007-01-11 14:03:35,815 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:35,815 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:35,815 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:35,831 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/hadoop-default.xml
> 2007-01-11 14:03:35,831 INFO  conf.Configuration
> (Configuration.java:loadResource(495)) - parsing
> jar:file:/C:/wkspc/nutch_trunk/lib/hadoop-0.9.1.jar!/mapred-default.xml
> 2007-01-11 14:03:35,847 INFO  conf.Configuration
> (Configuration.java:loadResource(504)) - parsing
> /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xml
> 2007-01-11 14:03:35,847 INFO  mapred.JobClient
> (JobClient.java:runJob(370)) - Running job: job_kumfin
> 2007-01-11 14:03:35,847 WARN  mapred.LocalJobRunner
> (LocalJobRunner.java:run(147)) - job_kumfin
> java.io.IOException: No input directories specified in: Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-tbenke/mapred/local/localRunner/job_kumfin.xmlfinal:
> hadoop-site.xml
>     at
> org.apache.hadoop.mapred.InputFormatBase.listPaths(InputFormatBase.java:99)
>     at
> org.apache.hadoop.mapred.SequenceFileInputFormat.listPaths(SequenceFileInputFormat.java:39)
>     at
> org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:119)
>     at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:93)
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:209)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:131)
>