Posted to solr-user@lucene.apache.org by "McGibbney, Lewis John" <Le...@gcu.ac.uk> on 2011/04/18 12:47:41 UTC
Indexing from Nutch crawl
Hi list,
I am using Nutch-1.3 branch, which I checked out today, to crawl a couple of URLs in local mode. I have been using Solr 1.4.1 within my web app, but I am running into some problems during the indexing stages. I have three commands getting sent to Solr; these are:
echo "----- SolrIndex (Step 4 of $steps) -----"
$NUTCH_HOME/bin/nutch solrindex http://localhost:8080/wombra/data crawl/crawldb crawl/linkdb crawl/segments/*
echo "----- SolrDedup (Step 5 of $steps) -----"
$NUTCH_HOME/bin/nutch solrdedup http://localhost:8080/wombra/data
echo "----- SolrClean (Step 6 of $steps) -----"
$NUTCH_HOME/bin/nutch solrclean crawl/crawldb http://localhost:8080/wombra/data
The solrindex command is failing with SolrException: Not Found.
solrdedup appears to be working fine, and the same could be said for solrclean.
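One way to make it clearer which of the three steps actually fails is to wrap each command and report its exit status. The following is only a sketch: `run_step` is a hypothetical helper name, and whether each nutch subcommand returns a non-zero status when Solr rejects a request is an assumption, so hadoop.log is still worth checking.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Nutch): run one pipeline step and
# report its exit status on stderr if it is non-zero.
# Usage: run_step "label" command [args...]
run_step() {
  label=$1
  shift
  echo "----- $label -----"
  "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "$label exited with status $status" >&2
  fi
  return "$status"
}

# Example, reusing the solrindex command from this message (not run here):
#   run_step "SolrIndex" "$NUTCH_HOME/bin/nutch" solrindex \
#     http://localhost:8080/wombra/data crawl/crawldb crawl/linkdb crawl/segments/*
```

A step that prints nothing to the console but exits non-zero would then stand out, instead of appearing to work.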
I have been monitoring threads on the Nutch list, but thought I would have a crack at the Solr list for any suggestions on how I can solve the errors I am seeing in my log output.
Thank you
Lewis
Here is my hadoop.log output
2011-04-18 11:27:05,480 INFO solr.SolrIndexer - SolrIndexer: starting at 2011-04-18 11:27:05
2011-04-18 11:27:05,562 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2011-04-18 11:27:05,562 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2011-04-18 11:27:05,562 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418111549
2011-04-18 11:27:05,656 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418111603
...
some more
...
2011-04-18 11:27:09,966 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,966 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: content dest: content
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: site dest: site
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: title dest: title
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: host dest: host
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: segment dest: segment
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: boost dest: boost
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: digest dest: digest
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: url dest: id
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: url dest: url
2011-04-18 11:27:10,394 WARN mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Not Found
Not Found
request: http://localhost:8080/wombra/data/update?wt=javabin&version=1
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:75)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-04-18 11:27:11,033 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2011-04-18 11:27:11,869 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2011-04-18 11:27:11
2011-04-18 11:27:11,870 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8080/wombra/data
2011-04-18 11:27:13,048 INFO solr.SolrClean - SolrClean: starting at 2011-04-18 11:27:13
2011-04-18 11:27:13,888 INFO solr.SolrClean - SolrClean: deleting 5 documents
2011-04-18 11:27:13,992 WARN mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Not Found
Not Found
request: http://localhost:8080/wombra/data/update?wt=javabin&version=1
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexer.solr.SolrClean$SolrDeleter.close(SolrClean.java:115)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:473)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
Glasgow Caledonian University is a registered Scottish charity, number SC021474
Winner: Times Higher Education’s Widening Participation Initiative of the Year 2009 and Herald Society’s Education Initiative of the Year 2009.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
Winner: Times Higher Education’s Outstanding Support for Early Career Researchers of the Year 2010, GCU as a lead with Universities Scotland partners.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
Re: Indexing from Nutch crawl
Posted by Markus Jelsma <ma...@openindex.io>.
And are you really sure there's a Solr instance running with an update
handler at: http://localhost:8080/wombra/data/update ? Anyway, your URL is
somewhat uncommon in Solr land. It's usually something like:
http://<host>:<port>/solr/[<core>]/update/
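A quick way to test this from the shell is sketched below, assuming curl is installed. The helper names `solr_update_url` and `http_status` are made up for illustration. The first mirrors what the log shows SolrJ doing (appending /update to the base URL given to nutch); the second prints only the HTTP status code.

```shell
#!/bin/sh
# Hypothetical helper: derive the update-handler URL from the base URL
# that is passed to the nutch solr* commands.
solr_update_url() {
  base=${1%/}          # drop one trailing slash, if present
  printf '%s/update\n' "$base"
}

# Hypothetical helper: print only the HTTP status code for a URL.
# A 404 here would match the "Not Found" seen in hadoop.log.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Example (not run automatically; needs the servlet container running):
#   http_status "$(solr_update_url http://localhost:8080/wombra/data)"
```

If the probe prints 404, the web app is probably not exposing a Solr update handler at that path, and the base URL given to nutch would need to point at wherever Solr is actually mounted.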
On Monday 18 April 2011 14:03:53 McGibbney, Lewis John wrote:
> Hi Markus,
>
> hadoop.log from beginning of solr commands as follows
>
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
RE: Indexing from Nutch crawl
Posted by "McGibbney, Lewis John" <Le...@gcu.ac.uk>.
Hi Markus,
hadoop.log from beginning of solr commands as follows
2011-04-18 11:27:05,480 INFO solr.SolrIndexer - SolrIndexer: starting at 2011-04-18 11:27:05
2011-04-18 11:27:05,562 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2011-04-18 11:27:05,562 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2011-04-18 11:27:05,562 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418111549
2011-04-18 11:27:05,656 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418111603
2011-04-18 11:27:05,660 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418112359
2011-04-18 11:27:05,661 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418112526
2011-04-18 11:27:06,065 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2011-04-18 11:27:06,282 INFO plugin.PluginRepository - Plugins: looking in: /home/lewis/branch-1.3/runtime/local/plugins
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Registered Plugins:
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - HTTP Framework (lib-http)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Registered Extension-Points:
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
2011-04-18 11:27:06,396 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2011-04-18 11:27:06,399 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:06,401 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:06,571 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:06,571 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:06,727 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:06,727 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:06,890 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:06,890 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:07,085 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:07,085 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:07,287 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:07,288 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:07,531 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:07,531 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:07,754 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:07,754 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:07,949 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:07,949 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:08,150 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:08,151 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:08,427 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:08,428 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:08,644 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:08,644 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:08,853 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:08,855 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:09,055 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,055 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:09,279 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,279 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:09,492 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,494 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:09,699 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,699 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:09,904 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,905 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:09,966 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,966 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: content dest: content
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: site dest: site
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: title dest: title
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: host dest: host
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: segment dest: segment
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: boost dest: boost
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: digest dest: digest
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: url dest: id
2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: url dest: url
2011-04-18 11:27:10,394 WARN mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Not Found
Not Found
request: http://localhost:8080/wombra/data/update?wt=javabin&version=1
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:75)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-04-18 11:27:11,033 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2011-04-18 11:27:11,869 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2011-04-18 11:27:11
2011-04-18 11:27:11,870 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8080/wombra/data
2011-04-18 11:27:13,048 INFO solr.SolrClean - SolrClean: starting at 2011-04-18 11:27:13
2011-04-18 11:27:13,888 INFO solr.SolrClean - SolrClean: deleting 5 documents
2011-04-18 11:27:13,992 WARN mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Not Found
Not Found
request: http://localhost:8080/wombra/data/update?wt=javabin&version=1
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexer.solr.SolrClean$SolrDeleter.close(SolrClean.java:115)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:473)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
________________________________________
From: Markus Jelsma [markus.jelsma@openindex.io]
Sent: 18 April 2011 11:59
To: solr-user@lucene.apache.org
Cc: McGibbney, Lewis John
Subject: Re: Indexing from Nutch crawl
Can you include hadoop.log output? Likely the other commands fail as well but
don't write the exception to stdout.
Re: Indexing from Nutch crawl
Posted by Markus Jelsma <ma...@openindex.io>.
Can you include hadoop.log output? Likely the other commands fail as well but
don't write the exception to stdout.
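As a rough sketch of how to check the log for failures that never reach the console, the helpers below search a log file for SolrException entries. The function names are hypothetical, and the log location in the example is an assumption that depends on how Nutch is installed.

```shell
#!/bin/sh
# Hypothetical helper: count SolrException lines in a Hadoop log file.
count_solr_exceptions() {
  grep -c 'org\.apache\.solr\.common\.SolrException' "$1"
}

# Hypothetical helper: show each exception together with the log line
# before it, which names the job that was running at the time.
# (-B is supported by GNU and BSD grep; it is not in POSIX.)
show_solr_exceptions() {
  grep -n -B 1 'org\.apache\.solr\.common\.SolrException' "$1"
}

# Example (the path is an assumption, adjust to your install):
#   count_solr_exceptions "$NUTCH_HOME/logs/hadoop.log"
```

A count higher than the number of failures seen on the console suggests that other steps are failing quietly as well.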
On Monday 18 April 2011 12:47:41 McGibbney, Lewis John wrote:
> Hi list,
>
> I am using Nutch-1.3 branch, which I checked out today, to crawl a couple of
> URLs in local mode. I have been using Solr 1.4.1 within my web app,
> but I am running into some problems during the indexing stages. I have
> three commands getting sent to Solr; these are:
>
> echo "----- SolrIndex (Step 4 of $steps) -----"
> $NUTCH_HOME/bin/nutch solrindex http://localhost:8080/wombra/data crawl/crawldb crawl/linkdb crawl/segments/*
>
> echo "----- SolrDedup (Step 5 of $steps) -----"
> $NUTCH_HOME/bin/nutch solrdedup http://localhost:8080/wombra/data
>
> echo "----- SolrClean (Step 6 of $steps) -----"
> $NUTCH_HOME/bin/nutch solrclean crawl/crawldb http://localhost:8080/wombra/data
>
> The solrindex command is failing with SolrException: Not Found.
> solrdedup appears to be working fine, and the same could be said for solrclean.
>
> I have been monitoring threads on the Nutch list, but thought I would have
> a crack at the Solr list for any suggestions on how I can solve the errors
> I am seeing in my log output.
>
> Thank you
>
> Lewis
>
> Here is my hadoop.log output
>
> 2011-04-18 11:27:05,480 INFO solr.SolrIndexer - SolrIndexer: starting at 2011-04-18 11:27:05
> 2011-04-18 11:27:05,562 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
> 2011-04-18 11:27:05,562 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
> 2011-04-18 11:27:05,562 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418111549
> 2011-04-18 11:27:05,656 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418111603
> ...
> some more
> ...
> 2011-04-18 11:27:09,966 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:09,966 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: content dest: content
> 2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: site dest: site
> 2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: title dest: title
> 2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: host dest: host
> 2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: segment dest: segment
> 2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: boost dest: boost
> 2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: digest dest: digest
> 2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
> 2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: url dest: id
> 2011-04-18 11:27:10,021 INFO solr.SolrMappingReader - source: url dest: url
> 2011-04-18 11:27:10,394 WARN mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Not Found
>
> Not Found
>
> request: http://localhost:8080/wombra/data/update?wt=javabin&version=1
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>     at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:75)
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-04-18 11:27:11,033 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> 2011-04-18 11:27:11,869 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2011-04-18 11:27:11
> 2011-04-18 11:27:11,870 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8080/wombra/data
> 2011-04-18 11:27:13,048 INFO solr.SolrClean - SolrClean: starting at 2011-04-18 11:27:13
> 2011-04-18 11:27:13,888 INFO solr.SolrClean - SolrClean: deleting 5 documents
> 2011-04-18 11:27:13,992 WARN mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Not Found
>
> Not Found
>
> request: http://localhost:8080/wombra/data/update?wt=javabin&version=1
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>     at org.apache.nutch.indexer.solr.SolrClean$SolrDeleter.close(SolrClean.java:115)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:473)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Indexing from Nutch crawl
Posted by ramires <uy...@beriltech.com>.
Hi Markus,
I misunderstood before.
I use Nutch 1.2-rc4 with Solr 4.0 trunk. You just need to replace the 1.4.1 solrj and core jars that ship with Nutch with these files from the solr/dist directory:
apache-solr-core-4.0-SNAPSHOT.jar
apache-solr-solrj-4.0-SNAPSHOT.jar
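Before swapping any jars, it may help to see which Solr jars the Nutch installation actually carries. A minimal sketch, assuming the jars sit in a lib directory under the Nutch home and follow the apache-solr-*.jar naming seen in this thread. The function name list_solr_jars and the default location are illustrative assumptions:

```shell
#!/bin/sh
# Print the file names of the Solr jars found in a lib directory, one per line.
# Assumption: the directory is passed as $1, defaulting to "$NUTCH_HOME/lib".
list_solr_jars() {
    lib_dir="${1:-$NUTCH_HOME/lib}"
    for jar in "$lib_dir"/apache-solr-*.jar; do
        # Skip the unexpanded pattern when no jar matches.
        [ -e "$jar" ] && basename "$jar"
    done
    return 0
}
```

The version suffix in each printed file name shows which SolrJ and core release the indexing job is running against.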
--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-from-Nutch-crawl-tp2833862p2834556.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing from Nutch crawl
Posted by Markus Jelsma <ma...@openindex.io>.
There is no problem with your files. Nutch still ships SolrJ 1.4.1. If you
were using Solr 3.1 you would get a javabin error, not a Not Found
error.
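Since the symptom is a plain Not Found, one quick check is whether the update URL shown in the log answers at all from the command line. A minimal sketch, assuming curl is installed. The function names are illustrative, and the base URL in the example is the one from the original commands:

```shell
#!/bin/sh
# Build the update URL that appears in the log from the base URL given to Nutch.
solr_update_url() {
    base="${1%/}"    # drop a single trailing slash, if present
    printf '%s/update\n' "$base"
}

# Print only the HTTP status code returned for a URL (requires curl).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

For example, http_status "$(solr_update_url http://localhost:8080/wombra/data)" prints the status code the servlet container returns for that path. A 404 there would suggest the problem lies in how the web app exposes the Solr URL rather than in the Nutch jars.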
On Monday 18 April 2011 15:37:42 McGibbney, Lewis John wrote:
> Hi Ramires,
>
> I have been using Solr 1.4.1
>
> My understanding from the example solrconfig.xml is that JARs will be
> loaded from the /lib directory. I do not have a /dist directory, as I have
> copied the example directory as my Solr home directory; therefore I have
> commented out these entries in the solrconfig.xml.
>
> Could you please elaborate on your comment below, as I may be missing your
> point?
>
> Thank you Lewis
>
>
> ________________________________________
> From: ramires [uygar@beriltech.com]
> Sent: 18 April 2011 13:40
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing from Nutch crawl
>
> The problem is with these files in the Nutch lib directory. You can easily
> replace them with the ones in the Solr dist directory.
>
> apache-solr-core-1.4.0.jar
> apache-solr-solrj-1.4.0.jar
RE: Indexing from Nutch crawl
Posted by "McGibbney, Lewis John" <Le...@gcu.ac.uk>.
Hi Ramires,
I have been using Solr 1.4.1
My understanding from the example solrconfig.xml is that JARs will be loaded from the /lib directory. I do not have a /dist directory, as I have copied the example directory as my Solr home directory; therefore I have commented out these entries in the solrconfig.xml.
Could you please elaborate on your comment below, as I may be missing your point?
Thank you Lewis
________________________________________
From: ramires [uygar@beriltech.com]
Sent: 18 April 2011 13:40
To: solr-user@lucene.apache.org
Subject: Re: Indexing from Nutch crawl
The problem is with these files in the Nutch lib directory. You can easily
replace them with the ones in the Solr dist directory.
apache-solr-core-1.4.0.jar
apache-solr-solrj-1.4.0.jar
Re: Indexing from Nutch crawl
Posted by ramires <uy...@beriltech.com>.
The problem is with these files in the Nutch lib directory. You can easily
replace them with the ones in the Solr dist directory.
apache-solr-core-1.4.0.jar
apache-solr-solrj-1.4.0.jar
--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-from-Nutch-crawl-tp2833862p2834270.html
Sent from the Solr - User mailing list archive at Nabble.com.