Posted to user@nutch.apache.org by "McGibbney, Lewis John" <Le...@gcu.ac.uk> on 2011/02/10 00:36:21 UTC

Index with Solr to my own webapp

Hi list,

I am at the Solr indexing stage and seem to have hit trouble when sending the crawldb, linkdb and segments/* to Solr to be indexed. I have added an XML file to $CATALINA_HOME/conf/Catalina/localhost with my webapp specifics. My Solr 1.4.1 implementation resides within my web app at the following location: /home/lewis/Downloads/mywebapp. However, when I send this command to index with Solr:

lewis@lewis-01:~/Downloads/nutch-1.2$ bin/nutch solrindex http://127.0.0.1:8080/mywebapp crawl/crawldb crawl/linkdb crawl/segments/*

I am getting java.io.IOException: Job failed!

I experienced this before when I was using the solrindex command incorrectly. I am hoping that this is not the case here; however, it is late and I might have missed something simple.

I have hadoop.log if this would help at all.

Any suggestions, please? Thanks.

Lewis

Glasgow Caledonian University is a registered Scottish charity, number SC021474

Winner: Times Higher Education’s Widening Participation Initiative of the Year 2009 and Herald Society’s Education Initiative of the Year 2009.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html

Winner: Times Higher Education’s Outstanding Support for Early Career Researchers of the Year 2010, GCU as a lead with Universities Scotland partners.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html

Re: Index with Solr to my own webapp

Posted by Markus Jelsma <ma...@openindex.io>.

Nutch is looking good. Is Solr running at all? The example doesn't have a log
output; it just writes to stdout when running. If you're running under Tomcat,
Solr logs are by default written to catalina.out.
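
To follow up on the catalina.out pointer, here is a minimal Python sketch that prints the last lines of that file. It assumes a default Tomcat layout in which the log sits at $CATALINA_HOME/logs/catalina.out; the helper name tail_lines and the fallback path are illustrative only, so adjust them to your installation.

```python
import os
from collections import deque


def tail_lines(path, count=50):
    """Return the last `count` lines of a text file as a list of strings."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    # Assumption: default Tomcat layout; /usr/share/tomcat6 is only a placeholder.
    catalina_home = os.environ.get("CATALINA_HOME", "/usr/share/tomcat6")
    log_path = os.path.join(catalina_home, "logs", "catalina.out")
    if os.path.exists(log_path):
        for line in tail_lines(log_path):
            print(line)
    else:
        print("No catalina.out found at", log_path)
```

Running this right after a failed solrindex attempt should show whether the request reached Solr at all; if nothing new appears in the log, the request probably never got to the Solr webapp.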

On Thursday 10 February 2011 14:31:27 McGibbney, Lewis John wrote:
> Hi Markus
> 
> OK, first things first, here is hadoop.log:
> 
> 2011-02-09 23:24:11,826 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-02-09 23:24:11,828 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-02-09 23:24:11,875 INFO  solr.SolrMappingReader - source: content dest: content
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: site dest: site
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: title dest: title
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: host dest: host
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: segment dest: segment
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: boost dest: boost
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: digest dest: digest
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: url dest: id
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: url dest: url
> 2011-02-09 23:24:13,626 WARN  mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Not Found
>
> Not Found
>
> request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>         at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:64)
>         at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:54)
>         at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
>         at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:440)
>         at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:159)
>         at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-02-09 23:24:14,128 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> 
> I am unsure where to get the Solr output, as I have been unable to progress
> past the stage above. I have been indexing directly from Nutch to a vanilla
> Solr 1.4.1 dist, but this is my first attempt at indexing to my own app.
> Within my web app I have added the following dirs:
> 
> bin (empty)
> conf (usual nutch schema, solrconfig with Nutch requestHandler, scripts, synonyms, etc)
> data (index and spellchecker dirs! Each containing segments.gen and segments_1)
> dist (as per 1.4.1 solr version)
> lib (as above)
> 
> I hope that this is sufficient
> 
> Lewis

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

RE: Index with Solr to my own webapp

Posted by "McGibbney, Lewis John" <Le...@gcu.ac.uk>.

Hi Markus

OK, first things first, here is hadoop.log:

2011-02-09 23:24:11,826 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-02-09 23:24:11,828 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-02-09 23:24:11,875 INFO  solr.SolrMappingReader - source: content dest: content
2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: site dest: site
2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: title dest: title
2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: host dest: host
2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: segment dest: segment
2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: boost dest: boost
2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: digest dest: digest
2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: url dest: id
2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: url dest: url
2011-02-09 23:24:13,626 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Not Found

Not Found

request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:64)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:54)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:440)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:159)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-02-09 23:24:14,128 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
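
One detail in the log above may be worth checking: the failing request goes to http://127.0.0.1:8080/solr/update, whereas the solrindex command quoted at the start of the thread was given http://127.0.0.1:8080/mywebapp. The following Python sketch pulls the request URL out of a hadoop.log excerpt and compares it with the URL passed on the command line. It assumes that the SolrJ client builds the request by appending /update to the base URL; the function names are hypothetical helpers, not part of Nutch or Solr.

```python
import re
from urllib.parse import urlsplit

# Matches the "request: <url>" line that accompanies the SolrException in hadoop.log.
REQUEST_LINE = re.compile(r"^request:\s*(\S+)", re.MULTILINE)


def requested_base_urls(log_text):
    """Return the base URLs (without /update and query string) seen in the log."""
    bases = []
    for url in REQUEST_LINE.findall(log_text):
        parts = urlsplit(url)
        path = parts.path
        # Assumption: the client appended "/update" to the configured base URL.
        if path.endswith("/update"):
            path = path[: -len("/update")]
        base = f"{parts.scheme}://{parts.netloc}{path}"
        if base not in bases:
            bases.append(base)
    return bases


def mismatched_requests(log_text, configured_url):
    """List base URLs in the log that differ from the one given to solrindex."""
    configured = configured_url.rstrip("/")
    return [base for base in requested_base_urls(log_text) if base != configured]


if __name__ == "__main__":
    excerpt = (
        "org.apache.solr.common.SolrException: Not Found\n"
        "\n"
        "Not Found\n"
        "\n"
        "request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1\n"
    )
    for base in mismatched_requests(excerpt, "http://127.0.0.1:8080/mywebapp"):
        print("Log shows a request to", base, "rather than the configured URL")
```

If the two differ, the log entry may come from a run that used a different URL, so it can be worth re-running the command and looking at the fresh entry before drawing conclusions.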

I am unsure where to get the Solr output, as I have been unable to progress past the stage above. I have been indexing directly from Nutch to a vanilla Solr 1.4.1 dist, but this is my first attempt at indexing to my own app. Within my web app I have added the following dirs:

bin (empty)
conf (usual nutch schema, solrconfig with Nutch requestHandler, scripts, synonyms, etc)
data (index and spellchecker dirs! Each containing segments.gen and segments_1)
dist (as per 1.4.1 solr version)
lib (as above)

I hope that this is sufficient

Lewis
________________________________________
From: Markus Jelsma [markus.jelsma@openindex.io]
Sent: 10 February 2011 10:58
To: user@nutch.apache.org
Cc: McGibbney, Lewis John
Subject: Re: Index with Solr to my own webapp

Yes, please show us the hadoop.log output and the Solr output. The latter is
usually more important at this stage. You might be writing to non-existent
fields, or writing multiple values to a single-valued field, or... whatever's
happening.


Re: Index with Solr to my own webapp

Posted by Markus Jelsma <ma...@openindex.io>.

Yes, please show us the hadoop.log output and the Solr output. The latter is
usually more important at this stage. You might be writing to non-existent
fields, or writing multiple values to a single-valued field, or... whatever's
happening.
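
The two failure modes mentioned above (unknown fields, and several values sent to a single-valued field) can be checked offline. Below is a minimal Python sketch that compares a document against the <field> declarations of a Solr schema.xml. The sample schema and document are made up for illustration, dynamicField and copyField rules are ignored, and a missing multiValued attribute is treated as false, which is an assumption about your schema version.

```python
import xml.etree.ElementTree as ET


def load_fields(schema_xml):
    """Map each declared field name to True if it is multiValued."""
    root = ET.fromstring(schema_xml)
    fields = {}
    # iter() finds <field> elements at any depth, e.g. under <fields>.
    for field in root.iter("field"):
        name = field.get("name")
        # Assumption: an absent multiValued attribute means single-valued.
        fields[name] = field.get("multiValued", "false").lower() == "true"
    return fields


def find_problems(fields, document):
    """Return human-readable problems for a {field: value-or-list} document."""
    problems = []
    for name, value in document.items():
        if name not in fields:
            problems.append(f"unknown field: {name}")
        elif isinstance(value, (list, tuple)) and len(value) > 1 and not fields[name]:
            problems.append(f"multiple values for single-valued field: {name}")
    return problems


if __name__ == "__main__":
    # Hypothetical, heavily trimmed schema; a real schema.xml has many more entries.
    sample_schema = """
    <schema name="nutch" version="1.1">
      <fields>
        <field name="id" type="string" stored="true" indexed="true"/>
        <field name="title" type="text" stored="true" indexed="true"/>
        <field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>
      </fields>
    </schema>
    """
    sample_doc = {"id": "http://example.org/", "title": ["a", "b"], "lang": "en"}
    for problem in find_problems(load_fields(sample_schema), sample_doc):
        print(problem)
```

In practice Solr reports these problems itself in its log output, which is why that output is the more useful of the two; the sketch only helps to narrow things down before sending documents.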


-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350