Posted to user@nutch.apache.org by "McGibbney, Lewis John" <Le...@gcu.ac.uk> on 2011/01/12 19:52:01 UTC

SolrIndex problems

Hello list,

Having some problems when attempting to index to Solr (experimenting with running solr.war on Tomcat instead of the usual Jetty config). Using Nutch in standalone mode (no Hadoop this time). The exception appears to be due to an input path not existing, however as far as I am aware I have followed the usual procedure for updating, fetching URLs, etc.

Nutch 1.2
Windows XP
Tomcat 6.0.26

Attached excerpt from Hadoop.log

2011-01-12 18:42:18,929 INFO  httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
2011-01-12 18:42:18,929 INFO  httpclient.HttpMethodDirector - Retrying request
2011-01-12 18:42:19,835 INFO  httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
2011-01-12 18:42:19,835 INFO  httpclient.HttpMethodDirector - Retrying request
2011-01-12 18:42:20,835 INFO  httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
2011-01-12 18:42:20,835 INFO  httpclient.HttpMethodDirector - Retrying request
2011-01-12 18:42:22,851 INFO  httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
2011-01-12 18:42:22,851 INFO  httpclient.HttpMethodDirector - Retrying request
2011-01-12 18:42:23,851 INFO  httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
2011-01-12 18:42:23,851 INFO  httpclient.HttpMethodDirector - Retrying request
2011-01-12 18:42:24,867 INFO  httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
2011-01-12 18:42:24,867 INFO  httpclient.HttpMethodDirector - Retrying request
2011-01-12 18:42:25,898 WARN  mapred.LocalJobRunner - job_local_0001
java.io.IOException
      at org.apache.nutch.indexer.solr.SolrWriter.makeIOException(SolrWriter.java:85)
      at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:80)
      at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
      at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException: Connection refused: connect
      at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
      at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
      at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
      at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
      at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:75)
      ... 4 more
Caused by: java.net.ConnectException: Connection refused: connect
      at java.net.PlainSocketImpl.socketConnect(Native Method)
      at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
      at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
      at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
      at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
      at java.net.Socket.connect(Socket.java:529)
      at java.net.Socket.connect(Socket.java:478)
      at java.net.Socket.<init>(Socket.java:375)
      at java.net.Socket.<init>(Socket.java:249)
      at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
      at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
      at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
      at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
      at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
      at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
      at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
      at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
      at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:416)
      ... 8 more
2011-01-12 18:42:26,836 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
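The repeated "Connection refused" entries above indicate that nothing was listening at the Solr URL Nutch was given. A minimal pre-flight check, sketched here with an assumed URL (Jetty's example config defaults to port 8983, Tomcat to 8080) rather than anything from the original post:

```shell
# Check that something answers at the Solr URL before running solrindex.
# "Connection refused" in hadoop.log means no process was listening there.
# SOLR_URL is an example -- substitute the host/port your container uses.
SOLR_URL="http://localhost:8983/solr"
if curl -s -o /dev/null --max-time 5 "$SOLR_URL/admin/ping"; then
  echo "Solr answered at $SOLR_URL"
else
  echo "No response from $SOLR_URL -- wrong port, or container not running?"
fi
```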

Thank you
Lewis


Glasgow Caledonian University is a registered Scottish charity, number SC021474

Winner: Times Higher Education's Widening Participation Initiative of the Year 2009 and Herald Society's Education Initiative of the Year 2009
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html

Re: SolrIndex problems

Posted by Gabriele Kahlout <ga...@mysimpatico.com>.
It worked (just for the record).



On Mon, Apr 18, 2011 at 11:06 AM, Markus Jelsma <ma...@openindex.io> wrote:

> Yes. A fix was committed for NUTCH-980 in 1.3 wednesday or thursday.
>
> > Just to be crystal clear, I should update my nutch-1.3 version[1]? I
> > thought it was an issue with Solr. Are there any references to this issue,
> > to understand better what nutch got to do with it?
> >
> > [1] $ svn co http://svn.apache.org/repos/asf/nutch/branches/branch-1.3/nutch-1.3
> > $ cd nutch-1.3
> > $ ant
> >
> > On Mon, Apr 18, 2011 at 10:52 AM, Markus Jelsma <ma...@openindex.io> wrote:
> > > This has been fixed a few days ago. Update your 1.3 export.
> > >
> > > > On Mon, Apr 18, 2011 at 10:39 AM, Klaus Tachtler <klaus@tachtler.net> wrote:
> > > > > Hi Gabriele,
> > > > >
> > > > > i had the same problem a few days ago; the answer was to delete the
> > > > > 'data' directory inside your Solr installation. Under my installation
> > > > > it was /var/www/solr/data.
> > > >
> > > > that didn't do the trick for me. =(
> > > >
> > > > For the logs, I found this in hadoop.log:
> > > >
> > > > 2011-04-18 10:44:26,390 WARN  mapred.LocalJobRunner - job_local_0001
> > > > java.lang.IllegalAccessError: tried to access field
> > > > org.slf4j.impl.StaticLoggerBinder.SINGLETON from class
> > > > org.slf4j.LoggerFactory
> > > >
> > > >     at org.slf4j.LoggerFactory.staticInitialize(LoggerFactory.java:83)
> > > >     at org.slf4j.LoggerFactory.<clinit>(LoggerFactory.java:73)
> > > >     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.<clinit>(CommonsHttpSolrServer.java:78)
> > > >     at org.apache.nutch.indexer.solr.SolrWriter.open(SolrWriter.java:44)
> > > >     at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:42)
> > > >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:433)
> > > >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> > > > 2011-04-18 10:44:26,928 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> > > >
> > > > > Then do it again --> $ bin/nutch solrindex
> > > > > http://localhost:8080/solr crawl/crawldb/0
> > > > >
> > > > >  I'm now having the same problem but I'm not finding the problem yet.
> > > > >
> > > > >> $ bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0
> > > > >> crawl/linkdb crawl/segments/0/20110418100309
> > > > >> SolrIndexer: starting at 2011-04-18 10:03:40
> > > > >> java.io.IOException: Job failed!
> > > > >
> > > > > Grüße
> > > > > Klaus.
> > > > >
> > > > > --
> > > > >
> > > > > ------------------------------------------------
> > > > > e-Mail  : klaus@tachtler.net
> > > > > Homepage: http://www.tachtler.net
> > > > > DokuWiki: http://www.dokuwiki.tachtler.net
> > > > > ------------------------------------------------
>



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).

Re: SolrIndex problems

Posted by Markus Jelsma <ma...@openindex.io>.
Yes. A fix was committed for NUTCH-980 in 1.3 wednesday or thursday.

> Just to be crystal clear, I should update my nutch-1.3 version[1]? I
> thought it was an issue with Solr. Are there any references to this issue,
> to understand better what nutch got to do with it?
> 
> [1] $ svn co http://svn.apache.org/repos/asf/nutch/branches/branch-1.3/nutch-1.3
> $ cd nutch-1.3
> $ ant
> 
> On Mon, Apr 18, 2011 at 10:52 AM, Markus Jelsma <ma...@openindex.io> wrote:
> > This has been fixed a few days ago. Update your 1.3 export.
> > 
> > > On Mon, Apr 18, 2011 at 10:39 AM, Klaus Tachtler <kl...@tachtler.net> wrote:
> > > > Hi Gabriele,
> > > > 
> > > > i had the same problem a few days ago; the answer was to delete the
> > > > 'data' directory inside your Solr installation. Under my installation
> > > > it was /var/www/solr/data.
> > > 
> > > that didn't do the trick for me. =(
> > > 
> > > For the logs, I found this in hadoop.log:
> > > 
> > > 2011-04-18 10:44:26,390 WARN  mapred.LocalJobRunner - job_local_0001
> > > java.lang.IllegalAccessError: tried to access field
> > > org.slf4j.impl.StaticLoggerBinder.SINGLETON from class
> > > org.slf4j.LoggerFactory
> > > 
> > >     at org.slf4j.LoggerFactory.staticInitialize(LoggerFactory.java:83)
> > >     at org.slf4j.LoggerFactory.<clinit>(LoggerFactory.java:73)
> > >     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.<clinit>(CommonsHttpSolrServer.java:78)
> > >     at org.apache.nutch.indexer.solr.SolrWriter.open(SolrWriter.java:44)
> > >     at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:42)
> > >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:433)
> > >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> > > 2011-04-18 10:44:26,928 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> > > 
> > > > Then do it again --> $ bin/nutch solrindex
> > > > http://localhost:8080/solr crawl/crawldb/0
> > > > 
> > > >  I'm now having the same problem but I'm not finding the problem yet.
> > > >  
> > > >> $ bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0
> > > >> crawl/linkdb crawl/segments/0/20110418100309
> > > >> SolrIndexer: starting at 2011-04-18 10:03:40
> > > >> java.io.IOException: Job failed!
> > > > 
> > > > Grüße
> > > > Klaus.
> > > > 
> > > > --
> > > > 
> > > > ------------------------------------------------
> > > > e-Mail  : klaus@tachtler.net
> > > > Homepage: http://www.tachtler.net
> > > > DokuWiki: http://www.dokuwiki.tachtler.net
> > > > ------------------------------------------------

Re: SolrIndex problems

Posted by Gabriele Kahlout <ga...@mysimpatico.com>.
Just to be crystal clear, I should update my nutch-1.3 version[1]? I thought
it was an issue with Solr. Are there any references to this issue, to
understand better what nutch got to do with it?

[1] $ svn co http://svn.apache.org/repos/asf/nutch/branches/branch-1.3/nutch-1.3
$ cd nutch-1.3
$ ant

On Mon, Apr 18, 2011 at 10:52 AM, Markus Jelsma <ma...@openindex.io> wrote:

> This has been fixed a few days ago. Update your 1.3 export.
>
> > On Mon, Apr 18, 2011 at 10:39 AM, Klaus Tachtler <kl...@tachtler.net> wrote:
> > > Hi Gabriele,
> > >
> > > i had the same problem a few days ago; the answer was to delete the
> > > 'data' directory inside your Solr installation. Under my installation
> > > it was /var/www/solr/data.
> >
> > that didn't do the trick for me. =(
> >
> > For the logs, I found this in hadoop.log:
> >
> > 2011-04-18 10:44:26,390 WARN  mapred.LocalJobRunner - job_local_0001
> > java.lang.IllegalAccessError: tried to access field
> > org.slf4j.impl.StaticLoggerBinder.SINGLETON from class
> > org.slf4j.LoggerFactory
> >     at org.slf4j.LoggerFactory.staticInitialize(LoggerFactory.java:83)
> >     at org.slf4j.LoggerFactory.<clinit>(LoggerFactory.java:73)
> >     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.<clinit>(CommonsHttpSolrServer.java:78)
> >     at org.apache.nutch.indexer.solr.SolrWriter.open(SolrWriter.java:44)
> >     at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:42)
> >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:433)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> > 2011-04-18 10:44:26,928 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> >
> > > Then do it again --> $ bin/nutch solrindex
> > > http://localhost:8080/solr crawl/crawldb/0
> > >
> > >  I'm now having the same problem but I'm not finding the problem yet.
> > >
> > >> $ bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0
> > >> crawl/linkdb crawl/segments/0/20110418100309
> > >> SolrIndexer: starting at 2011-04-18 10:03:40
> > >> java.io.IOException: Job failed!
> > >
> > > Grüße
> > > Klaus.
> > >
> > > --
> > >
> > > ------------------------------------------------
> > > e-Mail  : klaus@tachtler.net
> > > Homepage: http://www.tachtler.net
> > > DokuWiki: http://www.dokuwiki.tachtler.net
> > > ------------------------------------------------
>



-- 
Regards,
K. Gabriele


Re: SolrIndex problems

Posted by Markus Jelsma <ma...@openindex.io>.
This has been fixed a few days ago. Update your 1.3 export.

> On Mon, Apr 18, 2011 at 10:39 AM, Klaus Tachtler <kl...@tachtler.net> wrote:
> > Hi Gabriele,
> > 
> > i had the same problem a few days ago; the answer was to delete the
> > 'data' directory inside your solr installation. Under my installation it
> > was /var/www/solr/data.
> 
> that didn't do the trick for me. =(
> 
> For the logs, I found this in hadoop.log:
> 
> 2011-04-18 10:44:26,390 WARN  mapred.LocalJobRunner - job_local_0001
> java.lang.IllegalAccessError: tried to access field
> org.slf4j.impl.StaticLoggerBinder.SINGLETON from class
> org.slf4j.LoggerFactory
>     at org.slf4j.LoggerFactory.staticInitialize(LoggerFactory.java:83)
>     at org.slf4j.LoggerFactory.<clinit>(LoggerFactory.java:73)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.<clinit>(CommonsHttpSolrServer.java:78)
>     at org.apache.nutch.indexer.solr.SolrWriter.open(SolrWriter.java:44)
>     at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:42)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:433)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-04-18 10:44:26,928 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> 
> > Then do it again --> $ bin/nutch solrindex
> > > http://localhost:8080/solr crawl/crawldb/0
> > 
> >  I'm now having the same problem but I'm not finding the problem yet.
> >  
> >> $ bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0
> >> crawl/linkdb crawl/segments/0/20110418100309
> >> SolrIndexer: starting at 2011-04-18 10:03:40
> >> java.io.IOException: Job failed!
> > 
> > Grüße
> > Klaus.
> > 
> > --
> > 
> > ------------------------------------------------
> > e-Mail  : klaus@tachtler.net
> > Homepage: http://www.tachtler.net
> > DokuWiki: http://www.dokuwiki.tachtler.net
> > ------------------------------------------------

Re: SolrIndex problems

Posted by Gabriele Kahlout <ga...@mysimpatico.com>.
On Mon, Apr 18, 2011 at 10:39 AM, Klaus Tachtler <kl...@tachtler.net> wrote:

> Hi Gabriele,
>
> i had the same problem a few days ago; the answer was to delete the 'data'
> directory inside your solr installation. Under my installation it was
> /var/www/solr/data.
>

that didn't do the trick for me. =(

For the logs, I found this in hadoop.log:

2011-04-18 10:44:26,390 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.IllegalAccessError: tried to access field
org.slf4j.impl.StaticLoggerBinder.SINGLETON from class
org.slf4j.LoggerFactory
    at org.slf4j.LoggerFactory.staticInitialize(LoggerFactory.java:83)
    at org.slf4j.LoggerFactory.<clinit>(LoggerFactory.java:73)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.<clinit>(CommonsHttpSolrServer.java:78)
    at org.apache.nutch.indexer.solr.SolrWriter.open(SolrWriter.java:44)
    at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:42)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:433)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-04-18 10:44:26,928 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
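The IllegalAccessError on StaticLoggerBinder.SINGLETON above is a well-known symptom of mismatched SLF4J jars on the classpath (an slf4j-api from one version combined with a binding from another, or two copies). A quick way to inspect this, sketched under the assumption that you run it from the directory holding your Nutch/Solr jars:

```shell
# Mismatched or duplicate SLF4J jars on the classpath typically cause
# "tried to access field org.slf4j.impl.StaticLoggerBinder.SINGLETON".
# Listing the jars makes version skew easy to spot. The search root is
# an assumption -- adjust to wherever your jars actually live.
find . -name 'slf4j*.jar' -o -name 'log4j*.jar' | sort
```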


> Then do it again --> $ bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0
>
>
>  I'm now having the same problem but I'm not finding the problem yet.
>>
>> $ bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0
>> crawl/linkdb crawl/segments/0/20110418100309
>> SolrIndexer: starting at 2011-04-18 10:03:40
>> java.io.IOException: Job failed!
>>
>
>
> Grüße
> Klaus.
>
> --
>
> ------------------------------------------------
> e-Mail  : klaus@tachtler.net
> Homepage: http://www.tachtler.net
> DokuWiki: http://www.dokuwiki.tachtler.net
> ------------------------------------------------
>
>


-- 
Regards,
K. Gabriele


RE: SolrIndex problems

Posted by Klaus Tachtler <kl...@tachtler.net>.
Hi Gabriele,

i had the same problem a few days ago; the answer was to delete the
'data' directory inside your Solr installation. Under my installation
it was /var/www/solr/data.

Then do it again --> $ bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0

> I'm now having the same problem but I'm not finding the problem yet.
>
> $ bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0
> crawl/linkdb crawl/segments/0/20110418100309
> SolrIndexer: starting at 2011-04-18 10:03:40
> java.io.IOException: Job failed!


Regards,
Klaus.

--

------------------------------------------------
e-Mail  : klaus@tachtler.net
Homepage: http://www.tachtler.net
DokuWiki: http://www.dokuwiki.tachtler.net
------------------------------------------------


Re: SolrIndex problems

Posted by Markus Jelsma <ma...@openindex.io>.
Check Nutch and Solr log output, that's usually more helpful.
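For anyone following along, the places worth tailing in a setup like the one described (Nutch local runtime plus Solr on Tomcat) would be roughly the following; both paths are assumptions based on the tools named in the thread, not taken from it:

```shell
# Where the useful output usually lands in this kind of setup.
# Paths are assumptions: hadoop.log sits under logs/ when running from
# Nutch's runtime/local directory; Tomcat logs to catalina.out.
tail -n 50 logs/hadoop.log                    # Nutch/indexer side
tail -n 50 "$CATALINA_HOME/logs/catalina.out" # Tomcat/Solr side
```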

> I'm now having the same problem but I'm not finding the problem yet.
> 
> $ bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0
> crawl/linkdb crawl/segments/0/20110418100309
> SolrIndexer: starting at 2011-04-18 10:03:40
> java.io.IOException: Job failed!
> 
> But everything else seems to have worked[1]. I've tried the Jetty tutorial
> example and that worked too. I can access Solr through
> http://localhost:8080/solr/admin/ and even with curl[2].
> 
> Trying to debug I've modified SolrIndexer.java to add a few prints[3] and
> deploy again but still nothing gets printed. Any clues on how to fix?
> 
> $ cd $SOLR_HOME
> $ ant compile
> $ ant dist
> $CATALINA_HOME/bin/catalina.sh stop
> $ cp $SOLR_HOME/solr.war $CATALINA_HOME/webapps/solr.war
> $CATALINA_HOME/bin/catalina.sh start
> 
> [3]
> catch (final Exception e) {
>   e.printStackTrace();
>   LOG.fatal("hello");
>   LOG.fatal("SolrIndexer: " + StringUtils.stringifyException(e));
>   return -1;
> }
> 
> 
> [2]
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int
> name="QTime">217</int></lst>
> </response>
> 
> [1]
> $ bin/nutch readdb crawl/crawldb/0 -stats
> CrawlDb statistics start: crawl/crawldb/0
> Statistics for CrawlDb: crawl/crawldb/0
> TOTAL urls:	3
> retry 0:	3
> min score:	1.0
> avg score:	1.0
> max score:	1.0
> status 2 (db_fetched):	3
> CrawlDb statistics: done
> 
> bin/nutch parse crawl/segments/0/20110418100309
> ParseSegment: starting at 2011-04-18 10:03:28
> ParseSegment: segment: crawl/segments/0/20110418100309
> ParseSegment: finished at 2011-04-18 10:03:31, elapsed: 00:00:03
> 
> bin/nutch updatedb crawl/crawldb/0 crawl/segments/0/20110418100309
> CrawlDb update: starting at 2011-04-18 10:03:33
> CrawlDb update: db: crawl/crawldb/0
> CrawlDb update: segments: [crawl/segments/0/20110418100309]
> CrawlDb update: additions allowed: true
> CrawlDb update: URL normalizing: false
> CrawlDb update: URL filtering: false
> CrawlDb update: Merging segment data into db.
> CrawlDb update: finished at 2011-04-18 10:03:35, elapsed: 00:00:02
> 
> bin/nutch invertlinks crawl/linkdb -dir crawl/segments/0
> LinkDb: starting at 2011-04-18 10:03:37
> LinkDb: linkdb: crawl/linkdb
> LinkDb: URL normalize: true
> LinkDb: URL filter: true
> LinkDb: adding segment:
> file:/Users/simpatico/nutch-1.3/runtime/local/crawl/segments/0/20110418100309
> LinkDb: finished at 2011-04-18 10:03:39, elapsed: 00:00:01
> 

RE: SolrIndex problems

Posted by Gabriele Kahlout <ga...@mysimpatico.com>.
I'm now having the same problem but I'm not finding the problem yet.

$ bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0
crawl/linkdb crawl/segments/0/20110418100309
SolrIndexer: starting at 2011-04-18 10:03:40
java.io.IOException: Job failed!

But everything else seems to have worked[1]. I've tried the Jetty tutorial
example and that worked too. I can access Solr through
http://localhost:8080/solr/admin/ and even with curl[2].

Trying to debug I've modified SolrIndexer.java to add a few prints[3] and
deploy again but still nothing gets printed. Any clues on how to fix?

$ cd $SOLR_HOME
$ ant compile
$ ant dist
$ $CATALINA_HOME/bin/catalina.sh stop
$ cp $SOLR_HOME/solr.war $CATALINA_HOME/webapps/solr.war
$ $CATALINA_HOME/bin/catalina.sh start

[3] 
catch (final Exception e) {
  e.printStackTrace();
  LOG.fatal("hello");
  LOG.fatal("SolrIndexer: " + StringUtils.stringifyException(e));
  return -1;
}


[2]
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">217</int></lst>
</response>

[1]
$ bin/nutch readdb crawl/crawldb/0 -stats
CrawlDb statistics start: crawl/crawldb/0
Statistics for CrawlDb: crawl/crawldb/0
TOTAL urls:	3
retry 0:	3
min score:	1.0
avg score:	1.0
max score:	1.0
status 2 (db_fetched):	3
CrawlDb statistics: done

bin/nutch parse crawl/segments/0/20110418100309
ParseSegment: starting at 2011-04-18 10:03:28
ParseSegment: segment: crawl/segments/0/20110418100309
ParseSegment: finished at 2011-04-18 10:03:31, elapsed: 00:00:03

bin/nutch updatedb crawl/crawldb/0 crawl/segments/0/20110418100309
CrawlDb update: starting at 2011-04-18 10:03:33
CrawlDb update: db: crawl/crawldb/0
CrawlDb update: segments: [crawl/segments/0/20110418100309]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: false
CrawlDb update: URL filtering: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-04-18 10:03:35, elapsed: 00:00:02

bin/nutch invertlinks crawl/linkdb -dir crawl/segments/0
LinkDb: starting at 2011-04-18 10:03:37
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment:
file:/Users/simpatico/nutch-1.3/runtime/local/crawl/segments/0/20110418100309
LinkDb: finished at 2011-04-18 10:03:39, elapsed: 00:00:01

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrIndex-problems-tp2243073p2833462.html
Sent from the Nutch - User mailing list archive at Nabble.com.


RE: SolrIndex problems

Posted by "McGibbney, Lewis John" <Le...@gcu.ac.uk>.
Update

...duh

I did not substitute port :8080, which Tomcat was running on, for the usual :8983 port that Solr runs on under Jetty.

Incredible how some things jump out at you after half an hour or so, and typically just after you have posted.

Sorry for the noise.
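In other words, the indexing command has to target the port the servlet container actually listens on. A sketch, with the URL and crawl paths as assumptions mirroring commands seen elsewhere in the thread:

```shell
# Point solrindex at the container serving solr.war. Defaults differ:
# Jetty's example config uses 8983, Tomcat uses 8080. The crawl paths
# below are illustrative, not from the original post.
SOLR_PORT=8080   # Tomcat default; 8983 for the Jetty example config
SOLR_URL="http://localhost:${SOLR_PORT}/solr"
bin/nutch solrindex "$SOLR_URL" crawl/crawldb crawl/linkdb crawl/segments/*
```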

-----Original Message-----
From: McGibbney, Lewis John [mailto:Lewis.McGibbney@gcu.ac.uk]
Sent: 12 January 2011 18:52
To: 'user@nutch.apache.org'
Subject: SolrIndex problems

Hello list,

Having some problems when attempting to index to Solr (experimenting with running Solr.war on Tomcat instead of usual Jetty config as before). Using Nutch in singular mode (no Hadoop this time). Exception is present due to input path not existing, however as far as I am aware I have followed the usual
