Posted to user@nutch.apache.org by cameron tran <ca...@gmail.com> on 2012/05/18 06:58:39 UTC
ERROR solr.SolrIndexer - java.io.IOException: Job failed!
Hello,
I am trying to get Nutch 1.4 (downloaded binary) to do solrindex to
http://127.0.0.1:8983/solr/, but I am getting the following error. I am
using Solr 3.6.0; please see the error below.
Is there some incompatibility issue?
I ran:
bin/nutch crawl urls -solr http://127.0.0.1:8983/solr -threads 3 -depth 3 -topN 300
Thank you for your help
org.apache.solr.common.SolrException: ERROR: [doc=http://www.website.com/] unknown field 'site'
request: http://127.0.0.1:8983/solr/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2012-05-18 14:21:46,921 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2012-05-18 14:21:46,921 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2012-05-18 14:21:46
2012-05-18 14:21:46,921 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://127.0.0.1:8983/solr
2012-05-18 14:21:48,640 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: finished at 2012-05-18 14:21:48, elapsed: 00:00:01
2012-05-18 14:21:48,640 INFO crawl.Crawl - crawl finished: crawl-20120518141951
Re: ERROR solr.SolrIndexer - java.io.IOException: Job failed!
Posted by cameron tran <ca...@gmail.com>.
Hello Jim and Tolga,
Thanks for this. I copied Nutch's schema.xml to Solr and it works.
When running
bin/nutch crawl urls -solr http://127.0.0.1:8983/solr -threads 3 -depth 5 -topN 1000
it only seems to index 8 docs: a query for *:* in Solr's admin returns
only 8 documents in the results.
I have tried stopping and starting Solr and running Nutch again (with
different -depth and -topN parameters), and the result is always the same.
I have also tried adding more seeds to the urls\seeds.txt list, each URL
on its own line, but the result is the same.
What commands in Nutch can I use to get it to crawl the site again and add
to Solr's index?
I tried
bin/nutch crawl urls -solr http://127.0.0.1:8983/solr -threads 3 -depth 5 -topN 1000 solrindex
but this gives the error:
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: file:c:/nutch14/runtime/local/solrindex
Thank you
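Instead of the all-in-one crawl command, Nutch 1.4 also exposes the individual crawl tools, which let you re-crawl and then push the results into Solr explicitly. A hedged sketch of one crawl round, assuming a crawl/ working directory (the directory names are illustrative, not prescribed):

```
bin/nutch inject crawl/crawldb urls                      # seed the crawldb from urls/
bin/nutch generate crawl/crawldb crawl/segments -topN 1000
s1=`ls -d crawl/segments/* | tail -1`                    # newest segment
bin/nutch fetch $s1 -threads 3
bin/nutch parse $s1
bin/nutch updatedb crawl/crawldb $s1                     # feed discovered links back
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
```

The trailing "solrindex" in the crawl command above also appears to explain the InvalidInputException: crawl does not take a solrindex argument, so Hadoop seems to treat it as an input path. If the document count stays at 8, the URL filter rules in conf/regex-urlfilter.txt are also worth checking, since they silently drop outlinks that do not match.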
On Fri, May 18, 2012 at 9:20 PM, Jim Chandler <ja...@gmail.com> wrote:
> You need to add the 'site' field to the schema.xml in your Solr.
>
> Jim
Re: ERROR solr.SolrIndexer - java.io.IOException: Job failed!
Posted by Jim Chandler <ja...@gmail.com>.
You need to add the 'site' field to the schema.xml in your Solr.
Jim
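For reference, such a field is declared in Solr's conf/schema.xml; a minimal sketch, assuming the string type and attribute values used by the schema.xml that ships with Nutch 1.4:

```
<!-- Inside the <fields> section of conf/schema.xml. Nutch's index-basic
     plugin writes each document's host name into 'site'. -->
<field name="site" type="string" stored="false" indexed="true"/>
```

Solr must be restarted (or the core reloaded) for schema changes to take effect.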
Re: ERROR solr.SolrIndexer - java.io.IOException: Job failed!
Posted by Tolga <to...@ozses.net>.
Hi Cameron,
I've been dealing with the same issue; I take care of it by adding the
missing field (in your case 'site') to Solr's schema.xml and restarting Solr.