Posted to solr-user@lucene.apache.org by alessio crisantemi <al...@gmail.com> on 2012/01/29 23:35:09 UTC

Problem adding Nutch data to Solr

Hi all,
I set up Nutch with Solr (versions 1.4 and 1.4.1) on Windows.
I can crawl and parse a website, but when I try to index the data into
Solr, I get an error.
This is my command:

bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5

and this is the final part of the output:
...

ParseSegment: finished at 2012-01-29 23:10:20, elapsed: 00:00:04
CrawlDb update: starting at 2012-01-29 23:10:20
CrawlDb update: db: crawl-20120129230752/crawldb
CrawlDb update: segments: [crawl-20120129230752/segments/20120129230930]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2012-01-29 23:10:25, elapsed: 00:00:04
LinkDb: starting at 2012-01-29 23:10:25
LinkDb: linkdb: crawl-20120129230752/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl-20120129230752/segments/20120129230806
LinkDb: adding segment: file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl-20120129230752/segments/20120129230834
LinkDb: adding segment: file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl-20120129230752/segments/20120129230930
LinkDb: finished at 2012-01-29 23:10:30, elapsed: 00:00:05
SolrIndexer: starting at 2012-01-29 23:10:30
Adding 11 documents
java.io.IOException: Job failed!
SolrDeleteDuplicates: starting at 2012-01-29 23:10:44
SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
Exception in thread "main" java.io.IOException: org.apache.solr.client.solrj.SolrServerException: Error executing query
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:200)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:198)
... 9 more
Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 1) or the data in not in 'javabin' format
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
... 11 more

Can you help me?

Best,

alessio