Posted to user@nutch.apache.org by BlackIce <bl...@gmail.com> on 2014/05/11 16:39:16 UTC

Nutch 2.x from svn.

I just installed Nutch 2.x from SVN and the Solr indexer is not working. My guess is
that it has to do with the Solr indexer now being a plugin, so I activated it in the
plugin list (plugin.includes), same as in 1.8.
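For reference, this is roughly what plugin.includes in my conf/nutch-site.xml looks
like now; the value is the default list from nutch-default.xml (quoted from memory, so
it may not be exact) with indexer-solr added:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic|indexer-solr</value>
  <description>Default plugins plus indexer-solr so the Solr index writer is loaded.</description>
</property>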

When trying to run the crawl script I get the following (my exact invocation is shown
right after this error output):


Indexing TestCrawl12 on SOLR index -> http://localhost:8983/solr
IndexingJob: starting
SolrIndexerJob: java.lang.RuntimeException: job failed: name=[TestCrawl12]Indexer, jobid=job_local1518802404_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:153)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:161)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:187)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:196)

and in the log it shows:

2014-05-11 16:35:57,480 INFO  solr.SolrIndexWriter - Adding 1000 documents
2014-05-11 16:35:57,779 INFO  solr.SolrIndexWriter - Adding 1000 documents
2014-05-11 16:35:57,954 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2014-05-11 16:35:57,955 WARN  mapred.LocalJobRunner - job_local1860988708_0001
java.lang.Exception: java.lang.IllegalStateException: Target host must not be null, or set in parameters.
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.IllegalStateException: Target host must not be null, or set in parameters.
    at org.apache.http.impl.client.DefaultRequestDirector.determineRoute(DefaultRequestDirector.java:787)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:414)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:393)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:84)
    at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:87)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:48)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:43)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:120)
    at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2014-05-11 16:35:58,543 ERROR indexer.IndexingJob - SolrIndexerJob: java.lang.RuntimeException: job failed: name=[TestCrawl12]Indexer, jobid=job_local1860988708_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:153)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:161)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:187)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:196)
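
For completeness, this is how I invoke the crawl script (the seed directory and the
number of rounds are just my local choices; the crawl ID and Solr URL are the ones
shown in the output above):

bin/crawl urls/ TestCrawl12 http://localhost:8983/solr 2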


When I change solrindex to index in the crawl script, the Solr writer seems to be
activated but no docs are added to Solr (the line I changed in bin/crawl is quoted at
the end of this mail):

2014-05-11 16:21:05,439 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-05-11 16:21:05,439 INFO  indexer.IndexingJob - Active IndexWriters :
SOLRIndexWriter
    solr.server.url : URL of the SOLR instance (mandatory)
    solr.commit.size : buffer size when sending to SOLR (default 1000)
    solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
    solr.auth : use authentication (default false)
    solr.auth.username : use authentication (default false)
    solr.auth : username for authentication
    solr.auth.password : password for authentication

and in the log it shows:

2014-05-11 16:21:05,440 INFO  solr.SolrMappingReader - source: content dest: content
2014-05-11 16:21:05,440 INFO  solr.SolrMappingReader - source: title dest: title
2014-05-11 16:21:05,440 INFO  solr.SolrMappingReader - source: host dest: host
2014-05-11 16:21:05,440 INFO  solr.SolrMappingReader - source: batchId dest: batchId
2014-05-11 16:21:05,440 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-05-11 16:21:05,440 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-05-11 16:21:05,440 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-05-11 16:21:05,694 INFO  solr.SolrIndexWriter - Total 0 document is added.
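
For reference, the indexing step in bin/crawl that I edited looks roughly like this
(paraphrased from memory, so variable names and option order may not be verbatim):

# original line:
#   bin/nutch solrindex $SOLRURL -all -crawlId $CRAWL_ID
# my change:
bin/nutch index $SOLRURL -all -crawlId $CRAWL_ID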