Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 16:07:38 UTC

[jira] [Resolved] (NUTCH-1773) Solr Indexer fails

     [ https://issues.apache.org/jira/browse/NUTCH-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche resolved NUTCH-1773.
----------------------------------

    Resolution: Not a Problem

As Lewis pointed out, you need to specify the Solr URL with the solrindex command or, if using the index command directly, pass it (solr.server.url) either on the command line (-D solr.server.url=MYSOLRURL) or via nutch-site.xml.
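
For the nutch-site.xml route, a minimal sketch might look like the following. It assumes the usual Hadoop-style property layout that Nutch configuration files use. The URL value is a placeholder (a local Solr on its default port is assumed) and is not taken from the original report, so substitute your own Solr endpoint.

```xml
<!-- nutch-site.xml (sketch only; the URL is a placeholder assumption) -->
<configuration>
  <property>
    <!-- Property named in the IndexWriter description as mandatory -->
    <name>solr.server.url</name>
    <value>http://localhost:8983/solr</value>
  </property>
</configuration>
```

With the property set this way, the index command should not need the -D solr.server.url=... option on the command line.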

> Solr Indexer fails
> ------------------
>
>                 Key: NUTCH-1773
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1773
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 2.3
>         Environment: Ubuntu 12.04 LTS, java version "1.7.0_55" - Hbase-0.90.6 (pseudo dist), Hadoop 1.2.1, Solr 4.6
>            Reporter: Ralf
>            Priority: Critical
>             Fix For: 2.3
>
>
> When using the crawl script or the Solr indexer by itself (bin/nutch solrindex) in local mode, it fails with:
> hduser@bl4ck1c3:~/nutch-2.3/runtime/local$ bin/nutch solrindex TestCrawl18 -reindex
> IndexingJob: starting
> Active IndexWriters :
> SOLRIndexWriter
> 	solr.server.url : URL of the SOLR instance (mandatory)
> 	solr.commit.size : buffer size when sending to SOLR (default 1000)
> 	solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
> 	solr.auth : use authentication (default false)
> 	solr.auth.username : use authentication (default false)
> 	solr.auth : username for authentication
> 	solr.auth.password : password for authentication
> SolrIndexerJob: java.lang.IllegalStateException: Target host must not be null, or set in parameters.
> 	at org.apache.http.impl.client.DefaultRequestDirector.determineRoute(DefaultRequestDirector.java:787)
> 	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:414)
> 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:393)
> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
> 	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> 	at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
> 	at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
> 	at org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:146)
> 	at org.apache.nutch.indexer.IndexWriters.commit(IndexWriters.java:127)
> 	at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:171)
> 	at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:187)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 	at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:196)
> When using the new index command, it finishes, but nothing is added to Solr:
> hduser@bl4ck1c3:~/nutch-2.3/runtime/local$ bin/nutch index TestCrawl18 -reindex
> IndexingJob: starting
> Active IndexWriters :
> SOLRIndexWriter
> 	solr.server.url : URL of the SOLR instance (mandatory)
> 	solr.commit.size : buffer size when sending to SOLR (default 1000)
> 	solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
> 	solr.auth : use authentication (default false)
> 	solr.auth.username : use authentication (default false)
> 	solr.auth : username for authentication
> 	solr.auth.password : password for authentication
>  
> Log shows:
> 2014-05-13 03:01:13,781 INFO  indexer.IndexingJob - IndexingJob: starting
> 2014-05-13 03:01:14,108 INFO  indexer.IndexingFilters - Adding org.apache.nutch.analysis.lang.LanguageIndexingFilter
> 2014-05-13 03:01:14,109 INFO  basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
> 2014-05-13 03:01:14,109 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2014-05-13 03:01:14,335 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
> 2014-05-13 03:01:14,336 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2014-05-13 03:01:14,336 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2014-05-13 03:01:14,620 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
> 2014-05-13 03:01:14,768 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
> 2014-05-13 03:01:14,968 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
> 2014-05-13 03:01:15,243 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
> 2014-05-13 03:01:15,276 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
> 2014-05-13 03:01:15,326 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
> 2014-05-13 03:01:15,386 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 2014-05-13 03:01:15,403 INFO  solr.SolrMappingReader - source: content dest: content
> 2014-05-13 03:01:15,403 INFO  solr.SolrMappingReader - source: title dest: title
> 2014-05-13 03:01:15,403 INFO  solr.SolrMappingReader - source: host dest: host
> 2014-05-13 03:01:15,404 INFO  solr.SolrMappingReader - source: batchId dest: batchId
> 2014-05-13 03:01:15,404 INFO  solr.SolrMappingReader - source: boost dest: boost
> 2014-05-13 03:01:15,404 INFO  solr.SolrMappingReader - source: digest dest: digest
> 2014-05-13 03:01:15,404 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> 2014-05-13 03:01:15,405 INFO  indexer.IndexingFilters - Adding org.apache.nutch.analysis.lang.LanguageIndexingFilter
> 2014-05-13 03:01:15,405 INFO  basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
> 2014-05-13 03:01:15,405 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2014-05-13 03:01:15,405 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
> 2014-05-13 03:01:15,405 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2014-05-13 03:01:15,405 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2014-05-13 03:01:15,426 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
> 2014-05-13 03:01:15,442 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
> 2014-05-13 03:01:16,144 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 2014-05-13 03:01:16,144 INFO  indexer.IndexingJob - Active IndexWriters :
> SOLRIndexWriter
> 	solr.server.url : URL of the SOLR instance (mandatory)
> 	solr.commit.size : buffer size when sending to SOLR (default 1000)
> 	solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
> 	solr.auth : use authentication (default false)
> 	solr.auth.username : use authentication (default false)
> 	solr.auth : username for authentication
> 	solr.auth.password : password for authentication
> 2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: content dest: content
> 2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: title dest: title
> 2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: host dest: host
> 2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: batchId dest: batchId
> 2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: boost dest: boost
> 2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: digest dest: digest
> 2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> 2014-05-13 03:01:16,338 INFO  solr.SolrIndexWriter - Total 0 document is added.
> 2014-05-13 03:01:16,338 INFO  indexer.IndexingJob - IndexingJob: done.


