Posted to user@nutch.apache.org by Jake Dodd <ja...@ontopic.io> on 2014/06/11 17:37:39 UTC

Exception 'Missing elastic.cluster' with correct elasticsearch config

Hi all,

The following applies to Nutch 1.8 (and at least 1.7 as well, it seems).

I’ve noticed that Nutch throws an exception when the elastic.cluster property is not set—even when elastic.host and elastic.port are properly configured. In the documentation for the elastic properties, it says that you can either specify elastic.cluster, or specify elastic.port together with elastic.host. 

However, it seems that org.apache.nutch.indexwriter.elastic.ElasticIndexWriter throws an exception if elastic.cluster is missing, regardless of whether elastic.port and elastic.host have been properly set. The exception is thrown in the ElasticIndexWriter.setConf() method.

Is this a known bug, and has it been fixed in the trunk? I was able to get the Elasticsearch indexer working properly by setting elastic.host and elastic.port, and commenting out the if-statement beginning on line 254 in ElasticIndexWriter.java.
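
To make the mismatch concrete, here is a minimal Java sketch of the two behaviours. It is not the actual ElasticIndexWriter source: a plain Map stands in for the Nutch/Hadoop Configuration, and the class and method names (ElasticConfCheck, strictCheck, documentedCheck) are hypothetical. Only the property names and the first exception message come from the output below.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a plain Map replaces the Nutch/Hadoop Configuration,
// and every class/method name here is hypothetical.
public class ElasticConfCheck {

    // Behaviour observed in 1.8: a missing elastic.cluster is always fatal,
    // even if elastic.host and elastic.port are present.
    public static void strictCheck(Map<String, String> conf) {
        if (isBlank(conf.get("elastic.cluster"))) {
            throw new RuntimeException(
                "Missing elastic.cluster. Should be set in nutch-site.xml");
        }
    }

    // Behaviour the documentation describes: either elastic.cluster,
    // or elastic.host together with elastic.port, is sufficient.
    public static void documentedCheck(Map<String, String> conf) {
        boolean hasCluster = !isBlank(conf.get("elastic.cluster"));
        boolean hasHostAndPort = !isBlank(conf.get("elastic.host"))
            && !isBlank(conf.get("elastic.port"));
        if (!hasCluster && !hasHostAndPort) {
            throw new RuntimeException(
                "Set elastic.cluster, or elastic.host and elastic.port");
        }
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        // Same settings as in my nutch-site.xml: host and port, no cluster.
        Map<String, String> conf = new HashMap<>();
        conf.put("elastic.host", "localhost");
        conf.put("elastic.port", "9300");

        documentedCheck(conf); // accepted: host and port are both set

        try {
            strictCheck(conf); // rejected: elastic.cluster is missing
        } catch (RuntimeException e) {
            System.out.println("strictCheck failed: " + e.getMessage());
        }
    }
}
```

With host and port set but no cluster, the documented rule accepts the configuration while the strict rule rejects it, which is the discrepancy I am describing.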

For reference, here are the exception and the relevant properties in my nutch-site.xml.


***Exception***

Indexer: java.lang.RuntimeException: Missing elastic.cluster. Should be set in nutch-site.xml 
ElasticIndexWriter
	elastic.cluster : elastic prefix cluster
	elastic.host : hostname
	elastic.port : port
	elastic.index : elastic index command 
	elastic.max.bulk.docs : elastic bulk index doc counts. (default 250) 
	elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)

	at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.setConf(ElasticIndexWriter.java:258)
	at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:159)
	at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:57)
	at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:91)
	at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

***nutch-site.xml***

<property>
  <name>elastic.host</name>
  <value>localhost</value>
  <description>The hostname to send documents to using TransportClient. Either host
  and port must be defined or cluster.</description>
</property>

<property>
  <name>elastic.port</name>
  <value>9300</value>
  <description>The port to connect to using TransportClient.</description>
</property>

Cheers

Jake

Re: Exception 'Missing elastic.cluster' with correct elasticsearch config

Posted by Julien Nioche <li...@gmail.com>.
Hi Jake

This has been fixed in trunk. See
https://github.com/apache/nutch/commit/026b2ff414bcf166de4bfeabef57f0202375ea38#diff-68fe6210481889b1947da1fe7d7ed0afL254
and https://issues.apache.org/jira/browse/NUTCH-1745

Thanks

Julien


-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: Exception 'Missing elastic.cluster' with correct elasticsearch config

Posted by Jake Dodd <ja...@ontopic.io>.
Ok great!

In case anybody comes across this thread before Nutch 1.9 is released, and needs to get this working, the easiest solution is just to specify the elastic.cluster property in nutch-site.xml in addition to the port number and host, rather than modifying the source.
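
As a rough sketch of that workaround, the extra property might look like the fragment below. The value elasticsearch is an assumption: it is the default cluster name of a stock Elasticsearch node, so substitute whatever cluster.name your own node uses.

```xml
<!-- Assumed value: "elasticsearch" is the stock default cluster name;
     replace it with the cluster.name of your own node. -->
<property>
  <name>elastic.cluster</name>
  <value>elasticsearch</value>
</property>
```

This goes in nutch-site.xml alongside the existing elastic.host and elastic.port properties.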

Cheers

Jake 
