Posted to user@nutch.apache.org by Jake Dodd <ja...@ontopic.io> on 2014/06/11 17:37:39 UTC
Exception 'Missing elastic.cluster' with correct elasticsearch config
Hi all,
The following applies to Nutch 1.8 (and at least 1.7 as well, it seems).
I’ve noticed that Nutch throws an exception when the elastic.cluster property is not set, even when elastic.host and elastic.port are properly configured. According to the documentation for the elastic properties, you can specify either elastic.cluster, or elastic.host together with elastic.port.
However, it seems that org.apache.nutch.indexwriter.elastic.ElasticIndexWriter throws an exception if elastic.cluster is missing, regardless of whether elastic.port and elastic.host have been properly set. The exception is thrown in the ElasticIndexWriter.setConf() method.
Is this a known bug, and has it been fixed in the trunk? I was able to get the Elasticsearch indexer working properly by setting elastic.host and elastic.port, and commenting out the if-statement beginning on line 254 in ElasticIndexWriter.java.
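To make the mismatch concrete, here is a minimal Java sketch that contrasts two rules. The first is the unconditional elastic.cluster check reported above for Nutch 1.8. The second is the rule the property documentation describes: elastic.cluster, or elastic.host together with elastic.port. The sketch works on a plain Map rather than Hadoop's Configuration so it runs standalone. The class and method names are hypothetical, and this is neither the actual ElasticIndexWriter code nor the NUTCH-1745 patch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch contrasting two validation rules for the elastic.* properties.
 * Not the real ElasticIndexWriter source; names are illustrative only.
 */
public class ElasticConfCheck {

  /** Behaviour reported for Nutch 1.8: elastic.cluster is required unconditionally. */
  static void checkAsReported(Map<String, String> conf) {
    if (isBlank(conf.get("elastic.cluster"))) {
      throw new RuntimeException("Missing elastic.cluster. Should be set in nutch-site.xml");
    }
  }

  /** Documented rule: either elastic.cluster, or elastic.host together with elastic.port. */
  static void checkAsDocumented(Map<String, String> conf) {
    boolean hasCluster = !isBlank(conf.get("elastic.cluster"));
    boolean hasHostAndPort =
        !isBlank(conf.get("elastic.host")) && !isBlank(conf.get("elastic.port"));
    if (!hasCluster && !hasHostAndPort) {
      throw new RuntimeException(
          "Set elastic.cluster, or elastic.host together with elastic.port, in nutch-site.xml");
    }
  }

  static boolean isBlank(String s) {
    return s == null || s.trim().isEmpty();
  }

  public static void main(String[] args) {
    // The configuration from the report: host and port set, no cluster.
    Map<String, String> conf = new HashMap<>();
    conf.put("elastic.host", "localhost");
    conf.put("elastic.port", "9300");

    checkAsDocumented(conf); // accepted: host and port are both present
    try {
      checkAsReported(conf);
      System.out.println("reported check passed");
    } catch (RuntimeException e) {
      // Prints: reported check failed: Missing elastic.cluster. Should be set in nutch-site.xml
      System.out.println("reported check failed: " + e.getMessage());
    }
  }
}
```

With the host/port-only configuration from this report, the documented rule accepts the settings, while the reported check throws the same "Missing elastic.cluster" message seen in the stack trace.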
For reference, here are the exception and the relevant properties from my nutch-site.xml.
***Exception***
Indexer: java.lang.RuntimeException: Missing elastic.cluster. Should be set in nutch-site.xml
ElasticIndexWriter
elastic.cluster : elastic prefix cluster
elastic.host : hostname
elastic.port : port
elastic.index : elastic index command
elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.setConf(ElasticIndexWriter.java:258)
at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:159)
at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:57)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:91)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
***nutch-site.xml***
<property>
<name>elastic.host</name>
<value>localhost</value>
<description>The hostname to send documents to using TransportClient. Either host
and port must be defined or cluster.</description>
</property>
<property>
<name>elastic.port</name>
<value>9300</value>
<description>The port to connect to using TransportClient.</description>
</property>
Cheers
Jake
Re: Exception 'Missing elastic.cluster' with correct elasticsearch config
Posted by Julien Nioche <li...@gmail.com>.
Hi Jake
This has been fixed in trunk. See
https://github.com/apache/nutch/commit/026b2ff414bcf166de4bfeabef57f0202375ea38#diff-68fe6210481889b1947da1fe7d7ed0afL254
and https://issues.apache.org/jira/browse/NUTCH-1745
Thanks
Julien
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
Re: Exception 'Missing elastic.cluster' with correct elasticsearch config
Posted by Jake Dodd <ja...@ontopic.io>.
Ok great!
In case anybody comes across this thread before Nutch 1.9 is released and needs to get this working: the easiest workaround is to set the elastic.cluster property in nutch-site.xml in addition to elastic.host and elastic.port, rather than modifying the source.
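The workaround amounts to a nutch-site.xml fragment that sets elastic.cluster alongside elastic.host and elastic.port. The Java sketch below embeds such a fragment, reads its property entries with the JDK's DOM parser, and then applies the same cluster-required check that the 1.8 writer reportedly performs. The class name is hypothetical. The cluster value "elasticsearch" is only Elasticsearch's default cluster name, so substitute the name of your own cluster.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: read a nutch-site.xml-style fragment and check elastic.cluster. */
public class WorkaroundCheck {

  // Example fragment with elastic.cluster added next to host and port.
  // "elasticsearch" is Elasticsearch's default cluster name; use your cluster's actual name.
  static final String NUTCH_SITE =
      "<configuration>"
          + "<property><name>elastic.cluster</name><value>elasticsearch</value></property>"
          + "<property><name>elastic.host</name><value>localhost</value></property>"
          + "<property><name>elastic.port</name><value>9300</value></property>"
          + "</configuration>";

  /** Collects name/value pairs from <property> elements into a map. */
  static Map<String, String> readProperties(String xml) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    Map<String, String> props = new HashMap<>();
    NodeList nodes = doc.getElementsByTagName("property");
    for (int i = 0; i < nodes.getLength(); i++) {
      Element p = (Element) nodes.item(i);
      String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
      String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
      props.put(name, value);
    }
    return props;
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> props = readProperties(NUTCH_SITE);
    // Mirrors the unconditional cluster check reported for Nutch 1.8 (a sketch, not the real source).
    String cluster = props.get("elastic.cluster");
    if (cluster == null || cluster.isEmpty()) {
      throw new RuntimeException("Missing elastic.cluster. Should be set in nutch-site.xml");
    }
    // Prints: elastic.cluster=elasticsearch, host=localhost, port=9300
    System.out.println("elastic.cluster=" + cluster + ", host=" + props.get("elastic.host")
        + ", port=" + props.get("elastic.port"));
  }
}
```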
Cheers
Jake