Posted to user@nutch.apache.org by Jens Jahnke <je...@wegtam.com> on 2014/05/28 12:05:38 UTC

Error while trying to index with elasticsearch on hadoop

Hi,

I've a simple hadoop setup with a name node and 1 data node.
Using nutch 1.8 I'm able to walk through all steps: inject, generate,
fetch, parse, updatedb, fetch-again, parse-again, ..., invertlinks.

But if I try to run the indexer using:

% ./bin/nutch index crawl/db/ -linkdb crawl/linkdb/ -dir crawl/segments

then I get an exception from hadoop:

Error: java.lang.IllegalArgumentException: n must be positive
        at java.util.Random.nextInt(Random.java:300)
        at org.elasticsearch.common.Names.randomNodeName(Names.java:45)
        at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareSettings(InternalSettingsPreparer.java:119)
        at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:159)
        at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:125)
        at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.open(ElasticIndexWriter.java:106)
        at org.apache.nutch.indexer.IndexWriters.open(IndexWriters.java:78)
        at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:39)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:502)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:432)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

I've attached the whole stack trace to the mail.

Searching the internet suggests that this error means Hadoop is unable to
find the data node, but if that were the case, shouldn't the error occur
right from the start?

Any ideas?

Regards,

Jens

-- 
28. Wonnemond 2014, 11:56
Homepage : http://www.wegtam.com

Real Programmers think better when playing Adventure or Rogue.

Re: Error while trying to index with elasticsearch on hadoop

Posted by Jens Jahnke <je...@wegtam.com>.
Hi Julien,

On Fri, 30 May 2014 14:19:56 +0100
Julien Nioche <li...@gmail.com> wrote:

JN> Did you rebuild the job file with `ant job` and copy it to
JN> runtime/deploy? It should be taken into account when set in
JN> nutch-site.xml.

I always did an `ant runtime` after changing files in conf/.

JN> "elasticsearch" is the default cluster name in elasticsearch, not in
JN> the nutch config, where there is no default value set.

I know, but when none is set, elasticsearch falls back to the default
cluster name "elasticsearch". Maybe we should create an "elasticsearch.yml"
file holding the cluster name, as the elasticsearch docs suggest? I'll try
this out on Monday.
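For reference, a minimal sketch of what such a file might contain (the key
is the standard elasticsearch setting; the cluster name below is just the
test name used in this thread):

```yaml
# elasticsearch.yml (sketch); must match the cluster.name of the
# running elasticsearch cluster, otherwise the transport client
# will not find any nodes.
cluster.name: Fancy-Index-Name
```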

Regards,

Jens

-- 
31. Wonnemond 2014, 09:43
Homepage : http://www.wegtam.com

When the bosses talk about improving productivity, they are never
talking about themselves.

Re: Error while trying to index with elasticsearch on hadoop

Posted by Julien Nioche <li...@gmail.com>.
Did you rebuild the job file with `ant job` and copy it to runtime/deploy?
It should be taken into account when set in nutch-site.xml.

"elasticsearch" is the default cluster name
in elasticsearch, not in the nutch config, where there is no default
value set.


On 30 May 2014 14:01, Jens Jahnke <je...@wegtam.com> wrote:

> Hi,
>
> On Fri, 30 May 2014 13:34:23 +0100
> Julien Nioche <li...@gmail.com> wrote:
>
> JN> The cluster name is not the same thing as the index name. It's
> JN> elasticsearch by default. Are you saying that it works when you
> JN> specify it on the command line but not in nutch-site.xml?
>
> I know. I just named my cluster "Fancy-Index-Name" for testing. Right
> now it seems that the setting of the cluster name from the
> nutch-site.xml is ignored. Maybe this wasn't triggered before because
> "elasticsearch" is the default cluster name.
>
> Regards,
>
> Jens
>
> --
> 30. Wonnemond 2014, 14:59
> Homepage : http://www.wegtam.com
>
> An apple a day makes 365 apples a year.
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: Error while trying to index with elasticsearch on hadoop

Posted by Jens Jahnke <je...@wegtam.com>.
Hi,

On Fri, 30 May 2014 13:34:23 +0100
Julien Nioche <li...@gmail.com> wrote:

JN> The cluster name is not the same thing as the index name. It's
JN> elasticsearch by default. Are you saying that it works when you specify it
JN> on the command line but not in nutch-site.xml?

I know. I just named my cluster "Fancy-Index-Name" for testing. Right
now it seems that the setting of the cluster name from the
nutch-site.xml is ignored. Maybe this wasn't triggered before because
"elasticsearch" is the default cluster name.

Regards,

Jens

-- 
30. Wonnemond 2014, 14:59
Homepage : http://www.wegtam.com

An apple a day makes 365 apples a year.

Re: Error while trying to index with elasticsearch on hadoop

Posted by Julien Nioche <li...@gmail.com>.
Hi

The cluster name is not the same thing as the index name. It's
elasticsearch by default. Are you saying that it works when you specify it
on the command line but not in nutch-site.xml?

J.


On 30 May 2014 10:29, Jens Jahnke <je...@wegtam.com> wrote:

> Hi Julien,
>
> On Fri, 30 May 2014 10:04:23 +0100
> Julien Nioche <li...@gmail.com> wrote:
>
> JN> Before you open an issue: could you please try specifying the
> JN> cluster name -D elastic.cluster=elasticsearch when indexing? For
> JN> some reason it seems to have solved the issue in my case.
>
> if I use something like this:
>
> % NUTCH_OPTS=-Delastic.cluster=Fancy-Index-Name ./bin/nutch index crawl/db
> -linkdb linkdb -dir crawl/segments
>
> It works indeed. :-)
>
> But I have the cluster name configured in nutch-site.xml:
>
> <property>
>   <name>elastic.cluster</name>
>   <value>Fancy-Index-Name</value>
> </property>
>
> Regards,
>
> Jens
>
> --
> 30. Wonnemond 2014, 11:25
> Homepage : http://www.wegtam.com
>
> There's a long-standing bug relating to the x86 architecture that
> allows you to install Windows.
>                 -- Matthew D. Fuller
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: Error while trying to index with elasticsearch on hadoop

Posted by Jens Jahnke <je...@wegtam.com>.
Hi Julien,

On Fri, 30 May 2014 10:04:23 +0100
Julien Nioche <li...@gmail.com> wrote:

JN> Before you open an issue: could you please try specifying the cluster
JN> name -D elastic.cluster=elasticsearch when indexing? For some reason
JN> it seems to have solved the issue in my case.

if I use something like this:

% NUTCH_OPTS=-Delastic.cluster=Fancy-Index-Name ./bin/nutch index crawl/db -linkdb linkdb -dir crawl/segments

It works indeed. :-)

But I have the cluster name configured in nutch-site.xml:

<property>
  <name>elastic.cluster</name>
  <value>Fancy-Index-Name</value>      
</property>

Regards,

Jens

-- 
30. Wonnemond 2014, 11:25
Homepage : http://www.wegtam.com

There's a long-standing bug relating to the x86 architecture that
allows you to install Windows.
		-- Matthew D. Fuller

Re: Error while trying to index with elasticsearch on hadoop

Posted by Julien Nioche <li...@gmail.com>.
Hi again

Before you open an issue: could you please try specifying the cluster name
-D elastic.cluster=elasticsearch when indexing? For some reason it seems to
have solved the issue in my case.

Thanks

J.


On 30 May 2014 09:22, Julien Nioche <li...@gmail.com> wrote:

> Hi Jens
>
> I have been able to reproduce the 'No node available' issue but did not
> have the one with names.txt
> Can you please open an issue on JIRA?
>
> Is your cluster somewhere on the cloud or in house?
>
> Thanks
>
> Julien
>
>
> On 29 May 2014 10:07, Jens Jahnke <je...@wegtam.com> wrote:
>
>> Hi Julien,
>>
>> On Wed, 28 May 2014 15:32:28 +0100
>> Julien Nioche <li...@gmail.com> wrote:
>>
>> JN> Ok, so you are running it on Hadoop 2 then.
>>
>> Sorry, I forgot to mention that.
>>
>> JN> [...]
>> JN> That file names.txt lives in the elasticsearch jar. The explanation
>> JN> that comes to mind is that using Nutch in deployed mode means that
>> JN> this file is not loaded. I have used ES in distributed mode with
>> JN> Hadoop 1.x so it could be that there is a difference in Hadoop 2
>> JN> (which we do not yet officially support BTW) or a difference with
>> JN> my configuration which would explain why I did not get this issue.
>>
>> I have created a file `conf/names.txt` with some lines of names in it and
>> executed `ant runtime` afterwards. But now I get a
>> NoNodeAvailableException
>> although elasticsearch is running. :-|
>>
>> 2014-05-29 10:13:40,376 INFO  [main] mapreduce.Job
>> (Job.java:printTaskEvents(1424)) - Task Id :
>> attempt_1401350880587_0001_r_000000_0, Status : FAILED
>> Error: org.elasticsearch.client.transport.NoNodeAvailableException: No
>> node available
>>         at
>> org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:219)
>>         at
>> org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106)
>>         at
>> org.elasticsearch.client.support.AbstractClient.bulk(AbstractClient.java:147)
>>         at
>> org.elasticsearch.client.transport.TransportClient.bulk(TransportClient.java:360)
>>         at
>> org.elasticsearch.action.bulk.BulkRequestBuilder.doExecute(BulkRequestBuilder.java:165)
>>         at
>> org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
>>         at
>> org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
>>         at
>> org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.commit(ElasticIndexWriter.java:211)
>>         at
>> org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.close(ElasticIndexWriter.java:229)
>>         at
>> org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
>>         at
>> org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
>>         at
>> org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:520)
>>         at
>> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>>
>> I poked a bit in the source and added log output to verify that the
>> correct
>> settings are used. That seems to be the case (localhost:9300).
>>
>> Regards,
>>
>> Jens
>>
>> --
>> 29. Wonnemond 2014, 10:16
>> Homepage : http://www.wegtam.com
>>
>> I don't know anything about music.  In my line you don't have to.
>>                 -- Elvis Presley
>>
>
>
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: Error while trying to index with elasticsearch on hadoop

Posted by Julien Nioche <li...@gmail.com>.
Hi Jens

I have been able to reproduce the 'No node available' issue but did not
have the one with names.txt
Can you please open an issue on JIRA?

Is your cluster somewhere on the cloud or in house?

Thanks

Julien


On 29 May 2014 10:07, Jens Jahnke <je...@wegtam.com> wrote:

> Hi Julien,
>
> On Wed, 28 May 2014 15:32:28 +0100
> Julien Nioche <li...@gmail.com> wrote:
>
> JN> Ok, so you are running it on Hadoop 2 then.
>
> Sorry, I forgot to mention that.
>
> JN> [...]
> JN> That file names.txt lives in the elasticsearch jar. The explanation
> JN> that comes to mind is that using Nutch in deployed mode means that
> JN> this file is not loaded. I have used ES in distributed mode with
> JN> Hadoop 1.x so it could be that there is a difference in Hadoop 2
> JN> (which we do not yet officially support BTW) or a difference with
> JN> my configuration which would explain why I did not get this issue.
>
> I have created a file `conf/names.txt` with some lines of names in it and
> executed `ant runtime` afterwards. But now I get a
> NoNodeAvailableException
> although elasticsearch is running. :-|
>
> 2014-05-29 10:13:40,376 INFO  [main] mapreduce.Job
> (Job.java:printTaskEvents(1424)) - Task Id :
> attempt_1401350880587_0001_r_000000_0, Status : FAILED
> Error: org.elasticsearch.client.transport.NoNodeAvailableException: No
> node available
>         at
> org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:219)
>         at
> org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106)
>         at
> org.elasticsearch.client.support.AbstractClient.bulk(AbstractClient.java:147)
>         at
> org.elasticsearch.client.transport.TransportClient.bulk(TransportClient.java:360)
>         at
> org.elasticsearch.action.bulk.BulkRequestBuilder.doExecute(BulkRequestBuilder.java:165)
>         at
> org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
>         at
> org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
>         at
> org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.commit(ElasticIndexWriter.java:211)
>         at
> org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.close(ElasticIndexWriter.java:229)
>         at
> org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
>         at
> org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
>         at
> org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:520)
>         at
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>
> I poked a bit in the source and added log output to verify that the correct
> settings are used. That seems to be the case (localhost:9300).
>
> Regards,
>
> Jens
>
> --
> 29. Wonnemond 2014, 10:16
> Homepage : http://www.wegtam.com
>
> I don't know anything about music.  In my line you don't have to.
>                 -- Elvis Presley
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: Error while trying to index with elasticsearch on hadoop

Posted by Jens Jahnke <je...@wegtam.com>.
Hi Julien,

On Wed, 28 May 2014 15:32:28 +0100
Julien Nioche <li...@gmail.com> wrote:

JN> Ok, so you are running it on Hadoop 2 then.

Sorry, I forgot to mention that.

JN> [...]
JN> That file names.txt lives in the elasticsearch jar. The explanation that
JN> comes to mind is that using Nutch in deployed mode means that this file is
JN> not loaded. I have used ES in distributed mode with Hadoop 1.x so it could
JN> be that there is a difference in Hadoop 2 (which we do not yet officially
JN> support BTW) or a difference with my configuration which would explain why
JN> I did not get this issue.

I have created a file `conf/names.txt` with some lines of names in it and
executed `ant runtime` afterwards. But now I get a NoNodeAvailableException
although elasticsearch is running. :-|

2014-05-29 10:13:40,376 INFO  [main] mapreduce.Job (Job.java:printTaskEvents(1424)) - Task Id : attempt_1401350880587_0001_r_000000_0, Status : FAILED
Error: org.elasticsearch.client.transport.NoNodeAvailableException: No node available
        at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:219)
        at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106)
        at org.elasticsearch.client.support.AbstractClient.bulk(AbstractClient.java:147)
        at org.elasticsearch.client.transport.TransportClient.bulk(TransportClient.java:360)
        at org.elasticsearch.action.bulk.BulkRequestBuilder.doExecute(BulkRequestBuilder.java:165)
        at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
        at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
        at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.commit(ElasticIndexWriter.java:211)
        at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.close(ElasticIndexWriter.java:229)
        at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:520)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

I poked a bit in the source and added log output to verify that the correct
settings are used. That seems to be the case (localhost:9300).
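If anyone wants to rule out plain reachability first: a quick generic probe
(not Nutch code, just a sketch) that checks whether anything accepts TCP
connections on the configured transport address:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean reachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(reachable("localhost", 9300, 1000));
    }
}
```

Run it from the machine the reduce task lands on; a NoNodeAvailableException
can also come from a cluster-name mismatch, so a successful probe does not
rule that out.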

Regards,

Jens

-- 
29. Wonnemond 2014, 10:16
Homepage : http://www.wegtam.com

I don't know anything about music.  In my line you don't have to.
		-- Elvis Presley

Re: Error while trying to index with elasticsearch on hadoop

Posted by Julien Nioche <li...@gmail.com>.
Hi Jens

Ok, so you are running it on Hadoop 2 then.

Now if you look at
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/node/internal/InternalSettingsPreparer.java#L114
you'll see that if the node does not have a name, it will try to load one
at random from a file (names.txt), and judging by
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/common/Names.java#L45
that file must be empty or non-existent.
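That matches the stack trace at the top of this thread: with an empty name
list, the bound passed to Random.nextInt is 0, which Random rejects. A
standalone sketch of just that failure mode (no elasticsearch code involved):

```java
import java.util.Random;

public class EmptyNamesDemo {
    public static void main(String[] args) {
        String[] names = new String[0]; // stands in for an empty names.txt
        Random random = new Random();
        try {
            // Picking a random index in [0, names.length) is impossible
            // when the array is empty: nextInt(0) throws.
            System.out.println(names[random.nextInt(names.length)]);
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getClass().getSimpleName());
        }
    }
}
```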

That file names.txt lives in the elasticsearch jar. The explanation that
comes to mind is that using Nutch in deployed mode means that this file is
not loaded. I have used ES in distributed mode with Hadoop 1.x so it could
be that there is a difference in Hadoop 2 (which we do not yet officially
support BTW) or a difference with my configuration which would explain why
I did not get this issue.

HTH

Julien




On 28 May 2014 12:24, Jens Jahnke <je...@wegtam.com> wrote:

> Hi Julien,
>
> On Wed, 28 May 2014 11:33:17 +0100
> Julien Nioche <li...@gmail.com> wrote:
>
> JN> > I've a simple hadoop setup with a name node and 1 data node.
> JN>
> JN> no job tracker and task trackers?
>
> Well, I also have a YARN ResourceManager and a NodeManager running.
>
> JN> >
> JN> > Error: java.lang.IllegalArgumentException: n must be positive
> JN> >         at java.util.Random.nextInt(Random.java:300)
> JN> >         at org.elasticsearch.common.Names.randomNodeName(Names.java:45)
> JN> >         [...]
> JN>
> JN> Your problem is not related to Hadoop but has to do with the config
> JN> of ES. What did you specify for the port value?
>
> I have configured port 9300 in nutch-site.xml:
>
> <property>
>   <name>elastic.port</name>
>   <value>9300</value>
> </property>
>
> JN> Try the version in trunk if you can: I committed
> JN> https://issues.apache.org/jira/browse/NUTCH-1745 some time ago and
> JN> it could fix this issue.
>
> I just ran the trunk version and got exactly the same error message. :-(
> The configuration runs fine in local mode by the way...
>
> Regards,
>
> Jens
>
> --
> 28. Wonnemond 2014, 13:18
> Homepage : http://www.wegtam.com
>
> Jealousy is all the fun you think they have.
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: Error while trying to index with elasticsearch on hadoop

Posted by Jens Jahnke <je...@wegtam.com>.
Hi Julien,

On Wed, 28 May 2014 11:33:17 +0100
Julien Nioche <li...@gmail.com> wrote:

JN> > I've a simple hadoop setup with a name node and 1 data node.
JN> 
JN> no job tracker and task trackers?

Well, I also have a YARN ResourceManager and a NodeManager running.

JN> >
JN> > Error: java.lang.IllegalArgumentException: n must be positive
JN> >         at java.util.Random.nextInt(Random.java:300)
JN> >         at org.elasticsearch.common.Names.randomNodeName(Names.java:45)
JN> >         [...]
JN> 
JN> Your problem is not related to Hadoop but has to do with the config of ES.
JN> What did you specify for the port value?

I have configured port 9300 in nutch-site.xml:

<property>
  <name>elastic.port</name>
  <value>9300</value>
</property>

JN> Try the version in trunk if you can: I committed
JN> https://issues.apache.org/jira/browse/NUTCH-1745 some time ago and it
JN> could fix this issue.

I just ran the trunk version and got exactly the same error message. :-(
The configuration runs fine in local mode by the way...

Regards,

Jens

-- 
28. Wonnemond 2014, 13:18
Homepage : http://www.wegtam.com

Jealousy is all the fun you think they have.

Re: Error while trying to index with elasticsearch on hadoop

Posted by Julien Nioche <li...@gmail.com>.
Hi Jens


> I've a simple hadoop setup with a name node and 1 data node.
>

no job tracker and task trackers?


> Using nutch 1.8 I'm able to walk through all steps: inject, generate,
> fetch, parse, updatedb, fetch-again, parse-again, ..., invertlinks.
>
> But if I try to run the indexer using:
>
> % ./bin/nutch index crawl/db/ -linkdb crawl/linkdb/ -dir crawl/segments
>
> then I get an exception from hadoop:
>
> Error: java.lang.IllegalArgumentException: n must be positive
>         at java.util.Random.nextInt(Random.java:300)
>         at org.elasticsearch.common.Names.randomNodeName(Names.java:45)
>         at
> org.elasticsearch.node.internal.InternalSettingsPreparer.prepareSettings(InternalSettingsPreparer.java:119)
>         at
> org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:159)
>         at
> org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:125)
>         at
> org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.open(ElasticIndexWriter.java:106)
>         at org.apache.nutch.indexer.IndexWriters.open(IndexWriters.java:78)
>         at
> org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:39)
>         at
> org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:502)
>         at
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:432)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>
> I've attached the whole stack trace to the mail.
>
> Searching the internet suggests that this error means Hadoop is unable to
> find the data node, but if that were the case, shouldn't the error occur
> right from the start?
>


>
> Any ideas?
>
> Regards,
>
> Jens
>
> --
> 28. Wonnemond 2014, 11:56
> Homepage : http://www.wegtam.com
>
> Real Programmers think better when playing Adventure or Rogue.



Your problem is not related to Hadoop but has to do with the config of ES.
What did you specify for the port value?

Try the version in trunk if you can: I committed
https://issues.apache.org/jira/browse/NUTCH-1745 some time ago and it could
fix this issue.

Julien


-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble