Posted to common-user@hadoop.apache.org by Telles Nobrega <te...@gmail.com> on 2015/02/08 05:37:55 UTC
Max Connect retries
Hi, I changed my cluster config so a failed nodemanager can be detected in
about 30 seconds. When I run a wordcount job, the reduce phase gets stuck at 25%
for quite a while, and the logs show nodes trying to connect to the failed node:
org.apache.hadoop.ipc.Client: Retrying connect to server:
hadoop-telles-844fb3f0-dfd8-456d-89c3-1d7cfdbdcad2/10.3.2.99:49911.
Already tried 28 time(s); maxRetries=45
2015-02-08 04:26:42,088 INFO [IPC Server handler 16 on 50037]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents
request from attempt_1423319128424_0025_r_000000_0. startIndex 24
maxEvents 10000
Is this the expected behaviour? Should I change max retries to a lower
value? If so, which config is that?
Thanks
Re: Max Connect retries
Posted by Telles Nobrega <te...@gmail.com>.
It did finish, but it took hours, and in one case it didn't finish at all.
The same thing happened running the pi estimator.
On Mon Feb 09 2015 at 15:24:11 daemeon reiydelle <da...@gmail.com> wrote:
> Are your nodes actually stuck or are you in e.g. a reduce step that is
> reading so much data across the network that the node SEEMS unreachable?
>
>
> Since you mention "gets stuck for a while at 25%", that suggests that
> eventually the node finishes up its work ...
>
>
>
> *.......*
>
> *“Life should not be a journey to the grave with the intention of arriving
> safely in a pretty and well preserved body, but rather to skid in broadside
> in a cloud of smoke, thoroughly used up, totally worn out, and loudly
> proclaiming “Wow! What a Ride!” - Hunter Thompson*
> *Daemeon C.M. Reiydelle*
> *USA (+1) 415.501.0198*
> *London (+44) (0) 20 8144 9872*
>
> On Mon, Feb 9, 2015 at 2:49 AM, Telles Nobrega <te...@gmail.com>
> wrote:
>
>> Thanks
>>
>> On Mon Feb 09 2015 at 01:43:24 Xuan Gong <xg...@hortonworks.com> wrote:
>>
>>> That is for client connection retries at the IPC level.
>>>
>>> You can decrease the max.retries by configuring
>>>
>>> ipc.client.connect.max.retries.on.timeouts
>>>
>>> in core-site.xml
>>>
>>>
>>> Thanks
>>>
>>> Xuan Gong
>>>
>>> From: Telles Nobrega <te...@gmail.com>
>>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Date: Saturday, February 7, 2015 at 8:37 PM
>>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Subject: Max Connect retries
>>>
>>> Hi, I changed my cluster config so a failed nodemanager can be
>>> detected in about 30 seconds. When I run a wordcount job, the reduce phase gets
>>> stuck at 25% for quite a while, and the logs show nodes trying to connect to the
>>> failed node:
>>>
>>> org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-telles-844fb3f0-dfd8-456d-89c3-1d7cfdbdcad2/10.3.2.99:49911. Already tried 28 time(s); maxRetries=45
>>> 2015-02-08 04:26:42,088 INFO [IPC Server handler 16 on 50037] org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request from attempt_1423319128424_0025_r_000000_0. startIndex 24 maxEvents 10000
>>>
>>> Is this the expected behaviour? Should I change max retries to a lower value? If so, which config is that?
>>>
>>> Thanks
>>>
>>>
>>>
>
Re: Max Connect retries
Posted by daemeon reiydelle <da...@gmail.com>.
Are your nodes actually stuck or are you in e.g. a reduce step that is
reading so much data across the network that the node SEEMS unreachable?
Since you mention "gets stuck for a while at 25%", that suggests that
eventually the node finishes up its work ...
*.......*
*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*
On Mon, Feb 9, 2015 at 2:49 AM, Telles Nobrega <te...@gmail.com>
wrote:
> Thanks
>
> On Mon Feb 09 2015 at 01:43:24 Xuan Gong <xg...@hortonworks.com> wrote:
>
>> That is for client connection retries at the IPC level.
>>
>> You can decrease the max.retries by configuring
>>
>> ipc.client.connect.max.retries.on.timeouts
>>
>> in core-site.xml
>>
>>
>> Thanks
>>
>> Xuan Gong
>>
>> From: Telles Nobrega <te...@gmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Saturday, February 7, 2015 at 8:37 PM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Max Connect retries
>>
>> Hi, I changed my cluster config so a failed nodemanager can be
>> detected in about 30 seconds. When I run a wordcount job, the reduce phase gets
>> stuck at 25% for quite a while, and the logs show nodes trying to connect to the
>> failed node:
>>
>> org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-telles-844fb3f0-dfd8-456d-89c3-1d7cfdbdcad2/10.3.2.99:49911. Already tried 28 time(s); maxRetries=45
>> 2015-02-08 04:26:42,088 INFO [IPC Server handler 16 on 50037] org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request from attempt_1423319128424_0025_r_000000_0. startIndex 24 maxEvents 10000
>>
>> Is this the expected behaviour? Should I change max retries to a lower value? If so, which config is that?
>>
>> Thanks
>>
>>
>>
Re: Max Connect retries
Posted by Telles Nobrega <te...@gmail.com>.
Thanks
On Mon Feb 09 2015 at 01:43:24 Xuan Gong <xg...@hortonworks.com> wrote:
> That is for client connection retries at the IPC level.
>
> You can decrease the max.retries by configuring
>
> ipc.client.connect.max.retries.on.timeouts
>
> in core-site.xml
>
>
> Thanks
>
> Xuan Gong
>
> From: Telles Nobrega <te...@gmail.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Saturday, February 7, 2015 at 8:37 PM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Max Connect retries
>
> Hi, I changed my cluster config so a failed nodemanager can be detected
> in about 30 seconds. When I run a wordcount job, the reduce phase gets stuck at
> 25% for quite a while, and the logs show nodes trying to connect to the failed
> node:
>
> org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-telles-844fb3f0-dfd8-456d-89c3-1d7cfdbdcad2/10.3.2.99:49911. Already tried 28 time(s); maxRetries=45
> 2015-02-08 04:26:42,088 INFO [IPC Server handler 16 on 50037] org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request from attempt_1423319128424_0025_r_000000_0. startIndex 24 maxEvents 10000
>
> Is this the expected behaviour? Should I change max retries to a lower value? If so, which config is that?
>
> Thanks
>
>
>
Re: Max Connect retries
Posted by Xuan Gong <xg...@hortonworks.com>.
That is for client connection retries at the IPC level.
You can lower the maximum number of retries by setting
ipc.client.connect.max.retries.on.timeouts
in core-site.xml.
Thanks
Xuan Gong
From: Telles Nobrega <te...@gmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Saturday, February 7, 2015 at 8:37 PM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Max Connect retries
Hi, I changed my cluster config so a failed nodemanager can be detected in about 30 seconds. When I run a wordcount job, the reduce phase gets stuck at 25% for quite a while, and the logs show nodes trying to connect to the failed node:
org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-telles-844fb3f0-dfd8-456d-89c3-1d7cfdbdcad2/10.3.2.99:49911. Already tried 28 time(s); maxRetries=45
2015-02-08 04:26:42,088 INFO [IPC Server handler 16 on 50037] org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request from attempt_1423319128424_0025_r_000000_0. startIndex 24 maxEvents 10000
Is this the expected behaviour? Should I change max retries to a lower value? If so, which config is that?
Thanks
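[Editor's note] A minimal sketch of the suggestion above, as a core-site.xml fragment. The property name is the one Xuan Gong cites; its value here (5) is only an illustrative choice, not a recommendation — the maxRetries=45 seen in the retry log corresponds to the stock default for this property:

```xml
<!-- core-site.xml: cap IPC client connect retries on timeout.
     The value 5 below is only an example; tune it to how quickly
     you want tasks to give up on an unreachable node. -->
<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>5</value>
</property>
```

With a lower value, reducers that are polling a dead nodemanager for map outputs should fail their fetch attempts sooner, letting the job retry elsewhere instead of stalling for the full 45-retry cycle.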