Posted to builds@apache.org by Jukka Zitting <ju...@gmail.com> on 2010/04/19 09:48:45 UTC

[hudson] Slaves going offline

Hi,

Every now and then I see Hudson marking slaves as offline due to "Ping
response time is too long or timed out", but when I check the slave
it's still running OK. The problem seems to be caused by extra load
on the master [1] or perhaps by the slave-status plugin [2].

Do we need the slave-status plugin for anything? If not, I'd like to
disable it for now to see if that makes a difference.

We may also want to review the other plugins we have installed. Do we
need them all?

[1] http://issues.hudson-ci.org/browse/HUDSON-6196
[2] http://www.echelog.com/logs/browse/hudson/1268866800

BR,

Jukka Zitting

Re: [hudson] Slaves going offline

Posted by Niklas Gustavsson <ni...@protocol7.com>.
On Mon, Apr 19, 2010 at 9:48 AM, Jukka Zitting <ju...@gmail.com> wrote:
> Do we need the slave-status plugin for anything? If not, I'd like to
> disable it for now to see if that makes a difference.

We use it to monitor the health of Hudson on the Windows slave (which
has had a tendency to crash) through the regular Nagios monitoring.
So if we disable the slave-status plugin, we also need to disable the
Nagios check.

In the long run, we need a way to monitor at least the Windows slave.
But in the short run I have no issue with trying to disable the plugin
to see if it's causing these problems.

/niklas