Posted to user@uima.apache.org by "reshu.agarwal" <re...@orkash.com> on 2014/12/04 13:41:11 UTC
DUCC-unstable behaviour of ducc
Hi,
Please look at these stats:
Status  Name  Memory(GB):usable  Memory(GB):total  Swap(GB):inuse  Swap(GB):free  Alien PIDs  Shares:total  Shares:inuse  Heartbeat (last)
Total         58                 70                0               29             9           29            3
up      S144  36                 39                0               20             8           18            2             59
down    S143  22                 31                0               9              1           11            11            58
I am not able to understand these stats.
Please help.
Reshu.
Re: DUCC-unstable behaviour of ducc
Posted by Lou DeGenaro <lo...@gmail.com>.
Are the machines where your DUCC daemons and/or agents run extremely busy?
Otherwise, I should think that the default heartbeat config should work as
is.
Lou.
On Wed, Dec 10, 2014 at 4:06 AM, reshu.agarwal <re...@orkash.com>
wrote:
> Dear Lou,
>
> My problem has been resolved. I just increased the maximum time allowed for
> receiving heartbeats from the agents.
>
> The "unstable behavior" of DUCC 1.1.0 in my case was nodes going up and
> down, both when running a single instance of DUCC 1.1.0 and when running
> both DUCC versions simultaneously.
>
> Now I am able to run DUCC 1.1.0 alone, so only DUCC 1.1.0 is configured.
>
> Thanks for your help. :-)
>
> Reshu.
Re: DUCC-unstable behaviour of ducc
Posted by "reshu.agarwal" <re...@orkash.com>.
Dear Lou,
My problem has been resolved. I just increased the maximum time allowed for
receiving heartbeats from the agents.
The "unstable behavior" of DUCC 1.1.0 in my case was nodes going up and
down, both when running a single instance of DUCC 1.1.0 and when running
both DUCC versions simultaneously.
Now I am able to run DUCC 1.1.0 alone, so only DUCC 1.1.0 is configured.
Thanks for your help. :-)
Reshu.
On 12/08/2014 04:24 PM, Lou DeGenaro wrote:
> What is the "unstable behavior" of DUCC 1.1.0 when running it alone?
>
> All kinds of bad things can happen if you run 2 DUCCs on the same set of
> machines. I'm willing to help, but am not sure I can if you are running 2
> DUCCs - that's fairly complex. Instead I urge you to run a single DUCC
> 1.1.0 and let's try to fix what's wrong with running it alone.
>
> Lou.
Re: DUCC-unstable behaviour of ducc
Posted by Lou DeGenaro <lo...@gmail.com>.
What is the "unstable behavior" of DUCC 1.1.0 when running it alone?
All kinds of bad things can happen if you run 2 DUCCs on the same set of
machines. I'm willing to help, but am not sure I can if you are running 2
DUCCs - that's fairly complex. Instead I urge you to run a single DUCC
1.1.0 and let's try to fix what's wrong with running it alone.
Lou.
On Sun, Dec 7, 2014 at 11:40 PM, reshu.agarwal <re...@orkash.com>
wrote:
>
> Yes, I am running both at the same time. I first tried running only version
> 1.1.0 to check its performance, but due to the unstable behaviour I had to
> run DUCC 1.0.0 and DUCC 1.1.0 at the same time. I am running DUCC 1.0.0 for
> running jobs and DUCC 1.1.0 for testing.
>
> Do I need to increase the heartbeat timing to more than 60 seconds?
>
> Reshu.
Re: DUCC-unstable behaviour of ducc
Posted by "reshu.agarwal" <re...@orkash.com>.
Yes, I am running both at the same time. I first tried running only version
1.1.0 to check its performance, but due to the unstable behaviour I had to
run DUCC 1.0.0 and DUCC 1.1.0 at the same time. I am running DUCC 1.0.0 for
running jobs and DUCC 1.1.0 for testing.
Do I need to increase the heartbeat timing to more than 60 seconds?
Reshu.
On 12/05/2014 05:57 PM, Lou DeGenaro wrote:
> You can fetch the latest code containing the bug fix from SVN and build
> your own snapshot. However, this bug is of minimal impact so there is no
> pressing need to do so.
>
> Are you trying to run 1.0 and 1.1 at the same time? This can be very
> tricky. You need to be sure of no overlaps. I highly recommend that you
> pick one or the other.
>
> Lou.
Re: DUCC-unstable behaviour of ducc
Posted by Lou DeGenaro <lo...@gmail.com>.
You can fetch the latest code containing the bug fix from SVN and build
your own snapshot. However, this bug is of minimal impact so there is no
pressing need to do so.
Are you trying to run 1.0 and 1.1 at the same time? This can be very
tricky. You need to be sure of no overlaps. I highly recommend that you
pick one or the other.
Lou.
On Fri, Dec 5, 2014 at 6:31 AM, reshu.agarwal <re...@orkash.com>
wrote:
> Dear Lou,
>
> Thanks for confirming this.
>
> Is a version with the bug fix available for use?
>
> What could be causing the delay in heartbeats? The machines were not able to
> send heartbeats within 60 seconds, so the nodes go down in DUCC 1.1.0, but
> DUCC 1.0.0 works fine on the same machines.
>
> My master node is physical and the client is virtual. Could this be a reason
> for the delayed heartbeats as well as the increase in job processing time?
>
> Thanks.
>
> Reshu.
Re: DUCC-unstable behaviour of ducc
Posted by "reshu.agarwal" <re...@orkash.com>.
Dear Lou,
Thanks for confirming this.
Is a version with the bug fix available for use?
What could be causing the delay in heartbeats? The machines were not able to
send heartbeats within 60 seconds, so the nodes go down in DUCC 1.1.0, but
DUCC 1.0.0 works fine on the same machines.
My master node is physical and the client is virtual. Could this be a
reason for the delayed heartbeats as well as the increase in job
processing time?
Thanks.
Reshu.
On 12/05/2014 04:45 PM, Lou DeGenaro wrote:
> Each node has a DUCC Agent daemon that sends heartbeats.
>
> There was a bug discovered after the release of 1.1 whereby the share
> calculation is incorrect (a rounding up problem that you observe). The
> impact of this bug should be minimal. The bug has been fixed.
>
> Lou.
Re: DUCC-unstable behaviour of ducc
Posted by Lou DeGenaro <lo...@gmail.com>.
Each node has a DUCC Agent daemon that sends heartbeats.
There was a bug discovered after the release of 1.1 whereby the share
calculation is incorrect (a rounding up problem that you observe). The
impact of this bug should be minimal. The bug has been fixed.
Lou.
On Fri, Dec 5, 2014 at 12:41 AM, reshu.agarwal <re...@orkash.com>
wrote:
> Lou,
>
> How does a node send heartbeats in DUCC? If you can tell me this, I will be
> able to identify why my nodes go down.
>
> The other problem I am facing is:
>
> Memory(GB):total : 31
> Memory(GB):usable : 16
> Shares:total : 8
> Shares:inuse : 9
>
> This means the RAM actually available is 30 GB, so the shares available
> should be 15 (2 GB per share), but it is showing Memory(GB):usable : 16 and
> Shares:total : 8.
>
> In DUCC 1.0.0, I don't face this problem.
>
> Please explain the reason for this.
>
> Reshu.
Re: DUCC-unstable behaviour of ducc
Posted by "reshu.agarwal" <re...@orkash.com>.
Lou,
How does a node send heartbeats in DUCC? If you can tell me this, I will
be able to identify why my nodes go down.
The other problem I am facing is:
Memory(GB):total : 31
Memory(GB):usable : 16
Shares:total : 8
Shares:inuse : 9
This means the RAM actually available is 30 GB, so the shares available
should be 15 (2 GB per share), but it is showing Memory(GB):usable : 16 and
Shares:total : 8.
In DUCC 1.0.0, I don't face this problem.
Please explain the reason for this.
Reshu.
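The share arithmetic in this message can be checked with a short sketch. It is only an illustration of how the rounding direction changes a share count, using the 2 GB share size and the memory figures quoted above. The function names are hypothetical and this is not DUCC's actual share calculation, nor a reproduction of the reported bug.

```python
import math

# Share size stated in the post: 2 GB per share.
SHARE_SIZE_GB = 2


def shares_rounded_down(memory_gb: int, share_size_gb: int = SHARE_SIZE_GB) -> int:
    """Number of whole shares that fit in the given memory (hypothetical helper)."""
    return memory_gb // share_size_gb


def shares_rounded_up(memory_gb: int, share_size_gb: int = SHARE_SIZE_GB) -> int:
    """Same division, but rounding up instead of down (hypothetical helper)."""
    return math.ceil(memory_gb / share_size_gb)


if __name__ == "__main__":
    # 30 GB available at 2 GB per share gives the 15 shares the poster expected.
    print(shares_rounded_down(30))  # 15
    # 16 GB usable at 2 GB per share gives the 8 shares the page displayed.
    print(shares_rounded_down(16))  # 8
    # For an odd figure such as 31 GB, the rounding direction changes the count.
    print(shares_rounded_down(31), shares_rounded_up(31))  # 15 16
```

When one quantity is rounded down and a related one is rounded up, the two counts can disagree by one, which is the general kind of off-by-one effect a rounding problem produces.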
On 12/04/2014 06:42 PM, Lou DeGenaro wrote:
> Which of these are not understandable? If you hover over the column heading,
> a little more explanation is given (though not much).
>
> For example, if you hover over Heartbeat(last) you'll see "The elapsed time
> (in seconds) since the last heartbeat". This should usually be around 60
> seconds. On the system I'm looking at live presently, I see a range from 9
> to 66. If the number gets too large, the DUCC system will consider the
> node down. As best as I can tell, it looks like your numbers are 58 & 59,
> which is perfect.
>
> Lou.
Re: DUCC-unstable behaviour of ducc
Posted by Lou DeGenaro <lo...@gmail.com>.
Which of these are not understandable? If you hover over the column heading,
a little more explanation is given (though not much).
For example, if you hover over Heartbeat(last) you'll see "The elapsed time
(in seconds) since the last heartbeat". This should usually be around 60
seconds. On the system I'm looking at live presently, I see a range from 9
to 66. If the number gets too large, the DUCC system will consider the
node down. As best as I can tell, it looks like your numbers are 58 & 59,
which is perfect.
Lou.
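The heartbeat check described in this reply can be sketched roughly as follows. This is a minimal illustration and not DUCC code: the `Node` class, the `node_status` function, and the 300-second `DOWN_AFTER_S` cutoff are all hypothetical, chosen only to show the idea that a node is treated as down once too much time has passed since its last heartbeat.

```python
from dataclasses import dataclass

# Hypothetical cutoff, NOT a DUCC default: treat a node as down once this
# many seconds have elapsed since its last heartbeat.
DOWN_AFTER_S = 300.0


@dataclass
class Node:
    name: str
    seconds_since_heartbeat: float  # the "Heartbeat (last)" value


def node_status(node: Node, down_after_s: float = DOWN_AFTER_S) -> str:
    """Return 'down' if the last heartbeat is older than the cutoff, else 'up'."""
    return "down" if node.seconds_since_heartbeat > down_after_s else "up"


if __name__ == "__main__":
    # 9 and 66 are the ends of the range mentioned in the reply; 400 is an
    # invented value standing in for a node whose heartbeats have stopped.
    for node in (Node("node-a", 9), Node("node-b", 66), Node("node-c", 400)):
        print(node.name, node_status(node))
    # node-a up
    # node-b up
    # node-c down
```

Under this model, raising the cutoff makes the check more tolerant of late heartbeats, at the cost of taking longer to notice a node that has really gone away.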
On Thu, Dec 4, 2014 at 7:41 AM, reshu.agarwal <re...@orkash.com>
wrote:
> Hi,
>
> Please look at these stats:
>
> Status  Name  Memory(GB):usable  Memory(GB):total  Swap(GB):inuse  Swap(GB):free  Alien PIDs  Shares:total  Shares:inuse  Heartbeat (last)
> Total         58                 70                0               29             9           29            3
> up      S144  36                 39                0               20             8           18            2             59
> down    S143  22                 31                0               9              1           11            11            58
>
> I am not able to understand these stats.
>
> Please help.
>
> Reshu.