Posted to dev@cloudstack.apache.org by Andrija Panic <an...@gmail.com> on 2015/02/20 13:06:02 UTC

Agent dies every night/morning.... memory violation

Hi,

I have a crazy agent on one of the hosts that is being killed each morning,
and I found this in /var/log/audit.log:

type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0 uid=0 gid=0
ses=68891 pid=10831 comm="jsvc" reason="memory violation" sig=6

I don't remember changing anything on the system, but this keeps happening
each morning around the same time, 5:20-5:40 am.

I'm wondering what the heck is happening. Any suggestions on where to start
troubleshooting?
I will check the logs in detail anyway...
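
For anyone reading along, the audit record above can be decoded with a few lines of Python. This is only a sketch: the function name `parse_audit_line` is made up for illustration, and it assumes the record is the single line shown in this message. The number before the colon inside `audit(...)` is a Unix epoch timestamp, and `sig=6` is SIGABRT on Linux.

```python
import re
import shlex
from datetime import datetime, timezone

# The record quoted in this message, joined back into one line.
AUDIT_LINE = (
    'type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0 uid=0 gid=0 '
    'ses=68891 pid=10831 comm="jsvc" reason="memory violation" sig=6'
)


def parse_audit_line(line):
    """Return (UTC datetime, dict of key=value fields) for one audit record."""
    match = re.search(r'audit\((\d+)\.(\d+):(\d+)\)', line)
    if match is None:
        raise ValueError('no audit(...) timestamp found')
    # group(1) is whole seconds since the epoch; milliseconds are ignored here.
    when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    fields = {}
    # shlex keeps quoted values such as reason="memory violation" in one token.
    for token in shlex.split(line):
        key, sep, value = token.partition('=')
        if sep:
            fields[key] = value
    return when, fields


when, fields = parse_audit_line(AUDIT_LINE)
# sig=6 is SIGABRT on Linux, i.e. the process called abort().
print(when.isoformat(), fields['comm'], fields['reason'], 'sig=' + fields['sig'])
# prints: 2015-02-19T04:51:03+00:00 jsvc memory violation sig=6
```

The timestamp is printed in UTC, so it has to be shifted to the host's local time zone before comparing it with other logs.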

-- 

Andrija Panić

Re: Agent dies every night/morning.... memory violation

Posted by Andrija Panic <an...@gmail.com>.
Thanks guys,

I already disabled all cron jobs and everything (did not disable logrotate
though...) - will share my findings.

Thanks a lot for the hint.

On 23 February 2015 at 17:30, Simon Weller <sw...@ena.com> wrote:

> I agree with Marcus. I suggest you start monitoring everything that's
> going on around this time frame.
> Maybe dump available memory, and IO (both disk and network) to a file
> every minute or so, and see if you can correlate it to something in
> particular that might be happening on the underlying server, or the network
> connectivity to that server. Maybe slowly move  VMs one at a time to a
> different host and see if the issue follows a particular VM.
>
> In the meantime, to reduce the effect of this problem, you could
> use a process monitor like Monit to watch the PID and restart
> cloudstack-agent if a failure is detected.
>
> - Si
>
> ________________________________________
> From: Marcus <sh...@gmail.com>
> Sent: Monday, February 23, 2015 10:21 AM
> To: dev@cloudstack.apache.org
> Cc: users@cloudstack.apache.org
> Subject: Re: Agent dies every night/morning.... memory violation
>
> It doesn't really sound like an agent problem, but some other root
> problem that is causing issues for the agent. Perhaps it is specific
> to the host simply because there is a particular VM that always runs
> on that host and the VM itself is triggering the issue. Perhaps a
> heavy logrotate or cron job on the vm causes issues for librados. Just
> grasping at straws here. From the output provided it does seem that
> the libvirt bindings that include ceph code are terminating the agent
> execution.  My guess is that if you focus on "why this host" as
> opposed to "what's going on", you'll find the answer to both. Sorry, I
> know that's not much help.
>
> On Mon, Feb 23, 2015 at 7:29 AM, Andrija Panic <an...@gmail.com> wrote:
> > Anybody?, before I start to cry :(
> >
> > On 21 February 2015 at 21:18, Andrija Panic <an...@gmail.com> wrote:
> >
> >> Hi Simon,
> >>
> >> SELinux is disabled, I have just double-checked.
> >>
> >> BTW, this is what I can see in the cloudstack-agent.err log - seems like
> >> some Ceph-related issue, but I'm not sure why the agent would die...
> >> If I recall correctly, this might have been happening since the Ceph update
> >> from 0.80.3? to 0.87 - and this seems like some crash in librados....
> >>
> >>
> >> libust[1907/2046]: Warning: HOME environment variable not set. Disabling
> >> LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
> >> libvirt:  error : name in virDomainLookupByName must not be NULL
> >> libvirt:  error : name in virDomainLookupByName must not be NULL
> >> libvirt:  error : name in virDomainLookupByName must not be NULL
> >> libvirt:  error : name in virDomainLookupByName must not be NULL
> >> libvirt: Storage Driver error : failed to remove volume
> >> 'cloudstack/bd751250-de35-4d2e-a4e3-3ee4b636c2a7': Device or resource busy
> >> ./log/SubsystemMap.h: In function 'bool
> >> ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread
> >> 7f04427fc700 time 2015-02-21 06:39:38.839210
> >> ./log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
> >>  ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> >>  1: (()+0x1fe223) [0x7f060c932223]
> >>  2: (ObjectCacher::flusher_entry()+0x155) [0x7f060c9866e5]
> >>  3: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f060c9976cd]
> >>  4: (()+0x79d1) [0x7f06605ee9d1]
> >>  5: (clone()+0x6d) [0x7f066033bb5d]
> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> >> to interpret this.
> >> terminate called after throwing an instance of 'ceph::FailedAssertion'
> >> 21/02/2015 06:39:38 1905 jsvc.exec error: Service did not exit cleanly
> >>
> >> On 20 February 2015 at 21:56, Simon Weller <sw...@ena.com> wrote:
> >>
> >>> Andrija,
> >>>
> >>> What is SELinux set to on this host?
> >>>
> >>>
> >>> - SI
> >>>
> >>>
> >>> ________________________________________
> >>> From: Andrija Panic <an...@gmail.com>
> >>> Sent: Friday, February 20, 2015 6:06 AM
> >>> To: dev@cloudstack.apache.org; users@cloudstack.apache.org
> >>> Subject: Agent dies every night/morning.... memory violation
> >>>
> >>> Hi,
> >>>
> >>> I have a crazy agent on one of the hosts that is being killed each morning,
> >>> and I found this in /var/log/audit.log:
> >>>
> >>> type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0 uid=0 gid=0
> >>> ses=68891 pid=10831 comm="jsvc" reason="memory violation" sig=6
> >>>
> >>> I don't remember changing anything on the system, but this keeps happening
> >>> each morning around the same time, 5:20-5:40 am.
> >>>
> >>> I'm wondering what the heck is happening. Any suggestions on where to start
> >>> troubleshooting?
> >>> I will check the logs in detail anyway...
> >>>
> >>> --
> >>>
> >>> Andrija Panić
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> Andrija Panić
> >>
> >
> >
> >
> > --
> >
> > Andrija Panić
>



-- 

Andrija Panić

Re: Agent dies every night/morning.... memory violation

Posted by Simon Weller <sw...@ena.com>.
I agree with Marcus. I suggest you start monitoring everything that's going on around this time frame.
Maybe dump available memory and I/O (both disk and network) to a file every minute or so, and see if you can correlate it with something in particular that might be happening on the underlying server or with the network connectivity to that server. Maybe slowly move VMs one at a time to a different host and see if the issue follows a particular VM.
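
A minimal sketch of the per-minute sampling described above, assuming a Linux host that exposes `/proc/meminfo`, `/proc/net/dev` and `/proc/diskstats`. The function names and the log path are hypothetical and not part of CloudStack. The counters are cumulative, so the interesting signal is the difference between consecutive samples around the crash window.

```python
import time


def parse_meminfo(text):
    """Map /proc/meminfo field names to integer values (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(':')
        if sep and rest.split():
            info[name.strip()] = int(rest.split()[0])
    return info


def parse_net_dev(text):
    """Map interface name to (rx_bytes, tx_bytes) from /proc/net/dev."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, rest = line.partition(':')
        cols = rest.split()
        if sep and len(cols) >= 9:
            counters[name.strip()] = (int(cols[0]), int(cols[8]))
    return counters


def parse_diskstats(text):
    """Map device name to (sectors_read, sectors_written) from /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        cols = line.split()
        if len(cols) >= 10:
            stats[cols[2]] = (int(cols[5]), int(cols[9]))
    return stats


def read_text(path):
    with open(path) as handle:
        return handle.read()


def sample_line():
    """Build one timestamped log line from the current /proc counters."""
    mem = parse_meminfo(read_text('/proc/meminfo'))
    net = parse_net_dev(read_text('/proc/net/dev'))
    disk = parse_diskstats(read_text('/proc/diskstats'))
    # MemAvailable is missing on older kernels, so .get() may return None.
    wanted = {key: mem.get(key) for key in ('MemFree', 'MemAvailable', 'Cached')}
    stamp = time.strftime('%Y-%m-%d %H:%M:%S')
    return '%s mem=%r net=%r disk=%r' % (stamp, wanted, net, disk)


def run(log_path, interval=60, samples=None):
    """Append one sample every `interval` seconds; loop forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        with open(log_path, 'a') as log:
            log.write(sample_line() + '\n')
        taken += 1
        time.sleep(interval)
```

For example, `run('/tmp/host-samples.log')` (a made-up path) could be left running overnight and the resulting file compared against the time of the crash.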

In the meantime, to reduce the effect of this problem, you could use a process monitor like Monit to watch the PID and restart cloudstack-agent if a failure is detected.
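
The watch-and-restart idea can be illustrated with a small Python check. Note that this is a stand-in sketch, not Monit configuration: the PID file path and the restart command in the comment at the bottom are assumptions that would need to be adjusted for the actual host, and the function names are hypothetical.

```python
import os
import subprocess


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or malformed."""
    try:
        with open(pidfile) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def pid_alive(pid):
    """Check for a process with this PID using signal 0 (nothing is delivered)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def ensure_running(pidfile, restart_cmd, runner=subprocess.call):
    """Run restart_cmd when the PID in pidfile is not alive; return True if restarted."""
    pid = read_pid(pidfile)
    if pid is not None and pid_alive(pid):
        return False
    runner(restart_cmd)
    return True


# Hypothetical values, to be run periodically (for example from cron):
# ensure_running('/var/run/cloudstack-agent.pid',
#                ['service', 'cloudstack-agent', 'restart'])
```

A dedicated tool such as Monit is likely the better choice in production, since it also handles alerting and restart limits; the sketch only shows the basic check.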

- Si

________________________________________
From: Marcus <sh...@gmail.com>
Sent: Monday, February 23, 2015 10:21 AM
To: dev@cloudstack.apache.org
Cc: users@cloudstack.apache.org
Subject: Re: Agent dies every night/morning.... memory violation

It doesn't really sound like an agent problem, but some other root
problem that is causing issues for the agent. Perhaps it is specific
to the host simply because there is a particular VM that always runs
on that host and the VM itself is triggering the issue. Perhaps a
heavy logrotate or cron job on the vm causes issues for librados. Just
grasping at straws here. From the output provided it does seem that
the libvirt bindings that include ceph code are terminating the agent
execution.  My guess is that if you focus on "why this host" as
opposed to "what's going on", you'll find the answer to both. Sorry, I
know that's not much help.

On Mon, Feb 23, 2015 at 7:29 AM, Andrija Panic <an...@gmail.com> wrote:
> Anybody?, before I start to cry :(
>
> On 21 February 2015 at 21:18, Andrija Panic <an...@gmail.com> wrote:
>
>> Hi Simon,
>>
>> SELinux is disabled, I have just double-checked.
>>
>> BTW, this is what I can see in the cloudstack-agent.err log - seems like
>> some Ceph-related issue, but I'm not sure why the agent would die...
>> If I recall correctly, this might have been happening since the Ceph update
>> from 0.80.3? to 0.87 - and this seems like some crash in librados....
>>
>>
>> libust[1907/2046]: Warning: HOME environment variable not set. Disabling
>> LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
>> libvirt:  error : name in virDomainLookupByName must not be NULL
>> libvirt:  error : name in virDomainLookupByName must not be NULL
>> libvirt:  error : name in virDomainLookupByName must not be NULL
>> libvirt:  error : name in virDomainLookupByName must not be NULL
>> libvirt: Storage Driver error : failed to remove volume
>> 'cloudstack/bd751250-de35-4d2e-a4e3-3ee4b636c2a7': Device or resource busy
>> ./log/SubsystemMap.h: In function 'bool
>> ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread
>> 7f04427fc700 time 2015-02-21 06:39:38.839210
>> ./log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
>>  ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
>>  1: (()+0x1fe223) [0x7f060c932223]
>>  2: (ObjectCacher::flusher_entry()+0x155) [0x7f060c9866e5]
>>  3: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f060c9976cd]
>>  4: (()+0x79d1) [0x7f06605ee9d1]
>>  5: (clone()+0x6d) [0x7f066033bb5d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>> terminate called after throwing an instance of 'ceph::FailedAssertion'
>> 21/02/2015 06:39:38 1905 jsvc.exec error: Service did not exit cleanly
>>
>> On 20 February 2015 at 21:56, Simon Weller <sw...@ena.com> wrote:
>>
>>> Andrija,
>>>
>>> What is SELinux set to on this host?
>>>
>>>
>>> - SI
>>>
>>>
>>> ________________________________________
>>> From: Andrija Panic <an...@gmail.com>
>>> Sent: Friday, February 20, 2015 6:06 AM
>>> To: dev@cloudstack.apache.org; users@cloudstack.apache.org
>>> Subject: Agent dies every night/morning.... memory violation
>>>
>>> Hi,
>>>
>>> I have a crazy agent on one of the hosts that is being killed each morning,
>>> and I found this in /var/log/audit.log:
>>>
>>> type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0 uid=0 gid=0
>>> ses=68891 pid=10831 comm="jsvc" reason="memory violation" sig=6
>>>
>>> I don't remember changing anything on the system, but this keeps happening
>>> each morning around the same time, 5:20-5:40 am.
>>>
>>> I'm wondering what the heck is happening. Any suggestions on where to start
>>> troubleshooting?
>>> I will check the logs in detail anyway...
>>>
>>> --
>>>
>>> Andrija Panić
>>>
>>
>>
>>
>> --
>>
>> Andrija Panić
>>
>
>
>
> --
>
> Andrija Panić

Re: Agent dies every night/morning.... memory violation

Posted by Marcus <sh...@gmail.com>.
It doesn't really sound like an agent problem, but some other root
problem that is causing issues for the agent. Perhaps it is specific
to the host simply because there is a particular VM that always runs
on that host and the VM itself is triggering the issue. Perhaps a
heavy logrotate or cron job on the vm causes issues for librados. Just
grasping at straws here. From the output provided it does seem that
the libvirt bindings that include ceph code are terminating the agent
execution.  My guess is that if you focus on "why this host" as
opposed to "what's going on", you'll find the answer to both. Sorry, I
know that's not much help.

On Mon, Feb 23, 2015 at 7:29 AM, Andrija Panic <an...@gmail.com> wrote:
> Anybody?, before I start to cry :(
>
> On 21 February 2015 at 21:18, Andrija Panic <an...@gmail.com> wrote:
>
>> Hi Simon,
>>
>> SELinux is disabled, I have just double-checked.
>>
>> BTW, this is what I can see in the cloudstack-agent.err log - seems like
>> some Ceph-related issue, but I'm not sure why the agent would die...
>> If I recall correctly, this might have been happening since the Ceph update
>> from 0.80.3? to 0.87 - and this seems like some crash in librados....
>>
>>
>> libust[1907/2046]: Warning: HOME environment variable not set. Disabling
>> LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
>> libvirt:  error : name in virDomainLookupByName must not be NULL
>> libvirt:  error : name in virDomainLookupByName must not be NULL
>> libvirt:  error : name in virDomainLookupByName must not be NULL
>> libvirt:  error : name in virDomainLookupByName must not be NULL
>> libvirt: Storage Driver error : failed to remove volume
>> 'cloudstack/bd751250-de35-4d2e-a4e3-3ee4b636c2a7': Device or resource busy
>> ./log/SubsystemMap.h: In function 'bool
>> ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread
>> 7f04427fc700 time 2015-02-21 06:39:38.839210
>> ./log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
>>  ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
>>  1: (()+0x1fe223) [0x7f060c932223]
>>  2: (ObjectCacher::flusher_entry()+0x155) [0x7f060c9866e5]
>>  3: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f060c9976cd]
>>  4: (()+0x79d1) [0x7f06605ee9d1]
>>  5: (clone()+0x6d) [0x7f066033bb5d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>> terminate called after throwing an instance of 'ceph::FailedAssertion'
>> 21/02/2015 06:39:38 1905 jsvc.exec error: Service did not exit cleanly
>>
>> On 20 February 2015 at 21:56, Simon Weller <sw...@ena.com> wrote:
>>
>>> Andrija,
>>>
>>> What is SELinux set to on this host?
>>>
>>>
>>> - SI
>>>
>>>
>>> ________________________________________
>>> From: Andrija Panic <an...@gmail.com>
>>> Sent: Friday, February 20, 2015 6:06 AM
>>> To: dev@cloudstack.apache.org; users@cloudstack.apache.org
>>> Subject: Agent dies every night/morning.... memory violation
>>>
>>> Hi,
>>>
>>> I have a crazy agent on one of the hosts that is being killed each morning,
>>> and I found this in /var/log/audit.log:
>>>
>>> type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0 uid=0 gid=0
>>> ses=68891 pid=10831 comm="jsvc" reason="memory violation" sig=6
>>>
>>> I don't remember changing anything on the system, but this keeps happening
>>> each morning around the same time, 5:20-5:40 am.
>>>
>>> I'm wondering what the heck is happening. Any suggestions on where to start
>>> troubleshooting?
>>> I will check the logs in detail anyway...
>>>
>>> --
>>>
>>> Andrija Panić
>>>
>>
>>
>>
>> --
>>
>> Andrija Panić
>>
>
>
>
> --
>
> Andrija Panić

Re: Agent dies every night/morning.... memory violation

Posted by Andrija Panic <an...@gmail.com>.
Anybody? Before I start to cry :(

On 21 February 2015 at 21:18, Andrija Panic <an...@gmail.com> wrote:

> Hi Simon,
>
> SELinux is disabled, I have just double-checked.
>
> BTW, this is what I can see in the cloudstack-agent.err log - seems like
> some Ceph-related issue, but I'm not sure why the agent would die...
> If I recall correctly, this might have been happening since the Ceph update
> from 0.80.3? to 0.87 - and this seems like some crash in librados....
>
>
> libust[1907/2046]: Warning: HOME environment variable not set. Disabling
> LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
> libvirt:  error : name in virDomainLookupByName must not be NULL
> libvirt:  error : name in virDomainLookupByName must not be NULL
> libvirt:  error : name in virDomainLookupByName must not be NULL
> libvirt:  error : name in virDomainLookupByName must not be NULL
> libvirt: Storage Driver error : failed to remove volume
> 'cloudstack/bd751250-de35-4d2e-a4e3-3ee4b636c2a7': Device or resource busy
> ./log/SubsystemMap.h: In function 'bool
> ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread
> 7f04427fc700 time 2015-02-21 06:39:38.839210
> ./log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
>  ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
>  1: (()+0x1fe223) [0x7f060c932223]
>  2: (ObjectCacher::flusher_entry()+0x155) [0x7f060c9866e5]
>  3: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f060c9976cd]
>  4: (()+0x79d1) [0x7f06605ee9d1]
>  5: (clone()+0x6d) [0x7f066033bb5d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> 21/02/2015 06:39:38 1905 jsvc.exec error: Service did not exit cleanly
>
> On 20 February 2015 at 21:56, Simon Weller <sw...@ena.com> wrote:
>
>> Andrija,
>>
>> What is SELinux set to on this host?
>>
>>
>> - SI
>>
>>
>> ________________________________________
>> From: Andrija Panic <an...@gmail.com>
>> Sent: Friday, February 20, 2015 6:06 AM
>> To: dev@cloudstack.apache.org; users@cloudstack.apache.org
>> Subject: Agent dies every night/morning.... memory violation
>>
>> Hi,
>>
>> I have a crazy agent on one of the hosts that is being killed each morning,
>> and I found this in /var/log/audit.log:
>>
>> type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0 uid=0 gid=0
>> ses=68891 pid=10831 comm="jsvc" reason="memory violation" sig=6
>>
>> I don't remember changing anything on the system, but this keeps happening
>> each morning around the same time, 5:20-5:40 am.
>>
>> I'm wondering what the heck is happening. Any suggestions on where to start
>> troubleshooting?
>> I will check the logs in detail anyway...
>>
>> --
>>
>> Andrija Panić
>>
>
>
>
> --
>
> Andrija Panić
>



-- 

Andrija Panić

Re: Agent dies every night/morning.... memory violation

Posted by Andrija Panic <an...@gmail.com>.
Anybody?, before I start to cry :(

On 21 February 2015 at 21:18, Andrija Panic <an...@gmail.com> wrote:

> HI Simon,
>
> selinux is disabled, I have just double checked.
>
> BTW, this is what I can see in the cloudstack-agent.err log - seems like
> some CEPH related issues, but not sure why would agent die...
> If I recall correclty, this might be happening since the CEPH update from
> 0.80.3? to 0.87 - and this seesm like some crash in librados....
>
>
> libust[1907/2046]: Warning: HOME environment variable not set. Disabling
> LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
> libvirt:  error : name in virDomainLookupByName must not be NULL
> libvirt:  error : name in virDomainLookupByName must not be NULL
> libvirt:  error : name in virDomainLookupByName must not be NULL
> libvirt:  error : name in virDomainLookupByName must not be NULL
> libvirt: Storage Driver error : failed to remove volume
> 'cloudstack/bd751250-de35-4d2e-a4e3-3ee4b636c2a7': Device or resource busy
> ./log/SubsystemMap.h: In function 'bool
> ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread
> 7f04427fc700 time 2015-02-21 06:39:38.839210
> ./log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
>  ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
>  1: (()+0x1fe223) [0x7f060c932223]
>  2: (ObjectCacher::flusher_entry()+0x155) [0x7f060c9866e5]
>  3: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f060c9976cd]
>  4: (()+0x79d1) [0x7f06605ee9d1]
>  5: (clone()+0x6d) [0x7f066033bb5d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> 21/02/2015 06:39:38 1905 jsvc.exec error: Service did not exit cleanly
>
> On 20 February 2015 at 21:56, Simon Weller <sw...@ena.com> wrote:
>
>> Andrija,
>>
>> What is SELinux set to on this host?
>>
>>
>> - SI
>>
>>
>> ________________________________________
>> From: Andrija Panic <an...@gmail.com>
>> Sent: Friday, February 20, 2015 6:06 AM
>> To: dev@cloudstack.apache.org; users@cloudstack.apache.org
>> Subject: Agent dies every night/morning.... memory violation
>>
>> Hi,
>>
>> I have a crazy agent on one of the hosts that is being killed each morning,
>> and I found this in /var/log/audit.log:
>>
>> type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0 uid=0 gid=0
>> ses=68891 pid=10831 comm="jsvc" reason="memory violation" sig=6
>>
>> I don't remember changing anything on the system, but this keeps happening
>> each morning around the same time, 5:20am-5:40am.
>>
>> I'm wondering what the heck is happening. Any suggestions on where to
>> start troubleshooting?
>> I will check the logs in detail anyway...
>>
>> --
>>
>> Andrija Panić
>>
>
>
>
> --
>
> Andrija Panić
>



-- 

Andrija Panić

Re: Agent dies every night/morning.... memory violation

Posted by Andrija Panic <an...@gmail.com>.
Hi Simon,

SELinux is disabled, I have just double-checked.

BTW, this is what I can see in the cloudstack-agent.err log. It looks like
a Ceph-related issue, but I'm not sure why the agent would die...
If I recall correctly, this might have been happening since the Ceph update from
0.80.3? to 0.87, and this looks like some crash in librados....


libust[1907/2046]: Warning: HOME environment variable not set. Disabling
LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
libvirt:  error : name in virDomainLookupByName must not be NULL
libvirt:  error : name in virDomainLookupByName must not be NULL
libvirt:  error : name in virDomainLookupByName must not be NULL
libvirt:  error : name in virDomainLookupByName must not be NULL
libvirt: Storage Driver error : failed to remove volume
'cloudstack/bd751250-de35-4d2e-a4e3-3ee4b636c2a7': Device or resource busy
./log/SubsystemMap.h: In function 'bool
ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread
7f04427fc700 time 2015-02-21 06:39:38.839210
./log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
 1: (()+0x1fe223) [0x7f060c932223]
 2: (ObjectCacher::flusher_entry()+0x155) [0x7f060c9866e5]
 3: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f060c9976cd]
 4: (()+0x79d1) [0x7f06605ee9d1]
 5: (clone()+0x6d) [0x7f066033bb5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
21/02/2015 06:39:38 1905 jsvc.exec error: Service did not exit cleanly
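
To line the crashes up with the audit records, something like the rough sketch below could pull the timestamps out of both logs. It assumes Python 3 is available on the host, and the function names are made up for illustration. The audit stamp is epoch seconds (shown here as UTC), while the Ceph stamp is taken as printed, presumably in the host's local time, so any offset between the two has to be accounted for by hand.

```python
import re
from datetime import datetime, timezone

# "time YYYY-MM-DD HH:MM:SS.ffffff" as printed next to a Ceph failed assert.
CEPH_TIME = re.compile(r"time (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})")
# Epoch seconds inside an auditd record, e.g. msg=audit(1424321463.930:430678).
AUDIT_TIME = re.compile(r"msg=audit\((\d+)\.\d+:\d+\)")


def ceph_assert_times(text):
    """Return naive datetimes for every Ceph 'time ...' stamp found in text."""
    return [
        datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        for stamp in CEPH_TIME.findall(text)
    ]


def audit_times(text):
    """Return UTC datetimes for every auditd record found in text."""
    return [
        datetime.fromtimestamp(int(seconds), tz=timezone.utc)
        for seconds in AUDIT_TIME.findall(text)
    ]


if __name__ == "__main__":
    # Sample lines copied from the logs quoted in this thread.
    ceph_line = "7f04427fc700 time 2015-02-21 06:39:38.839210"
    audit_line = "type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0"
    print(ceph_assert_times(ceph_line))
    print(audit_times(audit_line))
```

Feeding it the whole contents of each log file instead of the sample lines should give one list of crash times per log to compare.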

On 20 February 2015 at 21:56, Simon Weller <sw...@ena.com> wrote:

> Andrija,
>
> What is SELinux set to on this host?
>
>
> - SI
>
>
> ________________________________________
> From: Andrija Panic <an...@gmail.com>
> Sent: Friday, February 20, 2015 6:06 AM
> To: dev@cloudstack.apache.org; users@cloudstack.apache.org
> Subject: Agent dies every night/morning.... memory violation
>
> Hi,
>
> I have a crazy agent on one of the hosts that is being killed each morning,
> and I found this in /var/log/audit.log:
>
> type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0 uid=0 gid=0
> ses=68891 pid=10831 comm="jsvc" reason="memory violation" sig=6
>
> I don't remember changing anything on the system, but this keeps happening
> each morning around the same time, 5:20am-5:40am.
>
> I'm wondering what the heck is happening. Any suggestions on where to
> start troubleshooting?
> I will check the logs in detail anyway...
>
> --
>
> Andrija Panić
>



-- 

Andrija Panić

Re: Agent dies every night/morning.... memory violation

Posted by Simon Weller <sw...@ena.com>.
Andrija,

What is SELinux set to on this host?


- SI


________________________________________
From: Andrija Panic <an...@gmail.com>
Sent: Friday, February 20, 2015 6:06 AM
To: dev@cloudstack.apache.org; users@cloudstack.apache.org
Subject: Agent dies every night/morning.... memory violation

Hi,

I have a crazy agent on one of the hosts that is being killed each morning,
and I found this in /var/log/audit.log:

type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0 uid=0 gid=0
ses=68891 pid=10831 comm="jsvc" reason="memory violation" sig=6

I don't remember changing anything on the system, but this keeps happening
each morning around the same time, 5:20am-5:40am.

I'm wondering what the heck is happening. Any suggestions on where to
start troubleshooting?
I will check the logs in detail anyway...

--

Andrija Panić
