You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@brooklyn.apache.org by Sam Corbett <sa...@cloudsoftcorp.com> on 2015/01/29 12:07:52 UTC

Sudden termination of server, possible thread interruption

Hi all,

I'd like help getting to the bottom of the unexpected termination of a
Brooklyn process. It hit me twice in a row yesterday, once with nothing
weird in the logs and once with a number of stacktraces indicating an
InterruptedException was thrown. Both processes were run on the same host
in Softlayer and were running Clocker.

The first deployment seemed to be working normally. I had deployed a few
applications to my-docker-cloud and stopped them again. A moment later and
the process had stopped. The last thing the server did was check the status
of a Weave container:

2015-01-28 07:16:15,002 DEBUG brooklyn.SSH
[brooklyn-execmanager-BL31ZSeZ-2001]: check-running
WeaveContainerImpl{id=r3fpQokV}, on machine
SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05],
completed: return status 0
2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer
[shutdownHookThread]: BrooklynWebServer detected shutdown: stopping
web-console

There were no interesting exceptions in the debug log.

In the second case the process stopped as Brooklyn waited for the status of
a service that did not provision. This time there was a (lot of) stacktrace
in the logs. Most pertinently perhaps was:

2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager
[brooklyn-execmanager-YRXvc51z-1676]: Exception running task
Task[post-start:ihocjzth] (rethrowing): java.lang.InterruptedException:
sleep interrupted
brooklyn.util.exceptions.RuntimeInterruptedException:
java.lang.InterruptedException: sleep interrupted
at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89)
~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.time.Time.sleep(Time.java:312)
~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.time.Time.sleep(Time.java:318)
~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382)
~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.repeat.Repeater.run(Repeater.java:305)
~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028)
~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at
brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:370)
~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at
brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:367)
~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at
brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:160)
~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at
brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:431)
~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
~[na:1.7.0_65]
at
brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:337)
~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at
brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_65]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
Caused by: java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
at brooklyn.util.time.Time.sleep(Time.java:310)
~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
... 15 common frames omitted

I've got full logs for each run of the server but I didn't get the exit
code of either process. I ran a third test on the same host later in the
day and nothing went wrong (in a reasonable timeframe).

Has anyone experienced this before? Are there any system logs I could have
looked to for more information? A brief look at the standard /var/log files
revealed nothing.

It was a bit alarming to see that in the first instance the process stopped
with no indication why.

Sam

Re: Sudden termination of server, possible thread interruption

Posted by Sam Corbett <sa...@cloudsoftcorp.com>.
Argh. That's boring and plausible. I will always `nohup .. &` in future.

On 29 January 2015 at 14:29, Aled Sage <al...@gmail.com> wrote:

> Sam,
>
> Am I reading that correctly: you ran `clocker.sh` without either of nohup
> or disown, and those are not automatically run by clocker.sh?
>
> If that is the case, the most likely explanation is that your terminal
> session terminated so a sighup was sent to all jobs, including your clocker
> process.
>
> Aled
>
> p.s. see the link that Alasdair sent out: http://unix.stackexchange.com/
> questions/3886/difference-between-nohup-disown-and
>
>
>
> On 29/01/2015 14:23, Sam Corbett wrote:
>
>> I ran clocker.sh, which runs `exec java ${JAVA_OPTS} -cp
>> "${INITIAL_CLASSPATH}" brooklyn.clocker.Main "$@"`. None of the entities
>> that were running do anything weird. There was no core dump. Shutdown
>> hooks
>> were run in both cases.
>>
>> The processes were run on a VM in Softlayer. I don't have the precise
>> specs
>> of the machine to hand but it was Debian, had 8Gb of RAM and was running
>> OpenJDK 7. Brooklyn was - or at least should have been - operating a light
>> load in both cases. Each had a two-host Clocker application running and
>> had
>> had a few other applications with two or three entities deployed to
>> Clocker
>> and then stopped again. I don't have the machine available any more so
>> can't get to /var/log.
>>
>> On 29 January 2015 at 13:19, Richard Downer <ri...@apache.org> wrote:
>>
>>  Further questions:
>>>
>>> Where were you running this process - on your workstation, or on a
>>> Linux server elsewhere? What is the spec/OS of the machine running
>>> Brooklyn? How much "stuff" was going on in Brooklyn? (large number of
>>> entites or SSH sensors...)
>>>
>>> If this was on a Linux server, I'm wondering if the JVM holding
>>> Brooklyn used up all of the server's memory and the OOM killer was
>>> invoked. If so you should see a message in the output of the "dmesg"
>>> command, or in /var/log/syslog or /var/log/messages.
>>>
>>> Richard.
>>>
>>>
>>> On 29 January 2015 at 13:11, Aled Sage <al...@gmail.com> wrote:
>>>
>>>> Hi Sam,
>>>>
>>>> First quick questions:
>>>>
>>>>   * Was brooklyn definitely run with `nohup` or `disown`?
>>>>   * Are you running with any unusual entities that might inadvertently
>>>>     have a System.exit or some such?!
>>>>   * I presume there was no core dump file in the run directory?
>>>>
>>>> The InterruptedException suggest this might be a relatively gracefully
>>>> shutdown. Do you see evidence that the shutdown hook has been called (so
>>>>
>>> the
>>>
>>>> management context was shut down cleanly)?
>>>>
>>>> Aled
>>>>
>>>>
>>>> On 29/01/2015 11:07, Sam Corbett wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like help getting to the bottom of the unexpected termination of a
>>>>> Brooklyn process. It hit me twice in a row yesterday, once with nothing
>>>>> weird in the logs and once with a number of stacktraces indicating an
>>>>> InterruptedException was thrown. Both processes were run on the same
>>>>>
>>>> host
>>>
>>>> in Softlayer and were running Clocker.
>>>>>
>>>>> The first deployment seemed to be working normally. I had deployed a
>>>>> few
>>>>> applications to my-docker-cloud and stopped them again. A moment later
>>>>>
>>>> and
>>>
>>>> the process had stopped. The last thing the server did was check the
>>>>> status
>>>>> of a Weave container:
>>>>>
>>>>> 2015-01-28 07:16:15,002 DEBUG brooklyn.SSH
>>>>> [brooklyn-execmanager-BL31ZSeZ-2001]: check-running
>>>>> WeaveContainerImpl{id=r3fpQokV}, on machine
>>>>> SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05],
>>>>> completed: return status 0
>>>>> 2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer
>>>>> [shutdownHookThread]: BrooklynWebServer detected shutdown: stopping
>>>>> web-console
>>>>>
>>>>> There were no interesting exceptions in the debug log.
>>>>>
>>>>> In the second case the process stopped as Brooklyn waited for the
>>>>> status
>>>>> of
>>>>> a service that did not provision. This time there was a (lot of)
>>>>> stacktrace
>>>>> in the logs. Most pertinently perhaps was:
>>>>>
>>>>> 2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager
>>>>> [brooklyn-execmanager-YRXvc51z-1676]: Exception running task
>>>>> Task[post-start:ihocjzth] (rethrowing): java.lang.
>>>>> InterruptedException:
>>>>> sleep interrupted
>>>>> brooklyn.util.exceptions.RuntimeInterruptedException:
>>>>> java.lang.InterruptedException: sleep interrupted
>>>>> at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89)
>>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at brooklyn.util.time.Time.sleep(Time.java:312)
>>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at brooklyn.util.time.Time.sleep(Time.java:318)
>>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382)
>>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at brooklyn.util.repeat.Repeater.run(Repeater.java:305)
>>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028)
>>>>> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at
>>>>>
>>>>>
>>>>>  brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(
>>> SoftwareProcessImpl.java:370)
>>>
>>>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at
>>>>>
>>>>>
>>>>>  brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(
>>> SoftwareProcessImpl.java:367)
>>>
>>>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at
>>>>>
>>>>>
>>>>>  brooklyn.entity.basic.SoftwareProcessDriverLifecycle
>>> EffectorTasks.postStartCustom(SoftwareProcessDriverLifecycle
>>> EffectorTasks.java:160)
>>>
>>>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at
>>>>>
>>>>>
>>>>>  brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(
>>> MachineLifecycleEffectorTasks.java:431)
>>>
>>>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at
>>>>>
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>
>>>> ~[na:1.7.0_65]
>>>>> at
>>>>>
>>>>>
>>>>>  brooklyn.util.task.DynamicSequentialTask$DstJob.
>>> call(DynamicSequentialTask.java:337)
>>>
>>>> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at
>>>>>
>>>>>
>>>>>  brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(
>>> BasicExecutionManager.java:469)
>>>
>>>> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>
>>>> [na:1.7.0_65]
>>>
>>>> at
>>>>>
>>>>>
>>>>>  java.util.concurrent.ThreadPoolExecutor.runWorker(
>>> ThreadPoolExecutor.java:1145)
>>>
>>>> [na:1.7.0_65]
>>>>> at
>>>>>
>>>>>
>>>>>  java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>> ThreadPoolExecutor.java:615)
>>>
>>>> [na:1.7.0_65]
>>>>> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>>>>> Caused by: java.lang.InterruptedException: sleep interrupted
>>>>> at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
>>>>> at brooklyn.util.time.Time.sleep(Time.java:310)
>>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>> ... 15 common frames omitted
>>>>>
>>>>> I've got full logs for each run of the server but I didn't get the exit
>>>>> code of either process. I ran a third test on the same host later in
>>>>> the
>>>>> day and nothing went wrong (in a reasonable timeframe).
>>>>>
>>>>> Has anyone experienced this before? Are there any system logs I could
>>>>>
>>>> have
>>>
>>>> looked to for more information? A brief look at the standard /var/log
>>>>> files
>>>>> revealed nothing.
>>>>>
>>>>> It was a bit alarming to see that in the first instance the process
>>>>> stopped
>>>>> with no indication why.
>>>>>
>>>>> Sam
>>>>>
>>>>>
>

Re: Sudden termination of server, possible thread interruption

Posted by Aled Sage <al...@gmail.com>.
Sam,

Am I reading that correctly: you ran `clocker.sh` without either of 
nohup or disown, and those are not automatically run by clocker.sh?

If that is the case, the most likely explanation is that your terminal 
session terminated so a sighup was sent to all jobs, including your 
clocker process.

Aled

p.s. see the link that Alasdair sent out: 
http://unix.stackexchange.com/questions/3886/difference-between-nohup-disown-and


On 29/01/2015 14:23, Sam Corbett wrote:
> I ran clocker.sh, which runs `exec java ${JAVA_OPTS} -cp
> "${INITIAL_CLASSPATH}" brooklyn.clocker.Main "$@"`. None of the entities
> that were running do anything weird. There was no core dump. Shutdown hooks
> were run in both cases.
>
> The processes were run on a VM in Softlayer. I don't have the precise specs
> of the machine to hand but it was Debian, had 8Gb of RAM and was running
> OpenJDK 7. Brooklyn was - or at least should have been - operating a light
> load in both cases. Each had a two-host Clocker application running and had
> had a few other applications with two or three entities deployed to Clocker
> and then stopped again. I don't have the machine available any more so
> can't get to /var/log.
>
> On 29 January 2015 at 13:19, Richard Downer <ri...@apache.org> wrote:
>
>> Further questions:
>>
>> Where were you running this process - on your workstation, or on a
>> Linux server elsewhere? What is the spec/OS of the machine running
>> Brooklyn? How much "stuff" was going on in Brooklyn? (large number of
>> entites or SSH sensors...)
>>
>> If this was on a Linux server, I'm wondering if the JVM holding
>> Brooklyn used up all of the server's memory and the OOM killer was
>> invoked. If so you should see a message in the output of the "dmesg"
>> command, or in /var/log/syslog or /var/log/messages.
>>
>> Richard.
>>
>>
>> On 29 January 2015 at 13:11, Aled Sage <al...@gmail.com> wrote:
>>> Hi Sam,
>>>
>>> First quick questions:
>>>
>>>   * Was brooklyn definitely run with `nohup` or `disown`?
>>>   * Are you running with any unusual entities that might inadvertently
>>>     have a System.exit or some such?!
>>>   * I presume there was no core dump file in the run directory?
>>>
>>> The InterruptedException suggest this might be a relatively gracefully
>>> shutdown. Do you see evidence that the shutdown hook has been called (so
>> the
>>> management context was shut down cleanly)?
>>>
>>> Aled
>>>
>>>
>>> On 29/01/2015 11:07, Sam Corbett wrote:
>>>> Hi all,
>>>>
>>>> I'd like help getting to the bottom of the unexpected termination of a
>>>> Brooklyn process. It hit me twice in a row yesterday, once with nothing
>>>> weird in the logs and once with a number of stacktraces indicating an
>>>> InterruptedException was thrown. Both processes were run on the same
>> host
>>>> in Softlayer and were running Clocker.
>>>>
>>>> The first deployment seemed to be working normally. I had deployed a few
>>>> applications to my-docker-cloud and stopped them again. A moment later
>> and
>>>> the process had stopped. The last thing the server did was check the
>>>> status
>>>> of a Weave container:
>>>>
>>>> 2015-01-28 07:16:15,002 DEBUG brooklyn.SSH
>>>> [brooklyn-execmanager-BL31ZSeZ-2001]: check-running
>>>> WeaveContainerImpl{id=r3fpQokV}, on machine
>>>> SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05],
>>>> completed: return status 0
>>>> 2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer
>>>> [shutdownHookThread]: BrooklynWebServer detected shutdown: stopping
>>>> web-console
>>>>
>>>> There were no interesting exceptions in the debug log.
>>>>
>>>> In the second case the process stopped as Brooklyn waited for the status
>>>> of
>>>> a service that did not provision. This time there was a (lot of)
>>>> stacktrace
>>>> in the logs. Most pertinently perhaps was:
>>>>
>>>> 2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager
>>>> [brooklyn-execmanager-YRXvc51z-1676]: Exception running task
>>>> Task[post-start:ihocjzth] (rethrowing): java.lang.InterruptedException:
>>>> sleep interrupted
>>>> brooklyn.util.exceptions.RuntimeInterruptedException:
>>>> java.lang.InterruptedException: sleep interrupted
>>>> at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89)
>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> at brooklyn.util.time.Time.sleep(Time.java:312)
>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> at brooklyn.util.time.Time.sleep(Time.java:318)
>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382)
>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> at brooklyn.util.repeat.Repeater.run(Repeater.java:305)
>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028)
>>>> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> at
>>>>
>>>>
>> brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:370)
>>>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> at
>>>>
>>>>
>> brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:367)
>>>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> at
>>>>
>>>>
>> brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:160)
>>>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> at
>>>>
>>>>
>> brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:431)
>>>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> ~[na:1.7.0_65]
>>>> at
>>>>
>>>>
>> brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:337)
>>>> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> at
>>>>
>>>>
>> brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
>>>> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> [na:1.7.0_65]
>>>> at
>>>>
>>>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> [na:1.7.0_65]
>>>> at
>>>>
>>>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> [na:1.7.0_65]
>>>> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>>>> Caused by: java.lang.InterruptedException: sleep interrupted
>>>> at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
>>>> at brooklyn.util.time.Time.sleep(Time.java:310)
>>>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>> ... 15 common frames omitted
>>>>
>>>> I've got full logs for each run of the server but I didn't get the exit
>>>> code of either process. I ran a third test on the same host later in the
>>>> day and nothing went wrong (in a reasonable timeframe).
>>>>
>>>> Has anyone experienced this before? Are there any system logs I could
>> have
>>>> looked to for more information? A brief look at the standard /var/log
>>>> files
>>>> revealed nothing.
>>>>
>>>> It was a bit alarming to see that in the first instance the process
>>>> stopped
>>>> with no indication why.
>>>>
>>>> Sam
>>>>


Re: Sudden termination of server, possible thread interruption

Posted by Sam Corbett <sa...@cloudsoftcorp.com>.
I ran clocker.sh, which runs `exec java ${JAVA_OPTS} -cp
"${INITIAL_CLASSPATH}" brooklyn.clocker.Main "$@"`. None of the entities
that were running do anything weird. There was no core dump. Shutdown hooks
were run in both cases.

The processes were run on a VM in Softlayer. I don't have the precise specs
of the machine to hand but it was Debian, had 8Gb of RAM and was running
OpenJDK 7. Brooklyn was - or at least should have been - operating a light
load in both cases. Each had a two-host Clocker application running and had
had a few other applications with two or three entities deployed to Clocker
and then stopped again. I don't have the machine available any more so
can't get to /var/log.

On 29 January 2015 at 13:19, Richard Downer <ri...@apache.org> wrote:

> Further questions:
>
> Where were you running this process - on your workstation, or on a
> Linux server elsewhere? What is the spec/OS of the machine running
> Brooklyn? How much "stuff" was going on in Brooklyn? (large number of
> entites or SSH sensors...)
>
> If this was on a Linux server, I'm wondering if the JVM holding
> Brooklyn used up all of the server's memory and the OOM killer was
> invoked. If so you should see a message in the output of the "dmesg"
> command, or in /var/log/syslog or /var/log/messages.
>
> Richard.
>
>
> On 29 January 2015 at 13:11, Aled Sage <al...@gmail.com> wrote:
> > Hi Sam,
> >
> > First quick questions:
> >
> >  * Was brooklyn definitely run with `nohup` or `disown`?
> >  * Are you running with any unusual entities that might inadvertently
> >    have a System.exit or some such?!
> >  * I presume there was no core dump file in the run directory?
> >
> > The InterruptedException suggest this might be a relatively gracefully
> > shutdown. Do you see evidence that the shutdown hook has been called (so
> the
> > management context was shut down cleanly)?
> >
> > Aled
> >
> >
> > On 29/01/2015 11:07, Sam Corbett wrote:
> >>
> >> Hi all,
> >>
> >> I'd like help getting to the bottom of the unexpected termination of a
> >> Brooklyn process. It hit me twice in a row yesterday, once with nothing
> >> weird in the logs and once with a number of stacktraces indicating an
> >> InterruptedException was thrown. Both processes were run on the same
> host
> >> in Softlayer and were running Clocker.
> >>
> >> The first deployment seemed to be working normally. I had deployed a few
> >> applications to my-docker-cloud and stopped them again. A moment later
> and
> >> the process had stopped. The last thing the server did was check the
> >> status
> >> of a Weave container:
> >>
> >> 2015-01-28 07:16:15,002 DEBUG brooklyn.SSH
> >> [brooklyn-execmanager-BL31ZSeZ-2001]: check-running
> >> WeaveContainerImpl{id=r3fpQokV}, on machine
> >> SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05],
> >> completed: return status 0
> >> 2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer
> >> [shutdownHookThread]: BrooklynWebServer detected shutdown: stopping
> >> web-console
> >>
> >> There were no interesting exceptions in the debug log.
> >>
> >> In the second case the process stopped as Brooklyn waited for the status
> >> of
> >> a service that did not provision. This time there was a (lot of)
> >> stacktrace
> >> in the logs. Most pertinently perhaps was:
> >>
> >> 2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager
> >> [brooklyn-execmanager-YRXvc51z-1676]: Exception running task
> >> Task[post-start:ihocjzth] (rethrowing): java.lang.InterruptedException:
> >> sleep interrupted
> >> brooklyn.util.exceptions.RuntimeInterruptedException:
> >> java.lang.InterruptedException: sleep interrupted
> >> at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89)
> >> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> at brooklyn.util.time.Time.sleep(Time.java:312)
> >> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> at brooklyn.util.time.Time.sleep(Time.java:318)
> >> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382)
> >> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> at brooklyn.util.repeat.Repeater.run(Repeater.java:305)
> >> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028)
> >> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> at
> >>
> >>
> brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:370)
> >> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> at
> >>
> >>
> brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:367)
> >> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> at
> >>
> >>
> brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:160)
> >> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> at
> >>
> >>
> brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:431)
> >> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >> ~[na:1.7.0_65]
> >> at
> >>
> >>
> brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:337)
> >> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> at
> >>
> >>
> brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
> >> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> [na:1.7.0_65]
> >> at
> >>
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> [na:1.7.0_65]
> >> at
> >>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> [na:1.7.0_65]
> >> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> >> Caused by: java.lang.InterruptedException: sleep interrupted
> >> at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
> >> at brooklyn.util.time.Time.sleep(Time.java:310)
> >> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >> ... 15 common frames omitted
> >>
> >> I've got full logs for each run of the server but I didn't get the exit
> >> code of either process. I ran a third test on the same host later in the
> >> day and nothing went wrong (in a reasonable timeframe).
> >>
> >> Has anyone experienced this before? Are there any system logs I could
> have
> >> looked to for more information? A brief look at the standard /var/log
> >> files
> >> revealed nothing.
> >>
> >> It was a bit alarming to see that in the first instance the process
> >> stopped
> >> with no indication why.
> >>
> >> Sam
> >>
> >
>

Re: Sudden termination of server, possible thread interruption

Posted by Richard Downer <ri...@apache.org>.
Further questions:

Where were you running this process - on your workstation, or on a
Linux server elsewhere? What is the spec/OS of the machine running
Brooklyn? How much "stuff" was going on in Brooklyn? (large number of
entites or SSH sensors...)

If this was on a Linux server, I'm wondering if the JVM holding
Brooklyn used up all of the server's memory and the OOM killer was
invoked. If so you should see a message in the output of the "dmesg"
command, or in /var/log/syslog or /var/log/messages.

Richard.


On 29 January 2015 at 13:11, Aled Sage <al...@gmail.com> wrote:
> Hi Sam,
>
> First quick questions:
>
>  * Was brooklyn definitely run with `nohup` or `disown`?
>  * Are you running with any unusual entities that might inadvertently
>    have a System.exit or some such?!
>  * I presume there was no core dump file in the run directory?
>
> The InterruptedException suggest this might be a relatively gracefully
> shutdown. Do you see evidence that the shutdown hook has been called (so the
> management context was shut down cleanly)?
>
> Aled
>
>
> On 29/01/2015 11:07, Sam Corbett wrote:
>>
>> Hi all,
>>
>> I'd like help getting to the bottom of the unexpected termination of a
>> Brooklyn process. It hit me twice in a row yesterday, once with nothing
>> weird in the logs and once with a number of stacktraces indicating an
>> InterruptedException was thrown. Both processes were run on the same host
>> in Softlayer and were running Clocker.
>>
>> The first deployment seemed to be working normally. I had deployed a few
>> applications to my-docker-cloud and stopped them again. A moment later and
>> the process had stopped. The last thing the server did was check the
>> status
>> of a Weave container:
>>
>> 2015-01-28 07:16:15,002 DEBUG brooklyn.SSH
>> [brooklyn-execmanager-BL31ZSeZ-2001]: check-running
>> WeaveContainerImpl{id=r3fpQokV}, on machine
>> SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05],
>> completed: return status 0
>> 2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer
>> [shutdownHookThread]: BrooklynWebServer detected shutdown: stopping
>> web-console
>>
>> There were no interesting exceptions in the debug log.
>>
>> In the second case the process stopped as Brooklyn waited for the status
>> of
>> a service that did not provision. This time there was a (lot of)
>> stacktrace
>> in the logs. Most pertinently perhaps was:
>>
>> 2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager
>> [brooklyn-execmanager-YRXvc51z-1676]: Exception running task
>> Task[post-start:ihocjzth] (rethrowing): java.lang.InterruptedException:
>> sleep interrupted
>> brooklyn.util.exceptions.RuntimeInterruptedException:
>> java.lang.InterruptedException: sleep interrupted
>> at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89)
>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> at brooklyn.util.time.Time.sleep(Time.java:312)
>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> at brooklyn.util.time.Time.sleep(Time.java:318)
>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382)
>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> at brooklyn.util.repeat.Repeater.run(Repeater.java:305)
>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028)
>> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> at
>>
>> brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:370)
>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> at
>>
>> brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:367)
>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> at
>>
>> brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:160)
>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> at
>>
>> brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:431)
>> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> ~[na:1.7.0_65]
>> at
>>
>> brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:337)
>> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> at
>>
>> brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
>> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65]
>> at
>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> [na:1.7.0_65]
>> at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> [na:1.7.0_65]
>> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>> Caused by: java.lang.InterruptedException: sleep interrupted
>> at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
>> at brooklyn.util.time.Time.sleep(Time.java:310)
>> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>> ... 15 common frames omitted
>>
>> I've got full logs for each run of the server but I didn't get the exit
>> code of either process. I ran a third test on the same host later in the
>> day and nothing went wrong (in a reasonable timeframe).
>>
>> Has anyone experienced this before? Are there any system logs I could have
>> looked to for more information? A brief look at the standard /var/log
>> files
>> revealed nothing.
>>
>> It was a bit alarming to see that in the first instance the process
>> stopped
>> with no indication why.
>>
>> Sam
>>
>

Re: Sudden termination of server, possible thread interruption

Posted by Aled Sage <al...@gmail.com>.
Hi Sam,

First quick questions:

  * Was brooklyn definitely run with `nohup` or `disown`?
  * Are you running with any unusual entities that might inadvertently
    have a System.exit or some such?!
  * I presume there was no core dump file in the run directory?

The InterruptedException suggest this might be a relatively gracefully 
shutdown. Do you see evidence that the shutdown hook has been called (so 
the management context was shut down cleanly)?

Aled

On 29/01/2015 11:07, Sam Corbett wrote:
> Hi all,
>
> I'd like help getting to the bottom of the unexpected termination of a
> Brooklyn process. It hit me twice in a row yesterday, once with nothing
> weird in the logs and once with a number of stacktraces indicating an
> InterruptedException was thrown. Both processes were run on the same host
> in Softlayer and were running Clocker.
>
> The first deployment seemed to be working normally. I had deployed a few
> applications to my-docker-cloud and stopped them again. A moment later and
> the process had stopped. The last thing the server did was check the status
> of a Weave container:
>
> 2015-01-28 07:16:15,002 DEBUG brooklyn.SSH
> [brooklyn-execmanager-BL31ZSeZ-2001]: check-running
> WeaveContainerImpl{id=r3fpQokV}, on machine
> SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05],
> completed: return status 0
> 2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer
> [shutdownHookThread]: BrooklynWebServer detected shutdown: stopping
> web-console
>
> There were no interesting exceptions in the debug log.
>
> In the second case the process stopped as Brooklyn waited for the status of
> a service that did not provision. This time there was a (lot of) stacktrace
> in the logs. Most pertinently perhaps was:
>
> 2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager
> [brooklyn-execmanager-YRXvc51z-1676]: Exception running task
> Task[post-start:ihocjzth] (rethrowing): java.lang.InterruptedException:
> sleep interrupted
> brooklyn.util.exceptions.RuntimeInterruptedException:
> java.lang.InterruptedException: sleep interrupted
> at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89)
> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at brooklyn.util.time.Time.sleep(Time.java:312)
> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at brooklyn.util.time.Time.sleep(Time.java:318)
> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382)
> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at brooklyn.util.repeat.Repeater.run(Repeater.java:305)
> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028)
> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:370)
> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:367)
> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:160)
> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:431)
> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> ~[na:1.7.0_65]
> at
> brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:337)
> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [na:1.7.0_65]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_65]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> Caused by: java.lang.InterruptedException: sleep interrupted
> at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
> at brooklyn.util.time.Time.sleep(Time.java:310)
> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> ... 15 common frames omitted
>
> I've got full logs for each run of the server but I didn't get the exit
> code of either process. I ran a third test on the same host later in the
> day and nothing went wrong (in a reasonable timeframe).
>
> Has anyone experienced this before? Are there any system logs I could have
> looked to for more information? A brief look at the standard /var/log files
> revealed nothing.
>
> It was a bit alarming to see that in the first instance the process stopped
> with no indication why.
>
> Sam
>