Posted to dev@brooklyn.apache.org by "Aled Sage (JIRA)" <ji...@apache.org> on 2016/06/10 11:35:21 UTC
[jira] [Created] (BROOKLYN-298) sshj hangs (waiting for shell to
finish) after script completed - maybe VPN went down+up during exec
Aled Sage created BROOKLYN-298:
----------------------------------
Summary: sshj hangs (waiting for shell to finish) after script completed - maybe VPN went down+up during exec
Key: BROOKLYN-298
URL: https://issues.apache.org/jira/browse/BROOKLYN-298
Project: Brooklyn
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Aled Sage
I was deploying an app whose launch command started docker and pulled an image. The task hung, showing in the web-console:
{noformat}
In progress - SSH executing, launching VanillaSoftwareProcessImpl{id=nisq2gz4yi}
{noformat}
I believe this is because my VPN disconnected and then reconnected, and our sshj command keeps waiting for the result - even though the command has finished executing.
Looking at the target VM, the command has completed (and the script uploaded by SshjTool has been deleted). There is no evidence of any Brooklyn-initiated commands executing, according to {{ps aux}}.
Drilling into the activity view in the Brooklyn web-console, the currently executing thread shows:
{noformat}
SSH executing, launching VanillaSoftwareProcessImpl{id=nisq2gz4yi}
Task[ssh: launching VanillaSoftwareProcessImpl{id=nisq2gz4yi}]@TPnVc8Qs
Submitted by SoftlyPresent[value=Task[launch (main)]@mvL4OvdH]
In progress, thread waiting (timed) on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@408df99d
At: net.schmizz.concurrent.Promise.tryRetrieve(Promise.java:168)
net.schmizz.concurrent.Promise.retrieve(Promise.java:137)
net.schmizz.concurrent.Event.await(Event.java:103)
net.schmizz.sshj.connection.channel.AbstractChannel.join(AbstractChannel.java:282)
org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:1012)
org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:925)
org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:630)
org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:616)
org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool$1.run(SshjTool.java:331)
org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.execScript(SshjTool.java:326)
org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers$1.exec(ExecWithLoggingHelpers.java:82)
org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:166)
org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:164)
org.apache.brooklyn.util.pool.BasicPool.exec(BasicPool.java:146)
org.apache.brooklyn.location.ssh.SshMachineLocation.execSsh(SshMachineLocation.java:611)
org.apache.brooklyn.location.ssh.SshMachineLocation$13.execWithTool(SshMachineLocation.java:790)
org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers.execWithLogging(ExecWithLoggingHelpers.java:164)
org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers.execScript(ExecWithLoggingHelpers.java:80)
org.apache.brooklyn.location.ssh.SshMachineLocation.execScript(SshMachineLocation.java:774)
org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessSshDriver.execute(AbstractSoftwareProcessSshDriver.java:272)
org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:366)
org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:287)
org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:285)
org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
{noformat}
Running {{netstat -antp TCP}} on my local machine, I still see an established ssh connection:
{noformat}
tcp4 0 0 10.104.3.10.54535 10.104.1.193.22 ESTABLISHED
{noformat}
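I do *not* see a corresponding entry when I run {{sudo netstat -anp}} on the target VM, which suggests the connection is half-open: my side still believes it is established, while the server side has already dropped it.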
---
Looking in the Brooklyn code at {{SshjTool$ShellAction.create}}, I wonder what else we could call on sshj to check if our connection is ok and/or the command has actually completed. We are already calling {{shell.isOpen()}} and {{session.getExitStatus()!=null}}. We could add calls to {{session.isOpen()}}, {{session.getExitSignal()}} and/or {{session.getExitWasCoreDumped()}}.
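To make the idea concrete, here is a minimal sketch of a bounded wait that combines those checks with an overall deadline, instead of blocking until the channel reports completion. It is not the actual {{SshjTool}} code and uses no sshj classes: {{BoundedShellWait}}, {{Outcome}}, and the two supplier hooks are hypothetical names. In Brooklyn, the exit-status hook would presumably be backed by something like {{session.getExitStatus()}} and the liveness hook by {{shell.isOpen()}} / {{session.isOpen()}} or a keep-alive probe; that wiring is an assumption and is not shown.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class BoundedShellWait {

    /** Possible results of waiting for a remote script. */
    public enum Outcome { COMPLETED, CONNECTION_LOST, TIMED_OUT }

    /**
     * Polls until the command reports an exit status, the liveness probe fails,
     * or the overall deadline passes.
     *
     * Both suppliers are hypothetical hooks: exitStatus returns null while the
     * command is still running; connectionAlive returns false once the
     * connection is known to be dead.
     */
    public static Outcome await(Supplier<Integer> exitStatus,
                                BooleanSupplier connectionAlive,
                                long timeoutMillis,
                                long pollMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            // An exit status means the remote command has finished.
            if (exitStatus.get() != null) {
                return Outcome.COMPLETED;
            }
            // A failed probe means waiting any longer is pointless.
            if (!connectionAlive.getAsBoolean()) {
                return Outcome.CONNECTION_LOST;
            }
            // Overall cap, so a half-open connection cannot block forever.
            if (System.nanoTime() - deadline >= 0) {
                return Outcome.TIMED_OUT;
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulates the situation in this report: no exit status ever arrives,
        // but the liveness probe notices that the connection is gone.
        Outcome outcome = await(() -> null, () -> false, 1000, 10);
        System.out.println(outcome); // prints CONNECTION_LOST
    }
}
```

Whether the local end of a half-open connection would ever report itself as closed is exactly what is uncertain here, so the overall deadline (or an active keep-alive) is the part that guarantees the task cannot hang indefinitely.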