Posted to dev@brooklyn.apache.org by "Aled Sage (JIRA)" <ji...@apache.org> on 2016/06/10 11:35:21 UTC

[jira] [Created] (BROOKLYN-298) sshj hangs (waiting for shell to finish) after script completed - maybe VPN went down+up during exec

Aled Sage created BROOKLYN-298:
----------------------------------

             Summary: sshj hangs (waiting for shell to finish) after script completed - maybe VPN went down+up during exec
                 Key: BROOKLYN-298
                 URL: https://issues.apache.org/jira/browse/BROOKLYN-298
             Project: Brooklyn
          Issue Type: Bug
    Affects Versions: 0.9.0
            Reporter: Aled Sage


I was deploying an app whose launch command started docker and pulled an image. The task hung, showing in the web-console:

{noformat}
In progress - SSH executing, launching VanillaSoftwareProcessImpl{id=nisq2gz4yi}
{noformat}

I believe this is because my VPN disconnected and then reconnected, and our sshj command keeps waiting for the result - even though the command has finished executing.

Looking at the target VM, the command has completed (and the script uploaded by SshjTool has been deleted). There is no evidence of any Brooklyn-initiated commands executing, according to {{ps aux}}.

Drilling into the activity view in the Brooklyn web-console, the currently executing thread shows:

{noformat}
SSH executing, launching VanillaSoftwareProcessImpl{id=nisq2gz4yi}

Task[ssh: launching VanillaSoftwareProcessImpl{id=nisq2gz4yi}]@TPnVc8Qs
Submitted by SoftlyPresent[value=Task[launch (main)]@mvL4OvdH]

In progress, thread waiting (timed) on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@408df99d
At: net.schmizz.concurrent.Promise.tryRetrieve(Promise.java:168)
    net.schmizz.concurrent.Promise.retrieve(Promise.java:137)
    net.schmizz.concurrent.Event.await(Event.java:103)
    net.schmizz.sshj.connection.channel.AbstractChannel.join(AbstractChannel.java:282)
    org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:1012)
    org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:925)
    org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:630)
    org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:616)
    org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool$1.run(SshjTool.java:331)
    org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.execScript(SshjTool.java:326)
    org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers$1.exec(ExecWithLoggingHelpers.java:82)
    org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:166)
    org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:164)
    org.apache.brooklyn.util.pool.BasicPool.exec(BasicPool.java:146)
    org.apache.brooklyn.location.ssh.SshMachineLocation.execSsh(SshMachineLocation.java:611)
    org.apache.brooklyn.location.ssh.SshMachineLocation$13.execWithTool(SshMachineLocation.java:790)
    org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers.execWithLogging(ExecWithLoggingHelpers.java:164)
    org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers.execScript(ExecWithLoggingHelpers.java:80)
    org.apache.brooklyn.location.ssh.SshMachineLocation.execScript(SshMachineLocation.java:774)
    org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessSshDriver.execute(AbstractSoftwareProcessSshDriver.java:272)
    org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:366)
    org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:287)
    org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:285)
    org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
    org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
{noformat}

Running {{netstat -antp TCP}} on my local machine, I still see an established ssh connection:

{noformat}
tcp4       0      0  10.104.3.10.54535      10.104.1.193.22        ESTABLISHED
{noformat}

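I do *not* see a corresponding entry when I run {{sudo netstat -anp}} on the target VM. This suggests the connection is half-open: my machine still considers it established, but the VM has no record of it.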

---
Looking at {{SshjTool$ShellAction.create}} in the Brooklyn code, I wonder what else we could call on sshj to check whether the connection is still alive and/or the command has actually completed. We already check {{shell.isOpen()}} and {{session.getExitStatus()!=null}}. We could add calls to {{session.isOpen()}}, {{session.getExitSignal()}} and/or {{session.getExitWasCoreDumped()}}.
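A minimal Java sketch of a wait loop with the extra checks listed above follows. It is not Brooklyn's or sshj's code: {{ShellProbe}}, {{ShellWaiter}}, {{WaitOutcome}} and {{FakeProbe}} are hypothetical names, and the probe methods only mirror the sshj calls mentioned here, with simplified types (for example, the exit signal is a String). The sketch also caps the total wait. That cap is an assumption rather than an agreed fix, but on a half-open connection like the one in the netstat output, no further packets arrive, so none of the open/exit-status/exit-signal checks would ever change.

```java
import java.util.concurrent.TimeUnit;

// HYPOTHETICAL abstraction over the sshj calls named in BROOKLYN-298.
// Not sshj's or Brooklyn's API; types are simplified (e.g. the exit signal is a String).
interface ShellProbe {
    boolean isShellOpen();       // cf. shell.isOpen()
    boolean isSessionOpen();     // cf. session.isOpen()
    Integer getExitStatus();     // cf. session.getExitStatus(); null until the server reports one
    String getExitSignal();      // cf. session.getExitSignal(); null unless killed by a signal
    void join(long timeout, TimeUnit unit) throws InterruptedException; // cf. shell.join(timeout, unit)
}

enum WaitOutcome { EXITED, SIGNALLED, CLOSED, TIMED_OUT, INTERRUPTED }

final class ShellWaiter {
    private ShellWaiter() {}

    /**
     * Polls until the command reports an exit status or signal, the shell or session closes,
     * or maxWaitMillis elapses. The overall cap is an assumption of this sketch: on a half-open
     * TCP connection nothing else here would ever change.
     */
    static WaitOutcome waitForShell(ShellProbe probe, long pollMillis, long maxWaitMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (true) {
            if (probe.getExitStatus() != null) return WaitOutcome.EXITED;
            if (probe.getExitSignal() != null) return WaitOutcome.SIGNALLED;
            if (!probe.isShellOpen() || !probe.isSessionOpen()) return WaitOutcome.CLOSED;

            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) return WaitOutcome.TIMED_OUT;

            // Wait at most one poll interval (never less than 1 ms) before re-checking.
            long waitMillis = Math.max(1, Math.min(pollMillis, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            try {
                probe.join(waitMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return WaitOutcome.INTERRUPTED;
            }
        }
    }
}

// Test double: reports the given open state, and exit status 0 only after exitAfterJoins
// join() calls (-1 = never), mimicking a session whose exit-status message never arrives.
final class FakeProbe implements ShellProbe {
    private final boolean open;
    private final int exitAfterJoins;
    private int joins = 0;

    FakeProbe(boolean open, int exitAfterJoins) {
        this.open = open;
        this.exitAfterJoins = exitAfterJoins;
    }

    public boolean isShellOpen() { return open; }
    public boolean isSessionOpen() { return open; }
    public Integer getExitStatus() { return (exitAfterJoins >= 0 && joins >= exitAfterJoins) ? 0 : null; }
    public String getExitSignal() { return null; }
    public void join(long timeout, TimeUnit unit) throws InterruptedException {
        joins++;
        unit.sleep(timeout); // a real channel would return early when an event arrives
    }
    int joinCount() { return joins; }
}
```

With {{new FakeProbe(true, -1)}}, which never reports an exit status, {{waitForShell}} returns {{TIMED_OUT}} once {{maxWaitMillis}} has elapsed instead of polling forever. Choosing a sensible cap for long-running launch scripts would be a separate policy decision.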



