Posted to user@jclouds.apache.org by Jason Dusek <ja...@gmail.com> on 2014/07/03 23:01:22 UTC

Early SSH failures when provisioning on GCE

Hi All,

When provisioning instances on Google's Compute Engine service,
we are encountering a failure when jClouds attempts to SSH in to
the instance early in the boot process.

The exception is thrown as a result of a call to
`computeService.createNodesInGroup()`. When the error is thrown,
the SSH user and key which are being used are printed in the
stack trace. Logging in using the SSH CLI tool, passing the
indicated user and key, works as expected.

One question that arises from all this is: why is jClouds
logging into the servers this early? Is there something in the
template that causes it to try to configure something? The
options passed to the template are in this case very simple; in
particular, no `runScript()` options are passed.

Another is: what is a safe and reasonable way to resolve this
error case? Perhaps there is a way to delay the step where
jClouds SSHes to the instance, or to disable the
`RunScriptOnNodeAsInitScriptUsingSsh` step entirely.
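
As a rough illustration of what "delaying" the SSH step could look like on
the caller's side, here is a minimal sketch that polls a TCP port until it
accepts a connection or a deadline passes. It uses only the JDK; the class
name `AwaitPort` and the method `awaitPort` are made up for this example and
are not jclouds API. Note also that the failure reported here is "Auth fail"
rather than a refused connection, so an open port 22 does not by itself
guarantee that the key has been installed yet.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

class AwaitPort {
    /**
     * Polls host:port until a TCP connection succeeds or the timeout elapses.
     * Returns true if a connection was accepted in time, false otherwise.
     * (Hypothetical helper, not part of jclouds.)
     */
    static boolean awaitPort(String host, int port, long timeoutMillis, long periodMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try (Socket socket = new Socket()) {
                // Bound each attempt so an unresponsive host cannot block forever.
                socket.connect(new InetSocketAddress(host, port),
                        (int) Math.max(1L, periodMillis));
                return true;
            } catch (IOException notOpenYet) {
                // Port not reachable yet; retry after a pause if time remains.
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.sleep(periodMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo against a local listening socket instead of a real instance.
        try (ServerSocket server = new ServerSocket(0)) {
            boolean open = awaitPort("127.0.0.1", server.getLocalPort(), 2000L, 100L);
            System.out.println("port open: " + open);
        }
    }
}
```

In a real deployment the host would be the instance's public IP and the port
would be 22; the timeout and polling period above are arbitrary choices.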

We've found jClouds to be a big help so far and it'd be great to
find an idiomatic way to resolve this issue. Please let me know
if there is any additional information I can provide.

--
Jason Dusek
@solidsnack




[2014-07-03 17:17:47,000] ERROR << problem customizing
node(us-central1-a/silly-cluster-cd758a1c-fc8):  (jclouds.compute:91)
org.jclouds.rest.AuthorizationException:
(jclouds:rsa[fingerprint(88:06:0b:b5:67:34:d2:ef:71:10:b8:e5:8d:e9:ea:65),sha1(c4:07:6b:1f:cf:f1:32:51:96:70:eb:43:77:ee:74:84:8c:da:bf:6d)]@173.255.113.102:22)
(jclouds:rsa[fingerprint(88:06:0b:b5:67:34:d2:ef:71:10:b8:e5:8d:e9:ea:65),sha1(c4:07:6b:1f:cf:f1:32:51:96:70:eb:43:77:ee:74:84:8c:da:bf:6d)]@173.255.113.102:22)
error acquiring {hostAndPort=173.255.113.102:22, loginUser=jclouds,
connectTimeout=60000, sessionTimeout=60000}: Auth fail
    at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:335)
    at org.jclouds.ssh.jsch.JschSshClient.acquire(JschSshClient.java:187)
    at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:200)
    at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:76)
    at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:125)
    at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:146)
    at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:53)
    at com.google.common.util.concurrent.Futures$1.apply(Futures.java:711)
    at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:849)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: com.jcraft.jsch.JSchException: Auth fail
    at com.jcraft.jsch.Session.connect(Session.java:491)
    at org.jclouds.ssh.jsch.SessionConnection.create(SessionConnection.java:186)
    at org.jclouds.ssh.jsch.SessionConnection.create(SessionConnection.java:39)
    at org.jclouds.ssh.jsch.JschSshClient.acquire(JschSshClient.java:180)
    ... 10 more
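
The trace above shows an `org.jclouds.rest.AuthorizationException` whose
cause is a `com.jcraft.jsch.JSchException` with the message "Auth fail". A
caller that wants to tell this case apart from other provisioning failures
could walk the cause chain. The sketch below uses only the JDK; the class and
method names are hypothetical, and matching on the text "Auth fail" is an
assumption based on this one trace, not a documented contract of JSch or
jclouds.

```java
class RootCause {
    /** Follows getCause() links down to the innermost exception. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /**
     * Returns true when the innermost message contains "Auth fail",
     * the text seen in the stack trace above (assumed, not guaranteed).
     */
    static boolean looksLikeSshAuthFailure(Throwable t) {
        String message = rootCause(t).getMessage();
        return message != null && message.contains("Auth fail");
    }

    public static void main(String[] args) {
        // Stand-in exceptions; the real types would come from jclouds and JSch.
        Exception wrapped = new RuntimeException("error acquiring session",
                new Exception("Auth fail"));
        System.out.println(looksLikeSshAuthFailure(wrapped));
    }
}
```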

Re: Early SSH failures when provisioning on GCE

Posted by Andrea Turli <an...@gmail.com>.
Hi Jason,

Looking at the code (and running some tests), it seems that in
GoogleComputeEngineServiceAdapter.getFromImageAndOverrideIfRequired the
options you are specifying are overriding the default image credentials, so
the first SSH connection made during node creation will fail.
Another interesting piece is the UseNodeCredentialsButOverrideFromTemplate.

So I think what you can try is to reuse the approach at
https://github.com/jclouds/jclouds-examples/blob/master/compute-basics/src/main/java/org/jclouds/examples/compute/basics/MainApp.java
and tweak the `bootInstructions` using AdminAccess.builder() instead of
using AdminAccess.standard(), if you are not happy with that.

Hope this helps,
Andrea


On Mon, Jul 7, 2014 at 5:55 PM, Jason Dusek <ja...@gmail.com> wrote:

> I can definitely share the options with you:
>
>     .options(overrideLoginPrivateKey("a private key"))
>     .options(overrideLoginUser("jclouds"))
>     .options(authorizePublicKey("a public key"))
>
> The authorized public key does show up in .ssh/authorized_keys,
> with the comment
>
>     # Added by Google
>
> on the line above it. It appears to have been added via the API.
>
> --
> Jason Dusek
> @solidsnack

Re: Early SSH failures when provisioning on GCE

Posted by Jason Dusek <ja...@gmail.com>.
I can definitely share the options with you:

    .options(overrideLoginPrivateKey("a private key"))
    .options(overrideLoginUser("jclouds"))
    .options(authorizePublicKey("a public key"))

The authorized public key does show up in .ssh/authorized_keys,
with the comment

    # Added by Google

on the line above it. It appears to have been added via the API.

--
Jason Dusek
@solidsnack


On 7 July 2014 07:36, Ignasi Barrera <na...@apache.org> wrote:
> Thanks,
>
> Can you isolate the failure in a small program we could use to reproduce it?
>
> What might be relevant is the use of the TemplateBuilder and
> TemplateOptions. There are several options (such as authorizePublicKey or
> installPrivateKey) that translate into statements that are executed via SSH.
> I don't remember exactly if the authorizePublicKey one causes an SSH
> connection in GCE (providers that support key pairs implement it using the
> API and not an SSH script) but I'll have a look. Meanwhile, could you share
> which template options you are using?
>
> I.

Re: Early SSH failures when provisioning on GCE

Posted by Ignasi Barrera <na...@apache.org>.
Thanks,

Can you isolate the failure in a small program we could use to reproduce it?

What might be relevant is the use of the TemplateBuilder and
TemplateOptions. There are several options (such as authorizePublicKey or
installPrivateKey) that translate into statements that are executed via
SSH. I don't remember exactly if the authorizePublicKey one causes an SSH
connection in GCE (providers that support key pairs implement it using the
API and not an SSH script) but I'll have a look. Meanwhile, could you share
which template options you are using?

I.
On 07/07/2014 04:15, "Jason Dusek" <ja...@gmail.com> wrote:

> Hi Ignasi,
>
> Thanks for being willing to help out.
>
> Because the code base is proprietary, and factored into a few
> components that are generic in their handling of the different clouds,
> it's quite difficult to extract a snippet that brings together all the
> ingredients in an illustrative way.
>
> Is there some logging I could add or some options I could set for
> better diagnostics?
>
> It would be great if you could explain what triggers
> RunScriptOnNodeAsInitScriptUsingSsh, because if we can find a way to
> delay or disable it, it'd put an end to our troubles. If we're careful
> to catch the exception caused by this component and then to scan over
> available instances to find the relevant ones, generally we find that
> everything came up okay and we're able to move forward with the
> deployment -- so there appears to be no harm in skipping the
> RunScriptOnNodeAsInitScriptUsingSsh task.
> --
> Jason Dusek
> @solidsnack

Re: Early SSH failures when provisioning on GCE

Posted by Jason Dusek <ja...@gmail.com>.
Hi Ignasi,

Thanks for being willing to help out.

Because the code base is proprietary, and factored into a few
components that are generic in their handling of the different clouds,
it's quite difficult to extract a snippet that brings together all the
ingredients in an illustrative way.

Is there some logging I could add or some options I could set for
better diagnostics?

It would be great if you could explain what triggers
RunScriptOnNodeAsInitScriptUsingSsh, because if we can find a way to
delay or disable it, it'd put an end to our troubles. If we're careful
to catch the exception caused by this component and then to scan over
available instances to find the relevant ones, generally we find that
everything came up okay and we're able to move forward with the
deployment -- so there appears to be no harm in skipping the
RunScriptOnNodeAsInitScriptUsingSsh task.
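
The catch-and-continue workaround described above could also be expressed as
a bounded retry around whatever step the caller runs itself once the node
exists (for example, its own first SSH login). This is only a sketch using
the JDK: `RetryOnFailure.retry` is a hypothetical helper, not jclouds API,
and the attempt count and delay are arbitrary.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

class RetryOnFailure {
    /**
     * Runs the action up to maxAttempts times, sleeping delayMillis between
     * failed attempts. Returns the first successful result, or rethrows the
     * last failure once all attempts are used up.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated action that fails twice before succeeding.
        AtomicInteger calls = new AtomicInteger();
        String result = retry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("Auth fail (simulated)");
            }
            return "connected";
        }, 5, 100L);
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

Retrying blindly can hide a genuinely wrong key, so a small attempt limit
seems safer than an open-ended loop.
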
--
Jason Dusek
@solidsnack


On 5 July 2014 07:23, Ignasi Barrera <na...@apache.org> wrote:
> Hi Jason,
>
> Could you share the code snippet you use to create the nodes, so we can try
> to reproduce it?
>
> Thanks!
>
> I.

Re: Early SSH failures when provisioning on GCE

Posted by Ignasi Barrera <na...@apache.org>.
Hi Jason,

Could you share the code snippet you use to create the nodes, so we can try
to reproduce it?

Thanks!

I.
On 03/07/2014 23:46, "Daniel Widdis" <wi...@gmail.com> wrote:

> I can't answer specifically for GCE, but with Rackspace I've used an
> awaitSsh() call first, which just sits and waits for SSH to be available
> (or times out after a long time). Once that method returns, you can use
> other methods to log in and run scripts. It sounds like a similar thing
> would work for you.
>
> Here's the awaitSsh() stolen from the examples on the jclouds site:
>
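>   /**
>    * Wait until ssh is available on specified ip
>    *
>    * @param ip
>    *        The IP Address to check
>    * @throws TimeoutException
>    *         if port 22 is still closed after 300 seconds
>    */
>   private void awaitSsh(String ip) throws TimeoutException {
>     SocketOpen socketOpen = computeService.getContext().utils().injector()
>         .getInstance(SocketOpen.class);
>     // Poll every 5 seconds, giving up after 300 seconds.
>     // retry() is org.jclouds.util.Predicates2.retry (static import).
>     Predicate<HostAndPort> socketTester = retry(socketOpen, 300, 5, 5, SECONDS);
>     // retry() returns false on timeout rather than throwing, so check it.
>     if (!socketTester.apply(HostAndPort.fromParts(ip, 22))) {
>       throw new TimeoutException("Timed out waiting for ssh on " + ip);
>     }
>   }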
>
>
> On 7/3/14, 2:01 PM, Jason Dusek wrote:
>
>> Hi All,
>>
>> When provisioning instances on Google's Compute Engine service,
>> we are encountering a failure when jClouds attempts to SSH in to
>> the instance early in the boot process.
>>
>> The exception is thrown as a result of a call to
>> `computeService.createNodesInGroup()`. When the error is thrown,
>> the SSH user and key which are being used are printed in the
>> stack trace. Logging in using the SSH CLI tool, passing the
>> indicated user and key, works as expected.
>>
>> One question that arises from all this is: why is jClouds
>> logging into the servers this early? Is there something in the
>> template that causes it to try to configure something? The
>> options passed to the template are in this case very simple; in
>> particular, no `runScript()` options are passed.
>>
>> Another is: what is a safe and reasonable way to resolve this
>> error case? Perhaps there is a way to delay the step where
>> jClouds SSHes to the instance, or to disable the
>> `RunScriptOnNodeAsInitScriptUsingSsh` step entirely.
>>
>> We've found jClouds to be a big help so far and it'd be great to
>> find an idiomatic way to resolve this issue. Please let me know
>> if there is any additional information I can provide.
>>
>> --
>> Jason Dusek
>> @solidsnack
>>
>>
>>
>>
>> [2014-07-03 17:17:47,000] ERROR << problem customizing
>> node(us-central1-a/silly-cluster-cd758a1c-fc8):  (jclouds.compute:91)
>> org.jclouds.rest.AuthorizationException:
>> (jclouds:rsa[fingerprint(88:06:0b:b5:67:34:d2:ef:71:10:b8:e5:8d:e9:ea:65),sha1(c4:07:6b:1f:cf:f1:32:51:96:70:eb:43:77:ee:74:84:8c:da:bf:6d)]@173.255.113.102:22)
>> (jclouds:rsa[fingerprint(88:06:0b:b5:67:34:d2:ef:71:10:b8:e5:8d:e9:ea:65),sha1(c4:07:6b:1f:cf:f1:32:51:96:70:eb:43:77:ee:74:84:8c:da:bf:6d)]@173.255.113.102:22)
>> error acquiring {hostAndPort=173.255.113.102:22, loginUser=jclouds,
>> connectTimeout=60000, sessionTimeout=60000}: Auth fail
>>      at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:335)
>>      at org.jclouds.ssh.jsch.JschSshClient.acquire(JschSshClient.java:187)
>>      at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:200)
>>      at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:76)
>>      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:125)
>>      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:146)
>>      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:53)
>>      at com.google.common.util.concurrent.Futures$1.apply(Futures.java:711)
>>      at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:849)
>>      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>      at java.lang.Thread.run(Thread.java:724)
>> Caused by: com.jcraft.jsch.JSchException: Auth fail
>>      at com.jcraft.jsch.Session.connect(Session.java:491)
>>      at org.jclouds.ssh.jsch.SessionConnection.create(SessionConnection.java:186)
>>      at org.jclouds.ssh.jsch.SessionConnection.create(SessionConnection.java:39)
>>      at org.jclouds.ssh.jsch.JschSshClient.acquire(JschSshClient.java:180)
>>      ... 10 more
>>
>
>

Re: Early SSH failures when provisioning on GCE

Posted by Daniel Widdis <wi...@gmail.com>.
I can't answer specifically for GCE, but with Rackspace I've used an
awaitSsh() call first, which just sits and waits for SSH to be
available (or times out after a long time). Once that method returns,
you can use other methods to log in and run scripts. It sounds like a
similar thing would work for you.

Here's the awaitSsh() stolen from the examples on the jclouds site:

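   /**
    * Wait until ssh is available on specified ip
    *
    * @param ip
    *        The IP Address to check
    * @throws TimeoutException
    *         if port 22 is still closed after 300 seconds
    */
   private void awaitSsh(String ip) throws TimeoutException {
     SocketOpen socketOpen = computeService.getContext().utils().injector()
         .getInstance(SocketOpen.class);
     // Poll every 5 seconds, giving up after 300 seconds.
     // retry() is org.jclouds.util.Predicates2.retry (static import).
     Predicate<HostAndPort> socketTester = retry(socketOpen, 300, 5, 5, SECONDS);
     // retry() returns false on timeout rather than throwing, so check it.
     if (!socketTester.apply(HostAndPort.fromParts(ip, 22))) {
       throw new TimeoutException("Timed out waiting for ssh on " + ip);
     }
   }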
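
For anyone who wants to see the idea without the jclouds types, here is a
rough stand-alone sketch of the same wait-for-port loop using only the JDK.
The class name PortWaiter, the method awaitPort and the 1-second connect
timeout are made up for illustration; they are not part of jclouds. Keep in
mind that an open port only shows that sshd is listening. The failure in the
original report is "Auth fail", so key-based login may still need its own
retry after the port opens.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PortWaiter {
    // Hypothetical per-attempt connect timeout, chosen for illustration.
    private static final int CONNECT_TIMEOUT_MILLIS = 1000;

    /**
     * Polls host:port until a TCP connection succeeds or timeoutMillis elapses.
     * Returns true if the port accepted a connection in time, false otherwise.
     */
    public static boolean awaitPort(String host, int port, long timeoutMillis, long periodMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), CONNECT_TIMEOUT_MILLIS);
                return true; // something is listening on the port
            } catch (IOException e) {
                // Not reachable yet; fall through and retry until the deadline.
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.sleep(periodMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo against a local listening socket instead of a real node.
        try (ServerSocket server = new ServerSocket(0)) {
            boolean open = awaitPort("127.0.0.1", server.getLocalPort(), 2000, 100);
            System.out.println("port open: " + open);
        }
    }
}
```

Against a real node you would pass its public IP and port 22, with a timeout
and period along the lines of the 300 and 5 seconds used above.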


On 7/3/14, 2:01 PM, Jason Dusek wrote:
> Hi All,
>
> When provisioning instances on Google's Compute Engine service,
> we are encountering a failure when jClouds attempts to SSH in to
> the instance early in the boot process.
>
> The exception is thrown as a result of a call to
> `computeService.createNodesInGroup()`. When the error is thrown,
> the SSH user and key which are being used are printed in the
> stack trace. Logging in using the SSH CLI tool, passing the
> indicated user and key, works as expected.
>
> One question that arises from all this is: why is jClouds
> logging into the servers this early? Is there something in the
> template that causes it to try to configure something? The
> options passed to the template are in this case very simple; in
> particular, no `runScript()` options are passed.
>
> Another is: what is a safe and reasonable way to resolve this
> error case? Perhaps there is a way to delay the step where
> jClouds SSHes to the instance, or to disable the
> `RunScriptOnNodeAsInitScriptUsingSsh` step entirely.
>
> We've found jClouds to be a big help so far and it'd be great to
> find an idiomatic way to resolve this issue. Please let me know
> if there is any additional information I can provide.
>
> --
> Jason Dusek
> @solidsnack
>
>
>
>
> [2014-07-03 17:17:47,000] ERROR << problem customizing
> node(us-central1-a/silly-cluster-cd758a1c-fc8):  (jclouds.compute:91)
> org.jclouds.rest.AuthorizationException:
> (jclouds:rsa[fingerprint(88:06:0b:b5:67:34:d2:ef:71:10:b8:e5:8d:e9:ea:65),sha1(c4:07:6b:1f:cf:f1:32:51:96:70:eb:43:77:ee:74:84:8c:da:bf:6d)]@173.255.113.102:22)
> (jclouds:rsa[fingerprint(88:06:0b:b5:67:34:d2:ef:71:10:b8:e5:8d:e9:ea:65),sha1(c4:07:6b:1f:cf:f1:32:51:96:70:eb:43:77:ee:74:84:8c:da:bf:6d)]@173.255.113.102:22)
> error acquiring {hostAndPort=173.255.113.102:22, loginUser=jclouds,
> connectTimeout=60000, sessionTimeout=60000}: Auth fail
>      at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:335)
>      at org.jclouds.ssh.jsch.JschSshClient.acquire(JschSshClient.java:187)
>      at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:200)
>      at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:76)
>      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:125)
>      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:146)
>      at org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:53)
>      at com.google.common.util.concurrent.Futures$1.apply(Futures.java:711)
>      at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:849)
>      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>      at java.lang.Thread.run(Thread.java:724)
> Caused by: com.jcraft.jsch.JSchException: Auth fail
>      at com.jcraft.jsch.Session.connect(Session.java:491)
>      at org.jclouds.ssh.jsch.SessionConnection.create(SessionConnection.java:186)
>      at org.jclouds.ssh.jsch.SessionConnection.create(SessionConnection.java:39)
>      at org.jclouds.ssh.jsch.JschSshClient.acquire(JschSshClient.java:180)
>      ... 10 more