Posted to dev@libcloud.apache.org by "Mark Nottingham (Created) (JIRA)" <ji...@apache.org> on 2012/02/18 01:11:59 UTC

[dev] [jira] [Created] (LIBCLOUD-157) Deployment script retries are brain-dead

Deployment script retries are brain-dead
----------------------------------------

                 Key: LIBCLOUD-157
                 URL: https://issues.apache.org/jira/browse/LIBCLOUD-157
             Project: Libcloud
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.8.0
            Reporter: Mark Nottingham


In common/base, NodeDriver._run_deployment_script has the following retry wrapper:

        tries = 0
        while tries < max_tries:
            try:
                node = task.run(node, ssh_client)
            except Exception:
                tries += 1
                if tries >= max_tries:
                    raise LibcloudError(value='Failed after %d tries'
                                        % (max_tries), driver=self)
            else:
                ssh_client.close()
                return node

The except Exception swallows *all* errors, making debugging very hard.

Furthermore, max_tries is effectively hard-coded in deploy_node():

            self._run_deployment_script(task=kwargs['deploy'],
                                        node=node,
                                        ssh_client=ssh_client,
                                        max_tries=3)

... forcing people who want to control retries to roll their own deploy_node().
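
If max_tries were accepted as a keyword argument by deploy_node() (it is not in 0.8.0; this is purely illustrative, with a placeholder provider and placeholder credentials), the call site could look like:

    from libcloud.compute.types import Provider
    from libcloud.compute.providers import get_driver
    from libcloud.compute.deployment import ScriptDeployment

    cls = get_driver(Provider.RACKSPACE)
    driver = cls('username', 'api key')

    script = ScriptDeployment('apt-get -y install puppet')
    image = driver.list_images()[0]
    size = driver.list_sizes()[0]

    # Hypothetical keyword argument: passed straight through to
    # _run_deployment_script instead of the hard-coded 3.
    node = driver.deploy_node(name='test', image=image, size=size,
                              deploy=script, max_tries=5)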

Suggestions:
  - at a minimum, log or warn about the error that's caught in the retry loop
  - better yet, make the catch more fine-grained, so that errors we know aren't retryable fail immediately (see the sketch after this list)
  - consider making the default value of max_tries 1
  - make max_tries controllable from deploy_node
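
A minimal sketch of what suggestions 1, 2, and 4 could look like together. This is not the actual libcloud implementation; the RETRYABLE_EXCEPTIONS tuple and the logging setup are illustrative assumptions, and anything outside that tuple propagates immediately instead of being silently retried:

    import socket
    import logging

    from paramiko import SSHException

    from libcloud.common.types import LibcloudError

    log = logging.getLogger(__name__)

    # Assumed set of transient errors worth retrying; anything else is a
    # real failure and should surface immediately.
    RETRYABLE_EXCEPTIONS = (socket.timeout, socket.error, SSHException)

    def _run_deployment_script(self, task, node, ssh_client, max_tries=3):
        # Meant as a NodeDriver method; max_tries is now caller-controllable.
        tries = 0
        while tries < max_tries:
            try:
                node = task.run(node, ssh_client)
            except RETRYABLE_EXCEPTIONS as e:
                # Suggestion 1: record what actually went wrong on each try.
                log.warning('Deployment attempt %d/%d failed: %r',
                            tries + 1, max_tries, e)
                tries += 1
                if tries >= max_tries:
                    ssh_client.close()
                    raise LibcloudError(value='Failed after %d tries: %r'
                                        % (max_tries, e), driver=self)
            else:
                ssh_client.close()
                return node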

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[dev] [jira] [Commented] (LIBCLOUD-157) Deployment script retries are brain-dead

Posted by "Tomaz Muraus (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LIBCLOUD-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210743#comment-13210743 ] 

Tomaz Muraus commented on LIBCLOUD-157:
---------------------------------------

I agree that debugging deployment issues is currently pretty hard. I had this problem myself, so I recently added some changes: if you use LIBCLOUD_DEBUG=<file obj>, paramiko debug mode is also turned on, so you at least see the paramiko debug messages.
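
For reference, paramiko's transport-level logging can also be turned on directly, independently of the LIBCLOUD_DEBUG hook mentioned above (standard paramiko API; the log path is just an example):

    import logging
    import paramiko

    # Write paramiko's own debug output (SSH transport negotiation,
    # authentication, channel events) to a file while debugging deployments.
    paramiko.util.log_to_file('/tmp/paramiko.log', level=logging.DEBUG)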

In any case, I like suggestions #2 and #4. As far as #3 goes, I think max_tries=1 is too low, because in many cases the node is returned in the API response but the server hasn't actually finished booting yet (the SSH server is not yet listening).

In cases like this, paramiko raises a socket timeout error, and with max_tries=1 the deployment would fail.
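
To make the socket-timeout case concrete, here is a generic sketch (not libcloud code; the function name, tries, and delays are arbitrary) that retries only until the SSH daemon starts listening, while letting every other error fail fast:

    import socket
    import time

    import paramiko

    def wait_for_ssh(hostname, username, key_filename, tries=6, delay=10):
        # A freshly created node often accepts no TCP connections yet, so
        # paramiko raises socket.timeout / socket.error; only those are
        # worth retrying here.
        for _ in range(tries):
            client = paramiko.SSHClient()
            client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            try:
                client.connect(hostname, username=username,
                               key_filename=key_filename, timeout=10)
                return client
            except (socket.timeout, socket.error):
                time.sleep(delay)
        raise RuntimeError('SSH never became reachable on %s' % hostname)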
                

        

[dev] [jira] [Assigned] (LIBCLOUD-157) Deployment script retries are brain-dead

Posted by "Tomaz Muraus (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LIBCLOUD-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomaz Muraus reassigned LIBCLOUD-157:
-------------------------------------

    Assignee: Tomaz Muraus
    