Posted to dev@storm.apache.org by "Dane Hammer (JIRA)" <ji...@apache.org> on 2014/10/07 20:18:37 UTC

[jira] [Commented] (STORM-167) proposal for storm topology online update

    [ https://issues.apache.org/jira/browse/STORM-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162240#comment-14162240 ] 

Dane Hammer commented on STORM-167:
-----------------------------------

I need this. There's already a lot of work done. What can I do to get this merged to master? It looks like some rebasing and resolving of merge conflicts is in order, but what about approval of the design - is this how we want to do this?

> proposal for storm topology online update
> -----------------------------------------
>
>                 Key: STORM-167
>                 URL: https://issues.apache.org/jira/browse/STORM-167
>             Project: Apache Storm
>          Issue Type: New Feature
>            Reporter: James Xu
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/540
> Currently, topology code can only be updated by killing the topology and re-submitting a new one. During the kill and re-submit process, some requests may be delayed or fail, which is a problem for an online service. So we are considering adding online topology update.
> Mission
> Update the code of a running topology gracefully, one worker after another, without interrupting the whole service. Only the topology code is updated, not the topology DAG structure (components, streams and task numbers).
> Proposal
> * the client uses "storm update topology-name new-jar-file" to submit an update request for new-jar-file
> * nimbus updates the stormdist dir and links topology-dir to the new one
> * nimbus updates the topology version on zk
> * the supervisors that are running this topology update it
> ** check the topology version on zk; if it is not the same as the local version, a topology update begins
> ** each supervisor schedules the update of the topology's workers at a rand(expect-max-update-time) time point
> ** sync-supervisor downloads the latest code from nimbus
> ** sync-process checks the local worker heartbeat version (to be added); if it is not the same as the version sync-supervisor downloaded, it kills the worker
> ** sync-process restarts the killed worker
> ** the new worker heartbeats to zk with its version (to be added), which can be displayed on the web ui to check update progress
> This feature is deployed in our production clusters. It's really useful for topologies that handle online requests waiting for a response. The topology jar can be updated without taking the entire service offline.
> We hope that this feature is useful for others too.
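
For discussion of the design, the supervisor-side steps of the proposal can be sketched roughly as below. This is only an illustrative model in Python, not Storm code: the Supervisor and Worker classes, their fields, and the method names are hypothetical, ZooKeeper and the nimbus download are reduced to plain values, and the kill/restart of a worker is simulated by bumping its heartbeat version.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Worker:
    port: int
    heartbeat_version: int  # version the worker reports in its heartbeat (hypothetical field)


@dataclass
class Supervisor:
    local_version: int
    workers: list
    restarted: list = field(default_factory=list)

    def pick_update_delay(self, expect_max_update_time, rng=random):
        # Each supervisor picks a random point between 0 and
        # expect_max_update_time, so workers across the cluster
        # are not all restarted at the same moment.
        return rng.uniform(0, expect_max_update_time)

    def sync(self, zk_version):
        """Compare the version on zk with the local one and update stale workers.

        Returns the ports of the workers that were restarted.
        """
        if zk_version == self.local_version:
            return []  # nothing to do, no update in progress

        # Stand-in for "sync-supervisor downloads the latest code from nimbus".
        self.local_version = zk_version

        # Workers whose heartbeat version differs from the downloaded version.
        stale = [w for w in self.workers if w.heartbeat_version != self.local_version]
        for w in stale:
            # Stand-in for "kill the worker, then restart it": the new
            # worker heartbeats with the new version.
            w.heartbeat_version = self.local_version
            self.restarted.append(w.port)
        return [w.port for w in stale]
```

In this model, calling sync() again with the same version is a no-op, which matches the idea that an update only begins when the version on zk differs from the local one. A real implementation would sleep for the chosen delay before restarting workers and would restart them one at a time.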



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)