Posted to dev@storm.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/10/02 00:37:34 UTC
[jira] [Commented] (STORM-166) Highly available Nimbus

    [ https://issues.apache.org/jira/browse/STORM-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155684#comment-14155684 ] 

ASF GitHub Bot commented on STORM-166:
--------------------------------------

Github user ptgoetz commented on the pull request:

    https://github.com/apache/storm/pull/61#issuecomment-57552912
  
    @yveschina The main concern I have is with catastrophic failure of a nimbus node during code distribution. I'm not sure it's acceptable to force users to resubmit a topology in that event.
    
    I'm working with @Parth-Brahmbhatt on a similar solution that involves a pluggable code distribution interface (backed by either BitTorrent or a distributed file system) that will also be compatible with the security work being done (e.g. code distribution backed by a secure HDFS).
    
    More details of that work are available in the JIRA, and we will be posting a much more detailed design doc in the future.
    
    For the time being, let's keep this pull request open.


> Highly available Nimbus
> -----------------------
>
>                 Key: STORM-166
>                 URL: https://issues.apache.org/jira/browse/STORM-166
>             Project: Apache Storm
>          Issue Type: New Feature
>            Reporter: James Xu
>            Assignee: Parth Brahmbhatt
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/360
> The goal of this feature is to be able to run multiple Nimbus servers so that if one goes down another one will transparently take over. Here's what needs to happen to implement this:
> 1. Everything currently stored on local disk on Nimbus needs to be stored in a distributed and reliable fashion. A DFS is perfect for this. However, as we do not want to make a DFS a mandatory requirement to run Storm, the storage of these artifacts should be pluggable (default to local filesystem, but the interface should support DFS). You would only be able to run multiple Nimbus servers if you use the right storage, and the storage interface chosen should have a flag indicating whether it's suitable for HA mode or not. If you choose local storage and try to run multiple Nimbus servers, one of them should fail to launch.
> 2. Nimbus servers should register themselves in ZooKeeper. They should use a leader election protocol to decide which one is currently responsible for launching and monitoring topologies.
> 3. StormSubmitter should find the Nimbus to connect to via ZooKeeper. In case the leader changes during submission, it should use a retry protocol to try reconnecting to the new leader and attempting submission again.
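
To make the three steps quoted above concrete, here is a minimal sketch in Python. It is not Storm code: every name in it (`Storage`, `LocalStorage`, `SharedStorage`, `Registry`, `submit_with_retry`) is hypothetical, an in-memory list stands in for ZooKeeper, and "the earliest registered live member leads" is only an assumed election rule, loosely modelled on the usual ZooKeeper recipe of picking the lowest sequence number.

```python
class Storage:
    """Hypothetical pluggable store for Nimbus artifacts (step 1)."""

    # Flag telling Nimbus whether this backend may be shared in HA mode.
    ha_capable = False

    def __init__(self):
        self._blobs = {}  # a dict stands in for the real backend

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class LocalStorage(Storage):
    """Default backend (local filesystem in the proposal); not HA-capable."""
    ha_capable = False


class SharedStorage(Storage):
    """Stand-in for a DFS-backed store that every Nimbus can reach."""
    ha_capable = True


class Registry:
    """In-memory stand-in for ZooKeeper registration and election (step 2)."""

    def __init__(self):
        self.members = []  # kept in registration order

    def register(self, name, storage):
        # A second Nimbus may only start if the storage is flagged HA-capable.
        if self.members and not storage.ha_capable:
            raise RuntimeError(f"{name}: storage is not suitable for HA mode")
        self.members.append(name)

    def fail(self, name):
        # Simulates a Nimbus dropping out of the registry.
        self.members.remove(name)

    def leader(self):
        # Assumed rule: the earliest registered live member is the leader.
        return self.members[0] if self.members else None


def submit_with_retry(registry, send, attempts=3):
    """Look up the leader before each attempt and retry on failure (step 3)."""
    last_error = None
    for _ in range(attempts):
        leader = registry.leader()
        if leader is None:
            break  # nobody left to submit to
        try:
            return send(leader)
        except ConnectionError as err:
            last_error = err  # leader may have changed; look it up again
    raise RuntimeError("submission failed") from last_error
```

In this sketch the submitter never caches the leader between attempts, so a failover during submission is picked up on the next loop iteration. Note that it does not address the concern raised in the comment above, namely a Nimbus failing partway through code distribution.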



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)