Posted to dev@storm.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/10/08 20:30:26 UTC

[jira] [Commented] (STORM-150) Replace jar distribution strategy with BitTorrent

    [ https://issues.apache.org/jira/browse/STORM-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949104#comment-14949104 ] 

ASF GitHub Bot commented on STORM-150:
--------------------------------------

Github user d2r commented on the pull request:

    https://github.com/apache/storm/pull/71#issuecomment-146646481
  
    @ptgoetz @revans2,
     shall we keep this pull request open?


> Replace jar distribution strategy with BitTorrent
> -------------------------------------------------
>
>                 Key: STORM-150
>                 URL: https://issues.apache.org/jira/browse/STORM-150
>             Project: Apache Storm
>          Issue Type: Improvement
>            Reporter: James Xu
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/435
> Consider using http://turn.github.com/ttorrent/
> ----------
> ptgoetz: I've been looking into implementing this, but have a design question that boils down to this:
> Should the client (storm jar command) be the initial seeder, or should we wait and let nimbus do the seeding?
> The benefit of doing the seeding client-side is that we would only have to transfer a small .torrent through the nimbus thrift API. But I can imagine situations where the network environment would prevent BitTorrent clients from connecting back to the machine that's submitting the topology. This would create an indefinitely "stalled submission" since none of the cluster nodes would be able to connect to the seeder.
> The alternative would be to use the current technique of uploading the jar to nimbus, and have nimbus generate and distribute the .torrent file, and provide the initial seed. If the cluster is properly configured, we're pretty much guaranteed connectivity between nimbus and supervisor nodes.
> I'm leaning toward the latter approach, but would be interested in others' opinions.
> ----------
> nathanmarz: @ptgoetz I think Nimbus should do the seeding. That ensures that when the client finishes submitting, it can disconnect/go away without having to worry about making the topology unlaunchable.
> ----------
> jasonjckn: @nathanmarz How does this solve Nimbus's dependency on reliable local disk state (as you talked about in person)?
> What happens when ZooKeeper is offline for 1 hour? All the workers will die, and Nimbus will be continually restarting. The onus is still on Nimbus to store topology jars on local disk so that, when the workers and supervisors reboot, it can seed all of this again.
> You -can- solve the local disk persistence problem by replicating state to the non-elected Nimbus nodes, but that's orthogonal to a distribution strategy. Yes, there is some replication going on in BitTorrent, but it's not really a protocol that delivers reliable persistence of state.
> I think it's still a good feature if it gives us fast topology submit times even with 500 workers; submissions currently take 3 minutes for us.
> Particularly with the worker heartbeat start-up timeout of 120s, you want to be able to start 500 workers (or even 1500) within 120s; the current distribution strategy does not scale that way.
> ----------
> nathanmarz: @jasonjckn On top of the BitTorrent stuff, we can ensure that a topology is considered submitted only when the artifacts exist on at least N nodes. Nimbus would only be the initial seed for topology files. Also, it wouldn't have to be only Nimbus that acts as a seed; that work could be shared by the supervisors. That's less relevant in the storm-mesos world, but you could still fairly easily run multiple Nimbuses to get replication.
> ----------
> jasonjckn: This PR might be aided by "topology packages" #557, as it bundles all the state that needs to be replicated.
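[Editor's note] jasonjckn's scaling point can be made concrete with a back-of-the-envelope model. This is not from the thread: the jar size and per-link throughput below are illustrative assumptions, and the swarm model is the idealized "copies double each round" approximation of BitTorrent-style distribution, not a claim about Storm's actual numbers.

```java
// Back-of-the-envelope comparison of jar distribution strategies.
// All concrete numbers are illustrative assumptions, not measurements.
public class DistributionModel {

    // One seeder pushing the jar to every worker in turn:
    // total time grows linearly with the number of workers.
    static double sequentialSeconds(int workers, double jarMb, double linkMbPerSec) {
        return workers * (jarMb / linkMbPerSec);
    }

    // Idealized swarming (BitTorrent-like) distribution: every node that
    // already holds the jar re-seeds it, so the number of complete copies
    // roughly doubles each round and total time grows with log2(workers).
    static double swarmSeconds(int workers, double jarMb, double linkMbPerSec) {
        double rounds = Math.ceil(Math.log(workers) / Math.log(2));
        return rounds * (jarMb / linkMbPerSec);
    }

    public static void main(String[] args) {
        int workers = 500;       // cluster size mentioned in the thread
        double jarMb = 100;      // assumed topology jar size, MB
        double linkMb = 50;      // assumed per-link throughput, MB/s
        System.out.printf("sequential: %.0fs%n", sequentialSeconds(workers, jarMb, linkMb));
        System.out.printf("swarm:      %.0fs%n", swarmSeconds(workers, jarMb, linkMb));
    }
}
```

Under these assumptions the single-seeder path takes 1000s for 500 workers, far over the 120s worker start-up timeout, while swarming needs only ~9 doubling rounds.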
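[Editor's note] nathanmarz's suggestion that a topology count as submitted only once the artifacts exist on at least N nodes could be sketched as a simple quorum barrier. The class and method names below are invented for illustration and do not correspond to anything in Storm's codebase.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: block the submit call until the topology jar has
// been reported present on at least minReplicas distinct nodes.
public class ReplicationBarrier {
    private final Set<String> nodesWithJar = ConcurrentHashMap.newKeySet();
    private final CountDownLatch quorum;

    public ReplicationBarrier(int minReplicas) {
        this.quorum = new CountDownLatch(minReplicas);
    }

    // Called when a supervisor (or Nimbus itself) finishes fetching the
    // jar; duplicate reports from the same node are ignored.
    public void reportSeeded(String nodeId) {
        if (nodesWithJar.add(nodeId)) {
            quorum.countDown();
        }
    }

    // Wait until the quorum is reached or the timeout expires; returns
    // true iff the topology may be declared submitted.
    public boolean awaitQuorum(long timeout, TimeUnit unit) throws InterruptedException {
        return quorum.await(timeout, unit);
    }
}
```

This also captures the "client can disconnect" property nathanmarz wants: once `awaitQuorum` returns true, the jar survives the loss of any single seeder.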



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)