Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/10/09 02:17:28 UTC

[jira] [Updated] (STORM-137) add new feature: "topology package"

     [ https://issues.apache.org/jira/browse/STORM-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg updated STORM-137:
-------------------------------
    Component/s: storm-core

> add new feature: "topology package"
> -----------------------------------
>
>                 Key: STORM-137
>                 URL: https://issues.apache.org/jira/browse/STORM-137
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>            Reporter: James Xu
>
> https://github.com/nathanmarz/storm/issues/557
> Submitting a topology to Storm requires executing code that constructs the bolts, spouts, and topology, and then calls StormSubmitter.submitTopology, which uploads the jar, config, and serialized objects to Nimbus.
> If you want a topology binary store that retains every version of your production topologies, StormSubmitter has a really precarious property: the serialized object state needs to be recomputed on each submission, on the assumption that the code used to create that state is a pure function with no external dependencies. If either assumption does not hold (for example, maybe your API key is queried at this point in time), then merely storing the jar is not sufficient to redeploy an older version of a topology. (I'm making this example up, but one user was trying to access ZK here, I don't know why. It's just very precarious to recompute state each time.)
> So my proposal is to add to StormSubmitter and the storm command-line tool the ability to create a "topology package" that contains the jar, the serialized object state, and the topology config (preferably in YAML), and then to add another StormSubmitter API for accepting topology packages.
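
To make the proposed package contents concrete, here is a minimal sketch that bundles the three pieces (jar bytes, serialized object state, YAML config) into one zip archive. It is only an illustration under stated assumptions: the class name TopologyPackageSketch, the method names, and the entry names topology.jar, topology.ser and conf.yaml are all hypothetical, the issue does not define a package layout, and a plain Serializable object stands in for the real topology.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class TopologyPackageSketch {
    // Hypothetical entry names; the proposal does not specify a layout.
    static final String JAR_ENTRY = "topology.jar";
    static final String STATE_ENTRY = "topology.ser";
    static final String CONF_ENTRY = "conf.yaml";

    // Bundles the jar, the Java-serialized topology object, and the YAML config into one zip.
    static byte[] writePackage(byte[] jarBytes, Serializable topology, String confYaml)
            throws IOException {
        // Serialize the object state once, at packaging time, so it never has to be recomputed.
        ByteArrayOutputStream state = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(state)) {
            oos.writeObject(topology);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            putEntry(zip, JAR_ENTRY, jarBytes);
            putEntry(zip, STATE_ENTRY, state.toByteArray());
            putEntry(zip, CONF_ENTRY, confYaml.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    private static void putEntry(ZipOutputStream zip, String name, byte[] data)
            throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(data);
        zip.closeEntry();
    }

    // Lists entry names in archive order, e.g. to check a package before submitting it.
    static List<String> listEntries(byte[] pkg) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pkg))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

A submit-package API could then read these entries back instead of re-running the user's topology-building code.
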
> ---------
> jasonjckn: @nathanmarz could you comment on this asap?
> ---------
> nathanmarz: It would be interesting to have syntax like:
> storm package {name of output file} {jar} {class} {args}
> which changes the behavior of StormSubmitter#submitTopology to serialize the topology and package it with the jar into a "package" file. It would also be cool if storm deploy would automatically detect these package files and do the right thing with them.
> ---------
> jasonjckn: So will user code still call StormSubmitter.submitTopology, or should it call .createPackage?
> Right now people can call StormSubmitter.submitTopology as many times as they want in the main function and submit multiple topologies.
> storm package {name of output file} {jar} {class} {args}
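> What happens if they call submitTopology twice? Is the same output filename used twice? That's why I'm recommending we do this:
> package mytopologpkg;
> import backtype.storm.generated.StormTopology;
> import backtype.storm.topology.TopologyBuilder;
> class MyTopology {
>     public static void main(String[] args) {
>         TopologyBuilder builder = new TopologyBuilder();
>         builder.setSpout(...);
>         StormTopology topology = builder.createTopology();
>         // TopologyCommandLine is the proposed helper class; it does not exist yet.
>         TopologyCommandLine.processAction(topology, args);
>     }
> }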
> Then the user would execute commands like this:
> java -cp topology.jar mytopologpkg.MyTopology make_package <filename>
> java -cp topology.jar mytopologpkg.MyTopology submit -c nimbus.host=xyz  <name>
> java -cp topology.jar mytopologpkg.MyTopology kill <name>
> java -cp topology.jar mytopologpkg.MyTopology submit_package -c nimbus.host=xyz <filename>
> This is also valid:
> java -cp topology.jar backtype.storm.CommandLine kill <name>
> java -cp topology.jar backtype.storm.CommandLine submit_package -c nimbus.host=xyz <filename>
> Today users have the added complexity of ensuring that the Storm zip release they downloaded matches the library version they compiled their code with, even though everything they need is actually in the Storm library jar.
> My last idea has sat in my head for a while. It has a couple of problems and is not a good idea.
> Something like this would work:
> package mytopologpkg;
> class MyTopology implements storm.ITopologyPackage {
>     @Override
>     public Map getTopologyConf(String[] commandLineArgs) {
>         ......
>     }
>     @Override
>     public StormTopology getTopology(String[] commandLineArgs) {
>         .....
>     }
> }
> $ storm make-package-file mytopology.MyTopology package-filename.zip [commandLineArg]
> $ storm submit package-filename.zip -c nimbus.host=xyz -c topology.name=my-topology-name
> OR
> $ storm submit mytopology.MyTopology -c nimbus.host=xyz -c topology.name=my-topology-name  [commandLineArgs]
> NOTE: "-c" means override the conf returned from getTopologyConf.
> NOTE: there's no main function anymore. However, if someone does write a main function, they can run it with "storm jar". Semantically I think of this as "exec-jar-main with the ability to pass topology conf overrides"; it says nothing about how many topologies are submitted or whether StormSubmitter.makePackage(filename, topology, stormConf) is called.
> public static void main(String[] args) {
>     StormTopology t1 = buildTopologyT1();
>     .... StormSubmitter.submitTopology(t1);
>     StormTopology t2 = buildTopologyT2();
>     .... StormSubmitter.submitTopology(t2);
> }
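
The "-c" rule noted above (command-line values override the conf returned from getTopologyConf) could be sketched roughly as follows. This is an assumption-laden illustration, not Storm's actual option parsing: ConfOverrideSketch and applyOverrides are hypothetical names, and override values are kept as plain strings.

```java
import java.util.HashMap;
import java.util.Map;

class ConfOverrideSketch {
    // Returns a copy of baseConf with every "-c key=value" pair from cliArgs applied on top.
    // Hypothetical helper; values stay strings because no type conversion is specified.
    static Map<String, Object> applyOverrides(Map<String, Object> baseConf, String[] cliArgs) {
        Map<String, Object> merged = new HashMap<>(baseConf);
        for (int i = 0; i + 1 < cliArgs.length; i++) {
            if (!"-c".equals(cliArgs[i])) {
                continue; // not an override flag, leave it for the topology's own arguments
            }
            String pair = cliArgs[++i];
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value after -c, got: " + pair);
            }
            merged.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return merged;
    }
}
```

Copying the base map keeps the conf returned by the topology class untouched, so the same object can be reused for packaging and for a direct submit.
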
> I passed this issue to another engineer who was working on package versioning and they didn't want to implement this, so it never got done.
> I just realized there is a tradeoff here. If you persist the jar AND the serialized object state in a versioned binary store, then that state becomes useless if a class's serialVersionUID ever changes. However, if you merely store the jar in the versioned binary store and rerun the main method with $ storm jar path.to.mainClass, a changed serialVersionUID doesn't matter. So you have more latitude to make backwards-compatible changes if you don't store serialized object state in a topology package as proposed above.
> An API for downloading a topology package would be useful for writing automated tools that move topologies between multiple Storm clusters.
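
One way to see the tradeoff: Java serialization records each class's serialVersionUID in the stream, and deserialization fails when the class on the classpath reports a different value. The sketch below only compares UIDs using the standard ObjectStreamClass API. BoltV1, BoltV2, and canReadStoredState are hypothetical names for illustration, and recording the UID alongside a package is an assumption rather than part of the proposal.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialUidCheckSketch {
    // Two hypothetical versions of the same component, differing only in serialVersionUID.
    static class BoltV1 implements Serializable {
        private static final long serialVersionUID = 1L;
        int parallelism = 2;
    }

    static class BoltV2 implements Serializable {
        private static final long serialVersionUID = 2L;
        int parallelism = 2;
    }

    // Looks up the UID that Java serialization would write for this class.
    static long uidOf(Class<? extends Serializable> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    // Stored state is only readable if the UID recorded at packaging time still matches.
    static boolean canReadStoredState(long storedUid, Class<? extends Serializable> current) {
        return storedUid == uidOf(current);
    }
}
```

State packaged from BoltV1 would be rejected once the jar ships BoltV2, whereas rerunning the main method against the new jar simply rebuilds the state.
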



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)