Posted to commits@storm.apache.org by bo...@apache.org on 2015/08/24 15:52:16 UTC

[31/50] [abbrv] storm git commit: Adding stacktrace to the log. Modifying the design doc with nimbus discovery APIs.

Adding stacktrace to the log. Modifying the design doc with nimbus discovery APIs.


Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/8d4e5618
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/8d4e5618
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/8d4e5618

Branch: refs/heads/master
Commit: 8d4e5618efa8a2e0a0ef9d5f199f0a644f31604c
Parents: a8aacca
Author: Parth Brahmbhatt <br...@gmail.com>
Authored: Tue Feb 17 14:56:44 2015 -0800
Committer: Parth Brahmbhatt <br...@gmail.com>
Committed: Tue Feb 17 14:56:44 2015 -0800

----------------------------------------------------------------------
 docs/documentation/nimbus-ha-design.md          | 54 +++++++++-----------
 .../jvm/backtype/storm/utils/NimbusClient.java  |  2 +-
 2 files changed, 25 insertions(+), 31 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/8d4e5618/docs/documentation/nimbus-ha-design.md
----------------------------------------------------------------------
diff --git a/docs/documentation/nimbus-ha-design.md b/docs/documentation/nimbus-ha-design.md
index 00fd115..672eece 100644
--- a/docs/documentation/nimbus-ha-design.md
+++ b/docs/documentation/nimbus-ha-design.md
@@ -169,35 +169,31 @@ The following sequence diagram describes the communication between different com
 ![Nimbus HA Topology Submission](images/nimbus_ha_topology_submission.png)
 
 ## Thrift and REST API 
+In order to avoid workers/supervisors/ui talking to zookeeper to get the master nimbus address, we are going to modify the 
+`getClusterInfo` API so it can also return nimbus information. `getClusterInfo` currently returns a `ClusterSummary` instance,
+which has a list of `SupervisorSummary` and a list of `TopologySummary` instances. We will add a list of `NimbusSummary` 
+to the `ClusterSummary`. See the structures below:
+
+```thrift
+struct ClusterSummary {
+  1: required list<SupervisorSummary> supervisors;
+  3: required list<TopologySummary> topologies;
+  4: required list<NimbusSummary> nimbuses;
+}
 
-This section only exists to track and document how we can reduce the added load on zookeeper for nimbus discovery if the 
-performance numbers indicated any degradation. The actual implementation will not be part of nimbus HA unless we have 
-performance tests to indicate degradation.  
-
-In order to avoid workers/supervisors/ui talking to zookeeper for getting master nimbus address we can add following new API:
-
-```java
-/**
-* Returns list of all nimbus hosts that are either currently in queue or has
-* the leadership lock.
-*/
-List<NimbusInfo> getNimbusHosts();
-
-/**
-* NimbusInfo
-*/
-Class NimbusInfo {
-	String host;
-	short port;
-	boolean isLeader;
+struct NimbusSummary {
+  1: required string host;
+  2: required i32 port;
+  3: required i32 uptime_secs;
+  4: required bool isLeader;
+  5: required string version;
 }
 ```
 
-These apis will be used by StormSubmitter, Nimbus clients,supervisors and ui to discover the current leaders and participating 
+This will be used by StormSubmitter, Nimbus clients, supervisors, and the UI to discover the current leader and participating 
 nimbus hosts. Any nimbus host will be able to respond to these requests. The nimbus hosts can read this information once 
-from zookeeper and cache it and keep updating the cache when the watchers are fired to indicate any changes,which should be 
-rare in general case. In addition we should update all the existing thrift and rest apis’s to throw redirect 
-exceptions when a non leader receives a request that only a leader should serve.
+from zookeeper, cache it, and keep updating the cache when watchers are fired to indicate any changes, which should 
+be rare in the general case.
 
 ## Configuration
 You can use nimbus HA with the default configuration; however, the default configuration assumes a single nimbus host, so it
@@ -210,14 +206,12 @@ actual code/config and to get the current replication count. An alternative is t
 "org.apache.storm.hdfs.ha.codedistributor.HDFSCodeDistributor" which relies on HDFS but does not add extra load on zookeeper and will 
 make topology submission faster.
 * topology.min.replication.count : Minimum number of nimbus hosts where the code must be replicated before leader nimbus
-can mark the topology as active and create assignments. Default is 1. in case of HDFSCodeDistributor this represents number
-of data nodes instead of nimbus hosts where code must be replicated before activating topology.
+can mark the topology as active and create assignments. Default is 1.
 * topology.max.replication.wait.time.sec: Maximum wait time for the nimbus host replication to achieve the nimbus.min.replication.count.
 Once this time has elapsed, nimbus will go ahead and perform topology activation tasks even if the required nimbus.min.replication.count is not achieved. 
 The default is 60 seconds; a value of -1 indicates to wait forever.
-*nimbus.code.sync.freq.secs: frequency at which the background thread which syncs code for locally missing topologies will run. default is 5 minutes.
+* nimbus.code.sync.freq.secs: frequency at which the background thread on nimbus that syncs code for locally missing topologies will run. Default is 5 minutes.
 
 Note: Even though all nimbus hosts have watchers on zookeeper to be notified immediately as soon as a new topology is available for code
-download, due to eventual consistency of zookeeper the callback pretty much never results in code download. In practice we have observed that
-the desired replication is only achieved once the background-thread runs. So you should expect your topology submission time to be somewhere between
-0 to (2 * nimbus.code.sync.freq.secs) for any nimbus.min.replication.count > 0.
\ No newline at end of file
+download, the callback pretty much never results in code download. In practice we have observed that the desired replication is only achieved once the background thread runs. 
+So you should expect your topology submission time to be somewhere between 0 and (2 * nimbus.code.sync.freq.secs) for any nimbus.min.replication.count > 1.
\ No newline at end of file
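The doc change above has clients read the leader out of the `nimbuses` list returned by `getClusterInfo` instead of querying zookeeper. A minimal standalone sketch of that selection step in plain Java (the class below only mirrors the `NimbusSummary` thrift struct; the helper is illustrative, not Storm's actual API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Minimal mirror of the NimbusSummary struct from the design doc above.
class NimbusSummary {
    final String host;
    final int port;
    final int uptimeSecs;
    final boolean isLeader;
    final String version;

    NimbusSummary(String host, int port, int uptimeSecs, boolean isLeader, String version) {
        this.host = host;
        this.port = port;
        this.uptimeSecs = uptimeSecs;
        this.isLeader = isLeader;
        this.version = version;
    }
}

public class LeaderDiscovery {
    // A client that has fetched ClusterSummary.nimbuses via getClusterInfo can
    // pick the leader locally instead of talking to zookeeper.
    static Optional<NimbusSummary> findLeader(List<NimbusSummary> nimbuses) {
        return nimbuses.stream().filter(n -> n.isLeader).findFirst();
    }

    public static void main(String[] args) {
        List<NimbusSummary> nimbuses = Arrays.asList(
                new NimbusSummary("nimbus1.example.com", 6627, 120, false, "0.10.0"),
                new NimbusSummary("nimbus2.example.com", 6627, 300, true, "0.10.0"));
        findLeader(nimbuses).ifPresent(leader ->
                System.out.println("leader: " + leader.host + ":" + leader.port));
        // prints "leader: nimbus2.example.com:6627"
    }
}
```

Since any nimbus host can serve `getClusterInfo`, a client can run this selection against whichever seed host answers first.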

http://git-wip-us.apache.org/repos/asf/storm/blob/8d4e5618/storm-core/src/jvm/backtype/storm/utils/NimbusClient.java
----------------------------------------------------------------------
diff --git a/storm-core/src/jvm/backtype/storm/utils/NimbusClient.java b/storm-core/src/jvm/backtype/storm/utils/NimbusClient.java
index e4222e4..39d3895 100644
--- a/storm-core/src/jvm/backtype/storm/utils/NimbusClient.java
+++ b/storm-core/src/jvm/backtype/storm/utils/NimbusClient.java
@@ -60,7 +60,7 @@ public class NimbusClient extends ThriftClient {
                 throw new RuntimeException("Found nimbuses " + nimbuses + " none of which is elected as leader, please try " +
                         "again after some time.");
             } catch (Exception e) {
-                LOG.warn("Ignoring exception while trying to get leader nimbus info from {}", seed);
+                LOG.warn("Ignoring exception while trying to get leader nimbus info from " + seed, e);
             }
         }
         throw new RuntimeException("Could not find leader nimbus from seed hosts " + seeds +". " +
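The one-line NimbusClient change above swaps a parameterized warn message for one that passes the exception object itself, which is what preserves the stack trace in the log. A standalone sketch of the same pattern using java.util.logging (Storm itself logs through SLF4J; the class and method names here are illustrative):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogWithStacktrace {
    static final Logger LOG = Logger.getLogger(LogWithStacktrace.class.getName());

    // Passing the Throwable as its own argument (rather than concatenating
    // e.toString() into the message) lets the logger record the full stack trace.
    static void warnWithCause(String seed, Exception e) {
        LOG.log(Level.WARNING,
                "Ignoring exception while trying to get leader nimbus info from " + seed, e);
    }

    public static void main(String[] args) {
        warnWithCause("nimbus1.example.com:6627", new RuntimeException("connection refused"));
    }
}
```

SLF4J's `LOG.warn(String, Throwable)` behaves the same way: the last argument, when it is a `Throwable`, is logged with its stack trace instead of being substituted into a `{}` placeholder.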