Posted to commits@storm.apache.org by pt...@apache.org on 2015/05/15 20:56:30 UTC

[04/12] storm git commit: Add documentation about new topology context info available as of STORM-789.

Add documentation about new topology context info available as of STORM-789.

Also cleaned up some awkward language describing heartbeats.


Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/4255ab5d
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/4255ab5d
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/4255ab5d

Branch: refs/heads/0.10.x-branch
Commit: 4255ab5d1eb24b7884b7305e57edcabe6f2e3cbb
Parents: c47d7d4
Author: Dan Blanchard <da...@parsely.com>
Authored: Mon Apr 20 16:33:51 2015 -0400
Committer: P. Taylor Goetz <pt...@gmail.com>
Committed: Fri May 15 14:04:19 2015 -0400

----------------------------------------------------------------------
 docs/documentation/Multilang-protocol.md | 63 ++++++++++++++++++++-------
 1 file changed, 47 insertions(+), 16 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/4255ab5d/docs/documentation/Multilang-protocol.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Multilang-protocol.md b/docs/documentation/Multilang-protocol.md
index 43afd32..017ad32 100644
--- a/docs/documentation/Multilang-protocol.md
+++ b/docs/documentation/Multilang-protocol.md
@@ -49,7 +49,7 @@ STDIN and STDOUT.
 
 The initial handshake is the same for both types of shell components:
 
-* STDIN: Setup info. This is a JSON object with the Storm configuration, Topology context, and a PID directory, like this:
+* STDIN: Setup info. This is a JSON object with the Storm configuration, a PID directory, and a topology context, like this:
 
 ```
 {
@@ -57,15 +57,32 @@ The initial handshake is the same for both types of shell components:
         "topology.message.timeout.secs": 3,
         // etc
     },
+    "pidDir": "...",
     "context": {
         "task->component": {
             "1": "example-spout",
             "2": "__acker",
-            "3": "example-bolt"
+            "3": "example-bolt1",
+            "4": "example-bolt2"
         },
-        "taskid": 3
-    },
-    "pidDir": "..."
+        "taskid": 3,
+        // Everything below this line is only available in Storm 0.11.0+
+        "componentid": "example-bolt1",
+        "stream->target->grouping": {
+            "default": {
+                "example-bolt2": {
+                    "type": "SHUFFLE"}}},
+        "streams": ["default"],
+        "stream->outputfields": {"default": ["word"]},
+        "source->stream->grouping": {
+            "example-spout": {
+                "default": {
+                    "type": "FIELDS",
+                    "fields": ["word"]
+                }
+            }
+        }
+    }
 }
 ```
 
@@ -73,6 +90,15 @@ Your script should create an empty file named with its PID in this directory. e.
 the PID is 1234, so an empty file named 1234 is created in the directory. This
 file lets the supervisor know the PID so it can shutdown the process later on.
 
+As of Storm 0.11.0, the context sent by Storm to shell components has been
+enhanced substantially to include all aspects of the topology context available
+to JVM components.  One key addition is the ability to determine a shell
+component's sources and targets (i.e., inputs and outputs) in the topology via
+the `source->stream->grouping` and `stream->target->grouping` dictionaries.  At
+the innermost level of these nested dictionaries, each grouping is represented
+as a dictionary that minimally has a `type` key, but can also have a `fields`
+key to specify which fields are involved in a `FIELDS` grouping.
+
 * STDOUT: Your PID, in a JSON object, like `{"pid": 1234}`. The shell component will log the PID to its log.
 
 What happens next depends on the type of component:
@@ -222,30 +248,35 @@ A "log" will log a message in the worker log. It looks like:
 * Note that, as of version 0.7.1, there is no longer any need for a
   shell bolt to 'sync'.
 
-### Handling Heartbeat (0.9.3 and later)
+### Handling Heartbeats (0.9.3 and later)
 
-ShellSpout/ShellBolt has been introduced from [STORM-513](https://issues.apache.org/jira/browse/STORM-513) to prevent hanging/zombie subprocess.
+As of Storm 0.9.3, heartbeats are exchanged between ShellSpout/ShellBolt and
+their multi-lang subprocesses to detect hanging/zombie subprocesses.  Any
+libraries for interfacing with Storm via multi-lang must take the following
+actions regarding heartbeats:
 
-* Spout
+#### Spout
 
-Shell spouts are synchronous, and subprocess always send 'sync' at the end of next() so you don't need to take care of.
-One thing you have to take care of is, don't let subprocess sleep too much from next(), especially keep it less to worker timeout.
+Shell spouts are synchronous: subprocesses always send a `sync` command at the
+end of `next()`, so you should not have to do much to support heartbeats for
+spouts.  That said, you must not let a subprocess sleep longer than the worker
+timeout during `next()`.
 
-* Bolt
+#### Bolt
 
-Shell bolts are asynchronous, so ShellBolt will send heartbeat tuple periodically.
-Heartbeat tuple looks like:
+Shell bolts are asynchronous, so a ShellBolt will send heartbeat tuples to its
+subprocess periodically.  A heartbeat tuple looks like:
 
 ```
 {
 	"id": "-6955786537413359385",
 	"comp": "1",
-	// heartbeat tuple
 	"stream": "__heartbeat",
-	// it's system task id
+	// this shell bolt's system task id
 	"task": -1,
 	"tuple": []
 }
 ```
 
-When subprocess receives heartbeat tuple, it should send 'sync' to ShellBolt.
+When a subprocess receives a heartbeat tuple, it must send a `sync` command
+back to the ShellBolt.
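
For readers implementing a multi-lang library, the bolt-side heartbeat rule described in the patch could be sketched roughly as follows. This is a minimal illustration, not code from the commit: the helper names (`is_heartbeat`, `send_message`, `handle_tuple`) are hypothetical, and the `{"command": "sync"}` message followed by an `end` line is assumed from the rest of the multi-lang protocol document rather than from this diff.

```python
import json
import sys

# Stream name and task id used by heartbeat tuples, per the example in the patch.
HEARTBEAT_STREAM = "__heartbeat"
HEARTBEAT_TASK = -1


def is_heartbeat(tup):
    """Return True if a decoded tuple message is a ShellBolt heartbeat."""
    return (tup.get("task") == HEARTBEAT_TASK
            and tup.get("stream") == HEARTBEAT_STREAM)


def send_message(msg, out=sys.stdout):
    """Write one JSON message followed by an 'end' line (assumed framing)."""
    out.write(json.dumps(msg))
    out.write("\nend\n")
    out.flush()


def handle_tuple(tup, process, out=sys.stdout):
    """Answer heartbeats with 'sync'; pass every other tuple to `process`.

    Returns True if the tuple was a heartbeat, False otherwise.
    """
    if is_heartbeat(tup):
        send_message({"command": "sync"}, out)
        return True
    process(tup)
    return False
```

A bolt's read loop would call `handle_tuple` for each incoming message, so that heartbeats are acknowledged promptly and never reach the user's processing function.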