Posted to commits@storm.apache.org by bo...@apache.org on 2017/05/15 21:54:52 UTC

[1/4] storm git commit: STORM-2493: update documents to reflect the changes

Repository: storm
Updated Branches:
  refs/heads/master 7742398d0 -> d4ee957a9


STORM-2493: update documents to reflect the changes


Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/e80f9e20
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/e80f9e20
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/e80f9e20

Branch: refs/heads/master
Commit: e80f9e20db208555753b93f48c4af175dbe47c6a
Parents: 500691d
Author: vesense <be...@163.com>
Authored: Thu Apr 27 12:40:57 2017 +0800
Committer: vesense <be...@163.com>
Committed: Thu Apr 27 12:40:57 2017 +0800

----------------------------------------------------------------------
 docs/Acking-framework-implementation.md | 11 ++--
 docs/DSLs-and-multilang-adapters.md     |  3 +-
 docs/Implementation-docs.md             |  1 +
 docs/index.md                           |  6 +-
 docs/storm-pmml.md                      | 37 +++++++++++
 docs/storm-rocketmq.md                  | 94 ++++++++++++++++++++++++++++
 6 files changed, 145 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/e80f9e20/docs/Acking-framework-implementation.md
----------------------------------------------------------------------
diff --git a/docs/Acking-framework-implementation.md b/docs/Acking-framework-implementation.md
index 5ca5d93..f181e98 100644
--- a/docs/Acking-framework-implementation.md
+++ b/docs/Acking-framework-implementation.md
@@ -1,13 +1,16 @@
 ---
+title: Acking framework implementation
 layout: documentation
+documentation: true
 ---
-[Storm's acker](https://github.com/apache/incubator-storm/blob/46c3ba7/storm-core/src/clj/backtype/storm/daemon/acker.clj#L28) tracks completion of each tupletree with a checksum hash: each time a tuple is sent, its value is XORed into the checksum, and each time a tuple is acked its value is XORed in again. If all tuples have been successfully acked, the checksum will be zero (the odds that the checksum will be zero otherwise are vanishingly small).
+
+[Storm's acker]({{page.git-blob-base}}/storm-client/src/jvm/org/apache/storm/daemon/Acker.java) tracks completion of each tupletree with a checksum hash: each time a tuple is sent, its value is XORed into the checksum, and each time a tuple is acked its value is XORed in again. If all tuples have been successfully acked, the checksum will be zero (the odds that the checksum will be zero otherwise are vanishingly small).
 
 You can read a bit more about the [reliability mechanism](Guaranteeing-message-processing.html#what-is-storms-reliability-api) elsewhere on the wiki -- this explains the internal details.
 
 ### acker `execute()`
 
-The acker is actually a regular bolt, with its  [execute method](https://github.com/apache/incubator-storm/blob/46c3ba7/storm-core/src/clj/backtype/storm/daemon/acker.clj#L36) defined withing `mk-acker-bolt`.  When a new tupletree is born, the spout sends the XORed edge-ids of each tuple recipient, which the acker records in its `pending` ledger. Every time an executor acks a tuple, the acker receives a partial checksum that is the XOR of the tuple's own edge-id (clearing it from the ledger) and the edge-id of each downstream tuple the executor emitted (thus entering them into the ledger).
+The acker is actually a regular bolt.  When a new tupletree is born, the spout sends the XORed edge-ids of each tuple recipient, which the acker records in its `pending` ledger. Every time an executor acks a tuple, the acker receives a partial checksum that is the XOR of the tuple's own edge-id (clearing it from the ledger) and the edge-id of each downstream tuple the executor emitted (thus entering them into the ledger).
 
 This is accomplished as follows.
 
@@ -17,7 +20,7 @@ On a tick tuple, just advance pending tupletree checksums towards death and retu
 * on ack:  xor the partial checksum into the existing checksum value
 * on fail: just mark it as failed
 
-Next, [put the record](https://github.com/apache/incubator-storm/blob/46c3ba7/storm-core/src/clj/backtype/storm/daemon/acker.clj#L50)),  into the RotatingMap (thus resetting is countdown to expiry) and take action:
+Next, put the record into the RotatingMap (thus resetting its countdown to expiry) and take action:
 
 * if the total checksum is zero, the tupletree is complete: remove it from the pending collection and notify the spout of success
 * if the tupletree has failed, it is also complete:   remove it from the pending collection and notify the spout of failure
@@ -26,7 +29,7 @@ Finally, pass on an ack of our own.
 
 ### Pending tuples and the `RotatingMap`
 
-The acker stores pending tuples in a [`RotatingMap`](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/utils/RotatingMap.java#L19), a simple device used in several places within Storm to efficiently time-expire a process.
+The acker stores pending tuples in a [`RotatingMap`]({{page.git-blob-base}}/storm-client/src/jvm/org/apache/storm/utils/RotatingMap.java), a simple device used in several places within Storm to efficiently time-expire a process.
 
 The RotatingMap behaves as a HashMap, and offers the same O(1) access guarantees.
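The XOR bookkeeping the acker performs can be illustrated with a small self-contained sketch. The edge ids below are invented toy values and this is not the real `Acker` implementation, only the arithmetic it relies on:

```java
public class AckerChecksumSketch {
    /** XOR a sequence of edge-id updates into a running checksum. */
    static long xorAll(long... edgeIds) {
        long checksum = 0;
        for (long id : edgeIds) {
            checksum ^= id;
        }
        return checksum;
    }

    public static void main(String[] args) {
        long root = 0xBEEFL, childA = 0xCAFEL, childB = 0xF00DL;

        // Spout emits the root tuple; a bolt acks it while emitting two
        // children; leaf bolts then ack each child. Every edge id enters the
        // checksum exactly twice, so a fully acked tree collapses to zero.
        long checksum = xorAll(
                root,                    // spout: root edge created
                root ^ childA ^ childB,  // bolt: ack root, emit childA/childB
                childA,                  // leaf: ack childA
                childB);                 // leaf: ack childB

        System.out.println(checksum == 0);  // prints "true"
    }
}
```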
 

http://git-wip-us.apache.org/repos/asf/storm/blob/e80f9e20/docs/DSLs-and-multilang-adapters.md
----------------------------------------------------------------------
diff --git a/docs/DSLs-and-multilang-adapters.md b/docs/DSLs-and-multilang-adapters.md
index 0ed5450..917b419 100644
--- a/docs/DSLs-and-multilang-adapters.md
+++ b/docs/DSLs-and-multilang-adapters.md
@@ -3,8 +3,9 @@ title: Storm DSLs and Multi-Lang Adapters
 layout: documentation
 documentation: true
 ---
+* [Clojure DSL](Clojure-DSL.html)
 * [Scala DSL](https://github.com/velvia/ScalaStorm)
 * [JRuby DSL](https://github.com/colinsurprenant/redstorm)
-* [Clojure DSL](Clojure-DSL.html)
 * [Storm/Esper integration](https://github.com/tomdz/storm-esper): Streaming SQL on top of Storm
 * [io-storm](https://github.com/dan-blanchard/io-storm): Perl multilang adapter
+* [FsShelter](https://github.com/Prolucid/FsShelter): F# DSL and runtime with protobuf multilang

http://git-wip-us.apache.org/repos/asf/storm/blob/e80f9e20/docs/Implementation-docs.md
----------------------------------------------------------------------
diff --git a/docs/Implementation-docs.md b/docs/Implementation-docs.md
index 9eb91f5..93459f3 100644
--- a/docs/Implementation-docs.md
+++ b/docs/Implementation-docs.md
@@ -8,6 +8,7 @@ This section of the wiki is dedicated to explaining how Storm is implemented. Yo
 - [Structure of the codebase](Structure-of-the-codebase.html)
 - [Lifecycle of a topology](Lifecycle-of-a-topology.html)
 - [Message passing implementation](Message-passing-implementation.html)
+- [Acking framework implementation](Acking-framework-implementation.html)
 - [Metrics](Metrics.html)
 - [Nimbus HA](nimbus-ha-design.html)
 - [Storm SQL](storm-sql-internal.html)

http://git-wip-us.apache.org/repos/asf/storm/blob/e80f9e20/docs/index.md
----------------------------------------------------------------------
diff --git a/docs/index.md b/docs/index.md
index 64cd15d..1b5c621 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -71,7 +71,7 @@ But small change will not affect the user experience. We will notify the user wh
 
 * [Serialization](Serialization.html)
 * [Common patterns](Common-patterns.html)
-* [Clojure DSL](Clojure-DSL.html)
+* [DSLs and multilang adapters](DSLs-and-multilang-adapters.html)
 * [Using non-JVM languages with Storm](Using-non-JVM-languages-with-Storm.html)
 * [Distributed RPC](Distributed-RPC.html)
 * [Transactional topologies](Transactional-topologies.html)
@@ -95,16 +95,18 @@ But small change will not affect the user experience. We will notify the user wh
 * [Apache Hive Integration](storm-hive.html)
 * [Apache Solr Integration](storm-solr.html)
 * [Apache Cassandra Integration](storm-cassandra.html)
+* [Apache RocketMQ Integration](storm-rocketmq.html)
 * [JDBC Integration](storm-jdbc.html)
 * [JMS Integration](storm-jms.html)
+* [MQTT Integration](storm-mqtt.html)
 * [Redis Integration](storm-redis.html)
 * [Event Hubs Integration](storm-eventhubs.html)
 * [Elasticsearch Integration](storm-elasticsearch.html)
-* [MQTT Integration](storm-mqtt.html)
 * [Mongodb Integration](storm-mongodb.html)
 * [OpenTSDB Integration](storm-opentsdb.html)
 * [Kinesis Integration](storm-kinesis.html)
 * [Druid Integration](storm-druid.html)
+* [PMML Integration](storm-pmml.html)
 * [Kestrel Integration](Kestrel-and-Storm.html)
 
 #### Container, Resource Management System Integration

http://git-wip-us.apache.org/repos/asf/storm/blob/e80f9e20/docs/storm-pmml.md
----------------------------------------------------------------------
diff --git a/docs/storm-pmml.md b/docs/storm-pmml.md
new file mode 100644
index 0000000..ad2d5b8
--- /dev/null
+++ b/docs/storm-pmml.md
@@ -0,0 +1,37 @@
+#Storm PMML Bolt
+ Storm integration to load PMML models and compute predictive scores for running tuples. The PMML model represents
+ the machine learning (predictive) model used to do prediction on raw input data. The model is typically loaded into a 
+ runtime environment, which will score the raw data that comes in the tuples. 
+
+#Create Instance of PMML Bolt
+ To create an instance of the `PMMLPredictorBolt`, you must provide the `ModelOutputs`, and a `ModelRunner` using a 
+ `ModelRunnerFactory`. The `ModelOutputs` represents the streams and output fields declared by the `PMMLPredictorBolt`.
+ The `ModelRunner` represents the runtime environment to execute the predictive scoring. It has only one method: 
+ 
+ ```java
+    Map<String, List<Object>> scoredTuplePerStream(Tuple input); 
+ ```
+ 
+ This method contains the logic to compute the scored tuples from the raw input tuple. It is up to the
+ implementation to decide which scored values are assigned to each stream. The keys of this map are the stream ids,
+ and the values are the predicted scores.
+   
+ The `PmmlModelRunner` is an extension of `ModelRunner` that represents the typical steps involved 
+ in predictive scoring. Hence, it allows for the **extraction** of raw inputs from the tuple, **pre process** the 
+ raw inputs, and **predict** the scores from the preprocessed data.
+ 
+ The `JPmmlModelRunner` is an implementation of `PmmlModelRunner` that uses [JPMML](https://github.com/jpmml/jpmml) as
+ runtime environment. This implementation extracts the raw inputs from the tuple for all `active fields`, 
+ and builds a tuple with the predicted scores for the `predicted fields` and `output fields`. 
+ In this implementation all the declared streams will have the same scored tuple.
+ 
+ The `predicted`, `active`, and `output` fields are extracted from the PMML model.
+
+#Run Bundled Examples
+
+To run the examples you must execute the following command:
+ 
+ ```bash
+ STORM-HOME/bin/storm jar STORM-HOME/examples/storm-pmml-examples/storm-pmml-examples-2.0.0-SNAPSHOT.jar \
+ org.apache.storm.pmml.JpmmlRunnerTestTopology jpmmlTopology PMMLModel.xml RawInputData.csv
+ ```
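To make the `ModelRunner` contract above concrete, here is a hypothetical implementation sketch. `Tuple` is a minimal stand-in interface so the snippet stays self-contained, and `ThresholdModelRunner` is an invented toy scorer, not part of storm-pmml:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for org.apache.storm.tuple.Tuple, reduced to the one method used here.
interface Tuple {
    Object getValueByField(String field);
}

interface ModelRunner {
    Map<String, List<Object>> scoredTuplePerStream(Tuple input);
}

// Toy runner: "scores" by thresholding one raw field and assigns the scored
// tuple to a single stream, mirroring how JPmmlModelRunner assigns the same
// scored tuple to all declared streams.
class ThresholdModelRunner implements ModelRunner {
    @Override
    public Map<String, List<Object>> scoredTuplePerStream(Tuple input) {
        double raw = (Double) input.getValueByField("feature");
        String predicted = raw > 0.5 ? "yes" : "no";  // stand-in for PMML scoring
        Map<String, List<Object>> scoredPerStream = new HashMap<>();
        scoredPerStream.put("default", Collections.<Object>singletonList(predicted));
        return scoredPerStream;
    }
}
```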

http://git-wip-us.apache.org/repos/asf/storm/blob/e80f9e20/docs/storm-rocketmq.md
----------------------------------------------------------------------
diff --git a/docs/storm-rocketmq.md b/docs/storm-rocketmq.md
new file mode 100644
index 0000000..17daf8c
--- /dev/null
+++ b/docs/storm-rocketmq.md
@@ -0,0 +1,94 @@
+# Storm RocketMQ
+
+Storm/Trident integration for [RocketMQ](https://rocketmq.incubator.apache.org/). This package includes the core spout, bolt, and Trident states that allow a Storm topology to write Storm tuples into a topic or to read from topics.
+
+
+## Read from Topic
+This package includes a spout for reading data from a topic.
+
+### RocketMQSpout
+To use the `RocketMQSpout`, you construct an instance of it by specifying a Properties instance containing the RocketMQ configs.
+RocketMQSpout uses the RocketMQ MQPushConsumer as the default implementation. PushConsumer is a high-level consumer API that wraps the pulling details, so it looks as if the broker pushes messages to the consumer.
+RocketMQSpout will retry failed messages 3 times (use `SpoutConfig.DEFAULT_MESSAGES_MAX_RETRY` to change the value).
+
+ ```java
+        Properties properties = new Properties();
+        properties.setProperty(SpoutConfig.NAME_SERVER_ADDR, nameserverAddr);
+        properties.setProperty(SpoutConfig.CONSUMER_GROUP, group);
+        properties.setProperty(SpoutConfig.CONSUMER_TOPIC, topic);
+
+        RocketMQSpout spout = new RocketMQSpout(properties);
+ ```
+
+
+## Write into Topic
+This package includes a bolt and a Trident state for writing data into a topic.
+
+### TupleToMessageMapper
+The main API for mapping Storm tuple to a RocketMQ Message is the `org.apache.storm.rocketmq.common.mapper.TupleToMessageMapper` interface:
+
+```java
+public interface TupleToMessageMapper extends Serializable {
+    String getKeyFromTuple(ITuple tuple);
+    byte[] getValueFromTuple(ITuple tuple);
+}
+```
+
+### FieldNameBasedTupleToMessageMapper
+`storm-rocketmq` includes a general purpose `TupleToMessageMapper` implementation called `FieldNameBasedTupleToMessageMapper`.
+
+### TopicSelector
+The main API for selecting topic and tags is the `org.apache.storm.rocketmq.common.selector.TopicSelector` interface:
+
+```java
+public interface TopicSelector extends Serializable {
+    String getTopic(ITuple tuple);
+    String getTag(ITuple tuple);
+}
+```
+
+### DefaultTopicSelector/FieldNameBasedTopicSelector
+`storm-rocketmq` includes general purpose `TopicSelector` implementations called `DefaultTopicSelector` and `FieldNameBasedTopicSelector`.
+
+
+### RocketMQBolt
+To use the `RocketMQBolt`, you construct an instance of it by specifying TupleToMessageMapper, TopicSelector, and Properties instances.
+RocketMQBolt sends messages asynchronously by default. You can change this by invoking `withAsync(false)`.
+
+ ```java
+        TupleToMessageMapper mapper = new FieldNameBasedTupleToMessageMapper("word", "count");
+        TopicSelector selector = new DefaultTopicSelector(topic);
+
+        properties = new Properties();
+        properties.setProperty(RocketMQConfig.NAME_SERVER_ADDR, nameserverAddr);
+
+        RocketMQBolt insertBolt = new RocketMQBolt()
+                .withMapper(mapper)
+                .withSelector(selector)
+                .withProperties(properties);
+ ```
+
+### Trident State
+We support a Trident persistent state that can be used with Trident topologies. To create a RocketMQ persistent Trident state you need to initialize it with TupleToMessageMapper, TopicSelector, and Properties instances. See the example below:
+
+ ```java
+        TupleToMessageMapper mapper = new FieldNameBasedTupleToMessageMapper("word", "count");
+        TopicSelector selector = new DefaultTopicSelector(topic);
+
+        Properties properties = new Properties();
+        properties.setProperty(RocketMQConfig.NAME_SERVER_ADDR, nameserverAddr);
+
+        RocketMQState.Options options = new RocketMQState.Options()
+                .withMapper(mapper)
+                .withSelector(selector)
+                .withProperties(properties);
+
+        StateFactory factory = new RocketMQStateFactory(options);
+
+        TridentTopology topology = new TridentTopology();
+        Stream stream = topology.newStream("spout1", spout);
+
+        stream.partitionPersist(factory, fields,
+                new RocketMQStateUpdater(), new Fields());
+ ```
+
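As a rough illustration of the mapper contract above, a field-name-based implementation could look like the following sketch. `ITuple` is a minimal stand-in interface here so the example is self-contained, and `FieldNameBasedMapperSketch` is an invented name; the real `FieldNameBasedTupleToMessageMapper` ships with storm-rocketmq:

```java
import java.nio.charset.StandardCharsets;

// Stand-in for org.apache.storm.tuple.ITuple, reduced to the one method used here.
interface ITuple {
    Object getValueByField(String field);
}

interface TupleToMessageMapper extends java.io.Serializable {
    String getKeyFromTuple(ITuple tuple);
    byte[] getValueFromTuple(ITuple tuple);
}

// Pulls the message key and body out of two named tuple fields, which is
// what a field-name-based mapper would plausibly do.
class FieldNameBasedMapperSketch implements TupleToMessageMapper {
    private final String keyField;
    private final String valueField;

    FieldNameBasedMapperSketch(String keyField, String valueField) {
        this.keyField = keyField;
        this.valueField = valueField;
    }

    @Override
    public String getKeyFromTuple(ITuple tuple) {
        return String.valueOf(tuple.getValueByField(keyField));
    }

    @Override
    public byte[] getValueFromTuple(ITuple tuple) {
        return String.valueOf(tuple.getValueByField(valueField))
                     .getBytes(StandardCharsets.UTF_8);
    }
}
```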


[4/4] storm git commit: Added STORM-2493 to Changelog

Posted by bo...@apache.org.
Added STORM-2493 to Changelog


Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/d4ee957a
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/d4ee957a
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/d4ee957a

Branch: refs/heads/master
Commit: d4ee957a9d677f7ea2ec684915ea88f7717f5d6e
Parents: 0eb365a
Author: Robert Evans <ev...@yahoo-inc.com>
Authored: Mon May 15 16:46:41 2017 -0500
Committer: Robert Evans <ev...@yahoo-inc.com>
Committed: Mon May 15 16:46:41 2017 -0500

----------------------------------------------------------------------
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/d4ee957a/CHANGELOG.md
----------------------------------------------------------------------
diff --git a/CHANGELOG.md b/CHANGELOG.md
index e5cc149..c39297c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,5 @@
 ## 2.0.0
+ * STORM-2493: update documents to reflect the changes
  * STORM-2511: Submitting a topology with name containing unicode getting failed
  * STORM-2510: update checkstyle configuration to lower violations
  * STORM-2479: Fix port assignment race condition in storm-webapp tests


[3/4] storm git commit: Merge branch 'STORM-2493-documents' of https://github.com/vesense/storm into STORM-2493

Posted by bo...@apache.org.
Merge branch 'STORM-2493-documents' of https://github.com/vesense/storm into STORM-2493

STORM-2493: update documents to reflect the changes


Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/0eb365ab
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/0eb365ab
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/0eb365ab

Branch: refs/heads/master
Commit: 0eb365abff0ac3c3564266c14172dc9a21b5fea8
Parents: 7742398 98f679d
Author: Robert Evans <ev...@yahoo-inc.com>
Authored: Mon May 15 16:46:14 2017 -0500
Committer: Robert Evans <ev...@yahoo-inc.com>
Committed: Mon May 15 16:46:14 2017 -0500

----------------------------------------------------------------------
 docs/Acking-framework-implementation.md | 11 ++--
 docs/DSLs-and-multilang-adapters.md     |  3 +-
 docs/Implementation-docs.md             |  1 +
 docs/index.md                           |  6 +-
 docs/storm-pmml.md                      | 37 +++++++++++
 docs/storm-rocketmq.md                  | 94 ++++++++++++++++++++++++++++
 external/storm-pmml/README.md           |  6 +-
 7 files changed, 148 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/0eb365ab/docs/index.md
----------------------------------------------------------------------


[2/4] storm git commit: STORM-2493: fix spacing for storm-pmml.md

Posted by bo...@apache.org.
STORM-2493: fix spacing for storm-pmml.md


Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/98f679db
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/98f679db
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/98f679db

Branch: refs/heads/master
Commit: 98f679db33f143346bf4bd52f078ba83012bfa0e
Parents: e80f9e2
Author: vesense <be...@163.com>
Authored: Fri Apr 28 16:33:31 2017 +0800
Committer: vesense <be...@163.com>
Committed: Fri Apr 28 16:33:31 2017 +0800

----------------------------------------------------------------------
 docs/storm-pmml.md            | 6 +++---
 external/storm-pmml/README.md | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/98f679db/docs/storm-pmml.md
----------------------------------------------------------------------
diff --git a/docs/storm-pmml.md b/docs/storm-pmml.md
index ad2d5b8..5051489 100644
--- a/docs/storm-pmml.md
+++ b/docs/storm-pmml.md
@@ -1,9 +1,9 @@
-#Storm PMML Bolt
+# Storm PMML Bolt
  Storm integration to load PMML models and compute predictive scores for running tuples. The PMML model represents
  the machine learning (predictive) model used to do prediction on raw input data. The model is typically loaded into a 
  runtime environment, which will score the raw data that comes in the tuples. 
 
-#Create Instance of PMML Bolt
+## Create Instance of PMML Bolt
  To create an instance of the `PMMLPredictorBolt`, you must provide the `ModelOutputs`, and a `ModelRunner` using a 
  `ModelRunnerFactory`. The `ModelOutputs` represents the streams and output fields declared by the `PMMLPredictorBolt`.
  The `ModelRunner` represents the runtime environment to execute the predictive scoring. It has only one method: 
@@ -27,7 +27,7 @@
  
  The `predicted`, `active`, and `output` fields are extracted from the PMML model.
 
-#Run Bundled Examples
+## Run Bundled Examples
 
 To run the examples you must execute the following command:
  

http://git-wip-us.apache.org/repos/asf/storm/blob/98f679db/external/storm-pmml/README.md
----------------------------------------------------------------------
diff --git a/external/storm-pmml/README.md b/external/storm-pmml/README.md
index e4a5d0d..d237f55 100644
--- a/external/storm-pmml/README.md
+++ b/external/storm-pmml/README.md
@@ -1,9 +1,9 @@
-#Storm PMML Bolt
+# Storm PMML Bolt
  Storm integration to load PMML models and compute predictive scores for running tuples. The PMML model represents
  the machine learning (predictive) model used to do prediction on raw input data. The model is typically loaded into a 
  runtime environment, which will score the raw data that comes in the tuples. 
 
-#Create Instance of PMML Bolt
+## Create Instance of PMML Bolt
  To create an instance of the `PMMLPredictorBolt`, you must provide the `ModelOutputs`, and a `ModelRunner` using a 
  `ModelRunnerFactory`. The `ModelOutputs` represents the streams and output fields declared by the `PMMLPredictorBolt`.
  The `ModelRunner` represents the runtime environment to execute the predictive scoring. It has only one method: 
@@ -27,7 +27,7 @@
  
  The `predicted`, `active`, and `output` fields are extracted from the PMML model.
 
-#Run Bundled Examples
+## Run Bundled Examples
 
 To run the examples you must execute the following command: