Posted to dev@storm.apache.org by arunmahadevan <gi...@git.apache.org> on 2017/02/20 16:23:15 UTC

[GitHub] storm pull request #1945: [STORM-2367] Documentation for streams API

GitHub user arunmahadevan opened a pull request:

    https://github.com/apache/storm/pull/1945

    [STORM-2367] Documentation for streams API

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/arunmahadevan/storm STORM-2367

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1945.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1945
    
----
commit 503ff5822b851dff5ea24efc3922a56aec67b5bf
Author: Arun Mahadevan <ar...@apache.org>
Date:   2017-02-09T13:40:08Z

    [STORM-2367] Documentation for streams API

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1945: [STORM-2367] Documentation for streams API

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1945
  
    +1 Nice explanation.
    
    Since it's documentation, you don't need a +1, and you don't even need to wait 24 hours. Or we can wait for more eyes. It's up to you.



[GitHub] storm issue #1945: [STORM-2367] Documentation for streams API

Posted by satishd <gi...@git.apache.org>.
Github user satishd commented on the issue:

    https://github.com/apache/storm/pull/1945
  
    @HeartSaVioR It would be better to wait 1-2 days to give others a chance to review, since there is no urgency to push this quickly for a release. :)
    I will review this PR today.



[GitHub] storm pull request #1945: [STORM-2367] Documentation for streams API

Posted by satishd <gi...@git.apache.org>.
Github user satishd commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1945#discussion_r102142501
  
    --- Diff: docs/Stream-API.md ---
    @@ -0,0 +1,473 @@
    +---
    +title: Stream API Overview
    +layout: documentation
    +documentation: true
    +---
    +
    +* [Concepts](#concepts)
    +    * [Stream Builder] (#streambuilder)
    +    * [Value mapper] (#valuemapper)
    +* [Stream APIs](#streamapis)
    +    * [Basic transformations] (#basictransformations)
    +        * [filter] (#filter)
    +        * [map] (#map)
    +        * [flatmap] (#flatmap)
    +    * [Windowing] (#windowing)
    +    * [Transformation to key-value pairs] (#keyvaluepairs)
    +        * [mapToPair] (#mapflatmaptopair)
    +        * [flatMapToPair] (#mapflatmaptopair)
    +    * [Aggregations] (#aggregations)
    +        * [aggregate] (#aggregatereduce)
    +        * [reduce] (#aggregatereduce)
    +        * [aggregateByKey] (#aggregatereducebykey)
    +        * [reduceByKey] (#aggregatereducebykey)
    +        * [groupByKey] (#groupbykey)
    +        * [countByKey] (#countbykey)
    +    * [Repartition](#repartition)
    +    * [Output operations](#outputoperations)
    +        * [print](#print)
    +        * [peek](#peek)
    +        * [forEach](#foreach)
    +        * [to](#to)
    +    * [Branch](#branching)
    +    * [Joins](#joins)
    +    * [State](#state)
    +        * [updateStateByKey](#updatestatebykey)
    +        * [stateQuery](#statequery)
    +* [Guarantees](#guarantees)        
    +* [Example](#example)
    +
    +Historically Storm provided Spout and Bolt apis for expressing streaming computations. Though these apis are fairly simple to use, 
    +there are no reusable constructs for expressing common streaming operations like filtering, transformations, windowing, joins, 
    +aggregations and so on.
    +
    +Stream APIs build on top of the Storm's spouts and bolts to provide a typed API for expressing streaming computations and supports functional style operations such as map-reduce. 
    +
    +# <a name="concepts"></a> Concepts 
    +
    +Conceptually a `Stream` can be thought of as a stream of messages flowing through a pipeline. A `Stream` may be generated by reading messages out of a source like spout, or by transforming other streams. For example,
    +
    +```java
    +// imports
    +import org.apache.storm.streams.Stream;
    +import org.apache.storm.streams.StreamBuilder;
    +...
    +
    +StreamBuilder builder = new StreamBuilder();
    +
    +// a stream of sentences obtained from a source spout
    +Stream<String> sentences = builder.newStream(new RandomSentenceSpout()).map(tuple -> tuple.getString(0));
    +
    +// a stream of words obtained by transforming (splitting) the stream of sentences
    +Stream<String> words = sentences.flatMap(s -> Arrays.asList(s.split(" ")));
    +
    +// output operation that prints the words to console
    +words.forEach(w -> System.out.println(w));
    +```
    +
    +
    +Most stream operations accept parameters that describe user-specified behavior typically via lambda expressions like `s -> Arrays.asList(s.split(" "))` as in the above example.
    +
    +A `Stream` supports two kinds of operations, 
    +
    +1. **Transformations** that produce another stream from the current stream  (like the `flatMap` operation in the example above) 
    +1. **Output operations** that produce a result. (like the `forEach` operation in the example above).
    +
    +## <a name="streambuilder"></a> Stream Builder
    +
    +`StreamBuilder` provides the builder apis to create a new stream. Typically a spout forms the source of a stream.
    +
    +```java
    +StreamBuilder builder = new StreamBuilder();
    +Stream<Tuple> sentences = builder.newStream(new TestSentenceSpout());
    +```
    +
    +The `StreamBuilder` tracks the overall pipeline of operations expressed via the Stream. One can then create the Storm topology
    +via `build()` and submit it like a normal storm topology via `StormSubmitter`.
    +
    +```java
    +StormSubmitter.submitTopologyWithProgressBar("test", new Config(), streamBuilder.build());
    +```
    +
    +## <a name="valuemapper"></a> Value mapper
    +
    +Value mappers can be used to extract specific fields from the tuples emitted from a spout to produce a typed stream of values. Value mappers are passed as arguments to the `StreamBuilder.newStream`.
    +
    +```java
    +StreamBuilder builder = new StreamBuilder();
    +
    +// extract the first field from the tuple to get a Stream<String> of sentences
    +Stream<String> sentences = builder.newStream(new TestWordSpout(), new ValueMapper<String>(0));
    +```
    +
    +Storm provides strongly typed tuples via the `Pair` and Tuple classes (Tuple3 upto Tuple10). One can use a `TupleValueMapper` to produce a stream of typed tuples as shown below.
    +
    +```java
    +// extract first three fields of the tuple emitted by the spout to produce a stream of typed tuples.
    +Stream<Tuple3<String, Integer, Long>> stream = builder.newStream(new TestSpout(), TupleValueMappers.of(0, 1, 2));
    +```
    +
    +# <a name="streamapis"></a> Stream APIs 
    +
    +Storm's streaming apis currently support a wide range of operations such as transformations, filters, windowing, aggregations, branching, joins, stateful, output and debugging operations.  
    +
    +## <a name="basictransformations"></a> Basic transformations 
    +
    +### <a name="filter"></a> filter
    +
    +`filter` returns a stream consisting of the elements of the stream that matches the given `Predicate` (for which the predicate returns true). 
    +
    +```java
    +Stream<String> logs = ...
    +Stream<String> errors = logs.filter(line -> line.contains("ERROR"));
    +```
    +
    +In the above example log lines with 'ERROR' are filtered into an error stream which can be then be further processed.
    +
    +### <a name="map"></a> map 
    +
    +`map` returns a stream consisting of the result of applying the given mapping function to the values of the stream.
    +
    +```java
    +Stream<String> words = ...
    +Stream<Integer> wordLengths = words.map(String::length);
    +```
    +
    +The example generates a stream of word lengths from a stream of words by applying the String.length function on each value. Note that the type of the resultant stream of a map operation can be different from that of the original stream.  
    +
    +### <a name="flatmap"></a> flatMap
    +
    +`flatMap` returns a stream consisting of the results of replacing each value of the stream with the contents produced by applying the provided mapping function to each value. This is similar to map but each value can be mapped to 0 or more values.
    +
    +```java
    +Stream<String> sentences = ...
    +Stream<String> words = sentences.flatMap(s -> Arrays.asList(s.split(" ")));
    +```
    +
    +
    +In the above example, the lambda function splits each value in the stream to a list of words and the flatMap function generates a flattened stream of words out of it.
    +
    +## <a name="windowing"></a> Windowing
    +
    +A `window` operation produces a windowed stream consisting of the elements that fall within the window as specified by the window parameter. All the windowing options supported in the underlying windowed bolts is supported via the Stream apis.
    --- End diff --
    
    All the windowing options supported in the underlying windowed bolts ~~is~~ are supported via the Stream apis.



[GitHub] storm pull request #1945: [STORM-2367] Documentation for streams API

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1945#discussion_r102196464
  
    --- Diff: docs/Stream-API.md ---
    @@ -0,0 +1,473 @@
    +---
    +title: Stream API Overview
    +layout: documentation
    +documentation: true
    +---
    +
    +* [Concepts](#concepts)
    +    * [Stream Builder] (#streambuilder)
    +    * [Value mapper] (#valuemapper)
    +* [Stream APIs](#streamapis)
    +    * [Basic transformations] (#basictransformations)
    +        * [filter] (#filter)
    +        * [map] (#map)
    +        * [flatmap] (#flatmap)
    +    * [Windowing] (#windowing)
    +    * [Transformation to key-value pairs] (#keyvaluepairs)
    +        * [mapToPair] (#mapflatmaptopair)
    +        * [flatMapToPair] (#mapflatmaptopair)
    +    * [Aggregations] (#aggregations)
    +        * [aggregate] (#aggregatereduce)
    +        * [reduce] (#aggregatereduce)
    +        * [aggregateByKey] (#aggregatereducebykey)
    +        * [reduceByKey] (#aggregatereducebykey)
    +        * [groupByKey] (#groupbykey)
    +        * [countByKey] (#countbykey)
    +    * [Repartition](#repartition)
    +    * [Output operations](#outputoperations)
    +        * [print](#print)
    +        * [peek](#peek)
    +        * [forEach](#foreach)
    +        * [to](#to)
    +    * [Branch](#branching)
    +    * [Joins](#joins)
    +    * [State](#state)
    +        * [updateStateByKey](#updatestatebykey)
    +        * [stateQuery](#statequery)
    +* [Guarantees](#guarantees)        
    +* [Example](#example)
    +
    +Historically Storm provided Spout and Bolt apis for expressing streaming computations. Though these apis are fairly simple to use, 
    +there are no reusable constructs for expressing common streaming operations like filtering, transformations, windowing, joins, 
    +aggregations and so on.
    +
    +Stream APIs build on top of the Storm's spouts and bolts to provide a typed API for expressing streaming computations and supports functional style operations such as map-reduce. 
    +
    +# <a name="concepts"></a> Concepts 
    +
    +Conceptually a `Stream` can be thought of as a stream of messages flowing through a pipeline. A `Stream` may be generated by reading messages out of a source like spout, or by transforming other streams. For example,
    +
    +```java
    +// imports
    +import org.apache.storm.streams.Stream;
    +import org.apache.storm.streams.StreamBuilder;
    +...
    +
    +StreamBuilder builder = new StreamBuilder();
    +
    +// a stream of sentences obtained from a source spout
    +Stream<String> sentences = builder.newStream(new RandomSentenceSpout()).map(tuple -> tuple.getString(0));
    +
    +// a stream of words obtained by transforming (splitting) the stream of sentences
    +Stream<String> words = sentences.flatMap(s -> Arrays.asList(s.split(" ")));
    +
    +// output operation that prints the words to console
    +words.forEach(w -> System.out.println(w));
    +```
    +
    +
    +Most stream operations accept parameters that describe user-specified behavior typically via lambda expressions like `s -> Arrays.asList(s.split(" "))` as in the above example.
    +
    +A `Stream` supports two kinds of operations, 
    +
    +1. **Transformations** that produce another stream from the current stream  (like the `flatMap` operation in the example above) 
    +1. **Output operations** that produce a result. (like the `forEach` operation in the example above).
    +
    +## <a name="streambuilder"></a> Stream Builder
    +
    +`StreamBuilder` provides the builder apis to create a new stream. Typically a spout forms the source of a stream.
    +
    +```java
    +StreamBuilder builder = new StreamBuilder();
    +Stream<Tuple> sentences = builder.newStream(new TestSentenceSpout());
    +```
    +
    +The `StreamBuilder` tracks the overall pipeline of operations expressed via the Stream. One can then create the Storm topology
    +via `build()` and submit it like a normal storm topology via `StormSubmitter`.
    +
    +```java
    +StormSubmitter.submitTopologyWithProgressBar("test", new Config(), streamBuilder.build());
    +```
    +
    +## <a name="valuemapper"></a> Value mapper
    +
    +Value mappers can be used to extract specific fields from the tuples emitted from a spout to produce a typed stream of values. Value mappers are passed as arguments to the `StreamBuilder.newStream`.
    +
    +```java
    +StreamBuilder builder = new StreamBuilder();
    +
    +// extract the first field from the tuple to get a Stream<String> of sentences
    +Stream<String> sentences = builder.newStream(new TestWordSpout(), new ValueMapper<String>(0));
    +```
    +
    +Storm provides strongly typed tuples via the `Pair` and Tuple classes (Tuple3 upto Tuple10). One can use a `TupleValueMapper` to produce a stream of typed tuples as shown below.
    +
    +```java
    +// extract first three fields of the tuple emitted by the spout to produce a stream of typed tuples.
    +Stream<Tuple3<String, Integer, Long>> stream = builder.newStream(new TestSpout(), TupleValueMappers.of(0, 1, 2));
    +```
    +
    +# <a name="streamapis"></a> Stream APIs 
    +
    +Storm's streaming apis currently support a wide range of operations such as transformations, filters, windowing, aggregations, branching, joins, stateful, output and debugging operations.  
    +
    --- End diff --
    
    Added a reference to the Stream/PairStream Java files. Maybe the Javadoc is not generated automatically; I am not able to find the link.



[GitHub] storm pull request #1945: [STORM-2367] Documentation for streams API

Posted by satishd <gi...@git.apache.org>.
Github user satishd commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1945#discussion_r102143718
  
    --- Diff: docs/Stream-API.md ---
    @@ -0,0 +1,473 @@
    +---
    +title: Stream API Overview
    +layout: documentation
    +documentation: true
    +---
    +
    +* [Concepts](#concepts)
    +    * [Stream Builder] (#streambuilder)
    +    * [Value mapper] (#valuemapper)
    +* [Stream APIs](#streamapis)
    +    * [Basic transformations] (#basictransformations)
    --- End diff --
    
    Same as the comment above: remove the space between [] and () so that the links are generated, up to the Aggregations section.



[GitHub] storm pull request #1945: [STORM-2367] Documentation for streams API

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1945#discussion_r102196235
  
    --- Diff: docs/Stream-API.md ---
    @@ -0,0 +1,473 @@
    +---
    +title: Stream API Overview
    +layout: documentation
    +documentation: true
    +---
    +
    +* [Concepts](#concepts)
    +    * [Stream Builder] (#streambuilder)
    +    * [Value mapper] (#valuemapper)
    +* [Stream APIs](#streamapis)
    +    * [Basic transformations] (#basictransformations)
    --- End diff --
    
    It seems the space does not matter. I am able to click on the links here - https://github.com/arunmahadevan/storm/blob/STORM-2367/docs/Stream-API.md



[GitHub] storm pull request #1945: [STORM-2367] Documentation for streams API

Posted by satishd <gi...@git.apache.org>.
Github user satishd commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1945#discussion_r102217060
  
    --- Diff: docs/Stream-API.md ---
    @@ -0,0 +1,473 @@
    +---
    +title: Stream API Overview
    +layout: documentation
    +documentation: true
    +---
    +
    +* [Concepts](#concepts)
    +    * [Stream Builder] (#streambuilder)
    +    * [Value mapper] (#valuemapper)
    +* [Stream APIs](#streamapis)
    +    * [Basic transformations] (#basictransformations)
    --- End diff --
    
    Some of the IDE MD plugins and tools are not generating links appropriately when there is a space. You can see the syntax [here](https://guides.github.com/features/mastering-markdown/) and [here] (https://en.support.wordpress.com/markdown-quick-reference/)
    
    Even GitHub comments render broken links when there is a space between [] and (); compare the two `here` links above to see the difference. I think it is better to address that.
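
For anyone who wants to check a document for this pattern mechanically, here is a minimal sketch in plain Java. The class name `MarkdownLinkFixer` and its `fix` method are hypothetical and are not part of Storm or of this pull request. It assumes a simple heuristic: a bracketed label, followed by spaces or tabs, followed by a parenthesized target that contains no whitespace.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Storm: removes the whitespace between
// the [label] and the (target) of a Markdown inline link.
public class MarkdownLinkFixer {

    // Group 1: "[label]" on a single line.
    // Then one or more spaces or tabs (the part to drop).
    // Group 2: "(target)" where the target contains no whitespace.
    private static final Pattern SPACED_LINK =
            Pattern.compile("(\\[[^\\]\\n]+\\])[ \\t]+(\\([^)\\s]+\\))");

    // Returns the input with "[label] (target)" rewritten as "[label](target)".
    public static String fix(String markdown) {
        return SPACED_LINK.matcher(markdown).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        // Prints "    * [Stream Builder](#streambuilder)"
        System.out.println(fix("    * [Stream Builder] (#streambuilder)"));
    }
}
```

Because this is only a heuristic, it would also rewrite prose such as `[1] (Smith)`, so the result is worth reviewing before committing it.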



[GitHub] storm pull request #1945: [STORM-2367] Documentation for streams API

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1945#discussion_r102196192
  
    --- Diff: docs/Stream-API.md ---
    @@ -0,0 +1,473 @@
    +---
    +title: Stream API Overview
    +layout: documentation
    +documentation: true
    +---
    +
    +* [Concepts](#concepts)
    +    * [Stream Builder] (#streambuilder)
    --- End diff --
    
    It seems the space does not matter. I am able to click on the links here - https://github.com/arunmahadevan/storm/blob/STORM-2367/docs/Stream-API.md



[GitHub] storm issue #1945: [STORM-2367] Documentation for streams API

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1945
  
    Sure. :) I just wanted to point out the bylaw for documentation.



[GitHub] storm pull request #1945: [STORM-2367] Documentation for streams API

Posted by satishd <gi...@git.apache.org>.
Github user satishd commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1945#discussion_r102216885
  
    --- Diff: docs/Stream-API.md ---
    @@ -0,0 +1,473 @@
    +---
    +title: Stream API Overview
    +layout: documentation
    +documentation: true
    +---
    +
    +* [Concepts](#concepts)
    +    * [Stream Builder] (#streambuilder)
    --- End diff --
    
    Some of the IDE MD plugins and tools are not generating links appropriately when there is a space. You can see the syntax [here](https://guides.github.com/features/mastering-markdown/) and [here] (https://en.support.wordpress.com/markdown-quick-reference/)
    
    Even GitHub comments render broken links when there is a space between [] and (); compare the two `here` links above to see the difference. I think it is better to address that.




[GitHub] storm pull request #1945: [STORM-2367] Documentation for streams API

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1945#discussion_r102246874
  
    --- Diff: docs/Stream-API.md ---
    @@ -0,0 +1,473 @@
    +---
    +title: Stream API Overview
    +layout: documentation
    +documentation: true
    +---
    +
    +* [Concepts](#concepts)
    +    * [Stream Builder] (#streambuilder)
    +    * [Value mapper] (#valuemapper)
    +* [Stream APIs](#streamapis)
    +    * [Basic transformations] (#basictransformations)
    --- End diff --
    
    updated



[GitHub] storm pull request #1945: [STORM-2367] Documentation for streams API

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/1945



[GitHub] storm pull request #1945: [STORM-2367] Documentation for streams API

Posted by satishd <gi...@git.apache.org>.
Github user satishd commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1945#discussion_r102143480
  
    --- Diff: docs/Stream-API.md ---
    @@ -0,0 +1,473 @@
    +---
    +title: Stream API Overview
    +layout: documentation
    +documentation: true
    +---
    +
    +* [Concepts](#concepts)
    +    * [Stream Builder] (#streambuilder)
    +    * [Value mapper] (#valuemapper)
    +* [Stream APIs](#streamapis)
    +    * [Basic transformations] (#basictransformations)
    +        * [filter] (#filter)
    +        * [map] (#map)
    +        * [flatmap] (#flatmap)
    +    * [Windowing] (#windowing)
    +    * [Transformation to key-value pairs] (#keyvaluepairs)
    +        * [mapToPair] (#mapflatmaptopair)
    +        * [flatMapToPair] (#mapflatmaptopair)
    +    * [Aggregations] (#aggregations)
    +        * [aggregate] (#aggregatereduce)
    +        * [reduce] (#aggregatereduce)
    +        * [aggregateByKey] (#aggregatereducebykey)
    +        * [reduceByKey] (#aggregatereducebykey)
    +        * [groupByKey] (#groupbykey)
    +        * [countByKey] (#countbykey)
    +    * [Repartition](#repartition)
    +    * [Output operations](#outputoperations)
    +        * [print](#print)
    +        * [peek](#peek)
    +        * [forEach](#foreach)
    +        * [to](#to)
    +    * [Branch](#branching)
    +    * [Joins](#joins)
    +    * [State](#state)
    +        * [updateStateByKey](#updatestatebykey)
    +        * [stateQuery](#statequery)
    +* [Guarantees](#guarantees)        
    +* [Example](#example)
    +
    +Historically Storm provided Spout and Bolt apis for expressing streaming computations. Though these apis are fairly simple to use, 
    +there are no reusable constructs for expressing common streaming operations like filtering, transformations, windowing, joins, 
    +aggregations and so on.
    +
    +Stream APIs build on top of the Storm's spouts and bolts to provide a typed API for expressing streaming computations and supports functional style operations such as map-reduce. 
    +
    +# <a name="concepts"></a> Concepts 
    +
    +Conceptually a `Stream` can be thought of as a stream of messages flowing through a pipeline. A `Stream` may be generated by reading messages out of a source like spout, or by transforming other streams. For example,
    +
    +```java
    +// imports
    +import org.apache.storm.streams.Stream;
    +import org.apache.storm.streams.StreamBuilder;
    +...
    +
    +StreamBuilder builder = new StreamBuilder();
    +
    +// a stream of sentences obtained from a source spout
    +Stream<String> sentences = builder.newStream(new RandomSentenceSpout()).map(tuple -> tuple.getString(0));
    +
    +// a stream of words obtained by transforming (splitting) the stream of sentences
    +Stream<String> words = sentences.flatMap(s -> Arrays.asList(s.split(" ")));
    +
    +// output operation that prints the words to console
    +words.forEach(w -> System.out.println(w));
    +```
    +
    +
    +Most stream operations accept parameters that describe user-specified behavior typically via lambda expressions like `s -> Arrays.asList(s.split(" "))` as in the above example.
    +
    +A `Stream` supports two kinds of operations, 
    +
    +1. **Transformations** that produce another stream from the current stream  (like the `flatMap` operation in the example above) 
    +1. **Output operations** that produce a result. (like the `forEach` operation in the example above).
    +
    +## <a name="streambuilder"></a> Stream Builder
    +
    +`StreamBuilder` provides the builder apis to create a new stream. Typically a spout forms the source of a stream.
    +
    +```java
    +StreamBuilder builder = new StreamBuilder();
    +Stream<Tuple> sentences = builder.newStream(new TestSentenceSpout());
    +```
    +
    +The `StreamBuilder` tracks the overall pipeline of operations expressed via the Stream. One can then create the Storm topology
    +via `build()` and submit it like a normal storm topology via `StormSubmitter`.
    +
    +```java
    +StormSubmitter.submitTopologyWithProgressBar("test", new Config(), streamBuilder.build());
    +```
    +
    +## <a name="valuemapper"></a> Value mapper
    +
    +Value mappers can be used to extract specific fields from the tuples emitted from a spout to produce a typed stream of values. Value mappers are passed as arguments to the `StreamBuilder.newStream`.
    +
    +```java
    +StreamBuilder builder = new StreamBuilder();
    +
    +// extract the first field from the tuple to get a Stream<String> of sentences
    +Stream<String> sentences = builder.newStream(new TestWordSpout(), new ValueMapper<String>(0));
    +```
    +
    +Storm provides strongly typed tuples via the `Pair` and Tuple classes (Tuple3 upto Tuple10). One can use a `TupleValueMapper` to produce a stream of typed tuples as shown below.
    +
    +```java
    +// extract first three fields of the tuple emitted by the spout to produce a stream of typed tuples.
    +Stream<Tuple3<String, Integer, Long>> stream = builder.newStream(new TestSpout(), TupleValueMappers.of(0, 1, 2));
    +```
    +
    +# <a name="streamapis"></a> Stream APIs 
    +
    +Storm's streaming apis currently support a wide range of operations such as transformations, filters, windowing, aggregations, branching, joins, stateful, output and debugging operations.  
    +
    --- End diff --
    
    You may want to add a link to the Javadoc of the `Stream` class.



[GitHub] storm pull request #1945: [STORM-2367] Documentation for streams API

Posted by satishd <gi...@git.apache.org>.
Github user satishd commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1945#discussion_r102142060
  
    --- Diff: docs/Stream-API.md ---
    @@ -0,0 +1,473 @@
    +---
    +title: Stream API Overview
    +layout: documentation
    +documentation: true
    +---
    +
    +* [Concepts](#concepts)
    +    * [Stream Builder] (#streambuilder)
    --- End diff --
    
    You may need to remove the space between [] and () to generate links. This applies up to the Aggregations section below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1945: [STORM-2367] Documentation for streams API

Posted by satishd <gi...@git.apache.org>.
Github user satishd commented on the issue:

    https://github.com/apache/storm/pull/1945
  
    +1, Nice summary of streams API. Thanks @arunmahadevan 

