Posted to issues@flink.apache.org by mxm <gi...@git.apache.org> on 2015/11/24 11:09:04 UTC

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

GitHub user mxm opened a pull request:

    https://github.com/apache/flink/pull/1398

    [FLINK-2837][storm] various improvements for Storm compatibility

    This pull request contains various fixes. Most prominently, the parsing logic for Storm topologies has been changed to support multiple inputs. The API has been slightly reworked, and two new examples have been added. A short sketch of the reworked API follows the change list below.
    
    - refactor to use Storm's topology builder
    
    - remove FlinkTopologyBuilder
    
    - instantiate context-based StreamExecutionEnvironment (local or remote)
    
    - remove Flink and Storm behavior replicating classes
    
    - modify FlinkTopology to parse Storm topology directly
    
    - replace StormTestBase with StreamingTestBase
    
    - let the FiniteFileSpout finish in corner cases
    
    - expose taskId, fix off-by-one task id
    
    - add print example
    
    - FlinkTopologyBuilder changes (check if all inputs are available before processing)
    
    - correct package typo
    
    - support getter methods on TupleWrapper
    
    - two input support
    
    - add join example
    
    - update docs
    
    - use Flink file system access
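
    A minimal sketch of the reworked API, assuming `MySpout` and `MyBolt` as placeholder user classes (they are not part of this PR); the `FlinkTopology.createTopology(...)` entry point is the one introduced in the diff below:

        import backtype.storm.topology.TopologyBuilder;
        import org.apache.flink.storm.api.FlinkTopology;

        public class StormOnFlinkSketch {
            public static void main(String[] args) throws Exception {
                // Assemble the topology the plain Storm way -- FlinkTopologyBuilder is gone.
                TopologyBuilder builder = new TopologyBuilder();
                builder.setSpout("source", new MySpout());   // MySpout/MyBolt: hypothetical placeholders
                builder.setBolt("counter", new MyBolt()).shuffleGrouping("source");

                // Translate to a Flink program and run it on the context-based
                // environment (local in the IDE, remote through ./bin/flink).
                FlinkTopology.createTopology(builder).execute();
            }
        }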

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mxm/flink storm-dev-dev

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1398.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1398
    
----
commit 613936cc7dce75e8dd811626cd2152b1f1383fe0
Author: Maximilian Michels <mx...@apache.org>
Date:   2015-11-12T13:39:45Z

    [FLINK-2837][storm] various improvements for Storm compatibility
    
    - refactor to use Storm's topology builder
    
    - remove FlinkTopologyBuilder
    
    - instantiate context-based StreamExecutionEnvironment (local or remote)
    
    - remove Flink and Storm behavior replicating classes
    
    - modify FlinkTopology to parse Storm topology directly
    
    - replace StormTestBase with StreamingTestBase
    
    - let the FiniteFileSpout finish in corner cases
    
    - expose taskId, fix off-by-one task id
    
    - add print example
    
    - FlinkTopologyBuilder changes (check if all inputs are available before processing)
    
    - correct package typo
    
    - two input support
    
    - add join example
    
    - update docs
    
    - use Flink file system access

----


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45734993
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/FileSpout.java ---
    @@ -38,6 +38,8 @@
     	protected String path = null;
     	protected BufferedReader reader;
     
    +	protected boolean finished;
    +
     	public FileSpout() {}
    --- End diff --
    
    The point is the following: `FileSpout` is a Flink-agnostic implementation that is extended in a Flink-aware way by `FiniteFileSpout`. Thus, `FileSpout` should be implemented the Storm way, without any knowledge of Flink, and `FiniteFileSpout` should use the Flink-specific parts without changing the code of `FileSpout`.
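
    A compact, hedged sketch of that split (base-class and lifecycle details simplified, loosely reconstructed from the pre-change code quoted later in this thread):

        import backtype.storm.spout.SpoutOutputCollector;
        import backtype.storm.topology.base.BaseRichSpout;
        import backtype.storm.tuple.Values;
        import org.apache.flink.storm.util.FiniteSpout;

        import java.io.BufferedReader;
        import java.io.IOException;

        // Flink-agnostic base: a plain Storm spout with no Flink imports.
        // open() and declareOutputFields() work as in any Storm spout (omitted here).
        abstract class FileSpout extends BaseRichSpout {
            protected String path;
            protected BufferedReader reader;
            protected SpoutOutputCollector collector;

            @Override
            public void nextTuple() {
                try {
                    String line = reader.readLine();
                    if (line != null) {
                        collector.emit(new Values(line));
                    }
                } catch (IOException e) {
                    throw new RuntimeException("Exception occurred while reading file " + path, e);
                }
            }
        }

        // Flink-aware extension: adds the FiniteSpout contract on top, keeping all
        // end-of-input state here instead of pushing a 'finished' flag into FileSpout.
        abstract class FiniteFileSpout extends FileSpout implements FiniteSpout {
            private String line; // look-ahead buffer filled by reachedEnd()

            @Override
            public boolean reachedEnd() {
                try {
                    if (line == null) {
                        line = reader.readLine();
                    }
                } catch (IOException e) {
                    throw new RuntimeException("Exception occurred while reading file " + path, e);
                }
                return line == null;
            }

            @Override
            public void nextTuple() {
                collector.emit(new Values(line));
                line = null;
            }
        }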


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45970656
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/StormTuple.java ---
    @@ -44,16 +45,30 @@
     	/** The schema (ie, ordered field names) of the tuple */
     	private final Fields schema;
     
    +	private final int taskId;
    +	private final String producerStreamId;
    +	private final MessageId id;
    +	private final String producerComponentId;
    +
    +
    +	/**
    +	 * Constructor which sets defaults for producerComponentId, taskId, and componentID
    +	 * @param flinkTuple the Flink tuple
    +	 * @param schema The schema of the storm fields
    +	 */
    +	StormTuple(final IN flinkTuple, final Fields schema) {
    +		this(flinkTuple, schema, -1, "testStream", "componentID");
    +	}
    --- End diff --
    
    Ok. I guess it would make sense to use Storm's `Utils.DEFAULT_STREAM_ID` here? And maybe add a `public final static String DEFAULT_OPERATOR_ID` variable to `StormTuple`? What about using "defaultID" or "unspecified" (or similar) instead of "componentID"? Just to make it clear, in case the name shows up in the UI.
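
    Roughly like this, as a sketch of the suggestion only (the constant name and value are proposals, not committed code):

        /** Default component ID for tuples created without a topology context, e.g., in tests. */
        public final static String DEFAULT_OPERATOR_ID = "defaultID";

        StormTuple(final IN flinkTuple, final Fields schema) {
            // Utils.DEFAULT_STREAM_ID is Storm's own constant for the unnamed stream;
            // an explicit default operator ID makes it obvious in the UI that no
            // component ID was specified.
            this(flinkTuple, schema, -1, Utils.DEFAULT_STREAM_ID, DEFAULT_OPERATOR_ID);
        }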


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45854121
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/WrapperSetupHelper.java ---
    @@ -150,7 +153,7 @@ static synchronized TopologyContext createTopologyContext(
     			}
     			stormTopology = new StormTopology(spouts, bolts, new HashMap<String, StateSpoutSpec>());
     
    -			taskId = context.getIndexOfThisSubtask();
    +			taskId = context.getIndexOfThisSubtask() + 1;
     
    --- End diff --
    
    Actually, it doesn't matter. I set this before changing the topology parsing logic. Some topologies would only run with this fix, but that has since been fixed properly, so the +1 is not necessary anymore.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45727062
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/StormTuple.java ---
    @@ -44,6 +45,21 @@
     	/** The schema (ie, ordered field names) of the tuple */
     	private final Fields schema;
     
    +	private final int taskId;
    +	private final String streamId;
    +	private final MessageId id;
    +	private final String componentId;
    +
    --- End diff --
    
    There is a JIRA for this. https://issues.apache.org/jira/browse/FLINK-2721 I am already working on it. I think that FLINK-2721 should be resolved before this.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45889838
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapper.java ---
    @@ -53,21 +51,26 @@
     	private static final long serialVersionUID = -4788589118464155835L;
     
     	/** The wrapped Storm {@link IRichBolt bolt}. */
    -	private final IRichBolt bolt;
    +	protected final IRichBolt bolt;
     	/** The name of the bolt. */
     	private final String name;
     	/** Number of attributes of the bolt's output tuples per stream. */
    -	private final HashMap<String, Integer> numberOfAttributes;
    +	protected final HashMap<String, Integer> numberOfAttributes;
     	/** The schema (ie, ordered field names) of the input stream. */
    -	private final Fields inputSchema;
    +	protected final Fields inputSchema;
     	/** The original Storm topology. */
     	protected StormTopology stormTopology;
     
    +	protected transient TopologyContext topologyContext;
    +
    +	protected final String inputComponentId;
    +	protected final String inputStreamId;
    +
    --- End diff --
    
    Please add JavaDoc to those three.
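
    For illustration, roughly (the wording is a suggestion, not the committed code):

        /** The topology context of this bolt; set at runtime when the wrapper is opened. */
        protected transient TopologyContext topologyContext;

        /** The component ID of the producer this bolt consumes from. */
        protected final String inputComponentId;
        /** The stream ID of the producer stream this bolt consumes from. */
        protected final String inputStreamId;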


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45890289
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapper.java ---
    @@ -108,20 +112,19 @@ public BoltWrapper(final IRichBolt bolt, final Fields inputSchema)
     	 * for POJO input types. The output type can be any type if parameter {@code rawOutput} is {@code true} and the
     	 * bolt's number of declared output tuples is 1. If {@code rawOutput} is {@code false} the output type will be one
     	 * of {@link Tuple0} to {@link Tuple25} depending on the bolt's declared number of attributes.
    -	 * 
    -	 * @param bolt
    +	 *  @param bolt
    --- End diff --
    
    space


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45964241
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/api/FlinkOutputFieldsDeclarerTest.java ---
    @@ -18,9 +18,7 @@
     
     import backtype.storm.tuple.Fields;
     import backtype.storm.utils.Utils;
    -
     import org.apache.flink.api.common.typeinfo.TypeInformation;
    -import org.apache.flink.storm.api.FlinkOutputFieldsDeclarer;
    --- End diff --
    
    It follows the import style of the other classes, so I'll leave this as it is.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46035298
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/WrapperSetupHelper.java ---
    @@ -224,7 +224,7 @@ static synchronized TopologyContext createTopologyContext(
     	 *            OUTPUT: A map from all component IDs to there output streams and output fields.
     	 * 
     	 * @return A unique task ID if the currently processed Spout or Bolt ({@code componentId}) is equal to the current
    -	 *         Flink operator ({@link operatorName}) -- {@code null} otherwise.
    +	 *         Flink operator ({@param operatorName}) -- {@code null} otherwise.
    --- End diff --
    
    I guess `@link` is wrong, but does `@param` work? Maybe `@code` would be correct? But I am not sure.
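
    For the record: Javadoc has no inline `{@param ...}` tag, and `{@link ...}` only resolves program elements (not method parameters), so `{@code operatorName}` is the variant that renders correctly here:

         * @return A unique task ID if the currently processed Spout or Bolt ({@code componentId}) is equal to the current
         *         Flink operator ({@code operatorName}) -- {@code null} otherwise.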


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/1398#issuecomment-160982088
  
    I've addressed your comments.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45893790
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/api/TestBolt.java ---
    @@ -16,14 +16,14 @@
      */
     package org.apache.flink.storm.api;
     
    -import java.util.Map;
    -
     import backtype.storm.task.OutputCollector;
     import backtype.storm.task.TopologyContext;
     import backtype.storm.topology.IRichBolt;
     import backtype.storm.topology.OutputFieldsDeclarer;
     import backtype.storm.tuple.Tuple;
     
    +import java.util.Map;
    +
     public class TestBolt implements IRichBolt {
    --- End diff --
    
    pure reformatting


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45729369
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapper.java ---
    @@ -75,11 +78,13 @@
     	 * 
     	 * @param bolt
     	 *            The Storm {@link IRichBolt bolt} to be used.
    +	 * @param componentId
    +	 * @param streamId
     	 * @throws IllegalArgumentException
     	 *             If the number of declared output attributes is not with range [0;25].
     	 */
    -	public BoltWrapper(final IRichBolt bolt) throws IllegalArgumentException {
    -		this(bolt, null, (Collection<String>) null);
    +	public BoltWrapper(final IRichBolt bolt, String componentId, String streamId) throws IllegalArgumentException {
    +		this(bolt, componentId, streamId, null, (Collection<String>) null);
    --- End diff --
    
    The naming is a bit odd but this is actually the component and stream id of the *producer*.
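
    Renaming the parameters would make that explicit -- a sketch only, using the same `producer*` naming as the revised `StormTuple` quoted earlier in this thread:

        public BoltWrapper(final IRichBolt bolt, final String producerComponentId,
                final String producerStreamId) throws IllegalArgumentException {
            this(bolt, producerComponentId, producerStreamId, null, (Collection<String>) null);
        }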


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45725866
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/FiniteFileSpout.java ---
    @@ -32,46 +23,17 @@
     public class FiniteFileSpout extends FileSpout implements FiniteSpout {
     	private static final long serialVersionUID = -1472978008607215864L;
     
    -	private String line;
    -	private boolean newLineRead;
    -
     	public FiniteFileSpout() {}
     
     	public FiniteFileSpout(String path) {
     		super(path);
     	}
     
    -	@SuppressWarnings("rawtypes")
    -	@Override
    -	public void open(final Map conf, final TopologyContext context, final SpoutOutputCollector collector) {
    -		super.open(conf, context, collector);
    -		newLineRead = false;
    -	}
    -
    -	@Override
    -	public void nextTuple() {
    -		this.collector.emit(new Values(line));
    -		newLineRead = false;
    -	}
    -
     	/**
     	 * Can be called before nextTuple() any times including 0.
     	 */
     	@Override
     	public boolean reachedEnd() {
    -		try {
    -			readLine();
    -		} catch (IOException e) {
    -			throw new RuntimeException("Exception occured while reading file " + path);
    -		}
    -		return line == null;
    +		return finished;
     	}
    --- End diff --
    
    This example shows how a regular Storm spout can be improved using the
    FiniteSpout interface -- I would keep it as is (even if it seems to be
    unnecessarily complicated -- imagine that you don't have the code of
    FileSpout).
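
    For reference, the contract being relied on here, as a minimal sketch of one valid driving pattern (not Flink's actual wrapper code; assumes open() has already been called):

        FiniteSpout spout = new FiniteFileSpout("/path/to/input"); // path is a placeholder
        while (!spout.reachedEnd()) {
            spout.nextTuple();
        }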


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45733816
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapper.java ---
    @@ -75,11 +78,13 @@
     	 * 
     	 * @param bolt
     	 *            The Storm {@link IRichBolt bolt} to be used.
    +	 * @param componentId
    +	 * @param streamId
     	 * @throws IllegalArgumentException
     	 *             If the number of declared output attributes is not with range [0;25].
     	 */
    -	public BoltWrapper(final IRichBolt bolt) throws IllegalArgumentException {
    -		this(bolt, null, (Collection<String>) null);
    +	public BoltWrapper(final IRichBolt bolt, String componentId, String streamId) throws IllegalArgumentException {
    +		this(bolt, componentId, streamId, null, (Collection<String>) null);
    --- End diff --
    
    Okay, I will check if I can use the information contained in TopologyContext.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45886387
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/wordcount/WordCountLocal.java ---
    @@ -57,16 +57,13 @@ public static void main(final String[] args) throws Exception {
     		}
     
     		// build Topology the Storm way
    -		final FlinkTopologyBuilder builder = WordCountTopology.buildTopology();
    +		final TopologyBuilder builder = WordCountTopology.buildTopology(false);
     
    --- End diff --
    
    Please remove `false` -- this test should use the index (and not the name) to specify the key. `WordCountLocalByName` does it the other way round, so that both cases are covered.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45964904
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/SpoutWrapperTest.java ---
    @@ -21,7 +21,6 @@
     import backtype.storm.task.TopologyContext;
     import backtype.storm.topology.IRichSpout;
     import backtype.storm.tuple.Fields;
    -
     import org.apache.flink.api.common.ExecutionConfig;
    --- End diff --
    
    It follows the import style of the other classes, so I'll leave this as it is.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45886615
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/wordcount/operators/WordCountFileSpout.java ---
    @@ -17,10 +17,9 @@
     
     package org.apache.flink.storm.wordcount.operators;
     
    -import org.apache.flink.storm.util.FileSpout;
    -
    --- End diff --
    
    pure reformatting


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45724616
  
    --- Diff: docs/apis/storm_compatibility.md ---
    @@ -57,20 +57,22 @@ See *WordCount Storm* within `flink-storm-examples/pom.xml` for an example how t
     
     Flink provides a Storm compatible API (`org.apache.flink.storm.api`) that offers replacements for the following classes:
     
    -- `TopologyBuilder` replaced by `FlinkTopologyBuilder`
     - `StormSubmitter` replaced by `FlinkSubmitter`
     - `NimbusClient` and `Client` replaced by `FlinkClient`
     - `LocalCluster` replaced by `FlinkLocalCluster`
     
    -In order to submit a Storm topology to Flink, it is sufficient to replace the used Storm classes with their Flink replacements in the Storm *client code that assembles* the topology.
    -The actual runtime code, ie, Spouts and Bolts, can be uses *unmodified*.
    -If a topology is executed in a remote cluster, parameters `nimbus.host` and `nimbus.thrift.port` are used as `jobmanger.rpc.address` and `jobmanger.rpc.port`, respectively.
    -If a parameter is not specified, the value is taken from `flink-conf.yaml`.
    +In order to submit a Storm topology to Flink, it is sufficient to replace the
    +used Storm classes with their Flink replacements in the Storm *client code that
    +assembles* the topology.  The actual runtime code, ie, Spouts and Bolts, can be
    +used *unmodified*.  If a topology is executed in a remote cluster, parameters
    +`nimbus.host` and `nimbus.thrift.port` are used as `jobmanger.rpc.address` and
    +`jobmanger.rpc.port`, respectively.  If a parameter is not specified, the value
    +is taken from `flink-conf.yaml`.
     
    --- End diff --
    
    I like the single-line format because it makes reviewing easier. With a fixed line length, changing a small thing can lead to reformatting of a whole paragraph due to new line breaks etc. This makes spotting the actual change quite hard. (I would appreciate it if we could keep the current formatting -- but it is not "an issue" in the strong sense.)


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45727099
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/WrapperSetupHelper.java ---
    @@ -150,7 +153,7 @@ static synchronized TopologyContext createTopologyContext(
     			}
     			stormTopology = new StormTopology(spouts, bolts, new HashMap<String, StateSpoutSpec>());
     
    -			taskId = context.getIndexOfThisSubtask();
    +			taskId = context.getIndexOfThisSubtask() + 1;
     
    --- End diff --
    
    Why `+1` ?


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45966621
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkOutputFieldsDeclarer.java ---
    @@ -20,11 +20,9 @@
     import backtype.storm.topology.OutputFieldsDeclarer;
     import backtype.storm.tuple.Fields;
     import backtype.storm.utils.Utils;
    -
    --- End diff --
    
    It follows the import style of the other classes, so I'll leave this as it is.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45964023
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/SetupOutputFieldsDeclarer.java ---
    @@ -17,12 +17,12 @@
     
     package org.apache.flink.storm.wrappers;
     
    -import java.util.HashMap;
    -
    --- End diff --
    
    It follows the import style of the other classes, so I'll leave this as it is.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728317
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,468 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
     	 */
    -	@Override
    -	public JobExecutionResult execute() throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    +		return new FlinkTopology(stormBuilder);
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter} or {@link
    -	 * FlinkClient}.
    -	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Returns the underlying Flink ExecutionEnvironment for the Storm topology.
    +	 * @return The contextual environment.
     	 */
    -	@Override
    -	public JobExecutionResult execute(final String jobName) throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public StreamExecutionEnvironment getExecutionEnvironment() {
    +		return this.env;
     	}
     
     	/**
    -	 * Increased the number of declared tasks of this program by the given value.
    -	 *
    -	 * @param dop
    -	 * 		The dop of a new operator that increases the number of overall tasks.
    +	 * Directly executes the Storm topology based on the current context (local when in IDE and
    +	 * remote when executed thorugh ./bin/flink).
    +	 * @return The execution result
    +	 * @throws Exception
     	 */
    -	public void increaseNumberOfTasks(final int dop) {
    -		assert (dop > 0);
    -		this.numberOfTasks += dop;
    +	public JobExecutionResult execute() throws Exception {
    +		return env.execute();
    +	}
    +
    +
    +	@SuppressWarnings("unchecked")
    +	private <T> Map<String, T> getPrivateField(String field) {
    +		try {
    +			Field f = builder.getClass().getDeclaredField(field);
    +			f.setAccessible(true);
    +			return copyObject((Map<String, T>) f.get(builder));
    +		} catch (NoSuchFieldException | IllegalAccessException e) {
    +			throw new RuntimeException("Couldn't get " + field + " from TopologyBuilder", e);
    +		}
    +	}
    +
    +	private <T> T copyObject(T object) {
    +		try {
    +			return InstantiationUtil.deserializeObject(
    +					InstantiationUtil.serializeObject(object),
    +					getClass().getClassLoader()
    +			);
    +		} catch (IOException | ClassNotFoundException e) {
    +			throw new RuntimeException("Failed to copy object.");
    +		}
     	}
     
     	/**
    -	 * Return the number or required tasks to execute this program.
    -	 *
    -	 * @return the number or required tasks to execute this program
    +	 * Creates a Flink program that uses the specified spouts and bolts.
     	 */
    -	public int getNumberOfTasks() {
    -		return this.numberOfTasks;
    +	private void translateTopology() {
    +
    +		unprocessdInputsPerBolt.clear();
    +		outputStreams.clear();
    +		declarers.clear();
    +		availableInputs.clear();
    +
    +		// Storm defaults to parallelism 1
    +		env.setParallelism(1);
    +
    +		/* Translation of topology */
    +
    +
    +		for (final Entry<String, IRichSpout> spout : spouts.entrySet()) {
    +			final String spoutId = spout.getKey();
    +			final IRichSpout userSpout = spout.getValue();
    +
    +			final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer();
    +			userSpout.declareOutputFields(declarer);
    +			final HashMap<String,Fields> sourceStreams = declarer.outputStreams;
    +			this.outputStreams.put(spoutId, sourceStreams);
    +			declarers.put(spoutId, declarer);
    +
    +
    +			final HashMap<String, DataStream<Tuple>> outputStreams = new HashMap<String, DataStream<Tuple>>();
    +			final DataStreamSource<?> source;
    +
    +			if (sourceStreams.size() == 1) {
    +				final SpoutWrapper<Tuple> spoutWrapperSingleOutput = new SpoutWrapper<Tuple>(userSpout);
    +				spoutWrapperSingleOutput.setStormTopology(stormTopology);
    +
    +				final String outputStreamId = (String) sourceStreams.keySet().toArray()[0];
    +
    +				DataStreamSource<Tuple> src = env.addSource(spoutWrapperSingleOutput, spoutId,
    +						declarer.getOutputType(outputStreamId));
    +
    +				outputStreams.put(outputStreamId, src);
    +				source = src;
    +			} else {
    +				final SpoutWrapper<SplitStreamType<Tuple>> spoutWrapperMultipleOutputs = new SpoutWrapper<SplitStreamType<Tuple>>(
    +						userSpout);
    +				spoutWrapperMultipleOutputs.setStormTopology(stormTopology);
    +
    +				@SuppressWarnings({ "unchecked", "rawtypes" })
    +				DataStreamSource<SplitStreamType<Tuple>> multiSource = env.addSource(
    +						spoutWrapperMultipleOutputs, spoutId,
    +						(TypeInformation) TypeExtractor.getForClass(SplitStreamType.class));
    +
    +				SplitStream<SplitStreamType<Tuple>> splitSource = multiSource
    +						.split(new StormStreamSelector<Tuple>());
    +				for (String streamId : sourceStreams.keySet()) {
    +					outputStreams.put(streamId, splitSource.select(streamId).map(new SplitStreamMapper<Tuple>()));
    +				}
    +				source = multiSource;
    +			}
    +			availableInputs.put(spoutId, outputStreams);
    +
    +			final ComponentCommon common = stormTopology.get_spouts().get(spoutId).get_common();
    +			if (common.is_set_parallelism_hint()) {
    +				int dop = common.get_parallelism_hint();
    +				source.setParallelism(dop);
    +			} else {
    +				common.set_parallelism_hint(1);
    +			}
    +		}
    +
    +		/**
    +		* 1. Connect all spout streams with bolts streams
    +		* 2. Then proceed with the bolts stream already connected
    +		*
    +		*  Because we do not know the order in which an iterator steps over a set, we might process a consumer before
    +		* its producer
    +		* ->thus, we might need to repeat multiple times
    +		*/
    +		boolean makeProgress = true;
    +		while (bolts.size() > 0) {
    +			if (!makeProgress) {
    +				throw new RuntimeException(
    +						"Unable to build Topology. Could not connect the following bolts: "
    +								+ bolts.keySet());
    +			}
    +			makeProgress = false;
    +
    +			final Iterator<Entry<String, IRichBolt>> boltsIterator = bolts.entrySet().iterator();
    +			while (boltsIterator.hasNext()) {
    +
    +				final Entry<String, IRichBolt> bolt = boltsIterator.next();
    +				final String boltId = bolt.getKey();
    +				final IRichBolt userBolt = copyObject(bolt.getValue());
    +
    +				final ComponentCommon common = stormTopology.get_bolts().get(boltId).get_common();
    +
    +				Set<Entry<GlobalStreamId, Grouping>> unprocessedBoltInputs = unprocessdInputsPerBolt.get(boltId);
    +				if (unprocessedBoltInputs == null) {
    +					unprocessedBoltInputs = new HashSet<>();
    +					unprocessedBoltInputs.addAll(common.get_inputs().entrySet());
    +					unprocessdInputsPerBolt.put(boltId, unprocessedBoltInputs);
    +				}
    +
    +				// check if all inputs are available
    +				final int numberOfInputs = unprocessedBoltInputs.size();
    +				int inputsAvailable = 0;
    +				for (Entry<GlobalStreamId, Grouping> entry : unprocessedBoltInputs) {
    +					final String producerId = entry.getKey().get_componentId();
    +					final String streamId = entry.getKey().get_streamId();
    +					final HashMap<String, DataStream<Tuple>> streams = availableInputs.get(producerId);
    +					if (streams != null && streams.get(streamId) != null) {
    +						inputsAvailable++;
    +					}
    +				}
    +
    +				if (inputsAvailable != numberOfInputs) {
    +					// traverse other bolts first until inputs are available
    +					continue;
    +				} else {
    +					makeProgress = true;
    +					boltsIterator.remove();
    +				}
    +
    +				final Map<GlobalStreamId, DataStream<Tuple>> inputStreams = new HashMap<>(numberOfInputs);
    +
    +				for (Entry<GlobalStreamId, Grouping> input : unprocessedBoltInputs) {
    +					final GlobalStreamId streamId = input.getKey();
    +					final Grouping grouping = input.getValue();
    +
    +					final String producerId = streamId.get_componentId();
    +
    +					final Map<String, DataStream<Tuple>> producer = availableInputs.get(producerId);
    +
    +					inputStreams.put(streamId, processInput(boltId, userBolt, streamId, grouping, producer));
    +				}
    +
    +				final Iterator<Entry<GlobalStreamId, DataStream<Tuple>>> iterator = inputStreams.entrySet().iterator();
    +
    +				final Entry<GlobalStreamId, DataStream<Tuple>> firstInput = iterator.next();
    +				GlobalStreamId streamId = firstInput.getKey();
    +				DataStream<Tuple> inputStream = firstInput.getValue();
    +
    +				final SingleOutputStreamOperator<?, ?> outputStream;
    +
    +				switch (numberOfInputs) {
    +					case 1:
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream);
    +						break;
    +					case 2:
    +						Entry<GlobalStreamId, DataStream<Tuple>> secondInput = iterator.next();
    +						GlobalStreamId streamId2 = secondInput.getKey();
    +						DataStream<Tuple> inputStream2 = secondInput.getValue();
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream, streamId2, inputStream2);
    +						break;
    +					default:
    +						throw new UnsupportedOperationException("Don't know how to translate a bolt "
    +								+ boltId + " with " + numberOfInputs + " inputs.");
    +				}
    +
    +				if (common.is_set_parallelism_hint()) {
    +					int dop = common.get_parallelism_hint();
    +					outputStream.setParallelism(dop);
    +				} else {
    +					common.set_parallelism_hint(1);
    +				}
    +
    +			}
    +		}
     	}
     
    +	private DataStream<Tuple> processInput(String boltId, IRichBolt userBolt,
    +										GlobalStreamId streamId, Grouping grouping,
    +										Map<String, DataStream<Tuple>> producer) {
    +
    +		Preconditions.checkNotNull(userBolt);
    --- End diff --
    
    Asserts are not enabled by default. I think it's important to always check this condition.
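
    In short: `assert` statements are skipped unless the JVM runs with `-ea`, while Guava's `Preconditions` checks always run. A minimal illustration (hedged, not the PR's code):

        import com.google.common.base.Preconditions;

        class NullCheckDemo {
            static void check(Object userBolt) {
                // Silently skipped on a default JVM (assertions need the -ea flag):
                assert userBolt != null;

                // Always enforced; fails fast with a NullPointerException:
                Preconditions.checkNotNull(userBolt);
            }
        }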


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/1398#issuecomment-161304907
  
    Thanks for your feedback!


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45726600
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    +		Preconditions.checkNotNull(producer);
    +
    +		final String producerId = streamId.get_componentId();
    +		final String inputStreamId = streamId.get_streamId();
    +
    +		DataStream<Tuple> inputStream = producer.get(inputStreamId);
    +
    +		final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer();
    +		declarers.put(boltId, declarer);
    +		userBolt.declareOutputFields(declarer);
    +		this.outputStreams.put(boltId, declarer.outputStreams);
    +
    +		// the producer stream exists at this point; translate its Storm grouping into a Flink partitioning
    +		if (grouping.is_set_shuffle()) {
    +			// Storm uses a round-robin shuffle strategy
    +			inputStream = inputStream.rebalance();
    +		} else if (grouping.is_set_fields()) {
    +			// global grouping is emulated in Storm via an empty fields grouping list
    +			final List<String> fields = grouping.get_fields();
    +			if (fields.size() > 0) {
    +				FlinkOutputFieldsDeclarer prodDeclarer = this.declarers.get(producerId);
    +				inputStream = inputStream.keyBy(prodDeclarer
    +						.getGroupingFieldIndexes(inputStreamId,
    +								grouping.get_fields()));
    +			} else {
    +				inputStream = inputStream.global();
    +			}
    +		} else if (grouping.is_set_all()) {
    +			inputStream = inputStream.broadcast();
    +		} else if (!grouping.is_set_local_or_shuffle()) {
    +			throw new UnsupportedOperationException(
    +					"Flink only supports (local-or-)shuffle, fields, all, and global grouping");
    +		}
    +
    +		return inputStream;
    +	}
    +
    +	private SingleOutputStreamOperator<?, ?> createOutput(String boltId, IRichBolt bolt, GlobalStreamId streamId, DataStream<Tuple> inputStream) {
    +		return createOutput(boltId, bolt, streamId, inputStream, null, null);
    +	}
    +
    +	private SingleOutputStreamOperator<?, ?> createOutput(String boltId, IRichBolt bolt,
    +														GlobalStreamId streamId, DataStream<Tuple> inputStream,
    +														GlobalStreamId streamId2, DataStream<Tuple> inputStream2) {
    +		Preconditions.checkNotNull(boltId);
    +		Preconditions.checkNotNull(streamId);
    --- End diff --
    
    `assert`
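
    A minimal sketch of that suggestion, i.e. replacing the `Preconditions` calls with plain
    assertions for the internal invariants of this private helper (names taken from the diff above):

        // sketch: internal invariants via asserts (only checked with the -ea JVM flag)
        assert boltId != null;
        assert streamId != null;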



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45888894
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,474 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
     	 */
    -	@Override
    -	public JobExecutionResult execute() throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    +		return new FlinkTopology(stormBuilder);
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter} or {@link
    -	 * FlinkClient}.
    -	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Returns the underlying Flink ExecutionEnvironment for the Storm topology.
    --- End diff --
    
    add JavaDoc link to `ExecutionEnvironment`
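
    For instance, a sketch of the adjusted JavaDoc (the method body is taken from the diff;
    `StreamExecutionEnvironment` is the type it actually returns):

        /**
         * Returns the underlying Flink {@link StreamExecutionEnvironment} for the Storm topology.
         * @return The contextual environment.
         */
        public StreamExecutionEnvironment getExecutionEnvironment() {
            return this.env;
        }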



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46279370
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/exclamation/ExclamationWithBolt.java ---
    @@ -72,7 +71,7 @@ public static void main(final String[] args) throws Exception {
     				.transform("StormBoltTokenizer",
     						TypeExtractor.getForObject(""),
     						new BoltWrapper<String, String>(new ExclamationBolt(),
    -								new String[] { Utils.DEFAULT_STREAM_ID }))
    +								"stream", "component", new String[] { Utils.DEFAULT_STREAM_ID }))
    --- End diff --
    
    Can do.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45889142
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,474 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
     	 */
    -	@Override
    -	public JobExecutionResult execute() throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    +		return new FlinkTopology(stormBuilder);
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter} or {@link
    -	 * FlinkClient}.
    -	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Returns the underlying Flink ExecutionEnvironment for the Storm topology.
    +	 * @return The contextual environment.
     	 */
    -	@Override
    -	public JobExecutionResult execute(final String jobName) throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public StreamExecutionEnvironment getExecutionEnvironment() {
    +		return this.env;
     	}
     
     	/**
    -	 * Increased the number of declared tasks of this program by the given value.
    -	 *
    -	 * @param dop
    -	 * 		The dop of a new operator that increases the number of overall tasks.
    +	 * Directly executes the Storm topology based on the current context (local when in IDE and
    +	 * remote when executed through ./bin/flink).
    +	 * @return The execution result
    +	 * @throws Exception
     	 */
    --- End diff --
    
    Can we describe the exception in a meaningful way? Or are there too many reasons to list here? What does the JavaDoc of `env.execute()` say?
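
    For reference, a hedged sketch of a possible wording (the exact phrasing of `env.execute()`'s
    JavaDoc is not quoted in this thread):

        /**
         * Directly executes the Storm topology based on the current context (local when in IDE and
         * remote when executed through ./bin/flink).
         * @return The execution result
         * @throws Exception Thrown if the program execution fails, e.g. because the job could not
         *                   be submitted or an operator failed at runtime.
         */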



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46033770
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/wordcount/BoltTokenizerWordCountPojo.java ---
    @@ -71,7 +70,7 @@ public static void main(final String[] args) throws Exception {
     				// this is done by a bolt that is wrapped accordingly
     				.transform("BoltTokenizerPojo",
     						TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)),
    -						new BoltWrapper<Sentence, Tuple2<String, Integer>>(new BoltTokenizerByName()))
    +						new BoltWrapper<Sentence, Tuple2<String, Integer>>(new BoltTokenizerByName(), "stream", "component"))
    --- End diff --
    
    Again.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45891225
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/StormTuple.java ---
    @@ -44,16 +45,30 @@
     	/** The schema (ie, ordered field names) of the tuple */
     	private final Fields schema;
     
    +	private final int taskId;
    +	private final String producerStreamId;
    +	private final MessageId id;
    +	private final String producerComponentId;
    +
    +
    +	/**
    +	 * Constructor which sets defaults for taskId, producerStreamId, and producerComponentId
    +	 * @param flinkTuple the Flink tuple
    +	 * @param schema The schema of the storm fields
    +	 */
    +	StormTuple(final IN flinkTuple, final Fields schema) {
    +		this(flinkTuple, schema, -1, "testStream", "componentID");
    +	}
    +
     	/**
     	 * Create a new Storm tuple from the given Flink tuple. The provided {@code nameIndexMap} is ignored for raw input
     	 * types.
    -	 * 
    -	 * @param flinkTuple
    +	 *  @param flinkTuple
     	 * 		The Flink tuple to be converted.
     	 * @param schema
    -	 * 		The schema (ie, ordered field names) of the tuple.
    +	 * @param producerComponentId
     	 */
    --- End diff --
    
    formatting; incomplete.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46033750
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/wordcount/BoltTokenizerWordCount.java ---
    @@ -64,7 +63,7 @@ public static void main(final String[] args) throws Exception {
     				// this is done by a bolt that is wrapped accordingly
     				.transform("BoltTokenizer",
     						TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)),
    -						new BoltWrapper<String, Tuple2<String, Integer>>(new BoltTokenizer()))
    +						new BoltWrapper<String, Tuple2<String, Integer>>(new BoltTokenizer(), "stream", "component"))
    --- End diff --
    
    Same question here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46034951
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/WrapperSetupHelperTest.java ---
    @@ -193,24 +189,22 @@ public void testCreateTopologyContext() {
     			Utils.sleep(++counter * 10000);
     			cluster.shutdown();
     
    -			if (TestSink.result.size() == 8) {
    +			if (TestSink.result.size() >= 4) {
    --- End diff --
    
    Why `>=` and not `==`?



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45725506
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/BoltFileSink.java ---
    @@ -18,20 +18,23 @@
     package org.apache.flink.storm.util;
     
     import backtype.storm.task.TopologyContext;
    +import org.apache.flink.core.fs.FSDataOutputStream;
    +import org.apache.flink.core.fs.FileSystem;
    +import org.apache.flink.core.fs.Path;
     
     import java.io.BufferedWriter;
    -import java.io.FileWriter;
     import java.io.IOException;
    +import java.io.OutputStreamWriter;
     import java.util.Map;
     
     /**
    - * Implements a sink that write the received data to the given file (as a result of {@code Object.toString()} for each
    + * Implements a sink that writes the received data to the given file (as a result of {@code Object.toString()} for each
      * attribute).
      */
     public final class BoltFileSink extends AbstractBoltSink {
     	private static final long serialVersionUID = 2014027288631273666L;
     
    -	private final String path;
    +	private final Path path;
     	private BufferedWriter writer;
    --- End diff --
    
    Please do not use `Path` -- **all** spouts/bolts should be written the "Storm" way and not include dependencies on Flink.
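
    For reference, a sketch of the Storm-only variant (assuming a `prepareSimple` hook on
    `AbstractBoltSink`; this is not the code from this diff):

        private final String path;   // plain String instead of Flink's Path
        private BufferedWriter writer;

        @Override
        public void prepareSimple(final Map stormConf, final TopologyContext context) {
            try {
                this.writer = new BufferedWriter(new FileWriter(this.path));
            } catch (final IOException e) {
                throw new RuntimeException(e);
            }
        }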



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46033582
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/exclamation/ExclamationWithBolt.java ---
    @@ -72,7 +71,7 @@ public static void main(final String[] args) throws Exception {
     				.transform("StormBoltTokenizer",
     						TypeExtractor.getForObject(""),
     						new BoltWrapper<String, String>(new ExclamationBolt(),
    -								new String[] { Utils.DEFAULT_STREAM_ID }))
    +								"stream", "component", new String[] { Utils.DEFAULT_STREAM_ID }))
    --- End diff --
    
    Can't we use default values for streamID and componentID here, ie, omit both parameters?
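
    I.e., a call site like the removed line above, with the wrapper falling back to defaults
    internally (sketch based on the `-` line of this diff):

        new BoltWrapper<String, String>(new ExclamationBolt(),
                new String[] { Utils.DEFAULT_STREAM_ID })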



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45886799
  
    --- Diff: flink-contrib/flink-storm-examples/src/test/java/org/apache/flink/storm/split/SplitBolt.java ---
    @@ -17,8 +17,6 @@
      */
     package org.apache.flink.storm.split;
     
    -import java.util.Map;
    -
    --- End diff --
    
    pure reformatting



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45964876
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/util/TestSink.java ---
    @@ -16,16 +16,16 @@
      */
     package org.apache.flink.storm.util;
     
    -import java.util.LinkedList;
    -import java.util.List;
    -import java.util.Map;
    -
     import backtype.storm.task.OutputCollector;
     import backtype.storm.task.TopologyContext;
     import backtype.storm.topology.IRichBolt;
     import backtype.storm.topology.OutputFieldsDeclarer;
     import backtype.storm.tuple.Tuple;
     
    +import java.util.LinkedList;
    +import java.util.List;
    +import java.util.Map;
    +
     public class TestSink implements IRichBolt {
    --- End diff --
    
    It follows the import style of the other classes, so I'll leave this as it is.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45894573
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/StormTupleTest.java ---
    @@ -613,29 +611,29 @@ public void testGetBinaryByFieldPojoGetter() throws Exception {
     		return new StormTuple(tuple, schema);
     	}
     
    -	@Test(expected = UnsupportedOperationException.class)
    +	@Test
     	public void testGetSourceGlobalStreamid() {
    -		new StormTuple<Object>(null, null).getSourceGlobalStreamid();
    +		Assert.assertNotNull(new StormTuple<Object>(null, null).getSourceGlobalStreamid());
    --- End diff --
    
    Can we improve on all these tests? Just a check for "not-null" seems a little limited to me. We should rather check for default values and also check the full constructor case.
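
    For example, a sketch that pins down the defaults (the expected values come from the
    `StormTuple` test constructor quoted earlier in this thread: taskId -1, "testStream",
    "componentID"):

        @Test
        public void testGetSourceGlobalStreamid() {
            GlobalStreamId id = new StormTuple<Object>(null, null).getSourceGlobalStreamid();
            Assert.assertEquals("testStream", id.get_streamId());
            Assert.assertEquals("componentID", id.get_componentId());
        }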



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45726840
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapper.java ---
    @@ -75,11 +78,13 @@
     	 * 
     	 * @param bolt
     	 *            The Storm {@link IRichBolt bolt} to be used.
    +	 * @param componentId
    +	 * @param streamId
     	 * @throws IllegalArgumentException
     	 *             If the number of declared output attributes is not with range [0;25].
     	 */
    -	public BoltWrapper(final IRichBolt bolt) throws IllegalArgumentException {
    -		this(bolt, null, (Collection<String>) null);
    +	public BoltWrapper(final IRichBolt bolt, String componentId, String streamId) throws IllegalArgumentException {
    +		this(bolt, componentId, streamId, null, (Collection<String>) null);
    --- End diff --
    
    This makes the interface quite hard to use and should not be necessary. The `TopologyContext` created in `open()` contains this information.
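
    For illustration, a sketch of how a bolt can obtain the same information from the standard
    Storm API (plain `TopologyContext` methods; not code from this PR):

        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            String myComponentId = context.getThisComponentId();
            // all (componentId, streamId) pairs feeding this bolt:
            Set<GlobalStreamId> inputs = context.getThisSources().keySet();
        }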



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1398



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728366
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/StormTuple.java ---
    @@ -44,6 +45,21 @@
     	/** The schema (ie, ordered field names) of the tuple */
     	private final Fields schema;
     
    +	private final int taskId;
    +	private final String streamId;
    +	private final MessageId id;
    +	private final String componentId;
    +
    --- End diff --
    
    This is fixed in this PR.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45725806
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/FileSpout.java ---
    @@ -38,6 +38,8 @@
     	protected String path = null;
     	protected BufferedReader reader;
     
    +	protected boolean finished;
    +
     	public FileSpout() {}
    --- End diff --
    
    `FileSpout` should be implemented the Storm way -- setting the "finished"
    flag here does not make sense from a Storm point of view (there is no
    such thing as a finite spout).



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45726196
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkLocalCluster.java ---
    @@ -48,12 +49,10 @@
     	private static final Logger LOG = LoggerFactory.getLogger(FlinkLocalCluster.class);
     
     	/** The flink mini cluster on which to execute the programs */
    -	private final FlinkMiniCluster flink;
    +	private FlinkMiniCluster flink;
     
     
     	public FlinkLocalCluster() {
    -		this.flink = new LocalFlinkMiniCluster(new Configuration(), true, StreamingMode.STREAMING);
    -		this.flink.start();
     	}
    --- End diff --
    
    What is the advantage of removing this?



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45887487
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkLocalCluster.java ---
    @@ -99,6 +111,7 @@ public void rebalance(final String name, final RebalanceOptions options) {
     
     	public void shutdown() {
     		flink.stop();
    +		flink = null;
    --- End diff --
    
    Should be kept. Otherwise, calling `submitTopologyWithOpts` a second time will run into an NPE. (Or add a proper exception in the `else` branch of the `if (flink == null)` check, ie, "Cannot run topology. Cluster got shut down." or similar.)
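
    A minimal sketch of the fail-fast alternative (the `shutDown` flag is hypothetical; the
    lazy-init code is assumed from the constructor removed earlier in this thread):

        // in submitTopologyWithOpts(), before using this.flink:
        if (this.flink == null) {
            if (this.shutDown) {  // hypothetical flag set in shutdown()
                throw new IllegalStateException("Cannot run topology. Cluster got shut down.");
            }
            this.flink = new LocalFlinkMiniCluster(new Configuration(), true, StreamingMode.STREAMING);
            this.flink.start();
        }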



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46277213
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/BoltWrapperTest.java ---
    @@ -265,12 +264,12 @@ public void testOpen() throws Exception {
     	@Test
     	public void testOpenSink() throws Exception {
     		final IRichBolt bolt = mock(IRichBolt.class);
    -		BoltWrapper<Object, Object> wrapper = new BoltWrapper<Object, Object>(bolt);
    +		BoltWrapper<Object, Object> wrapper = new BoltWrapper<Object, Object>(bolt, "stream", "component");
     		
     		wrapper.setup(createMockStreamTask(), new StreamConfig(new Configuration()), mock(Output.class));
     		wrapper.open();
     		
    -		verify(bolt).prepare(any(Map.class), any(TopologyContext.class), any(OutputCollector.class));
    +		verify(bolt).prepare(any(Map.class), any(TopologyContext.class), isNotNull(OutputCollector.class));
    --- End diff --
    
    `any` also matches null (literally anything), but here I want to explicitly check `isNotNull`.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46284023
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapper.java ---
    @@ -89,13 +99,9 @@ public BoltWrapper(final IRichBolt bolt) throws IllegalArgumentException {
     	 * used within a Flink streaming program. The given input schema enable attribute-by-name access for input types
     	 * {@link Tuple0} to {@link Tuple25}. The output type will be one of {@link Tuple0} to {@link Tuple25} depending on
     	 * the bolt's declared number of attributes.
    -	 * 
    -	 * @param bolt
    -	 *            The Storm {@link IRichBolt bolt} to be used.
    +	 * @param bolt The Storm {@link IRichBolt bolt} to be used.
     	 * @param inputSchema
    -	 *            The schema (ie, ordered field names) of the input stream.
    -	 * @throws IllegalArgumentException
    --- End diff --
    
    Please keep `@throws`



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45965523
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/FlinkTopologyContext.java ---
    @@ -27,13 +27,12 @@
     import backtype.storm.state.ISubscribedState;
     import backtype.storm.task.TopologyContext;
     import backtype.storm.tuple.Fields;
    +import clojure.lang.Atom;
     
    --- End diff --
    
    I see. dataArtisans committers can do any change, but external committers get bullied if they apply similar changes... It is not against you or the change itself -- it unifies the style, which does make sense. But I got bullied multiple times in other PRs when I did similar stuff...



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728889
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/FileSpout.java ---
    @@ -38,6 +38,8 @@
     	protected String path = null;
     	protected BufferedReader reader;
     
    +	protected boolean finished;
    +
     	public FileSpout() {}
    --- End diff --
    
    But this doesn't alter the behavior of FileSpout. It is still infinite.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45885626
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/split/operators/VerifyAndEnrichBolt.java ---
    @@ -17,8 +17,6 @@
      */
     package org.apache.flink.storm.split.operators;
     
    -import java.util.Map;
    -
    --- End diff --
    
    pure reformatting



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45726093
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkClient.java ---
    @@ -183,10 +183,10 @@ public void submitTopologyWithOpts(final String name, final String uploadedJarLo
     
     		/* set storm configuration */
     		if (this.conf != null) {
    -			topology.getConfig().setGlobalJobParameters(new StormConfig(this.conf));
    +			topology.getExecutionEnvironment().getConfig().setGlobalJobParameters(new StormConfig(this.conf));
     		}
    --- End diff --
    
    Why this change? I just want to understand.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45887969
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkOutputFieldsDeclarer.java ---
    @@ -20,11 +20,9 @@
     import backtype.storm.topology.OutputFieldsDeclarer;
     import backtype.storm.tuple.Fields;
     import backtype.storm.utils.Utils;
    -
    --- End diff --
    
    pure reformatting



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45758926
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/WrapperSetupHelperTest.java ---
    @@ -180,8 +178,6 @@ public void testCreateTopologyContext() {
     		builder.setBolt("bolt2", (IRichBolt) operators.get("bolt2"), dops.get("bolt2")).allGrouping("spout2");
     		builder.setBolt("sink", (IRichBolt) operators.get("sink"), dops.get("sink"))
     				.shuffleGrouping("bolt1", TestDummyBolt.groupingStreamId)
    -				.shuffleGrouping("bolt1", TestDummyBolt.shuffleStreamId)
    -				.shuffleGrouping("bolt2", TestDummyBolt.groupingStreamId)
     				.shuffleGrouping("bolt2", TestDummyBolt.shuffleStreamId);
    --- End diff --
    
    Don't get me wrong. I don't want to discard your work! And I believe that you did not intend to create a "messy" PR. But that's the current state.
    
    I think we can refine and merge it. But it does not resolve FLINK-2837, even if it improves on it. I would also assume that your union code will be reworked heavily later on... I am not sure about your tuple meta information code; I need to have a look in detail. That is the reason why I had the idea to apply the discussed API changes only in a single PR. But if this is too complex, we should just carry on with this PR.
    
    Btw: even if the JIRA is quite old, it is not assigned to you; thus, you should have asked about it. You did the same with FLINK-2837, which was assigned to me, too -- I had not worked on it yet, so I assigned it to you (I thought, since you had the union code together with the API changes, that should be fine).
    
    Additionally, the reason I just assigned it to you was that FLINK-2837 is actually a requirement for FLINK-2721. That is why I stopped working on it back then, but I did not have time to fix FLINK-2837 either. I did not assume that you would tackle the join case, which does require the tuple meta info... A regular union does not require it.
    
    Anyway. Just let us get this PR done. :) 



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728523
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/WrapperSetupHelperTest.java ---
    @@ -180,8 +178,6 @@ public void testCreateTopologyContext() {
     		builder.setBolt("bolt2", (IRichBolt) operators.get("bolt2"), dops.get("bolt2")).allGrouping("spout2");
     		builder.setBolt("sink", (IRichBolt) operators.get("sink"), dops.get("sink"))
     				.shuffleGrouping("bolt1", TestDummyBolt.groupingStreamId)
    -				.shuffleGrouping("bolt1", TestDummyBolt.shuffleStreamId)
    -				.shuffleGrouping("bolt2", TestDummyBolt.groupingStreamId)
     				.shuffleGrouping("bolt2", TestDummyBolt.shuffleStreamId);
    --- End diff --
    
    Because only bolts with at most two inputs are supported by the translator :)



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45742093
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/WrapperSetupHelperTest.java ---
    @@ -180,8 +178,6 @@ public void testCreateTopologyContext() {
     		builder.setBolt("bolt2", (IRichBolt) operators.get("bolt2"), dops.get("bolt2")).allGrouping("spout2");
     		builder.setBolt("sink", (IRichBolt) operators.get("sink"), dops.get("sink"))
     				.shuffleGrouping("bolt1", TestDummyBolt.groupingStreamId)
    -				.shuffleGrouping("bolt1", TestDummyBolt.shuffleStreamId)
    -				.shuffleGrouping("bolt2", TestDummyBolt.groupingStreamId)
     				.shuffleGrouping("bolt2", TestDummyBolt.shuffleStreamId);
    --- End diff --
    
    Well. From a Storm point of view, there is only `union`. As it is the generic Storm case, it includes the join case. I guess your specialized join solution would be obsolete after generic union is supported. Therefore, I would prefer to get it right from the beginning... My idea would be to try to get rid of `BoltWrapperTwoInput` and "union" the incoming streams somehow to feed a single stream to `BoltWrapper`. The tricky part is that we cannot use Flink's `union` because it assumes the same input type, but Storm can union different types into one stream... What do you think about this idea?
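
    A rough sketch of that union idea (the `TaggedTuple` wrapper and all surrounding names are
    hypothetical, not part of this PR): map each input to a common wrapper type that carries the
    producer metadata, union the results, and feed the merged stream to a single-input wrapper.

        // hypothetical wrapper so heterogeneous inputs share one stream type
        public class TaggedTuple implements java.io.Serializable {
            public Tuple value;                 // the original Flink tuple
            public String producerStreamId;     // Storm stream the tuple came from
            public String producerComponentId;  // Storm component that emitted it

            public TaggedTuple() {}

            public TaggedTuple(Tuple value, String streamId, String componentId) {
                this.value = value;
                this.producerStreamId = streamId;
                this.producerComponentId = componentId;
            }
        }

        DataStream<TaggedTuple> tagged1 = input1.map(new MapFunction<Tuple, TaggedTuple>() {
            @Override
            public TaggedTuple map(Tuple t) {
                return new TaggedTuple(t, "stream1", "producer1"); // IDs assumed
            }
        });
        DataStream<TaggedTuple> tagged2 = input2.map(new MapFunction<Tuple, TaggedTuple>() {
            @Override
            public TaggedTuple map(Tuple t) {
                return new TaggedTuple(t, "stream2", "producer2"); // IDs assumed
            }
        });
        // both sides now share one type, so Flink's regular union applies:
        DataStream<TaggedTuple> merged = tagged1.union(tagged2);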



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45886597
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/wordcount/operators/WordCountDataPojos.java ---
    @@ -17,10 +17,10 @@
     
     package org.apache.flink.storm.wordcount.operators;
     
    -import java.io.Serializable;
    -
     import org.apache.flink.examples.java.wordcount.util.WordCountData;
    --- End diff --
    
    pure reformatting



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45889924
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapper.java ---
    @@ -77,11 +80,13 @@
     	 * 
     	 * @param bolt
     	 *            The Storm {@link IRichBolt bolt} to be used.
    +	 * @param inputStreamId
    +	 * @param inputComponentId
    --- End diff --
    
    JavaDoc incomplete -- same below -- I will not mark it again. Please complete it everywhere.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45731988
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapper.java ---
    @@ -75,11 +78,13 @@
     	 * 
     	 * @param bolt
     	 *            The Storm {@link IRichBolt bolt} to be used.
    +	 * @param componentId
    +	 * @param streamId
     	 * @throws IllegalArgumentException
     	 *             If the number of declared output attributes is not with range [0;25].
     	 */
    -	public BoltWrapper(final IRichBolt bolt) throws IllegalArgumentException {
    -		this(bolt, null, (Collection<String>) null);
    +	public BoltWrapper(final IRichBolt bolt, String componentId, String streamId) throws IllegalArgumentException {
    +		this(bolt, componentId, streamId, null, (Collection<String>) null);
    --- End diff --
    
    `TopologyContext` contains this information, too.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728449
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,468 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
     	 */
    -	@Override
    -	public JobExecutionResult execute() throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    +		return new FlinkTopology(stormBuilder);
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter} or {@link
    -	 * FlinkClient}.
    -	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Returns the underlying Flink ExecutionEnvironment for the Storm topology.
    +	 * @return The contextual environment.
     	 */
    -	@Override
    -	public JobExecutionResult execute(final String jobName) throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public StreamExecutionEnvironment getExecutionEnvironment() {
    +		return this.env;
     	}
     
     	/**
    -	 * Increased the number of declared tasks of this program by the given value.
    -	 *
    -	 * @param dop
    -	 * 		The dop of a new operator that increases the number of overall tasks.
    +	 * Directly executes the Storm topology based on the current context (local when in IDE and
    +	 * remote when executed through ./bin/flink).
    +	 * @return The execution result
    +	 * @throws Exception
     	 */
    -	public void increaseNumberOfTasks(final int dop) {
    -		assert (dop > 0);
    -		this.numberOfTasks += dop;
    +	public JobExecutionResult execute() throws Exception {
    +		return env.execute();
    +	}
    +
    +
    +	@SuppressWarnings("unchecked")
    +	private <T> Map<String, T> getPrivateField(String field) {
    +		try {
    +			Field f = builder.getClass().getDeclaredField(field);
    +			f.setAccessible(true);
    +			return copyObject((Map<String, T>) f.get(builder));
    +		} catch (NoSuchFieldException | IllegalAccessException e) {
    +			throw new RuntimeException("Couldn't get " + field + " from TopologyBuilder", e);
    +		}
    +	}
    +
    +	private <T> T copyObject(T object) {
    +		try {
    +			return InstantiationUtil.deserializeObject(
    +					InstantiationUtil.serializeObject(object),
    +					getClass().getClassLoader()
    +			);
    +		} catch (IOException | ClassNotFoundException e) {
    +			throw new RuntimeException("Failed to copy object.", e);
    +		}
     	}
     
     	/**
    -	 * Return the number or required tasks to execute this program.
    -	 *
    -	 * @return the number or required tasks to execute this program
    +	 * Creates a Flink program that uses the specified spouts and bolts.
     	 */
    -	public int getNumberOfTasks() {
    -		return this.numberOfTasks;
    +	private void translateTopology() {
    +
    +		unprocessdInputsPerBolt.clear();
    +		outputStreams.clear();
    +		declarers.clear();
    +		availableInputs.clear();
    +
    +		// Storm defaults to parallelism 1
    +		env.setParallelism(1);
    +
    +		/* Translation of topology */
    +
    +
    +		for (final Entry<String, IRichSpout> spout : spouts.entrySet()) {
    +			final String spoutId = spout.getKey();
    +			final IRichSpout userSpout = spout.getValue();
    +
    +			final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer();
    +			userSpout.declareOutputFields(declarer);
    +			final HashMap<String,Fields> sourceStreams = declarer.outputStreams;
    +			this.outputStreams.put(spoutId, sourceStreams);
    +			declarers.put(spoutId, declarer);
    +
    +
    +			final HashMap<String, DataStream<Tuple>> outputStreams = new HashMap<String, DataStream<Tuple>>();
    +			final DataStreamSource<?> source;
    +
    +			if (sourceStreams.size() == 1) {
    +				final SpoutWrapper<Tuple> spoutWrapperSingleOutput = new SpoutWrapper<Tuple>(userSpout);
    +				spoutWrapperSingleOutput.setStormTopology(stormTopology);
    +
    +				final String outputStreamId = (String) sourceStreams.keySet().toArray()[0];
    +
    +				DataStreamSource<Tuple> src = env.addSource(spoutWrapperSingleOutput, spoutId,
    +						declarer.getOutputType(outputStreamId));
    +
    +				outputStreams.put(outputStreamId, src);
    +				source = src;
    +			} else {
    +				final SpoutWrapper<SplitStreamType<Tuple>> spoutWrapperMultipleOutputs = new SpoutWrapper<SplitStreamType<Tuple>>(
    +						userSpout);
    +				spoutWrapperMultipleOutputs.setStormTopology(stormTopology);
    +
    +				@SuppressWarnings({ "unchecked", "rawtypes" })
    +				DataStreamSource<SplitStreamType<Tuple>> multiSource = env.addSource(
    +						spoutWrapperMultipleOutputs, spoutId,
    +						(TypeInformation) TypeExtractor.getForClass(SplitStreamType.class));
    +
    +				SplitStream<SplitStreamType<Tuple>> splitSource = multiSource
    +						.split(new StormStreamSelector<Tuple>());
    +				for (String streamId : sourceStreams.keySet()) {
    +					outputStreams.put(streamId, splitSource.select(streamId).map(new SplitStreamMapper<Tuple>()));
    +				}
    +				source = multiSource;
    +			}
    +			availableInputs.put(spoutId, outputStreams);
    +
    +			final ComponentCommon common = stormTopology.get_spouts().get(spoutId).get_common();
    +			if (common.is_set_parallelism_hint()) {
    +				int dop = common.get_parallelism_hint();
    +				source.setParallelism(dop);
    +			} else {
    +				common.set_parallelism_hint(1);
    +			}
    +		}
    +
    +		/**
    +		* 1. Connect all spout streams with bolts streams
    +		* 2. Then proceed with the bolts stream already connected
    +		*
    +		*  Because we do not know the order in which an iterator steps over a set, we might process a consumer before
    +		* its producer
    +		* ->thus, we might need to repeat multiple times
    +		*/
    +		boolean makeProgress = true;
    +		while (bolts.size() > 0) {
    +			if (!makeProgress) {
    +				throw new RuntimeException(
    +						"Unable to build Topology. Could not connect the following bolts: "
    +								+ bolts.keySet());
    +			}
    +			makeProgress = false;
    +
    +			final Iterator<Entry<String, IRichBolt>> boltsIterator = bolts.entrySet().iterator();
    +			while (boltsIterator.hasNext()) {
    +
    +				final Entry<String, IRichBolt> bolt = boltsIterator.next();
    +				final String boltId = bolt.getKey();
    +				final IRichBolt userBolt = copyObject(bolt.getValue());
    +
    +				final ComponentCommon common = stormTopology.get_bolts().get(boltId).get_common();
    +
    +				Set<Entry<GlobalStreamId, Grouping>> unprocessedBoltInputs = unprocessdInputsPerBolt.get(boltId);
    +				if (unprocessedBoltInputs == null) {
    +					unprocessedBoltInputs = new HashSet<>();
    +					unprocessedBoltInputs.addAll(common.get_inputs().entrySet());
    +					unprocessdInputsPerBolt.put(boltId, unprocessedBoltInputs);
    +				}
    +
    +				// check if all inputs are available
    +				final int numberOfInputs = unprocessedBoltInputs.size();
    +				int inputsAvailable = 0;
    +				for (Entry<GlobalStreamId, Grouping> entry : unprocessedBoltInputs) {
    +					final String producerId = entry.getKey().get_componentId();
    +					final String streamId = entry.getKey().get_streamId();
    +					final HashMap<String, DataStream<Tuple>> streams = availableInputs.get(producerId);
    +					if (streams != null && streams.get(streamId) != null) {
    +						inputsAvailable++;
    +					}
    +				}
    +
    +				if (inputsAvailable != numberOfInputs) {
    +					// traverse other bolts first until inputs are available
    +					continue;
    +				} else {
    +					makeProgress = true;
    +					boltsIterator.remove();
    +				}
    +
    +				final Map<GlobalStreamId, DataStream<Tuple>> inputStreams = new HashMap<>(numberOfInputs);
    +
    +				for (Entry<GlobalStreamId, Grouping> input : unprocessedBoltInputs) {
    +					final GlobalStreamId streamId = input.getKey();
    +					final Grouping grouping = input.getValue();
    +
    +					final String producerId = streamId.get_componentId();
    +
    +					final Map<String, DataStream<Tuple>> producer = availableInputs.get(producerId);
    +
    +					inputStreams.put(streamId, processInput(boltId, userBolt, streamId, grouping, producer));
    +				}
    +
    +				final Iterator<Entry<GlobalStreamId, DataStream<Tuple>>> iterator = inputStreams.entrySet().iterator();
    +
    +				final Entry<GlobalStreamId, DataStream<Tuple>> firstInput = iterator.next();
    +				GlobalStreamId streamId = firstInput.getKey();
    +				DataStream<Tuple> inputStream = firstInput.getValue();
    +
    +				final SingleOutputStreamOperator<?, ?> outputStream;
    +
    +				switch (numberOfInputs) {
    +					case 1:
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream);
    +						break;
    +					case 2:
    +						Entry<GlobalStreamId, DataStream<Tuple>> secondInput = iterator.next();
    +						GlobalStreamId streamId2 = secondInput.getKey();
    +						DataStream<Tuple> inputStream2 = secondInput.getValue();
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream, streamId2, inputStream2);
    +						break;
    +					default:
    +						throw new UnsupportedOperationException("Don't know how to translate a bolt "
    +								+ boltId + " with " + numberOfInputs + " inputs.");
    +				}
    --- End diff --
    
    We should try to fix this... 
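    
    One possible direction (just a sketch; it glosses over the per-stream schema bookkeeping that `BoltWrapperTwoInput` does, which is the actual hard part): `union` all inputs into a single `DataStream<Tuple>` and hand that to one wrapper, so the switch no longer needs a case per input count:
    
        // hypothetical n-ary handling instead of the 1/2-input switch
        DataStream<Tuple> merged = firstInput.getValue();
        while (iterator.hasNext()) {
            merged = merged.union(iterator.next().getValue());
        }
        outputStream = createOutput(boltId, userBolt, streamId, merged);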


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45740805
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/join/SingleJoinExample.java ---
    @@ -0,0 +1,86 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.storm.join;
    +
    +import backtype.storm.Config;
    +import backtype.storm.testing.FeederSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import backtype.storm.tuple.Values;
    +import backtype.storm.utils.Utils;
    +import org.apache.flink.storm.api.FlinkLocalCluster;
    +import org.apache.flink.storm.api.FlinkTopology;
    +import org.apache.flink.storm.util.BoltFileSink;
    +import org.apache.flink.storm.util.TupleOutputFormatter;
    +import storm.starter.bolt.PrinterBolt;
    +import storm.starter.bolt.SingleJoinBolt;
    +
    +
    +public class SingleJoinExample {
    +
    +	public static void main(String[] args) throws Exception {
    +		final FeederSpout genderSpout = new FeederSpout(new Fields("id", "gender"));
    +		final FeederSpout ageSpout = new FeederSpout(new Fields("id", "age"));
    +
    --- End diff --
    
    Yes. :) For example, 2 fields for one input and 3 fields for the other input. (The idea is to use a different number of fields for each input, to make sure the input schemas can differ not only in their types but also in their number of attributes.)
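    
    Something along these lines, where the extra "country" field is just a made-up example name:
    
        final FeederSpout genderSpout = new FeederSpout(new Fields("id", "gender"));
        final FeederSpout ageSpout = new FeederSpout(new Fields("id", "age", "country"));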


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46033705
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/split/SpoutSplitExample.java ---
    @@ -70,7 +70,7 @@ public static void main(final String[] args) throws Exception {
     		oddStream.transform("oddBolt",
     				TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)),
     				new BoltWrapper<SplitStreamType<Integer>, Tuple2<String, Integer>>(
    -						new VerifyAndEnrichBolt(false)))
    +						new VerifyAndEnrichBolt(false), "stream", "component"))
    --- End diff --
    
    Default values for both? Or use `ODD_STREAM` for consistency.
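    
    I.e., something like this (assuming `ODD_STREAM` is the stream-id constant the example already uses):
    
        new BoltWrapper<SplitStreamType<Integer>, Tuple2<String, Integer>>(
                new VerifyAndEnrichBolt(false), ODD_STREAM, "component")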


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45734220
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/FileSpout.java ---
    @@ -38,6 +38,8 @@
     	protected String path = null;
     	protected BufferedReader reader;
     
    +	protected boolean finished;
    +
     	public FileSpout() {}
    --- End diff --
    
    Okay, I'll revert, although `FiniteSpout` is Flink-specific as well...
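    
    For reference, the contract in question is roughly this (a sketch from memory, not the exact source), which is why the `finished` flag fits a `FiniteFileSpout` better than the plain `FileSpout`:
    
        // flink-storm extension that lets a spout signal completion to Flink
        public interface FiniteSpout extends IRichSpout {
            boolean reachedEnd();
        }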


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728322
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,468 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
     	 */
    -	@Override
    -	public JobExecutionResult execute() throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    +		return new FlinkTopology(stormBuilder);
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter} or {@link
    -	 * FlinkClient}.
    -	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Returns the underlying Flink ExecutionEnvironment for the Storm topology.
    +	 * @return The contextual environment.
     	 */
    -	@Override
    -	public JobExecutionResult execute(final String jobName) throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public StreamExecutionEnvironment getExecutionEnvironment() {
    +		return this.env;
     	}
     
     	/**
    -	 * Increased the number of declared tasks of this program by the given value.
    -	 *
    -	 * @param dop
    -	 * 		The dop of a new operator that increases the number of overall tasks.
    +	 * Directly executes the Storm topology based on the current context (local when in IDE and
    +	 * remote when executed through ./bin/flink).
    +	 * @return The execution result
    +	 * @throws Exception
     	 */
    -	public void increaseNumberOfTasks(final int dop) {
    -		assert (dop > 0);
    -		this.numberOfTasks += dop;
    +	public JobExecutionResult execute() throws Exception {
    +		return env.execute();
    +	}
    +
    +
    +	@SuppressWarnings("unchecked")
    +	private <T> Map<String, T> getPrivateField(String field) {
    +		try {
    +			Field f = builder.getClass().getDeclaredField(field);
    +			f.setAccessible(true);
    +			return copyObject((Map<String, T>) f.get(builder));
    +		} catch (NoSuchFieldException | IllegalAccessException e) {
    +			throw new RuntimeException("Couldn't get " + field + " from TopologyBuilder", e);
    +		}
    +	}
    +
    +	private <T> T copyObject(T object) {
    +		try {
    +			return InstantiationUtil.deserializeObject(
    +					InstantiationUtil.serializeObject(object),
    +					getClass().getClassLoader()
    +			);
    +		} catch (IOException | ClassNotFoundException e) {
    +			throw new RuntimeException("Failed to copy object.", e);
    +		}
     	}
     
     	/**
    -	 * Return the number or required tasks to execute this program.
    -	 *
    -	 * @return the number or required tasks to execute this program
    +	 * Creates a Flink program that uses the specified spouts and bolts.
     	 */
    -	public int getNumberOfTasks() {
    -		return this.numberOfTasks;
    +	private void translateTopology() {
    +
    +		unprocessdInputsPerBolt.clear();
    +		outputStreams.clear();
    +		declarers.clear();
    +		availableInputs.clear();
    +
    +		// Storm defaults to parallelism 1
    +		env.setParallelism(1);
    +
    +		/* Translation of topology */
    +
    +
    +		for (final Entry<String, IRichSpout> spout : spouts.entrySet()) {
    +			final String spoutId = spout.getKey();
    +			final IRichSpout userSpout = spout.getValue();
    +
    +			final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer();
    +			userSpout.declareOutputFields(declarer);
    +			final HashMap<String,Fields> sourceStreams = declarer.outputStreams;
    +			this.outputStreams.put(spoutId, sourceStreams);
    +			declarers.put(spoutId, declarer);
    +
    +
    +			final HashMap<String, DataStream<Tuple>> outputStreams = new HashMap<String, DataStream<Tuple>>();
    +			final DataStreamSource<?> source;
    +
    +			if (sourceStreams.size() == 1) {
    +				final SpoutWrapper<Tuple> spoutWrapperSingleOutput = new SpoutWrapper<Tuple>(userSpout);
    +				spoutWrapperSingleOutput.setStormTopology(stormTopology);
    +
    +				final String outputStreamId = (String) sourceStreams.keySet().toArray()[0];
    +
    +				DataStreamSource<Tuple> src = env.addSource(spoutWrapperSingleOutput, spoutId,
    +						declarer.getOutputType(outputStreamId));
    +
    +				outputStreams.put(outputStreamId, src);
    +				source = src;
    +			} else {
    +				final SpoutWrapper<SplitStreamType<Tuple>> spoutWrapperMultipleOutputs = new SpoutWrapper<SplitStreamType<Tuple>>(
    +						userSpout);
    +				spoutWrapperMultipleOutputs.setStormTopology(stormTopology);
    +
    +				@SuppressWarnings({ "unchecked", "rawtypes" })
    +				DataStreamSource<SplitStreamType<Tuple>> multiSource = env.addSource(
    +						spoutWrapperMultipleOutputs, spoutId,
    +						(TypeInformation) TypeExtractor.getForClass(SplitStreamType.class));
    +
    +				SplitStream<SplitStreamType<Tuple>> splitSource = multiSource
    +						.split(new StormStreamSelector<Tuple>());
    +				for (String streamId : sourceStreams.keySet()) {
    +					outputStreams.put(streamId, splitSource.select(streamId).map(new SplitStreamMapper<Tuple>()));
    +				}
    +				source = multiSource;
    +			}
    +			availableInputs.put(spoutId, outputStreams);
    +
    +			final ComponentCommon common = stormTopology.get_spouts().get(spoutId).get_common();
    +			if (common.is_set_parallelism_hint()) {
    +				int dop = common.get_parallelism_hint();
    +				source.setParallelism(dop);
    +			} else {
    +				common.set_parallelism_hint(1);
    +			}
    +		}
    +
    +		/**
    +		* 1. Connect all spout streams with bolts streams
    +		* 2. Then proceed with the bolts stream already connected
    +		*
    +		*  Because we do not know the order in which an iterator steps over a set, we might process a consumer before
    +		* its producer
    +		* ->thus, we might need to repeat multiple times
    +		*/
    +		boolean makeProgress = true;
    +		while (bolts.size() > 0) {
    +			if (!makeProgress) {
    +				throw new RuntimeException(
    +						"Unable to build Topology. Could not connect the following bolts: "
    +								+ bolts.keySet());
    +			}
    +			makeProgress = false;
    +
    +			final Iterator<Entry<String, IRichBolt>> boltsIterator = bolts.entrySet().iterator();
    +			while (boltsIterator.hasNext()) {
    +
    +				final Entry<String, IRichBolt> bolt = boltsIterator.next();
    +				final String boltId = bolt.getKey();
    +				final IRichBolt userBolt = copyObject(bolt.getValue());
    +
    +				final ComponentCommon common = stormTopology.get_bolts().get(boltId).get_common();
    +
    +				Set<Entry<GlobalStreamId, Grouping>> unprocessedBoltInputs = unprocessdInputsPerBolt.get(boltId);
    +				if (unprocessedBoltInputs == null) {
    +					unprocessedBoltInputs = new HashSet<>();
    +					unprocessedBoltInputs.addAll(common.get_inputs().entrySet());
    +					unprocessdInputsPerBolt.put(boltId, unprocessedBoltInputs);
    +				}
    +
    +				// check if all inputs are available
    +				final int numberOfInputs = unprocessedBoltInputs.size();
    +				int inputsAvailable = 0;
    +				for (Entry<GlobalStreamId, Grouping> entry : unprocessedBoltInputs) {
    +					final String producerId = entry.getKey().get_componentId();
    +					final String streamId = entry.getKey().get_streamId();
    +					final HashMap<String, DataStream<Tuple>> streams = availableInputs.get(producerId);
    +					if (streams != null && streams.get(streamId) != null) {
    +						inputsAvailable++;
    +					}
    +				}
    +
    +				if (inputsAvailable != numberOfInputs) {
    +					// traverse other bolts first until inputs are available
    +					continue;
    +				} else {
    +					makeProgress = true;
    +					boltsIterator.remove();
    +				}
    +
    +				final Map<GlobalStreamId, DataStream<Tuple>> inputStreams = new HashMap<>(numberOfInputs);
    +
    +				for (Entry<GlobalStreamId, Grouping> input : unprocessedBoltInputs) {
    +					final GlobalStreamId streamId = input.getKey();
    +					final Grouping grouping = input.getValue();
    +
    +					final String producerId = streamId.get_componentId();
    +
    +					final Map<String, DataStream<Tuple>> producer = availableInputs.get(producerId);
    +
    +					inputStreams.put(streamId, processInput(boltId, userBolt, streamId, grouping, producer));
    +				}
    +
    +				final Iterator<Entry<GlobalStreamId, DataStream<Tuple>>> iterator = inputStreams.entrySet().iterator();
    +
    +				final Entry<GlobalStreamId, DataStream<Tuple>> firstInput = iterator.next();
    +				GlobalStreamId streamId = firstInput.getKey();
    +				DataStream<Tuple> inputStream = firstInput.getValue();
    +
    +				final SingleOutputStreamOperator<?, ?> outputStream;
    +
    +				switch (numberOfInputs) {
    +					case 1:
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream);
    +						break;
    +					case 2:
    +						Entry<GlobalStreamId, DataStream<Tuple>> secondInput = iterator.next();
    +						GlobalStreamId streamId2 = secondInput.getKey();
    +						DataStream<Tuple> inputStream2 = secondInput.getValue();
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream, streamId2, inputStream2);
    +						break;
    +					default:
    +						throw new UnsupportedOperationException("Don't know how to translate a bolt "
    +								+ boltId + " with " + numberOfInputs + " inputs.");
    +				}
    +
    +				if (common.is_set_parallelism_hint()) {
    +					int dop = common.get_parallelism_hint();
    +					outputStream.setParallelism(dop);
    +				} else {
    +					common.set_parallelism_hint(1);
    +				}
    +
    +			}
    +		}
     	}
     
    +	private DataStream<Tuple> processInput(String boltId, IRichBolt userBolt,
    +										GlobalStreamId streamId, Grouping grouping,
    +										Map<String, DataStream<Tuple>> producer) {
    +
    +		Preconditions.checkNotNull(userBolt);
    +		Preconditions.checkNotNull(boltId);
    +		Preconditions.checkNotNull(streamId);
    +		Preconditions.checkNotNull(grouping);
    +		Preconditions.checkNotNull(producer);
    +
    +		final String producerId = streamId.get_componentId();
    +		final String inputStreamId = streamId.get_streamId();
    +
    +		DataStream<Tuple> inputStream = producer.get(inputStreamId);
    +
    +		final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer();
    +		declarers.put(boltId, declarer);
    +		userBolt.declareOutputFields(declarer);
    +		this.outputStreams.put(boltId, declarer.outputStreams);
    +
    +		// if producer was processed already
    +		if (grouping.is_set_shuffle()) {
    +			// Storm uses a round-robin shuffle strategy
    +			inputStream = inputStream.rebalance();
    +		} else if (grouping.is_set_fields()) {
    +			// global grouping is emulated in Storm via an empty fields grouping list
    +			final List<String> fields = grouping.get_fields();
    +			if (fields.size() > 0) {
    +				FlinkOutputFieldsDeclarer prodDeclarer = this.declarers.get(producerId);
    +				inputStream = inputStream.keyBy(prodDeclarer
    +						.getGroupingFieldIndexes(inputStreamId,
    +								grouping.get_fields()));
    +			} else {
    +				inputStream = inputStream.global();
    +			}
    +		} else if (grouping.is_set_all()) {
    +			inputStream = inputStream.broadcast();
    +		} else if (!grouping.is_set_local_or_shuffle()) {
    +			throw new UnsupportedOperationException(
    +					"Flink only supports (local-or-)shuffle, fields, all, and global grouping");
    +		}
    +
    +		return inputStream;
    +	}
    +
    +	private SingleOutputStreamOperator<?, ?> createOutput(String boltId, IRichBolt bolt, GlobalStreamId streamId, DataStream<Tuple> inputStream) {
    +		return createOutput(boltId, bolt, streamId, inputStream, null, null);
    +	}
    +
    +	private SingleOutputStreamOperator<?, ?> createOutput(String boltId, IRichBolt bolt,
    +														GlobalStreamId streamId, DataStream<Tuple> inputStream,
    +														GlobalStreamId streamId2, DataStream<Tuple> inputStream2) {
    +		Preconditions.checkNotNull(boltId);
    +		Preconditions.checkNotNull(streamId);
    --- End diff --
    
    Asserts are not enabled by default. I think it's important to always check this condition.
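    
    To make the difference concrete (minimal sketch):
    
        // silently skipped unless the JVM runs with -ea/-enableassertions
        assert boltId != null;
    
        // always executed, also in production deployments
        Preconditions.checkNotNull(boltId, "boltId must not be null");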



---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45963921
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/FlinkTopologyContext.java ---
    @@ -27,13 +27,12 @@
     import backtype.storm.state.ISubscribedState;
     import backtype.storm.task.TopologyContext;
     import backtype.storm.tuple.Fields;
    +import clojure.lang.Atom;
     
    --- End diff --
    
    It follows the import style of the other classes, so I'll leave this as it is.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46035118
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/BoltWrapperTest.java ---
    @@ -265,12 +264,12 @@ public void testOpen() throws Exception {
     	@Test
     	public void testOpenSink() throws Exception {
     		final IRichBolt bolt = mock(IRichBolt.class);
    -		BoltWrapper<Object, Object> wrapper = new BoltWrapper<Object, Object>(bolt);
    +		BoltWrapper<Object, Object> wrapper = new BoltWrapper<Object, Object>(bolt, "stream", "component");
     		
     		wrapper.setup(createMockStreamTask(), new StreamConfig(new Configuration()), mock(Output.class));
     		wrapper.open();
     		
    -		verify(bolt).prepare(any(Map.class), any(TopologyContext.class), any(OutputCollector.class));
    +		verify(bolt).prepare(any(Map.class), any(TopologyContext.class), isNotNull(OutputCollector.class));
    --- End diff --
    
    Just out of curiosity: Why `isNotNull` instead of `any`? 
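    
    My current understanding, happy to be corrected: `any` also matches `null`, while `isNotNull` additionally asserts that a non-null argument was passed, e.g.:
    
        // passes even if prepare() received a null collector
        verify(bolt).prepare(any(Map.class), any(TopologyContext.class), any(OutputCollector.class));
    
        // fails unless the collector was actually non-null
        verify(bolt).prepare(any(Map.class), any(TopologyContext.class), isNotNull(OutputCollector.class));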


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45731878
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,468 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
     	 */
    -	@Override
    -	public JobExecutionResult execute() throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    +		return new FlinkTopology(stormBuilder);
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter} or {@link
    -	 * FlinkClient}.
    -	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Returns the underlying Flink ExecutionEnvironment for the Storm topology.
    +	 * @return The contextual environment.
     	 */
    -	@Override
    -	public JobExecutionResult execute(final String jobName) throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public StreamExecutionEnvironment getExecutionEnvironment() {
    +		return this.env;
     	}
     
     	/**
    -	 * Increased the number of declared tasks of this program by the given value.
    -	 *
    -	 * @param dop
    -	 * 		The dop of a new operator that increases the number of overall tasks.
    +	 * Directly executes the Storm topology based on the current context (local when in IDE and
    +	 * remote when executed through ./bin/flink).
    +	 * @return The execution result
    +	 * @throws Exception
     	 */
    -	public void increaseNumberOfTasks(final int dop) {
    -		assert (dop > 0);
    -		this.numberOfTasks += dop;
    +	public JobExecutionResult execute() throws Exception {
    +		return env.execute();
    +	}
    +
    +
    +	@SuppressWarnings("unchecked")
    +	private <T> Map<String, T> getPrivateField(String field) {
    +		try {
    +			Field f = builder.getClass().getDeclaredField(field);
    +			f.setAccessible(true);
    +			return copyObject((Map<String, T>) f.get(builder));
    +		} catch (NoSuchFieldException | IllegalAccessException e) {
    +			throw new RuntimeException("Couldn't get " + field + " from TopologyBuilder", e);
    +		}
    +	}
    +
    +	private <T> T copyObject(T object) {
    +		try {
    +			return InstantiationUtil.deserializeObject(
    +					InstantiationUtil.serializeObject(object),
    +					getClass().getClassLoader()
    +			);
    +		} catch (IOException | ClassNotFoundException e) {
    +			throw new RuntimeException("Failed to copy object.", e);
    +		}
     	}
     
     	/**
    -	 * Return the number or required tasks to execute this program.
    -	 *
    -	 * @return the number or required tasks to execute this program
    +	 * Creates a Flink program that uses the specified spouts and bolts.
     	 */
    -	public int getNumberOfTasks() {
    -		return this.numberOfTasks;
    +	private void translateTopology() {
    +
    +		unprocessdInputsPerBolt.clear();
    +		outputStreams.clear();
    +		declarers.clear();
    +		availableInputs.clear();
    +
    +		// Storm defaults to parallelism 1
    +		env.setParallelism(1);
    +
    +		/* Translation of topology */
    +
    +
    +		for (final Entry<String, IRichSpout> spout : spouts.entrySet()) {
    +			final String spoutId = spout.getKey();
    +			final IRichSpout userSpout = spout.getValue();
    +
    +			final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer();
    +			userSpout.declareOutputFields(declarer);
    +			final HashMap<String,Fields> sourceStreams = declarer.outputStreams;
    +			this.outputStreams.put(spoutId, sourceStreams);
    +			declarers.put(spoutId, declarer);
    +
    +
    +			final HashMap<String, DataStream<Tuple>> outputStreams = new HashMap<String, DataStream<Tuple>>();
    +			final DataStreamSource<?> source;
    +
    +			if (sourceStreams.size() == 1) {
    +				final SpoutWrapper<Tuple> spoutWrapperSingleOutput = new SpoutWrapper<Tuple>(userSpout);
    +				spoutWrapperSingleOutput.setStormTopology(stormTopology);
    +
    +				final String outputStreamId = (String) sourceStreams.keySet().toArray()[0];
    +
    +				DataStreamSource<Tuple> src = env.addSource(spoutWrapperSingleOutput, spoutId,
    +						declarer.getOutputType(outputStreamId));
    +
    +				outputStreams.put(outputStreamId, src);
    +				source = src;
    +			} else {
    +				final SpoutWrapper<SplitStreamType<Tuple>> spoutWrapperMultipleOutputs = new SpoutWrapper<SplitStreamType<Tuple>>(
    +						userSpout);
    +				spoutWrapperMultipleOutputs.setStormTopology(stormTopology);
    +
    +				@SuppressWarnings({ "unchecked", "rawtypes" })
    +				DataStreamSource<SplitStreamType<Tuple>> multiSource = env.addSource(
    +						spoutWrapperMultipleOutputs, spoutId,
    +						(TypeInformation) TypeExtractor.getForClass(SplitStreamType.class));
    +
    +				SplitStream<SplitStreamType<Tuple>> splitSource = multiSource
    +						.split(new StormStreamSelector<Tuple>());
    +				for (String streamId : sourceStreams.keySet()) {
    +					outputStreams.put(streamId, splitSource.select(streamId).map(new SplitStreamMapper<Tuple>()));
    +				}
    +				source = multiSource;
    +			}
    +			availableInputs.put(spoutId, outputStreams);
    +
    +			final ComponentCommon common = stormTopology.get_spouts().get(spoutId).get_common();
    +			if (common.is_set_parallelism_hint()) {
    +				int dop = common.get_parallelism_hint();
    +				source.setParallelism(dop);
    +			} else {
    +				common.set_parallelism_hint(1);
    +			}
    +		}
    +
    +		/**
    +		* 1. Connect all spout streams with bolts streams
    +		* 2. Then proceed with the bolts stream already connected
    +		*
    +		*  Because we do not know the order in which an iterator steps over a set, we might process a consumer before
    +		* its producer
    +		* ->thus, we might need to repeat multiple times
    +		*/
    +		boolean makeProgress = true;
    +		while (bolts.size() > 0) {
    +			if (!makeProgress) {
    +				throw new RuntimeException(
    +						"Unable to build Topology. Could not connect the following bolts: "
    +								+ bolts.keySet());
    +			}
    +			makeProgress = false;
    +
    +			final Iterator<Entry<String, IRichBolt>> boltsIterator = bolts.entrySet().iterator();
    +			while (boltsIterator.hasNext()) {
    +
    +				final Entry<String, IRichBolt> bolt = boltsIterator.next();
    +				final String boltId = bolt.getKey();
    +				final IRichBolt userBolt = copyObject(bolt.getValue());
    +
    +				final ComponentCommon common = stormTopology.get_bolts().get(boltId).get_common();
    +
    +				Set<Entry<GlobalStreamId, Grouping>> unprocessedBoltInputs = unprocessdInputsPerBolt.get(boltId);
    +				if (unprocessedBoltInputs == null) {
    +					unprocessedBoltInputs = new HashSet<>();
    +					unprocessedBoltInputs.addAll(common.get_inputs().entrySet());
    +					unprocessdInputsPerBolt.put(boltId, unprocessedBoltInputs);
    +				}
    +
    +				// check if all inputs are available
    +				final int numberOfInputs = unprocessedBoltInputs.size();
    +				int inputsAvailable = 0;
    +				for (Entry<GlobalStreamId, Grouping> entry : unprocessedBoltInputs) {
    +					final String producerId = entry.getKey().get_componentId();
    +					final String streamId = entry.getKey().get_streamId();
    +					final HashMap<String, DataStream<Tuple>> streams = availableInputs.get(producerId);
    +					if (streams != null && streams.get(streamId) != null) {
    +						inputsAvailable++;
    +					}
    +				}
    +
    +				if (inputsAvailable != numberOfInputs) {
    +					// traverse other bolts first until inputs are available
    +					continue;
    +				} else {
    +					makeProgress = true;
    +					boltsIterator.remove();
    +				}
    +
    +				final Map<GlobalStreamId, DataStream<Tuple>> inputStreams = new HashMap<>(numberOfInputs);
    +
    +				for (Entry<GlobalStreamId, Grouping> input : unprocessedBoltInputs) {
    +					final GlobalStreamId streamId = input.getKey();
    +					final Grouping grouping = input.getValue();
    +
    +					final String producerId = streamId.get_componentId();
    +
    +					final Map<String, DataStream<Tuple>> producer = availableInputs.get(producerId);
    +
    +					inputStreams.put(streamId, processInput(boltId, userBolt, streamId, grouping, producer));
    +				}
    +
    +				final Iterator<Entry<GlobalStreamId, DataStream<Tuple>>> iterator = inputStreams.entrySet().iterator();
    +
    +				final Entry<GlobalStreamId, DataStream<Tuple>> firstInput = iterator.next();
    +				GlobalStreamId streamId = firstInput.getKey();
    +				DataStream<Tuple> inputStream = firstInput.getValue();
    +
    +				final SingleOutputStreamOperator<?, ?> outputStream;
    +
    +				switch (numberOfInputs) {
    +					case 1:
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream);
    +						break;
    +					case 2:
    +						Entry<GlobalStreamId, DataStream<Tuple>> secondInput = iterator.next();
    +						GlobalStreamId streamId2 = secondInput.getKey();
    +						DataStream<Tuple> inputStream2 = secondInput.getValue();
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream, streamId2, inputStream2);
    +						break;
    +					default:
    +						throw new UnsupportedOperationException("Don't know how to translate a bolt "
    +								+ boltId + " with " + numberOfInputs + " inputs.");
    +				}
    +
    +				if (common.is_set_parallelism_hint()) {
    +					int dop = common.get_parallelism_hint();
    +					outputStream.setParallelism(dop);
    +				} else {
    +					common.set_parallelism_hint(1);
    +				}
    +
    +			}
    +		}
     	}
     
    +	private DataStream<Tuple> processInput(String boltId, IRichBolt userBolt,
    +										GlobalStreamId streamId, Grouping grouping,
    +										Map<String, DataStream<Tuple>> producer) {
    +
    +		Preconditions.checkNotNull(userBolt);
    --- End diff --
    
    It's a matter of philosophy, I guess. I think it is good enough to check via `assert` (assertions are enabled in the tests). No external user can call this method, so verifying in the tests that the calling code is correct is sufficient for me.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45893951
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/SpoutWrapperTest.java ---
    @@ -21,7 +21,6 @@
     import backtype.storm.task.TopologyContext;
     import backtype.storm.topology.IRichSpout;
     import backtype.storm.tuple.Fields;
    -
     import org.apache.flink.api.common.ExecutionConfig;
    --- End diff --
    
    pure reformatting


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45893834
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/util/TestDummyBolt.java ---
    @@ -26,6 +24,8 @@
     import backtype.storm.tuple.Tuple;
     import backtype.storm.tuple.Values;
     
    +import java.util.Map;
    +
     public class TestDummyBolt implements IRichBolt {
    --- End diff --
    
    pure reformatting


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45895556
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,474 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
     	 */
    -	@Override
    -	public JobExecutionResult execute() throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    +		return new FlinkTopology(stormBuilder);
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter} or {@link
    -	 * FlinkClient}.
    -	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Returns the underlying Flink ExecutionEnvironment for the Storm topology.
    +	 * @return The contextual environment.
     	 */
    -	@Override
    -	public JobExecutionResult execute(final String jobName) throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public StreamExecutionEnvironment getExecutionEnvironment() {
    +		return this.env;
     	}
     
     	/**
    -	 * Increased the number of declared tasks of this program by the given value.
    -	 *
    -	 * @param dop
    -	 * 		The dop of a new operator that increases the number of overall tasks.
    +	 * Directly executes the Storm topology based on the current context (local when in IDE and
    +	 * remote when executed through ./bin/flink).
    +	 * @return The execution result
    +	 * @throws Exception
     	 */
    --- End diff --
    
    Maybe just put something similar -- even if it's not too useful... I think leaving it blank is even worse... (and if we really activate stricter code style checks once -- if this ever happens ;) -- it is already complete.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728214
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkLocalCluster.java ---
    @@ -48,12 +49,10 @@
     	private static final Logger LOG = LoggerFactory.getLogger(FlinkLocalCluster.class);
     
     	/** The flink mini cluster on which to execute the programs */
    -	private final FlinkMiniCluster flink;
    +	private FlinkMiniCluster flink;
     
     
     	public FlinkLocalCluster() {
    -		this.flink = new LocalFlinkMiniCluster(new Configuration(), true, StreamingMode.STREAMING);
    -		this.flink.start();
     	}
    --- End diff --
    
    Lazy startup of the flink cluster. This enables us to set the right number of task slots for the job.
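    
    As a rough sketch of the idea (simplified; the field and constructor names follow
    the current FlinkLocalCluster, but treat the details as illustrative):
    
        private FlinkMiniCluster flink;  // no longer created in the constructor
    
        public void submitTopology(String name, Map conf, FlinkTopology topology)
                throws Exception {
            if (this.flink == null) {
                Configuration config = new Configuration();
                // at this point we know the topology, so we can size the
                // cluster accordingly instead of using a fixed default
                config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS,
                        topology.getExecutionEnvironment().getParallelism());
                this.flink = new LocalFlinkMiniCluster(config, true, StreamingMode.STREAMING);
                this.flink.start();
            }
            // ... hand the job over to this.flink as before ...
        }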


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45890996
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/FlinkTopologyContext.java ---
    @@ -27,13 +27,12 @@
     import backtype.storm.state.ISubscribedState;
     import backtype.storm.task.TopologyContext;
     import backtype.storm.tuple.Fields;
    +import clojure.lang.Atom;
     
    --- End diff --
    
    pure reformatting


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45885529
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/print/PrintSampleStream.java ---
    @@ -0,0 +1,61 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.storm.print;
    +
    +import backtype.storm.Config;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.utils.Utils;
    +import org.apache.flink.storm.api.FlinkLocalCluster;
    +import org.apache.flink.storm.api.FlinkTopology;
    +import storm.starter.bolt.PrinterBolt;
    +import storm.starter.spout.TwitterSampleSpout;
    +
    +import java.util.Arrays;
    +
    +/**
    + * Prints incoming tweets. Tweets can be filtered by keywords.
    + */
    +public class PrintSampleStream {        
    +	public static void main(String[] args) throws Exception {
    --- End diff --
    
    What is the purpose of this example? Does it show anything special?


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728165
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkClient.java ---
    @@ -183,10 +183,10 @@ public void submitTopologyWithOpts(final String name, final String uploadedJarLo
     
     		/* set storm configuration */
     		if (this.conf != null) {
    -			topology.getConfig().setGlobalJobParameters(new StormConfig(this.conf));
    +			topology.getExecutionEnvironment().getConfig().setGlobalJobParameters(new StormConfig(this.conf));
     		}
    --- End diff --
    
    Because FlinkTopology is not subclassed from StreamExecutionEnvironment anymore. The class has been removed and FlinkTopologyBuilder is now FlinkTopology.
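    
    In other words, the call pattern is now roughly the following (using the API
    from this PR; `conf` stands in for whatever Storm config is at hand):
    
        TopologyBuilder builder = new TopologyBuilder();
        // ... declare spouts and bolts on the builder ...
        FlinkTopology topology = FlinkTopology.createTopology(builder);
        topology.getExecutionEnvironment().getConfig()
                .setGlobalJobParameters(new StormConfig(conf));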


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728576
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/StormTupleTest.java ---
    @@ -595,7 +593,7 @@ public void testGetBinaryByFieldPojoGetter() throws Exception {
     	private <T> StormTuple testGetByField(int arity, int index, T value)
     			throws Exception {
     
    -		assert (index < arity);
    +		Assert.assertTrue(index < arity);
     
    --- End diff --
    
    Hmm, it's part of the test, no? Doesn't really matter, but in general I thought usage of `assert` was discouraged.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45971476
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/StormTuple.java ---
    @@ -44,16 +45,30 @@
     	/** The schema (ie, ordered field names) of the tuple */
     	private final Fields schema;
     
    +	private final int taskId;
    +	private final String producerStreamId;
    +	private final MessageId id;
    +	private final String producerComponentId;
    +
    +
    +	/**
    +	 * Constructor which sets defaults for producerComponentId, taskId, and componentID
    +	 * @param flinkTuple the Flink tuple
    +	 * @param schema The schema of the storm fields
    +	 */
    +	StormTuple(final IN flinkTuple, final Fields schema) {
    +		this(flinkTuple, schema, -1, "testStream", "componentID");
    +	}
    --- End diff --
    
    Fair enough, I use default id constants now.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45889473
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltCollector.java ---
    @@ -19,7 +19,6 @@
     
     import backtype.storm.task.IOutputCollector;
     import backtype.storm.tuple.Tuple;
    -
     import org.apache.flink.api.java.tuple.Tuple0;
    --- End diff --
    
    pure reformatting


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45734323
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/BoltFileSink.java ---
    @@ -40,16 +43,17 @@ public BoltFileSink(final String path) {
     
     	public BoltFileSink(final String path, final OutputFormatter formatter) {
     		super(formatter);
    -		this.path = path;
    +		this.path = new Path(path);
     	}
     
     	@SuppressWarnings("rawtypes")
     	@Override
     	public void prepareSimple(final Map stormConf, final TopologyContext context) {
     		try {
    -			this.writer = new BufferedWriter(new FileWriter(this.path));
    +			FSDataOutputStream outputStream = FileSystem.getLocalFileSystem().create(path, false);
    +			this.writer = new BufferedWriter(new OutputStreamWriter(outputStream));
     		} catch (final IOException e) {
    --- End diff --
    
    That's odd, but if so I'd change this to strip the `file://` prefix.
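    
    Something along these lines, as a sketch (the helper is hypothetical, not
    part of this PR):
    
        // drop a leading "file://" scheme so plain local-file APIs accept the path
        private static String stripFileScheme(final String path) {
            return path.startsWith("file://") ? path.substring("file://".length()) : path;
        }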


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45733403
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/WrapperSetupHelper.java ---
    @@ -187,14 +190,15 @@ static synchronized TopologyContext createTopologyContext(
     				}
     			}
     			for (Entry<String, StateSpoutSpec> stateSpout : stateSpouts.entrySet()) {
    -				Integer rc = taskId = processSingleOperator(stateSpout.getKey(), stateSpout
    +				Integer rc = processSingleOperator(stateSpout.getKey(), stateSpout
     						.getValue().get_common(), operatorName, context.getIndexOfThisSubtask(),
     						dop, taskToComponents, componentToSortedTasks, componentToStreamToFields);
     				if (rc != null) {
     					taskId = rc;
     				}
     			}
    -			assert (taskId != null);
    +
    +			Preconditions.checkNotNull("Task ID may not be null!", taskId);
     		}
    --- End diff --
    
    Not "reliable enough" because they are disabled by default :)


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45896492
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,474 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
    --- End diff --
    
    Sorry for not being consistent myself -- I always try hard (but not always with success). The name `FlinkTopology` is kind of a hybrid as it bridges between both systems. I would claim that a `FlinkTopology` is a Flink streaming program that was derived from a Storm topology. And as I stated above: **keeping it as-is is also fine**


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45733970
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/StormTupleTest.java ---
    @@ -595,7 +593,7 @@ public void testGetBinaryByFieldPojoGetter() throws Exception {
     	private <T> StormTuple testGetByField(int arity, int index, T value)
     			throws Exception {
     
    -		assert (index < arity);
    +		Assert.assertTrue(index < arity);
     
    --- End diff --
    
    But not in a testing scenario (ie, asserts are enabled while executing JUnit tests and ITCases).


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45893853
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/util/TestDummySpout.java ---
    @@ -26,6 +24,8 @@
     import backtype.storm.tuple.Values;
     import backtype.storm.utils.Utils;
     
    +import java.util.Map;
    +
     public class TestDummySpout implements IRichSpout {
    --- End diff --
    
    pure reformatting


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45886536
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/wordcount/SpoutSourceWordCount.java ---
    @@ -19,7 +19,6 @@
     
     import backtype.storm.topology.IRichSpout;
     import backtype.storm.utils.Utils;
    -
    --- End diff --
    
    pure reformatting


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728168
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkClient.java ---
    @@ -183,10 +183,10 @@ public void submitTopologyWithOpts(final String name, final String uploadedJarLo
     
     		/* set storm configuration */
     		if (this.conf != null) {
    -			topology.getConfig().setGlobalJobParameters(new StormConfig(this.conf));
    +			topology.getExecutionEnvironment().getConfig().setGlobalJobParameters(new StormConfig(this.conf));
     		}
     
    -		final StreamGraph streamGraph = topology.getStreamGraph();
    +		final StreamGraph streamGraph = topology.getExecutionEnvironment().getStreamGraph();
     		streamGraph.setJobName(name);
    --- End diff --
    
    Because FlinkTopology is not subclassed from StreamExecutionEnvironment anymore.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on the pull request:

    https://github.com/apache/flink/pull/1398#issuecomment-160985940
  
    If Travis is green please merge. You can fix the last tiny comments directly before merging. No need to update this PR.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45732428
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/WrapperSetupHelper.java ---
    @@ -187,14 +190,15 @@ static synchronized TopologyContext createTopologyContext(
     				}
     			}
     			for (Entry<String, StateSpoutSpec> stateSpout : stateSpouts.entrySet()) {
    -				Integer rc = taskId = processSingleOperator(stateSpout.getKey(), stateSpout
    +				Integer rc = processSingleOperator(stateSpout.getKey(), stateSpout
     						.getValue().get_common(), operatorName, context.getIndexOfThisSubtask(),
     						dop, taskToComponents, componentToSortedTasks, componentToStreamToFields);
     				if (rc != null) {
     					taskId = rc;
     				}
     			}
    -			assert (taskId != null);
    +
    +			Preconditions.checkNotNull("Task ID may not be null!", taskId);
     		}
    --- End diff --
    
    Why not reliable enough?


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45963933
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/StormTuple.java ---
    @@ -44,16 +45,30 @@
     	/** The schema (ie, ordered field names) of the tuple */
     	private final Fields schema;
     
    +	private final int taskId;
    +	private final String producerStreamId;
    +	private final MessageId id;
    +	private final String producerComponentId;
    +
    +
    +	/**
    +	 * Constructor which sets defaults for producerComponentId, taskId, and componentID
    +	 * @param flinkTuple the Flink tuple
    +	 * @param schema The schema of the storm fields
    +	 */
    +	StormTuple(final IN flinkTuple, final Fields schema) {
    +		this(flinkTuple, schema, -1, "testStream", "componentID");
    +	}
    --- End diff --
    
    The use of null is often problematic. I prefer default values.
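    
    For example (a sketch of the "default id constants" variant; the constant
    names are made up and may differ from what ends up in StormTuple):
    
        /** Dummy task ID used when the tuple is created outside a topology, e.g. in tests. */
        static final int DEFAULT_TASK_ID = -1;
        /** Dummy stream ID used when no producer stream is known. */
        static final String DEFAULT_STREAM_ID = "testStream";
        /** Dummy component ID used when no producer component is known. */
        static final String DEFAULT_COMPONENT_ID = "componentID";
    
        StormTuple(final IN flinkTuple, final Fields schema) {
            this(flinkTuple, schema, DEFAULT_TASK_ID, DEFAULT_STREAM_ID, DEFAULT_COMPONENT_ID);
        }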


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45890399
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapper.java ---
    @@ -108,20 +112,19 @@ public BoltWrapper(final IRichBolt bolt, final Fields inputSchema)
     	 * for POJO input types. The output type can be any type if parameter {@code rawOutput} is {@code true} and the
     	 * bolt's number of declared output tuples is 1. If {@code rawOutput} is {@code false} the output type will be one
     	 * of {@link Tuple0} to {@link Tuple25} depending on the bolt's declared number of attributes.
    -	 * 
    -	 * @param bolt
    +	 *  @param bolt
     	 *            The Storm {@link IRichBolt bolt} to be used.
    +	 * @param inputStreamId
    +	 * @param inputComponentId
     	 * @param rawOutputs
     	 *            Contains stream names if a single attribute output stream, should not be of type {@link Tuple1} but be
    -	 *            of a raw type.
    -	 * @throws IllegalArgumentException
    --- End diff --
    
    keep `@throws`
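    
    i.e. restore something like this (wording reconstructed from the deleted
    lines, so treat it as a guess):
    
        * @throws IllegalArgumentException
        *             If {@code rawOutputs} contains a stream name whose output
        *             declares more than one attribute.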


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45893875
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/util/TestSink.java ---
    @@ -16,16 +16,16 @@
      */
     package org.apache.flink.storm.util;
     
    -import java.util.LinkedList;
    -import java.util.List;
    -import java.util.Map;
    -
     import backtype.storm.task.OutputCollector;
     import backtype.storm.task.TopologyContext;
     import backtype.storm.topology.IRichBolt;
     import backtype.storm.topology.OutputFieldsDeclarer;
     import backtype.storm.tuple.Tuple;
     
    +import java.util.LinkedList;
    +import java.util.List;
    +import java.util.Map;
    +
     public class TestSink implements IRichBolt {
    --- End diff --
    
    pure reformatting


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45963943
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/SpoutWrapper.java ---
    @@ -33,7 +30,8 @@
     import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
     import org.apache.flink.streaming.api.operators.StreamingRuntimeContext;
     
    -import com.google.common.collect.Sets;
    +import java.util.Collection;
    +import java.util.HashMap;
     
    --- End diff --
    
    It follows the import style of the other classes, so I'll leave this as it is.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45731348
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkClient.java ---
    @@ -183,10 +183,10 @@ public void submitTopologyWithOpts(final String name, final String uploadedJarLo
     
     		/* set storm configuration */
     		if (this.conf != null) {
    -			topology.getConfig().setGlobalJobParameters(new StormConfig(this.conf));
    +			topology.getExecutionEnvironment().getConfig().setGlobalJobParameters(new StormConfig(this.conf));
     		}
    --- End diff --
    
    I see. Makes sense now.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45743505
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/WrapperSetupHelperTest.java ---
    @@ -180,8 +178,6 @@ public void testCreateTopologyContext() {
     		builder.setBolt("bolt2", (IRichBolt) operators.get("bolt2"), dops.get("bolt2")).allGrouping("spout2");
     		builder.setBolt("sink", (IRichBolt) operators.get("sink"), dops.get("sink"))
     				.shuffleGrouping("bolt1", TestDummyBolt.groupingStreamId)
    -				.shuffleGrouping("bolt1", TestDummyBolt.shuffleStreamId)
    -				.shuffleGrouping("bolt2", TestDummyBolt.groupingStreamId)
     				.shuffleGrouping("bolt2", TestDummyBolt.shuffleStreamId);
    --- End diff --
    
    Well. Get it right from the beginning. I think it was anything but right until now :) And in this regard, it's much more defined now. At least you get an error if you have more than two inputs.
    
    I agree that we should fix this. But it's going to be a bit tricky because we have to hack around Flink's limitation. I would rather not do this in this pull request.
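    
    For the record, one possible direction for that follow-up (just a sketch,
    not what this PR does): collapse everything beyond the second input into a
    single stream via union before wrapping the bolt. This assumes all inputs
    carry the same Tuple schema, and the records would still need to be tagged
    with their GlobalStreamId first, since StormTuple exposes the producer
    stream:
    
        DataStream<Tuple> merged = firstInput.getValue();
        while (iterator.hasNext()) {
            // union requires identical types on both sides
            merged = merged.union(iterator.next().getValue());
        }
        outputStream = createOutput(boltId, userBolt, streamId, merged);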


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45890812
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapperTwoInput.java ---
    @@ -0,0 +1,130 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.storm.wrappers;
    +
    +import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.tuple.Fields;
    +import org.apache.flink.api.java.tuple.Tuple0;
    +import org.apache.flink.api.java.tuple.Tuple1;
    +import org.apache.flink.api.java.tuple.Tuple25;
    +import org.apache.flink.streaming.api.operators.TwoInputStreamOperator;
    +import org.apache.flink.streaming.api.watermark.Watermark;
    +import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
    +
    +import java.util.Collection;
    +
    +/**
    + * A {@link BoltWrapperTwoInput} wraps an {@link IRichBolt} in order to execute the Storm bolt within a Flink Streaming
    + * program. In contrast to {@link BoltWrapper}, this wrapper takes two input stream as input.
    + */
    +public class BoltWrapperTwoInput<IN1, IN2, OUT> extends BoltWrapper<IN1, OUT> implements TwoInputStreamOperator<IN1, IN2, OUT> {
    +
    +	/** The schema (ie, ordered field names) of the second input stream. */
    +	private final Fields inputSchema2;
    +
    +	private final String componentId2;
    +	private final String streamId2;
    +
    --- End diff --
    
    missing JavaDoc for both members
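    
    e.g. something along these lines (suggested wording, inferred from how the
    fields are used):
    
        /** The component ID of the producer of the second input stream. */
        private final String componentId2;
        /** The stream ID of the second input stream. */
        private final String streamId2;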


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45893736
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/api/FlinkTopologyTest.java ---
    @@ -14,50 +14,70 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    -import org.apache.flink.storm.api.FlinkTopology;
    -import org.junit.Assert;
    +
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import org.apache.flink.storm.util.TestDummyBolt;
    +import org.apache.flink.storm.util.TestDummySpout;
    +import org.apache.flink.storm.util.TestSink;
    +import org.junit.Ignore;
     import org.junit.Test;
     
     public class FlinkTopologyTest {
     
    -	@Test
    -	public void testDefaultParallelism() {
    -		final FlinkTopology topology = new FlinkTopology();
    -		Assert.assertEquals(1, topology.getParallelism());
    +	@Test(expected = RuntimeException.class)
    +	public void testUnknowSpout() {
    +		TopologyBuilder builder = new TopologyBuilder();
    +		builder.setSpout("spout", new TestSpout());
    +		builder.setBolt("bolt", new TestBolt()).shuffleGrouping("unknown");
    +
    +		FlinkTopology.createTopology(builder);
     	}
     
    -	@Test(expected = UnsupportedOperationException.class)
    -	public void testExecute() throws Exception {
    -		new FlinkTopology().execute();
    +	@Test(expected = RuntimeException.class)
    +	public void testUnknowBolt() {
    +		TopologyBuilder builder = new TopologyBuilder();
    +		builder.setSpout("spout", new TestSpout());
    +		builder.setBolt("bolt1", new TestBolt()).shuffleGrouping("spout");
    +		builder.setBolt("bolt2", new TestBolt()).shuffleGrouping("unknown");
    +
    +		FlinkTopology.createTopology(builder);
     	}
     
    -	@Test(expected = UnsupportedOperationException.class)
    -	public void testExecuteWithName() throws Exception {
    -		new FlinkTopology().execute(null);
    +	@Test(expected = RuntimeException.class)
    +	public void testUndeclaredStream() {
    +		TopologyBuilder builder = new TopologyBuilder();
    +		builder.setSpout("spout", new TestSpout());
    +		builder.setBolt("bolt", new TestBolt()).shuffleGrouping("spout");
    +
    +		FlinkTopology.createTopology(builder);
     	}
     
     	@Test
    -	public void testNumberOfTasks() {
    -		final FlinkTopology topology = new FlinkTopology();
    +	@Ignore
    --- End diff --
    
    Please enable this test. I forgot to do this in my last commit which fixes this issue...


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45726133
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkClient.java ---
    @@ -183,10 +183,10 @@ public void submitTopologyWithOpts(final String name, final String uploadedJarLo
     
     		/* set storm configuration */
     		if (this.conf != null) {
    -			topology.getConfig().setGlobalJobParameters(new StormConfig(this.conf));
    +			topology.getExecutionEnvironment().getConfig().setGlobalJobParameters(new StormConfig(this.conf));
     		}
     
    -		final StreamGraph streamGraph = topology.getStreamGraph();
    +		final StreamGraph streamGraph = topology.getExecutionEnvironment().getStreamGraph();
     		streamGraph.setJobName(name);
    --- End diff --
    
    Same here? Why changed?


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45725539
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/BoltFileSink.java ---
    @@ -40,16 +43,17 @@ public BoltFileSink(final String path) {
     
     	public BoltFileSink(final String path, final OutputFormatter formatter) {
     		super(formatter);
    -		this.path = path;
    +		this.path = new Path(path);
     	}
     
     	@SuppressWarnings("rawtypes")
     	@Override
     	public void prepareSimple(final Map stormConf, final TopologyContext context) {
     		try {
    -			this.writer = new BufferedWriter(new FileWriter(this.path));
    +			FSDataOutputStream outputStream = FileSystem.getLocalFileSystem().create(path, false);
    +			this.writer = new BufferedWriter(new OutputStreamWriter(outputStream));
     		} catch (final IOException e) {
    --- End diff --
    
    Same here.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45730899
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/BoltFileSink.java ---
    @@ -40,16 +43,17 @@ public BoltFileSink(final String path) {
     
     	public BoltFileSink(final String path, final OutputFormatter formatter) {
     		super(formatter);
    -		this.path = path;
    +		this.path = new Path(path);
     	}
     
     	@SuppressWarnings("rawtypes")
     	@Override
     	public void prepareSimple(final Map stormConf, final TopologyContext context) {
     		try {
    -			this.writer = new BufferedWriter(new FileWriter(this.path));
    +			FSDataOutputStream outputStream = FileSystem.getLocalFileSystem().create(path, false);
    +			this.writer = new BufferedWriter(new OutputStreamWriter(outputStream));
     		} catch (final IOException e) {
    --- End diff --
    
    In my examples (ITCases) I strip the `file://` prefix before giving the path to the spout.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45720083
  
    --- Diff: docs/apis/storm_compatibility.md ---
    @@ -57,20 +57,22 @@ See *WordCount Storm* within `flink-storm-examples/pom.xml` for an example how t
     
     Flink provides a Storm compatible API (`org.apache.flink.storm.api`) that offers replacements for the following classes:
     
    -- `TopologyBuilder` replaced by `FlinkTopologyBuilder`
     - `StormSubmitter` replaced by `FlinkSubmitter`
     - `NimbusClient` and `Client` replaced by `FlinkClient`
     - `LocalCluster` replaced by `FlinkLocalCluster`
     
    -In order to submit a Storm topology to Flink, it is sufficient to replace the used Storm classes with their Flink replacements in the Storm *client code that assembles* the topology.
    -The actual runtime code, ie, Spouts and Bolts, can be uses *unmodified*.
    -If a topology is executed in a remote cluster, parameters `nimbus.host` and `nimbus.thrift.port` are used as `jobmanger.rpc.address` and `jobmanger.rpc.port`, respectively.
    -If a parameter is not specified, the value is taken from `flink-conf.yaml`.
    +In order to submit a Storm topology to Flink, it is sufficient to replace the
    +used Storm classes with their Flink replacements in the Storm *client code that
    +assembles* the topology.  The actual runtime code, ie, Spouts and Bolts, can be
    +used *unmodified*.  If a topology is executed in a remote cluster, parameters
    +`nimbus.host` and `nimbus.thrift.port` are used as `jobmanger.rpc.address` and
    +`jobmanger.rpc.port`, respectively.  If a parameter is not specified, the value
    +is taken from `flink-conf.yaml`.
     
    --- End diff --
    
    Is there any change here? Looks like reformatting? I used a "single-line per sentence" formatting to make reviewing changes simpler. Please stick to this formatting.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45732371
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/WrapperSetupHelper.java ---
    @@ -150,7 +153,7 @@ static synchronized TopologyContext createTopologyContext(
     			}
     			stormTopology = new StormTopology(spouts, bolts, new HashMap<String, StateSpoutSpec>());
     
    -			taskId = context.getIndexOfThisSubtask();
    +			taskId = context.getIndexOfThisSubtask() + 1;
     
    --- End diff --
    
    Are you sure about this? I doubt it (but not 100% sure)


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45727477
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/WrapperSetupHelperTest.java ---
    @@ -180,8 +178,6 @@ public void testCreateTopologyContext() {
     		builder.setBolt("bolt2", (IRichBolt) operators.get("bolt2"), dops.get("bolt2")).allGrouping("spout2");
     		builder.setBolt("sink", (IRichBolt) operators.get("sink"), dops.get("sink"))
     				.shuffleGrouping("bolt1", TestDummyBolt.groupingStreamId)
    -				.shuffleGrouping("bolt1", TestDummyBolt.shuffleStreamId)
    -				.shuffleGrouping("bolt2", TestDummyBolt.groupingStreamId)
     				.shuffleGrouping("bolt2", TestDummyBolt.shuffleStreamId);
    --- End diff --
    
    Why did you change this?


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45722053
  
    --- Diff: docs/apis/storm_compatibility.md ---
    @@ -57,20 +57,22 @@ See *WordCount Storm* within `flink-storm-examples/pom.xml` for an example how t
     
     Flink provides a Storm compatible API (`org.apache.flink.storm.api`) that offers replacements for the following classes:
     
    -- `TopologyBuilder` replaced by `FlinkTopologyBuilder`
     - `StormSubmitter` replaced by `FlinkSubmitter`
     - `NimbusClient` and `Client` replaced by `FlinkClient`
     - `LocalCluster` replaced by `FlinkLocalCluster`
     
    -In order to submit a Storm topology to Flink, it is sufficient to replace the used Storm classes with their Flink replacements in the Storm *client code that assembles* the topology.
    -The actual runtime code, ie, Spouts and Bolts, can be uses *unmodified*.
    -If a topology is executed in a remote cluster, parameters `nimbus.host` and `nimbus.thrift.port` are used as `jobmanger.rpc.address` and `jobmanger.rpc.port`, respectively.
    -If a parameter is not specified, the value is taken from `flink-conf.yaml`.
    +In order to submit a Storm topology to Flink, it is sufficient to replace the
    +used Storm classes with their Flink replacements in the Storm *client code that
    +assembles* the topology.  The actual runtime code, ie, Spouts and Bolts, can be
    +used *unmodified*.  If a topology is executed in a remote cluster, parameters
    +`nimbus.host` and `nimbus.thrift.port` are used as `jobmanger.rpc.address` and
    +`jobmanger.rpc.port`, respectively.  If a parameter is not specified, the value
    +is taken from `flink-conf.yaml`.
     
    --- End diff --
    
    Minor typo "can be uses" -> "can be used". I think fixed line length is the best because sentences can span multiple lines. If it is an issue I can revert it.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45893806
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/api/TestSpout.java ---
    @@ -16,13 +16,13 @@
      */
     package org.apache.flink.storm.api;
     
    -import java.util.Map;
    -
     import backtype.storm.spout.SpoutOutputCollector;
     import backtype.storm.task.TopologyContext;
     import backtype.storm.topology.IRichSpout;
     import backtype.storm.topology.OutputFieldsDeclarer;
     
    +import java.util.Map;
    +
     public class TestSpout implements IRichSpout {
     	private static final long serialVersionUID = -4884029383198924007L;
    --- End diff --
    
    pure reformatting


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45747566
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/WrapperSetupHelperTest.java ---
    @@ -180,8 +178,6 @@ public void testCreateTopologyContext() {
     		builder.setBolt("bolt2", (IRichBolt) operators.get("bolt2"), dops.get("bolt2")).allGrouping("spout2");
     		builder.setBolt("sink", (IRichBolt) operators.get("sink"), dops.get("sink"))
     				.shuffleGrouping("bolt1", TestDummyBolt.groupingStreamId)
    -				.shuffleGrouping("bolt1", TestDummyBolt.shuffleStreamId)
    -				.shuffleGrouping("bolt2", TestDummyBolt.groupingStreamId)
     				.shuffleGrouping("bolt2", TestDummyBolt.shuffleStreamId);
    --- End diff --
    
    I was actually not specifically trying to address JIRA issues but just fixed everything I discovered along the way while trying out the compatibility layer. Only after fixing did I realize there are open JIRA issues. One is assigned to me (FLINK-2837) and the other one (FLINK-2721) has been open for two months. I think it would be a shame not to merge this pull request soon. It provides a good foundation to address any further issues. Splitting this PR would not be trivial with all the changes.
    
    I already accommodated you with the API changes. Also, I would like to address most of your comments but I'm not too inclined to split up this PR (if it is even possible). Could you base your work on this pull request and do a follow-up? 


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45732952
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/StormTupleTest.java ---
    @@ -595,7 +593,7 @@ public void testGetBinaryByFieldPojoGetter() throws Exception {
     	private <T> StormTuple testGetByField(int arity, int index, T value)
     			throws Exception {
     
    -		assert (index < arity);
    +		Assert.assertTrue(index < arity);
     
    --- End diff --
    
    Is it? I am not aware of this. Why? Using `assert` to check internal code invariants is the best way to go IMHO.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on the pull request:

    https://github.com/apache/flink/pull/1398#issuecomment-161412796
  
    Thanks for your patience. :)


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46278994
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/WrapperSetupHelperTest.java ---
    @@ -193,24 +189,22 @@ public void testCreateTopologyContext() {
     			Utils.sleep(++counter * 10000);
     			cluster.shutdown();
     
    -			if (TestSink.result.size() == 8) {
    +			if (TestSink.result.size() >= 4) {
    --- End diff --
    
    The Storm executor sometimes returned more results for me. I've adjusted it to a fixed size again. I think the important thing here is that we check all the returned results.
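    
    To make that concrete, here is the kind of check I mean (a sketch only; `results` stands in for `TestSink.result` and the expected values are placeholders):
    
        // Sketch: assert the expected count, but also inspect every tuple
        // that actually arrived instead of only a fixed-size prefix.
        import java.util.List;
        import org.junit.Assert;
        
        public class ResultCheckSketch {
            static void checkResults(List<List<Object>> results, int expectedSize, Object expectedValue) {
                Assert.assertEquals(expectedSize, results.size());
                for (List<Object> tuple : results) {
                    Assert.assertEquals(expectedValue, tuple.get(0)); // check each returned tuple
                }
            }
        }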


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45731894
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,468 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
     	 */
    -	@Override
    -	public JobExecutionResult execute() throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    +		return new FlinkTopology(stormBuilder);
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter} or {@link
    -	 * FlinkClient}.
    -	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Returns the underlying Flink ExecutionEnvironment for the Storm topology.
    +	 * @return The contextual environment.
     	 */
    -	@Override
    -	public JobExecutionResult execute(final String jobName) throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public StreamExecutionEnvironment getExecutionEnvironment() {
    +		return this.env;
     	}
     
     	/**
    -	 * Increased the number of declared tasks of this program by the given value.
    -	 *
    -	 * @param dop
    -	 * 		The dop of a new operator that increases the number of overall tasks.
    +	 * Directly executes the Storm topology based on the current context (local when in IDE and
    +	 * remote when executed through ./bin/flink).
    +	 * @return The execution result
    +	 * @throws Exception
     	 */
    -	public void increaseNumberOfTasks(final int dop) {
    -		assert (dop > 0);
    -		this.numberOfTasks += dop;
    +	public JobExecutionResult execute() throws Exception {
    +		return env.execute();
    +	}
    +
    +
    +	@SuppressWarnings("unchecked")
    +	private <T> Map<String, T> getPrivateField(String field) {
    +		try {
    +			Field f = builder.getClass().getDeclaredField(field);
    +			f.setAccessible(true);
    +			return copyObject((Map<String, T>) f.get(builder));
    +		} catch (NoSuchFieldException | IllegalAccessException e) {
    +			throw new RuntimeException("Couldn't get " + field + " from TopologyBuilder", e);
    +		}
    +	}
    +
    +	private <T> T copyObject(T object) {
    +		try {
    +			return InstantiationUtil.deserializeObject(
    +					InstantiationUtil.serializeObject(object),
    +					getClass().getClassLoader()
    +			);
    +		} catch (IOException | ClassNotFoundException e) {
    +			throw new RuntimeException("Failed to copy object.");
    +		}
     	}
     
     	/**
    -	 * Return the number or required tasks to execute this program.
    -	 *
    -	 * @return the number or required tasks to execute this program
    +	 * Creates a Flink program that uses the specified spouts and bolts.
     	 */
    -	public int getNumberOfTasks() {
    -		return this.numberOfTasks;
    +	private void translateTopology() {
    +
    +		unprocessdInputsPerBolt.clear();
    +		outputStreams.clear();
    +		declarers.clear();
    +		availableInputs.clear();
    +
    +		// Storm defaults to parallelism 1
    +		env.setParallelism(1);
    +
    +		/* Translation of topology */
    +
    +
    +		for (final Entry<String, IRichSpout> spout : spouts.entrySet()) {
    +			final String spoutId = spout.getKey();
    +			final IRichSpout userSpout = spout.getValue();
    +
    +			final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer();
    +			userSpout.declareOutputFields(declarer);
    +			final HashMap<String,Fields> sourceStreams = declarer.outputStreams;
    +			this.outputStreams.put(spoutId, sourceStreams);
    +			declarers.put(spoutId, declarer);
    +
    +
    +			final HashMap<String, DataStream<Tuple>> outputStreams = new HashMap<String, DataStream<Tuple>>();
    +			final DataStreamSource<?> source;
    +
    +			if (sourceStreams.size() == 1) {
    +				final SpoutWrapper<Tuple> spoutWrapperSingleOutput = new SpoutWrapper<Tuple>(userSpout);
    +				spoutWrapperSingleOutput.setStormTopology(stormTopology);
    +
    +				final String outputStreamId = (String) sourceStreams.keySet().toArray()[0];
    +
    +				DataStreamSource<Tuple> src = env.addSource(spoutWrapperSingleOutput, spoutId,
    +						declarer.getOutputType(outputStreamId));
    +
    +				outputStreams.put(outputStreamId, src);
    +				source = src;
    +			} else {
    +				final SpoutWrapper<SplitStreamType<Tuple>> spoutWrapperMultipleOutputs = new SpoutWrapper<SplitStreamType<Tuple>>(
    +						userSpout);
    +				spoutWrapperMultipleOutputs.setStormTopology(stormTopology);
    +
    +				@SuppressWarnings({ "unchecked", "rawtypes" })
    +				DataStreamSource<SplitStreamType<Tuple>> multiSource = env.addSource(
    +						spoutWrapperMultipleOutputs, spoutId,
    +						(TypeInformation) TypeExtractor.getForClass(SplitStreamType.class));
    +
    +				SplitStream<SplitStreamType<Tuple>> splitSource = multiSource
    +						.split(new StormStreamSelector<Tuple>());
    +				for (String streamId : sourceStreams.keySet()) {
    +					outputStreams.put(streamId, splitSource.select(streamId).map(new SplitStreamMapper<Tuple>()));
    +				}
    +				source = multiSource;
    +			}
    +			availableInputs.put(spoutId, outputStreams);
    +
    +			final ComponentCommon common = stormTopology.get_spouts().get(spoutId).get_common();
    +			if (common.is_set_parallelism_hint()) {
    +				int dop = common.get_parallelism_hint();
    +				source.setParallelism(dop);
    +			} else {
    +				common.set_parallelism_hint(1);
    +			}
    +		}
    +
    +		/**
    +		* 1. Connect all spout streams with bolts streams
    +		* 2. Then proceed with the bolts stream already connected
    +		*
    +		*  Because we do not know the order in which an iterator steps over a set, we might process a consumer before
    +		* its producer
    +		* ->thus, we might need to repeat multiple times
    +		*/
    +		boolean makeProgress = true;
    +		while (bolts.size() > 0) {
    +			if (!makeProgress) {
    +				throw new RuntimeException(
    +						"Unable to build Topology. Could not connect the following bolts: "
    +								+ bolts.keySet());
    +			}
    +			makeProgress = false;
    +
    +			final Iterator<Entry<String, IRichBolt>> boltsIterator = bolts.entrySet().iterator();
    +			while (boltsIterator.hasNext()) {
    +
    +				final Entry<String, IRichBolt> bolt = boltsIterator.next();
    +				final String boltId = bolt.getKey();
    +				final IRichBolt userBolt = copyObject(bolt.getValue());
    +
    +				final ComponentCommon common = stormTopology.get_bolts().get(boltId).get_common();
    +
    +				Set<Entry<GlobalStreamId, Grouping>> unprocessedBoltInputs = unprocessdInputsPerBolt.get(boltId);
    +				if (unprocessedBoltInputs == null) {
    +					unprocessedBoltInputs = new HashSet<>();
    +					unprocessedBoltInputs.addAll(common.get_inputs().entrySet());
    +					unprocessdInputsPerBolt.put(boltId, unprocessedBoltInputs);
    +				}
    +
    +				// check if all inputs are available
    +				final int numberOfInputs = unprocessedBoltInputs.size();
    +				int inputsAvailable = 0;
    +				for (Entry<GlobalStreamId, Grouping> entry : unprocessedBoltInputs) {
    +					final String producerId = entry.getKey().get_componentId();
    +					final String streamId = entry.getKey().get_streamId();
    +					final HashMap<String, DataStream<Tuple>> streams = availableInputs.get(producerId);
    +					if (streams != null && streams.get(streamId) != null) {
    +						inputsAvailable++;
    +					}
    +				}
    +
    +				if (inputsAvailable != numberOfInputs) {
    +					// traverse other bolts first until inputs are available
    +					continue;
    +				} else {
    +					makeProgress = true;
    +					boltsIterator.remove();
    +				}
    +
    +				final Map<GlobalStreamId, DataStream<Tuple>> inputStreams = new HashMap<>(numberOfInputs);
    +
    +				for (Entry<GlobalStreamId, Grouping> input : unprocessedBoltInputs) {
    +					final GlobalStreamId streamId = input.getKey();
    +					final Grouping grouping = input.getValue();
    +
    +					final String producerId = streamId.get_componentId();
    +
    +					final Map<String, DataStream<Tuple>> producer = availableInputs.get(producerId);
    +
    +					inputStreams.put(streamId, processInput(boltId, userBolt, streamId, grouping, producer));
    +				}
    +
    +				final Iterator<Entry<GlobalStreamId, DataStream<Tuple>>> iterator = inputStreams.entrySet().iterator();
    +
    +				final Entry<GlobalStreamId, DataStream<Tuple>> firstInput = iterator.next();
    +				GlobalStreamId streamId = firstInput.getKey();
    +				DataStream<Tuple> inputStream = firstInput.getValue();
    +
    +				final SingleOutputStreamOperator<?, ?> outputStream;
    +
    +				switch (numberOfInputs) {
    +					case 1:
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream);
    +						break;
    +					case 2:
    +						Entry<GlobalStreamId, DataStream<Tuple>> secondInput = iterator.next();
    +						GlobalStreamId streamId2 = secondInput.getKey();
    +						DataStream<Tuple> inputStream2 = secondInput.getValue();
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream, streamId2, inputStream2);
    +						break;
    +					default:
    +						throw new UnsupportedOperationException("Don't know how to translate a bolt "
    +								+ boltId + " with " + numberOfInputs + " inputs.");
    +				}
    +
    +				if (common.is_set_parallelism_hint()) {
    +					int dop = common.get_parallelism_hint();
    +					outputStream.setParallelism(dop);
    +				} else {
    +					common.set_parallelism_hint(1);
    +				}
    +
    +			}
    +		}
     	}
     
    +	private DataStream<Tuple> processInput(String boltId, IRichBolt userBolt,
    +										GlobalStreamId streamId, Grouping grouping,
    +										Map<String, DataStream<Tuple>> producer) {
    +
    +		Preconditions.checkNotNull(userBolt);
    +		Preconditions.checkNotNull(boltId);
    +		Preconditions.checkNotNull(streamId);
    +		Preconditions.checkNotNull(grouping);
    +		Preconditions.checkNotNull(producer);
    +
    +		final String producerId = streamId.get_componentId();
    +		final String inputStreamId = streamId.get_streamId();
    +
    +		DataStream<Tuple> inputStream = producer.get(inputStreamId);
    +
    +		final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer();
    +		declarers.put(boltId, declarer);
    +		userBolt.declareOutputFields(declarer);
    +		this.outputStreams.put(boltId, declarer.outputStreams);
    +
    +		// if producer was processed already
    +		if (grouping.is_set_shuffle()) {
    +			// Storm uses a round-robin shuffle strategy
    +			inputStream = inputStream.rebalance();
    +		} else if (grouping.is_set_fields()) {
    +			// global grouping is emulated in Storm via an empty fields grouping list
    +			final List<String> fields = grouping.get_fields();
    +			if (fields.size() > 0) {
    +				FlinkOutputFieldsDeclarer prodDeclarer = this.declarers.get(producerId);
    +				inputStream = inputStream.keyBy(prodDeclarer
    +						.getGroupingFieldIndexes(inputStreamId,
    +								grouping.get_fields()));
    +			} else {
    +				inputStream = inputStream.global();
    +			}
    +		} else if (grouping.is_set_all()) {
    +			inputStream = inputStream.broadcast();
    +		} else if (!grouping.is_set_local_or_shuffle()) {
    +			throw new UnsupportedOperationException(
    +					"Flink only supports (local-or-)shuffle, fields, all, and global grouping");
    +		}
    +
    +		return inputStream;
    +	}
    +
    +	private SingleOutputStreamOperator<?, ?> createOutput(String boltId, IRichBolt bolt, GlobalStreamId streamId, DataStream<Tuple> inputStream) {
    +		return createOutput(boltId, bolt, streamId, inputStream, null, null);
    +	}
    +
    +	private SingleOutputStreamOperator<?, ?> createOutput(String boltId, IRichBolt bolt,
    +														GlobalStreamId streamId, DataStream<Tuple> inputStream,
    +														GlobalStreamId streamId2, DataStream<Tuple> inputStream2) {
    +		Preconditions.checkNotNull(boltId);
    +		Preconditions.checkNotNull(streamId);
    --- End diff --
    
    See above.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45891117
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/SpoutWrapper.java ---
    @@ -33,7 +30,8 @@
     import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
     import org.apache.flink.streaming.api.operators.StreamingRuntimeContext;
     
    -import com.google.common.collect.Sets;
    +import java.util.Collection;
    +import java.util.HashMap;
     
    --- End diff --
    
    pure reformatting


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46033787
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/wordcount/BoltTokenizerWordCountWithNames.java ---
    @@ -75,7 +74,7 @@ public static void main(final String[] args) throws Exception {
     						"BoltTokenizerWithNames",
     						TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)),
     						new BoltWrapper<Tuple1<String>, Tuple2<String, Integer>>(
    -								new BoltTokenizerByName(), new Fields("sentence")))
    +								new BoltTokenizerByName(), "stream", "component", new Fields("sentence")))
    --- End diff --
    
    Again.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45730807
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/BoltFileSink.java ---
    @@ -18,20 +18,23 @@
     package org.apache.flink.storm.util;
     
     import backtype.storm.task.TopologyContext;
    +import org.apache.flink.core.fs.FSDataOutputStream;
    +import org.apache.flink.core.fs.FileSystem;
    +import org.apache.flink.core.fs.Path;
     
     import java.io.BufferedWriter;
    -import java.io.FileWriter;
     import java.io.IOException;
    +import java.io.OutputStreamWriter;
     import java.util.Map;
     
     /**
    - * Implements a sink that write the received data to the given file (as a result of {@code Object.toString()} for each
    + * Implements a sink that writes the received data to the given file (as a result of {@code Object.toString()} for each
      * attribute).
      */
     public final class BoltFileSink extends AbstractBoltSink {
     	private static final long serialVersionUID = 2014027288631273666L;
     
    -	private final String path;
    +	private final Path path;
     	private BufferedWriter writer;
    --- End diff --
    
    That's not the point. If I re-use an existing bolt, I don't want to change anything. And I want to be able (from an example point of view) to run this bolt as-is in Storm, too. (Without any dependencies to Flink)
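    
    To make the point concrete, a sink written against plain `java.io` stays runnable on a vanilla Storm cluster -- a sketch only, not the actual `BoltFileSink`:
    
        // Storm-only file sink sketch: no Flink classes involved, so the
        // same logic could run unmodified in a plain Storm topology.
        import java.io.BufferedWriter;
        import java.io.FileWriter;
        import java.io.IOException;
        
        public class PlainFileSinkSketch {
            private final String path;
            private BufferedWriter writer;
        
            public PlainFileSinkSketch(String path) {
                this.path = path;
            }
        
            public void prepare() throws IOException {
                this.writer = new BufferedWriter(new FileWriter(this.path));
            }
        
            public void writeLine(String line) throws IOException {
                this.writer.write(line);
                this.writer.newLine();
            }
        
            public void cleanup() throws IOException {
                if (this.writer != null) {
                    this.writer.close();
                }
            }
        }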


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45733449
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/StormTupleTest.java ---
    @@ -595,7 +593,7 @@ public void testGetBinaryByFieldPojoGetter() throws Exception {
     	private <T> StormTuple testGetByField(int arity, int index, T value)
     			throws Exception {
     
    -		assert (index < arity);
    +		Assert.assertTrue(index < arity);
     
    --- End diff --
    
    Because they are disabled at run time. This makes bugs harder to find, e.g. in stack traces.
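    
    A quick self-contained illustration (plain `assert` only fires when the JVM runs with `-ea`, while JUnit's `Assert.assertTrue` always throws):
    
        // Run with `java AssertDemo`: the broken invariant passes silently.
        // Run with `java -ea AssertDemo`: an AssertionError is thrown.
        public class AssertDemo {
            public static void main(String[] args) {
                int index = 5;
                int arity = 3;
                assert index < arity : "index must be smaller than arity";
                System.out.println("reached without -ea, despite index >= arity");
            }
        }
    
    With `Assert.assertTrue(index < arity)` the check fails regardless of JVM flags, so the stack trace points at the real problem.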


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45727399
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/StormTupleTest.java ---
    @@ -595,7 +593,7 @@ public void testGetBinaryByFieldPojoGetter() throws Exception {
     	private <T> StormTuple testGetByField(int arity, int index, T value)
     			throws Exception {
     
    -		assert (index < arity);
    +		Assert.assertTrue(index < arity);
     
    --- End diff --
    
    This is not an assertion for a test result; thus the usage of `assert` is correct (it checks whether the test is written properly).


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45964832
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/api/TestSpout.java ---
    @@ -16,13 +16,13 @@
      */
     package org.apache.flink.storm.api;
     
    -import java.util.Map;
    -
     import backtype.storm.spout.SpoutOutputCollector;
     import backtype.storm.task.TopologyContext;
     import backtype.storm.topology.IRichSpout;
     import backtype.storm.topology.OutputFieldsDeclarer;
     
    +import java.util.Map;
    +
     public class TestSpout implements IRichSpout {
     	private static final long serialVersionUID = -4884029383198924007L;
    --- End diff --
    
    It follows the import style of the other classes, so I'll leave this as it is.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45892746
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,474 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
    --- End diff --
    
    Let me get this straight: you're saying you don't want to use the name "Flink topology", yet you are the author of the classes called FlinkTopology and FlinkTopologyBuilder (deleted by now). This is still user-facing documentation, and there we wanted to maintain the Storm vocabulary, right?


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/1398#issuecomment-159641971
  
    I've rebased to the latest master and addressed your comments. I would like to merge this and programmatically fix the multiple inputs issue afterwards.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45891019
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/SetupOutputFieldsDeclarer.java ---
    @@ -17,12 +17,12 @@
     
     package org.apache.flink.storm.wrappers;
     
    -import java.util.HashMap;
    -
    --- End diff --
    
    pure reformatting


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/1398#issuecomment-159256830
  
    I will integrate the changes of #1387. It would be great if we merged these changes afterwards.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45888777
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,474 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
    --- End diff --
    
    typo: Strom with capital letter
    "Flink program" or "Flink job" (not "Flink topology" -- if you keep it as is also fine. -- I personally prefer the use topology only for Storm to avoid confusion in terminology)
    Typo: topology with lower case letter


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45890269
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapper.java ---
    @@ -89,17 +94,16 @@ public BoltWrapper(final IRichBolt bolt) throws IllegalArgumentException {
     	 * used within a Flink streaming program. The given input schema enable attribute-by-name access for input types
     	 * {@link Tuple0} to {@link Tuple25}. The output type will be one of {@link Tuple0} to {@link Tuple25} depending on
     	 * the bolt's declared number of attributes.
    -	 * 
    -	 * @param bolt
    +	 *  @param bolt
     	 *            The Storm {@link IRichBolt bolt} to be used.
    +	 * @param inputStreamId
    +	 * @param inputComponentId
     	 * @param inputSchema
    -	 *            The schema (ie, ordered field names) of the input stream.
    -	 * @throws IllegalArgumentException
    -	 *             If the number of declared output attributes is not with range [0;25].
    --- End diff --
    
    Why do you delete `@throws`? More documentation is always better.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45732254
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/StormTuple.java ---
    @@ -44,6 +45,21 @@
     	/** The schema (ie, ordered field names) of the tuple */
     	private final Fields schema;
     
    +	private final int taskId;
    +	private final String streamId;
    +	private final MessageId id;
    +	private final String componentId;
    +
    --- End diff --
    
    I have already worked on this and I think my solution is smoother, as it computes all this stuff automatically under the hood (without the need for the user to specify the name in the constructor). No PR yet, as it is not completely finished. We need to get in sync about it.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45890041
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapper.java ---
    @@ -89,17 +94,16 @@ public BoltWrapper(final IRichBolt bolt) throws IllegalArgumentException {
     	 * used within a Flink streaming program. The given input schema enable attribute-by-name access for input types
     	 * {@link Tuple0} to {@link Tuple25}. The output type will be one of {@link Tuple0} to {@link Tuple25} depending on
     	 * the bolt's declared number of attributes.
    -	 * 
    -	 * @param bolt
    +	 *  @param bolt
     	 *            The Storm {@link IRichBolt bolt} to be used.
    --- End diff --
    
    delete one space before `@param`


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45971856
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/print/PrintSampleStream.java ---
    @@ -0,0 +1,61 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.storm.print;
    +
    +import backtype.storm.Config;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.utils.Utils;
    +import org.apache.flink.storm.api.FlinkLocalCluster;
    +import org.apache.flink.storm.api.FlinkTopology;
    +import storm.starter.bolt.PrinterBolt;
    +import storm.starter.spout.TwitterSampleSpout;
    +
    +import java.util.Arrays;
    +
    +/**
    + * Prints incoming tweets. Tweets can be filtered by keywords.
    + */
    +public class PrintSampleStream {        
    +	public static void main(String[] args) throws Exception {
    --- End diff --
    
    The problem was that the `BoltWrapper` wouldn't create a `BoltCollector` if the bolt didn't define any output fields. That led to a NullPointerException.
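    
    Roughly, the fix boils down to this (a sketch with illustrative names, not the actual `BoltWrapper` code):
    
        // Sketch: if a bolt declares no output fields, no collector used to be
        // created, and a later emit through the null reference threw an NPE.
        // A no-op collector (or an explicit null guard) removes the special case.
        import java.util.List;
        
        public class CollectorSetupSketch {
            interface Collector {
                void emit(List<Object> tuple);
            }
        
            static Collector createCollector(int declaredOutputFields) {
                if (declaredOutputFields == 0) {
                    return tuple -> { /* pure sink bolt: nothing to forward */ };
                }
                return tuple -> System.out.println("emit " + tuple);
            }
        }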


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728678
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/BoltFileSink.java ---
    @@ -18,20 +18,23 @@
     package org.apache.flink.storm.util;
     
     import backtype.storm.task.TopologyContext;
    +import org.apache.flink.core.fs.FSDataOutputStream;
    +import org.apache.flink.core.fs.FileSystem;
    +import org.apache.flink.core.fs.Path;
     
     import java.io.BufferedWriter;
    -import java.io.FileWriter;
     import java.io.IOException;
    +import java.io.OutputStreamWriter;
     import java.util.Map;
     
     /**
    - * Implements a sink that write the received data to the given file (as a result of {@code Object.toString()} for each
    + * Implements a sink that writes the received data to the given file (as a result of {@code Object.toString()} for each
      * attribute).
      */
     public final class BoltFileSink extends AbstractBoltSink {
     	private static final long serialVersionUID = 2014027288631273666L;
     
    -	private final String path;
    +	private final Path path;
     	private BufferedWriter writer;
    --- End diff --
    
    Hmm, but they depend on Flink anyway because of the Maven dependency?


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45964853
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/util/TestDummyBolt.java ---
    @@ -26,6 +24,8 @@
     import backtype.storm.tuple.Tuple;
     import backtype.storm.tuple.Values;
     
    +import java.util.Map;
    +
     public class TestDummyBolt implements IRichBolt {
    --- End diff --
    
    It follows the import style of the other classes, so I'll leave this as it is.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45967336
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/FlinkTopologyContext.java ---
    @@ -27,13 +27,12 @@
     import backtype.storm.state.ISubscribedState;
     import backtype.storm.task.TopologyContext;
     import backtype.storm.tuple.Fields;
    +import clojure.lang.Atom;
     
    --- End diff --
    
    Not sure who is bullying whom :) Look at the classes and you will see that all imports are arranged like this. We want to be consistent, right? Following your suggestion, I changed the other import statements, which were pure reformatting.
    
    Open source is often about compromises. Very rarely will you find that another person's code style reflects exactly how you would do it. I'm making compromises and changing things as you like them. That's fine for me. Please don't give me a harder time by blaming my employer. I'm not aware that I have done anything like this to you. Next time you get blamed for something like this, please contact me and I'll try to help. I don't think this is the right place to sort these things out.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45741402
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/join/SingleJoinExample.java ---
    @@ -0,0 +1,86 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.storm.join;
    +
    +import backtype.storm.Config;
    +import backtype.storm.testing.FeederSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import backtype.storm.tuple.Values;
    +import backtype.storm.utils.Utils;
    +import org.apache.flink.storm.api.FlinkLocalCluster;
    +import org.apache.flink.storm.api.FlinkTopology;
    +import org.apache.flink.storm.util.BoltFileSink;
    +import org.apache.flink.storm.util.TupleOutputFormatter;
    +import storm.starter.bolt.PrinterBolt;
    +import storm.starter.bolt.SingleJoinBolt;
    +
    +
    +public class SingleJoinExample {
    +
    +	public static void main(String[] args) throws Exception {
    +		final FeederSpout genderSpout = new FeederSpout(new Fields("id", "gender"));
    +		final FeederSpout ageSpout = new FeederSpout(new Fields("id", "age"));
    +
    --- End diff --
    
    Okay, should be doable.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45731351
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkClient.java ---
    @@ -183,10 +183,10 @@ public void submitTopologyWithOpts(final String name, final String uploadedJarLo
     
     		/* set storm configuration */
     		if (this.conf != null) {
    -			topology.getConfig().setGlobalJobParameters(new StormConfig(this.conf));
    +			topology.getExecutionEnvironment().getConfig().setGlobalJobParameters(new StormConfig(this.conf));
     		}
     
    -		final StreamGraph streamGraph = topology.getStreamGraph();
    +		final StreamGraph streamGraph = topology.getExecutionEnvironment().getStreamGraph();
     		streamGraph.setJobName(name);
    --- End diff --
    
    I see. Makes sense now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46276898
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/WrapperSetupHelper.java ---
    @@ -224,7 +224,7 @@ static synchronized TopologyContext createTopologyContext(
     	 *            OUTPUT: A map from all component IDs to there output streams and output fields.
     	 * 
     	 * @return A unique task ID if the currently processed Spout or Bolt ({@code componentId}) is equal to the current
    -	 *         Flink operator ({@link operatorName}) -- {@code null} otherwise.
    +	 *         Flink operator ({@param operatorName}) -- {@code null} otherwise.
    --- End diff --
    
    Ok, let's use code.
    
    http://stackoverflow.com/questions/1667212/reference-a-method-parameter-in-javadoc
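
    I.e., something like this (a sketch of the corrected tag -- `{@code}` just renders the name in code font, it does not link anywhere):

        /**
         * @return A unique task ID if the currently processed Spout or Bolt
         *         ({@code componentId}) is equal to the current Flink operator
         *         ({@code operatorName}) -- {@code null} otherwise.
         */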



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45889260
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/print/PrintSampleStream.java ---
    @@ -0,0 +1,61 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.storm.print;
    +
    +import backtype.storm.Config;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.utils.Utils;
    +import org.apache.flink.storm.api.FlinkLocalCluster;
    +import org.apache.flink.storm.api.FlinkTopology;
    +import storm.starter.bolt.PrinterBolt;
    +import storm.starter.spout.TwitterSampleSpout;
    +
    +import java.util.Arrays;
    +
    +/**
    + * Prints incoming tweets. Tweets can be filtered by keywords.
    + */
    +public class PrintSampleStream {        
    +	public static void main(String[] args) throws Exception {
    --- End diff --
    
    It shows how to run an existing Storm topology with Flink. It prints tweets from Twitter, which is kind of neat, and it is also included in Storm. It's nice to have some examples other than WordCount. This was actually not working before this PR...
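
    For reference, the pattern the example follows is roughly this (a sketch only, assuming the `FlinkLocalCluster.getLocalCluster()` factory and the `submitTopology` signature from this PR; the Twitter credentials are placeholders):

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("twitter", new TwitterSampleSpout(
                consumerKey, consumerSecret, accessToken, accessTokenSecret, keyWords));
        builder.setBolt("print", new PrinterBolt()).shuffleGrouping("twitter");

        // submit to the Flink-backed local cluster instead of Storm's LocalCluster
        FlinkLocalCluster cluster = FlinkLocalCluster.getLocalCluster();
        cluster.submitTopology("Print", new Config(), FlinkTopology.createTopology(builder));
        Utils.sleep(10 * 1000);
        cluster.shutdown();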



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45892911
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,474 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
     	 */
    -	@Override
    -	public JobExecutionResult execute() throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    +		return new FlinkTopology(stormBuilder);
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter} or {@link
    -	 * FlinkClient}.
    -	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Returns the underlying Flink ExecutionEnvironment for the Storm topology.
    +	 * @return The contextual environment.
     	 */
    -	@Override
    -	public JobExecutionResult execute(final String jobName) throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public StreamExecutionEnvironment getExecutionEnvironment() {
    +		return this.env;
     	}
     
     	/**
    -	 * Increased the number of declared tasks of this program by the given value.
    -	 *
    -	 * @param dop
    -	 * 		The dop of a new operator that increases the number of overall tasks.
    +	 * Directly executes the Storm topology based on the current context (local when in IDE and
     +	 * remote when executed through ./bin/flink).
    +	 * @return The execution result
    +	 * @throws Exception
     	 */
    --- End diff --
    
    Yes, execute just throws an Exception. Nothing to explain here. StreamExecutionEnvironment says
    > 	 * @throws Exception which occurs during job execution.




[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45731544
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkLocalCluster.java ---
    @@ -48,12 +49,10 @@
     	private static final Logger LOG = LoggerFactory.getLogger(FlinkLocalCluster.class);
     
     	/** The flink mini cluster on which to execute the programs */
    -	private final FlinkMiniCluster flink;
    +	private FlinkMiniCluster flink;
     
     
     	public FlinkLocalCluster() {
    -		this.flink = new LocalFlinkMiniCluster(new Configuration(), true, StreamingMode.STREAMING);
    -		this.flink.start();
     	}
    --- End diff --
    
    I just realized that I have something similar in my current work on FLINK-2721 ;)



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46281582
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/BoltWrapperTest.java ---
    @@ -265,12 +264,12 @@ public void testOpen() throws Exception {
     	@Test
     	public void testOpenSink() throws Exception {
     		final IRichBolt bolt = mock(IRichBolt.class);
    -		BoltWrapper<Object, Object> wrapper = new BoltWrapper<Object, Object>(bolt);
    +		BoltWrapper<Object, Object> wrapper = new BoltWrapper<Object, Object>(bolt, "stream", "component");
     		
     		wrapper.setup(createMockStreamTask(), new StreamConfig(new Configuration()), mock(Output.class));
     		wrapper.open();
     		
    -		verify(bolt).prepare(any(Map.class), any(TopologyContext.class), any(OutputCollector.class));
    +		verify(bolt).prepare(any(Map.class), any(TopologyContext.class), isNotNull(OutputCollector.class));
    --- End diff --
    
    I see. Makes sense.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45892009
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/api/FlinkTopologyTest.java ---
    @@ -14,50 +14,70 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    -import org.apache.flink.storm.api.FlinkTopology;
    -import org.junit.Assert;
    +
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import org.apache.flink.storm.util.TestDummyBolt;
    +import org.apache.flink.storm.util.TestDummySpout;
    +import org.apache.flink.storm.util.TestSink;
    +import org.junit.Ignore;
     import org.junit.Test;
     
     public class FlinkTopologyTest {
     
    -	@Test
    -	public void testDefaultParallelism() {
    --- End diff --
    
    Why remove this test?



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45733215
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/WrapperSetupHelperTest.java ---
    @@ -180,8 +178,6 @@ public void testCreateTopologyContext() {
     		builder.setBolt("bolt2", (IRichBolt) operators.get("bolt2"), dops.get("bolt2")).allGrouping("spout2");
     		builder.setBolt("sink", (IRichBolt) operators.get("sink"), dops.get("sink"))
     				.shuffleGrouping("bolt1", TestDummyBolt.groupingStreamId)
    -				.shuffleGrouping("bolt1", TestDummyBolt.shuffleStreamId)
    -				.shuffleGrouping("bolt2", TestDummyBolt.groupingStreamId)
     				.shuffleGrouping("bolt2", TestDummyBolt.shuffleStreamId);
    --- End diff --
    
    We need to fix this... Supporting only 2 inputs is quite weak. Think about a union of many input streams... Join is not the only use case for multiple inputs.
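
    For the plain union case the translation would not even need an n-ary operator, because all inputs share one schema -- a rough sketch (hypothetical stream variables, not what this PR implements):

        // n producer streams with identical output schemas can be merged
        // into one logical input before the bolt is wrapped:
        DataStream<Tuple> merged = stream1.union(stream2, stream3);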



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45731264
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/FiniteFileSpout.java ---
    @@ -32,46 +23,17 @@
     public class FiniteFileSpout extends FileSpout implements FiniteSpout {
     	private static final long serialVersionUID = -1472978008607215864L;
     
    -	private String line;
    -	private boolean newLineRead;
    -
     	public FiniteFileSpout() {}
     
     	public FiniteFileSpout(String path) {
     		super(path);
     	}
     
    -	@SuppressWarnings("rawtypes")
    -	@Override
    -	public void open(final Map conf, final TopologyContext context, final SpoutOutputCollector collector) {
    -		super.open(conf, context, collector);
    -		newLineRead = false;
    -	}
    -
    -	@Override
    -	public void nextTuple() {
    -		this.collector.emit(new Values(line));
    -		newLineRead = false;
    -	}
    -
     	/**
     	 * Can be called before nextTuple() any times including 0.
     	 */
     	@Override
     	public boolean reachedEnd() {
    -		try {
    -			readLine();
    -		} catch (IOException e) {
    -			throw new RuntimeException("Exception occured while reading file " + path);
    -		}
    -		return line == null;
    +		return finished;
     	}
    --- End diff --
    
    It is more elegant, but you already have Flink in mind. Given the Flink-agnostic `FileSpout`, we have to do it this way. (Assume you do not have the source code of `FileSpout` but only the class file, and want to extend it with the `FiniteSpout` interface.)
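
    That is, treating `FileSpout` as a black box, `reachedEnd()` has to probe and buffer the next line itself, roughly like the removed code did (a sketch of the pattern; `line` and `newLineRead` are the buffering fields from above):

        @Override
        public boolean reachedEnd() {
            if (!newLineRead) {        // probe only once per emitted tuple
                try {
                    readLine();        // buffers the next line into 'line'
                } catch (IOException e) {
                    throw new RuntimeException("Exception occurred while reading file " + path);
                }
            }
            return line == null;
        }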



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45735762
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/BoltFileSink.java ---
    @@ -18,20 +18,23 @@
     package org.apache.flink.storm.util;
     
     import backtype.storm.task.TopologyContext;
    +import org.apache.flink.core.fs.FSDataOutputStream;
    +import org.apache.flink.core.fs.FileSystem;
    +import org.apache.flink.core.fs.Path;
     
     import java.io.BufferedWriter;
    -import java.io.FileWriter;
     import java.io.IOException;
    +import java.io.OutputStreamWriter;
     import java.util.Map;
     
     /**
    - * Implements a sink that write the received data to the given file (as a result of {@code Object.toString()} for each
    + * Implements a sink that writes the received data to the given file (as a result of {@code Object.toString()} for each
      * attribute).
      */
     public final class BoltFileSink extends AbstractBoltSink {
     	private static final long serialVersionUID = 2014027288631273666L;
     
    -	private final String path;
    +	private final Path path;
     	private BufferedWriter writer;
    --- End diff --
    
    No. People will have spout/bolt code they do not want to touch when running it in Flink. Thus their code will be written in the same way -- and so should the example be. Otherwise, we give the impression that they need to change their code -- but they don't. Thus, we implement the example Spouts/Bolts in a pure Storm way. Of course, if somebody develops a new Spout/Bolt with Flink in mind, your approach makes sense. However, this is not the main focus (it would be even better to code new functionality Flink-natively in embedded mode, instead of developing Spouts/Bolts that are Flink-tailored).
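
    In code, the difference is simply which file API the bolt touches -- a Storm-pure sink (sketch of a plain `IRichBolt` member section) compiles without any Flink import:

        // no Flink imports: the identical class runs on Storm and,
        // via the compatibility layer, on Flink
        private BufferedWriter writer;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            try {
                this.writer = new BufferedWriter(new FileWriter(this.path));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }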



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45891849
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/api/FlinkOutputFieldsDeclarerTest.java ---
    @@ -18,9 +18,7 @@
     
     import backtype.storm.tuple.Fields;
     import backtype.storm.utils.Utils;
    -
     import org.apache.flink.api.common.typeinfo.TypeInformation;
    -import org.apache.flink.storm.api.FlinkOutputFieldsDeclarer;
    --- End diff --
    
    pure reformatting



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45964776
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/api/TestBolt.java ---
    @@ -16,14 +16,14 @@
      */
     package org.apache.flink.storm.api;
     
    -import java.util.Map;
    -
     import backtype.storm.task.OutputCollector;
     import backtype.storm.task.TopologyContext;
     import backtype.storm.topology.IRichBolt;
     import backtype.storm.topology.OutputFieldsDeclarer;
     import backtype.storm.tuple.Tuple;
     
    +import java.util.Map;
    +
     public class TestBolt implements IRichBolt {
    --- End diff --
    
    It follows the import style of the other classes, so I'll leave this as it is.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45965316
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/print/PrintSampleStream.java ---
    @@ -0,0 +1,61 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.storm.print;
    +
    +import backtype.storm.Config;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.utils.Utils;
    +import org.apache.flink.storm.api.FlinkLocalCluster;
    +import org.apache.flink.storm.api.FlinkTopology;
    +import storm.starter.bolt.PrinterBolt;
    +import storm.starter.spout.TwitterSampleSpout;
    +
    +import java.util.Arrays;
    +
    +/**
    + * Prints incoming tweets. Tweets can be filtered by keywords.
    + */
    +public class PrintSampleStream {        
    +	public static void main(String[] args) throws Exception {
    --- End diff --
    
    I cannot see why it did not work before. Can you explain what the problem was?



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45964610
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/api/FlinkTopologyTest.java ---
    @@ -14,50 +14,70 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    -import org.apache.flink.storm.api.FlinkTopology;
    -import org.junit.Assert;
    +
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import org.apache.flink.storm.util.TestDummyBolt;
    +import org.apache.flink.storm.util.TestDummySpout;
    +import org.apache.flink.storm.util.TestSink;
    +import org.junit.Ignore;
     import org.junit.Test;
     
     public class FlinkTopologyTest {
     
    -	@Test
    -	public void testDefaultParallelism() {
    -		final FlinkTopology topology = new FlinkTopology();
    -		Assert.assertEquals(1, topology.getParallelism());
    +	@Test(expected = RuntimeException.class)
    +	public void testUnknowSpout() {
    +		TopologyBuilder builder = new TopologyBuilder();
    +		builder.setSpout("spout", new TestSpout());
    +		builder.setBolt("bolt", new TestBolt()).shuffleGrouping("unknown");
    +
    +		FlinkTopology.createTopology(builder);
     	}
     
    -	@Test(expected = UnsupportedOperationException.class)
    -	public void testExecute() throws Exception {
    -		new FlinkTopology().execute();
    +	@Test(expected = RuntimeException.class)
    +	public void testUnknowBolt() {
    +		TopologyBuilder builder = new TopologyBuilder();
    +		builder.setSpout("spout", new TestSpout());
    +		builder.setBolt("bolt1", new TestBolt()).shuffleGrouping("spout");
    +		builder.setBolt("bolt2", new TestBolt()).shuffleGrouping("unknown");
    +
    +		FlinkTopology.createTopology(builder);
     	}
     
    -	@Test(expected = UnsupportedOperationException.class)
    -	public void testExecuteWithName() throws Exception {
    -		new FlinkTopology().execute(null);
    +	@Test(expected = RuntimeException.class)
    +	public void testUndeclaredStream() {
    +		TopologyBuilder builder = new TopologyBuilder();
    +		builder.setSpout("spout", new TestSpout());
    +		builder.setBolt("bolt", new TestBolt()).shuffleGrouping("spout");
    +
    +		FlinkTopology.createTopology(builder);
     	}
     
     	@Test
    -	public void testNumberOfTasks() {
    -		final FlinkTopology topology = new FlinkTopology();
    +	@Ignore
    --- End diff --
    
    ok



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45726538
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,468 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
     	 */
    -	@Override
    -	public JobExecutionResult execute() throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    +		return new FlinkTopology(stormBuilder);
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter} or {@link
    -	 * FlinkClient}.
    -	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Returns the underlying Flink ExecutionEnvironment for the Storm topology.
    +	 * @return The contextual environment.
     	 */
    -	@Override
    -	public JobExecutionResult execute(final String jobName) throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public StreamExecutionEnvironment getExecutionEnvironment() {
    +		return this.env;
     	}
     
     	/**
    -	 * Increased the number of declared tasks of this program by the given value.
    -	 *
    -	 * @param dop
    -	 * 		The dop of a new operator that increases the number of overall tasks.
    +	 * Directly executes the Storm topology based on the current context (local when in IDE and
     +	 * remote when executed through ./bin/flink).
    +	 * @return The execution result
    +	 * @throws Exception
     	 */
    -	public void increaseNumberOfTasks(final int dop) {
    -		assert (dop > 0);
    -		this.numberOfTasks += dop;
    +	public JobExecutionResult execute() throws Exception {
    +		return env.execute();
    +	}
    +
    +
    +	@SuppressWarnings("unchecked")
    +	private <T> Map<String, T> getPrivateField(String field) {
    +		try {
    +			Field f = builder.getClass().getDeclaredField(field);
    +			f.setAccessible(true);
    +			return copyObject((Map<String, T>) f.get(builder));
    +		} catch (NoSuchFieldException | IllegalAccessException e) {
    +			throw new RuntimeException("Couldn't get " + field + " from TopologyBuilder", e);
    +		}
    +	}
    +
    +	private <T> T copyObject(T object) {
    +		try {
    +			return InstantiationUtil.deserializeObject(
    +					InstantiationUtil.serializeObject(object),
    +					getClass().getClassLoader()
    +			);
    +		} catch (IOException | ClassNotFoundException e) {
    +			throw new RuntimeException("Failed to copy object.");
    +		}
     	}
     
     	/**
    -	 * Return the number or required tasks to execute this program.
    -	 *
    -	 * @return the number or required tasks to execute this program
    +	 * Creates a Flink program that uses the specified spouts and bolts.
     	 */
    -	public int getNumberOfTasks() {
    -		return this.numberOfTasks;
    +	private void translateTopology() {
    +
    +		unprocessdInputsPerBolt.clear();
    +		outputStreams.clear();
    +		declarers.clear();
    +		availableInputs.clear();
    +
    +		// Storm defaults to parallelism 1
    +		env.setParallelism(1);
    +
    +		/* Translation of topology */
    +
    +
    +		for (final Entry<String, IRichSpout> spout : spouts.entrySet()) {
    +			final String spoutId = spout.getKey();
    +			final IRichSpout userSpout = spout.getValue();
    +
    +			final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer();
    +			userSpout.declareOutputFields(declarer);
    +			final HashMap<String,Fields> sourceStreams = declarer.outputStreams;
    +			this.outputStreams.put(spoutId, sourceStreams);
    +			declarers.put(spoutId, declarer);
    +
    +
    +			final HashMap<String, DataStream<Tuple>> outputStreams = new HashMap<String, DataStream<Tuple>>();
    +			final DataStreamSource<?> source;
    +
    +			if (sourceStreams.size() == 1) {
    +				final SpoutWrapper<Tuple> spoutWrapperSingleOutput = new SpoutWrapper<Tuple>(userSpout);
    +				spoutWrapperSingleOutput.setStormTopology(stormTopology);
    +
    +				final String outputStreamId = (String) sourceStreams.keySet().toArray()[0];
    +
    +				DataStreamSource<Tuple> src = env.addSource(spoutWrapperSingleOutput, spoutId,
    +						declarer.getOutputType(outputStreamId));
    +
    +				outputStreams.put(outputStreamId, src);
    +				source = src;
    +			} else {
    +				final SpoutWrapper<SplitStreamType<Tuple>> spoutWrapperMultipleOutputs = new SpoutWrapper<SplitStreamType<Tuple>>(
    +						userSpout);
    +				spoutWrapperMultipleOutputs.setStormTopology(stormTopology);
    +
    +				@SuppressWarnings({ "unchecked", "rawtypes" })
    +				DataStreamSource<SplitStreamType<Tuple>> multiSource = env.addSource(
    +						spoutWrapperMultipleOutputs, spoutId,
    +						(TypeInformation) TypeExtractor.getForClass(SplitStreamType.class));
    +
    +				SplitStream<SplitStreamType<Tuple>> splitSource = multiSource
    +						.split(new StormStreamSelector<Tuple>());
    +				for (String streamId : sourceStreams.keySet()) {
    +					outputStreams.put(streamId, splitSource.select(streamId).map(new SplitStreamMapper<Tuple>()));
    +				}
    +				source = multiSource;
    +			}
    +			availableInputs.put(spoutId, outputStreams);
    +
    +			final ComponentCommon common = stormTopology.get_spouts().get(spoutId).get_common();
    +			if (common.is_set_parallelism_hint()) {
    +				int dop = common.get_parallelism_hint();
    +				source.setParallelism(dop);
    +			} else {
    +				common.set_parallelism_hint(1);
    +			}
    +		}
    +
    +		/**
    +		* 1. Connect all spout streams with bolts streams
    +		* 2. Then proceed with the bolts stream already connected
    +		*
    +		*  Because we do not know the order in which an iterator steps over a set, we might process a consumer before
    +		* its producer
    +		* ->thus, we might need to repeat multiple times
    +		*/
    +		boolean makeProgress = true;
    +		while (bolts.size() > 0) {
    +			if (!makeProgress) {
    +				throw new RuntimeException(
    +						"Unable to build Topology. Could not connect the following bolts: "
    +								+ bolts.keySet());
    +			}
    +			makeProgress = false;
    +
    +			final Iterator<Entry<String, IRichBolt>> boltsIterator = bolts.entrySet().iterator();
    +			while (boltsIterator.hasNext()) {
    +
    +				final Entry<String, IRichBolt> bolt = boltsIterator.next();
    +				final String boltId = bolt.getKey();
    +				final IRichBolt userBolt = copyObject(bolt.getValue());
    +
    +				final ComponentCommon common = stormTopology.get_bolts().get(boltId).get_common();
    +
    +				Set<Entry<GlobalStreamId, Grouping>> unprocessedBoltInputs = unprocessdInputsPerBolt.get(boltId);
    +				if (unprocessedBoltInputs == null) {
    +					unprocessedBoltInputs = new HashSet<>();
    +					unprocessedBoltInputs.addAll(common.get_inputs().entrySet());
    +					unprocessdInputsPerBolt.put(boltId, unprocessedBoltInputs);
    +				}
    +
    +				// check if all inputs are available
    +				final int numberOfInputs = unprocessedBoltInputs.size();
    +				int inputsAvailable = 0;
    +				for (Entry<GlobalStreamId, Grouping> entry : unprocessedBoltInputs) {
    +					final String producerId = entry.getKey().get_componentId();
    +					final String streamId = entry.getKey().get_streamId();
    +					final HashMap<String, DataStream<Tuple>> streams = availableInputs.get(producerId);
    +					if (streams != null && streams.get(streamId) != null) {
    +						inputsAvailable++;
    +					}
    +				}
    +
    +				if (inputsAvailable != numberOfInputs) {
    +					// traverse other bolts first until inputs are available
    +					continue;
    +				} else {
    +					makeProgress = true;
    +					boltsIterator.remove();
    +				}
    +
    +				final Map<GlobalStreamId, DataStream<Tuple>> inputStreams = new HashMap<>(numberOfInputs);
    +
    +				for (Entry<GlobalStreamId, Grouping> input : unprocessedBoltInputs) {
    +					final GlobalStreamId streamId = input.getKey();
    +					final Grouping grouping = input.getValue();
    +
    +					final String producerId = streamId.get_componentId();
    +
    +					final Map<String, DataStream<Tuple>> producer = availableInputs.get(producerId);
    +
    +					inputStreams.put(streamId, processInput(boltId, userBolt, streamId, grouping, producer));
    +				}
    +
    +				final Iterator<Entry<GlobalStreamId, DataStream<Tuple>>> iterator = inputStreams.entrySet().iterator();
    +
    +				final Entry<GlobalStreamId, DataStream<Tuple>> firstInput = iterator.next();
    +				GlobalStreamId streamId = firstInput.getKey();
    +				DataStream<Tuple> inputStream = firstInput.getValue();
    +
    +				final SingleOutputStreamOperator<?, ?> outputStream;
    +
    +				switch (numberOfInputs) {
    +					case 1:
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream);
    +						break;
    +					case 2:
    +						Entry<GlobalStreamId, DataStream<Tuple>> secondInput = iterator.next();
    +						GlobalStreamId streamId2 = secondInput.getKey();
    +						DataStream<Tuple> inputStream2 = secondInput.getValue();
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream, streamId2, inputStream2);
    +						break;
    +					default:
    +						throw new UnsupportedOperationException("Don't know how to translate a bolt "
    +								+ boltId + " with " + numberOfInputs + " inputs.");
    +				}
    +
    +				if (common.is_set_parallelism_hint()) {
    +					int dop = common.get_parallelism_hint();
    +					outputStream.setParallelism(dop);
    +				} else {
    +					common.set_parallelism_hint(1);
    +				}
    +
    +			}
    +		}
     	}
     
    +	private DataStream<Tuple> processInput(String boltId, IRichBolt userBolt,
    +										GlobalStreamId streamId, Grouping grouping,
    +										Map<String, DataStream<Tuple>> producer) {
    +
    +		Preconditions.checkNotNull(userBolt);
    --- End diff --
    
    I would use `assert` here because it is a private method.
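
    I.e. (sketch):

        assert userBolt != null;               // internal invariant, only checked with -ea
        Preconditions.checkNotNull(userBolt);  // always-on check, better suited for public APIs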



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45729000
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/FiniteFileSpout.java ---
    @@ -32,46 +23,17 @@
     public class FiniteFileSpout extends FileSpout implements FiniteSpout {
     	private static final long serialVersionUID = -1472978008607215864L;
     
    -	private String line;
    -	private boolean newLineRead;
    -
     	public FiniteFileSpout() {}
     
     	public FiniteFileSpout(String path) {
     		super(path);
     	}
     
    -	@SuppressWarnings("rawtypes")
    -	@Override
    -	public void open(final Map conf, final TopologyContext context, final SpoutOutputCollector collector) {
    -		super.open(conf, context, collector);
    -		newLineRead = false;
    -	}
    -
    -	@Override
    -	public void nextTuple() {
    -		this.collector.emit(new Values(line));
    -		newLineRead = false;
    -	}
    -
     	/**
     	 * Can be called before nextTuple() any times including 0.
     	 */
     	@Override
     	public boolean reachedEnd() {
    -		try {
    -			readLine();
    -		} catch (IOException e) {
    -			throw new RuntimeException("Exception occured while reading file " + path);
    -		}
    -		return line == null;
    +		return finished;
     	}
    --- End diff --
    
    I don't understand why I should copy code instead. It is more elegant this way. 



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728386
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/WrapperSetupHelper.java ---
    @@ -150,7 +153,7 @@ static synchronized TopologyContext createTopologyContext(
     			}
     			stormTopology = new StormTopology(spouts, bolts, new HashMap<String, StateSpoutSpec>());
     
    -			taskId = context.getIndexOfThisSubtask();
    +			taskId = context.getIndexOfThisSubtask() + 1;
     
    --- End diff --
    
    Because Flink's task index starts with 0 but Storm's starts with 1...
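
    In other words (sketch of the mapping):

        // Flink subtask index: 0, 1, 2, ...  ->  Storm task ID: 1, 2, 3, ...
        taskId = context.getIndexOfThisSubtask() + 1;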



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45734177
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/FiniteFileSpout.java ---
    @@ -32,46 +23,17 @@
     public class FiniteFileSpout extends FileSpout implements FiniteSpout {
     	private static final long serialVersionUID = -1472978008607215864L;
     
    -	private String line;
    -	private boolean newLineRead;
    -
     	public FiniteFileSpout() {}
     
     	public FiniteFileSpout(String path) {
     		super(path);
     	}
     
    -	@SuppressWarnings("rawtypes")
    -	@Override
    -	public void open(final Map conf, final TopologyContext context, final SpoutOutputCollector collector) {
    -		super.open(conf, context, collector);
    -		newLineRead = false;
    -	}
    -
    -	@Override
    -	public void nextTuple() {
    -		this.collector.emit(new Values(line));
    -		newLineRead = false;
    -	}
    -
     	/**
     	 * Can be called before nextTuple() any times including 0.
     	 */
     	@Override
     	public boolean reachedEnd() {
    -		try {
    -			readLine();
    -		} catch (IOException e) {
    -			throw new RuntimeException("Exception occured while reading file " + path);
    -		}
    -		return line == null;
    +		return finished;
     	}
    --- End diff --
    
    Following your argument, we should also remove the `FiniteSpout` functionality, because it is not included in Storm either. I'll revert this, but I don't understand your argument.



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46284474
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/StormTuple.java ---
    @@ -44,16 +47,32 @@
     	/** The schema (ie, ordered field names) of the tuple */
     	private final Fields schema;
     
    +	/** The task id where this tuple is processed */
    +	private final int taskId;
    +	/** The producer of this tuple */
    +	private final String producerStreamId;
    +	/** The producer's component id of this tuple */
    +	private final String producerComponentId;
    +	/*+ The message that is associated with this tuple */
    --- End diff --
    
    `/**` not `+`



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45893753
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/api/FlinkTopologyTest.java ---
    @@ -14,50 +14,70 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    -import org.apache.flink.storm.api.FlinkTopology;
    -import org.junit.Assert;
    +
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import org.apache.flink.storm.util.TestDummyBolt;
    +import org.apache.flink.storm.util.TestDummySpout;
    +import org.apache.flink.storm.util.TestSink;
    +import org.junit.Ignore;
     import org.junit.Test;
     
     public class FlinkTopologyTest {
     
    -	@Test
    -	public void testDefaultParallelism() {
    -		final FlinkTopology topology = new FlinkTopology();
    -		Assert.assertEquals(1, topology.getParallelism());
    +	@Test(expected = RuntimeException.class)
    +	public void testUnknowSpout() {
    +		TopologyBuilder builder = new TopologyBuilder();
    +		builder.setSpout("spout", new TestSpout());
    +		builder.setBolt("bolt", new TestBolt()).shuffleGrouping("unknown");
    +
    +		FlinkTopology.createTopology(builder);
     	}
     
    -	@Test(expected = UnsupportedOperationException.class)
    -	public void testExecute() throws Exception {
    -		new FlinkTopology().execute();
    +	@Test(expected = RuntimeException.class)
    +	public void testUnknowBolt() {
    +		TopologyBuilder builder = new TopologyBuilder();
    +		builder.setSpout("spout", new TestSpout());
    +		builder.setBolt("bolt1", new TestBolt()).shuffleGrouping("spout");
    +		builder.setBolt("bolt2", new TestBolt()).shuffleGrouping("unknown");
    +
    +		FlinkTopology.createTopology(builder);
     	}
     
    -	@Test(expected = UnsupportedOperationException.class)
    -	public void testExecuteWithName() throws Exception {
    -		new FlinkTopology().execute(null);
    +	@Test(expected = RuntimeException.class)
    +	public void testUndeclaredStream() {
    +		TopologyBuilder builder = new TopologyBuilder();
    +		builder.setSpout("spout", new TestSpout());
    +		builder.setBolt("bolt", new TestBolt()).shuffleGrouping("spout");
    +
    +		FlinkTopology.createTopology(builder);
     	}
     
     	@Test
    -	public void testNumberOfTasks() {
    -		final FlinkTopology topology = new FlinkTopology();
    +	@Ignore
    +	public void testFieldsGroupingOnMultipleSpoutOutputStreams() {
    +		TopologyBuilder builder = new TopologyBuilder();
     
    -		Assert.assertEquals(0, topology.getNumberOfTasks());
    +		builder.setSpout("spout", new TestDummySpout());
    +		builder.setBolt("sink", new TestSink()).fieldsGrouping("spout",
    +				TestDummySpout.spoutStreamId, new Fields("id"));
     
    -		topology.increaseNumberOfTasks(3);
    -		Assert.assertEquals(3, topology.getNumberOfTasks());
    +		FlinkTopology.createTopology(builder);
    +	}
     
    -		topology.increaseNumberOfTasks(2);
    -		Assert.assertEquals(5, topology.getNumberOfTasks());
    +	@Test
    +	@Ignore
    --- End diff --
    
    Please enable this test. I forgot to do so in my last commit, which fixes this issue...



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45740238
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/join/SingleJoinExample.java ---
    @@ -0,0 +1,86 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.storm.join;
    +
    +import backtype.storm.Config;
    +import backtype.storm.testing.FeederSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import backtype.storm.tuple.Values;
    +import backtype.storm.utils.Utils;
    +import org.apache.flink.storm.api.FlinkLocalCluster;
    +import org.apache.flink.storm.api.FlinkTopology;
    +import org.apache.flink.storm.util.BoltFileSink;
    +import org.apache.flink.storm.util.TupleOutputFormatter;
    +import storm.starter.bolt.PrinterBolt;
    +import storm.starter.bolt.SingleJoinBolt;
    +
    +
    +public class SingleJoinExample {
    +
    +	public static void main(String[] args) throws Exception {
    +		final FeederSpout genderSpout = new FeederSpout(new Fields("id", "gender"));
    +		final FeederSpout ageSpout = new FeederSpout(new Fields("id", "age"));
    +
    --- End diff --
    
    You mean a different number of fields?



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728796
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/BoltFileSink.java ---
    @@ -40,16 +43,17 @@ public BoltFileSink(final String path) {
     
     	public BoltFileSink(final String path, final OutputFormatter formatter) {
     		super(formatter);
    -		this.path = path;
    +		this.path = new Path(path);
     	}
     
     	@SuppressWarnings("rawtypes")
     	@Override
     	public void prepareSimple(final Map stormConf, final TopologyContext context) {
     		try {
    -			this.writer = new BufferedWriter(new FileWriter(this.path));
    +			FSDataOutputStream outputStream = FileSystem.getLocalFileSystem().create(path, false);
    +			this.writer = new BufferedWriter(new OutputStreamWriter(outputStream));
     		} catch (final IOException e) {
    --- End diff --
    
    I ran into problems when the path contained a file scheme and the file system abstraction was not used.
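
    For reference, a minimal sketch of the idea (hypothetical example path; assuming `org.apache.flink.core.fs.Path#getFileSystem()` resolves the scheme, imports omitted):

        // Sketch, not the exact PR code: open the sink file through Flink's
        // FileSystem abstraction so that "file:///tmp/out.txt" is resolved
        // via its scheme instead of being passed verbatim to a FileWriter.
        Path path = new Path("file:///tmp/out.txt");
        FSDataOutputStream out = path.getFileSystem().create(path, false); // false = do not overwrite
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out));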



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45891069
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/SpoutCollector.java ---
    @@ -18,7 +18,6 @@
     package org.apache.flink.storm.wrappers;
     
     import backtype.storm.spout.ISpoutOutputCollector;
    -
     import org.apache.flink.api.java.tuple.Tuple0;
    --- End diff --
    
    pure reformatting



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45734441
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/BoltFileSink.java ---
    @@ -18,20 +18,23 @@
     package org.apache.flink.storm.util;
     
     import backtype.storm.task.TopologyContext;
    +import org.apache.flink.core.fs.FSDataOutputStream;
    +import org.apache.flink.core.fs.FileSystem;
    +import org.apache.flink.core.fs.Path;
     
     import java.io.BufferedWriter;
    -import java.io.FileWriter;
     import java.io.IOException;
    +import java.io.OutputStreamWriter;
     import java.util.Map;
     
     /**
    - * Implements a sink that write the received data to the given file (as a result of {@code Object.toString()} for each
    + * Implements a sink that writes the received data to the given file (as a result of {@code Object.toString()} for each
      * attribute).
      */
     public final class BoltFileSink extends AbstractBoltSink {
     	private static final long serialVersionUID = 2014027288631273666L;
     
    -	private final String path;
    +	private final Path path;
     	private BufferedWriter writer;
    --- End diff --
    
    So people download Flink and learn about the Storm compatibility layer to write spouts/bolts which they use in Storm topologies that run without Flink? That is beyond my imagination :)



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r46284171
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapper.java ---
    @@ -108,16 +114,13 @@ public BoltWrapper(final IRichBolt bolt, final Fields inputSchema)
     	 * for POJO input types. The output type can be any type if parameter {@code rawOutput} is {@code true} and the
     	 * bolt's number of declared output tuples is 1. If {@code rawOutput} is {@code false} the output type will be one
     	 * of {@link Tuple0} to {@link Tuple25} depending on the bolt's declared number of attributes.
    -	 * 
    -	 * @param bolt
    -	 *            The Storm {@link IRichBolt bolt} to be used.
    +	 * @param bolt The Storm {@link IRichBolt bolt} to be used.
     	 * @param rawOutputs
     	 *            Contains stream names if a single attribute output stream, should not be of type {@link Tuple1} but be
     	 *            of a raw type.
     	 * @throws IllegalArgumentException
     	 *             If {@code rawOuput} is {@code true} and the number of declared output attributes is not 1 or if
     	 *             {@code rawOuput} is {@code false} and the number of declared output attributes is not with range
    -	 *             [1;25].
    --- End diff --
    
    Please keep and update: should be `[0;25]`



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45894174
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/StormTuple.java ---
    @@ -44,16 +45,30 @@
     	/** The schema (ie, ordered field names) of the tuple */
     	private final Fields schema;
     
    +	private final int taskId;
    +	private final String producerStreamId;
    +	private final MessageId id;
    +	private final String producerComponentId;
    +
    +
    +	/**
    +	 * Constructor which sets defaults for producerComponentId, taskId, and componentID
    +	 * @param flinkTuple the Flink tuple
    +	 * @param schema The schema of the storm fields
    +	 */
    +	StormTuple(final IN flinkTuple, final Fields schema) {
    +		this(flinkTuple, schema, -1, "testStream", "componentID");
    +	}
    --- End diff --
    
    Are these meaningful/helpful defaults? Why not just set them to `null`?



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45964692
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/api/FlinkTopologyTest.java ---
    @@ -14,50 +14,70 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    -import org.apache.flink.storm.api.FlinkTopology;
    -import org.junit.Assert;
    +
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import org.apache.flink.storm.util.TestDummyBolt;
    +import org.apache.flink.storm.util.TestDummySpout;
    +import org.apache.flink.storm.util.TestSink;
    +import org.junit.Ignore;
     import org.junit.Test;
     
     public class FlinkTopologyTest {
     
    -	@Test
    -	public void testDefaultParallelism() {
    -		final FlinkTopology topology = new FlinkTopology();
    -		Assert.assertEquals(1, topology.getParallelism());
    +	@Test(expected = RuntimeException.class)
    +	public void testUnknowSpout() {
    +		TopologyBuilder builder = new TopologyBuilder();
    +		builder.setSpout("spout", new TestSpout());
    +		builder.setBolt("bolt", new TestBolt()).shuffleGrouping("unknown");
    +
    +		FlinkTopology.createTopology(builder);
     	}
     
    -	@Test(expected = UnsupportedOperationException.class)
    -	public void testExecute() throws Exception {
    -		new FlinkTopology().execute();
    +	@Test(expected = RuntimeException.class)
    +	public void testUnknowBolt() {
    +		TopologyBuilder builder = new TopologyBuilder();
    +		builder.setSpout("spout", new TestSpout());
    +		builder.setBolt("bolt1", new TestBolt()).shuffleGrouping("spout");
    +		builder.setBolt("bolt2", new TestBolt()).shuffleGrouping("unknown");
    +
    +		FlinkTopology.createTopology(builder);
     	}
     
    -	@Test(expected = UnsupportedOperationException.class)
    -	public void testExecuteWithName() throws Exception {
    -		new FlinkTopology().execute(null);
    +	@Test(expected = RuntimeException.class)
    +	public void testUndeclaredStream() {
    +		TopologyBuilder builder = new TopologyBuilder();
    +		builder.setSpout("spout", new TestSpout());
    +		builder.setBolt("bolt", new TestBolt()).shuffleGrouping("spout");
    +
    +		FlinkTopology.createTopology(builder);
     	}
     
     	@Test
    -	public void testNumberOfTasks() {
    -		final FlinkTopology topology = new FlinkTopology();
    +	@Ignore
    +	public void testFieldsGroupingOnMultipleSpoutOutputStreams() {
    +		TopologyBuilder builder = new TopologyBuilder();
     
    -		Assert.assertEquals(0, topology.getNumberOfTasks());
    +		builder.setSpout("spout", new TestDummySpout());
    +		builder.setBolt("sink", new TestSink()).fieldsGrouping("spout",
    +				TestDummySpout.spoutStreamId, new Fields("id"));
     
    -		topology.increaseNumberOfTasks(3);
    -		Assert.assertEquals(3, topology.getNumberOfTasks());
    +		FlinkTopology.createTopology(builder);
    +	}
     
    -		topology.increaseNumberOfTasks(2);
    -		Assert.assertEquals(5, topology.getNumberOfTasks());
    +	@Test
    +	@Ignore
    --- End diff --
    
    ok



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45741186
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/WrapperSetupHelperTest.java ---
    @@ -180,8 +178,6 @@ public void testCreateTopologyContext() {
     		builder.setBolt("bolt2", (IRichBolt) operators.get("bolt2"), dops.get("bolt2")).allGrouping("spout2");
     		builder.setBolt("sink", (IRichBolt) operators.get("sink"), dops.get("sink"))
     				.shuffleGrouping("bolt1", TestDummyBolt.groupingStreamId)
    -				.shuffleGrouping("bolt1", TestDummyBolt.shuffleStreamId)
    -				.shuffleGrouping("bolt2", TestDummyBolt.groupingStreamId)
     				.shuffleGrouping("bolt2", TestDummyBolt.shuffleStreamId);
    --- End diff --
    
    Agreed. My concern was to get the join working, but union should also be supported for streams with the same output type. Could we do that in a follow-up?
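
    (For streams of the same type this should be straightforward -- a minimal standalone sketch, not part of this PR, imports omitted:)

        // Sketch: Flink's union() merges any number of same-typed streams
        // into one logical stream, so same-type multi-inputs need no
        // two-input operator at all.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> input1 = env.fromElements("a", "b");
        DataStream<String> input2 = env.fromElements("c", "d");
        DataStream<String> merged = input1.union(input2);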



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45733689
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,468 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
     	 */
    -	@Override
    -	public JobExecutionResult execute() throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    +		return new FlinkTopology(stormBuilder);
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter} or {@link
    -	 * FlinkClient}.
    -	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Returns the underlying Flink ExecutionEnvironment for the Storm topology.
    +	 * @return The contextual environment.
     	 */
    -	@Override
    -	public JobExecutionResult execute(final String jobName) throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public StreamExecutionEnvironment getExecutionEnvironment() {
    +		return this.env;
     	}
     
     	/**
    -	 * Increased the number of declared tasks of this program by the given value.
    -	 *
    -	 * @param dop
    -	 * 		The dop of a new operator that increases the number of overall tasks.
    +	 * Directly executes the Storm topology based on the current context (local when in IDE and
    +	 * remote when executed thorugh ./bin/flink).
    +	 * @return The execution result
    +	 * @throws Exception
     	 */
    -	public void increaseNumberOfTasks(final int dop) {
    -		assert (dop > 0);
    -		this.numberOfTasks += dop;
    +	public JobExecutionResult execute() throws Exception {
    +		return env.execute();
    +	}
    +
    +
    +	@SuppressWarnings("unchecked")
    +	private <T> Map<String, T> getPrivateField(String field) {
    +		try {
    +			Field f = builder.getClass().getDeclaredField(field);
    +			f.setAccessible(true);
    +			return copyObject((Map<String, T>) f.get(builder));
    +		} catch (NoSuchFieldException | IllegalAccessException e) {
    +			throw new RuntimeException("Couldn't get " + field + " from TopologyBuilder", e);
    +		}
    +	}
    +
    +	private <T> T copyObject(T object) {
    +		try {
    +			return InstantiationUtil.deserializeObject(
    +					InstantiationUtil.serializeObject(object),
    +					getClass().getClassLoader()
    +			);
    +		} catch (IOException | ClassNotFoundException e) {
    +			throw new RuntimeException("Failed to copy object.");
    +		}
     	}
     
     	/**
    -	 * Return the number or required tasks to execute this program.
    -	 *
    -	 * @return the number or required tasks to execute this program
    +	 * Creates a Flink program that uses the specified spouts and bolts.
     	 */
    -	public int getNumberOfTasks() {
    -		return this.numberOfTasks;
    +	private void translateTopology() {
    +
    +		unprocessdInputsPerBolt.clear();
    +		outputStreams.clear();
    +		declarers.clear();
    +		availableInputs.clear();
    +
    +		// Storm defaults to parallelism 1
    +		env.setParallelism(1);
    +
    +		/* Translation of topology */
    +
    +
    +		for (final Entry<String, IRichSpout> spout : spouts.entrySet()) {
    +			final String spoutId = spout.getKey();
    +			final IRichSpout userSpout = spout.getValue();
    +
    +			final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer();
    +			userSpout.declareOutputFields(declarer);
    +			final HashMap<String,Fields> sourceStreams = declarer.outputStreams;
    +			this.outputStreams.put(spoutId, sourceStreams);
    +			declarers.put(spoutId, declarer);
    +
    +
    +			final HashMap<String, DataStream<Tuple>> outputStreams = new HashMap<String, DataStream<Tuple>>();
    +			final DataStreamSource<?> source;
    +
    +			if (sourceStreams.size() == 1) {
    +				final SpoutWrapper<Tuple> spoutWrapperSingleOutput = new SpoutWrapper<Tuple>(userSpout);
    +				spoutWrapperSingleOutput.setStormTopology(stormTopology);
    +
    +				final String outputStreamId = (String) sourceStreams.keySet().toArray()[0];
    +
    +				DataStreamSource<Tuple> src = env.addSource(spoutWrapperSingleOutput, spoutId,
    +						declarer.getOutputType(outputStreamId));
    +
    +				outputStreams.put(outputStreamId, src);
    +				source = src;
    +			} else {
    +				final SpoutWrapper<SplitStreamType<Tuple>> spoutWrapperMultipleOutputs = new SpoutWrapper<SplitStreamType<Tuple>>(
    +						userSpout);
    +				spoutWrapperMultipleOutputs.setStormTopology(stormTopology);
    +
    +				@SuppressWarnings({ "unchecked", "rawtypes" })
    +				DataStreamSource<SplitStreamType<Tuple>> multiSource = env.addSource(
    +						spoutWrapperMultipleOutputs, spoutId,
    +						(TypeInformation) TypeExtractor.getForClass(SplitStreamType.class));
    +
    +				SplitStream<SplitStreamType<Tuple>> splitSource = multiSource
    +						.split(new StormStreamSelector<Tuple>());
    +				for (String streamId : sourceStreams.keySet()) {
    +					outputStreams.put(streamId, splitSource.select(streamId).map(new SplitStreamMapper<Tuple>()));
    +				}
    +				source = multiSource;
    +			}
    +			availableInputs.put(spoutId, outputStreams);
    +
    +			final ComponentCommon common = stormTopology.get_spouts().get(spoutId).get_common();
    +			if (common.is_set_parallelism_hint()) {
    +				int dop = common.get_parallelism_hint();
    +				source.setParallelism(dop);
    +			} else {
    +				common.set_parallelism_hint(1);
    +			}
    +		}
    +
    +		/**
    +		* 1. Connect all spout streams with bolts streams
    +		* 2. Then proceed with the bolts stream already connected
    +		*
    +		*  Because we do not know the order in which an iterator steps over a set, we might process a consumer before
    +		* its producer
    +		* ->thus, we might need to repeat multiple times
    +		*/
    +		boolean makeProgress = true;
    +		while (bolts.size() > 0) {
    +			if (!makeProgress) {
    +				throw new RuntimeException(
    +						"Unable to build Topology. Could not connect the following bolts: "
    +								+ bolts.keySet());
    +			}
    +			makeProgress = false;
    +
    +			final Iterator<Entry<String, IRichBolt>> boltsIterator = bolts.entrySet().iterator();
    +			while (boltsIterator.hasNext()) {
    +
    +				final Entry<String, IRichBolt> bolt = boltsIterator.next();
    +				final String boltId = bolt.getKey();
    +				final IRichBolt userBolt = copyObject(bolt.getValue());
    +
    +				final ComponentCommon common = stormTopology.get_bolts().get(boltId).get_common();
    +
    +				Set<Entry<GlobalStreamId, Grouping>> unprocessedBoltInputs = unprocessdInputsPerBolt.get(boltId);
    +				if (unprocessedBoltInputs == null) {
    +					unprocessedBoltInputs = new HashSet<>();
    +					unprocessedBoltInputs.addAll(common.get_inputs().entrySet());
    +					unprocessdInputsPerBolt.put(boltId, unprocessedBoltInputs);
    +				}
    +
    +				// check if all inputs are available
    +				final int numberOfInputs = unprocessedBoltInputs.size();
    +				int inputsAvailable = 0;
    +				for (Entry<GlobalStreamId, Grouping> entry : unprocessedBoltInputs) {
    +					final String producerId = entry.getKey().get_componentId();
    +					final String streamId = entry.getKey().get_streamId();
    +					final HashMap<String, DataStream<Tuple>> streams = availableInputs.get(producerId);
    +					if (streams != null && streams.get(streamId) != null) {
    +						inputsAvailable++;
    +					}
    +				}
    +
    +				if (inputsAvailable != numberOfInputs) {
    +					// traverse other bolts first until inputs are available
    +					continue;
    +				} else {
    +					makeProgress = true;
    +					boltsIterator.remove();
    +				}
    +
    +				final Map<GlobalStreamId, DataStream<Tuple>> inputStreams = new HashMap<>(numberOfInputs);
    +
    +				for (Entry<GlobalStreamId, Grouping> input : unprocessedBoltInputs) {
    +					final GlobalStreamId streamId = input.getKey();
    +					final Grouping grouping = input.getValue();
    +
    +					final String producerId = streamId.get_componentId();
    +
    +					final Map<String, DataStream<Tuple>> producer = availableInputs.get(producerId);
    +
    +					inputStreams.put(streamId, processInput(boltId, userBolt, streamId, grouping, producer));
    +				}
    +
    +				final Iterator<Entry<GlobalStreamId, DataStream<Tuple>>> iterator = inputStreams.entrySet().iterator();
    +
    +				final Entry<GlobalStreamId, DataStream<Tuple>> firstInput = iterator.next();
    +				GlobalStreamId streamId = firstInput.getKey();
    +				DataStream<Tuple> inputStream = firstInput.getValue();
    +
    +				final SingleOutputStreamOperator<?, ?> outputStream;
    +
    +				switch (numberOfInputs) {
    +					case 1:
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream);
    +						break;
    +					case 2:
    +						Entry<GlobalStreamId, DataStream<Tuple>> secondInput = iterator.next();
    +						GlobalStreamId streamId2 = secondInput.getKey();
    +						DataStream<Tuple> inputStream2 = secondInput.getValue();
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream, streamId2, inputStream2);
    +						break;
    +					default:
    +						throw new UnsupportedOperationException("Don't know how to translate a bolt "
    +								+ boltId + " with " + numberOfInputs + " inputs.");
    +				}
    --- End diff --
    
    This is an improvement. Before, it would just blindly take one input at a time and create an operator. As far as I understand, Flink only supports two inputs per operator. I would like to address any further improvements in this regard after the merge.
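
    (The two-input case maps onto Flink's `connect()`; a minimal standalone sketch to illustrate the limit -- not the PR code, imports omitted:)

        // connect() ties exactly two (possibly differently typed) streams to
        // one operator -- hence a bolt with more than two inputs cannot be
        // translated into a single Flink operator here.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> names = env.fromElements("alice", "bob");
        DataStream<Integer> ages = env.fromElements(23, 42);
        DataStream<String> out = names.connect(ages)
                .map(new CoMapFunction<String, Integer, String>() {
                    @Override
                    public String map1(String name) { return "name: " + name; }
                    @Override
                    public String map2(Integer age) { return "age: " + age; }
                });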



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45733444
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/join/SingleJoinExample.java ---
    @@ -0,0 +1,86 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.storm.join;
    +
    +import backtype.storm.Config;
    +import backtype.storm.testing.FeederSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import backtype.storm.tuple.Values;
    +import backtype.storm.utils.Utils;
    +import org.apache.flink.storm.api.FlinkLocalCluster;
    +import org.apache.flink.storm.api.FlinkTopology;
    +import org.apache.flink.storm.util.BoltFileSink;
    +import org.apache.flink.storm.util.TupleOutputFormatter;
    +import storm.starter.bolt.PrinterBolt;
    +import storm.starter.bolt.SingleJoinBolt;
    +
    +
    +public class SingleJoinExample {
    +
    +	public static void main(String[] args) throws Exception {
    +		final FeederSpout genderSpout = new FeederSpout(new Fields("id", "gender"));
    +		final FeederSpout ageSpout = new FeederSpout(new Fields("id", "age"));
    +
    --- End diff --
    
    Can you extend this to use a different number of attributes? That makes the test more generic, I think.
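
    (E.g. something along these lines -- hypothetical field names:)

        // Sketch: spouts with differing attribute counts exercise the generic
        // tuple handling better than two spouts with two fields each.
        final FeederSpout genderSpout = new FeederSpout(new Fields("id", "gender", "hobbies"));
        final FeederSpout ageSpout = new FeederSpout(new Fields("id", "age"));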



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45744427
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/wrappers/WrapperSetupHelperTest.java ---
    @@ -180,8 +178,6 @@ public void testCreateTopologyContext() {
     		builder.setBolt("bolt2", (IRichBolt) operators.get("bolt2"), dops.get("bolt2")).allGrouping("spout2");
     		builder.setBolt("sink", (IRichBolt) operators.get("sink"), dops.get("sink"))
     				.shuffleGrouping("bolt1", TestDummyBolt.groupingStreamId)
    -				.shuffleGrouping("bolt1", TestDummyBolt.shuffleStreamId)
    -				.shuffleGrouping("bolt2", TestDummyBolt.groupingStreamId)
     				.shuffleGrouping("bolt2", TestDummyBolt.shuffleStreamId);
    --- End diff --
    
    I see your point. But this PR might be too big anyway. You are trying to do 3 things at the same time (two are backed by a JIRA). How hard would it be to split this PR? Last but not least, the multi-input-stream JIRA is not resolved by this. [And the second JIRA you try to resolve is assigned to me, and I have already worked on it -- I actually would like to finish my work on it.]



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45890935
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/BoltWrapperTwoInput.java ---
    @@ -0,0 +1,130 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.storm.wrappers;
    +
    +import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.tuple.Fields;
    +import org.apache.flink.api.java.tuple.Tuple0;
    +import org.apache.flink.api.java.tuple.Tuple1;
    +import org.apache.flink.api.java.tuple.Tuple25;
    +import org.apache.flink.streaming.api.operators.TwoInputStreamOperator;
    +import org.apache.flink.streaming.api.watermark.Watermark;
    +import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
    +
    +import java.util.Collection;
    +
    +/**
    + * A {@link BoltWrapperTwoInput} wraps an {@link IRichBolt} in order to execute the Storm bolt within a Flink Streaming
    + * program. In contrast to {@link BoltWrapper}, this wrapper takes two input stream as input.
    + */
    +public class BoltWrapperTwoInput<IN1, IN2, OUT> extends BoltWrapper<IN1, OUT> implements TwoInputStreamOperator<IN1, IN2, OUT> {
    +
    +	/** The schema (ie, ordered field names) of the second input stream. */
    +	private final Fields inputSchema2;
    +
    +	private final String componentId2;
    +	private final String streamId2;
    +
    +	/**
    +	 * Instantiates a new {@link BoltWrapperTwoInput} that wraps the given Storm {@link IRichBolt bolt} such that it can be
    +	 * used within a Flink streaming program. The given input schema enable attribute-by-name access for input types
    +	 * {@link Tuple0} to {@link Tuple25}. The output type can be any type if parameter {@code rawOutput} is {@code true}
    +	 * and the bolt's number of declared output tuples is 1. If {@code rawOutput} is {@code false} the output type will
    +	 * be one of {@link Tuple0} to {@link Tuple25} depending on the bolt's declared number of attributes.
    +	 *  @param bolt
    +	 *            The Storm {@link IRichBolt bolt} to be used.
    +	 * @param boltId
    +	 * @param componentId2
    +	 * @param streamId1
    +	 * @param inputSchema1
    +*            The schema (ie, ordered field names) of the input stream.    @throws IllegalArgumentException
    +*             If {@code rawOuput} is {@code true} and the number of declared output attributes is not 1 or if
    +*             {@code rawOuput} is {@code false} and the number of declared output attributes is not with range
    +	 * */
    --- End diff --
    
    formatting (spaces and stars); incomplete JavaDoc; missing `@throws`



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45970389
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/FlinkTopologyContext.java ---
    @@ -27,13 +27,12 @@
     import backtype.storm.state.ISubscribedState;
     import backtype.storm.task.TopologyContext;
     import backtype.storm.tuple.Fields;
    +import clojure.lang.Atom;
     
    --- End diff --
    
    Good, we are on the same page. And I don't want to bully you! I just mentioned the classes that do not contain any actual code change -- actually, according to the coding guidelines, there should be no import-order changes even in the classes with code changes, but I did not comment on those -- just on the classes with pure reformatting. I like consistency, so please apply the changes to all classes. Whenever I did import reorderings or made code formatting consistent (where it was inconsistent), I was always told "don't do this". So if it is a general rule, I just point it out here, too. I did not come up with the rule, and I never force my own code style -- I always adapt to the given style. :) It is really about time to get a proper Maven formatting tool running to get rid of all these stupid discussions. (As I said already: "It is not against you or the change itself" -- but the process seems to be applied inconsistently; people follow the rules more or less strictly.)



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45885615
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/split/operators/RandomSpout.java ---
    @@ -17,9 +17,6 @@
      */
     package org.apache.flink.storm.split.operators;
     
    -import java.util.Map;
    -import java.util.Random;
    -
    --- End diff --
    
    pure reformatting



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45731094
  
    --- Diff: flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/util/FileSpout.java ---
    @@ -38,6 +38,8 @@
     	protected String path = null;
     	protected BufferedReader reader;
     
    +	protected boolean finished;
    +
     	public FileSpout() {}
    --- End diff --
    
    Yes. But as the argument above explains, all spouts/bolts should be written in a Flink-agnostic way. No Storm developer would write it like this, because this flag only makes sense if you know about the `FiniteSpout` interface -- but you should not assume knowledge of it.
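
    (A rough sketch of what I mean, assuming the `FiniteSpout` interface with its `reachedEnd()` method, the inherited `reader` field, and an output `collector` field on the base spout -- the Flink-specific part lives in a subclass, the base spout stays Storm-only:)

        // Storm-agnostic base spout stays untouched; only the subclass knows
        // about Flink's FiniteSpout contract.
        public class FiniteFileSpout extends FileSpout implements FiniteSpout {
            private boolean done;

            @Override
            public void nextTuple() {
                try {
                    String line = reader.readLine(); // reader inherited from FileSpout
                    if (line == null) {
                        done = true;                 // end of file reached
                    } else {
                        collector.emit(new Values(line));
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }

            @Override
            public boolean reachedEnd() {
                return done;
            }
        }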



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45728418
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/WrapperSetupHelper.java ---
    @@ -187,14 +190,15 @@ static synchronized TopologyContext createTopologyContext(
     				}
     			}
     			for (Entry<String, StateSpoutSpec> stateSpout : stateSpouts.entrySet()) {
    -				Integer rc = taskId = processSingleOperator(stateSpout.getKey(), stateSpout
    +				Integer rc = processSingleOperator(stateSpout.getKey(), stateSpout
     						.getValue().get_common(), operatorName, context.getIndexOfThisSubtask(),
     						dop, taskToComponents, componentToSortedTasks, componentToStreamToFields);
     				if (rc != null) {
     					taskId = rc;
     				}
     			}
    -			assert (taskId != null);
    +
    +			Preconditions.checkNotNull("Task ID may not be null!", taskId);
     		}
    --- End diff --
    
    Assertions are disabled by default and are not reliable enough.
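
    (Side note on the replacement itself: Guava's `checkNotNull` takes the reference first and the message second, so I'd expect the order below -- with the arguments swapped, the string literal is what gets checked and the call never fails:)

        // Guava Preconditions: value under test first, error message second.
        Preconditions.checkNotNull(taskId, "Task ID may not be null!");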



[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45733908
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java ---
    @@ -15,75 +16,468 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -
     package org.apache.flink.storm.api;
     
    +import backtype.storm.generated.ComponentCommon;
    +import backtype.storm.generated.GlobalStreamId;
    +import backtype.storm.generated.Grouping;
     import backtype.storm.generated.StormTopology;
    +import backtype.storm.topology.IRichBolt;
    +import backtype.storm.topology.IRichSpout;
    +import backtype.storm.topology.IRichStateSpout;
    +import backtype.storm.topology.TopologyBuilder;
    +import backtype.storm.tuple.Fields;
    +import com.google.common.base.Preconditions;
     import org.apache.flink.api.common.JobExecutionResult;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.api.java.tuple.Tuple;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.storm.util.SplitStreamMapper;
    +import org.apache.flink.storm.util.SplitStreamType;
    +import org.apache.flink.storm.util.StormStreamSelector;
    +import org.apache.flink.storm.wrappers.BoltWrapper;
    +import org.apache.flink.storm.wrappers.BoltWrapperTwoInput;
    +import org.apache.flink.storm.wrappers.SpoutWrapper;
    +import org.apache.flink.streaming.api.datastream.DataStream;
    +import org.apache.flink.streaming.api.datastream.DataStreamSource;
    +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    +import org.apache.flink.streaming.api.datastream.SplitStream;
     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    +import org.apache.flink.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     /**
    - * {@link FlinkTopology} mimics a {@link StormTopology} and is implemented in terms of a {@link
    - * StreamExecutionEnvironment} . In contrast to a regular {@link StreamExecutionEnvironment}, a {@link FlinkTopology}
    - * cannot be executed directly, but must be handed over to a {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or
    - * {@link FlinkClient}.
    + * {@link FlinkTopology} translates a {@link TopologyBuilder} to a Flink program.
    + * <strong>CAUTION: {@link IRichStateSpout StateSpout}s are currently not supported.</strong>
      */
    -public class FlinkTopology extends StreamExecutionEnvironment {
    +public class FlinkTopology {
    +
    +	/** All declared streams and output schemas by operator ID */
    +	private final HashMap<String, HashMap<String, Fields>> outputStreams = new HashMap<String, HashMap<String, Fields>>();
    +	/** All spouts&bolts declarers by their ID */
    +	private final HashMap<String, FlinkOutputFieldsDeclarer> declarers = new HashMap<String, FlinkOutputFieldsDeclarer>();
    +
    +	private final HashMap<String, Set<Entry<GlobalStreamId, Grouping>>> unprocessdInputsPerBolt =
    +			new HashMap<String, Set<Entry<GlobalStreamId, Grouping>>>();
    +
    +	final HashMap<String, HashMap<String, DataStream<Tuple>>> availableInputs = new HashMap<>();
     
    -	/** The number of declared tasks for the whole program (ie, sum over all dops) */
    -	private int numberOfTasks = 0;
    +	private final TopologyBuilder builder;
     
    -	public FlinkTopology() {
    -		// Set default parallelism to 1, to mirror Storm default behavior
    -		super.setParallelism(1);
    +	// needs to be a class member for internal testing purpose
    +	private final StormTopology stormTopology;
    +
    +	private final Map<String, IRichSpout> spouts;
    +	private final Map<String, IRichBolt> bolts;
    +
    +	private final StreamExecutionEnvironment env;
    +
    +	private FlinkTopology(TopologyBuilder builder) {
    +		this.builder = builder;
    +		this.stormTopology = builder.createTopology();
    +		// extract the spouts and bolts
    +		this.spouts = getPrivateField("_spouts");
    +		this.bolts = getPrivateField("_bolts");
    +
    +		this.env = StreamExecutionEnvironment.getExecutionEnvironment();
    +
    +		// Kick off the translation immediately
    +		translateTopology();
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter}, or {@link
    -	 * FlinkClient}.
     	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Creates a Flink program that uses the specified spouts and bolts.
    +	 * @param stormBuilder The storm topology builder to use for creating the Flink topology.
    +	 * @return A Flink Topology which may be executed.
     	 */
    -	@Override
    -	public JobExecutionResult execute() throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    +		return new FlinkTopology(stormBuilder);
     	}
     
     	/**
    -	 * Is not supported. In order to execute use {@link FlinkLocalCluster}, {@link FlinkSubmitter} or {@link
    -	 * FlinkClient}.
    -	 *
    -	 * @throws UnsupportedOperationException
    -	 * 		at every invocation
    +	 * Returns the underlying Flink ExecutionEnvironment for the Storm topology.
    +	 * @return The contextual environment.
     	 */
    -	@Override
    -	public JobExecutionResult execute(final String jobName) throws Exception {
    -		throw new UnsupportedOperationException(
    -				"A FlinkTopology cannot be executed directly. Use FlinkLocalCluster, FlinkSubmitter, or FlinkClient " +
    -				"instead.");
    +	public StreamExecutionEnvironment getExecutionEnvironment() {
    +		return this.env;
     	}
     
     	/**
    -	 * Increased the number of declared tasks of this program by the given value.
    -	 *
    -	 * @param dop
    -	 * 		The dop of a new operator that increases the number of overall tasks.
    +	 * Directly executes the Storm topology based on the current context (local when in IDE and
    +	 * remote when executed thorugh ./bin/flink).
    +	 * @return The execution result
    +	 * @throws Exception
     	 */
    -	public void increaseNumberOfTasks(final int dop) {
    -		assert (dop > 0);
    -		this.numberOfTasks += dop;
    +	public JobExecutionResult execute() throws Exception {
    +		return env.execute();
    +	}
    +
    +
    +	@SuppressWarnings("unchecked")
    +	private <T> Map<String, T> getPrivateField(String field) {
    +		try {
    +			Field f = builder.getClass().getDeclaredField(field);
    +			f.setAccessible(true);
    +			return copyObject((Map<String, T>) f.get(builder));
    +		} catch (NoSuchFieldException | IllegalAccessException e) {
    +			throw new RuntimeException("Couldn't get " + field + " from TopologyBuilder", e);
    +		}
    +	}
    +
    +	private <T> T copyObject(T object) {
    +		try {
    +			return InstantiationUtil.deserializeObject(
    +					InstantiationUtil.serializeObject(object),
    +					getClass().getClassLoader()
    +			);
    +		} catch (IOException | ClassNotFoundException e) {
    +			throw new RuntimeException("Failed to copy object.");
    +		}
     	}
     
     	/**
    -	 * Return the number or required tasks to execute this program.
    -	 *
    -	 * @return the number or required tasks to execute this program
    +	 * Creates a Flink program that uses the specified spouts and bolts.
     	 */
    -	public int getNumberOfTasks() {
    -		return this.numberOfTasks;
    +	private void translateTopology() {
    +
    +		unprocessdInputsPerBolt.clear();
    +		outputStreams.clear();
    +		declarers.clear();
    +		availableInputs.clear();
    +
    +		// Storm defaults to parallelism 1
    +		env.setParallelism(1);
    +
    +		/* Translation of topology */
    +
    +
    +		for (final Entry<String, IRichSpout> spout : spouts.entrySet()) {
    +			final String spoutId = spout.getKey();
    +			final IRichSpout userSpout = spout.getValue();
    +
    +			final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer();
    +			userSpout.declareOutputFields(declarer);
    +			final HashMap<String,Fields> sourceStreams = declarer.outputStreams;
    +			this.outputStreams.put(spoutId, sourceStreams);
    +			declarers.put(spoutId, declarer);
    +
    +
    +			final HashMap<String, DataStream<Tuple>> outputStreams = new HashMap<String, DataStream<Tuple>>();
    +			final DataStreamSource<?> source;
    +
    +			if (sourceStreams.size() == 1) {
    +				final SpoutWrapper<Tuple> spoutWrapperSingleOutput = new SpoutWrapper<Tuple>(userSpout);
    +				spoutWrapperSingleOutput.setStormTopology(stormTopology);
    +
    +				final String outputStreamId = (String) sourceStreams.keySet().toArray()[0];
    +
    +				DataStreamSource<Tuple> src = env.addSource(spoutWrapperSingleOutput, spoutId,
    +						declarer.getOutputType(outputStreamId));
    +
    +				outputStreams.put(outputStreamId, src);
    +				source = src;
    +			} else {
    +				final SpoutWrapper<SplitStreamType<Tuple>> spoutWrapperMultipleOutputs = new SpoutWrapper<SplitStreamType<Tuple>>(
    +						userSpout);
    +				spoutWrapperMultipleOutputs.setStormTopology(stormTopology);
    +
    +				@SuppressWarnings({ "unchecked", "rawtypes" })
    +				DataStreamSource<SplitStreamType<Tuple>> multiSource = env.addSource(
    +						spoutWrapperMultipleOutputs, spoutId,
    +						(TypeInformation) TypeExtractor.getForClass(SplitStreamType.class));
    +
    +				SplitStream<SplitStreamType<Tuple>> splitSource = multiSource
    +						.split(new StormStreamSelector<Tuple>());
    +				for (String streamId : sourceStreams.keySet()) {
    +					outputStreams.put(streamId, splitSource.select(streamId).map(new SplitStreamMapper<Tuple>()));
    +				}
    +				source = multiSource;
    +			}
    +			availableInputs.put(spoutId, outputStreams);
    +
    +			final ComponentCommon common = stormTopology.get_spouts().get(spoutId).get_common();
    +			if (common.is_set_parallelism_hint()) {
    +				int dop = common.get_parallelism_hint();
    +				source.setParallelism(dop);
    +			} else {
    +				common.set_parallelism_hint(1);
    +			}
    +		}
    +
    +		/**
    +		* 1. Connect all spout streams with bolts streams
    +		* 2. Then proceed with the bolts stream already connected
    +		*
    +		*  Because we do not know the order in which an iterator steps over a set, we might process a consumer before
    +		* its producer
    +		* ->thus, we might need to repeat multiple times
    +		*/
    +		boolean makeProgress = true;
    +		while (bolts.size() > 0) {
    +			if (!makeProgress) {
    +				throw new RuntimeException(
    +						"Unable to build Topology. Could not connect the following bolts: "
    +								+ bolts.keySet());
    +			}
    +			makeProgress = false;
    +
    +			final Iterator<Entry<String, IRichBolt>> boltsIterator = bolts.entrySet().iterator();
    +			while (boltsIterator.hasNext()) {
    +
    +				final Entry<String, IRichBolt> bolt = boltsIterator.next();
    +				final String boltId = bolt.getKey();
    +				final IRichBolt userBolt = copyObject(bolt.getValue());
    +
    +				final ComponentCommon common = stormTopology.get_bolts().get(boltId).get_common();
    +
    +				Set<Entry<GlobalStreamId, Grouping>> unprocessedBoltInputs = unprocessdInputsPerBolt.get(boltId);
    +				if (unprocessedBoltInputs == null) {
    +					unprocessedBoltInputs = new HashSet<>();
    +					unprocessedBoltInputs.addAll(common.get_inputs().entrySet());
    +					unprocessdInputsPerBolt.put(boltId, unprocessedBoltInputs);
    +				}
    +
    +				// check if all inputs are available
    +				final int numberOfInputs = unprocessedBoltInputs.size();
    +				int inputsAvailable = 0;
    +				for (Entry<GlobalStreamId, Grouping> entry : unprocessedBoltInputs) {
    +					final String producerId = entry.getKey().get_componentId();
    +					final String streamId = entry.getKey().get_streamId();
    +					final HashMap<String, DataStream<Tuple>> streams = availableInputs.get(producerId);
    +					if (streams != null && streams.get(streamId) != null) {
    +						inputsAvailable++;
    +					}
    +				}
    +
    +				if (inputsAvailable != numberOfInputs) {
    +					// traverse other bolts first until inputs are available
    +					continue;
    +				} else {
    +					makeProgress = true;
    +					boltsIterator.remove();
    +				}
    +
    +				final Map<GlobalStreamId, DataStream<Tuple>> inputStreams = new HashMap<>(numberOfInputs);
    +
    +				for (Entry<GlobalStreamId, Grouping> input : unprocessedBoltInputs) {
    +					final GlobalStreamId streamId = input.getKey();
    +					final Grouping grouping = input.getValue();
    +
    +					final String producerId = streamId.get_componentId();
    +
    +					final Map<String, DataStream<Tuple>> producer = availableInputs.get(producerId);
    +
    +					inputStreams.put(streamId, processInput(boltId, userBolt, streamId, grouping, producer));
    +				}
    +
    +				final Iterator<Entry<GlobalStreamId, DataStream<Tuple>>> iterator = inputStreams.entrySet().iterator();
    +
    +				final Entry<GlobalStreamId, DataStream<Tuple>> firstInput = iterator.next();
    +				GlobalStreamId streamId = firstInput.getKey();
    +				DataStream<Tuple> inputStream = firstInput.getValue();
    +
    +				final SingleOutputStreamOperator<?, ?> outputStream;
    +
    +				switch (numberOfInputs) {
    +					case 1:
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream);
    +						break;
    +					case 2:
    +						Entry<GlobalStreamId, DataStream<Tuple>> secondInput = iterator.next();
    +						GlobalStreamId streamId2 = secondInput.getKey();
    +						DataStream<Tuple> inputStream2 = secondInput.getValue();
    +						outputStream = createOutput(boltId, userBolt, streamId, inputStream, streamId2, inputStream2);
    +						break;
    +					default:
    +						throw new UnsupportedOperationException("Don't know how to translate a bolt "
    +								+ boltId + " with " + numberOfInputs + " inputs.");
    +				}
    +
    +				if (common.is_set_parallelism_hint()) {
    +					int dop = common.get_parallelism_hint();
    +					outputStream.setParallelism(dop);
    +				} else {
    +					common.set_parallelism_hint(1);
    +				}
    +
    +			}
    +		}
     	}
     
    +	private DataStream<Tuple> processInput(String boltId, IRichBolt userBolt,
    +										GlobalStreamId streamId, Grouping grouping,
    +										Map<String, DataStream<Tuple>> producer) {
    +
    +		Preconditions.checkNotNull(userBolt);
    --- End diff --
    
    Yes, but these checks are always executed at run time of the user program, whereas `assert` only performs them when assertions are enabled on the JVM.
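    
    For illustration, a minimal sketch of the difference (a hypothetical helper, not code from this PR; `Preconditions` as in Guava):
    
        // assumes: import com.google.common.base.Preconditions;
        private static void validate(Integer taskId) {
            // Only executed when the JVM is started with assertions enabled (-ea);
            // skipped at run time otherwise.
            assert taskId != null : "Task ID may not be null!";
            // Always executed, also in production; fails fast with a NullPointerException.
            Preconditions.checkNotNull(taskId, "Task ID may not be null!");
        }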


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45964866
  
    --- Diff: flink-contrib/flink-storm/src/test/java/org/apache/flink/storm/util/TestDummySpout.java ---
    @@ -26,6 +24,8 @@
     import backtype.storm.tuple.Values;
     import backtype.storm.utils.Utils;
     
    +import java.util.Map;
    +
     public class TestDummySpout implements IRichSpout {
    --- End diff --
    
    It follows the import style of the other classes, so I'll leave this as it is.


---

[GitHub] flink pull request: [FLINK-2837][storm] various improvements for S...

Posted by mjsax <gi...@git.apache.org>.
Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1398#discussion_r45727164
  
    --- Diff: flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/wrappers/WrapperSetupHelper.java ---
    @@ -187,14 +190,15 @@ static synchronized TopologyContext createTopologyContext(
     				}
     			}
     			for (Entry<String, StateSpoutSpec> stateSpout : stateSpouts.entrySet()) {
    -				Integer rc = taskId = processSingleOperator(stateSpout.getKey(), stateSpout
    +				Integer rc = processSingleOperator(stateSpout.getKey(), stateSpout
     						.getValue().get_common(), operatorName, context.getIndexOfThisSubtask(),
     						dop, taskToComponents, componentToSortedTasks, componentToStreamToFields);
     				if (rc != null) {
     					taskId = rc;
     				}
     			}
    -			assert (taskId != null);
    +
    +			Preconditions.checkNotNull(taskId, "Task ID may not be null!");
     		}
    --- End diff --
    
    I prefer `assert` for internal methods.
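    
    For reference, the `assert` variant with a message would read as follows (a sketch; only checked when the JVM runs with `-ea`):
    
        assert taskId != null : "Task ID may not be null!";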


---