You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by aljoscha <gi...@git.apache.org> on 2015/08/04 18:40:31 UTC

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

GitHub user aljoscha opened a pull request:

    https://github.com/apache/flink/pull/988

    [FLINK-2398][api-breaking] Introduce StreamGraphGenerator

    This decouples the building of the StreamGraph from the API methods.
    Before, the methods would build the StreamGraph as they go. Now the API
    methods build a hierachy of StreamTransformation nodes. From these a
    StreamGraph is generated upon execution.
    
    This also introduces some API breaking changes:
    
     - The result of methods that create sinks is now DataStreamSink instead
       of DataStream
     - Iterations cannot have feedback edges with differing parallelism
     - "Preserve partitioning" is not the default for feedback edges. The
       previous option for this is removed.
     - You can close an iteration several times, no need for a union.
     - Strict checking of whether partitioning and parallelism work
       together. I.e. if upstream and downstream parallelism don't match it
       is not legal to have Forward partitioning anymore. This was not very
       transparent: When you went from low parallelism to high dop some
       downstream  operators would never get any input. When you went from high
       parallelism to low dop you would get skew in the downstream operators
       because all elements that would be forwarded to an operator that is not
       "there" go to another operator. This requires insertion of global()
       or rebalance() in some places. For example with most sources which
       have parallelism one.
    
    This is from the Javadoc of StreamTransformation, it describes quite well how it works:
    ```
    A {@code StreamTransformation} represents the operation that creates a
     * {@link org.apache.flink.streaming.api.datastream.DataStream}. Every
     * {@link org.apache.flink.streaming.api.datastream.DataStream} has an underlying
     * {@code StreamTransformation} that is the origin of said DataStream.
     *
     * <p>
     * API operations such as {@link org.apache.flink.streaming.api.datastream.DataStream#map} create
     * a tree of {@code StreamTransformation}s underneath. When the stream program is to be executed this
     * graph is translated to a {@link StreamGraph} using
     * {@link org.apache.flink.streaming.api.graph.StreamGraphGenerator}.
     *
     * <p>
     * A {@code StreamTransformation} does not necessarily correspond to a physical operation
     * at runtime. Some operations are only logical concepts. Examples of this are union,
     * split/select data stream, partitioning.
     *
     * <p>
     * The following graph of {@code StreamTransformations}:
     *
     * <pre>
     *   Source              Source        
     *      +                   +           
     *      |                   |           
     *      v                   v           
     *  Rebalance          HashPartition    
     *      +                   +           
     *      |                   |           
     *      |                   |           
     *      +------>Union<------+           
     *                +                     
     *                |                     
     *                v                     
     *              Split                   
     *                +                     
     *                |                     
     *                v                     
     *              Select                  
     *                +                     
     *                v                     
     *               Map                    
     *                +                     
     *                |                     
     *                v                     
     *              Sink 
     * </pre>
     *
     * Would result in this graph of operations at runtime:
     *
     * <pre>
     *  Source              Source
     *    +                   +
     *    |                   |
     *    |                   |
     *    +------->Map<-------+
     *              +
     *              |
     *              v
     *             Sink
     * </pre>
     *
     * The information about partitioning, union, split/select end up being encoded in the edges
     * that connect the sources to the map operation.
    ```
    
    I still have to fix the Scala examples, but you can already comment on the overall idea and implementation.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink stream-api-rework

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/988.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #988
    
----
commit dce02be23fc98390b8b0b98f02ad1dd69be30d4c
Author: Aljoscha Krettek <al...@gmail.com>
Date:   2015-07-23T13:12:38Z

    [FLINK-2398][api-breaking] Introduce StreamGraphGenerator
    
    This decouples the building of the StreamGraph from the API methods.
    Before the methods would build the StreamGraph as they go. Now the API
    methods build a hierachy of StreamTransformation nodes. From these a
    StreamGraph is generated upon execution.
    
    This also introduces some API breaking changes:
    
     - The result of methods that create sinks is now DataStreamSink instead
       of DataStream
     - Iterations cannot have feedback edges with differing parallelism
     - "Preserve partitioning" is not the default for feedback edges. The
       previous option for this is removed.
     - You can close an iteration several times, no need for a union.
     - Strict checking of whether partitioning and parallelism work
       together. I.e. if upstream and downstream parallelism don't match it
       is not legal to have Forward partitioning anymore. This was not very
       transparent: When you went from low parallelism to high dop some
       downstream  operators would never get any input. When you went from high
       parallelism to low dop you would get skew in the downstream operators
       because all elements that would be forwarded to an operator that is not
       "there" go to another operator. This requires insertion of global()
       or rebalance() in some places. For example with most sources which
       have parallelism one.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-131892889
  
    I changed it to execute dangling operators now. There is, however, a strange "feature". This code works on master: https://gist.github.com/aljoscha/bbe74309a31a16ca8413. It catches away the exception that results from not being able to determine the output type of the generic map. Then, when execute is called it executes just fine up until (and including) the generic map as can be seen from the `println` output.
    
    With this PR this won't work anymore. The upon `execute` the the StreamGraphBuilder tries to build the StreamGraph from the graph of Transformations. It encounters the dangling map for which the output type cannot be determined and then it fails.
    
    This behavior is problematic since the TestStreamEnvironment is reused for several streaming tests. Tests fail in seemingly unconnected parts of the code because dangling operators without type information still linger in the execution environment. I mentioned this here: https://issues.apache.org/jira/browse/FLINK-2508
    
    I have a quick fix for this, for now. I think, however, that the streaming tests need to be consolidated and the streaming environments also need to be refactored a bit. (In addition to the batch exec envs, because they should probably be reused in large parts for streaming.)  



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/988#discussion_r36347361
  
    --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/transformations/SinkTransformation.java ---
    @@ -0,0 +1,106 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.streaming.api.transformations;
    +
    +import com.google.common.collect.Lists;
    +import org.apache.flink.api.java.functions.KeySelector;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.streaming.api.operators.StreamOperator;
    +import org.apache.flink.streaming.api.operators.StreamSink;
    +
    +import java.util.Collection;
    +import java.util.List;
    +
    +/**
    + * This Transformation represents a Sink.
    + *
    + * @param <T> The type of the elements in the input {@code SinkTransformation}
    + */
    +public class SinkTransformation<T> extends StreamTransformation<Object> {
    --- End diff --
    
    Because the Transformation does not emit any elements. But you're right, it could also be the special Nothing type that we have in Flink


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-129563293
  
    Sorry for the lack of activity. I'm currently on vacation and will pick this up again when I'm back, next week.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by gyfora <gi...@git.apache.org>.
Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-127924076
  
    Why isn't rebalance implied when the 2 operators don't have the same parallelism and partitioning is not defined? If you don't specify the partitioning (which defaults to forward) means that you don't specifically care, in this case implied rebalance is the most natural thing.
    
    It's arguable that if the user specifically says forward then we give an error but otherwise I this this just hurts the developers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by mbalassi <gi...@git.apache.org>.
Github user mbalassi commented on a diff in the pull request:

    https://github.com/apache/flink/pull/988#discussion_r36284249
  
    --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/transformations/CoFeedbackTransformation.java ---
    @@ -0,0 +1,122 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.streaming.api.transformations;
    +
    +import com.google.common.collect.Lists;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.streaming.api.operators.StreamOperator;
    +
    +import java.util.Collection;
    +import java.util.Collections;
    +import java.util.List;
    +
    +/**
    + * This represents a feedback point in a topology. The type of the feedback elements must not match
    + * the type of the upstream {@code StreamTransformation} because the only allowed operations
    + * after a {@code CoFeedbackTransformation} are
    + * {@link org.apache.flink.streaming.api.transformations.TwoInputTransformation TwoInputTransformations}.
    --- End diff --
    
    woInputTransformation TwoInputTransformations


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by mbalassi <gi...@git.apache.org>.
Github user mbalassi commented on a diff in the pull request:

    https://github.com/apache/flink/pull/988#discussion_r36284613
  
    --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/transformations/SinkTransformation.java ---
    @@ -0,0 +1,106 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.streaming.api.transformations;
    +
    +import com.google.common.collect.Lists;
    +import org.apache.flink.api.java.functions.KeySelector;
    +import org.apache.flink.api.java.typeutils.TypeExtractor;
    +import org.apache.flink.streaming.api.operators.StreamOperator;
    +import org.apache.flink.streaming.api.operators.StreamSink;
    +
    +import java.util.Collection;
    +import java.util.List;
    +
    +/**
    + * This Transformation represents a Sink.
    + *
    + * @param <T> The type of the elements in the input {@code SinkTransformation}
    + */
    +public class SinkTransformation<T> extends StreamTransformation<Object> {
    --- End diff --
    
    Why is Object the generic type of the base class?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/988#discussion_r36347244
  
    --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/transformations/CoFeedbackTransformation.java ---
    @@ -0,0 +1,122 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.streaming.api.transformations;
    +
    +import com.google.common.collect.Lists;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +import org.apache.flink.streaming.api.operators.StreamOperator;
    +
    +import java.util.Collection;
    +import java.util.Collections;
    +import java.util.List;
    +
    +/**
    + * This represents a feedback point in a topology. The type of the feedback elements must not match
    + * the type of the upstream {@code StreamTransformation} because the only allowed operations
    + * after a {@code CoFeedbackTransformation} are
    + * {@link org.apache.flink.streaming.api.transformations.TwoInputTransformation TwoInputTransformations}.
    --- End diff --
    
    what's the issue? :smile: 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-132582717
  
    What I did is basically adding every operator to the list of "sinks". When `execute` is called the translation starts from every operator, which is ok, since an operator is not transformed twice.
    
    Are there any objections to merging this now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by gyfora <gi...@git.apache.org>.
Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-128404999
  
    It would be good to get some feedback from the others as well, but in general my arguments are the following:
    
    1. Getting exceptions after non-parallel sources just because you didn't rebalance (or in any other case), is very confusing for the user and seems unintuitive. Therefore I vote for fixing the current behaviour and keeping rebalance implied.
     
    2. There is probably a good reason why a user implements an operator (maybe some outside world communication), so not executing it (for instance no sink attached) will lead to incorrect behaviour. Also we cannot force someone to always use a sink in this case as they might need some special behaviour implemented by some other operator.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha closed the pull request at:

    https://github.com/apache/flink/pull/988


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-132585797
  
    I also updated the documentation to reflect the changes in shipping strategies/partitioning.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-128507574
  
    Yes, I think the automatic rebalance is good. My approach of throwing the exception was just the easiest way of dealing with the previously faulty behavior. I think people coming from storm are also used to just having operations that are executed even if you don't have a sink. So maybe we should keep that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-128405914
  
    +1 for rebalancing automatically between operators of different DOP. The batch API does the same. But it should really be "rebalance", not a form of forward that typically creates skew.
    
    I am somewhat indifferent to the sink-or-no-sink question. The batch API requires sinks strictly, and it makes total sense there. For streaming, it may be different.
    
    If we require sinks, we should have a simple function `sink()` or so, which marks the operation as sink, so we don't force people to implement a *discarding sink* or so unnecessarily.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by gyfora <gi...@git.apache.org>.
Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-127926607
  
    If I understand correctly, this also this changes the semantics that we execute programs without sinks, and also topology branches which don't end in sinks. I personally don't like the fact that the each branch in the processing graph needs to end in a sink, it is rather artificial.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by mbalassi <gi...@git.apache.org>.
Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-127940968
  
    I am not sure that I understand this correctly: If a non parallel source is used does the user need to call `rebalance` to use all parallel instances of the downstream operator?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-132689802
  
    Manually merged


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-132622278
  
    I think this looks reasonable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-132260208
  
    I think this is good now. I adapted the Streaming Tests to always use `StreamingMultipleProgramTestBase` when appropriate. The earlier problems where caused by some tests using `StreamingExecutionEnvironment.getExecutionEnvironment()` without being in a class that is derived from `StreamingMultipleProgramTestBase`. This caused those tests to pick up the used environment from another test when run on travis. (In an IDE the tests would just use a LocalEnvironment because no other environment would be set.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-128133833
  
    About rebalance()/forward(). Yes, when the parallelism differs it throws an exception now. Previously, when a user did not specify a partition strategy, forward was assumed. This was valid for a change of parallelism, which led to either the degenerative case of only one downstream instance receiving elements (1 to n parallelism) or one or several downstream instances receiving skewed numbers of instances (m to n, where m > n). 
    
    I think we can document forward as the default for n -> n parallelism and rebalance as default for n -> m parallelism and change the behavior.
    
    About the dangling operators, also true. I think before it was more an implementation artifact because the stream graph was basically being built form the sources. Now it is built from the sinks. I see that this can be good behavior and I can adapt the current code if we agree on this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---