You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by tillrohrmann <gi...@git.apache.org> on 2015/09/07 17:12:45 UTC

[GitHub] flink pull request: [FLINK-2631] [streaming] Fixes the StreamFold ...

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/1101

    [FLINK-2631] [streaming] Fixes the StreamFold operator and adds output type configurable stream operators

    Adds support for non-serializable initial fold values by storing the value in a byte array before shipping it. The shipped initial fold value is deserialized on the TM while calling the `open` method.
    
    Furthermore, this PR introduces the `OutputTypeConfigurable` interface which allows stream operators to get to know their output type. The `OutputTypeConfigurable` interface offers the method `setOutputType` which is called by the `StreamGraph` when the `StreamOperator` is added in the `addOperator` method. At the latest at this moment, the concrete output type, whether inferred from the UDF or set manually with `returns`, should be know to the system, because also the input and output type serializers for the vertex are created in the `addOperator` method. All stream operators which need to know their output type should implement the `OutputTypeConfigurable` interface.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixStreamingFold

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1101.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1101
    
----
commit 63951adca0e8bfefd1d81b933017e9fadc5f556f
Author: Till Rohrmann <tr...@apache.org>
Date:   2015-09-07T09:34:48Z

    [FLINK-2631] [streaming] Fixes the StreamFold operator. Adds OutputTypeConfigurable interface to support type injection at StreamGraph creation.
    
    Adds test for non serializable fold type. Adds test to verify proper output type forwarding for OutputTypeConfigurable implementations.
    
    Adds comments

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2631] [streaming] Fixes the StreamFold ...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1101#issuecomment-139178919
  
    +1 to merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2631] [streaming] Fixes the StreamFold ...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1101#issuecomment-138521208
  
    The `StreamGraph.addOperator` method takes a `StreamOperator` object as input. Thus, it should also work with `TwoInputStreamOperator`, because it is a subclass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2631] [streaming] Fixes the StreamFold ...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1101#discussion_r38903676
  
    --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/operators/OutputTypeConfigurable.java ---
    @@ -0,0 +1,42 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.streaming.api.operators;
    +
    +import org.apache.flink.api.common.ExecutionConfig;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +
    +/**
    + * Stream operators can implement this interface if they need access to the output type information
    + * at {@link org.apache.flink.streaming.api.graph.StreamGraph} generation. This can be useful for
    + * cases where the output type is specified by the returns method and, thus, after the stream
    + * operator has been created.
    + */
    +public interface OutputTypeConfigurable {
    +
    +	/**
    +	 * Is called by the {@link org.apache.flink.streaming.api.graph.StreamGraph#addOperator(Integer, StreamOperator, TypeInformation, TypeInformation, String)}
    +	 * method when the {@link org.apache.flink.streaming.api.graph.StreamGraph} is generated. The
    +	 * method is called with the output {@link TypeInformation} which is also used for the
    +	 * {@link org.apache.flink.streaming.runtime.tasks.StreamTask} output serializer.
    +	 *
    +	 * @param outTypeInfo Output type information of the {@link org.apache.flink.streaming.runtime.tasks.StreamTask}
    +	 * @param executionConfig Execution configuration
    +	 */
    +	void setOutputType(TypeInformation<?> outTypeInfo, ExecutionConfig executionConfig);
    --- End diff --
    
    Why don't we type the interface, so that we don't need casts in `setOutputType` implementations at the operator?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2631] [streaming] Fixes the StreamFold ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1101


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2631] [streaming] Fixes the StreamFold ...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1101#issuecomment-139194304
  
    LGTM, will merge


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2631] [streaming] Fixes the StreamFold ...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1101#issuecomment-138522040
  
    Sorry, my bad. Your concern is completely right. Will fix it as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2631] [streaming] Fixes the StreamFold ...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/1101#issuecomment-138487240
  
    Does the `OutputTypeConfigurable` also work for `TwoInputOperator`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2631] [streaming] Fixes the StreamFold ...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1101#issuecomment-138568165
  
    Thanks for reviewing @aljoscha. I addressed your comments and pushed an update.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2631] [streaming] Fixes the StreamFold ...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1101#discussion_r38913819
  
    --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/operators/OutputTypeConfigurable.java ---
    @@ -0,0 +1,42 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.streaming.api.operators;
    +
    +import org.apache.flink.api.common.ExecutionConfig;
    +import org.apache.flink.api.common.typeinfo.TypeInformation;
    +
    +/**
    + * Stream operators can implement this interface if they need access to the output type information
    + * at {@link org.apache.flink.streaming.api.graph.StreamGraph} generation. This can be useful for
    + * cases where the output type is specified by the returns method and, thus, after the stream
    + * operator has been created.
    + */
    +public interface OutputTypeConfigurable {
    +
    +	/**
    +	 * Is called by the {@link org.apache.flink.streaming.api.graph.StreamGraph#addOperator(Integer, StreamOperator, TypeInformation, TypeInformation, String)}
    +	 * method when the {@link org.apache.flink.streaming.api.graph.StreamGraph} is generated. The
    +	 * method is called with the output {@link TypeInformation} which is also used for the
    +	 * {@link org.apache.flink.streaming.runtime.tasks.StreamTask} output serializer.
    +	 *
    +	 * @param outTypeInfo Output type information of the {@link org.apache.flink.streaming.runtime.tasks.StreamTask}
    +	 * @param executionConfig Execution configuration
    +	 */
    +	void setOutputType(TypeInformation<?> outTypeInfo, ExecutionConfig executionConfig);
    --- End diff --
    
    Good point. Will change it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---