Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/09/07 17:12:45 UTC

[jira] [Commented] (FLINK-2631) StreamFold operator does not respect returns type and stores non serializable values

    [ https://issues.apache.org/jira/browse/FLINK-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733825#comment-14733825 ] 

ASF GitHub Bot commented on FLINK-2631:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/1101

    [FLINK-2631] [streaming] Fixes the StreamFold operator and adds output type configurable stream operators

    Adds support for non-serializable initial fold values by storing the value in a byte array before shipping it. The shipped initial fold value is deserialized on the TaskManager (TM) when the `open` method is called.
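
The byte-array approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: `ValueSerializer`, `Counter`, `CounterSerializer`, `FoldOperator` and `ship` are hypothetical names, with `ValueSerializer` standing in for the role a Flink `TypeSerializer` would play, and plain Java serialization standing in for shipping the operator to the TM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class FoldInitialValueSketch {

    /** Hypothetical type-specific serializer (an assumption, not a Flink class). */
    interface ValueSerializer<T> extends Serializable {
        void write(T value, DataOutputStream out) throws IOException;
        T read(DataInputStream in) throws IOException;
    }

    /** A value type that deliberately does NOT implement java.io.Serializable. */
    static final class Counter {
        final int count;
        Counter(int count) { this.count = count; }
    }

    static final class CounterSerializer implements ValueSerializer<Counter> {
        private static final long serialVersionUID = 1L;
        @Override
        public void write(Counter value, DataOutputStream out) throws IOException {
            out.writeInt(value.count);
        }
        @Override
        public Counter read(DataInputStream in) throws IOException {
            return new Counter(in.readInt());
        }
    }

    /** Hypothetical operator that ships only bytes, never the raw initial value. */
    static final class FoldOperator<T> implements Serializable {
        private static final long serialVersionUID = 1L;
        private final ValueSerializer<T> serializer;
        private final byte[] serializedInitialValue;
        private transient T accumulator; // rebuilt in open(), never shipped

        FoldOperator(T initialValue, ValueSerializer<T> serializer) throws IOException {
            this.serializer = serializer;
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                serializer.write(initialValue, out);
            }
            this.serializedInitialValue = bytes.toByteArray();
        }

        /** Restores the initial value from the shipped bytes. */
        void open() throws IOException {
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(serializedInitialValue))) {
                accumulator = serializer.read(in);
            }
        }

        T accumulator() { return accumulator; }
    }

    /** Round-trips an object through Java serialization to mimic shipping it. */
    @SuppressWarnings("unchecked")
    static <O extends Serializable> O ship(O operator)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(operator);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (O) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        FoldOperator<Counter> shipped =
            ship(new FoldOperator<>(new Counter(42), new CounterSerializer()));
        shipped.open();
        System.out.println(shipped.accumulator().count); // prints 42
    }
}
```

Holding a `Counter` field directly in the operator would make `writeObject` throw a `NotSerializableException`; keeping only the byte array and the serializer avoids that.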
    
    Furthermore, this PR introduces the `OutputTypeConfigurable` interface, which lets stream operators learn their output type. The interface offers the method `setOutputType`, which the `StreamGraph` calls when the `StreamOperator` is added in the `addOperator` method. By this point at the latest, the concrete output type, whether inferred from the UDF or set manually with `returns`, should be known to the system, because the input and output type serializers for the vertex are also created in the `addOperator` method. All stream operators that need to know their output type should implement the `OutputTypeConfigurable` interface.
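
The type-injection idea described above might look roughly like the following self-contained sketch. All types here are simplified, hypothetical stand-ins (`TypeInfo`, `Operator`, `Graph`, `FoldOperator`), and the single-argument `setOutputType` signature is an assumption made for illustration; the actual interface in the pull request may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputTypeSketch {

    /** Hypothetical stand-in for type information about an operator's output. */
    static final class TypeInfo<T> {
        final Class<T> typeClass;
        TypeInfo(Class<T> typeClass) { this.typeClass = typeClass; }
    }

    /** Hypothetical marker for a stream operator producing OUT. */
    interface Operator<OUT> { }

    /** Sketch of the idea only; the real signature is an assumption here. */
    interface OutputTypeConfigurable<OUT> {
        void setOutputType(TypeInfo<OUT> outputType);
    }

    /** An operator that needs its output type, e.g. to build a serializer later. */
    static final class FoldOperator<OUT>
            implements Operator<OUT>, OutputTypeConfigurable<OUT> {
        private TypeInfo<OUT> outputType;

        @Override
        public void setOutputType(TypeInfo<OUT> outputType) {
            this.outputType = outputType;
        }

        TypeInfo<OUT> outputType() { return outputType; }
    }

    /** Hypothetical graph: injects the output type when an operator is added. */
    static final class Graph {
        private final List<Operator<?>> operators = new ArrayList<>();

        <OUT> void addOperator(Operator<OUT> operator, TypeInfo<OUT> outputType) {
            // By now the final type is known, whether inferred or set manually.
            if (operator instanceof OutputTypeConfigurable) {
                @SuppressWarnings("unchecked")
                OutputTypeConfigurable<OUT> configurable =
                    (OutputTypeConfigurable<OUT>) operator;
                configurable.setOutputType(outputType);
            }
            operators.add(operator);
        }

        int size() { return operators.size(); }
    }

    public static void main(String[] args) {
        FoldOperator<String> fold = new FoldOperator<>();
        new Graph().addOperator(fold, new TypeInfo<>(String.class));
        System.out.println(fold.outputType().typeClass.getSimpleName()); // prints String
    }
}
```

Deferring the injection to `addOperator` means an operator constructed before its output type is final still sees the correct type, which is the situation the issue describes for the Scala API.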

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixStreamingFold

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1101.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1101
    
----
commit 63951adca0e8bfefd1d81b933017e9fadc5f556f
Author: Till Rohrmann <tr...@apache.org>
Date:   2015-09-07T09:34:48Z

    [FLINK-2631] [streaming] Fixes the StreamFold operator. Adds OutputTypeConfigurable interface to support type injection at StreamGraph creation.
    
    Adds test for non serializable fold type. Adds test to verify proper output type forwarding for OutputTypeConfigurable implementations.
    
    Adds comments

----


> StreamFold operator does not respect returns type and stores non serializable values
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-2631
>                 URL: https://issues.apache.org/jira/browse/FLINK-2631
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Till Rohrmann
>
> The {{StreamFold}} operator stores the initial value of the fold operation for the task deployment. This value does not necessarily have to be serializable. Thus, using the fold operation with a non-serializable initial value will fail the job.
> Moreover, the {{StreamFold}} operator needs to know the output type in order to create a {{TypeSerializer}}. For {{StreamGraphs}} where the output type is not known when the operator is created, as is the case for the Scala DataStream API, which sets the output type via the {{returns}} method directly after creating the operator, this approach will fail. The reason is that the {{StreamFold}} operator does not receive the type information set by the {{returns}} method. Therefore, the job will fail at runtime because the operator tries to create a serializer from a {{MissingTypeInformation}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)