Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/07/08 12:26:05 UTC

[jira] [Commented] (FLINK-1005) Add different mutable-object modes to runtime

    [ https://issues.apache.org/jira/browse/FLINK-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054804#comment-14054804 ] 

ASF GitHub Bot commented on FLINK-1005:
---------------------------------------

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/incubator-flink/pull/66

    [FLINK-1005] Add immutable object mode utils and enable it for GroupReduce

    This pull request adds the basics of immutable object mode and implements it for GroupReduce. User code can then keep references to values without the contents being overwritten, which previously happened for mutable types.
    
    I vote to make this the default mode in future versions.
    
    Code like this used to give unexpected results in the past, because of heavy object reuse in the runtime. With *immutable object mode*, it now gives expected results.
    
    ```
    List<Tuple2<StringValue, IntValue>> all = new ArrayList<Tuple2<StringValue,IntValue>>();
    
    while (values.hasNext()) {
        all.add(values.next());
    }
    
    Tuple2<StringValue, IntValue> result = all.get(0);
    ```
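
To make the failure mode concrete, here is a minimal, self-contained sketch of why collecting references from a reusing iterator goes wrong. It is not Flink runtime code: `ObjectReuseDemo`, `MutableInt`, `iterate`, and `firstAfterCollecting` are hypothetical names chosen for illustration, and `MutableInt` only stands in for mutable types such as `IntValue`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ObjectReuseDemo {

    /** Hypothetical mutable holder, standing in for a type like IntValue. */
    static final class MutableInt {
        int value;

        MutableInt(int value) {
            this.value = value;
        }
    }

    /**
     * Iterates over data. With reuse == true, every call to next() returns
     * the same instance with its contents overwritten; with reuse == false,
     * every call returns a fresh instance.
     */
    static Iterator<MutableInt> iterate(final int[] data, final boolean reuse) {
        return new Iterator<MutableInt>() {
            private final MutableInt shared = new MutableInt(0);
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < data.length;
            }

            @Override
            public MutableInt next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int v = data[pos++];
                if (reuse) {
                    shared.value = v; // overwrites what earlier references see
                    return shared;
                }
                return new MutableInt(v);
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    /** Keeps a reference to every element, then reads the first one. */
    public static int firstAfterCollecting(int[] data, boolean reuse) {
        List<MutableInt> all = new ArrayList<MutableInt>();
        Iterator<MutableInt> values = iterate(data, reuse);
        while (values.hasNext()) {
            all.add(values.next());
        }
        return all.get(0).value;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println("object reuse:    " + firstAfterCollecting(data, true));
        System.out.println("no object reuse: " + firstAfterCollecting(data, false));
    }
}
```

With reuse, the list holds three references to one object, so the "first" element carries the last record's value; without reuse, each reference keeps its own record.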

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink mutable_immutable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-flink/pull/66.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #66
    
----
commit 0f5c049cc3f91530ca36816d9742c2a04234989f
Author: Stephan Ewen <se...@apache.org>
Date:   2014-07-07T17:39:24Z

    [FLINK-1005] Extend TypeSerializer interface to handle non-mutable object deserialization more efficiently.

commit 618e0b3d7c72b67a555a1e8db6925a7d5d0b4c92
Author: Stephan Ewen <se...@apache.org>
Date:   2014-07-08T09:32:09Z

    [FLINK-1005] Add non-object reusing variants of key-grouped iterator.
    
    Clean minor javadoc errors.

commit 9703593a4b35b148489e840875b46b45e26bf966
Author: Stephan Ewen <se...@apache.org>
Date:   2014-07-08T10:19:43Z

    [FLINK-1005] Make GroupReduce configurable to use either mutable or immutable object mode

----


> Add different mutable-object modes to runtime
> ---------------------------------------------
>
>                 Key: FLINK-1005
>                 URL: https://issues.apache.org/jira/browse/FLINK-1005
>             Project: Flink
>          Issue Type: Improvement
>          Components: Local Runtime
>    Affects Versions: 0.6-incubating
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 0.6-incubating
>
>
> Currently, the runtime works strictly with mutable objects: as few objects as possible (typically one or two) are reused for all data records. Objects are, however, cloned/restored at various places to ensure that the contents are fresh at every call.
> The rationale behind this was to reduce pressure on the garbage collector. In fact, you can run programs where no garbage collection happens (if the UDFs are written to reuse objects as well).
> It can, however, lead to bugs in carelessly written user code.
> I propose to add two modes to the runtime:
>   - No-object-reuse (default) mode. New objects for every record. Safe but potentially slower.
>   - Object-reusing mode - All objects are reused, without backup copies. The UDFs must take care neither to keep any objects as state nor to modify the objects,



--
This message was sent by Atlassian JIRA
(v6.2#6252)