You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by greghogan <gi...@git.apache.org> on 2015/09/10 17:03:12 UTC

[GitHub] flink pull request: [FLINK-2653] Enable object reuse in MergeItera...

GitHub user greghogan opened a pull request:

    https://github.com/apache/flink/pull/1115

    [FLINK-2653] Enable object reuse in MergeIterator

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/greghogan/flink 2653_mergeiterator_object_reuse

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1115
    
----
commit a9806678ed2dddb56b291282283cf5aea31eba87
Author: Greg Hogan <co...@greghogan.com>
Date:   2015-09-10T13:35:39Z

    [FLINK-2653] Enable object reuse in MergeIterator

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2653] Enable object reuse in MergeItera...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1115


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2653] Enable object reuse in MergeItera...

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the pull request:

    https://github.com/apache/flink/pull/1115#issuecomment-139558118
  
    Stephan, while looking for the `UnionWithTempOperator` bug elsewhere, I noticed that `AllReduceDriver.run()`, `ReduceDriver.run()`, and `ReduceCombinerDriver.sortAndCombine()` locally create new objects before every call to `MutableObjectIterator.next(T)`. This looks to be double allocating new objects in the case that the `TypeSerializer` does not support object reuse.
    
    Also, `ReduceCombinerDriver.sortAndCombine.run()` is not switching on `objectReuseEnabled`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2653] Enable object reuse in MergeItera...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1115#issuecomment-142955571
  
    +1 to merge this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2653] Enable object reuse in MergeItera...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1115#issuecomment-142549737
  
    Looks good to me.
    
    The CI report seems to refer to a different commit, though, I assume that is due to rebasing?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2653] Enable object reuse in MergeItera...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1115#issuecomment-144015985
  
    Merging this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2653] Enable object reuse in MergeItera...

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the pull request:

    https://github.com/apache/flink/pull/1115#issuecomment-142463454
  
    @StephanEwen, the requested changes are now in a second commit for this pull request. I'm not seeing any messages through JIRA. The only location I was unable to access the configuration (and therefore had to hard code `false`) was with the `DataSinkProcessor` for Tez.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2653] Enable object reuse in MergeItera...

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the pull request:

    https://github.com/apache/flink/pull/1115#issuecomment-142556889
  
    I did not rebase.
      https://github.com/greghogan/flink/commits/2653_mergeiterator_object_reuse
    
    Travis CI did build and test for my repo when I pushed the updated branch.
      https://travis-ci.org/greghogan/flink/builds/81660560


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2653] Enable object reuse in MergeItera...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1115#issuecomment-139276623
  
    Thank, Greg, for this contribution, this is a good idea.
    
    I think we need to extend it a bit. We observed that the object reuse mode does not work with all types, especially certain user-defined types that are serialized with Kryo. This is due to problems with empty instantiations and object-to-object copies. For that reason, we made the non-reusing mode the default, and the reusing mode optional.
    
    All parts were one could not configure reuse/non-reuse were realized with non-reuse, to be on the safe side.
    
    To keep this safe, I would like to use by default the non-reusing mode and the reusing mode when object reuse has been activated. For that, the reuse flag would have to be passed to the `UnilateralSortMerger`, and interpreted in the intermediate merge (line 1588ff).
    
    If you could add this, it would be perfect and on the safe side.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2653] Enable object reuse in MergeItera...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1115#issuecomment-141041498
  
    You are right, in those drivers, it is handled in the wrong way. Probably an artifact from the time before the `MultableObjectIterator` had both variants of the `next()` method. It used to have only the reusing variant.
    
    Clearly, this should be fixed.
    
    Here is the guide that I try to follow when working with the mutable objects:
    
      - All `MultableObjectIterator`s have two variants of the `next()` method - one for reuse, one without.
      - The variant without reuse it crucial, as not every situation can work with object reuse.
      - The variant with reuse is optional, but should be implemented where possible for performance.
      - The task drivers (AllReduceDriver, ...) or algorithms (sorter / hasher) know whether they want to work with reuse or not, and call the respective method in that case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---