Posted to issues@flink.apache.org by greghogan <gi...@git.apache.org> on 2016/02/09 22:02:15 UTC

[GitHub] flink pull request: [FLINK-3335] [runtime] Fix DataSourceTask obje...

GitHub user greghogan opened a pull request:

    https://github.com/apache/flink/pull/1616

    [FLINK-3335] [runtime] Fix DataSourceTask object reuse when disabled

    When object reuse is disabled, `DataSourceTask` now copies each object received from the `InputFormat`, so that downstream code never collects multiple references to the same reused object.
    
    An example where this is necessary is a `DataSet` created from a user implementation of `Iterator` which reuses a local object returned from `Iterator.next`.
    
    Also, when object reuse is enabled, the cycling among three objects has been removed. I had added this a few months ago when starting to resolve an issue with reduce drivers.
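
The `Iterator` case described above can be reproduced without Flink. The following is a minimal sketch in plain Java, not the code from this pull request: `Counter`, `ReusingIterator`, and `collect` are hypothetical names invented for illustration. It shows what a consumer sees when it holds on to elements from an iterator that returns the same instance from every `next()` call, and how copying each element avoids the problem.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReusingIteratorDemo {

    /** Mutable holder standing in for a user record type (hypothetical). */
    static final class Counter {
        int value;

        Counter copy() {
            Counter c = new Counter();
            c.value = value;
            return c;
        }
    }

    /** Iterator that mutates and returns the same Counter on every call to next(). */
    static final class ReusingIterator implements Iterator<Counter> {
        private final Counter reused = new Counter();
        private final int limit;
        private int emitted = 0;

        ReusingIterator(int limit) {
            this.limit = limit;
        }

        @Override
        public boolean hasNext() {
            return emitted < limit;
        }

        @Override
        public Counter next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            reused.value = emitted++;
            return reused; // same instance every time
        }
    }

    /** Holds every element until the iterator is exhausted, then reads the values. */
    static List<Integer> collect(Iterator<Counter> it, boolean copy) {
        List<Counter> held = new ArrayList<>();
        while (it.hasNext()) {
            Counter c = it.next();
            held.add(copy ? c.copy() : c);
        }
        List<Integer> values = new ArrayList<>();
        for (Counter c : held) {
            values.add(c.value);
        }
        return values;
    }

    public static void main(String[] args) {
        // Without copying, every held reference points at the one reused object.
        System.out.println(collect(new ReusingIterator(3), false)); // [2, 2, 2]
        // With a defensive copy, each element keeps its own value.
        System.out.println(collect(new ReusingIterator(3), true));  // [0, 1, 2]
    }
}
```

The copy is what costs time for record types that are expensive to duplicate, which is the trade-off to weigh when deciding where such a copy should live.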

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/greghogan/flink 3335_fix_datasourcetask_object_reuse_when_disabled

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1616.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1616
    
----
commit 2678b9315a28ce27d888c7be53e5cce13b1afb35
Author: Greg Hogan <co...@greghogan.com>
Date:   2016-02-09T13:18:28Z

    [FLINK-3335] [runtime] Fix DataSourceTask object reuse when disabled
    
    When object reuse is disabled, DataSourceTask now copies objects received from
    the InputFormat to prevent the collection of reused objects.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3335] [runtime] Fix DataSourceTask obje...

Posted by StephanEwen <gi...@git.apache.org>.
GitHub user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1616#issuecomment-182390682
  
    I am very hesitant to introduce an "always copy" step in the data sources.
    Some data sources create Avro or Thrift types which are incredibly expensive to copy.
    
    I personally would prefer to accept that inconsistency at this point and make sure it is documented well (that source implementers should not emit the same internal elements multiple times).
    
    A user who wants to optimize the data source can check the object reuse config flag (the ExecutionConfig can be obtained from the RuntimeContext) and, depending on that flag, reuse objects differently in the code.
    
    What do you think?
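
The flag-based approach suggested above might look roughly like the following. This is a sketch in plain Java with no Flink dependency: `ReuseAwareSource`, `Record`, and `distinctInstances` are hypothetical names, and the `objectReuseEnabled` constructor argument stands in for the configuration flag. In a real Flink source the value would presumably come from something like `getRuntimeContext().getExecutionConfig().isObjectReuseEnabled()`, which is an assumption about how the source is wired and is not taken from this thread.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class ReuseAwareSource {

    /** Hypothetical record type emitted by the source. */
    static final class Record {
        long id;
    }

    // Stand-in for the object reuse flag read from the execution configuration.
    private final boolean objectReuseEnabled;
    private final Record reused = new Record();
    private long nextId = 0;

    ReuseAwareSource(boolean objectReuseEnabled) {
        this.objectReuseEnabled = objectReuseEnabled;
    }

    /** Emits the shared instance when reuse is enabled, a fresh one otherwise. */
    Record nextRecord() {
        Record r = objectReuseEnabled ? reused : new Record();
        r.id = nextId++;
        return r;
    }

    /** Counts how many distinct instances the source hands out over n calls. */
    static int distinctInstances(boolean objectReuseEnabled, int n) {
        ReuseAwareSource source = new ReuseAwareSource(objectReuseEnabled);
        Set<Record> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < n; i++) {
            seen.add(source.nextRecord());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(true, 3));  // 1
        System.out.println(distinctInstances(false, 3)); // 3
    }
}
```

With this shape the source itself decides when allocation is needed, so the runtime does not have to add a copy step for every record.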



[GitHub] flink pull request: [FLINK-3335] [runtime] Fix DataSourceTask obje...

Posted by greghogan <gi...@git.apache.org>.
GitHub user greghogan closed the pull request at:

    https://github.com/apache/flink/pull/1616



[GitHub] flink pull request: [FLINK-3335] [runtime] Fix DataSourceTask obje...

Posted by greghogan <gi...@git.apache.org>.
GitHub user greghogan commented on the pull request:

    https://github.com/apache/flink/pull/1616#issuecomment-187240850
  
    Considering this and other recent discussions, I agree that introducing a high-level copy would be detrimental and that there are better solutions. I would still like to include the other cleanup, but will do so under a new ticket so that we can close this idea.

