You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by greghogan <gi...@git.apache.org> on 2015/12/16 23:09:41 UTC

[GitHub] flink pull request: [FLINK-3166] [runtime] The first program in Ob...

GitHub user greghogan opened a pull request:

    https://github.com/apache/flink/pull/1464

    [FLINK-3166] [runtime] The first program in ObjectReuseITCase has the wrong expected result, and it succeeds

    - TestEnvironment now honors configuration of object reuse
    - Fixed reduce transformations to allow the user to modify and return either input
    
    This is neither clean nor complete but I wanted to get some feedback before continuing.
    
    ObjectReuseITCase was forcing object reuse, this was fixed by updating TestEnvironment.
    
    ReduceCombineDriver and ReduceDriver both requests the next record and call the user's reduce() and so can handle whichever of the two inputs objects the user might write to.
    
    In the second test case DataSourceTask (which requests the next record) calls ChainedAllReduceDriver.collect (which calls the user's reduce()). There is no feedback so DataSourceTask must cycle through `n`+1 reusable objects for unknown `n`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/greghogan/flink 3166_fix_objectreuseitcase

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1464.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1464
    
----
commit 78dc073440ed8ecfec51fbe33ce665562786bf20
Author: Greg Hogan <co...@greghogan.com>
Date:   2015-12-16T19:42:06Z

    [FLINK-3166] [runtime] The first program in ObjectReuseITCase has the wrong expected result, and it succeeds
    - TestEnvironment now honors configuration of object reuse
    - Fixed reduce transformations to allow the user to modify and return either input

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3166] [runtime] The first program in Ob...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1464#issuecomment-167351562
  
    Merging this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3166] [runtime] The first program in Ob...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the pull request:

    https://github.com/apache/flink/pull/1464#issuecomment-165516598
  
    ooohhh...that's a problem. alrighty, your solution for that looks good. actually, would you mind moving that into a separate PR so we can get it in faster? this should be fixed ASAP.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3166] [runtime] The first program in Ob...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1464


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3166] [runtime] The first program in Ob...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the pull request:

    https://github.com/apache/flink/pull/1464#issuecomment-165501540
  
    "Object reuse is configured in JavaProgramTestBase but must be set in the ExecutionEnvironmentFactory by TestEnvironment. I don't see that any of the many tests subclassing JavaProgramTestBase were testing with object reuse enabled."
    
    Does this (L.112 JavaProgramTestBase)
    ```
    // prepare the test environment
    TestEnvironment env = new TestEnvironment(this.executor, this.parallelism);
    env.getConfig().enableObjectReuse();
    env.setAsContext();
    ```
    
    not enable (and as such force) object reuse for every test that extends JavaProgramTestBase?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3166] [runtime] The first program in Ob...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1464#discussion_r48017305
  
    --- Diff: flink-tests/src/test/java/org/apache/flink/test/javaApiOperators/ObjectReuseITCase.java ---
    @@ -124,26 +125,30 @@ public static String runProgram(int progId, String resultPath) throws Exception
     				env.execute();
     
     				// return expected result
    -				return "a,100\n";
    +				return "a,60\n";
     
     			}
     
     			case 2: {
    +				// Global reduce
     
     				final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    -				env.getConfig().enableObjectReuse();
     
     				DataSet<Tuple2<String, Integer>> input = env.readCsvFile(inReducePath).types(String.class, Integer.class).setParallelism(1);
     
    -				DataSet<Tuple2<String, Integer>> result = input
    -						.reduce(new ReduceFunction<Tuple2<String, Integer>>() {
    +				DataSet<Tuple2<String, Integer>> result = input.reduce(new ReduceFunction<Tuple2<String, Integer>>() {
     
     							@Override
     							public Tuple2<String, Integer> reduce(
     									Tuple2<String, Integer> value1,
     									Tuple2<String, Integer> value2) throws Exception {
    -								value2.f1 += value1.f1;
    -								return value2;
    +								if (value1.f1 % 2 == 0) {
    --- End diff --
    
    How about changing this to `% 3`, so that it does not "accidentally" work by perfectly following the rotation of the "reuse1" and "reuse2" variables in the driver?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3166] [runtime] The first program in Ob...

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the pull request:

    https://github.com/apache/flink/pull/1464#issuecomment-165506005
  
    TestEnvironment was only setting the parallelism when creating and establishing the ExecutionEnvironmentFactory:
    
    ```
    	public void setAsContext() {
    		ExecutionEnvironmentFactory factory = new ExecutionEnvironmentFactory() {
    			@Override
    			public ExecutionEnvironment createExecutionEnvironment() {
    				lastEnv = new TestEnvironment(executor, getParallelism());
    				return lastEnv;
    			}
    		};
    
    		initializeContextEnvironment(factory);
    	}
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3166] [runtime] The first program in Ob...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the pull request:

    https://github.com/apache/flink/pull/1464#issuecomment-165406987
  
    A few comments:
    * the change to the TestEnvironment is unnecessary; ObjectReuse is enabled/disabled in the JavaProgramTestBase. it doesn't fix anything, removing the call to enabledObjectReuse in the test already did the trick.
    * The ObjectReuseITCase is supposed to verify that objects are reused. Now, the results are the same regardless of whether they are reused (except in case 3), making it little more then a reduce test. (There is also little point of running this test with ObjectReuse disabled, which is why it forced it)
    * A little bit above your changes in Reduce(Combine)Driver is a comment as to why we only need 2 objects when reusing objects, this will need to be adjusted.
    * I like the changes in DataSourceTask/Reduce(Combine)Driver. They are a seemingly simple solution for a common problem. In fact, But i am a bit apprehensive about them, because i have a hard time imagining that no one else could think of them; there might be a bigger issue with changes that i can't see at the moment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3166] [runtime] The first program in Ob...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1464#issuecomment-165753852
  
    Good catch. I think all of these changes look actually good.
    
    Minor comment in-line, otherwise +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3166] [runtime] The first program in Ob...

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the pull request:

    https://github.com/apache/flink/pull/1464#issuecomment-165499126
  
    I haven't found a prohibition on returning the second input to a user-defined reduce function. I think fixing this in code is preferable to documenting the problem or throwing an error, especially since the problem only exists with object reuse enabled.
    
    Object reuse is configured in JavaProgramTestBase but must be set in the ExecutionEnvironmentFactory by TestEnvironment. I don't see that any of the many tests subclassing JavaProgramTestBase were testing with object reuse enabled.
    
    It is rather difficult to test for object reuse. ObjectReuseITCase simply captured a few isolated instances of strange behavior (with the original tests 3 and 4 being perfect copies). I would look to duplicate the first two tests in the appropriate JavaProgramTestBase subclass and delete ObjectReuseITCase.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---