You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by StephanEwen <gi...@git.apache.org> on 2014/10/02 11:18:54 UTC

[GitHub] incubator-flink pull request: [FLINK-1110] Add collection-based ex...

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/incubator-flink/pull/139

    [FLINK-1110] Add collection-based execution mode to flink APIs

    This modes executes programs straightforwardly as collections, with minimal resource footprint and virtually no latency (such as starting threads, RPC services, actor systems, ...)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink collections

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-flink/pull/139.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #139
    
----
commit 51f0d287172cdec848f267ee75213192161bd89b
Author: Stephan Ewen <se...@apache.org>
Date:   2014-09-22T13:12:58Z

    [FLINK-1110] Framework for collection-based execution

commit 7ac4501a8b8ba9846a54d85d6ee7a9635496c0e4
Author: Aljoscha Krettek <al...@gmail.com>
Date:   2014-09-22T15:47:19Z

    [FLINK-1110] Add Collection-Based execution for Reduce Operators
    
    Also fix some bugs resulting from moving stuff between packages

commit 916f6f003c43ad2d71d55c4b60fd4dc500a8b544
Author: Ufuk Celebi <uc...@apache.org>
Date:   2014-09-22T15:54:34Z

    [FLINK-1110] Add execution on collections for flatMap

commit a8ba39348e8263c7579f3239e2753b42c38b6e60
Author: Stephan Ewen <se...@apache.org>
Date:   2014-09-22T16:12:58Z

    [FLINK-1110] Fix MapOperator execution and added simple test

commit 79ffa693ed814e5ba83bfdff3e99f6900432dd2b
Author: Stephan Ewen <se...@apache.org>
Date:   2014-09-22T17:18:33Z

    [FLINK-1110] Implement collection-based Cross

commit f5d3b02167c081af7e2e23de97f8678ac8dc443d
Author: Stephan Ewen <se...@apache.org>
Date:   2014-09-22T17:18:54Z

    [FLINK-1110] Implement broadcast variables for collection-based execution

commit c997e65f0c793cb706e63c6cc63bef1eec5b5e4e
Author: Stephan Ewen <se...@apache.org>
Date:   2014-09-22T21:23:17Z

    [FLINK-1110] Implement collection-based execution for mapPartition.
    
    Make groupReduce code compliant with pre-java-8 versions, fix java8 tests with moved type information classes.
    
    Fix Various warnings.

commit e39963a912d0d29d0c572a1554495939681b6cc9
Author: Till Rohrmann <tr...@apache.org>
Date:   2014-09-22T15:30:14Z

    [FLINK-1110] Started implementing the JoinOperatorBase.
    
    Implemented JoinOperatorBase and test cases.

commit 9ad87ea5d3c14bf486f6e85af64fb0c191190570
Author: Stephan Ewen <se...@apache.org>
Date:   2014-09-22T23:50:05Z

    [FLINK-1110] Implement collection-based execution for bulk iterations

commit 97b5da1b14e91ef0c47ab4cb838ab51fe8544a39
Author: Ufuk Celebi <uc...@apache.org>
Date:   2014-09-23T12:48:07Z

    [FLINK-1110] Implement collection-based execution for coGroup

commit 1873df6633017370bc7e45619f142abe0728b835
Author: Aljoscha Krettek <al...@gmail.com>
Date:   2014-09-23T13:16:33Z

    [FLINK-1110] Add createCollectionEnvironment for Scala

commit b13cb2fdab7a9c9caf4d4cba780dcc2d4ab14e4d
Author: Stephan Ewen <se...@apache.org>
Date:   2014-09-23T21:47:59Z

    [FLINK-1110] Adjusted test base to run programs both with local executor and collection executor

commit e8f5f8d1728c5bdaff0d34d2770c9aa3937e6165
Author: Aljoscha Krettek <al...@gmail.com>
Date:   2014-09-26T14:37:00Z

    [FLINK-1110] Implement Collection-Based Execution for Delta Iterations

commit 9a9e95511d26c10e32e9693089478739e158f091
Author: Stephan Ewen <se...@apache.org>
Date:   2014-09-30T19:32:54Z

    [FLINK-1110] By default, collection-based execution behaves mutable-object safe.

commit 485272c3fabeb9dc1c84d77782fdb9d9c620a18c
Author: Stephan Ewen <se...@apache.org>
Date:   2014-09-30T21:14:53Z

    [FLINK-1110] Adjust tests and fix various issues in the collection-based execution.

commit 3fef9e502ae9fc3ecd2a5bf28f7532a4449988ec
Author: Ufuk Celebi <uc...@apache.org>
Date:   2014-10-01T09:26:40Z

    [FLINK-1110] Fix IndexOutOfBoundsException in ListKeyGroupedIterator
    
    Occured when calling nextKey() and iterator not consumed with *last* key group.

commit a2784f6740a7b03d837138bd18955b07f9ed37c6
Author: Aljoscha Krettek <al...@gmail.com>
Date:   2014-10-01T15:43:38Z

    [FLINK-1110] Fix Reduce and GroupReduce Test Failures

commit 4bc5fb9c8a46f05f848771263a5d78b882ca1e13
Author: Stephan Ewen <se...@apache.org>
Date:   2014-10-01T18:11:52Z

    [FLINK-1110] Fix mutable object safe mode for data sources in collection-based execution

commit fc19fb3fa3e2086d046cc455b4428f36c72ffbb9
Author: Stephan Ewen <se...@apache.org>
Date:   2014-10-01T19:42:12Z

    [FLINK-1110] Add iteration aggregators to collection based execution

commit 0a405541939895906e4a18eae67bdc07e7d7a2df
Author: Stephan Ewen <se...@apache.org>
Date:   2014-10-01T23:54:26Z

    [FLINK-1110] Fix HadoopInputFormat t work without prior serialization.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1110] Add collection-based ex...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/139#issuecomment-57937260
  
    Manually merged in 65bf092da77ca0d416a3abbcac21b641cf038101


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1110] Add collection-based ex...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen closed the pull request at:

    https://github.com/apache/incubator-flink/pull/139


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1110] Add collection-based ex...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/139#discussion_r18523098
  
    --- Diff: flink-examples/flink-java-examples/src/main/java/org/apache/flink/example/java/environments/CollectionExecutionExample.java ---
    @@ -0,0 +1,48 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.example.java.environments;
    +
    +import org.apache.flink.api.common.functions.MapFunction;
    +import org.apache.flink.api.java.CollectionEnvironment;
    +
    +/**
    + * This example shows how to use Flink's collection execution functionality.
    + * Collection-based execution is an extremely lightweight, non-parallel way to
    + * execute programs on small data: The programs are s
    + * 
    + * Because this method of execution spawns no background threads, managed memory, 
    + * coordinator, or parallel worker, it has a minimal execution footprint. 
    + */
    +public class CollectionExecutionExample {
    +
    +	public static void main(String[] args) throws Exception {
    +		
    +		CollectionEnvironment env = new CollectionEnvironment();
    +		
    +		env.fromElements("A", "B", "C", "D")
    +			.map(new MapFunction<String, String>() {
    +				public String map(String value) {
    +					return value + " " + 1;
    +				};
    +			})
    +		.print();
    --- End diff --
    
    You can use the `LocalCollectorOutputFormat` as a workaround.
    
    We should start adding such "distributed-to-local" operations to the API in the next version. The runtime is adding functionality for that.
    With the blob-manager in place, we should be able to do this nicely and robustly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1110] Add collection-based ex...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/139#discussion_r18518034
  
    --- Diff: flink-examples/flink-java-examples/src/main/java/org/apache/flink/example/java/environments/CollectionExecutionExample.java ---
    @@ -0,0 +1,48 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.example.java.environments;
    +
    +import org.apache.flink.api.common.functions.MapFunction;
    +import org.apache.flink.api.java.CollectionEnvironment;
    +
    +/**
    + * This example shows how to use Flink's collection execution functionality.
    + * Collection-based execution is an extremely lightweight, non-parallel way to
    + * execute programs on small data: The programs are s
    + * 
    + * Because this method of execution spawns no background threads, managed memory, 
    + * coordinator, or parallel worker, it has a minimal execution footprint. 
    + */
    +public class CollectionExecutionExample {
    +
    +	public static void main(String[] args) throws Exception {
    +		
    +		CollectionEnvironment env = new CollectionEnvironment();
    +		
    +		env.fromElements("A", "B", "C", "D")
    +			.map(new MapFunction<String, String>() {
    +				public String map(String value) {
    +					return value + " " + 1;
    +				};
    +			})
    +		.print();
    --- End diff --
    
    Is there a facility to access the results? (a collection from a `DataSet`) ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1110] Add collection-based ex...

Posted by uce <gi...@git.apache.org>.
Github user uce commented on the pull request:

    https://github.com/apache/incubator-flink/pull/139#issuecomment-57782459
  
    Good to merge. All operator tests are working.
    
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---