Posted to issues@flink.apache.org by mariemayadi <gi...@git.apache.org> on 2014/07/24 16:21:01 UTC

[GitHub] incubator-flink pull request: Add Pi approximation Java example

GitHub user mariemayadi opened a pull request:

    https://github.com/apache/incubator-flink/pull/78

    Add Pi approximation Java example

    Hadoop and Spark both ship a basic Pi approximation example (alongside WordCount, PageRank, ...), so for consistency it is worth having one in Flink as well.
    
    Note:
    The final result, count, which in this case is the approximation of Pi, had to be given the type DataSet<Double> instead of plain Double (due to a Flink limitation).
    So, besides performing the summation, the reduce (at the collector level) also does a small arithmetic operation, which could not have been applied to a DataSet.
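The "small arithmetic operation" is the last step of the Monte Carlo estimate: the fraction of random points in the square that land inside the unit circle approximates the area ratio Pi/4, so Pi is roughly 4 * hits / n. The following is a minimal plain-Java sketch of the whole computation without Flink. The class and method names (`MonteCarloPi`, `estimate`, `scale`) and the fixed seed are illustrative assumptions, not part of the PR.

```java
import java.util.Random;

/**
 * Plain-Java sketch (no Flink) of the Monte Carlo estimate: sample points
 * uniformly in the square [-1, 1) x [-1, 1), count those inside the unit
 * circle, and scale the count: Pi ~= 4 * hits / n.
 */
public class MonteCarloPi {

    /** Returns true if one random point in the square falls inside the unit circle. */
    static boolean insideUnitCircle(Random rnd) {
        double x = rnd.nextDouble() * 2 - 1;
        double y = rnd.nextDouble() * 2 - 1;
        return x * x + y * y < 1;
    }

    /** The final arithmetic step: (circle area / square area) * 4. */
    public static double scale(long hits, long n) {
        return hits * 4.0 / n;
    }

    /** Samples n points with a seeded generator (seed is an illustrative assumption). */
    public static double estimate(long n, long seed) {
        Random rnd = new Random(seed);
        long hits = 0;
        for (long i = 0; i < n; i++) {
            if (insideUnitCircle(rnd)) {
                hits++;
            }
        }
        return scale(hits, n);
    }

    public static void main(String[] args) {
        System.out.println("We estimate Pi to be: " + estimate(1_000_000, 42L));
    }
}
```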

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mariemayadi/incubator-flink JavaPiExample

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-flink/pull/78.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #78
    
----
commit f7d14a40750251d8c07f88c441aaf4e992662dc8
Author: Mariem Ayadi <ma...@smith.edu>
Date:   2014-07-24T12:32:01Z

    Add Pi approximation Java example

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: Add Pi approximation Java example

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/78#issuecomment-52603018
  
    I think you can simplify it even further and either:
    
      - drop the mapper that converts the numbers to 1, apply the filter directly, and then count, or
      - use a mapper that applies the filter's logic to convert each value to either 0 or 1, depending on the "within the unit circle" condition, and then simply sum.
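The two alternatives can be sketched with plain Java streams. This is an analogue to illustrate the idea, not the Flink API, and the class and helper names are hypothetical. Each point is derived from its index (an assumption made here for reproducibility), so both variants see exactly the same points and must return the same count.

```java
import java.util.SplittableRandom;
import java.util.stream.LongStream;

/**
 * Plain-Java sketch (java.util.stream, not Flink) of the two suggested
 * simplifications. Points are derived deterministically from their index,
 * so both variants must agree exactly.
 */
public class CountVariants {

    /** Hypothetical helper: is the point generated for index i inside the unit circle? */
    static boolean inside(long i, long seed) {
        SplittableRandom rnd = new SplittableRandom(seed + i);
        double x = rnd.nextDouble() * 2 - 1;
        double y = rnd.nextDouble() * 2 - 1;
        return x * x + y * y < 1;
    }

    /** Variant 1: filter directly, then count the surviving elements. */
    public static long filterThenCount(long n, long seed) {
        return LongStream.range(0, n).filter(i -> inside(i, seed)).count();
    }

    /** Variant 2: map each element to 1 or 0 using the filter's condition, then sum. */
    public static long mapThenSum(long n, long seed) {
        return LongStream.range(0, n).map(i -> inside(i, seed) ? 1L : 0L).sum();
    }
}
```

Variant 1 passes fewer records to the aggregation step, which is the same argument fhueske makes for using a FilterFunction.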



[GitHub] incubator-flink pull request: Add Pi approximation Java example

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-flink/pull/78



[GitHub] incubator-flink pull request: Add Pi approximation Java example

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/78#issuecomment-52229213
  
    The example uses a fairly inefficient way to create the points: it builds a list (a serial data source), which can also easily grow huge in heap space.
    
    Why not use `env.generateSequence(from, to)`? It needs no memory for the data source and executes in parallel.
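As a rough plain-Java analogue of what a generated sequence buys over a materialized list (this is not Flink code, and the class name is hypothetical): `LongStream.rangeClosed` produces the indices lazily, so no n-element collection is ever allocated, and `.parallel()` splits the range across the common fork-join pool using its default parallelism, with no hand-picked block count.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.LongStream;

/**
 * Plain-Java analogue (not Flink) of replacing a List<Integer> source with a
 * generated sequence: the range is produced lazily and processed in parallel.
 */
public class SequenceSource {

    public static double estimate(long n) {
        long hits = LongStream.rangeClosed(1, n)   // lazily generated, like a sequence source
                .parallel()                        // default parallelism, no "blocks" parameter
                .filter(i -> {
                    // ThreadLocalRandom avoids contention on one shared Random across threads
                    ThreadLocalRandom rnd = ThreadLocalRandom.current();
                    double x = rnd.nextDouble(-1.0, 1.0);
                    double y = rnd.nextDouble(-1.0, 1.0);
                    return x * x + y * y < 1;
                })
                .count();
        return hits * 4.0 / n;
    }

    public static void main(String[] args) {
        System.out.println("We estimate Pi to be: " + estimate(10_000_000L));
    }
}
```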



[GitHub] incubator-flink pull request: Add Pi approximation Java example

Posted by mariemayadi <gi...@git.apache.org>.
Github user mariemayadi commented on the pull request:

    https://github.com/apache/incubator-flink/pull/78#issuecomment-50126545
  
    I really like the filter suggestion. It is indeed a better fit for emitting the records.
    +1 for ReduceFunction vs. GroupReduceFunction. Fair point. (I was trying to avoid having to use an additional MapFunction :))
    
    Will fix it.



[GitHub] incubator-flink pull request: Add Pi approximation Java example

Posted by mariemayadi <gi...@git.apache.org>.
Github user mariemayadi commented on the pull request:

    https://github.com/apache/incubator-flink/pull/78#issuecomment-52554396
  
    Much better that way. Thanks for the feedback.
    Just pushed the changes.
    
    
    On Thu, Aug 14, 2014 at 9:10 PM, Stephan Ewen <no...@github.com>
    wrote:
    
    > It also allows you to drop the workaround with the blocks and simply work
    > with the default parallelism, further simplifying the example.

[GitHub] incubator-flink pull request: Add Pi approximation Java example

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/incubator-flink/pull/78#issuecomment-50073451
  
    Nice one!
    I'd prefer to use a ReduceFunction instead of a GroupReduceFunction because it is automatically combinable (which is usually important if you do a Reduce without groupBy) and shorter.
    However, you would need to add another MapFunction for the final division.
    
    You could also replace the first MapFunction with a FilterFunction and emit a record only if the test passes. This would mean that no records with 0-valued ints are emitted, so the summing becomes cheaper.
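A plain-Java sketch (not the Flink runtime) of why an associative reduce is combinable: each partition can be pre-reduced locally, so at most one partial sum per partition has to be shipped to the final reducer, and a separate map step then does the division. The partitions below hold only the 1s that passed the filter, so no zeros are summed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

/**
 * Plain-Java sketch of combinable reduction: local pre-aggregation per
 * partition, a final reduce over the partials, then a map for the division.
 */
public class CombinableReduce {

    // Same logic as the PR's PiReducer; addition is associative, so partials can be merged in any grouping.
    static final BinaryOperator<Integer> SUM = (a, b) -> a + b;

    /** Pairwise-reduces one partition without an identity element; returns null if it is empty. */
    public static Integer combine(List<Integer> partition) {
        Integer acc = null;
        for (Integer v : partition) {
            acc = (acc == null) ? v : SUM.apply(acc, v);
        }
        return acc;
    }

    /** Combines each partition, reduces the partial sums, then maps the total to a Pi estimate. */
    public static double reduceThenMap(List<List<Integer>> partitions, long n) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> p : partitions) {
            Integer partial = combine(p);   // local pre-aggregation ("combine")
            if (partial != null) {
                partials.add(partial);      // at most one record per partition is shipped
            }
        }
        Integer total = combine(partials);  // final reduce over the partial sums
        int hits = (total == null) ? 0 : total;
        return hits * 4.0 / n;              // the separate map step for the division
    }
}
```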
    




[GitHub] incubator-flink pull request: Add Pi approximation Java example

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/78#issuecomment-52229325
  
    It also allows you to drop the workaround with the blocks and simply work with the default parallelism, further simplifying the example.



[GitHub] incubator-flink pull request: Add Pi approximation Java example

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/78#discussion_r15434359
  
    --- Diff: flink-examples/flink-java-examples/src/main/java/org/apache/flink/example/java/pi/PiEstimation.java ---
    @@ -0,0 +1,127 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.example.java.pi;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.flink.api.java.DataSet;
    +import org.apache.flink.api.java.ExecutionEnvironment;
    +import org.apache.flink.api.java.functions.*;
    +
    +
    +/** 
    + * Estimates the value of Pi using the Monte Carlo method.
    + * The area of a circle is Pi * R^2, R being the radius of the circle 
    + * The area of a square is 4 * R^2, where the length of the square's edge is 2*R.
    + * 
    + * Thus Pi = 4 * (area of circle / area of square).
    + * 
    + * The idea is to find a way to estimate the circle to square area ratio.
    + * The Monte Carlo method suggests collecting random points (within the square)
    + * ```
    + * x = Math.random() * 2 - 1
    + * y = Math.random() * 2 - 1
    + * ```
    + * then counting the number of points that fall within the circle 
    + * ```
    + * x * x + y * y < 1
    + * ```
    + */
    +public class PiEstimation {
    +	
    +	static int n;
    +	
    +	public static void main(String[] args) throws Exception {
    +		
    +	  	int blocks = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
    +	  	n = 100000 * blocks;
    +	  	List<Integer> l = new ArrayList<Integer>(n);
    +	  	for (int i = 0; i < n; i++) {
    +	  		l.add(1);
    +	  	}
    +	  	
    +	  	//Sets up the execution environment
    +	  	final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +	  	DataSet<Integer> dataSet = env.fromCollection(l);
    +
    +	  	
    +	  	DataSet<Double> count = dataSet
    +	  			.filter(new PiFilter())
    +	  			.setParallelism(blocks)
    +	  			.reduce(new PiReducer())
    +	  			.map(new PiMapper());
    +	  			
    +	  	System.out.println("We estimate Pi to be:");
    +	  	count.print();
    +	  	
    +	  	env.execute();
    +  }
    +  
    +  
    +  //*************************************************************************
    +  //     USER FUNCTIONS
    +  //*************************************************************************
    +	
    +	// FilterFunction that filters out all Integers smaller than zero.
    +	
    +	/** 
    +	 * PiFilter randomly emits points that fall within a square of edge 2*x = 2*y = 2.
    +	 * It calculates the distance to the center of a virtually centered circle of radius x = y = 1
    +	 * If the distance is less than 1, then and only then does it return a value (in this case 1, a list's value)
    +	 */
    +	public static class PiFilter extends FilterFunction<Integer> {
    +		private static final long serialVersionUID = 1L;
    +
    +		@Override
    +		public boolean filter(Integer value) throws Exception{
    +			double x = Math.random() * 2 - 1;
    +			double y = Math.random() * 2 - 1;
    +			return (x * x + y * y) < 1;
    +		}
    +	}
    +
    +	
    +	/** 
    +	 * PiReducer takes over the filter. It goes through the selected 1s and returns the sum.
    +	 */
    +	public static final class PiReducer extends ReduceFunction<Integer>{
    +		private static final long serialVersionUID = 1L;
    +
    +		@Override
    +		public Integer reduce(Integer value1, Integer value2) throws Exception {
    +			return value1 + value2;
    +		}
    +	}
    +	
    +	
    +	/** 
    +	 * The PiMapper's role is to apply one final operation on the count thus returning the estimated Pi value.
    +	 */
    +	public static final class PiMapper extends MapFunction<Integer,Double> {
    +		private static final long serialVersionUID = 1L;
    +
    +		@Override
    +		public Double map(Integer intSum) throws Exception {
    +			return intSum*4.0 / n;
    --- End diff --
    
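    I think `n` will be 0 if you run the job in a distributed environment on a cluster: the static field is only assigned in `main()` on the client, and static fields are not shipped with the serialized function.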
    You'll probably have to pass it either as a configuration value or via the constructor of the class.
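A minimal plain-Java sketch of the constructor-based fix, assuming only that the function object is serialized and shipped to the workers. `n` becomes an instance field, so it travels with the serialized object, whereas a static field is never serialized and keeps its default value 0 in a JVM where `main()` did not run. `PiMapperSketch` is a hypothetical stand-in for the PR's `PiMapper` and uses no Flink types; `roundTrip` only simulates the shipping step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for PiMapper that receives n via its constructor. */
public class PiMapperSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final long n;  // total number of sampled points; serialized with the object

    public PiMapperSketch(long n) {
        this.n = n;
    }

    public double map(long hits) {
        return hits * 4.0 / n;
    }

    /** Simulates shipping the function to a worker: serialize, then deserialize. */
    public static PiMapperSketch roundTrip(PiMapperSketch f) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(f);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PiMapperSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```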



[GitHub] incubator-flink pull request: Add Pi approximation Java example

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/78#issuecomment-50237717
  
    Thank you for your pull request!
    
    Travis indicates that your code is not compliant with our coding guidelines (http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/coding_guidelines.html): https://s3.amazonaws.com/archive.travis-ci.org/jobs/30827003/log.txt
    The problems are listed here:
    ``` 
    [INFO] There are 18 checkstyle errors.
    [ERROR] PiEstimation.java[26:n/a] Using the '.*' form of import should be avoided - org.apache.flink.api.java.functions.*.
    [ERROR] PiEstimation.java[53:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[54:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[55:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[56:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[57:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[58:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[61:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[62:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[65:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[66:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[67:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[68:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[69:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[71:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[72:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[74:n/a] Line has leading space characters; indentation should be performed with tabs only.
    [ERROR] PiEstimation.java[75:n/a] Line has leading space characters; indentation should be performed with tabs only.
    ```

