Posted to issues@flink.apache.org by ktzoumas <gi...@git.apache.org> on 2014/07/31 21:28:07 UTC

[GitHub] incubator-flink pull request: Reflect recent changes to Java API d...

GitHub user ktzoumas opened a pull request:

    https://github.com/apache/incubator-flink/pull/87

    Reflect recent changes to Java API documentation

    Adjust documentation to reflect Java API changes (interfaces instead of classes)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ktzoumas/incubator-flink java_api_docs_changes_sam_interfaces

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-flink/pull/87.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #87
    
----
commit ac88f029b5e25b3692242b6cbbd624b4e0f79e3d
Author: Kostas Tzoumas <ko...@gmail.com>
Date:   2014-07-31T19:21:35Z

    Reflect recent changes to Java API documentation
    
    change text to String

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: Reflect recent changes to Java API d...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/87#discussion_r15665104
  
    --- Diff: docs/java_api_transformations.md ---
    @@ -431,6 +435,30 @@ DataSet<Tuple2<String, Double>>
                        .with(new PointWeighter());
     ```
     
    +#### Join with FlatJoinFunction
    +
    +Analogous to Map and FlatMap, a FlatJoin function behaves in the same
    +way as a JoinFunction, but instead of returning one element, it can
    +return (collect), zero, one, or more elements
    +
    +{% highlight java %}
    +public class PointWeighter
    +         implements FlatJoinFunction<Rating, Tuple2<String, Double>, Tuple2<String, Double>> {
    +
    +  @Override
    +  public void join(Rating rating, Tuple2<String, Double> weight,
    +
    +	  Collector<Tuple2<String, Double>> out) {
    +	if (weight.f1 > 0.1)
    +		out.collect(new Tuple2<String, Double>(rating.name, rating.points * weight.f1));
    --- End diff --
    
    In the main code we enforce always using braces around statements. (I think the reason is the famous `goto fail;` bug.)
    I think it's better to have them in the documentation as well (to educate users who might become committers).
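
For illustration, the braced form of the guarded `collect` call might look like the sketch below. It is a minimal, self-contained example: the `Collector` interface here is a hypothetical stand-in (not Flink's real interface), and the class name, the `run` helper, and the string output are assumptions made only so the snippet compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class BracedJoinSketch {

    // Hypothetical stand-in for a collector; NOT Flink's real Collector interface.
    interface Collector<T> {
        void collect(T record);
    }

    // Emits one weighted result only when the weight exceeds the threshold.
    // The single statement inside the if is wrapped in braces, as the review asks.
    static void join(String name, double points, double weight, Collector<String> out) {
        if (weight > 0.1) {
            out.collect(name + ":" + (points * weight));
        }
    }

    // Hypothetical helper: runs one join call and returns whatever was collected.
    static List<String> run(double weight) {
        List<String> results = new ArrayList<>();
        join("a", 10.0, weight, results::add);
        return results;
    }
}
```

With braces, a line added later under the `if` cannot silently fall outside the condition, which is the failure mode the `goto fail;` reference alludes to.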



[GitHub] incubator-flink pull request: Reflect recent changes to Java API d...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/87#issuecomment-51313702
  
    I squashed the two commits together and merged the result to master and release-0.6: https://git-wip-us.apache.org/repos/asf?p=incubator-flink.git;a=commit;h=e8a0857b



[GitHub] incubator-flink pull request: Reflect recent changes to Java API d...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/87#discussion_r15664989
  
    --- Diff: docs/java_api_transformations.md ---
    @@ -431,6 +435,30 @@ DataSet<Tuple2<String, Double>>
                        .with(new PointWeighter());
     ```
     
    +#### Join with FlatJoinFunction
    +
    +Analogous to Map and FlatMap, a FlatJoin function behaves in the same
    +way as a JoinFunction, but instead of returning one element, it can
    +return (collect), zero, one, or more elements
    +
    +{% highlight java %}
    +public class PointWeighter
    +         implements FlatJoinFunction<Rating, Tuple2<String, Double>, Tuple2<String, Double>> {
    +
    +  @Override
    +  public void join(Rating rating, Tuple2<String, Double> weight,
    +
    +	  Collector<Tuple2<String, Double>> out) {
    --- End diff --
    
    This empty line is a bit ugly



[GitHub] incubator-flink pull request: Reflect recent changes to Java API d...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/87#discussion_r15664277
  
    --- Diff: docs/java_api_guide.md ---
    @@ -427,6 +403,88 @@ DataSet<Tuple2<String, Integer>> out = in.project(2,0).types(String.class, Integ
     Defining Keys
     -------------
     
    +One transformation (join, coGroup) require that a key is defined on
    +its argument DataSets, and other transformations (Reduce, GroupReduce,
    +Aggregate) allow that the DataSet is grouped on a key before they are
    +applied.
    +
    +A DataSet is grouped as
    +{% highlight java %}
    +DataSet<...> input = // [...]
    +DataSet<...> reduced = input
    +	.groupBy(/*define key here*/)
    +	.reduceGroup(/*do something*/);
    +{% endhighlight %}
    +
    +The data model of Flink is not based on key-value pairs. Therefore,
    +you do not need to physically pack the data set types into keys and
    +values. Keys are "virtual": they are defined as functions over the
    +actual data to guide the grouping operator.
    +
    +The simplest case is grouping a data set of Tuples on one or more
    +fields of the Tuple:
    +{% highlight java %}
    +DataSet<Tuple3<Integer,String,Long>> input = // [...]
    +DataSet<Tuple3<Integer,String,Long> grouped = input
    +	.groupBy(1)
    --- End diff --
    
    Aren't the fields 0-indexed? So `groupBy(1)` groups on the second field (the String)?
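
The indexing question can be checked with a small sketch. The `Tuple3` class below is a hypothetical stand-in written for this example (it is not Flink's class), and `countByField` only imitates grouping on a field position using plain Java streams; the sample data is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FieldIndexSketch {

    // Hypothetical stand-in for a three-field tuple with 0-indexed positions.
    static final class Tuple3<A, B, C> {
        final A f0;
        final B f1;
        final C f2;

        Tuple3(A f0, B f1, C f2) {
            this.f0 = f0;
            this.f1 = f1;
            this.f2 = f2;
        }

        // Position 0 is the first field, position 1 the second, position 2 the third.
        Object getField(int pos) {
            switch (pos) {
                case 0: return f0;
                case 1: return f1;
                case 2: return f2;
                default: throw new IndexOutOfBoundsException("no field " + pos);
            }
        }
    }

    // Counts tuples per distinct value of the field at the given position.
    static Map<Object, Long> countByField(List<Tuple3<Integer, String, Long>> input, int pos) {
        return input.stream()
                .collect(Collectors.groupingBy(t -> t.getField(pos), Collectors.counting()));
    }

    // Made-up sample data: two tuples share the String "a" in position 1.
    static List<Tuple3<Integer, String, Long>> sample() {
        List<Tuple3<Integer, String, Long>> data = new ArrayList<>();
        data.add(new Tuple3<Integer, String, Long>(1, "a", 10L));
        data.add(new Tuple3<Integer, String, Long>(2, "a", 20L));
        data.add(new Tuple3<Integer, String, Long>(3, "b", 30L));
        return data;
    }
}
```

Under this 0-indexed convention, position 1 selects the String field, so grouping on 1 yields two groups for the sample data, while grouping on 0 yields three.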



[GitHub] incubator-flink pull request: Reflect recent changes to Java API d...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/87#discussion_r15678326
  
    --- Diff: docs/java_api_guide.md ---
    @@ -435,6 +493,85 @@ Defining Keys
     Functions
     ---------
     
    +You can define a user-defined function and pass it to the DataSet
    +transformations in several ways:
    +
    +#### Implementing an interface
    +
    +The most basic way is to implement one of the provided interfaces:
    +
    +{% highlight java %}
    +class MyMapFunction implements MapFunction<String, Integer> {
    +  public Integer map(String value) { return Integer.parseInt(value); }
    +});
    +data.map (new MyMapFunction());
    +{% endhighlight %}
    +
    +#### Anonymous classes
    +
    +You can pass a function as an anonmymous class:
    +{% highlight java %}
    +data.map(new MapFunction<String, Integer> () {
    +  public Integer map(String value) { return Integer.parseInt(value); }
    +});
    +{% endhighlight %}
    +
    +#### Java 8 Lambdas
    +
    +***Warning: Lambdas are currently only supported for filter and reduce
    +   transformations***
    +
    +{% highlight java %}
    +DataSet<String> data = // [...]
    +data.filter(s -> s.startsWith("http://"));
    +{% endhighlight %}
    +
    +{% highlight java %}
    +DataSet<Integer> data = // [...]
    +data.reduce((i1,i2) -> i1 + i2);
    +{% endhighlight %}
    +
    +#### Rich functions
    +
    +All transformations that take as argument a user-defined function can
    +instead take as argument a *rich* function. For example, instead of
    +{% highlight java %}
    +class MyMapFunction implements MapFunction<String, Integer> {
    +  public Integer map(String value) { return Integer.parseInt(value); }
    +});
    +{% endhighlight %}
    +you can write
    +{% highlight java %}
    +class MyMapFunction extends RichMapFunction<String, Integer> {
    +  public Integer map(String value) { return Integer.parseInt(value); }
    +});
    +{% endhighlight %}
    +and pass the function as usual to a `map` transformation:
    +{% highlight java %}
    +data.map(new MyMapFunction());
    +{% endhighlight %}
    +
    +Rich functions can also be defined as an anonymous class:
    +{% highlight java %}
    +data.map (new RichMapFunction<String, Integer>() {
    +  public Integer map(String value) { return Integer.parseInt(value); }
    +});
    +{% endhighlight %}
    +
    +Rich functions provide, in addition to the user-defined function (map,
    +reduce, etc), four methods: `open`, `close`, `getRuntimeContext`,
    +`getIterationRuntimeContext`, and `setRuntimeContext`. These are
    --- End diff --
    
    I would skip the `getIterationRuntimeContext()` method. Then there are actually four methods, as stated in the text.
    
    It would be nice to explain the regular RuntimeContext and the IterationRuntimeContext.
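
To make the four remaining methods concrete, here is a rough lifecycle sketch. Everything in it is a simplified assumption: `RichMapper` and `RuntimeContext` are hypothetical stand-ins written for this example, the method signatures are reduced (no configuration argument on `open`), and the call order in `runAll` only imitates how a runtime might drive such a function.

```java
public class RichFunctionSketch {

    // Hypothetical stand-in for a runtime context; a real one would carry far more.
    static final class RuntimeContext {
        final String taskName;

        RuntimeContext(String taskName) {
            this.taskName = taskName;
        }
    }

    // Hypothetical base class exposing the four methods named in the review:
    // open, close, getRuntimeContext, setRuntimeContext.
    abstract static class RichMapper<IN, OUT> {
        private RuntimeContext context;

        void setRuntimeContext(RuntimeContext context) {
            this.context = context;
        }

        RuntimeContext getRuntimeContext() {
            return context;
        }

        void open() {
            // Setup hook; does nothing by default.
        }

        void close() {
            // Teardown hook; does nothing by default.
        }

        abstract OUT map(IN value);
    }

    // Example function that tracks how many records it has mapped.
    static final class ParseInt extends RichMapper<String, Integer> {
        int seen;
        boolean closed;

        @Override
        void open() {
            seen = 0;
        }

        @Override
        Integer map(String value) {
            seen++;
            return Integer.parseInt(value);
        }

        @Override
        void close() {
            closed = true;
        }
    }

    // Imitates a runtime: set the context, open, map every value, then close.
    static int runAll(ParseInt fn, String... values) {
        fn.setRuntimeContext(new RuntimeContext("sketch-task"));
        fn.open();
        int sum = 0;
        for (String value : values) {
            sum += fn.map(value);
        }
        fn.close();
        return sum;
    }
}
```

The point of the sketch is only the division of labor: the context is injected before `open`, per-record work happens in `map`, and `close` runs once at the end.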



[GitHub] incubator-flink pull request: Reflect recent changes to Java API d...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-flink/pull/87

