You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by StephanEwen <gi...@git.apache.org> on 2014/07/29 17:52:07 UTC

[GitHub] incubator-flink pull request: [FLINK-1023] Switch group-at-a-time ...

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/incubator-flink/pull/84

    [FLINK-1023] Switch group-at-a-time function to Iterable (from Iterator)

    This patch allows the *GroupReduce* and the *CoGroup* to use the beautiful *foreach* loop syntax.
    
    ```java
    public void reduce(Iterable<Long> values, Collector<Long> out)
        long sum = 0L;
        
        for (Long num : values) {
            sum += num;
        }
    }
    ```
    
    Since the data behind the iterable is transient, you cannot iterate over it multiple times. The next time you request an iterator, it will pick up where the previous left off, potentially returning an empty iterator:
    
    ```java
    public void reduce(Iterable<Long> values, Collector<Long> out)
        for (Long num : values) {
            // do something
        }
    
        for (Long num : values) {
            // empty loop, will never be entered
        }
    }
    ```
    
    Some functions need iterator behavior, which can still be used 
    ```java
    public void coGroup(Iterable<Long> values1, Iterable<Long> values2, Collector<Long> out)
        if (values2.iterator().hasNext()) {
            // do something
        }
    }
    
    public void coGroup(Iterable<Long> values1, Iterable<Long> values2, Collector<Long> out)
        Iterator<Long> iter = values2.iterator();
        if (values2.iterator().hasNext()) {
            do {
                // something
            } while (iter.hasNext());
        }
    }
    ```
    
    ----
    
    Note:
    
    The *Iterable* and the *Iterator* are currently strictly in sync. That means one can pick up where the other left and vice versa. While I would not encourage it to use this pattern, you can actually mix then and do something like the following (which is equivalent to the first example):
    ```java
    public void reduce(Iterable<Long> values, Collector<Long> out)
        Long sum = values.iterator().next();
        
        for (Long num : values) {
            sum += num;
        }
    }
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink iterable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-flink/pull/84.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #84
    
----
commit d4f2a65fc4faa013181ceabcc8e3c0975c99156e
Author: Stephan Ewen <se...@apache.org>
Date:   2014-07-29T13:58:44Z

    [FLINK-1023] Switch group-at-a-time function to java.lang.Iterable (from java.util.Iterator)

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1023] Switch group-at-a-time ...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/84#issuecomment-50512992
  
    Very nice change!
    
    While reading through the examples, I was wondering if it would make sense to throw an exception if the user is trying to create a second iterator(). Or maybe an exception if hasNext() == false and the user wants to create a second iterator.
    I don't know if  a) there are usecases for creating two iterators
    and b) users will be confused by the behavior (similar to the mutable objects)
    
    In addition to that, we should copy the nicely written comment of the pull request into the documentation.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1023] Switch group-at-a-time ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-flink/pull/84


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1023] Switch group-at-a-time ...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/84#issuecomment-50518588
  
    Throwing an error for a second `iterator()` call would be possible, but quite some change.
    
    For example in CoGroup, I use the simply `Collections.emptySet()` as the Iterable for empty sides. The `KeyGroupedIterable` for the reduce function also simply returns itself as the iterator, which means the original lightweight grouping logic can stay the same.
     
    Also, the some of old internal tests (Record-based) make use of the possibility to mix the *Iterable* and the *Iterator*. I could change that, if needed.
    
    We could make this change, but I would like to hear some opinion on whether this is actually more a confusing trap ("Why the heck do I get an empty set the second time I iterate over the input"), or whether this is actually understood that these inputs are what Scala calls a *TraversableOnce*


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1023] Switch group-at-a-time ...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/incubator-flink/pull/84#issuecomment-50535230
  
    I think throwing a meaningful exception when accessing the iterator a second time is the more intuitive and user-friendly option. I think it is easier to get a function right, if an exception explicitly tells you what you are doing wrong rather than debugging the code and wondering why the stupid iterator is empty...
    
    +1 to throw an exception when the iterator is requested the second time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1023] Switch group-at-a-time ...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the pull request:

    https://github.com/apache/incubator-flink/pull/84#issuecomment-50600322
  
    i feel like we are going in circles^^
    
    take the ```IterableIterator``` class, rename it to ```Iterator```, add into docu/javadocs "Iterator on which the for-each statement can be used. the for-each loop starts at the element that next() would return".
    
    traversable-once characteristic is obvious, supports for-each statement, solves these odd exception issues andd the differences to a normal iterator can be summed up in 2 lines. bonus: less API breaking, since its just an import that has to be changed.
    
    we cant use a name that makes all properties obvious (because ```IterableIterator``` is an odd name). the for-each statement not being obviously usable is imo a lot more desirable than then user creating faulty programs hitting exceptions based on false premises how the ```Iterable```works.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1023] Switch group-at-a-time ...

Posted by uce <gi...@git.apache.org>.
Github user uce commented on the pull request:

    https://github.com/apache/incubator-flink/pull/84#issuecomment-50589778
  
    I agree with @StephanEwen that the Iterator makes it more obvious.
    
    Still, I vote for Iterable and the exception in order to support foreach loops.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1023] Switch group-at-a-time ...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/84#issuecomment-50582983
  
    In that sense, having an `Iterator`rather than an `Iterable` makes things more obvious (the traversable-once characteristic). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1023] Switch group-at-a-time ...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/84#issuecomment-50836623
  
    I have addressed the comments and made the Iterables traversable for a single time


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1023] Switch group-at-a-time ...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/84#issuecomment-50675228
  
    I have implemented the suggestion to let the `Iterables` return the `Iterator` once, and throw a meaningful exception otherwise.
    
    The old record API remains as it was before.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---