You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by greghogan <gi...@git.apache.org> on 2016/05/25 19:21:44 UTC

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.cou...

GitHub user greghogan opened a pull request:

    https://github.com/apache/flink/pull/2036

    [FLINK-3806] [gelly] Revert use of DataSet.count()

    This leaves the Graph API unchanged but GatherSumApplyIteration and ScatterGatherIteration now use broadcast variables to share the numberOfVertices count. The PageRanks have been updated to use this feature.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/greghogan/flink 3806_revert_use_of_dataset_count

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2036.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2036
    
----
commit c56cbf997ceaec7e54f5d0847e960337ca1d84ba
Author: Greg Hogan <co...@greghogan.com>
Date:   2016-05-25T15:06:01Z

    [FLINK-3806] [gelly] Revert use of DataSet.count()

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.cou...

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2036#discussion_r64723638
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/gsa/GatherSumApplyIteration.java ---
    @@ -289,6 +300,11 @@ private GatherUdf(GatherFunction<VV, EV, M> gatherFunction, TypeInformation<Tupl
     
     		@Override
     		public void open(Configuration parameters) throws Exception {
    +			try {
    +				Collection<LongValue> numberOfVertices = getRuntimeContext().getBroadcastVariable("number of vertices");
    +				this.gatherFunction.setNumberOfVertices(numberOfVertices.iterator().next().getValue());
    +			} catch (Exception e) {
    +			}
    --- End diff --
    
    @sewen thanks for the pointers.
    
    What are your thoughts on adding a hasBroadcastVariable method to RuntimeContext? And elsewhere in that class?
    
    The javadocs for getBroadcastVariable could note the exception as for getAccumulator.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.cou...

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on the pull request:

    https://github.com/apache/flink/pull/2036#issuecomment-221811872
  
    Thanks for the PR @greghogan! I left one comment.
    The HITS implementation also makes use of the number of vertices in the same way PageRank did. We can update that library method too. Finally, could you update the usage docs of PageRank to remove `numVertices` from the parameters?
    Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.count()

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on the pull request:

    https://github.com/apache/flink/pull/2036
  
    Thanks @greghogan! It looks like you have some checkstyle violation, otherwise +1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2036: [FLINK-3806] [gelly] Revert use of DataSet.count()

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/2036


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.cou...

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2036#discussion_r64709459
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/gsa/GatherSumApplyIteration.java ---
    @@ -327,6 +343,11 @@ private SumUdf(SumFunction<VV, EV, M> sumFunction, TypeInformation<Tuple2<K, M>>
     
     		@Override
     		public void open(Configuration parameters) throws Exception {
    +			try {
    +				Collection<LongValue> numberOfVertices = getRuntimeContext().getBroadcastVariable("number of vertices");
    +				this.sumFunction.setNumberOfVertices(numberOfVertices.iterator().next().getValue());
    +			} catch (Exception e) {
    +			}
    --- End diff --
    
    Likewise here and below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.count()

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the pull request:

    https://github.com/apache/flink/pull/2036
  
    All comments should now be addressed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2036: [FLINK-3806] [gelly] Revert use of DataSet.count()

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the issue:

    https://github.com/apache/flink/pull/2036
  
    Will merge this ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.cou...

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2036#discussion_r64718337
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/gsa/GatherSumApplyIteration.java ---
    @@ -289,6 +300,11 @@ private GatherUdf(GatherFunction<VV, EV, M> gatherFunction, TypeInformation<Tupl
     
     		@Override
     		public void open(Configuration parameters) throws Exception {
    +			try {
    +				Collection<LongValue> numberOfVertices = getRuntimeContext().getBroadcastVariable("number of vertices");
    +				this.gatherFunction.setNumberOfVertices(numberOfVertices.iterator().next().getValue());
    +			} catch (Exception e) {
    +			}
    --- End diff --
    
    Could also add a hasBroadcastVariable method to RuntimeContext. I had first expected that getBroadcastVariable would return an empty Collection.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.cou...

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2036#discussion_r64709426
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/gsa/GatherSumApplyIteration.java ---
    @@ -289,6 +300,11 @@ private GatherUdf(GatherFunction<VV, EV, M> gatherFunction, TypeInformation<Tupl
     
     		@Override
     		public void open(Configuration parameters) throws Exception {
    +			try {
    +				Collection<LongValue> numberOfVertices = getRuntimeContext().getBroadcastVariable("number of vertices");
    +				this.gatherFunction.setNumberOfVertices(numberOfVertices.iterator().next().getValue());
    +			} catch (Exception e) {
    +			}
    --- End diff --
    
    I don't think we should leave the catch block empty here. Can we provide a meaningful message instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.count()

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the pull request:

    https://github.com/apache/flink/pull/2036
  
    Yes. If I allow IntelliJ to handle automatically organize imports then they can be reordered due to a lack of consistency and project-wide guidelines.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.cou...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2036#discussion_r64721168
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/gsa/GatherSumApplyIteration.java ---
    @@ -289,6 +300,11 @@ private GatherUdf(GatherFunction<VV, EV, M> gatherFunction, TypeInformation<Tupl
     
     		@Override
     		public void open(Configuration parameters) throws Exception {
    +			try {
    +				Collection<LongValue> numberOfVertices = getRuntimeContext().getBroadcastVariable("number of vertices");
    +				this.gatherFunction.setNumberOfVertices(numberOfVertices.iterator().next().getValue());
    +			} catch (Exception e) {
    +			}
    --- End diff --
    
    The BatchAPI usually throws exceptions when people build a seemingly incorrect program, for example by using a broadcast variable that does not exist. It seemed to help people to catch bugs faster.
    
    The exception makes it obvious that something is wrong, the empty set would make various programs run but return a wrong result. That is much harder to recognize and debug.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.count()

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the pull request:

    https://github.com/apache/flink/pull/2036
  
    Travis CI is now green.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.cou...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2036#discussion_r64749372
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/gsa/GatherSumApplyIteration.java ---
    @@ -289,6 +300,11 @@ private GatherUdf(GatherFunction<VV, EV, M> gatherFunction, TypeInformation<Tupl
     
     		@Override
     		public void open(Configuration parameters) throws Exception {
    +			try {
    +				Collection<LongValue> numberOfVertices = getRuntimeContext().getBroadcastVariable("number of vertices");
    +				this.gatherFunction.setNumberOfVertices(numberOfVertices.iterator().next().getValue());
    +			} catch (Exception e) {
    +			}
    --- End diff --
    
    I think that makes sense.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3806] [gelly] Revert use of DataSet.cou...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2036#discussion_r64721441
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/gsa/GatherSumApplyIteration.java ---
    @@ -289,6 +300,11 @@ private GatherUdf(GatherFunction<VV, EV, M> gatherFunction, TypeInformation<Tupl
     
     		@Override
     		public void open(Configuration parameters) throws Exception {
    +			try {
    +				Collection<LongValue> numberOfVertices = getRuntimeContext().getBroadcastVariable("number of vertices");
    +				this.gatherFunction.setNumberOfVertices(numberOfVertices.iterator().next().getValue());
    +			} catch (Exception e) {
    +			}
    --- End diff --
    
    BTW: In the IntelliJ code style, they encourage to call ignored exceptions "ignored", or to at least add a comment why the exception is ignored:
    ```java
    try {
        ...
    } catch (Exception ignored) {}
    ```
    or
    ```java
    try {
        ...
    } catch (Exception e) {
        // thrown if BV does not exist, ignore here
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---