You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by shghatge <gi...@git.apache.org> on 2015/06/10 13:25:48 UTC

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

GitHub user shghatge opened a pull request:

    https://github.com/apache/flink/pull/818

    [FLINK-2093][gelly] Added difference Method

    Tasks given on 5th June:
    Add a difference function to the Graph.java
    Modify the docs 'gelly-guide.md'
    Add the test case for difference() method to GraphMutationsITCase.java

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shghatge/flink difference_new

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/818.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #818
    
----
commit e8a5250b4588326b606b2f29d2f2c2f6e4554925
Author: Shivani <sh...@gmail.com>
Date:   2015-06-10T11:22:37Z

    [FLINK-2093][gelly] Added difference Method

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by shghatge <gi...@git.apache.org>.
Github user shghatge commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-114983698
  
    Updated PR


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/818#discussion_r32374715
  
    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -1234,6 +1234,18 @@ public void coGroup(Iterable<Edge<K, EV>> edge, Iterable<Edge<K, EV>> edgeToBeRe
     	}
     
     	/**
    +	 * Performs Difference on the vertex and edge sets of the input graphs
    +	 * removes common vertices and edges. If a source/target vertex is removed, its corresponding edge will also be removed
    +	 * @param graph the graph to perform difference with
    +	 * @return a new graph where the common vertices and edges have been removed
    +	 */
    +	public Graph<K,VV,EV> difference(Graph<K,VV,EV> graph) throws java.lang.Exception{
    +		DataSet<Vertex<K,VV>> removeVerticesData = graph.getVertices();
    +		final List<Vertex<K,VV>> removeVerticesList = removeVerticesData.collect();
    --- End diff --
    
    I don't think we should use `collect()` here.. Keep in mind that (1) `collect()` will trigger program execution and (2) should not be used to collect large DataSets and input graph might have lots of vertices.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-112095145
  
    Hi @andralungu, @shghatge,
    I'm a bit confused by your changes.. Why did you add a new `removeVertices` API method for DataSet?
    I thought we had agreed that we'll have add/remove* methods for Lists and union/difference for DataSets, no?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-115215460
  
    Thank you @shghatge! I'll merge this :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by shghatge <gi...@git.apache.org>.
Github user shghatge commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-113150859
  
    Updated the docs accordingly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-112814353
  
    Hey,
    
    difference in not a fancy way of removing vertices, no :)
    If you recall our previous conversation in #678, we decided to have add/remove methods for small mutations (i.e. can be stored in a List)  and union/difference for large mutations (for large data in the form of Graph).
    
    The `collect()` method is not "unsafe". It is simply not advised to be used for large datasets. If you think about it, it needs to retrieve the data (which might be distributed over several nodes) from to local JVM and trigger the program execution.
    
    If you found that the only way to implement `difference` is to overload `removeVertices`, then we should also consider what this means in terms of the API. If we have a remove* method for DataSets, don't we also need a corresponding add*? You see how this goes back to #678 :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by andralungu <gi...@git.apache.org>.
Github user andralungu commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-110739263
  
    Hi @shghatge, 
    
    Apart from the minor cosmetic suggestions I made, everything looks well. 
    
    @vasia, could you double check this? After Shivani simplifies the difference method a bit, I would call this good to merge. 
    
    Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by shghatge <gi...@git.apache.org>.
Github user shghatge commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-112827357
  
    Hello @vasia 
    Could you please elaborate on what should be the future course of action?
    Should I put the content of removeVertices(DataSet<>) method in the Difference() method so as to not add a new public method? Or is it better to overload the add methods for DataSets  too?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by andralungu <gi...@git.apache.org>.
Github user andralungu commented on a diff in the pull request:

    https://github.com/apache/flink/pull/818#discussion_r32415176
  
    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -1151,6 +1151,23 @@ public void coGroup(Iterable<Vertex<K, VV>> vertex, Iterable<Vertex<K, VV>> vert
     		}
     	}
     
    +
    --- End diff --
    
    always add Javadoc to new methods.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by andralungu <gi...@git.apache.org>.
Github user andralungu commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-112086337
  
    Looks good to me. +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by andralungu <gi...@git.apache.org>.
Github user andralungu commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-113149078
  
    Hi @shghatge ,
    
    Don't forget to remove the definition for the public removeVertices(DataSet) from the documentation.  
    
    Up for discussion: should we keep the name removeVertices for the private, helper method or should we call it something else, like removeVerticesAndEdges... Names are not my strongest point, but I guess you got the idea :) Personally, I am fine with the current name!  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/818#discussion_r32374703
  
    --- Diff: docs/libs/gelly_guide.md ---
    @@ -240,6 +240,7 @@ Graph<Long, Double, Double> networkWithWeights = network.joinWithEdgesOnSource(v
         <img alt="Union Transformation" width="50%" src="fig/gelly-union.png"/>
     </p>
     
    +* <strong>Difference</strong>: Gelly's `difference()` method performs a difference on the vertex and edge sets of the input graphs. The resultant graph is formed by removing the vertices and edges from the graph that are common with the second graph.
    --- End diff --
    
    we can rephrase this a bit.. there is one input graph and no "second" graph... I guess you copied from the union description above (which should also be changed).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-111759442
  
    Hi @shghatge! Thank you for the PR!
    I left you a couple of comments. Let us know if you have questions :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-114430160
  
    Hi @shghatge!
    Thank you for the quick update. Apart from my minor comment, this looks good now :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by andralungu <gi...@git.apache.org>.
Github user andralungu commented on a diff in the pull request:

    https://github.com/apache/flink/pull/818#discussion_r32415140
  
    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -1151,6 +1151,23 @@ public void coGroup(Iterable<Vertex<K, VV>> vertex, Iterable<Vertex<K, VV>> vert
     		}
     	}
     
    +
    +	public Graph<K, VV, EV> removeVertices(DataSet<Vertex<K, VV>> verticesToBeRemoved){
    --- End diff --
    
    Yes, this is definitely the idea, but right now you are duplicating a lot of code. Can we find a smarter way (i.e. that requires as little code duplication as possible) :)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/818#discussion_r33026720
  
    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -1234,6 +1248,17 @@ public void coGroup(Iterable<Edge<K, EV>> edge, Iterable<Edge<K, EV>> edgeToBeRe
     	}
     
     	/**
    +	 * Performs Difference on the vertex and edge sets of the input graphs
    +	 * removes common vertices and edges. If a source/target vertex is removed, its corresponding edge will also be removed
    +	 * @param graph the graph to perform difference with
    +	 * @return a new graph where the common vertices and edges have been removed
    +	 */
    +	public Graph<K,VV,EV> difference(Graph<K,VV,EV> graph) throws java.lang.Exception{
    --- End diff --
    
    why the `throws` declaration?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-112923172
  
    Hi @shghatge!
    
    For this issue, I think we should only add the `difference` method. If you want to avoid duplicating code, you can always define a private helper method.
    If we decide that we need overloaded public methods for additions and removals, we should discuss this in a separate JIRA in my view.
    Let me know if you have questions!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by andralungu <gi...@git.apache.org>.
Github user andralungu commented on a diff in the pull request:

    https://github.com/apache/flink/pull/818#discussion_r32414981
  
    --- Diff: docs/libs/gelly_guide.md ---
    @@ -240,6 +240,7 @@ Graph<Long, Double, Double> networkWithWeights = network.joinWithEdgesOnSource(v
         <img alt="Union Transformation" width="50%" src="fig/gelly-union.png"/>
     </p>
     
    +* <strong>Difference</strong>: Gelly's `difference()` method performs a difference on the vertex and edge sets of the input graphs. The resultant graph is formed by removing the common vertices and edges from the graph.
    --- End diff --
    
    I think @vasia also wanted you to update the description for union ^^
    
    Now, this still looks a bit unclear. It seems that there are two input graphs. You should make it obvious that the current graph gets differentiated with an input graph. That way, you won't leave room for comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by andralungu <gi...@git.apache.org>.
Github user andralungu commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-112099290
  
    Hi @vasia ,
    
    In essence the `difference` method is just a fancy way of removing vertices, right?
    When you remove a vertex, you also remove the edge for which it was a source/target.
    
    Since the add/remove vertices methods work just for lists and collect is "unsafe", we mutually agreed to overload `removeVertices` to work for data sets. 
    This way you would duplicate the least amount of code. Otherwise, you would take the exact code in the DataSet removeVertices and duplicate it in difference. That's not very practical IMO.
    
    Also, it may occur that a user has a DataSet of elements to remove. An extra removeVertices won't really hurt then, would it? 
    
    But if you have suggestions on how to improve this, we are more than eager to hear about them :)
    
    -Andra


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by shghatge <gi...@git.apache.org>.
Github user shghatge commented on the pull request:

    https://github.com/apache/flink/pull/818#issuecomment-113147138
  
    Updated the PR by changing the removeVertices(DataSet<Vertex<K, VV>>) method  access from Public to Private. It is only used as a helper function for the difference method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/818


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---