
Posted to issues@flink.apache.org by fobeligi <gi...@git.apache.org> on 2016/06/28 16:24:08 UTC

[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

GitHub user fobeligi opened a pull request:

    https://github.com/apache/flink/pull/2178

    [Flink-1815] Add methods to read and write a Graph as adjacency list

    Thanks for contributing to Apache Flink. Before you open your pull request, please take the following checklist into consideration.
    If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
    In addition to going through the list, please provide a meaningful description of your changes.
    
    - [ ] General
      - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
      - The pull request addresses only one issue
      - Each commit in the PR has a meaningful commit message (including the JIRA id)
    
    - [ ] Documentation
      - Documentation has been added for new functionality
      - Old documentation affected by the pull request has been updated
      - JavaDoc for public methods has been added
    
    - [ ] Tests & Build
      - Functionality added by the pull request is covered by tests
      - `mvn clean verify` has been executed successfully locally or a Travis build has passed


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fobeligi/incubator-flink FLINK-1815

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2178.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2178
    
----
commit 3a9502da61b7758e1383803d5141a16fe3a5777a
Author: fobeligi <fa...@gmail.com>
Date:   2016-06-22T16:11:23Z

    [FLINK-1815] Add GraphAdjacencyListReader class to read an Adjacency List formatted text file. Moreover, add writeAsAdjacencyList method to Graph. Test cases are also added for each new method.

commit 8aab5b40e031b132c46782a5908d58cc6290892f
Author: fobeligi <fa...@gmail.com>
Date:   2016-06-28T08:49:03Z

    [FLINK-1815] Add fromAdjacencyListFile and writeAsAdjacencyList methods to Graph scala API. Tests are also added.

----
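The commits above add reading and writing of graphs in an adjacency-list text format. The defaults in the `writeAsAdjacencyList` Javadoc quoted in the review comments below are a tab between a source and its neighbors, a comma between neighbors, and `-` between an ID and its value. Assuming that layout, one line might look like `1-a<TAB>2-0.5,3-0.7`. The following plain-Java sketch does not use Flink and is not the parser in `GraphAdjacencyListReader`. Its class and method names are hypothetical. It parses one such line into (source, target, edge value) triples, and it assumes that IDs and values never contain a delimiter. A negative value such as `-0.5`, for example, would break the `-` split.

```java
import java.util.ArrayList;
import java.util.List;

public class AdjacencyListLineParser {

    /** One parsed edge, kept as raw strings (hypothetical helper type). */
    public static final class ParsedEdge {
        public final String source;
        public final String target;
        public final String value; // null if the neighbor carried no edge value

        ParsedEdge(String source, String target, String value) {
            this.source = source;
            this.target = target;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + source + ", " + target + ", " + value + ")";
        }
    }

    /**
     * Parses a line such as "1-a\t2-0.5,3-0.7", assuming the default delimiters:
     * '\t' between source and neighbors, ',' between neighbors, '-' between ID and value.
     */
    public static List<ParsedEdge> parseLine(String line) {
        // Separate the source part from the (optional) neighbor part.
        String[] sourceAndNeighbors = line.split("\t", 2);
        // The source ID is everything before the first '-'; the vertex value is ignored here.
        String sourceId = sourceAndNeighbors[0].split("-", 2)[0];

        List<ParsedEdge> edges = new ArrayList<>();
        if (sourceAndNeighbors.length < 2 || sourceAndNeighbors[1].isEmpty()) {
            return edges; // vertex without outgoing edges
        }
        for (String neighbor : sourceAndNeighbors[1].split(",")) {
            String[] targetAndValue = neighbor.split("-", 2);
            String value = targetAndValue.length > 1 ? targetAndValue[1] : null;
            edges.add(new ParsedEdge(sourceId, targetAndValue[0], value));
        }
        return edges;
    }

    public static void main(String[] args) {
        // Prints (1, 2, 0.5) then (1, 3, 0.7)
        for (ParsedEdge e : parseLine("1-a\t2-0.5,3-0.7")) {
            System.out.println(e);
        }
    }
}
```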


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

Posted by greghogan <gi...@git.apache.org>.

Github user greghogan commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2178#discussion_r68832549
  
    --- Diff: flink-libraries/flink-gelly-scala/src/main/scala/org/apache/flink/graph/scala/Graph.scala ---
    @@ -1127,8 +1194,7 @@ TypeInformation : ClassTag](jgraph: jg.Graph[K, VV, EV]) {
        *
        * @param analytic the analytic to run on the Graph
        */
    -  def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, EV, T]):
    -  GraphAnalytic[K, VV, EV, T] = {
    +  def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, EV, T])= {
    --- End diff --
    
    Was this change intended?


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

Posted by greghogan <gi...@git.apache.org>.

Github user greghogan commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2178#discussion_r68832940
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String edgesPath, ExecutionEnvironmen
     	}
     
     	/**
    +	 * Creates a graph from a Adjacency List text file  with Vertex Key values. Edges will be created automatically.
    +	 *
    +	 * @param filePath a path to an Adjacency List text file with the Vertex data
    +	 * @param context  the execution environment.
    +	 * @return An instance of {@link org.apache.flink.graph.GraphAdjacencyListReader},
    +	 * on which calling methods to specify types of the Vertex ID, Vertex value and Edge value returns a Graph.
    +	 */
    +	public static GraphAdjacencyListReader fromAdjacencyListFile(String filePath, ExecutionEnvironment context) {
    +		return new GraphAdjacencyListReader(filePath, context);
    +	}
    +
    +	/**
    +	 * Writes a graph as an Adjacency List formatted text file in a user specified folder.
    +	 *
    +	 * @param filePath   the path that the Adjacency List formatted text file should be written in
    +	 * @param delimiters the delimiters that separate the different value types in the Adjacency List formatted text
    +	 *                   file. Delimiters should be provided with the following order:
    +	 *                   NEIGHBOR_DELIMITER : separating source from its neighbors
    +	 *                   VERTICES_DELIMITER : separating the different neighbors of a source vertex
    +	 *                   VERTEX_VALUE_DELIMITER: separating the source vertex-id from the vertex value, as well as the
    +	 *                   target vertex-ids from the edge value.
    +	 */
    +	public void writeAsAdjacencyList(String filePath, String... delimiters) {
    +
    +		final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? delimiters[0] : "\t";
    +
    +		final String VERTICES_DELIMITER = delimiters.length > 1 ? delimiters[1] : ",";
    +
    +		final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? delimiters[2] : "-";
    +
    +
    +		DataSet<Tuple2<K, VV>> vertices = this.getVerticesAsTuple2();
    +
    +		DataSet<Tuple3<K, K, EV>> edgesNValues = this.getEdgesAsTuple3();
    --- End diff --
    
    Do we need to convert the vertex and edge sets to tuples?


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

Posted by greghogan <gi...@git.apache.org>.

Github user greghogan commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2178#discussion_r68832491
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String edgesPath, ExecutionEnvironmen
     	}
     
     	/**
    +	 * Creates a graph from a Adjacency List text file  with Vertex Key values. Edges will be created automatically.
    +	 *
    +	 * @param filePath a path to an Adjacency List text file with the Vertex data
    +	 * @param context  the execution environment.
    +	 * @return An instance of {@link org.apache.flink.graph.GraphAdjacencyListReader},
    +	 * on which calling methods to specify types of the Vertex ID, Vertex value and Edge value returns a Graph.
    +	 */
    +	public static GraphAdjacencyListReader fromAdjacencyListFile(String filePath, ExecutionEnvironment context) {
    +		return new GraphAdjacencyListReader(filePath, context);
    +	}
    +
    +	/**
    +	 * Writes a graph as an Adjacency List formatted text file in a user specified folder.
    +	 *
    +	 * @param filePath   the path that the Adjacency List formatted text file should be written in
    +	 * @param delimiters the delimiters that separate the different value types in the Adjacency List formatted text
    +	 *                   file. Delimiters should be provided with the following order:
    +	 *                   NEIGHBOR_DELIMITER : separating source from its neighbors
    +	 *                   VERTICES_DELIMITER : separating the different neighbors of a source vertex
    +	 *                   VERTEX_VALUE_DELIMITER: separating the source vertex-id from the vertex value, as well as the
    +	 *                   target vertex-ids from the edge value.
    +	 */
    +	public void writeAsAdjacencyList(String filePath, String... delimiters) {
    +
    +		final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? delimiters[0] : "\t";
    +
    +		final String VERTICES_DELIMITER = delimiters.length > 1 ? delimiters[1] : ",";
    +
    +		final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? delimiters[2] : "-";
    --- End diff --
    
    Test length against "2".


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

Posted by fobeligi <gi...@git.apache.org>.

Github user fobeligi commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2178#discussion_r68848469
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String edgesPath, ExecutionEnvironmen
     	}
     
     	/**
    +	 * Creates a graph from a Adjacency List text file  with Vertex Key values. Edges will be created automatically.
    +	 *
    +	 * @param filePath a path to an Adjacency List text file with the Vertex data
    +	 * @param context  the execution environment.
    +	 * @return An instance of {@link org.apache.flink.graph.GraphAdjacencyListReader},
    +	 * on which calling methods to specify types of the Vertex ID, Vertex value and Edge value returns a Graph.
    +	 */
    +	public static GraphAdjacencyListReader fromAdjacencyListFile(String filePath, ExecutionEnvironment context) {
    +		return new GraphAdjacencyListReader(filePath, context);
    +	}
    +
    +	/**
    +	 * Writes a graph as an Adjacency List formatted text file in a user specified folder.
    +	 *
    +	 * @param filePath   the path that the Adjacency List formatted text file should be written in
    +	 * @param delimiters the delimiters that separate the different value types in the Adjacency List formatted text
    +	 *                   file. Delimiters should be provided with the following order:
    +	 *                   NEIGHBOR_DELIMITER : separating source from its neighbors
    +	 *                   VERTICES_DELIMITER : separating the different neighbors of a source vertex
    +	 *                   VERTEX_VALUE_DELIMITER: separating the source vertex-id from the vertex value, as well as the
    +	 *                   target vertex-ids from the edge value.
    +	 */
    +	public void writeAsAdjacencyList(String filePath, String... delimiters) {
    +
    +		final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? delimiters[0] : "\t";
    +
    +		final String VERTICES_DELIMITER = delimiters.length > 1 ? delimiters[1] : ",";
    +
    +		final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? delimiters[2] : "-";
    +
    +
    +		DataSet<Tuple2<K, VV>> vertices = this.getVerticesAsTuple2();
    +
    +		DataSet<Tuple3<K, K, EV>> edgesNValues = this.getEdgesAsTuple3();
    --- End diff --
    
    As I see now, we don't have to convert the vertex set to a `Tuple2` set, so I have already changed that.
    
    Regarding the edge dataset: to write the adjacency list file, I apply a coGroup transformation to the vertex dataset and the EdgesAsTuple3 dataset, matching each vertex ID with the edge source.
    
    This way, even a vertex that is the source of no edges (e.g. one with only incoming edges) still appears in the coGrouped dataset, which I couldn't achieve with a join.
    
    I can't think of a way to use the Edge dataset directly in a coGroup or similar transformation.
    Please let me know if you have any suggestions.
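The coGroup-versus-join point can be illustrated without Flink. The plain-Java sketch below uses in-memory maps rather than Flink DataSets, and its class and method names are hypothetical. It groups the edges by source and then walks every vertex, similar to how a coGroup on vertex ID = edge source sees each key even when one side is empty. A vertex with only incoming edges still produces a line with an empty neighbor list, whereas an inner join on the same key would drop it. It assumes the default delimiters from the quoted Javadoc.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AdjacencyListWriterSketch {

    /**
     * Formats one line per vertex, mimicking a coGroup of vertices and edges on
     * vertexId == edge source: every vertex is emitted, even with no out-edges.
     *
     * @param vertices vertex ID -> vertex value (iteration order is preserved)
     * @param edges    each edge as {source, target, edgeValue}
     */
    public static List<String> format(Map<String, String> vertices, List<String[]> edges) {
        // Group the "target-edgeValue" entries by their source vertex.
        Map<String, List<String>> neighborsBySource = new LinkedHashMap<>();
        for (String[] e : edges) {
            neighborsBySource.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1] + "-" + e[2]);
        }

        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> v : vertices.entrySet()) {
            // Vertices without outgoing edges get an empty neighbor list instead of being dropped.
            List<String> neighbors = neighborsBySource.getOrDefault(v.getKey(), Collections.emptyList());
            lines.add(v.getKey() + "-" + v.getValue() + "\t" + String.join(",", neighbors));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> vertices = new LinkedHashMap<>();
        vertices.put("1", "a");
        vertices.put("2", "b"); // has only an incoming edge
        List<String[]> edges = new ArrayList<>();
        edges.add(new String[] {"1", "2", "0.5"});
        // Prints two lines: 1-a<TAB>2-0.5 and 2-b<TAB>
        format(vertices, edges).forEach(System.out::println);
    }
}
```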


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

Posted by fobeligi <gi...@git.apache.org>.

Github user fobeligi commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2178#discussion_r68848969
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String edgesPath, ExecutionEnvironmen
     	}
     
     	/**
    +	 * Creates a graph from a Adjacency List text file  with Vertex Key values. Edges will be created automatically.
    +	 *
    +	 * @param filePath a path to an Adjacency List text file with the Vertex data
    +	 * @param context  the execution environment.
    +	 * @return An instance of {@link org.apache.flink.graph.GraphAdjacencyListReader},
    +	 * on which calling methods to specify types of the Vertex ID, Vertex value and Edge value returns a Graph.
    +	 */
    +	public static GraphAdjacencyListReader fromAdjacencyListFile(String filePath, ExecutionEnvironment context) {
    +		return new GraphAdjacencyListReader(filePath, context);
    +	}
    +
    +	/**
    +	 * Writes a graph as an Adjacency List formatted text file in a user specified folder.
    +	 *
    +	 * @param filePath   the path that the Adjacency List formatted text file should be written in
    +	 * @param delimiters the delimiters that separate the different value types in the Adjacency List formatted text
    +	 *                   file. Delimiters should be provided with the following order:
    +	 *                   NEIGHBOR_DELIMITER : separating source from its neighbors
    +	 *                   VERTICES_DELIMITER : separating the different neighbors of a source vertex
    +	 *                   VERTEX_VALUE_DELIMITER: separating the source vertex-id from the vertex value, as well as the
    +	 *                   target vertex-ids from the edge value.
    +	 */
    +	public void writeAsAdjacencyList(String filePath, String... delimiters) {
    +
    +		final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? delimiters[0] : "\t";
    +
    +		final String VERTICES_DELIMITER = delimiters.length > 1 ? delimiters[1] : ",";
    +
    +		final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? delimiters[2] : "-";
    --- End diff --
    
    You mean the error in this declaration: 
    ```java
    final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? delimiters[2] : "-";
    ```
    rather than checking up front that the length is greater than two? The latter would force the user to provide either all three delimiters or none.
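Under the per-declaration reading above, each delimiter falls back to its default independently as long as the length is compared against the index that is actually read. Here is a minimal plain-Java sketch that assumes the defaults from the quoted Javadoc. The helper names `delimiterOrDefault` and `resolve` are hypothetical and not part of Gelly.

```java
public class DelimiterDefaults {

    /** Returns delimiters[index] if the caller supplied it, otherwise the given default. */
    static String delimiterOrDefault(String[] delimiters, int index, String defaultValue) {
        // Compare the length against the index actually read, so no access is out of bounds.
        return delimiters.length > index ? delimiters[index] : defaultValue;
    }

    /** Resolves the three delimiters in the documented order, each falling back independently. */
    public static String[] resolve(String... delimiters) {
        String neighborDelimiter = delimiterOrDefault(delimiters, 0, "\t");
        String verticesDelimiter = delimiterOrDefault(delimiters, 1, ",");
        String vertexValueDelimiter = delimiterOrDefault(delimiters, 2, "-");
        return new String[] {neighborDelimiter, verticesDelimiter, vertexValueDelimiter};
    }

    public static void main(String[] args) {
        // With the original `length > 1` check, passing exactly two delimiters would throw
        // ArrayIndexOutOfBoundsException; here the third one falls back to "-".
        String[] resolved = resolve("|", ";");
        System.out.println(String.join(" ", resolved)); // prints "| ; -"
    }
}
```

This keeps partial specification possible: a caller may pass one, two, or three delimiters, and only the omitted trailing ones take their defaults.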


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

Posted by fobeligi <gi...@git.apache.org>.

Github user fobeligi commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2178#discussion_r68846112
  
    --- Diff: flink-libraries/flink-gelly-scala/src/main/scala/org/apache/flink/graph/scala/Graph.scala ---
    @@ -1127,8 +1194,7 @@ TypeInformation : ClassTag](jgraph: jg.Graph[K, VV, EV]) {
        *
        * @param analytic the analytic to run on the Graph
        */
    -  def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, EV, T]):
    -  GraphAnalytic[K, VV, EV, T] = {
    +  def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, EV, T])= {
    --- End diff --
    
    No, I will revert the change.
