Posted to dev@giraph.apache.org by "Alessandro Presta (JIRA)" <ji...@apache.org> on 2012/10/31 00:42:12 UTC

[jira] [Updated] (GIRAPH-155) Allow creation of graph by adding edges that span multiple workers

     [ https://issues.apache.org/jira/browse/GIRAPH-155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Presta updated GIRAPH-155:
-------------------------------------

    Attachment: GIRAPH-155.patch

This solution exploits the code we already have for vertex mutations.

We introduce the EdgeInputFormat class that produces edges from input splits.
For convenience, we also introduce the VertexValueInputFormat, a subclass of VertexInputFormat that doesn't produce edges.

A user can use an EdgeInputFormat in conjunction with a Vertex{Value}InputFormat, or only one of the two.

If only an EdgeInputFormat is used, the graph is built from the edges alone, and vertices are initialized with default values.
If both are used, their input is combined.
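
To illustrate the combination rule described above, here is a minimal, self-contained sketch in plain Java. It does not use the Giraph API: the names EdgeInputSketch, Edge, Vertex, DEFAULT_VALUE and buildGraph are hypothetical and exist only for this example. It also assumes that an edge creates only its source vertex when that vertex is missing; the patch may treat target vertices differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; none of these types are Giraph classes. */
class EdgeInputSketch {
  /** A directed edge as it might be read from an edge input. */
  static final class Edge {
    final long source;
    final long target;

    Edge(long source, long target) {
      this.source = source;
      this.target = target;
    }
  }

  /** A vertex holding a value and the targets of its outgoing edges. */
  static final class Vertex {
    final double value;
    final List<Long> targets = new ArrayList<>();

    Vertex(double value) {
      this.value = value;
    }
  }

  /** Assumed default for vertices that appear only in the edge input. */
  static final double DEFAULT_VALUE = 0.0;

  /** Combines vertex values (may be empty) with edges (may be empty). */
  static Map<Long, Vertex> buildGraph(Map<Long, Double> vertexValues,
                                      List<Edge> edges) {
    Map<Long, Vertex> graph = new TreeMap<>();
    // Vertex-value input: vertices with explicit values and no edges yet.
    for (Map.Entry<Long, Double> entry : vertexValues.entrySet()) {
      graph.put(entry.getKey(), new Vertex(entry.getValue()));
    }
    // Edge input: attach each edge to its source vertex, creating the
    // source with the default value if no vertex input supplied it.
    for (Edge edge : edges) {
      graph.computeIfAbsent(edge.source, id -> new Vertex(DEFAULT_VALUE))
          .targets.add(edge.target);
    }
    return graph;
  }

  public static void main(String[] args) {
    Map<Long, Double> values = new HashMap<>();
    values.put(1L, 0.5);
    List<Edge> edges = Arrays.asList(new Edge(1L, 2L), new Edge(3L, 1L));
    for (Map.Entry<Long, Vertex> e : buildGraph(values, edges).entrySet()) {
      System.out.println(e.getKey() + " value=" + e.getValue().value
          + " targets=" + e.getValue().targets);
    }
  }
}
```

Passing an empty map of vertex values corresponds to the edges-only case, and passing an empty edge list corresponds to the vertex-values-only case.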

Corresponding text-based input formats are included, and they are supported by InternalVertexRunner.

I had to add Giraph{File/Text}InputFormat in order to deal with multiple sources of input (vertex data and edges).

A few caveats:
- only works with mutable vertices for now; we can support immutable ones too by modifying VertexResolver to use setEdges() when needed
- not integrated into GiraphRunner yet
- I had to bypass a couple of Checkstyle violations
- there's more code duplication than I would like, but I saw no good way to extract a common base for vertex- and edge-related code
- the vertex mutation code is pretty old, so there might be possible performance improvements

Future work:
- add corresponding HCatalog input formats
- support immutable vertex classes
- integrate in GiraphRunner
- analyze performance of VertexResolver

Will post some perf results soon.
                
> Allow creation of graph by adding edges that span multiple workers
> ------------------------------------------------------------------
>
>                 Key: GIRAPH-155
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-155
>             Project: Giraph
>          Issue Type: New Feature
>          Components: graph, lib
>    Affects Versions: 0.1.0
>            Reporter: Dionysios Logothetis
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-155.patch
>
>
> Currently a graph is created only by adding vertices. The typical way is to read input text files line-by-line, with each line describing a vertex (its value, its edges, etc.). The current API allows for the creation of a vertex only if all the information for the vertex is available in a single line.
> However, it's common to have graphs described in the form of edges. Edges might span multiple lines in an input file or even span multiple workers. The current API doesn't allow this. In the input superstep, a vertex must be created by a single worker.
> Instead, it should be possible for multiple workers to mutate the graph during the input superstep.
> This has the following implications:
> 1) Instead of just instantiating a vertex, a vertex reader should be able to do vertex addition and edge addition requests.
> 2) Multiple workers might try to create the same vertex. Any conflicts should be handled with a VertexResolver. So the resolver has to be instantiated before load time.
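
The two implications in the issue description can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the Giraph VertexResolver interface: InputSuperstepSketch, Mutations, Vertex, ownerOf and resolve are hypothetical names, hash partitioning of vertex ids is assumed, and "first add-vertex request wins" is just one possible conflict policy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types are Giraph classes. */
class InputSuperstepSketch {
  /** Requests received for one vertex id from any number of workers. */
  static final class Mutations {
    /** One entry per add-vertex request, in arrival order. */
    final List<String> addedValues = new ArrayList<>();
    /** One entry per add-edge request. */
    final List<Long> addedEdgeTargets = new ArrayList<>();
  }

  /** The vertex produced once all requests have been resolved. */
  static final class Vertex {
    final String value;
    final List<Long> targets;

    Vertex(String value, List<Long> targets) {
      this.value = value;
      this.targets = targets;
    }
  }

  /** Worker that owns a vertex id, assuming simple hash partitioning. */
  static int ownerOf(long vertexId, int numWorkers) {
    return Math.floorMod(Long.hashCode(vertexId), numWorkers);
  }

  /**
   * Resolves conflicting requests for a single vertex: the first
   * add-vertex request supplies the value, the default value is used
   * when only edges were sent, and every requested edge is kept.
   */
  static Vertex resolve(Mutations mutations, String defaultValue) {
    String value = mutations.addedValues.isEmpty()
        ? defaultValue
        : mutations.addedValues.get(0);
    return new Vertex(value, new ArrayList<>(mutations.addedEdgeTargets));
  }

  public static void main(String[] args) {
    Mutations m = new Mutations();
    m.addedValues.add("from-worker-0");
    m.addedValues.add("from-worker-2");
    m.addedEdgeTargets.add(7L);
    Vertex v = resolve(m, "default");
    System.out.println("owner=" + ownerOf(5L, 4)
        + " value=" + v.value + " targets=" + v.targets);
  }
}
```

In this sketch each reader sends its requests to ownerOf(id, numWorkers), and the owning worker calls resolve once per vertex id after all input has been read, which is why the resolver must exist before load time.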

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira