You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@giraph.apache.org by "G.W." <gw...@gmail.com> on 2015/04/02 06:48:06 UTC
Re: Undirected Vertex Definition and Reflexivity

Hi guys,

Here's a follow-up on this based on everyone's positive feedback.

I was able to load edges and vertices separately, leveraging the -eip (and
-eif) and -vip (and -eip) flags.

Implementation wise, since I may have disconnected vertices, I went ahead
with creating one EdgeReader and one VertexReader. Those readers are
initialised in their respective InputFormat classes.

Once done and tested, I have implemented reversibility,
using IntNullReverseTextEdgeInputFormat as an example, by extending the
edge input format class.

I am now able to load edges and vertices in two files:

vertex_id, property_A, property_B, property_C

for vertices definition

and

vertex_id_source, vertex_id_target

for edges, with implicitly creating a reverse edge for each edge.

Works like a charm!


On 11 March 2015 at 19:23, Matthew Saltz <sa...@gmail.com> wrote:

> Hi,
>
> I believe the answer to your question is yes, though I've never done it.
> If you use only the edge reader, only the vertices in your graph that have
> at least one edge attached to them will be present in your graph. So, if
> you have vertices that are entirely disconnected that you want included,
> you'd need to do both a VertexReader and an Edge Reader (though I've never
> done this). If you don't have disconnected vertices, you don't need the
> VertexReader because Giraph will automatically add all of the vertices in
> your edge file to the graph (I think this can be disabled). Use the -eip
> flag to specify the edge file.
>
> Best,
> Matthew Saltz
>
> On Wed, Mar 11, 2015 at 1:54 AM, G.W. <gw...@gmail.com> wrote:
>
>> Thanks for that!
>>
>> This is the right idea, however I was only using a VertexReader until now
>> – IntNullReverseTextEdgeInputFormat calls for an EdgeReader.
>>
>> I am not sure this is the way it works but I like the idea of segregating
>> edge and vertex definitions.
>>
>> *That leads to the following questions: can Giraph support the use of a
>> VertexReader and EdgeReader at the same time, that is through the -vif and
>> -eif arguments? *
>>
>> If that works, the idea would be to refactor my input files with:
>>
>> Vertices:
>> vertex_id, vertex_type, ...
>>
>> Edges
>> source_id, target_id
>>
>> with the edge reader working in "reverse" mode.
>>
>> Thanks!
>>
>>
>>
>>
>> On 10 March 2015 at 20:02, Matthew Saltz <sa...@gmail.com> wrote:
>>
>>> Have a look at IntNullReverseTextEdgeInputFormat
>>> <https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/IntNullReverseTextEdgeInputFormat.html>.
>>> It automatically creates reverse edges, but it expects the file format
>>>
>>> <source_id, target_id>
>>>
>>> on each line. If you need to convert it to use longs you can change the
>>> code pretty easily.
>>>
>>> Best,
>>> Matthew
>>>
>>> On Tue, Mar 10, 2015 at 5:37 AM, Young Han <yo...@uwaterloo.ca>
>>> wrote:
>>>
>>>> The input is assumed to be the vertex followed by a set of *directed*
>>>> edges. So, in your example, leaving out E2 means that the final graph will
>>>> not have the directed edge from V2 to V1. To get an undirected edge, you
>>>> need a pair of directed edges.
>>>>
>>>> Internally, Giraph stores the out-edges of each vertex as an adjacency
>>>> list at that vertex. So, for example, your undirected graph becomes a
>>>> vertex object V1 with an adjacency list {V2} and a vertex object V2 with an
>>>> adjacency list {V1}. The directed graph would be a vertex V1 with adjacency
>>>> list {V2} and a vertex V2 with an empty adjacency list {}.
>>>>
>>>> There's no easy way for Giraph to infer that V2's adjacency list should
>>>> contain V1, because V2 does not track its in-edges. To get around this, you
>>>> can either (1) use an undirected input file with both pairs of edges
>>>> present; (2) have, in your algorithms, all vertices broadcast their ids to
>>>> their out-edge neighbours and then perform mutations to add the missing
>>>> edges in the first two superstep; or (3) modify the code in
>>>> org.apache.giraph.io.* (in giraph-core) to cache and add missing edges
>>>> (i.e., add a new "type" of input format). I'm fairly certain that there
>>>> doesn't already exist an "assume undirected graph" input reader, but I'm
>>>> not too familiar with the code paths and options there so I could be wrong.
>>>>
>>>> Young
>>>>
>>>> On Mon, Mar 9, 2015 at 11:54 PM, G.W. <gw...@gmail.com> wrote:
>>>>
>>>>> Hi Giraph Mailing List,
>>>>>
>>>>> I am writing about an undirected graph I am trying to move to Giraph.
>>>>> I have a question about the assumption Giraph makes when processing an
>>>>> input.
>>>>>
>>>>> Let V1 and V2, two vertices connected with a common edge. E1 defines
>>>>> an edge from V1 to V2. E2 defines an edge from V2 to V1.
>>>>>
>>>>> Simply put, these are defined in an input file as:
>>>>> V1, E1
>>>>> V2, E2
>>>>>
>>>>> This is working fine, I can process the graph accordingly.
>>>>>
>>>>> I was trying to see what would happen if I was to simplify the input
>>>>> file:
>>>>> V1, E1
>>>>> V2
>>>>>
>>>>> When would come the time that V2 is processed in a superstep, Giraph
>>>>> would not suggest E1 as an  outgoing edge. My questions is why, knowing
>>>>> that E1 defines an edge from V1 to V2. The graph being undirected (although
>>>>> there is no provision for that in my Giraph computation), shouldn't Giraph
>>>>> assume that V2 is connected to V1?
>>>>>
>>>>> Down the road, the idea would be to streamline the input format, hence
>>>>> my question.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>