Posted to user@giraph.apache.org by José Luis Larroque <la...@gmail.com> on 2015/08/11 03:54:06 UTC

Re: Giraph best's Vertex Input format, for an input file with Vertex ids of type String

I solved this by adapting my own file to fit
org.apache.giraph.io.formats.TextDoubleDoubleAdjacencyListVertexInputFormat.
My file should look like this:

Portada 0.0     Sugerencias     1.0
Proverbios      0.0
Neil    0.0     Luna    1.0     ideal   1.0     verdad  1.0     Categoria:Ingenieros    2.0     Categoria:Estadounidenses       2.0     Categoria:Astronautas   2.0
Categoria:Ingenieros    1.0     Neil    2.0
Categoria:Estadounidenses       1.0     Neil    2.0
Categoria:Astronautas   1.0     Neil    2.0

The gaps between the fields are tab characters ('\t'), because this format
uses a tab as its default separator when splitting each input line into
tokens.
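
To make the layout concrete, here is a minimal sketch in plain Java of how one
such line breaks down: vertex id, vertex value, then zero or more
(neighbor, edge value) pairs, all separated by tabs. This is not Giraph's own
code. The class AdjacencyLineSketch and its methods parseLine and formatLine
are hypothetical names used only for illustration, and the sketch assumes that
no id contains a tab.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Giraph: reads and writes one tab-separated line. */
class AdjacencyLineSketch {
    /** Parsed form of one line: vertex id, vertex value, and outgoing edges. */
    static final class ParsedVertex {
        final String id;
        final double value;
        // Neighbor id -> edge value, kept in the order they appear on the line.
        final Map<String, Double> edges = new LinkedHashMap<>();

        ParsedVertex(String id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Splits "id\tvalue\tneighbor\tedgeValue..." into its parts. */
    static ParsedVertex parseLine(String line) {
        String[] tokens = line.split("\t");
        // An id and a value, then (neighbor, edge value) pairs: the count must be even.
        if (tokens.length < 2 || tokens.length % 2 != 0) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        ParsedVertex vertex =
                new ParsedVertex(tokens[0], Double.parseDouble(tokens[1]));
        for (int i = 2; i < tokens.length; i += 2) {
            vertex.edges.put(tokens[i], Double.parseDouble(tokens[i + 1]));
        }
        return vertex;
    }

    /** Builds the tab-separated line for a vertex (the inverse of parseLine). */
    static String formatLine(ParsedVertex vertex) {
        StringBuilder sb = new StringBuilder();
        sb.append(vertex.id).append('\t').append(vertex.value);
        for (Map.Entry<String, Double> edge : vertex.edges.entrySet()) {
            sb.append('\t').append(edge.getKey())
              .append('\t').append(edge.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ParsedVertex v = parseLine("Neil\t0.0\tLuna\t1.0\tideal\t1.0");
        System.out.println(v.id + " has " + v.edges.size() + " edges");
        System.out.println(formatLine(v));
    }
}
```

A helper like formatLine is one way to generate the input file from whatever
structure the graph is held in before the conversion; a line with an odd number
of tokens is rejected here because an edge would be missing its value.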

Thanks to everyone, and especially to nishant ghandi, for the help.

Bye!

Jose Luis Larroque

2015-07-30 1:03 GMT-03:00 José Luis Larroque <la...@gmail.com>:

> Here is the shortest path algorithm that I'm using (it may be necessary
> for a better understanding of my question). It's an adaptation of the
> original shortest paths example from Giraph:
>
> import java.io.IOException;
>
> import org.apache.giraph.Algorithm;
> import org.apache.giraph.conf.StrConfOption;
> import org.apache.giraph.edge.Edge;
> import org.apache.giraph.graph.BasicComputation;
> import org.apache.giraph.graph.Vertex;
> import org.apache.hadoop.io.DoubleWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.log4j.Logger;
>
> @Algorithm(name = "Shortest paths",
>         description = "Finds all shortest paths from a selected vertex")
> public class SimpleShortestPathComputationMio extends
>         BasicComputation<Text, DoubleWritable, DoubleWritable,
>                 DoubleWritable> {
>     /** The shortest paths id (a string option, because vertex ids are Text) */
>     public static final StrConfOption SOURCE_ID = new StrConfOption(
>             "SimpleShortestPathsVertex.sourceId", "1",
>             "The shortest paths id");
>     /** Class logger */
>     private static final Logger LOG = Logger
>             .getLogger(SimpleShortestPathComputationMio.class);
>
>     /**
>      * Is this vertex the source id?
>      *
>      * @param vertex
>      *            Vertex
>      * @return True if the source id
>      */
>     private boolean isSource(Vertex<Text, ?, ?> vertex) {
>         // Compare as strings: Text.equals() is never true for a Long,
>         // so the original comparison could not match any vertex.
>         return vertex.getId().toString().equals(SOURCE_ID.get(getConf()));
>     }
>
>     @Override
>     public void compute(
>             Vertex<Text, DoubleWritable, DoubleWritable> vertex,
>             Iterable<DoubleWritable> messages) throws IOException {
>         if (getSuperstep() == 0) {
>             vertex.setValue(new DoubleWritable(Double.MAX_VALUE));
>         }
>         // The source starts at distance 0; every other vertex at "infinity".
>         double minDist = isSource(vertex) ? 0d : Double.MAX_VALUE;
>         for (DoubleWritable message : messages) {
>             minDist = Math.min(minDist, message.get());
>         }
>         if (LOG.isDebugEnabled()) {
>             LOG.debug("Vertex " + vertex.getId() + " got minDist = "
>                     + minDist + " vertex value = " + vertex.getValue());
>         }
>         // Only propagate when a shorter distance has been found.
>         if (minDist < vertex.getValue().get()) {
>             vertex.setValue(new DoubleWritable(minDist));
>             for (Edge<Text, DoubleWritable> edge : vertex.getEdges()) {
>                 double distance = minDist + edge.getValue().get();
>                 if (LOG.isDebugEnabled()) {
>                     LOG.debug("Vertex " + vertex.getId() + " sent to "
>                             + edge.getTargetVertexId() + " = " + distance);
>                 }
>                 sendMessage(edge.getTargetVertexId(),
>                         new DoubleWritable(distance));
>             }
>         }
>         vertex.voteToHalt();
>     }
>
> }
>
> 2015-07-29 23:11 GMT-03:00 José Luis Larroque <la...@gmail.com>:
>
>> Hi everyone, I'm Jose from Argentina, and I'm working with Giraph for my
>> thesis. I've been stuck at the same point for weeks, so I finally decided
>> to come and ask for help!
>>
>> I have a multinode Giraph cluster working properly on my PC. I executed
>> the SimpleShortestPathExample from Giraph and it ran fine.
>>
>> This algorithm was run with this file (tiny_graph.txt):
>>
>> [0,0,[[1,1],[3,3]]]
>> [1,0,[[0,1],[2,2],[3,1]]]
>> [2,0,[[1,2],[4,4]]]
>> [3,0,[[0,3],[1,1],[4,4]]]
>> [4,0,[[3,4],[2,4]]]
>>
>> This file has the following input format:
>>
>> [source_id,source_value,[[dest_id, edge_value],...]]
>>
>> Now I'm trying to execute the same algorithm on the same cluster, but
>> with a different input file. My own file looks like this:
>>
>> [Portada,0,[[Sugerencias para la cita del día,1]]]
>> [Proverbios españoles,0,[]]
>> [Neil Armstrong,0,[[Luna,1],[ideal,1],[verdad,1],[Categoria:Ingenieros,2],[Categoria:Estadounidenses,2],[Categoria:Astronautas,2]]]
>> [Categoria:Ingenieros,1,[[Neil Armstrong,2]]]
>> [Categoria:Estadounidenses,1,[[Neil Armstrong,2]]]
>> [Categoria:Astronautas,1,[[Neil Armstrong,2]]]
>>
>> It's very similar to the original, but the ids are Strings and the vertex
>> and edge values are Long. My question is which VertexInputFormat I should
>> use for this kind of format, because I have already tried
>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>> and
>> org.apache.giraph.io.formats.TextDoubleDoubleAdjacencyListVertexInputFormat
>> and I couldn't get either of them to work.
>>
>> With this problem solved, I could adapt the original shortest path
>> example algorithm to work with my file, but until I have a solution
>> for this I can't get to that point.
>>
>> If this format isn't a good choice, I could perhaps adapt it, but I
>> don't know what my best option is. My knowledge of vertex text input
>> and output formats in Giraph is really poor, which is why I'm here asking
>> for some advice.
>>
>>
>> Thanks in advance!!
>>
>>
>> Jose
>>
>
>