You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@giraph.apache.org by Martin Neumann <mn...@spotify.com> on 2014/02/27 10:48:37 UTC

Different num supersteps

Hej,

I have modified the connected component example to fit my input data. I
expect it to be deterministic.

But when I run it multiple times it takes a different number of Super
steps. This only happens on the complete dataset and not on my small test
dataset.
 (So I cannot check the output for correctness in a simple way)

Has anyone an Idea how this could happen?

cheers Martin




In case its useful here the computation class:

@Override
    public void compute(Vertex<Text, Text, NullWritable> vertex,
            Iterable<Text> inmessage) throws IOException {
        boolean changed = false;

// first superstep && setup
        if (getSuperstep() == 0) {
            //initialize value
            vertex.setValue(vertex.getId());
            //cheating by checking the neighbors ID's (cuts down 1
iteration)
            for (Edge<Text, NullWritable> e : vertex.getEdges()) {
                Text candidate = e.getTargetVertexId();
                if (candidate.compareTo(vertex.getValue()) < 0) {
                    changed = true;
                    vertex.setValue(candidate);
                }
            }
        }

        // other superstep
        else {
            // read all messages and compare with own state
            for (Text message : inmessage) {
                if (message.compareTo(vertex.getValue()) < 0) {
                    changed = true;
                    vertex.setValue(message);
                }
            }
        }

        // if state has changed send a message to all neighbors
        if (changed && getSuperstep() < limiter) {
            sendMessageToAllEdges(vertex, vertex.getValue());
        }

        vertex.voteToHalt();
    }

Re: Different num supersteps

Posted by Martin Neumann <mn...@spotify.com>.
Hi,

I managed to fix it even if I'm still not entirely sure what happened.
The fix is to make a new Text object every time a Text is required as input
(Text does not implement Cloneable). I guess it

So instead of:
  Text candidate = e.getTargetVertexId();
  ...
  vertex.setValue(candidate))

The working version is:
  Text candidate = e.getTargetVertexId();
  ...
  vertex.setValue(new Text(candidate.toString()))
















@Override
    public void compute(Vertex<Text, Text, NullWritable> vertex,
            Iterable<Text> inmessage) throws IOException {

        boolean changed = false;

        // first superstep && setup
        if (getSuperstep() == 0) {
            //initialize value
            vertex.setValue(new Text(vertex.getId().toString()));
            //cheating by checking the neighbors ID's (cuts down 1
iteration)
            for (Edge<Text, NullWritable> e : vertex.getEdges()) {
                Text candidate = e.getTargetVertexId();
                if (candidate.compareTo(vertex.getValue()) < 0) {
                    changed = true;
                    vertex.setValue(new Text(candidate.toString()));
                }
            }
        }

        // other superstep
        else {
            // read all messages and compare with own state
            for (Text message : inmessage) {
                if (message.compareTo(vertex.getValue()) < 0) {
                    changed = true;
                    vertex.setValue(new Text(message.toString()));
                }
            }
        }

        // if state has changed send a message to all neighbors
        if (changed && getSuperstep() < limiter) {
            sendMessageToAllEdges(vertex, new
Text(vertex.getValue().toString()));
        }

        vertex.voteToHalt();
    }




On Mon, Mar 3, 2014 at 2:19 PM, Sebastian Schelter <ss...@apache.org> wrote:

> Hi Martin,
>
> I'm not sure wether we require InputFormats to be threadsafe. Can someone
> answer that question?
>
> Maybe thats the reason you see this behavior.
>
> --sebastian
>
>
>
> On 03/03/2014 10:05 AM, Martin Neumann wrote:
>
>> I checked the input just creating the graph and comparing it. While I cant
>> say the graph is correct (its to big) its at least consistent.
>>
>> So the only things where the different output can come from is the
>> connected component part (see code further down). I'm completely stomped,
>> the code is basically the example code Giraph ships with all I changed was
>> replacing the IntWriteable Id's with Text.
>>
>> Anyone has any Idea what the problem could be, I'm running out of Idea's.
>>
>> cheers Martin
>>
>>
>>
>> @Override
>>      public void compute(Vertex<Text, Text, NullWritable> vertex,
>>              Iterable<Text> inmessage) throws IOException {
>>          boolean changed = false;
>>
>> // first superstep && setup
>>          if (getSuperstep() == 0) {
>>              //initialize value
>>              vertex.setValue(vertex.getId()
>> );
>>              //cheating by checking the neighbors ID's (cuts down 1
>> iteration)
>>              for (Edge<Text, NullWritable> e : vertex.getEdges()) {
>>                  Text candidate = e.getTargetVertexId();
>>                  if (candidate.compareTo(vertex.getValue()) < 0) {
>>                      changed = true;
>>                      vertex.setValue(candidate);
>>                  }
>>              }
>>          }
>>
>>          // other superstep
>>          else {
>>              // read all messages and compare with own state
>>              for (Text message : inmessage) {
>>                  if (message.compareTo(vertex.getValue()) < 0) {
>>                      changed = true;
>>                      vertex.setValue(message);
>>                  }
>>              }
>>          }
>>
>>          // if state has changed send a message to all neighbors
>>          if (changed && getSuperstep() < limiter) {
>>              sendMessageToAllEdges(vertex, vertex.getValue());
>>          }
>>          vertex.voteToHalt();
>>      }
>>
>>
>>
>> On Mon, Mar 3, 2014 at 9:56 AM, Martin Neumann <mn...@spotify.com>
>> wrote:
>>
>>  Hey,
>>>
>>> sorry, yes I intended to send it to the mailing-list, thanks for the hint
>>> (I repost it to the list). Guess I should stop working in the middle of
>>> the
>>> night. :-)
>>>
>>> I tested the input format and its output is consistent, I always get the
>>> same graph when running it. So the bug has to be in the connected
>>> component
>>> part since that one gives me different outputs for the same input graph.
>>>
>>> cheers Martin
>>>
>>>
>>> On Mon, Mar 3, 2014 at 8:06 AM, Sebastian Schelter <ss...@apache.org>
>>> wrote:
>>>
>>>  Martin,
>>>>
>>>> can you write a MapReduce job that creates your graph and run it with a
>>>> simpler inputformat?
>>>>
>>>> I really suspect that the bug lies somewhere in your input format.
>>>>
>>>> --sebastian
>>>>
>>>>
>>>> On 03/02/2014 09:48 PM, Martin Neumann wrote:
>>>>
>>>>  I checked the input just creating the graph and comparing it. While I
>>>>> cant
>>>>> say the graph is correct (for its to big) its at least consistent.
>>>>>
>>>>> So the only things where the different output can come from is the
>>>>> connected component part (see code in the first mail). I'm completely
>>>>> stomped, the code is basically the example code Giraph ships with all I
>>>>> changed was replacing the IntWriteable Id's with Text.
>>>>>
>>>>> Anyone has any Idea what the problem could be, I'm running out of
>>>>> Idea's.
>>>>>
>>>>> cheers Martin
>>>>>
>>>>>
>>>>> On Thu, Feb 27, 2014 at 4:59 PM, Sebastian Schelter <ss...@apache.org>
>>>>> wrote:
>>>>>
>>>>>   Hi Martin
>>>>>
>>>>>>
>>>>>> I don't think that there are problems with comparing and sorting Text
>>>>>> writables as Hadoop is basically a big external sorting system.
>>>>>>
>>>>>> I'm not sure I understand your edge input reader, it looks very
>>>>>> complex,
>>>>>> maybe there's a bug somewhere. You could try to preprocess your data
>>>>>> using
>>>>>> Hadoop so that you can use a simple VertexInputFormat and see if your
>>>>>> problems still occur.
>>>>>>
>>>>>> --sebastian
>>>>>>
>>>>>>
>>>>>> On 02/27/2014 04:41 PM, Martin Neumann wrote:
>>>>>>
>>>>>>   Hm
>>>>>>
>>>>>>>
>>>>>>> I ran the job 5 times and made a diff between the outputs and they
>>>>>>> are
>>>>>>> not
>>>>>>> the same. I cant find anything in the code that could lead to this
>>>>>>> behaviour.
>>>>>>>
>>>>>>> The only idea where to look a the moment would be the identifier. Has
>>>>>>> anyone experience with String identifier?
>>>>>>> Is a possible that there are problems with comparing and sorting
>>>>>>> TextWritables?
>>>>>>>
>>>>>>> cheers Martin
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Different num supersteps

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Martin,

I'm not sure wether we require InputFormats to be threadsafe. Can 
someone answer that question?

Maybe thats the reason you see this behavior.

--sebastian


On 03/03/2014 10:05 AM, Martin Neumann wrote:
> I checked the input just creating the graph and comparing it. While I cant
> say the graph is correct (its to big) its at least consistent.
>
> So the only things where the different output can come from is the
> connected component part (see code further down). I'm completely stomped,
> the code is basically the example code Giraph ships with all I changed was
> replacing the IntWriteable Id's with Text.
>
> Anyone has any Idea what the problem could be, I'm running out of Idea's.
>
> cheers Martin
>
>
>
> @Override
>      public void compute(Vertex<Text, Text, NullWritable> vertex,
>              Iterable<Text> inmessage) throws IOException {
>          boolean changed = false;
>
> // first superstep && setup
>          if (getSuperstep() == 0) {
>              //initialize value
>              vertex.setValue(vertex.getId()
> );
>              //cheating by checking the neighbors ID's (cuts down 1
> iteration)
>              for (Edge<Text, NullWritable> e : vertex.getEdges()) {
>                  Text candidate = e.getTargetVertexId();
>                  if (candidate.compareTo(vertex.getValue()) < 0) {
>                      changed = true;
>                      vertex.setValue(candidate);
>                  }
>              }
>          }
>
>          // other superstep
>          else {
>              // read all messages and compare with own state
>              for (Text message : inmessage) {
>                  if (message.compareTo(vertex.getValue()) < 0) {
>                      changed = true;
>                      vertex.setValue(message);
>                  }
>              }
>          }
>
>          // if state has changed send a message to all neighbors
>          if (changed && getSuperstep() < limiter) {
>              sendMessageToAllEdges(vertex, vertex.getValue());
>          }
>          vertex.voteToHalt();
>      }
>
>
>
> On Mon, Mar 3, 2014 at 9:56 AM, Martin Neumann <mn...@spotify.com> wrote:
>
>> Hey,
>>
>> sorry, yes I intended to send it to the mailing-list, thanks for the hint
>> (I repost it to the list). Guess I should stop working in the middle of the
>> night. :-)
>>
>> I tested the input format and its output is consistent, I always get the
>> same graph when running it. So the bug has to be in the connected component
>> part since that one gives me different outputs for the same input graph.
>>
>> cheers Martin
>>
>>
>> On Mon, Mar 3, 2014 at 8:06 AM, Sebastian Schelter <ss...@apache.org> wrote:
>>
>>> Martin,
>>>
>>> can you write a MapReduce job that creates your graph and run it with a
>>> simpler inputformat?
>>>
>>> I really suspect that the bug lies somewhere in your input format.
>>>
>>> --sebastian
>>>
>>>
>>> On 03/02/2014 09:48 PM, Martin Neumann wrote:
>>>
>>>> I checked the input just creating the graph and comparing it. While I
>>>> cant
>>>> say the graph is correct (for its to big) its at least consistent.
>>>>
>>>> So the only things where the different output can come from is the
>>>> connected component part (see code in the first mail). I'm completely
>>>> stomped, the code is basically the example code Giraph ships with all I
>>>> changed was replacing the IntWriteable Id's with Text.
>>>>
>>>> Anyone has any Idea what the problem could be, I'm running out of Idea's.
>>>>
>>>> cheers Martin
>>>>
>>>>
>>>> On Thu, Feb 27, 2014 at 4:59 PM, Sebastian Schelter <ss...@apache.org>
>>>> wrote:
>>>>
>>>>   Hi Martin
>>>>>
>>>>> I don't think that there are problems with comparing and sorting Text
>>>>> writables as Hadoop is basically a big external sorting system.
>>>>>
>>>>> I'm not sure I understand your edge input reader, it looks very complex,
>>>>> maybe there's a bug somewhere. You could try to preprocess your data
>>>>> using
>>>>> Hadoop so that you can use a simple VertexInputFormat and see if your
>>>>> problems still occur.
>>>>>
>>>>> --sebastian
>>>>>
>>>>>
>>>>> On 02/27/2014 04:41 PM, Martin Neumann wrote:
>>>>>
>>>>>   Hm
>>>>>>
>>>>>> I ran the job 5 times and made a diff between the outputs and they are
>>>>>> not
>>>>>> the same. I cant find anything in the code that could lead to this
>>>>>> behaviour.
>>>>>>
>>>>>> The only idea where to look a the moment would be the identifier. Has
>>>>>> anyone experience with String identifier?
>>>>>> Is a possible that there are problems with comparing and sorting
>>>>>> TextWritables?
>>>>>>
>>>>>> cheers Martin
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Different num supersteps

Posted by Martin Neumann <mn...@spotify.com>.
I checked the input just creating the graph and comparing it. While I cant
say the graph is correct (its to big) its at least consistent.

So the only things where the different output can come from is the
connected component part (see code further down). I'm completely stomped,
the code is basically the example code Giraph ships with all I changed was
replacing the IntWriteable Id's with Text.

Anyone has any Idea what the problem could be, I'm running out of Idea's.

cheers Martin



@Override
    public void compute(Vertex<Text, Text, NullWritable> vertex,
            Iterable<Text> inmessage) throws IOException {
        boolean changed = false;

// first superstep && setup
        if (getSuperstep() == 0) {
            //initialize value
            vertex.setValue(vertex.getId()
);
            //cheating by checking the neighbors ID's (cuts down 1
iteration)
            for (Edge<Text, NullWritable> e : vertex.getEdges()) {
                Text candidate = e.getTargetVertexId();
                if (candidate.compareTo(vertex.getValue()) < 0) {
                    changed = true;
                    vertex.setValue(candidate);
                }
            }
        }

        // other superstep
        else {
            // read all messages and compare with own state
            for (Text message : inmessage) {
                if (message.compareTo(vertex.getValue()) < 0) {
                    changed = true;
                    vertex.setValue(message);
                }
            }
        }

        // if state has changed send a message to all neighbors
        if (changed && getSuperstep() < limiter) {
            sendMessageToAllEdges(vertex, vertex.getValue());
        }
        vertex.voteToHalt();
    }



On Mon, Mar 3, 2014 at 9:56 AM, Martin Neumann <mn...@spotify.com> wrote:

> Hey,
>
> sorry, yes I intended to send it to the mailing-list, thanks for the hint
> (I repost it to the list). Guess I should stop working in the middle of the
> night. :-)
>
> I tested the input format and its output is consistent, I always get the
> same graph when running it. So the bug has to be in the connected component
> part since that one gives me different outputs for the same input graph.
>
> cheers Martin
>
>
> On Mon, Mar 3, 2014 at 8:06 AM, Sebastian Schelter <ss...@apache.org> wrote:
>
>> Martin,
>>
>> can you write a MapReduce job that creates your graph and run it with a
>> simpler inputformat?
>>
>> I really suspect that the bug lies somewhere in your input format.
>>
>> --sebastian
>>
>>
>> On 03/02/2014 09:48 PM, Martin Neumann wrote:
>>
>>> I checked the input just creating the graph and comparing it. While I
>>> cant
>>> say the graph is correct (for its to big) its at least consistent.
>>>
>>> So the only things where the different output can come from is the
>>> connected component part (see code in the first mail). I'm completely
>>> stomped, the code is basically the example code Giraph ships with all I
>>> changed was replacing the IntWriteable Id's with Text.
>>>
>>> Anyone has any Idea what the problem could be, I'm running out of Idea's.
>>>
>>> cheers Martin
>>>
>>>
>>> On Thu, Feb 27, 2014 at 4:59 PM, Sebastian Schelter <ss...@apache.org>
>>> wrote:
>>>
>>>  Hi Martin
>>>>
>>>> I don't think that there are problems with comparing and sorting Text
>>>> writables as Hadoop is basically a big external sorting system.
>>>>
>>>> I'm not sure I understand your edge input reader, it looks very complex,
>>>> maybe there's a bug somewhere. You could try to preprocess your data
>>>> using
>>>> Hadoop so that you can use a simple VertexInputFormat and see if your
>>>> problems still occur.
>>>>
>>>> --sebastian
>>>>
>>>>
>>>> On 02/27/2014 04:41 PM, Martin Neumann wrote:
>>>>
>>>>  Hm
>>>>>
>>>>> I ran the job 5 times and made a diff between the outputs and they are
>>>>> not
>>>>> the same. I cant find anything in the code that could lead to this
>>>>> behaviour.
>>>>>
>>>>> The only idea where to look a the moment would be the identifier. Has
>>>>> anyone experience with String identifier?
>>>>> Is a possible that there are problems with comparing and sorting
>>>>> TextWritables?
>>>>>
>>>>> cheers Martin
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Different num supersteps

Posted by Sebastian Schelter <ss...@apache.org>.
Martin,

can you write a MapReduce job that creates your graph and run it with a 
simpler inputformat?

I really suspect that the bug lies somewhere in your input format.

--sebastian

On 03/02/2014 09:48 PM, Martin Neumann wrote:
> I checked the input just creating the graph and comparing it. While I cant
> say the graph is correct (for its to big) its at least consistent.
>
> So the only things where the different output can come from is the
> connected component part (see code in the first mail). I'm completely
> stomped, the code is basically the example code Giraph ships with all I
> changed was replacing the IntWriteable Id's with Text.
>
> Anyone has any Idea what the problem could be, I'm running out of Idea's.
>
> cheers Martin
>
>
> On Thu, Feb 27, 2014 at 4:59 PM, Sebastian Schelter <ss...@apache.org> wrote:
>
>> Hi Martin
>>
>> I don't think that there are problems with comparing and sorting Text
>> writables as Hadoop is basically a big external sorting system.
>>
>> I'm not sure I understand your edge input reader, it looks very complex,
>> maybe there's a bug somewhere. You could try to preprocess your data using
>> Hadoop so that you can use a simple VertexInputFormat and see if your
>> problems still occur.
>>
>> --sebastian
>>
>>
>> On 02/27/2014 04:41 PM, Martin Neumann wrote:
>>
>>> Hm
>>>
>>> I ran the job 5 times and made a diff between the outputs and they are not
>>> the same. I cant find anything in the code that could lead to this
>>> behaviour.
>>>
>>> The only idea where to look a the moment would be the identifier. Has
>>> anyone experience with String identifier?
>>> Is a possible that there are problems with comparing and sorting
>>> TextWritables?
>>>
>>> cheers Martin
>>>
>>>
>>
>


Re: Different num supersteps

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Martin

I don't think that there are problems with comparing and sorting Text 
writables as Hadoop is basically a big external sorting system.

I'm not sure I understand your edge input reader, it looks very complex, 
maybe there's a bug somewhere. You could try to preprocess your data 
using Hadoop so that you can use a simple VertexInputFormat and see if 
your problems still occur.

--sebastian

On 02/27/2014 04:41 PM, Martin Neumann wrote:
> Hm
>
> I ran the job 5 times and made a diff between the outputs and they are not
> the same. I cant find anything in the code that could lead to this
> behaviour.
>
> The only idea where to look a the moment would be the identifier. Has
> anyone experience with String identifier?
> Is a possible that there are problems with comparing and sorting
> TextWritables?
>
> cheers Martin
>


Re: Different num supersteps

Posted by Martin Neumann <mn...@spotify.com>.
Hm

I ran the job 5 times and made a diff between the outputs and they are not
the same. I cant find anything in the code that could lead to this
behaviour.

The only idea where to look a the moment would be the identifier. Has
anyone experience with String identifier?
Is a possible that there are problems with comparing and sorting
TextWritables?

cheers Martin

Re: Different num supersteps

Posted by Martin Neumann <mn...@spotify.com>.
The data I have as input is not in a Graph-Format so I use an
EdgeInputFormat to create a Graph. Its also deterministic so the same Graph
should be build with the same input.

Each line in the input is a set of connected vertices.
I create edges in a way that they form a star around the vertex with the
lowest ID. Every vertex has an edge to that vertex and it has an edge to
all other vertices. See the code further down.

I will see if I can sort the output and then do a diff to see if they have
the same results. Is there a simple way to make sure that the same graph
was build? (The full dataset has around a billion vertices so I can't
visualize it)



private class ConCompInReader extends TextEdgeReader {
        private Text[] nodeArray = null;
        private int otherNodeId = -1;
        NullWritable edgeValue = NullWritable.get();
        // flag used to create edges in both directions
        private boolean flipped = true;
        private boolean done = true;

        @Override
        public final Text getCurrentSourceId() throws IOException,
                InterruptedException {
            if (flipped)
                return nodeArray[otherNodeId];
            return nodeArray[0];
        }

        @Override
        public final Edge<Text, NullWritable> getCurrentEdge()
                throws IOException, InterruptedException {
            if (flipped)
                return EdgeFactory.create(nodeArray[0], edgeValue);
            return EdgeFactory.create(nodeArray[otherNodeId], edgeValue);
        }

        @Override
        public final boolean nextEdge() throws IOException,
                InterruptedException {

            // flipp direction of the edge
            if(!flipped){
                flipped = true;
            }

            // line not yet done and already created the flipped version
            else if(!done){
                otherNodeId++;
                flipped = false;
                if (otherNodeId >= nodeArray.length){
                    done = true;
                }
            }

            // TODO this version ignores singels switch with version below
if needed
            // if done with this line read the next one if there is one
            // if current line is consumed it checks if there is a next one
            if (done) {
                boolean hasNext = getRecordReader().nextKeyValue();
                while (done && hasNext) {

                    String[] stringNodeArray =
getRecordReader().getCurrentValue().toString().split(
                            "\\s+");

                    // found line with more than one URL
                    if (stringNodeArray.length >= 2) {
                        otherNodeId = 1;
                        done = false;
                        flipped = false;
                        Arrays.sort(stringNodeArray);
                        nodeArray = new Text[stringNodeArray.length];
                        for (int i = 0; i < stringNodeArray.length; i++) {
                            nodeArray[i] = new Text(stringNodeArray[i]);
                        }
                    }
                    // if not ignore line and read a new one
                    else {
                        hasNext = getRecordReader().nextKeyValue();
                    }
                }
            }

            // TODO version that does not ignore singels
            /*if (done && getRecordReader().nextKeyValue()) {
                arrayNode = getRecordReader().getCurrentValue();
                String[] stringNodeArray =
arrayNode.toString().split("\\s+");

                otherNodeId = 1;
                done = false;
                flipped = false;
                Arrays.sort(stringNodeArray);
                nodeArray = new Text[stringNodeArray.length];
                for (int i = 0; i < stringNodeArray.length; i++) {
                    nodeArray[i] = new Text(stringNodeArray[i]);
                }

            }*/

            return !done;
        }
    }


On Thu, Feb 27, 2014 at 10:57 AM, Sebastian Schelter <ss...@apache.org> wrote:

> Hi Martin,
>
> You are right, this should not happen, your code looks correct. One way to
> check the output would be to simply count the number vertices per component
> and see if that number stays stable.
>
> Do you supply all vertices in your input data or are some vertices created
> during the computation? Maybe there's a bug/race condition with the vertex
> creation (just wildly guessing)
>
> Best,
> Sebastian
>
>
>
> On 02/27/2014 10:48 AM, Martin Neumann wrote:
>
>> Hej,
>>
>> I have modified the connected component example to fit my input data. I
>> expect it to be deterministic.
>>
>> But when I run it multiple times it takes a different number of Super
>> steps. This only happens on the complete dataset and not on my small test
>> dataset.
>>   (So I cannot check the output for correctness in a simple way)
>>
>> Has anyone an Idea how this could happen?
>>
>> cheers Martin
>>
>>
>>
>>
>> In case its useful here the computation class:
>>
>> @Override
>>      public void compute(Vertex<Text, Text, NullWritable> vertex,
>>              Iterable<Text> inmessage) throws IOException {
>>          boolean changed = false;
>>
>> // first superstep && setup
>>          if (getSuperstep() == 0) {
>>              //initialize value
>>              vertex.setValue(vertex.getId());
>>              //cheating by checking the neighbors ID's (cuts down 1
>> iteration)
>>              for (Edge<Text, NullWritable> e : vertex.getEdges()) {
>>                  Text candidate = e.getTargetVertexId();
>>                  if (candidate.compareTo(vertex.getValue()) < 0) {
>>                      changed = true;
>>                      vertex.setValue(candidate);
>>                  }
>>              }
>>          }
>>
>>          // other superstep
>>          else {
>>              // read all messages and compare with own state
>>              for (Text message : inmessage) {
>>                  if (message.compareTo(vertex.getValue()) < 0) {
>>                      changed = true;
>>                      vertex.setValue(message);
>>                  }
>>              }
>>          }
>>
>>          // if state has changed send a message to all neighbors
>>          if (changed && getSuperstep() < limiter) {
>>              sendMessageToAllEdges(vertex, vertex.getValue());
>>          }
>>
>>          vertex.voteToHalt();
>>      }
>>
>>
>

Re: Different num supersteps

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Martin,

You are right, this should not happen, your code looks correct. One way 
to check the output would be to simply count the number vertices per 
component and see if that number stays stable.

Do you supply all vertices in your input data or are some vertices 
created during the computation? Maybe there's a bug/race condition with 
the vertex creation (just wildly guessing)

Best,
Sebastian


On 02/27/2014 10:48 AM, Martin Neumann wrote:
> Hej,
>
> I have modified the connected component example to fit my input data. I
> expect it to be deterministic.
>
> But when I run it multiple times it takes a different number of Super
> steps. This only happens on the complete dataset and not on my small test
> dataset.
>   (So I cannot check the output for correctness in a simple way)
>
> Has anyone an Idea how this could happen?
>
> cheers Martin
>
>
>
>
> In case its useful here the computation class:
>
> @Override
>      public void compute(Vertex<Text, Text, NullWritable> vertex,
>              Iterable<Text> inmessage) throws IOException {
>          boolean changed = false;
>
> // first superstep && setup
>          if (getSuperstep() == 0) {
>              //initialize value
>              vertex.setValue(vertex.getId());
>              //cheating by checking the neighbors ID's (cuts down 1
> iteration)
>              for (Edge<Text, NullWritable> e : vertex.getEdges()) {
>                  Text candidate = e.getTargetVertexId();
>                  if (candidate.compareTo(vertex.getValue()) < 0) {
>                      changed = true;
>                      vertex.setValue(candidate);
>                  }
>              }
>          }
>
>          // other superstep
>          else {
>              // read all messages and compare with own state
>              for (Text message : inmessage) {
>                  if (message.compareTo(vertex.getValue()) < 0) {
>                      changed = true;
>                      vertex.setValue(message);
>                  }
>              }
>          }
>
>          // if state has changed send a message to all neighbors
>          if (changed && getSuperstep() < limiter) {
>              sendMessageToAllEdges(vertex, vertex.getValue());
>          }
>
>          vertex.voteToHalt();
>      }
>