You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@giraph.apache.org by Martin Neumann <mn...@spotify.com> on 2014/03/03 10:05:27 UTC

Re: Different num supersteps

I checked the input just creating the graph and comparing it. While I cant
say the graph is correct (its to big) its at least consistent.

So the only things where the different output can come from is the
connected component part (see code further down). I'm completely stomped,
the code is basically the example code Giraph ships with all I changed was
replacing the IntWriteable Id's with Text.

Anyone has any Idea what the problem could be, I'm running out of Idea's.

cheers Martin



@Override
    public void compute(Vertex<Text, Text, NullWritable> vertex,
            Iterable<Text> inmessage) throws IOException {
        boolean changed = false;

// first superstep && setup
        if (getSuperstep() == 0) {
            //initialize value
            vertex.setValue(vertex.getId()
);
            //cheating by checking the neighbors ID's (cuts down 1
iteration)
            for (Edge<Text, NullWritable> e : vertex.getEdges()) {
                Text candidate = e.getTargetVertexId();
                if (candidate.compareTo(vertex.getValue()) < 0) {
                    changed = true;
                    vertex.setValue(candidate);
                }
            }
        }

        // other superstep
        else {
            // read all messages and compare with own state
            for (Text message : inmessage) {
                if (message.compareTo(vertex.getValue()) < 0) {
                    changed = true;
                    vertex.setValue(message);
                }
            }
        }

        // if state has changed send a message to all neighbors
        if (changed && getSuperstep() < limiter) {
            sendMessageToAllEdges(vertex, vertex.getValue());
        }
        vertex.voteToHalt();
    }



On Mon, Mar 3, 2014 at 9:56 AM, Martin Neumann <mn...@spotify.com> wrote:

> Hey,
>
> sorry, yes I intended to send it to the mailing-list, thanks for the hint
> (I repost it to the list). Guess I should stop working in the middle of the
> night. :-)
>
> I tested the input format and its output is consistent, I always get the
> same graph when running it. So the bug has to be in the connected component
> part since that one gives me different outputs for the same input graph.
>
> cheers Martin
>
>
> On Mon, Mar 3, 2014 at 8:06 AM, Sebastian Schelter <ss...@apache.org> wrote:
>
>> Martin,
>>
>> can you write a MapReduce job that creates your graph and run it with a
>> simpler inputformat?
>>
>> I really suspect that the bug lies somewhere in your input format.
>>
>> --sebastian
>>
>>
>> On 03/02/2014 09:48 PM, Martin Neumann wrote:
>>
>>> I checked the input just creating the graph and comparing it. While I
>>> cant
>>> say the graph is correct (for its to big) its at least consistent.
>>>
>>> So the only things where the different output can come from is the
>>> connected component part (see code in the first mail). I'm completely
>>> stomped, the code is basically the example code Giraph ships with all I
>>> changed was replacing the IntWriteable Id's with Text.
>>>
>>> Anyone has any Idea what the problem could be, I'm running out of Idea's.
>>>
>>> cheers Martin
>>>
>>>
>>> On Thu, Feb 27, 2014 at 4:59 PM, Sebastian Schelter <ss...@apache.org>
>>> wrote:
>>>
>>>  Hi Martin
>>>>
>>>> I don't think that there are problems with comparing and sorting Text
>>>> writables as Hadoop is basically a big external sorting system.
>>>>
>>>> I'm not sure I understand your edge input reader, it looks very complex,
>>>> maybe there's a bug somewhere. You could try to preprocess your data
>>>> using
>>>> Hadoop so that you can use a simple VertexInputFormat and see if your
>>>> problems still occur.
>>>>
>>>> --sebastian
>>>>
>>>>
>>>> On 02/27/2014 04:41 PM, Martin Neumann wrote:
>>>>
>>>>  Hm
>>>>>
>>>>> I ran the job 5 times and made a diff between the outputs and they are
>>>>> not
>>>>> the same. I cant find anything in the code that could lead to this
>>>>> behaviour.
>>>>>
>>>>> The only idea where to look a the moment would be the identifier. Has
>>>>> anyone experience with String identifier?
>>>>> Is a possible that there are problems with comparing and sorting
>>>>> TextWritables?
>>>>>
>>>>> cheers Martin
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Different num supersteps

Posted by Martin Neumann <mn...@spotify.com>.
Hi,

I managed to fix it even if I'm still not entirely sure what happened.
The fix is to make a new Text object every time a Text is required as input
(Text does not implement Cloneable). I guess it

So instead of:
  Text candidate = e.getTargetVertexId();
  ...
  vertex.setValue(candidate))

The working version is:
  Text candidate = e.getTargetVertexId();
  ...
  vertex.setValue(new Text(candidate.toString()))
















@Override
    public void compute(Vertex<Text, Text, NullWritable> vertex,
            Iterable<Text> inmessage) throws IOException {

        boolean changed = false;

        // first superstep && setup
        if (getSuperstep() == 0) {
            //initialize value
            vertex.setValue(new Text(vertex.getId().toString()));
            //cheating by checking the neighbors ID's (cuts down 1
iteration)
            for (Edge<Text, NullWritable> e : vertex.getEdges()) {
                Text candidate = e.getTargetVertexId();
                if (candidate.compareTo(vertex.getValue()) < 0) {
                    changed = true;
                    vertex.setValue(new Text(candidate.toString()));
                }
            }
        }

        // other superstep
        else {
            // read all messages and compare with own state
            for (Text message : inmessage) {
                if (message.compareTo(vertex.getValue()) < 0) {
                    changed = true;
                    vertex.setValue(new Text(message.toString()));
                }
            }
        }

        // if state has changed send a message to all neighbors
        if (changed && getSuperstep() < limiter) {
            sendMessageToAllEdges(vertex, new
Text(vertex.getValue().toString()));
        }

        vertex.voteToHalt();
    }




On Mon, Mar 3, 2014 at 2:19 PM, Sebastian Schelter <ss...@apache.org> wrote:

> Hi Martin,
>
> I'm not sure wether we require InputFormats to be threadsafe. Can someone
> answer that question?
>
> Maybe thats the reason you see this behavior.
>
> --sebastian
>
>
>
> On 03/03/2014 10:05 AM, Martin Neumann wrote:
>
>> I checked the input just creating the graph and comparing it. While I cant
>> say the graph is correct (its to big) its at least consistent.
>>
>> So the only things where the different output can come from is the
>> connected component part (see code further down). I'm completely stomped,
>> the code is basically the example code Giraph ships with all I changed was
>> replacing the IntWriteable Id's with Text.
>>
>> Anyone has any Idea what the problem could be, I'm running out of Idea's.
>>
>> cheers Martin
>>
>>
>>
>> @Override
>>      public void compute(Vertex<Text, Text, NullWritable> vertex,
>>              Iterable<Text> inmessage) throws IOException {
>>          boolean changed = false;
>>
>> // first superstep && setup
>>          if (getSuperstep() == 0) {
>>              //initialize value
>>              vertex.setValue(vertex.getId()
>> );
>>              //cheating by checking the neighbors ID's (cuts down 1
>> iteration)
>>              for (Edge<Text, NullWritable> e : vertex.getEdges()) {
>>                  Text candidate = e.getTargetVertexId();
>>                  if (candidate.compareTo(vertex.getValue()) < 0) {
>>                      changed = true;
>>                      vertex.setValue(candidate);
>>                  }
>>              }
>>          }
>>
>>          // other superstep
>>          else {
>>              // read all messages and compare with own state
>>              for (Text message : inmessage) {
>>                  if (message.compareTo(vertex.getValue()) < 0) {
>>                      changed = true;
>>                      vertex.setValue(message);
>>                  }
>>              }
>>          }
>>
>>          // if state has changed send a message to all neighbors
>>          if (changed && getSuperstep() < limiter) {
>>              sendMessageToAllEdges(vertex, vertex.getValue());
>>          }
>>          vertex.voteToHalt();
>>      }
>>
>>
>>
>> On Mon, Mar 3, 2014 at 9:56 AM, Martin Neumann <mn...@spotify.com>
>> wrote:
>>
>>  Hey,
>>>
>>> sorry, yes I intended to send it to the mailing-list, thanks for the hint
>>> (I repost it to the list). Guess I should stop working in the middle of
>>> the
>>> night. :-)
>>>
>>> I tested the input format and its output is consistent, I always get the
>>> same graph when running it. So the bug has to be in the connected
>>> component
>>> part since that one gives me different outputs for the same input graph.
>>>
>>> cheers Martin
>>>
>>>
>>> On Mon, Mar 3, 2014 at 8:06 AM, Sebastian Schelter <ss...@apache.org>
>>> wrote:
>>>
>>>  Martin,
>>>>
>>>> can you write a MapReduce job that creates your graph and run it with a
>>>> simpler inputformat?
>>>>
>>>> I really suspect that the bug lies somewhere in your input format.
>>>>
>>>> --sebastian
>>>>
>>>>
>>>> On 03/02/2014 09:48 PM, Martin Neumann wrote:
>>>>
>>>>  I checked the input just creating the graph and comparing it. While I
>>>>> cant
>>>>> say the graph is correct (for its to big) its at least consistent.
>>>>>
>>>>> So the only things where the different output can come from is the
>>>>> connected component part (see code in the first mail). I'm completely
>>>>> stomped, the code is basically the example code Giraph ships with all I
>>>>> changed was replacing the IntWriteable Id's with Text.
>>>>>
>>>>> Anyone has any Idea what the problem could be, I'm running out of
>>>>> Idea's.
>>>>>
>>>>> cheers Martin
>>>>>
>>>>>
>>>>> On Thu, Feb 27, 2014 at 4:59 PM, Sebastian Schelter <ss...@apache.org>
>>>>> wrote:
>>>>>
>>>>>   Hi Martin
>>>>>
>>>>>>
>>>>>> I don't think that there are problems with comparing and sorting Text
>>>>>> writables as Hadoop is basically a big external sorting system.
>>>>>>
>>>>>> I'm not sure I understand your edge input reader, it looks very
>>>>>> complex,
>>>>>> maybe there's a bug somewhere. You could try to preprocess your data
>>>>>> using
>>>>>> Hadoop so that you can use a simple VertexInputFormat and see if your
>>>>>> problems still occur.
>>>>>>
>>>>>> --sebastian
>>>>>>
>>>>>>
>>>>>> On 02/27/2014 04:41 PM, Martin Neumann wrote:
>>>>>>
>>>>>>   Hm
>>>>>>
>>>>>>>
>>>>>>> I ran the job 5 times and made a diff between the outputs and they
>>>>>>> are
>>>>>>> not
>>>>>>> the same. I cant find anything in the code that could lead to this
>>>>>>> behaviour.
>>>>>>>
>>>>>>> The only idea where to look a the moment would be the identifier. Has
>>>>>>> anyone experience with String identifier?
>>>>>>> Is a possible that there are problems with comparing and sorting
>>>>>>> TextWritables?
>>>>>>>
>>>>>>> cheers Martin
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Different num supersteps

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Martin,

I'm not sure wether we require InputFormats to be threadsafe. Can 
someone answer that question?

Maybe thats the reason you see this behavior.

--sebastian


On 03/03/2014 10:05 AM, Martin Neumann wrote:
> I checked the input just creating the graph and comparing it. While I cant
> say the graph is correct (its to big) its at least consistent.
>
> So the only things where the different output can come from is the
> connected component part (see code further down). I'm completely stomped,
> the code is basically the example code Giraph ships with all I changed was
> replacing the IntWriteable Id's with Text.
>
> Anyone has any Idea what the problem could be, I'm running out of Idea's.
>
> cheers Martin
>
>
>
> @Override
>      public void compute(Vertex<Text, Text, NullWritable> vertex,
>              Iterable<Text> inmessage) throws IOException {
>          boolean changed = false;
>
> // first superstep && setup
>          if (getSuperstep() == 0) {
>              //initialize value
>              vertex.setValue(vertex.getId()
> );
>              //cheating by checking the neighbors ID's (cuts down 1
> iteration)
>              for (Edge<Text, NullWritable> e : vertex.getEdges()) {
>                  Text candidate = e.getTargetVertexId();
>                  if (candidate.compareTo(vertex.getValue()) < 0) {
>                      changed = true;
>                      vertex.setValue(candidate);
>                  }
>              }
>          }
>
>          // other superstep
>          else {
>              // read all messages and compare with own state
>              for (Text message : inmessage) {
>                  if (message.compareTo(vertex.getValue()) < 0) {
>                      changed = true;
>                      vertex.setValue(message);
>                  }
>              }
>          }
>
>          // if state has changed send a message to all neighbors
>          if (changed && getSuperstep() < limiter) {
>              sendMessageToAllEdges(vertex, vertex.getValue());
>          }
>          vertex.voteToHalt();
>      }
>
>
>
> On Mon, Mar 3, 2014 at 9:56 AM, Martin Neumann <mn...@spotify.com> wrote:
>
>> Hey,
>>
>> sorry, yes I intended to send it to the mailing-list, thanks for the hint
>> (I repost it to the list). Guess I should stop working in the middle of the
>> night. :-)
>>
>> I tested the input format and its output is consistent, I always get the
>> same graph when running it. So the bug has to be in the connected component
>> part since that one gives me different outputs for the same input graph.
>>
>> cheers Martin
>>
>>
>> On Mon, Mar 3, 2014 at 8:06 AM, Sebastian Schelter <ss...@apache.org> wrote:
>>
>>> Martin,
>>>
>>> can you write a MapReduce job that creates your graph and run it with a
>>> simpler inputformat?
>>>
>>> I really suspect that the bug lies somewhere in your input format.
>>>
>>> --sebastian
>>>
>>>
>>> On 03/02/2014 09:48 PM, Martin Neumann wrote:
>>>
>>>> I checked the input just creating the graph and comparing it. While I
>>>> cant
>>>> say the graph is correct (for its to big) its at least consistent.
>>>>
>>>> So the only things where the different output can come from is the
>>>> connected component part (see code in the first mail). I'm completely
>>>> stomped, the code is basically the example code Giraph ships with all I
>>>> changed was replacing the IntWriteable Id's with Text.
>>>>
>>>> Anyone has any Idea what the problem could be, I'm running out of Idea's.
>>>>
>>>> cheers Martin
>>>>
>>>>
>>>> On Thu, Feb 27, 2014 at 4:59 PM, Sebastian Schelter <ss...@apache.org>
>>>> wrote:
>>>>
>>>>   Hi Martin
>>>>>
>>>>> I don't think that there are problems with comparing and sorting Text
>>>>> writables as Hadoop is basically a big external sorting system.
>>>>>
>>>>> I'm not sure I understand your edge input reader, it looks very complex,
>>>>> maybe there's a bug somewhere. You could try to preprocess your data
>>>>> using
>>>>> Hadoop so that you can use a simple VertexInputFormat and see if your
>>>>> problems still occur.
>>>>>
>>>>> --sebastian
>>>>>
>>>>>
>>>>> On 02/27/2014 04:41 PM, Martin Neumann wrote:
>>>>>
>>>>>   Hm
>>>>>>
>>>>>> I ran the job 5 times and made a diff between the outputs and they are
>>>>>> not
>>>>>> the same. I cant find anything in the code that could lead to this
>>>>>> behaviour.
>>>>>>
>>>>>> The only idea where to look a the moment would be the identifier. Has
>>>>>> anyone experience with String identifier?
>>>>>> Is a possible that there are problems with comparing and sorting
>>>>>> TextWritables?
>>>>>>
>>>>>> cheers Martin
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>