You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hama.apache.org by SWP <co...@unister-gmbh.de> on 2012/04/27 16:56:48 UTC

How to check whether the target node of an edge exists?

I am dealing with the PageRank example
from  hama-dist-0.5.0-incubating-source.tar.gz RC2
which I downloaded from http://people.apache.org/~edwardyoon/dist/
a few days ago.

My input graph has some "dangling edges", that is, edges pointing to 
non-existing nodes.
Here are the adjacencies of a small example. The format is:
source target1 target2 target3 ...

0    1    2
1    1    2
2    1    2    3
5

Your see that 2 has an edge directed to 3 but there is no adjacency list 
given for 3.

Now, when I run this example through pagerank-text2seq and then the 
pagerank examle, I get a NullPointerException:

12/04/27 16:15:17 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
java.lang.NullPointerException
     at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:96)
     at 
org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
     at 
org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
     at 
org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:1)
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)

The problem appears to be that when GraphJobRunner's bsp() method  looks 
up the vertex to which the message is addressed,
it is not found in the vertices map.
(By the way, if you replace 5 with 3 in the example, it works - because 
then the target vertex can be looked up.)

See the vertices.get(e.getKey()) statement in the code snippet below.
Of course one can avoid the exception by adding a check in 
GraphJobRunner.java (at line about 95) like this:

         if(vertices.containsKey(e.getKey()))
         {
             vertices.get(e.getKey()).compute(msgs.iterator());
         } else {
             System.out.println("Ignoring message(s) '" + msgs + "' sent 
to vertex '" + e.getKey() +"'");
         }

However, what I really want is:
check within PageRank.PageRankVertex's compute() method whether the 
target vertex exists
before sending out a message to it.

That is, in PageRank.java (line 60) , instead of

                 sendMessageToNeighbors(new 
DoubleWritable(this.getValue().get() / numEdges));

I would like to send messages only  to "existing" vertices, that is,  
those which have an adjacency list in the input.

Any hints how this can be achieved?
I appears that I am not supposed to access the vertices field of 
GraphJobRunner class in some way from within the PageRank.PageRankVertex 
class?

I concede that my example graph may qualify as invalid input ... but on 
the other hand: how could I add those missing vertices after a first 
pass through the adjacency lists input?

Clemens Gröpl

-- 

Semantic Web Project, IT

Unister GmbH
Barfußgäßchen 11 | 04109 Leipzig

Telefon: +49 (0)341 49288 4496
contact-semweb@unister-gmbh.de <ma...@unister-gmbh.de>
www.unister.de <http://www.unister.de>

Vertretungsberechtigter Geschäftsführer: Thomas Wagner
Amtsgericht Leipzig, HRB: 19056


Re: How to check whether the target node of an edge exists?

Posted by SWP <co...@unister-gmbh.de>.
Hi Thomas,

Getting back to this, I see you have created issue
https://issues.apache.org/jira/browse/HAMA-571
for this, thanks!

See my next posting.
Clemens


On 27.04.2012 20:00, Thomas Jungblut wrote:
> Well, no you got it wrong.
>
> Hama's graph package is designed to have a easy to develop API for simple
> graph problems with a vertex-centric paradigm.
> However, and that is where we distinguish from giraph, we provide the low
> level access to the underlying BSP methods and implementations.
>
> How to fix the problem you have without the need of preprocessing your data?
> You can rework the GraphJobRunner class, to check in the setup method if
> the vertices map is correctly filled.
> A simple pseudo code could look like this:
>
> // after load vertices into the map and edges have been setup
>> for(Vertex v : adjacencyList)
>>   for(Edge neighbor : v.getOutEdges())
>>      // do the hash partitioning trick
>>     peer.send(peer.getAllPeerNames()[Math.abs( neighbor.getDestVertexID()
>> % peer.getNumPeers())], new Text(neighbor.getDestVertexID()));
>> // sync
>> peer.sync();
>> Text vertexName = null;
>> while((vertexName = (Text) peer.getCurrentMessage()) != null)
>>   // check if the vertexName is contained in that peer's vertices map
>>   if(!vertices.containsKey(vertexName.get()))
>>      vertices.put(vertexName.get(), new Vertex(vertexName.get()));
>
>
> That is all, and solve the current problems with your dangling nodes.
> Good question why we haven't implemented it yet, but maybe because it adds
> overhead at the beginning of a job. And we also assume that the resulting
> structure in the graph is complete.
> We want to provide a real framework where you can plug all the stuff
> arround together and modify single parts of it very easy.
>
> Hope you got the gist, if you need a patch or stuff, I can implement it
> correctly for you. (we should add this anyways as a configurable
> preprocessing).
> Thanks for pointing to it!
>
> Regards,
> Thomas
>
>

-- 

Semantic Web Project, IT

Unister GmbH
Barfußgäßchen 11 | 04109 Leipzig

Telefon: +49 (0)341 49288 4496
contact-semweb@unister-gmbh.de <ma...@unister-gmbh.de>
www.unister.de <http://www.unister.de>

Vertretungsberechtigter Geschäftsführer: Thomas Wagner
Amtsgericht Leipzig, HRB: 19056


Re: How to check whether the target node of an edge exists?

Posted by Thomas Jungblut <th...@googlemail.com>.
Well, no you got it wrong.

Hama's graph package is designed to have a easy to develop API for simple
graph problems with a vertex-centric paradigm.
However, and that is where we distinguish from giraph, we provide the low
level access to the underlying BSP methods and implementations.

How to fix the problem you have without the need of preprocessing your data?
You can rework the GraphJobRunner class, to check in the setup method if
the vertices map is correctly filled.
A simple pseudo code could look like this:

// after load vertices into the map and edges have been setup
> for(Vertex v : adjacencyList)
>   for(Edge neighbor : v.getOutEdges())
>      // do the hash partitioning trick
>     peer.send(peer.getAllPeerNames()[Math.abs( neighbor.getDestVertexID()
> % peer.getNumPeers())], new Text(neighbor.getDestVertexID()));
> // sync
> peer.sync();
> Text vertexName = null;
> while((vertexName = (Text) peer.getCurrentMessage()) != null)
>   // check if the vertexName is contained in that peer's vertices map
>   if(!vertices.containsKey(vertexName.get()))
>      vertices.put(vertexName.get(), new Vertex(vertexName.get()));



That is all, and solve the current problems with your dangling nodes.
Good question why we haven't implemented it yet, but maybe because it adds
overhead at the beginning of a job. And we also assume that the resulting
structure in the graph is complete.
We want to provide a real framework where you can plug all the stuff
arround together and modify single parts of it very easy.

Hope you got the gist, if you need a patch or stuff, I can implement it
correctly for you. (we should add this anyways as a configurable
preprocessing).
Thanks for pointing to it!

Regards,
Thomas

Am 27. April 2012 19:34 schrieb SWP <co...@unister-gmbh.de>:

> Hi Thomas,
>
> so ... just to make sure I understood you right:
>
> You are saying that:
> fiddling with the input in a way that adds a couple of nodes having empty
> adjacency lists
> is below the level of abstraction that Hama intends to provide, correct?
>
> Sure we will find a way to adjust our data, but I think the question is
> somewhat at the borderline.
> I will think about a way to add those missing nodes when the input is
> read, before the BSP starts.
>
> Thank you very much.
> Clemens
>
>
> Am 27.04.2012 17:24, schrieb Thomas Jungblut:
>
>> Oh sorry I have not read the other half of your mail.
>> I have made a mapreduce preprocessing step (yes mapreduce is the right
>> answer for that here) for that which can be found here:
>>
>> https://github.com/**thomasjungblut/thomasjungblut-**
>> common/blob/master/src/de/**jungblut/crawl/processing/**
>> WebGraphProcessingJob.java<https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/crawl/processing/WebGraphProcessingJob.java>
>> It traverses the graph and the reducer makes the right output which can be
>> processed by the job or the TextToSeq utility as a preprocessing for
>> pagerank, just have a look into the example package.
>>
>> I appears that I am not supposed to access the vertices field of
>>
>>> GraphJobRunner class in some way from within the PageRank.PageRankVertex
>>> class?
>>>
>>>  Yes in this new Pregel-Like API this falls under information-hiding. If
>> you
>> have a look into the 4.0 release, there is the hardcore version of
>> pagerank
>> with plain BSP, there you can access and modify all the stuff you want.
>> (but it is more complicated)
>>
>> Hope it helps, if you have additional problems, don't hesitate to ask
>> them.
>>
>>
>> Am 27. April 2012 17:10 schrieb Thomas Jungblut<
>> thomas.jungblut@googlemail.com**>:
>>
>>  You have to "address" dangling nodes on your adjacency list.
>>>
>>> So your input must look like:
>>>
>>>
>>> 0    1    2
>>> 1    1    2
>>> 2    1    2    3
>>> 3<-- this one was missing causing the Null Pointer Exception.
>>> 5
>>>
>>> See http://wiki.apache.org/hama/**PageRank<http://wiki.apache.org/hama/PageRank>under "Submit your own Webgraph".
>>>
>>>  This piece of text will adjacent Site1 to Site2 and Site3, Site2 to
>>>> Site3
>>>> and Site3 is a dangling node. As you can see a site is always on the
>>>> leftmost side (we call it the key-site), and the outlinks are seperated
>>>> by
>>>> tabs (\t) as the following elements.
>>>> Make sure that every site's outlink can somewhere be found in the file
>>>> as
>>>> a key-site. Otherwise it will result in weird NullPointerExceptions<
>>>> http://**wiki.apache.org/hama/**NullPointerExceptions<http://wiki.apache.org/hama/NullPointerExceptions>
>>>> >.
>>>>
>>>>
>>> Good luck.
>>>
>>> Am 27. April 2012 16:56 schrieb SWP<co...@unister-gmbh.de>
>>> >:
>>>
>>> I am dealing with the PageRank example
>>>
>>>> from  hama-dist-0.5.0-incubating-****source.tar.gz RC2
>>>> which I downloaded from http://people.apache.org/~****edwardyoon/dist/<http://people.apache.org/~**edwardyoon/dist/>
>>>> <http://**people.apache.org/%**7Eedwardyoon/dist/<http://people.apache.org/%7Eedwardyoon/dist/>
>>>> >
>>>>
>>>> a few days ago.
>>>>
>>>> My input graph has some "dangling edges", that is, edges pointing to
>>>> non-existing nodes.
>>>> Here are the adjacencies of a small example. The format is:
>>>> source target1 target2 target3 ...
>>>>
>>>> 0    1    2
>>>> 1    1    2
>>>> 2    1    2    3
>>>> 5
>>>>
>>>> Your see that 2 has an edge directed to 3 but there is no adjacency list
>>>> given for 3.
>>>>
>>>> Now, when I run this example through pagerank-text2seq and then the
>>>> pagerank examle, I get a NullPointerException:
>>>>
>>>> 12/04/27 16:15:17 ERROR bsp.LocalBSPRunner: Exception during BSP
>>>> execution!
>>>> java.lang.NullPointerException
>>>>    at org.apache.hama.graph.****GraphJobRunner.bsp(**
>>>> GraphJobRunner.java:96)
>>>>    at org.apache.hama.bsp.****LocalBSPRunner$BSPRunner.run(****
>>>> LocalBSPRunner.java:256)
>>>>    at org.apache.hama.bsp.****LocalBSPRunner$BSPRunner.call(****
>>>> LocalBSPRunner.java:286)
>>>>    at org.apache.hama.bsp.****LocalBSPRunner$BSPRunner.call(****
>>>> LocalBSPRunner.java:1)
>>>>    at java.util.concurrent.****FutureTask$Sync.innerRun(**
>>>> FutureTask.java:303)
>>>>    at java.util.concurrent.****FutureTask.run(FutureTask.****java:138)
>>>>    at java.util.concurrent.****ThreadPoolExecutor$Worker.**
>>>> runTask(ThreadPoolExecutor.****java:886)
>>>>    at java.util.concurrent.****ThreadPoolExecutor$Worker.run(****
>>>> ThreadPoolExecutor.java:908)
>>>>    at java.lang.Thread.run(Thread.****java:662)
>>>>
>>>>
>>>> The problem appears to be that when GraphJobRunner's bsp() method  looks
>>>> up the vertex to which the message is addressed,
>>>> it is not found in the vertices map.
>>>> (By the way, if you replace 5 with 3 in the example, it works - because
>>>> then the target vertex can be looked up.)
>>>>
>>>> See the vertices.get(e.getKey()) statement in the code snippet below.
>>>> Of course one can avoid the exception by adding a check in
>>>> GraphJobRunner.java (at line about 95) like this:
>>>>
>>>>        if(vertices.containsKey(e.****getKey()))
>>>>        {
>>>>            vertices.get(e.getKey()).****compute(msgs.iterator());
>>>>
>>>>        } else {
>>>>            System.out.println("Ignoring message(s) '" + msgs + "' sent
>>>> to
>>>> vertex '" + e.getKey() +"'");
>>>>        }
>>>>
>>>> However, what I really want is:
>>>> check within PageRank.PageRankVertex's compute() method whether the
>>>> target vertex exists
>>>> before sending out a message to it.
>>>>
>>>> That is, in PageRank.java (line 60) , instead of
>>>>
>>>>                sendMessageToNeighbors(new DoubleWritable(this.getValue()
>>>> ***
>>>>
>>>> *.get() / numEdges));
>>>>
>>>> I would like to send messages only  to "existing" vertices, that is,
>>>>  those which have an adjacency list in the input.
>>>>
>>>> Any hints how this can be achieved?
>>>> I appears that I am not supposed to access the vertices field of
>>>> GraphJobRunner class in some way from within the PageRank.PageRankVertex
>>>> class?
>>>>
>>>> I concede that my example graph may qualify as invalid input ... but on
>>>> the other hand: how could I add those missing vertices after a first
>>>> pass
>>>> through the adjacency lists input?
>>>>
>>>> Clemens Gröpl
>>>>
>>>> --
>>>>
>>>> Semantic Web Project, IT
>>>>
>>>> Unister GmbH
>>>> Barfußgäßchen 11 | 04109 Leipzig
>>>>
>>>> Telefon: +49 (0)341 49288 4496
>>>> contact-semweb@unister-gmbh.de**<mailto:%20contact-semweb@**
>>>> unister-gmbh.de<20...@unister-gmbh.de>
>>>> >>
>>>>
>>>> www.unister.de<http://www.**unister.de <http://www.unister.de>>
>>>>
>>>> Vertretungsberechtigter Geschäftsführer: Thomas Wagner
>>>> Amtsgericht Leipzig, HRB: 19056
>>>>
>>>>
>>>>
>>> --
>>> Thomas Jungblut
>>> Berlin<thomas.jungblut@gmail.**com <th...@gmail.com>>
>>>
>>>
>>
>>
>
> --
>
> Semantic Web Project, IT
>
> Unister GmbH
> Barfußgäßchen 11 | 04109 Leipzig
>
> Telefon: +49 (0)341 49288 4496
> contact-semweb@unister-gmbh.de <ma...@unister-gmbh.de>
> >
> www.unister.de <http://www.unister.de>
>
> Vertretungsberechtigter Geschäftsführer: Thomas Wagner
> Amtsgericht Leipzig, HRB: 19056
>
>


-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: How to check whether the target node of an edge exists?

Posted by SWP <co...@unister-gmbh.de>.
Hi Thomas,

so ... just to make sure I understood you right:

You are saying that:
fiddling with the input in a way that adds a couple of nodes having 
empty adjacency lists
is below the level of abstraction that Hama intends to provide, correct?

Sure we will find a way to adjust our data, but I think the question is 
somewhat at the borderline.
I will think about a way to add those missing nodes when the input is 
read, before the BSP starts.

Thank you very much.
Clemens


Am 27.04.2012 17:24, schrieb Thomas Jungblut:
> Oh sorry I have not read the other half of your mail.
> I have made a mapreduce preprocessing step (yes mapreduce is the right
> answer for that here) for that which can be found here:
>
> https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/crawl/processing/WebGraphProcessingJob.java
> It traverses the graph and the reducer makes the right output which can be
> processed by the job or the TextToSeq utility as a preprocessing for
> pagerank, just have a look into the example package.
>
> I appears that I am not supposed to access the vertices field of
>> GraphJobRunner class in some way from within the PageRank.PageRankVertex
>> class?
>>
> Yes in this new Pregel-Like API this falls under information-hiding. If you
> have a look into the 4.0 release, there is the hardcore version of pagerank
> with plain BSP, there you can access and modify all the stuff you want.
> (but it is more complicated)
>
> Hope it helps, if you have additional problems, don't hesitate to ask them.
>
>
> Am 27. April 2012 17:10 schrieb Thomas Jungblut<
> thomas.jungblut@googlemail.com>:
>
>> You have to "address" dangling nodes on your adjacency list.
>>
>> So your input must look like:
>>
>>
>> 0    1    2
>> 1    1    2
>> 2    1    2    3
>> 3<-- this one was missing causing the Null Pointer Exception.
>> 5
>>
>> See http://wiki.apache.org/hama/PageRank under "Submit your own Webgraph".
>>
>>> This piece of text will adjacent Site1 to Site2 and Site3, Site2 to Site3
>>> and Site3 is a dangling node. As you can see a site is always on the
>>> leftmost side (we call it the key-site), and the outlinks are seperated by
>>> tabs (\t) as the following elements.
>>> Make sure that every site's outlink can somewhere be found in the file as
>>> a key-site. Otherwise it will result in weird NullPointerExceptions<http://wiki.apache.org/hama/NullPointerExceptions>.
>>>
>>
>> Good luck.
>>
>> Am 27. April 2012 16:56 schrieb SWP<co...@unister-gmbh.de>:
>>
>> I am dealing with the PageRank example
>>> from  hama-dist-0.5.0-incubating-**source.tar.gz RC2
>>> which I downloaded from http://people.apache.org/~**edwardyoon/dist/<http://people.apache.org/%7Eedwardyoon/dist/>
>>> a few days ago.
>>>
>>> My input graph has some "dangling edges", that is, edges pointing to
>>> non-existing nodes.
>>> Here are the adjacencies of a small example. The format is:
>>> source target1 target2 target3 ...
>>>
>>> 0    1    2
>>> 1    1    2
>>> 2    1    2    3
>>> 5
>>>
>>> Your see that 2 has an edge directed to 3 but there is no adjacency list
>>> given for 3.
>>>
>>> Now, when I run this example through pagerank-text2seq and then the
>>> pagerank examle, I get a NullPointerException:
>>>
>>> 12/04/27 16:15:17 ERROR bsp.LocalBSPRunner: Exception during BSP
>>> execution!
>>> java.lang.NullPointerException
>>>     at org.apache.hama.graph.**GraphJobRunner.bsp(**
>>> GraphJobRunner.java:96)
>>>     at org.apache.hama.bsp.**LocalBSPRunner$BSPRunner.run(**
>>> LocalBSPRunner.java:256)
>>>     at org.apache.hama.bsp.**LocalBSPRunner$BSPRunner.call(**
>>> LocalBSPRunner.java:286)
>>>     at org.apache.hama.bsp.**LocalBSPRunner$BSPRunner.call(**
>>> LocalBSPRunner.java:1)
>>>     at java.util.concurrent.**FutureTask$Sync.innerRun(**
>>> FutureTask.java:303)
>>>     at java.util.concurrent.**FutureTask.run(FutureTask.**java:138)
>>>     at java.util.concurrent.**ThreadPoolExecutor$Worker.**
>>> runTask(ThreadPoolExecutor.**java:886)
>>>     at java.util.concurrent.**ThreadPoolExecutor$Worker.run(**
>>> ThreadPoolExecutor.java:908)
>>>     at java.lang.Thread.run(Thread.**java:662)
>>>
>>> The problem appears to be that when GraphJobRunner's bsp() method  looks
>>> up the vertex to which the message is addressed,
>>> it is not found in the vertices map.
>>> (By the way, if you replace 5 with 3 in the example, it works - because
>>> then the target vertex can be looked up.)
>>>
>>> See the vertices.get(e.getKey()) statement in the code snippet below.
>>> Of course one can avoid the exception by adding a check in
>>> GraphJobRunner.java (at line about 95) like this:
>>>
>>>         if(vertices.containsKey(e.**getKey()))
>>>         {
>>>             vertices.get(e.getKey()).**compute(msgs.iterator());
>>>         } else {
>>>             System.out.println("Ignoring message(s) '" + msgs + "' sent to
>>> vertex '" + e.getKey() +"'");
>>>         }
>>>
>>> However, what I really want is:
>>> check within PageRank.PageRankVertex's compute() method whether the
>>> target vertex exists
>>> before sending out a message to it.
>>>
>>> That is, in PageRank.java (line 60) , instead of
>>>
>>>                 sendMessageToNeighbors(new DoubleWritable(this.getValue()*
>>> *.get() / numEdges));
>>>
>>> I would like to send messages only  to "existing" vertices, that is,
>>>   those which have an adjacency list in the input.
>>>
>>> Any hints how this can be achieved?
>>> I appears that I am not supposed to access the vertices field of
>>> GraphJobRunner class in some way from within the PageRank.PageRankVertex
>>> class?
>>>
>>> I concede that my example graph may qualify as invalid input ... but on
>>> the other hand: how could I add those missing vertices after a first pass
>>> through the adjacency lists input?
>>>
>>> Clemens Gröpl
>>>
>>> --
>>>
>>> Semantic Web Project, IT
>>>
>>> Unister GmbH
>>> Barfußgäßchen 11 | 04109 Leipzig
>>>
>>> Telefon: +49 (0)341 49288 4496
>>> contact-semweb@unister-gmbh.de<mailto:%20contact-semweb@**
>>> unister-gmbh.de<20...@unister-gmbh.de>>
>>> www.unister.de<http://www.unister.de>
>>>
>>> Vertretungsberechtigter Geschäftsführer: Thomas Wagner
>>> Amtsgericht Leipzig, HRB: 19056
>>>
>>>
>>
>> --
>> Thomas Jungblut
>> Berlin<th...@gmail.com>
>>
>
>


-- 

Semantic Web Project, IT

Unister GmbH
Barfußgäßchen 11 | 04109 Leipzig

Telefon: +49 (0)341 49288 4496
contact-semweb@unister-gmbh.de <ma...@unister-gmbh.de>
www.unister.de <http://www.unister.de>

Vertretungsberechtigter Geschäftsführer: Thomas Wagner
Amtsgericht Leipzig, HRB: 19056


Re: How to check whether the target node of an edge exists?

Posted by Thomas Jungblut <th...@googlemail.com>.
Oh sorry I have not read the other half of your mail.
I have made a mapreduce preprocessing step (yes mapreduce is the right
answer for that here) for that which can be found here:

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/crawl/processing/WebGraphProcessingJob.java
>

It traverses the graph and the reducer makes the right output which can be
processed by the job or the TextToSeq utility as a preprocessing for
pagerank, just have a look into the example package.

I appears that I am not supposed to access the vertices field of
> GraphJobRunner class in some way from within the PageRank.PageRankVertex
> class?
>

Yes in this new Pregel-Like API this falls under information-hiding. If you
have a look into the 4.0 release, there is the hardcore version of pagerank
with plain BSP, there you can access and modify all the stuff you want.
(but it is more complicated)

Hope it helps, if you have additional problems, don't hesitate to ask them.


Am 27. April 2012 17:10 schrieb Thomas Jungblut <
thomas.jungblut@googlemail.com>:

> You have to "address" dangling nodes on your adjacency list.
>
> So your input must look like:
>
>
> 0    1    2
> 1    1    2
> 2    1    2    3
> 3 <-- this one was missing causing the Null Pointer Exception.
> 5
>
> See http://wiki.apache.org/hama/PageRank under "Submit your own Webgraph".
>
>> This piece of text will adjacent Site1 to Site2 and Site3, Site2 to Site3
>> and Site3 is a dangling node. As you can see a site is always on the
>> leftmost side (we call it the key-site), and the outlinks are seperated by
>> tabs (\t) as the following elements.
>> Make sure that every site's outlink can somewhere be found in the file as
>> a key-site. Otherwise it will result in weird NullPointerExceptions<http://wiki.apache.org/hama/NullPointerExceptions>.
>>
>
>
> Good luck.
>
> Am 27. April 2012 16:56 schrieb SWP <co...@unister-gmbh.de>:
>
> I am dealing with the PageRank example
>> from  hama-dist-0.5.0-incubating-**source.tar.gz RC2
>> which I downloaded from http://people.apache.org/~**edwardyoon/dist/<http://people.apache.org/%7Eedwardyoon/dist/>
>> a few days ago.
>>
>> My input graph has some "dangling edges", that is, edges pointing to
>> non-existing nodes.
>> Here are the adjacencies of a small example. The format is:
>> source target1 target2 target3 ...
>>
>> 0    1    2
>> 1    1    2
>> 2    1    2    3
>> 5
>>
>> Your see that 2 has an edge directed to 3 but there is no adjacency list
>> given for 3.
>>
>> Now, when I run this example through pagerank-text2seq and then the
>> pagerank examle, I get a NullPointerException:
>>
>> 12/04/27 16:15:17 ERROR bsp.LocalBSPRunner: Exception during BSP
>> execution!
>> java.lang.NullPointerException
>>    at org.apache.hama.graph.**GraphJobRunner.bsp(**
>> GraphJobRunner.java:96)
>>    at org.apache.hama.bsp.**LocalBSPRunner$BSPRunner.run(**
>> LocalBSPRunner.java:256)
>>    at org.apache.hama.bsp.**LocalBSPRunner$BSPRunner.call(**
>> LocalBSPRunner.java:286)
>>    at org.apache.hama.bsp.**LocalBSPRunner$BSPRunner.call(**
>> LocalBSPRunner.java:1)
>>    at java.util.concurrent.**FutureTask$Sync.innerRun(**
>> FutureTask.java:303)
>>    at java.util.concurrent.**FutureTask.run(FutureTask.**java:138)
>>    at java.util.concurrent.**ThreadPoolExecutor$Worker.**
>> runTask(ThreadPoolExecutor.**java:886)
>>    at java.util.concurrent.**ThreadPoolExecutor$Worker.run(**
>> ThreadPoolExecutor.java:908)
>>    at java.lang.Thread.run(Thread.**java:662)
>>
>> The problem appears to be that when GraphJobRunner's bsp() method  looks
>> up the vertex to which the message is addressed,
>> it is not found in the vertices map.
>> (By the way, if you replace 5 with 3 in the example, it works - because
>> then the target vertex can be looked up.)
>>
>> See the vertices.get(e.getKey()) statement in the code snippet below.
>> Of course one can avoid the exception by adding a check in
>> GraphJobRunner.java (at line about 95) like this:
>>
>>        if(vertices.containsKey(e.**getKey()))
>>        {
>>            vertices.get(e.getKey()).**compute(msgs.iterator());
>>        } else {
>>            System.out.println("Ignoring message(s) '" + msgs + "' sent to
>> vertex '" + e.getKey() +"'");
>>        }
>>
>> However, what I really want is:
>> check within PageRank.PageRankVertex's compute() method whether the
>> target vertex exists
>> before sending out a message to it.
>>
>> That is, in PageRank.java (line 60) , instead of
>>
>>                sendMessageToNeighbors(new DoubleWritable(this.getValue()*
>> *.get() / numEdges));
>>
>> I would like to send messages only  to "existing" vertices, that is,
>>  those which have an adjacency list in the input.
>>
>> Any hints how this can be achieved?
>> I appears that I am not supposed to access the vertices field of
>> GraphJobRunner class in some way from within the PageRank.PageRankVertex
>> class?
>>
>> I concede that my example graph may qualify as invalid input ... but on
>> the other hand: how could I add those missing vertices after a first pass
>> through the adjacency lists input?
>>
>> Clemens Gröpl
>>
>> --
>>
>> Semantic Web Project, IT
>>
>> Unister GmbH
>> Barfußgäßchen 11 | 04109 Leipzig
>>
>> Telefon: +49 (0)341 49288 4496
>> contact-semweb@unister-gmbh.de <mailto:%20contact-semweb@**
>> unister-gmbh.de <20...@unister-gmbh.de>>
>> www.unister.de <http://www.unister.de>
>>
>> Vertretungsberechtigter Geschäftsführer: Thomas Wagner
>> Amtsgericht Leipzig, HRB: 19056
>>
>>
>
>
> --
> Thomas Jungblut
> Berlin <th...@gmail.com>
>



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: How to check whether the target node of an edge exists?

Posted by Thomas Jungblut <th...@googlemail.com>.
You have to "address" dangling nodes on your adjacency list.

So your input must look like:

0    1    2
1    1    2
2    1    2    3
3 <-- this one was missing causing the Null Pointer Exception.
5

See http://wiki.apache.org/hama/PageRank under "Submit your own Webgraph".

> This piece of text will adjacent Site1 to Site2 and Site3, Site2 to Site3
> and Site3 is a dangling node. As you can see a site is always on the
> leftmost side (we call it the key-site), and the outlinks are seperated by
> tabs (\t) as the following elements.
> Make sure that every site's outlink can somewhere be found in the file as
> a key-site. Otherwise it will result in weird NullPointerExceptions<http://wiki.apache.org/hama/NullPointerExceptions>.
>


Good luck.

Am 27. April 2012 16:56 schrieb SWP <co...@unister-gmbh.de>:

> I am dealing with the PageRank example
> from  hama-dist-0.5.0-incubating-**source.tar.gz RC2
> which I downloaded from http://people.apache.org/~**edwardyoon/dist/<http://people.apache.org/%7Eedwardyoon/dist/>
> a few days ago.
>
> My input graph has some "dangling edges", that is, edges pointing to
> non-existing nodes.
> Here are the adjacencies of a small example. The format is:
> source target1 target2 target3 ...
>
> 0    1    2
> 1    1    2
> 2    1    2    3
> 5
>
> Your see that 2 has an edge directed to 3 but there is no adjacency list
> given for 3.
>
> Now, when I run this example through pagerank-text2seq and then the
> pagerank examle, I get a NullPointerException:
>
> 12/04/27 16:15:17 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
> java.lang.NullPointerException
>    at org.apache.hama.graph.**GraphJobRunner.bsp(**GraphJobRunner.java:96)
>    at org.apache.hama.bsp.**LocalBSPRunner$BSPRunner.run(**
> LocalBSPRunner.java:256)
>    at org.apache.hama.bsp.**LocalBSPRunner$BSPRunner.call(**
> LocalBSPRunner.java:286)
>    at org.apache.hama.bsp.**LocalBSPRunner$BSPRunner.call(**
> LocalBSPRunner.java:1)
>    at java.util.concurrent.**FutureTask$Sync.innerRun(**
> FutureTask.java:303)
>    at java.util.concurrent.**FutureTask.run(FutureTask.**java:138)
>    at java.util.concurrent.**ThreadPoolExecutor$Worker.**
> runTask(ThreadPoolExecutor.**java:886)
>    at java.util.concurrent.**ThreadPoolExecutor$Worker.run(**
> ThreadPoolExecutor.java:908)
>    at java.lang.Thread.run(Thread.**java:662)
>
> The problem appears to be that when GraphJobRunner's bsp() method  looks
> up the vertex to which the message is addressed,
> it is not found in the vertices map.
> (By the way, if you replace 5 with 3 in the example, it works - because
> then the target vertex can be looked up.)
>
> See the vertices.get(e.getKey()) statement in the code snippet below.
> Of course one can avoid the exception by adding a check in
> GraphJobRunner.java (at line about 95) like this:
>
>        if(vertices.containsKey(e.**getKey()))
>        {
>            vertices.get(e.getKey()).**compute(msgs.iterator());
>        } else {
>            System.out.println("Ignoring message(s) '" + msgs + "' sent to
> vertex '" + e.getKey() +"'");
>        }
>
> However, what I really want is:
> check within PageRank.PageRankVertex's compute() method whether the target
> vertex exists
> before sending out a message to it.
>
> That is, in PageRank.java (line 60) , instead of
>
>                sendMessageToNeighbors(new DoubleWritable(this.getValue()**.get()
> / numEdges));
>
> I would like to send messages only  to "existing" vertices, that is,
>  those which have an adjacency list in the input.
>
> Any hints how this can be achieved?
> I appears that I am not supposed to access the vertices field of
> GraphJobRunner class in some way from within the PageRank.PageRankVertex
> class?
>
> I concede that my example graph may qualify as invalid input ... but on
> the other hand: how could I add those missing vertices after a first pass
> through the adjacency lists input?
>
> Clemens Gröpl
>
> --
>
> Semantic Web Project, IT
>
> Unister GmbH
> Barfußgäßchen 11 | 04109 Leipzig
>
> Telefon: +49 (0)341 49288 4496
> contact-semweb@unister-gmbh.de <ma...@unister-gmbh.de>
> >
> www.unister.de <http://www.unister.de>
>
> Vertretungsberechtigter Geschäftsführer: Thomas Wagner
> Amtsgericht Leipzig, HRB: 19056
>
>


-- 
Thomas Jungblut
Berlin <th...@gmail.com>