Posted to dev@hama.apache.org by Shuo Wang <ec...@gmail.com> on 2012/10/24 08:49:54 UTC

PageRank Experiment Iteration

Hi,

I have run PageRank on HAMA with the maximum iteration count set to 20, but
it ran 48 supersteps. Why?

Re: PageRank Experiment Iteration

Posted by Shuo Wang <ec...@gmail.com>.
Hi,

I have changed the program that creates random input for SSSP so it
generates random graphs for PageRank. My cluster has 10 nodes, each with
8 GB of memory; there are 45 tasks, each with 512 MB of memory, and I set
the groom memory to 2000 MB. The largest input I can run now is 133 MB;
anything larger fails with an OUTOFMEMORY error.

What's more, I find it also depends on the number of vertices and edges.
For example, 500,000 vertices with 32 edges each runs and gives the correct
result; 1,000,000 vertices with 4 edges each either fails or produces NULL
or infinity.
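
For reference, the raw text size of such graphs can be estimated from the
input format; the digit counts below are assumptions about the generated
ids, not measured values. This shows the 500,000-vertex case sits near the
133 MB limit while the failing million-vertex case is much smaller on disk:

```java
public class InputSizeEstimate {

    public static void main(String[] args) {
        // 500,000 vertices, 32 edges each, ids up to ~6 digits
        System.out.println(estimateMb(500_000, 32, 6) + " MB");
        // 1,000,000 vertices, 4 edges each, ids up to ~7 digits
        System.out.println(estimateMb(1_000_000, 4, 7) + " MB");
        // The million-vertex graph is far smaller on disk, so its failure
        // is not a raw input-size problem but a per-vertex overhead one.
    }

    // One vertex per line: id plus separator, then one id per out-edge.
    static long estimateMb(long vertices, long edgesPerVertex, int digitsPerId) {
        long bytesPerField = digitsPerId + 1;               // id plus tab/newline
        long bytesPerLine = bytesPerField * (1 + edgesPerVertex);
        return vertices * bytesPerLine / (1024 * 1024);
    }
}
```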


Re: PageRank Experiment Iteration

Posted by Thomas Jungblut <th...@gmail.com>.
512 MB is tight. A datanode normally consumes 1 GB of memory, so if you
start a groom on the same machine there is not much room left for it. I
don't think it will run well on these machines (if at all).
Pretty much nothing here is disk-based, so you need memory to scale out
(unlike in MapReduce). We want to enable disk-based processing, but it will
take more time to get there.

I have written a rough sketch of what should be done to make Hama more
scalable:
https://docs.google.com/document/d/1Fud5zSFuKDAEz3E8T59ldZtg1H-IMx2CQGbn_bib_eA/edit

But this is future work, and maybe not all of it scales well to many more
machines.
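
On memory-starved nodes, one mitigation is to trade task slots for heap:
run fewer concurrent tasks per groom, each with a larger JVM. A sketch for
hama-site.xml; the property names below are taken to mirror
hama-default.xml and should be verified against your HAMA version:

```xml
<!-- hama-site.xml: sketch, property names assumed from hama-default.xml -->
<property>
  <name>bsp.child.java.opts</name>
  <value>-Xmx1024m</value> <!-- heap per BSP task JVM -->
</property>
<property>
  <name>bsp.tasks.maximum</name>
  <value>2</value> <!-- fewer concurrent tasks per groom server -->
</property>
```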


Re: PageRank Experiment Iteration

Posted by Shuo Wang <ec...@gmail.com>.
I have tried it on our cluster as you suggested, but the result is wrong:
there are no scores for the nodes. The same error as I had before.


Re: PageRank Experiment Iteration

Posted by Shuo Wang <ec...@gmail.com>.
Thank you, let me try!


Re: PageRank Experiment Iteration

Posted by Thomas Jungblut <th...@gmail.com>.
Yes, I generated it for an algorithm on movie actors (to calculate Kevin
Bacon numbers).
However, as I already told you, you can rewrite the generator MapReduce job
that creates random input for SSSP:
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/RandomGraphGenerator.java

Basically you have to remove the weights that RandomMapper outputs.
So instead of

s += Long.toString(rowId) + ":" + rand.nextInt(100) + "\t";

you would do:

s += Long.toString(rowId) + "\t";

Of course you could also use a StringBuilder instead of +=, but String
concatenation usually isn't a bottleneck in MapReduce ;)
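
A minimal standalone sketch of the two output forms side by side; the
vertex ids and weights here are made up for illustration, and the full
MapReduce job lives in the RandomGraphGenerator linked above:

```java
public class UnweightedLineSketch {

    public static void main(String[] args) {
        long[] targets = {3, 17, 42}; // hypothetical adjacent vertex ids
        int[] weights = {5, 80, 12};  // weights the SSSP generator would emit

        // Weighted SSSP form: each edge is "target:weight", tab-separated.
        StringBuilder sssp = new StringBuilder("0");
        // Unweighted PageRank form: each edge is just "target".
        StringBuilder pagerank = new StringBuilder("0");
        for (int i = 0; i < targets.length; i++) {
            sssp.append("\t").append(targets[i]).append(":").append(weights[i]);
            pagerank.append("\t").append(targets[i]);
        }
        System.out.println("SSSP line:     " + sssp);
        System.out.println("PageRank line: " + pagerank);
    }
}
```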


Re: PageRank Experiment Iteration

Posted by Shuo Wang <ec...@gmail.com>.
Did you generate the data yourself? Could you provide the data generator
for me?


Re: PageRank Experiment Iteration

Posted by Thomas Jungblut <th...@gmail.com>.
12 GB; it uses several times (up to 10?) more memory than the dataset size.
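
Taking that multiplier at face value (it is a rough observation, not a
guarantee), a quick estimate puts the earlier 200 MB failure in context.
The numbers plugged in below come from the cluster described in this
thread:

```java
public class ClusterHeapEstimate {

    public static void main(String[] args) {
        double datasetGb = 0.2;   // the 200 MB input that failed
        int multiplier = 10;      // upper bound from the observation above

        double neededGb = datasetGb * multiplier;  // ~2 GB cluster-wide
        double availableGb = 45 * 512 / 1024.0;    // 45 tasks x 512 MB = 22.5 GB

        System.out.printf("need ~%.1f GB, have %.1f GB total%n",
                neededGb, availableGb);
        // Total heap looks sufficient, but each task holds only its own
        // partition: a skewed or message-heavy partition can still exceed
        // a single 512 MB task heap even when the cluster-wide sum is fine.
    }
}
```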


Re: PageRank Experiment Iteration

Posted by Shuo Wang <ec...@gmail.com>.
How large is your data? Our cluster has 10 nodes and 45 tasks; each task
has 512 MB of memory. But when I run on 200 MB of data, it fails with an
OUTOFMEMORY error.


Re: PageRank Experiment Iteration

Posted by Thomas Jungblut <th...@gmail.com>.
Sure it runs, if you have enough RAM ;)


Re: PageRank Experiment Iteration

Posted by Shuo Wang <ec...@gmail.com>.
How much data have you run PageRank on in HAMA? Does it run? I want to run
PageRank on large data in HAMA, but it always fails.


Re: PageRank Experiment Iteration

Posted by Thomas Jungblut <th...@gmail.com>.
Yes, it works on any directed graph.
The best format to use is

Vertex <\t> AdjacentVertex1 <\t> AdjacentVertex2 etc.

So you have an adjacency list, and each line represents one vertex.
This format is splittable, which the web-Google dataset is not.
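
To illustrate the format (a sketch; the actual reader class in a given
HAMA version may differ), a line parses like this:

```java
public class AdjacencyLineParse {

    public static void main(String[] args) {
        // One vertex per line: its id, then tab-separated adjacent vertex ids.
        String line = "12\t34\t56\t78";
        String[] parts = line.split("\t");
        String vertexId = parts[0];
        int outDegree = parts.length - 1;
        System.out.println("vertex " + vertexId + " has " + outDegree + " out-edges");
        // Because each vertex is a single line, an input split can begin at
        // any newline boundary -- that is what makes the format splittable.
    }
}
```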


Re: PageRank Experiment Iteration

Posted by Shuo Wang <ec...@gmail.com>.
Thanks! Does PageRank work on any web graph? I generated a random web graph
in the same format as web-Google.txt, but the result is infinity.


Re: PageRank Experiment Iteration

Posted by Thomas Jungblut <th...@gmail.com>.
Because graph iterations != supersteps. You have to take the partitioning
into account, plus the time needed to accumulate the number of vertices.
PageRank also requires an additional superstep to run aggregators.
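
One way to see how 20 iterations can reach 48 supersteps is a toy model.
The constants below are assumptions chosen to land in the observed range,
not HAMA internals; the real breakdown depends on the version:

```java
public class SuperstepCount {

    public static void main(String[] args) {
        int iterations = 20;
        int perIteration = 2; // assumed: one compute step plus one aggregator sync
        int fixedSetup = 8;   // assumed: partitioning, vertex counting, load/output
        int total = fixedSetup + iterations * perIteration;
        System.out.println(total + " supersteps"); // 48 under these assumptions
    }
}
```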
