Posted to dev@hama.apache.org by Shuo Wang <ec...@gmail.com> on 2012/10/23 05:04:20 UTC

PageRank Experiment

Hi,

I have run some PageRank experiments on Hama, but the job fails whenever the
input data is larger than 110M. Our cluster has 10 nodes and 45 tasks, and
each task has 1G of memory.
Here is the output:

12/10/22 14:56:31 INFO bsp.FileInputFormat: Total input paths to process : 45
12/10/22 14:56:31 INFO bsp.BSPJobClient: Running job: job_201210221428_0003
12/10/22 14:56:34 INFO bsp.BSPJobClient: Current supersteps number: 0
12/10/22 14:56:43 INFO bsp.BSPJobClient: Current supersteps number: 2
12/10/22 14:57:46 INFO bsp.BSPJobClient: Job failed.

Re: PageRank Experiment

Posted by Thomas Jungblut <th...@gmail.com>.

Exactly, for PageRank there is none yet.
You can use the MapReduce generator for SSSP as a baseline, though:
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/RandomGraphGenerator.java

2012/10/23 Edward J. Yoon <ed...@apache.org>

> Unfortunately, we don't have a random matrix/graph generator yet. Do you
> want to implement it?

Re: PageRank Experiment

Posted by "Edward J. Yoon" <ed...@apache.org>.

Unfortunately, we don't have a random matrix/graph generator yet. Do you
want to implement it?

On Tue, Oct 23, 2012 at 6:38 PM, Shuo Wang <ec...@gmail.com> wrote:
> Do you have a program to generate the PageRank data? If so, could you
> share it with me? It would be best if it came with a description of how
> to use it. I think my generator program has some problems, so it
> can't generate the right web graph.



-- 
Best Regards, Edward J. Yoon
@eddieyoon
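
For readers looking for a starting point on the generator requested in this thread, here is a minimal Python sketch, not part of Hama. It assumes the PageRank job reads a text file with one vertex per line: the vertex ID followed by the tab-separated IDs of its outgoing neighbors. Check the input format your Hama version expects before relying on it. The names generate_web_graph and write_graph are hypothetical, and uniformly random edges are only a rough stand-in for a real web graph.

```python
import random


def generate_web_graph(num_vertices, max_out_degree, seed=0):
    """Yield one adjacency line per vertex: 'v<TAB>n1<TAB>n2...'.

    Assumed (not confirmed) input layout: vertex ID, then its outgoing
    neighbor IDs, all separated by tabs.
    """
    if num_vertices < 2 or max_out_degree < 1:
        raise ValueError("need at least 2 vertices and max_out_degree >= 1")
    rng = random.Random(seed)  # seeded, so a run can be reproduced
    limit = min(max_out_degree, num_vertices - 1)
    for v in range(num_vertices):
        # Every vertex gets at least one out-edge, so no page is dangling.
        degree = rng.randint(1, limit)
        # Sample from n-1 slots and shift past v to rule out self-loops.
        picks = rng.sample(range(num_vertices - 1), degree)
        neighbors = sorted(u if u < v else u + 1 for u in picks)
        yield "\t".join(str(x) for x in [v, *neighbors])


def write_graph(path, num_vertices, max_out_degree, seed=0):
    """Write the generated graph to a text file, one vertex per line."""
    with open(path, "w") as out:
        for line in generate_web_graph(num_vertices, max_out_degree, seed):
            out.write(line + "\n")
```

Calling write_graph("pagerank-input.txt", 1000000, 20) would produce a file whose size can be scaled up or down through the two numeric arguments, which helps when probing where the job starts to fail.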

Re: PageRank Experiment

Posted by Shuo Wang <ec...@gmail.com>.

Do you have a program to generate the PageRank data? If so, could you
share it with me? It would be best if it came with a description of how
to use it. I think my generator program has some problems, so it
can't generate the right web graph.

2012/10/23 Edward J. Yoon <ed...@apache.org>

> Please add the property below to hama-site.xml and retry.
>
>   <property>
>     <name>hama.graph.multi.step.partitioning.interval</name>
>     <value>3000000</value>
>   </property>
>
> P.S. Keep in mind that a graph job runs in memory, so there is a capacity
> limit.

Re: PageRank Experiment

Posted by "Edward J. Yoon" <ed...@apache.org>.

Please add the property below to hama-site.xml and retry.

  <property>
    <name>hama.graph.multi.step.partitioning.interval</name>
    <value>3000000</value>
  </property>

P.S. Keep in mind that a graph job runs in memory, so there is a capacity limit.

On Tue, Oct 23, 2012 at 3:57 PM, Shuo Wang <ec...@gmail.com> wrote:
> Yes! I have used the data from Stanford SNAP; the largest dataset is 106M,
> and they all work.



-- 
Best Regards, Edward J. Yoon
@eddieyoon
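
If the property above has to be applied on several nodes, the edit can be scripted. The following Python sketch assumes hama-site.xml uses the Hadoop-style layout, a <configuration> root containing <property> elements with <name> and <value> children. The helper name set_property is hypothetical.

```python
import xml.etree.ElementTree as ET


def set_property(site_xml, name, value):
    """Return the XML text with property `name` set to `value`.

    Updates the property if it already exists, otherwise appends it.
    Assumes a <configuration> root with <property> children.
    """
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:  # tolerate a property with no <value>
                value_el = ET.SubElement(prop, "value")
            value_el.text = str(value)
            break
    else:
        # No existing entry was found, so add a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, set_property(text, "hama.graph.multi.step.partitioning.interval", 3000000) returns the updated document as a string. Writing it back to disk and restarting the daemons is left to the caller.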

Re: PageRank Experiment

Posted by Shuo Wang <ec...@gmail.com>.

Yes! I have used the data from Stanford SNAP; the largest dataset is 106M,
and they all work.

2012/10/23 Edward J. Yoon <ed...@apache.org>

> Hi,
>
> Does it work if the data is smaller than 100M?

Re: PageRank Experiment

Posted by "Edward J. Yoon" <ed...@apache.org>.

Hi,

Does it work if the data is smaller than 100M?

On Tue, Oct 23, 2012 at 12:04 PM, Shuo Wang <ec...@gmail.com> wrote:
> Hi,
>
> I have run some PageRank experiments on Hama, but the job fails whenever the
> input data is larger than 110M. Our cluster has 10 nodes and 45 tasks, and
> each task has 1G of memory.
> Here is the output:
>
> 12/10/22 14:56:31 INFO bsp.FileInputFormat: Total input paths to process : 45
> 12/10/22 14:56:31 INFO bsp.BSPJobClient: Running job: job_201210221428_0003
> 12/10/22 14:56:34 INFO bsp.BSPJobClient: Current supersteps number: 0
> 12/10/22 14:56:43 INFO bsp.BSPJobClient: Current supersteps number: 2
> 12/10/22 14:57:46 INFO bsp.BSPJobClient: Job failed.



-- 
Best Regards, Edward J. Yoon
@eddieyoon