Posted to dev@hama.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2014/01/10 06:34:57 UTC

FYI, Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems

Just FYI,

https://cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/proj/F13/JPV.pdf

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: FYI, Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems

Posted by song bai <ba...@gmail.com>.
Thanks, I will do my best.

2014/1/10, Tommaso Teofili <to...@gmail.com>:
> cool stuff, looking forward to your contributions!
> Tommaso
>
>
> 2014/1/10 Edward J. Yoon <ed...@apache.org>
>
>> Hello,
>>
>> First of all, please create a JIRA ID[1] if you don't already have
>> one. Then you can create a JIRA ticket[2] to start contributing
>> ideas and patches[3].
>>
>> We look forward to your contributions!
>>
>> 1. https://issues.apache.org/jira/secure/Signup!default.jspa
>> 2. https://issues.apache.org/jira/browse/HAMA
>> 3. http://wiki.apache.org/hama/HowToContribute
>>
>> On Fri, Jan 10, 2014 at 5:06 PM, song bai <ba...@gmail.com> wrote:
>> > Dear Edward J. Yoon
>> >
>> > I have read and modified most of the source code of hama-0.6.0. For example:
>> > 1. I added a combiner that runs when a peer sends messages to other peers.
>> > 2. Messages sent from superstep i to superstep (i+1) within the same
>> > BspPeer no longer use the default RPC; they are passed through memory.
>> > 3. I have implemented some algorithms, such as WCC, top-k, and incremental
>> > PageRank.
>> >
>> > Unfortunately, I have found some mistakes in the source code. I think the
>> > most important one is this:
>> > In the Pregel paper, the job terminates when all vertices are
>> > simultaneously inactive and there are no messages in transit,
>> > but Hama only considers whether the vertices are active or not.
>> >
>> > I want to become a contributor to the Hama project, and I am now studying
>> > the Hama documentation.
>> > Can you give me some suggestions? I look forward to your reply.
>> >
>> >
>> >
>> >
>> >
>> > On Fri, Jan 10, 2014 at 1:34 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >
>> >> Just FYI,
>> >>
>> >> https://cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/proj/F13/JPV.pdf
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >> @eddieyoon
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>
>

Re: FYI, Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems

Posted by Tommaso Teofili <to...@gmail.com>.
cool stuff, looking forward to your contributions!
Tommaso


2014/1/10 Edward J. Yoon <ed...@apache.org>

> Hello,
>
> First of all, please create a JIRA ID[1] if you don't already have
> one. Then you can create a JIRA ticket[2] to start contributing
> ideas and patches[3].
>
> We look forward to your contributions!
>
> 1. https://issues.apache.org/jira/secure/Signup!default.jspa
> 2. https://issues.apache.org/jira/browse/HAMA
> 3. http://wiki.apache.org/hama/HowToContribute
>
> On Fri, Jan 10, 2014 at 5:06 PM, song bai <ba...@gmail.com> wrote:
> > Dear Edward J. Yoon
> >
> > I have read and modified most of the source code of hama-0.6.0. For example:
> > 1. I added a combiner that runs when a peer sends messages to other peers.
> > 2. Messages sent from superstep i to superstep (i+1) within the same
> > BspPeer no longer use the default RPC; they are passed through memory.
> > 3. I have implemented some algorithms, such as WCC, top-k, and incremental
> > PageRank.
> >
> > Unfortunately, I have found some mistakes in the source code. I think the
> > most important one is this:
> > In the Pregel paper, the job terminates when all vertices are
> > simultaneously inactive and there are no messages in transit,
> > but Hama only considers whether the vertices are active or not.
> >
> > I want to become a contributor to the Hama project, and I am now studying
> > the Hama documentation.
> > Can you give me some suggestions? I look forward to your reply.
> >
> >
> >
> >
> >
> > On Fri, Jan 10, 2014 at 1:34 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> >
> >> Just FYI,
> >>
> >> https://cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/proj/F13/JPV.pdf
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>

Re: FYI, Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems

Posted by "Edward J. Yoon" <ed...@apache.org>.
Hello,

First of all, please create a JIRA ID[1] if you don't already have
one. Then you can create a JIRA ticket[2] to start contributing
ideas and patches[3].

We look forward to your contributions!

1. https://issues.apache.org/jira/secure/Signup!default.jspa
2. https://issues.apache.org/jira/browse/HAMA
3. http://wiki.apache.org/hama/HowToContribute

On Fri, Jan 10, 2014 at 5:06 PM, song bai <ba...@gmail.com> wrote:
> Dear Edward J. Yoon
>
> I have read and modified most of the source code of hama-0.6.0. For example:
> 1. I added a combiner that runs when a peer sends messages to other peers.
> 2. Messages sent from superstep i to superstep (i+1) within the same
> BspPeer no longer use the default RPC; they are passed through memory.
> 3. I have implemented some algorithms, such as WCC, top-k, and incremental
> PageRank.
>
> Unfortunately, I have found some mistakes in the source code. I think the
> most important one is this:
> In the Pregel paper, the job terminates when all vertices are
> simultaneously inactive and there are no messages in transit,
> but Hama only considers whether the vertices are active or not.
>
> I want to become a contributor to the Hama project, and I am now studying
> the Hama documentation.
> Can you give me some suggestions? I look forward to your reply.
>
>
>
>
>
> On Fri, Jan 10, 2014 at 1:34 PM, Edward J. Yoon <ed...@apache.org> wrote:
>
>> Just FYI,
>>
>> https://cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/proj/F13/JPV.pdf
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: FYI, Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems

Posted by song bai <ba...@gmail.com>.
Dear Edward J. Yoon

I have read and modified most of the source code of hama-0.6.0. For example:
1. I added a combiner that runs when a peer sends messages to other peers.
2. Messages sent from superstep i to superstep (i+1) within the same
BspPeer no longer use the default RPC; they are passed through memory.
3. I have implemented some algorithms, such as WCC, top-k, and incremental
PageRank.

Unfortunately, I have found some mistakes in the source code. I think the
most important one is this:
In the Pregel paper, the job terminates when all vertices are
simultaneously inactive and there are no messages in transit,
but Hama only considers whether the vertices are active or not.

I want to become a contributor to the Hama project, and I am now studying
the Hama documentation.
Can you give me some suggestions? I look forward to your reply.
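
The termination rule described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: it assumes the
master knows, after each superstep, how many vertices are still active and
how many messages are still in transit. The names SuperstepState,
shouldTerminate and shouldTerminateOnVerticesOnly are hypothetical and are
not part of Hama's API; the second method merely mirrors the behaviour
reported in this message and is not copied from Hama's source.

```java
// Hypothetical sketch; these names are illustrative and are not Hama's API.
final class SuperstepState {
    // Vertices that have not voted to halt at the end of the superstep.
    final long activeVertices;
    // Messages sent in this superstep that are still to be delivered.
    final long messagesInTransit;

    SuperstepState(long activeVertices, long messagesInTransit) {
        this.activeVertices = activeVertices;
        this.messagesInTransit = messagesInTransit;
    }

    // Pregel rule: stop only when no vertex is active AND no message is pending,
    // because a pending message would reactivate its destination vertex.
    boolean shouldTerminate() {
        return activeVertices == 0 && messagesInTransit == 0;
    }

    // The reported behaviour: only vertex activity is checked, so the job
    // could stop while messages are still waiting to be delivered.
    boolean shouldTerminateOnVerticesOnly() {
        return activeVertices == 0;
    }
}
```

With zero active vertices but three pending messages, the first check keeps
the job running while the second would stop it, which is the difference
being reported here.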





On Fri, Jan 10, 2014 at 1:34 PM, Edward J. Yoon <ed...@apache.org> wrote:

> Just FYI,
>
> https://cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/proj/F13/JPV.pdf
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>

Re: FYI, Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems

Posted by song bai <ba...@gmail.com>.
I also encountered those failures when running hama-0.6.0. I think there are
two problems in Hama.
(1) Hama loads the data into memory for processing, so large inputs may
cause the JVM to run out of memory. The solution is to set
"bsp.child.java.opts" in hama-site.xml as large as your machine allows,
for example:
  <property>
    <name>bsp.child.java.opts</name>
    <value>-Xmx4096m</value>
  </property>

(2) For PageRank, one BSPPeer may send a larger volume of messages than for
SSSP after finishing one superstep (as you can see, on the same largeEWD
data, SSSP succeeds but PageRank fails), and the user-defined combiner is not
used to reduce the number of messages when sending them, so the RPC may
fail because of the large number of messages.
The solution is to modify the org.apache.hama.graph.GraphJobRunner and
org.apache.hama.graph.Vertex classes to apply a combiner when sending
messages.
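
As a rough illustration of what combining before sending means, the sketch
below merges all outgoing messages addressed to the same vertex into a
single summed value, which is the usual choice for PageRank contributions.
It is a standalone example, not the actual change to GraphJobRunner or
Vertex: OutgoingMessage and SumCombinerSketch are hypothetical names, and
the use of a sum is an assumption that suits PageRank but not every
algorithm.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical message type; not a Hama class.
final class OutgoingMessage {
    final String targetVertex; // ID of the destination vertex
    final double value;        // e.g. a PageRank contribution

    OutgoingMessage(String targetVertex, double value) {
        this.targetVertex = targetVertex;
        this.value = value;
    }
}

// Hypothetical sender-side combiner; not Hama's Combiner API.
final class SumCombinerSketch {
    // Collapse all messages bound for the same vertex into one summed value,
    // so at most one message per destination vertex goes over the wire.
    static Map<String, Double> combine(List<OutgoingMessage> outgoing) {
        Map<String, Double> combined = new LinkedHashMap<>();
        for (OutgoingMessage m : outgoing) {
            combined.merge(m.targetVertex, m.value, Double::sum);
        }
        return combined;
    }
}
```

A peer would call SumCombinerSketch.combine(...) on its outgoing buffer at
the end of a superstep and send one entry per destination vertex instead of
one message per edge.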

With these two changes, I have solved the large-data problem.
Good luck!


On Mon, Jan 13, 2014 at 12:14 AM, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:

> By the way, is anyone aware of what kind of problems caused the
> PageRank failures highlighted in the linked slides (or does anyone know
> who we can ask)?
>
> Tommaso
>
>
> 2014/1/10 Edward J. Yoon <ed...@apache.org>
>
> > Just FYI,
> >
> > https://cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/proj/F13/JPV.pdf
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>

Re: FYI, Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems

Posted by Chia-Hung Lin <cl...@googlemail.com>.
I'm not very sure, but it seems JUnitBenchmarks can be integrated with Jenkins.

On 13 January 2014 17:05, Tommaso Teofili <to...@gmail.com> wrote:
> Thanks Song Bai and Ed for your replies, looking forward to Song's
> contributions and HAMA-843/816 to be done.
>
> Tommaso
>
> p.s.:
> I think we need a way of continuously benchmarking our trunk (e.g. set up 2+
> machines in distributed mode and run tests / benchmarks against them via
> Jenkins, but I don't know if that's really feasible via ASF Jenkins).
>
>
>
> 2014/1/13 Edward J. Yoon <ed...@apache.org>
>
>> Once HAMA-843 is committed, PageRank performance will be dramatically
>> improved.
>>
>> The scalability issue is related to the in-memory VerticesInfo and
>> Queue. DiskVerticesInfo is now available. The Disk/Spilling Queue issues
>> will be fixed soon.
>>
>> Also, the graph package's performance can be improved further
>> with HAMA-816.
>>
>> On Mon, Jan 13, 2014 at 1:14 AM, Tommaso Teofili
>> <to...@gmail.com> wrote:
>> > By the way, is anyone aware of what kind of problems caused the
>> > PageRank failures highlighted in the linked slides (or does anyone know
>> > who we can ask)?
>> >
>> > Tommaso
>> >
>> >
>> > 2014/1/10 Edward J. Yoon <ed...@apache.org>
>> >
>> >> Just FYI,
>> >>
>> >> https://cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/proj/F13/JPV.pdf
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >> @eddieyoon
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>

Re: FYI, Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems

Posted by Tommaso Teofili <to...@gmail.com>.
Thanks Song Bai and Ed for your replies, looking forward to Song's
contributions and HAMA-843/816 to be done.

Tommaso

p.s.:
I think we need a way of continuously benchmarking our trunk (e.g. set up 2+
machines in distributed mode and run tests / benchmarks against them via
Jenkins, but I don't know if that's really feasible via ASF Jenkins).



2014/1/13 Edward J. Yoon <ed...@apache.org>

> Once HAMA-843 is committed, PageRank performance will be dramatically
> improved.
>
> The scalability issue is related to the in-memory VerticesInfo and
> Queue. DiskVerticesInfo is now available. The Disk/Spilling Queue issues
> will be fixed soon.
>
> Also, the graph package's performance can be improved further
> with HAMA-816.
>
> On Mon, Jan 13, 2014 at 1:14 AM, Tommaso Teofili
> <to...@gmail.com> wrote:
> > By the way, is anyone aware of what kind of problems caused the
> > PageRank failures highlighted in the linked slides (or does anyone know
> > who we can ask)?
> >
> > Tommaso
> >
> >
> > 2014/1/10 Edward J. Yoon <ed...@apache.org>
> >
> >> Just FYI,
> >>
> >> https://cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/proj/F13/JPV.pdf
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>

Re: FYI, Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems

Posted by "Edward J. Yoon" <ed...@apache.org>.
Once HAMA-843 is committed, PageRank performance will be dramatically improved.

The scalability issue is related to the in-memory VerticesInfo and
Queue. DiskVerticesInfo is now available. The Disk/Spilling Queue issues
will be fixed soon.

Also, the graph package's performance can be improved further
with HAMA-816.

On Mon, Jan 13, 2014 at 1:14 AM, Tommaso Teofili
<to...@gmail.com> wrote:
> By the way, is anyone aware of what kind of problems caused the
> PageRank failures highlighted in the linked slides (or does anyone know
> who we can ask)?
>
> Tommaso
>
>
> 2014/1/10 Edward J. Yoon <ed...@apache.org>
>
>> Just FYI,
>>
>> https://cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/proj/F13/JPV.pdf
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: FYI, Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems

Posted by Tommaso Teofili <to...@gmail.com>.
By the way, is anyone aware of what kind of problems caused the
PageRank failures highlighted in the linked slides (or does anyone know
who we can ask)?

Tommaso


2014/1/10 Edward J. Yoon <ed...@apache.org>

> Just FYI,
>
> https://cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/proj/F13/JPV.pdf
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>