Posted to dev@mahout.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2009/06/22 08:50:01 UTC

FYI, Large-scale graph computing at Google

http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html
-- It sounds like Pregel is a computing framework based on dynamic
programming for graph operations. I guess they removed the file
communication/intermediate files between iterations.

Anyway, what do you think?
-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: FYI, Large-scale graph computing at Google

Posted by Ted Dunning <te...@gmail.com>.

I would find that very interesting as well.

MR graph processing is fine for algorithms whose number of MR iterations
scales with the diameter of the graph, provided they are applied to very
large graphs with small diameter (such as the small-world graphs we all
see). Various tricks, like keeping first- and second-order links and
duplicating nodes, can cut the constants but do not change the basic
asymptotic costs.
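
To make the iteration count concrete, here is a minimal single-process
sketch in Python (not Hadoop code) of breadth-first search in which each
call to bfs_round stands in for one map-reduce job. The function names
and the dictionary-based graph are assumptions made for illustration
only. The number of rounds equals the distance to the farthest reachable
node plus one final pass that finds nothing new, so it is bounded by the
graph diameter plus one.

```python
from collections import defaultdict


def bfs_round(edges, dist, frontier, round_no):
    """Simulate one map-reduce pass: expand the frontier by a single hop."""
    # "map": each frontier node proposes round_no as the distance of its neighbours
    proposals = defaultdict(list)
    for node in frontier:
        for nbr in edges.get(node, ()):
            proposals[nbr].append(round_no)
    # "reduce": accept a proposal only for nodes not reached in an earlier round
    new_frontier = set()
    for node, candidates in proposals.items():
        if node not in dist:
            dist[node] = min(candidates)
            new_frontier.add(node)
    return new_frontier


def bfs_distances(edges, source):
    """Run rounds until a pass discovers no new node; return (distances, rounds)."""
    dist = {source: 0}
    frontier = {source}
    rounds = 0
    while frontier:
        rounds += 1
        frontier = bfs_round(edges, dist, frontier, rounds)
    return dist, rounds


if __name__ == "__main__":
    # A 4-node path: the farthest node is 3 hops away, so 3 + 1 rounds are needed.
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(bfs_distances(path, "a"))
```

On a real cluster every round pays the full job start-up and shuffle
cost, which is why a small diameter matters more than the raw graph size.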

I agree that it isn't clear whether Pregel is a layer over map-reduce or an
entirely new paradigm.  It would be interesting to know that.  It would also
be interesting to hear how people are attacking the problem of graph
algorithms on large scale clusters, even in advance of getting results.

On Thu, Jun 25, 2009 at 10:39 AM, Patterson, Josh <jp...@tva.gov>wrote:

> I'm a little lost here; Is this a replacement for M/R or is it some new
> code that sits ontop of M/R that runs an iteration over some sort of
> graph's vertexes? My quick scan of Google's article didn't seem to yeild
> a distinction. Either way, I'd say for our data that a graph processing
> lib for M/R would be interesting.
>

Re: FYI, Large-scale graph computing at Google

Posted by Steve Loughran <st...@apache.org>.

Patterson, Josh wrote:
> Steve,
> I'm a little lost here; Is this a replacement for M/R or is it some new
> code that sits ontop of M/R that runs an iteration over some sort of
> graph's vertexes? My quick scan of Google's article didn't seem to yeild
> a distinction. Either way, I'd say for our data that a graph processing
> lib for M/R would be interesting.
> 

I'm thinking of graph algorithms that get implemented as MR jobs; work 
with HDFS, HBase, etc.

RE: FYI, Large-scale graph computing at Google

Posted by "Patterson, Josh" <jp...@tva.gov>.

Steve,
I'm a little lost here; is this a replacement for M/R, or is it some new
code that sits on top of M/R and runs an iteration over some sort of
graph's vertices? My quick scan of Google's article didn't seem to yield
a distinction. Either way, I'd say for our data that a graph processing
lib for M/R would be interesting.

Josh Patterson
TVA

-----Original Message-----
From: Steve Loughran [mailto:stevel@apache.org] 
Sent: Thursday, June 25, 2009 5:57 AM
To: core-user@hadoop.apache.org
Subject: Re: FYI, Large-scale graph computing at Google

Edward J. Yoon wrote:
> What do you think about another new computation framework on HDFS?
> 
> On Mon, Jun 22, 2009 at 3:50 PM, Edward J. Yoon <ed...@apache.org> wrote:
>> http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html
>> -- It sounds like Pregel seems, a computing framework based on dynamic
>> programming for the graph operations. I guess maybe they removed the
>> file communications/intermediate files during iterations.
>>
>> Anyway, What do you think?

I have a colleague (paolo) who would be interested in adding a set of 
graph algorithms on top of the MR engine

Re: FYI, Large-scale graph computing at Google

Posted by "Edward J. Yoon" <ed...@apache.org>.

Oh. Thanks. I just realized my typing mistake, I meant Hamburg.

On Mon, Jun 29, 2009 at 7:53 PM, Steve Loughran<st...@apache.org> wrote:
> Edward J. Yoon wrote:
>>
>> I just made a wiki page -- http://wiki.apache.org/hadoop/Hambrug --
>> Let's discuss about the graph computing framework named Hambrug.
>>
>
> ok, first Q, why the Hambrug. To me that's just Hamburg typed wrong, which
> is going to cause lots of confusion.
>
> What about something more graphy? like "descartes"
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: FYI, Large-scale graph computing at Google

Posted by Steve Loughran <st...@apache.org>.

Edward J. Yoon wrote:
> I just made a wiki page -- http://wiki.apache.org/hadoop/Hambrug --
> Let's discuss about the graph computing framework named Hambrug.
> 

ok, first Q, why the Hambrug. To me that's just Hamburg typed wrong, 
which is going to cause lots of confusion.

What about something more graphy? like "descartes"

Re: FYI, Large-scale graph computing at Google

Posted by Steve Loughran <st...@apache.org>.

Delip Rao wrote:
> We've had some success in dealing with locality problems using the adjacency
> list
> representation. This could be serialized using frameworks like Thrift
> or Protocol Buffers.
> For details, please see:
> http://www.clsp.jhu.edu/~delip/nocrawl/textgraphs09.pdf
> 
> I intend to
> continue this line of work and will be very happy to be of any help.

This is an interesting paper. We need to start a wiki page on papers
(pause)

OK, http://wiki.apache.org/hadoop/Papers

I've been thinking recently about how Apache could work better with the
various people doing research on or near Hadoop; you might have some
opinions there. I'm thinking of
  * a mailing list for people doing researchy stuff
  * offering research groups somewhere on SVN
  * offering help to get you integrated with the Apache development
    processes, with the goal being to make it easier for your research to
    get back into the codebase.

This is separate from offering cluster time on any of the datacentres
out there. That's something you need to work out with the various
providers, though Apache may be able to help whenever it knows useful
contacts.

I'm off on holiday/vacation shortly, but this is something I'd like to 
follow up on when I get back

-steve

Re: FYI, Large-scale graph computing at Google

Posted by Ankur Goel <ga...@yahoo-inc.com>.

Cool stuff! In my past experience dealing with user click histories I have tried to model the clicked items as a large connected graph. The strength of the connection between any two items is determined by the number of times they co-occurred across all users' click histories.

The idea is simple and works quite well when processing large datasets on a medium-sized cluster. I also have a patch uploaded to Mahout, http://issues.apache.org/jira/browse/MAHOUT-103, though I haven't had enough time to get it into a committable shape.
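
The co-occurrence weighting described above can be sketched in a few lines of Python. This is only an illustration and is not the code in the MAHOUT-103 patch; the function name, the in-memory list of histories, and the choice to count a pair at most once per history are all assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(histories):
    """Return a Counter mapping an unordered item pair to its edge weight."""
    weights = Counter()
    for items in histories:
        # de-duplicate within one history, then count each unordered pair once
        for a, b in combinations(sorted(set(items)), 2):
            weights[(a, b)] += 1
    return weights


if __name__ == "__main__":
    histories = [["x", "y", "z"], ["y", "x"], ["z"]]
    for pair, weight in sorted(cooccurrence_edges(histories).items()):
        print(pair, weight)
```

In a map-reduce setting the inner loop would be the map step (emit each pair with a count of 1) and the summation would be the reduce step.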

Regards
-Ankur

----- Original Message -----
From: "Michal Laclavik" <la...@savba.sk>
To: common-user@hadoop.apache.org
Sent: Thursday, July 2, 2009 4:51:36 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: Re: FYI, Large-scale graph computing at Google

very interesting discussion ...

we are dealing with processing social networks from email
communication with connection to other objects extracted from the
email.

Extraction of the network works very fine on hadoop, but processing of
the graph  it it is not that easy.
We would like to implement spread activation algorithm over MR but it
is quite dificult.

Anyone tried something with spread activation on Hadoop?


Michal




-- 
S pozdravom
Michal Laclavik
==
Institute of Informatics SAS
email: laclavik.ui@savba.sk
web: http://laclavik.net/

Re: FYI, Large-scale graph computing at Google

Posted by Delip Rao <de...@gmail.com>.

@Ed, the paper is still available at
http://www.clsp.jhu.edu/~delip/nocrawl/textgraphs09.pdf
(Looks like you tried to access it while our server closet had a brief
power outage.)

@Michal, as Ted suggests, this is easy to implement if you run several
iterations. Please look for the algorithm "Label Propagation" in the above
paper for something very close to your needs.

- delip

On Thu, Jul 2, 2009 at 4:21 AM, Michal Laclavik <la...@savba.sk>wrote:

> very interesting discussion ...
>
> we are dealing with processing social networks from email
> communication with connection to other objects extracted from the
> email.
>
> Extraction of the network works very fine on hadoop, but processing of
> the graph  it it is not that easy.
> We would like to implement spread activation algorithm over MR but it
> is quite dificult.
>
> Anyone tried something with spread activation on Hadoop?
>
>
> Michal
>
>
>
> --
> S pozdravom
> Michal Laclavik
> ==
> Institute of Informatics SAS
> email: laclavik.ui@savba.sk
> web: http://laclavik.net/
>

Re: FYI, Large-scale graph computing at Google

Posted by Ted Dunning <te...@gmail.com>.

Michal,

Can you say why it is difficult?

Is it because you have to run many map-reduce iterations?

If you allow many iterations, it seems like a fairly simple map reduce
program for each iteration:

map:  {emit current state keyed by current node, emit activations to
neighbors}
combine: {pass current state unchanged, accumulate activations}
reduce: {accumulate activations}
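
A minimal single-process sketch of one such iteration in Python, with a
dictionary standing in for the shuffle. Spreading activation has several
variants, so the decay factor, the cap at 1.0, and the record layout
(activation, weighted neighbours) are assumptions chosen for
illustration, and the reduce step here also folds the accumulated
activations back into the node state so the output can feed the next
iteration.

```python
from collections import defaultdict

DECAY = 0.5  # assumed attenuation applied to every hop


def map_node(node, record):
    """Emit the node's own state plus an activation message per neighbour."""
    activation, neighbours = record
    yield node, ("state", record)
    for nbr, weight in neighbours.items():
        yield nbr, ("activation", activation * weight * DECAY)


def reduce_node(node, values):
    """Merge the passed-through state with the summed incoming activations."""
    activation, neighbours = 0.0, {}
    incoming = 0.0
    for kind, payload in values:
        if kind == "state":
            activation, neighbours = payload
        else:
            incoming += payload
    return node, (min(1.0, activation + incoming), neighbours)


def run_iteration(graph):
    """One map -> shuffle -> reduce pass over the whole graph."""
    grouped = defaultdict(list)  # stands in for the shuffle/sort phase
    for node, record in graph.items():
        for key, value in map_node(node, record):
            grouped[key].append(value)
    return dict(reduce_node(node, values) for node, values in grouped.items())


if __name__ == "__main__":
    graph = {"a": (1.0, {"b": 1.0}), "b": (0.0, {"c": 0.5}), "c": (0.0, {})}
    for _ in range(2):
        graph = run_iteration(graph)
    print(graph)
```

A combiner would simply pre-sum the "activation" values per key on the
map side and pass "state" values through untouched.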

Clearly, I don't understand something about what you are doing since this
seems pretty simple.

Is it just the number of times this has to run that is the problem?

On Thu, Jul 2, 2009 at 4:21 AM, Michal Laclavik <la...@savba.sk>wrote:

> We would like to implement spread activation algorithm over MR but it
> is quite dificult.
>
> Anyone tried something with spread activation on Hadoop?
>

Re: FYI, Large-scale graph computing at Google

Posted by Michal Laclavik <la...@savba.sk>.

Very interesting discussion ...

We are dealing with processing social networks from email
communication, with connections to other objects extracted from the
emails.

Extraction of the network works very well on Hadoop, but processing
the graph is not that easy.
We would like to implement a spreading activation algorithm over MR, but
it is quite difficult.

Has anyone tried something with spreading activation on Hadoop?


Michal



On Thu, Jul 2, 2009 at 12:36 PM, Edward J. Yoon<ed...@apache.org> wrote:
> Thanks. BTW this link seems broken. Could you send me a paper? ;)
>
> And, We've just begun to design the Hamburg --
> http://wiki.apache.org/hadoop/Hamburg -- any comments are welcome.
>



-- 
S pozdravom
Michal Laclavik
==
Institute of Informatics SAS
email: laclavik.ui@savba.sk
web: http://laclavik.net/

Re: FYI, Large-scale graph computing at Google

Posted by "Edward J. Yoon" <ed...@apache.org>.

Thanks. BTW, this link seems broken. Could you send me the paper? ;)

And we've just begun to design Hamburg --
http://wiki.apache.org/hadoop/Hamburg -- any comments are welcome.

On Wed, Jul 1, 2009 at 1:38 AM, Delip Rao<de...@gmail.com> wrote:
> We've had some success in dealing with locality problems using the adjacency
> list
> representation. This could be serialized using frameworks like Thrift
> or Protocol Buffers.
> For details, please see:
> http://www.clsp.jhu.edu/~delip/nocrawl/textgraphs09.pdf
>
> I intend to
> continue this line of work and will be very happy to be of any help.
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: FYI, Large-scale graph computing at Google

Posted by Delip Rao <de...@gmail.com>.

We've had some success in dealing with locality problems using the
adjacency list representation. This could be serialized using frameworks
like Thrift or Protocol Buffers.
For details, please see:
http://www.clsp.jhu.edu/~delip/nocrawl/textgraphs09.pdf

I intend to continue this line of work and will be very happy to be of
any help.
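
As a rough illustration of the adjacency list representation, the Python
sketch below turns an edge list into one record per vertex, so that a
mapper reading a single record sees a vertex together with all of its
outgoing edges. JSON is used here purely as a stand-in for Thrift or
Protocol Buffers, and the field names "id" and "out" are assumptions; the
format used in the paper may well differ.

```python
import json


def to_adjacency_records(edge_list):
    """Group directed edges by source vertex and serialize one line per vertex."""
    adjacency = {}
    for src, dst in edge_list:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])  # keep vertices without out-edges as records
    return [
        json.dumps({"id": vertex, "out": sorted(nbrs)})
        for vertex, nbrs in sorted(adjacency.items())
    ]


def parse_record(line):
    """Inverse of the serialization above: return (vertex, out-neighbours)."""
    record = json.loads(line)
    return record["id"], record["out"]


if __name__ == "__main__":
    for line in to_adjacency_records([("a", "b"), ("a", "c"), ("c", "a")]):
        print(line)
```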

On Thu, Jun 25, 2009 at 8:24 PM, Edward J. Yoon <ed...@apache.org>wrote:

> I just made a wiki page -- http://wiki.apache.org/hadoop/Hambrug --
> Let's discuss about the graph computing framework named Hambrug.
>

Re: FYI, Large-scale graph computing at Google

Posted by "Edward J. Yoon" <ed...@apache.org>.

I just made a wiki page -- http://wiki.apache.org/hadoop/Hambrug --
Let's discuss the graph computing framework named Hambrug.

On Fri, Jun 26, 2009 at 8:43 AM, Edward J. Yoon<ed...@apache.org> wrote:
> To be honest, I was thought the BigTable (HBase) for the map/reduce
> based graph/matrix operations. The main problems of performance were
> the sequential algorithm, the cost for MR job building in iterations.
> and, the locality of adjacent components. As mentioned on Pregel, If
> some algorithm requires small resources to get result, the BSP model
> based another computing framework on HDFS can be useful for us.
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: FYI, Large-scale graph computing at Google

Posted by "Edward J. Yoon" <ed...@apache.org>.

To be honest, I was thinking of BigTable (HBase) for the map/reduce-based
graph/matrix operations. The main performance problems were the
sequential algorithms, the cost of building an MR job for each iteration,
and the locality of adjacent components. As mentioned for Pregel, if an
algorithm needs only small resources to get its result, another computing
framework on HDFS based on the BSP model could be useful for us.
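
For readers unfamiliar with BSP, here is a minimal single-process Python
sketch of the superstep idea: every vertex computes on the messages it
received, sends new messages, and then all vertices wait at a barrier
before the next superstep. The example propagates the maximum value
through a graph. It is neither Pregel's nor Hamburg's API; the function
name, the in-memory dictionaries, and the max-value task are assumptions
chosen only to show that state stays in memory between supersteps rather
than going through intermediate files.

```python
def run_bsp(neighbours, values, max_supersteps=50):
    """Propagate the maximum value; return (final values, supersteps used)."""
    values = dict(values)
    # superstep 0: every vertex sends its current value to its neighbours
    inbox = {v: [] for v in neighbours}
    for v, nbrs in neighbours.items():
        for n in nbrs:
            inbox[n].append(values[v])
    supersteps = 1
    # keep going while any message is still in flight
    while any(inbox.values()) and supersteps < max_supersteps:
        outbox = {v: [] for v in neighbours}
        for v, messages in inbox.items():
            # a vertex only acts (and sends) when it learns a larger value
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                for n in neighbours[v]:
                    outbox[n].append(values[v])
        inbox = outbox  # barrier: messages become visible in the next superstep
        supersteps += 1
    return values, supersteps


if __name__ == "__main__":
    chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(run_bsp(chain, {"a": 3, "b": 6, "c": 2}))
```

Every vertex named in a neighbour list is expected to appear as a key of
the neighbours dictionary.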

On Fri, Jun 26, 2009 at 3:37 AM, Amandeep Khurana<am...@gmail.com> wrote:
> I've been working on some graph stuff using MR as well. I'd be more than
> interested to chip in as well..
>
> I remember exchanging a few mails with Paolo about having an RDF store over
> HBase and developing graph algorithms over it.
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: FYI, Large-scale graph computing at Google

Posted by Amandeep Khurana <am...@gmail.com>.

I've been working on some graph stuff using MR as well. I'd be more than
interested to chip in as well..

I remember exchanging a few mails with Paolo about having an RDF store over
HBase and developing graph algorithms over it.


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Thu, Jun 25, 2009 at 2:57 AM, Steve Loughran <st...@apache.org> wrote:

> Edward J. Yoon wrote:
>
>> What do you think about another new computation framework on HDFS?
>>
>> On Mon, Jun 22, 2009 at 3:50 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>>
>>>
>>> http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html
>>> -- It sounds like Pregel seems, a computing framework based on dynamic
>>> programming for the graph operations. I guess maybe they removed the
>>> file communications/intermediate files during iterations.
>>>
>>> Anyway, What do you think?
>>>
>>
> I have a colleague (paolo) who would be interested in adding a set of graph
> algorithms on top of the MR engine
>

Re: FYI, Large-scale graph computing at Google

Posted by Steve Loughran <st...@apache.org>.

mike anderson wrote:
> This would be really useful for my current projects. I'd be more than happy
> to help out if needed.
> 

Well, the first bit of code to play with, then, is this:

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/citerank/

The standalone.xml file is the one you want to build and run with; the
other would require you to check out and build two levels up, but gives
you the ability to bring up local or remote clusters to test. Call
run-local to run it locally, which should give you some stats like this:

      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Counters: 11
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:   File Systems
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:     Local bytes read=209445683448
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:     Local bytes written=173943642259
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:   Map-Reduce Framework
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:     Reduce input groups=9985124
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:     Combine output records=34
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:     Map input records=24383448
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:     Reduce output records=16494967
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:     Map output bytes=1243216870
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:     Map input bytes=1528854187
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:     Combine input records=4528655
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:     Map output records=41958636
      [java] 09/06/25 17:09:22 INFO citerank.CiteRankTool:     Reduce input records=37430015

======================================================================
Exiting project "citerank"
======================================================================

BUILD SUCCESSFUL - at 25/06/09 17:09
Total time: 9 minutes 1 second

-- 
Steve Loughran                  http://www.1060.org/blogxter/publish/5
Author: Ant in Action           http://antbook.org/

Re: FYI, Large-scale graph computing at Google

Posted by mike anderson <sa...@gmail.com>.

This would be really useful for my current projects. I'd be more than happy
to help out if needed.

On Thu, Jun 25, 2009 at 5:57 AM, Steve Loughran <st...@apache.org> wrote:

> Edward J. Yoon wrote:
>
>> What do you think about another new computation framework on HDFS?
>>
>> On Mon, Jun 22, 2009 at 3:50 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>>
>>>
>>> http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html
>>> -- It sounds like Pregel seems, a computing framework based on dynamic
>>> programming for the graph operations. I guess maybe they removed the
>>> file communications/intermediate files during iterations.
>>>
>>> Anyway, What do you think?
>>>
>>
> I have a colleague (paolo) who would be interested in adding a set of graph
> algorithms on top of the MR engine
>

Re: FYI, Large-scale graph computing at Google

Posted by Steve Loughran <st...@apache.org>.

Edward J. Yoon wrote:
> What do you think about another new computation framework on HDFS?
> 
> On Mon, Jun 22, 2009 at 3:50 PM, Edward J. Yoon <ed...@apache.org> wrote:
>> http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html
>> -- It sounds like Pregel seems, a computing framework based on dynamic
>> programming for the graph operations. I guess maybe they removed the
>> file communications/intermediate files during iterations.
>>
>> Anyway, What do you think?

I have a colleague (paolo) who would be interested in adding a set of 
graph algorithms on top of the MR engine

Re: FYI, Large-scale graph computing at Google

Posted by "Edward J. Yoon" <ed...@apache.org>.

What do you think about another new computation framework on HDFS?

On Mon, Jun 22, 2009 at 3:50 PM, Edward J. Yoon <ed...@apache.org> wrote:
>
> http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html
> -- It sounds like Pregel seems, a computing framework based on dynamic
> programming for the graph operations. I guess maybe they removed the
> file communications/intermediate files during iterations.
>
> Anyway, What do you think?
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> edwardyoon@apache.org
> http://blog.udanax.org



--
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org