You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-user@hadoop.apache.org by Lukáš Vlček <lu...@gmail.com> on 2009/02/01 20:37:25 UTC

Re: Finding longest path in a graph

Hi,
just my 2 cents.
You are right that MapReduce does not fit to all problems. Expecially, when
it comes to processing graphs. On the other hand I think even in Google they
stick with MapReduce for complex graph processing (and they do it in Yahoo!
as well). I had a chance to talk to one Google engineer and I asked him
exactly this question ("Do you use MapReduce even if it is known that
specific problems can be solved with different architecture in a more
efficient way?"). And the answer was "yes". I don't know if Google engineers
are using different architectures - and I would be surprised if not - but
they probably don't use it at the same scale as they do with MapReduce.
There are several good reasons for this: MapReduce is easy to learn (which
means that even fresh interns can use it very quickly - also in not
efficient way), they probably have very nice visual tools for managing
MapReduce clusters and finally they have a lot of HW resources.

BTW: Andrzej, do you consider contributing your graph processing utilis into
Hadoop or Mahout?

Regards,
Lukas

On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <ma...@gmail.com> wrote:

> Andrzej,
> without deeper understanding of exactly what you are doing, I have a gut
> feeling that a different distributed system might be a better fit for this
> specific task. I assume, you are dealing with very large graphs if you are
> using Hadoop, and you want grid processing. But the linear nature of
> Map/Reduce may make it hard to fit. As the MapReduce paper said, not every
> task can be easily expressed this way.
>
> The other technology I mean is JavaSpaces, of which I usually use the
> GigaSpaces implementation. This allows more complex algorithms. You will
> store your complete graph as an appropriate structure in a JavaSpace, and
> will also restructure it for parallel processing, as outlined in some
> JavaSpaces books. Then you can have as many workers as you want, working on
> individual nodes.
>
> Mark
>
> On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
>
> > Hi,
> >
> > I'm looking for an advice. I need to process a directed graph encoded as
> a
> > list of <from, to> pairs. The goal is to compute a list of longest paths
> in
> > the graph. There is no guarantee that the graph is acyclic, so there
> should
> > be some mechanism to detect cycles.
> >
> > Currently I'm using a simple approach consisting of the following: I
> encode
> > the graph as <vertex, <neighbor, direction, distance>>, and extending the
> > paths by one degree at a time. This means that in order to find the
> longest
> > path of degree N it takes N + 1 map-reduce jobs.
> >
> > Are you perhaps aware of a smarter way to do it? I would appreciate any
> > pointers.
> >
> > --
> > Best regards,
> > Andrzej Bialecki     <><
> >  ___. ___ ___ ___ _ _   __________________________________
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> >
> >
>

-- 
http://blog.lukas-vlcek.com/

Re: Finding longest path in a graph

Posted by Miles Osborne <mi...@inf.ed.ac.uk>.

one thing that people seem to routinely forget is HBase --random
access for MR jobs.  the basic problem with MR is that mappers are
independent of each other and you need to marshall the overall flow to
overcome this (for example, rekey items so that they become localised
within a single mapper/reducer).  HBase / Hypertable etc allow you to
do away with this step, at the cost of network latency.

once you realise this then you are pretty much in standard sequential
programming territory.

Miles

2009/2/1 Ricky Ho <rh...@adobe.com>:
> Yes, Map/Reduce model itself is simple.  But transforming a non-trivial algorithm into Map/Reduce is not simple.
>
> I wonder if there is any effort from the academia to look into build a catalog of how each of our familiar algorithms can be transformed into Map/Reduce, such as ...
> 1) Sorting
> 2) Searching (index tree, hash, geo/spacial search)
> 3) Network/Graph processing (min spanning tree, shortest path, network diameter)
> 4) Computational Geometry (Ray tracing, Convex Hull)
> 5) Optimization problem (Maximum flow, Linear programming, Hill climbing)
> 6) Machine learning (logistic regression, nearest neighbor, cluster, Bayesian classification, decision tree, neural network)
>
> It is also important to recognize there are other models in our tool box (besides map/reduce) for parallelizing an algorithm, such as ...
> a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace)
> b) Dependency graph / Data flow programming (e.g. Dryad)
> c) MPI
> d) Multi-thread / Shared memory model
> e) ... anything else ...
>
> Rgds,
> Ricky
>
> -----Original Message-----
> From: Lukáš Vlček [mailto:lukas.vlcek@gmail.com]
> Sent: Sunday, February 01, 2009 11:37 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Finding longest path in a graph
>
> Hi,
> just my 2 cents.
> You are right that MapReduce does not fit to all problems. Expecially, when
> it comes to processing graphs. On the other hand I think even in Google they
> stick with MapReduce for complex graph processing (and they do it in Yahoo!
> as well). I had a chance to talk to one Google engineer and I asked him
> exactly this question ("Do you use MapReduce even if it is known that
> specific problems can be solved with different architecture in a more
> efficient way?"). And the answer was "yes". I don't know if Google engineers
> are using different architectures - and I would be surprised if not - but
> they probably don't use it at the same scale as they do with MapReduce.
> There are several good reasons for this: MapReduce is easy to learn (which
> means that even fresh interns can use it very quickly - also in not
> efficient way), they probably have very nice visual tools for managing
> MapReduce clusters and finally they have a lot of HW resources.
>
> BTW: Andrzej, do you consider contributing your graph processing utilis into
> Hadoop or Mahout?
>
> Regards,
> Lukas
>
> On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <ma...@gmail.com> wrote:
>
>> Andrzej,
>> without deeper understanding of exactly what you are doing, I have a gut
>> feeling that a different distributed system might be a better fit for this
>> specific task. I assume, you are dealing with very large graphs if you are
>> using Hadoop, and you want grid processing. But the linear nature of
>> Map/Reduce may make it hard to fit. As the MapReduce paper said, not every
>> task can be easily expressed this way.
>>
>> The other technology I mean is JavaSpaces, of which I usually use the
>> GigaSpaces implementation. This allows more complex algorithms. You will
>> store your complete graph as an appropriate structure in a JavaSpace, and
>> will also restructure it for parallel processing, as outlined in some
>> JavaSpaces books. Then you can have as many workers as you want, working on
>> individual nodes.
>>
>> Mark
>>
>> On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
>>
>> > Hi,
>> >
>> > I'm looking for an advice. I need to process a directed graph encoded as
>> a
>> > list of <from, to> pairs. The goal is to compute a list of longest paths
>> in
>> > the graph. There is no guarantee that the graph is acyclic, so there
>> should
>> > be some mechanism to detect cycles.
>> >
>> > Currently I'm using a simple approach consisting of the following: I
>> encode
>> > the graph as <vertex, <neighbor, direction, distance>>, and extending the
>> > paths by one degree at a time. This means that in order to find the
>> longest
>> > path of degree N it takes N + 1 map-reduce jobs.
>> >
>> > Are you perhaps aware of a smarter way to do it? I would appreciate any
>> > pointers.
>> >
>> > --
>> > Best regards,
>> > Andrzej Bialecki     <><
>> >  ___. ___ ___ ___ _ _   __________________________________
>> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> > http://www.sigram.com  Contact: info at sigram dot com
>> >
>> >
>>
>
>
>
> --
> http://blog.lukas-vlcek.com/
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

RE: Finding longest path in a graph

Posted by Ricky Ho <rh...@adobe.com>.

I haven't.  Thanks for the link

-----Original Message-----
From: Lukáš Vlček [mailto:lukas.vlcek@gmail.com] 
Sent: Sunday, February 01, 2009 10:43 PM
To: core-user@hadoop.apache.org
Subject: Re: Finding longest path in a graph

Did you look at the paper which is a base for Mahout implementation?
http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf

I think it dicusses some theoretical aspects of such transformation.

Lukas

2009/2/2 Ricky Ho <rh...@adobe.com>

> I heard about Mahout address the machine learning portion that but haven't
> look at any detail yet.
>
> I am looking more from the theory side (how to transform the algorithm into
> the Map/Reduce form) and not necessary an implementation.
>
> Rgds,
> Ricky
> -----Original Message-----
> From: Lukáš Vlček [mailto:lukas.vlcek@gmail.com]
> Sent: Sunday, February 01, 2009 10:29 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Finding longest path in a graph
>
> Ricky,
> Are you aware of Mahout project?
> http://lucene.apache.org/mahout/
>
> I think this project tries to addess some of the algorithms you have
> mentioned.
>
> Regards,
> Lukas
>
> 2009/2/2 Ricky Ho <rh...@adobe.com>
>
> > Yes, Map/Reduce model itself is simple.  But transforming a non-trivial
> > algorithm into Map/Reduce is not simple.
> >
> > I wonder if there is any effort from the academia to look into build a
> > catalog of how each of our familiar algorithms can be transformed into
> > Map/Reduce, such as ...
> > 1) Sorting
> > 2) Searching (index tree, hash, geo/spacial search)
> > 3) Network/Graph processing (min spanning tree, shortest path, network
> > diameter)
> > 4) Computational Geometry (Ray tracing, Convex Hull)
> > 5) Optimization problem (Maximum flow, Linear programming, Hill climbing)
> > 6) Machine learning (logistic regression, nearest neighbor, cluster,
> > Bayesian classification, decision tree, neural network)
> >
> > It is also important to recognize there are other models in our tool box
> > (besides map/reduce) for parallelizing an algorithm, such as ...
> > a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace)
> > b) Dependency graph / Data flow programming (e.g. Dryad)
> > c) MPI
> > d) Multi-thread / Shared memory model
> > e) ... anything else ...
> >
> > Rgds,
> > Ricky
> >
> > -----Original Message-----
> > From: Lukáš Vlček [mailto:lukas.vlcek@gmail.com]
> > Sent: Sunday, February 01, 2009 11:37 AM
> > To: core-user@hadoop.apache.org
> > Subject: Re: Finding longest path in a graph
> >
> > Hi,
> > just my 2 cents.
> > You are right that MapReduce does not fit to all problems. Expecially,
> when
> > it comes to processing graphs. On the other hand I think even in Google
> > they
> > stick with MapReduce for complex graph processing (and they do it in
> Yahoo!
> > as well). I had a chance to talk to one Google engineer and I asked him
> > exactly this question ("Do you use MapReduce even if it is known that
> > specific problems can be solved with different architecture in a more
> > efficient way?"). And the answer was "yes". I don't know if Google
> > engineers
> > are using different architectures - and I would be surprised if not - but
> > they probably don't use it at the same scale as they do with MapReduce.
> > There are several good reasons for this: MapReduce is easy to learn
> (which
> > means that even fresh interns can use it very quickly - also in not
> > efficient way), they probably have very nice visual tools for managing
> > MapReduce clusters and finally they have a lot of HW resources.
> >
> > BTW: Andrzej, do you consider contributing your graph processing utilis
> > into
> > Hadoop or Mahout?
> >
> > Regards,
> > Lukas
> >
> > On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <ma...@gmail.com>
> > wrote:
> >
> > > Andrzej,
> > > without deeper understanding of exactly what you are doing, I have a
> gut
> > > feeling that a different distributed system might be a better fit for
> > this
> > > specific task. I assume, you are dealing with very large graphs if you
> > are
> > > using Hadoop, and you want grid processing. But the linear nature of
> > > Map/Reduce may make it hard to fit. As the MapReduce paper said, not
> > every
> > > task can be easily expressed this way.
> > >
> > > The other technology I mean is JavaSpaces, of which I usually use the
> > > GigaSpaces implementation. This allows more complex algorithms. You
> will
> > > store your complete graph as an appropriate structure in a JavaSpace,
> and
> > > will also restructure it for parallel processing, as outlined in some
> > > JavaSpaces books. Then you can have as many workers as you want,
> working
> > on
> > > individual nodes.
> > >
> > > Mark
> > >
> > > On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <ab...@getopt.org>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm looking for an advice. I need to process a directed graph encoded
> > as
> > > a
> > > > list of <from, to> pairs. The goal is to compute a list of longest
> > paths
> > > in
> > > > the graph. There is no guarantee that the graph is acyclic, so there
> > > should
> > > > be some mechanism to detect cycles.
> > > >
> > > > Currently I'm using a simple approach consisting of the following: I
> > > encode
> > > > the graph as <vertex, <neighbor, direction, distance>>, and extending
> > the
> > > > paths by one degree at a time. This means that in order to find the
> > > longest
> > > > path of degree N it takes N + 1 map-reduce jobs.
> > > >
> > > > Are you perhaps aware of a smarter way to do it? I would appreciate
> any
> > > > pointers.
> > > >
> > > > --
> > > > Best regards,
> > > > Andrzej Bialecki     <><
> > > >  ___. ___ ___ ___ _ _   __________________________________
> > > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > > http://www.sigram.com  Contact: info at sigram dot com
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > http://blog.lukas-vlcek.com/
> >
>
>
>
> --
> http://blog.lukas-vlcek.com/
>



-- 
http://blog.lukas-vlcek.com/

Re: Finding longest path in a graph

Posted by Lukáš Vlček <lu...@gmail.com>.

Did you look at the paper which is a base for Mahout implementation?
http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf

I think it dicusses some theoretical aspects of such transformation.

Lukas

2009/2/2 Ricky Ho <rh...@adobe.com>

> I heard about Mahout address the machine learning portion that but haven't
> look at any detail yet.
>
> I am looking more from the theory side (how to transform the algorithm into
> the Map/Reduce form) and not necessary an implementation.
>
> Rgds,
> Ricky
> -----Original Message-----
> From: Lukáš Vlček [mailto:lukas.vlcek@gmail.com]
> Sent: Sunday, February 01, 2009 10:29 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Finding longest path in a graph
>
> Ricky,
> Are you aware of Mahout project?
> http://lucene.apache.org/mahout/
>
> I think this project tries to addess some of the algorithms you have
> mentioned.
>
> Regards,
> Lukas
>
> 2009/2/2 Ricky Ho <rh...@adobe.com>
>
> > Yes, Map/Reduce model itself is simple.  But transforming a non-trivial
> > algorithm into Map/Reduce is not simple.
> >
> > I wonder if there is any effort from the academia to look into build a
> > catalog of how each of our familiar algorithms can be transformed into
> > Map/Reduce, such as ...
> > 1) Sorting
> > 2) Searching (index tree, hash, geo/spacial search)
> > 3) Network/Graph processing (min spanning tree, shortest path, network
> > diameter)
> > 4) Computational Geometry (Ray tracing, Convex Hull)
> > 5) Optimization problem (Maximum flow, Linear programming, Hill climbing)
> > 6) Machine learning (logistic regression, nearest neighbor, cluster,
> > Bayesian classification, decision tree, neural network)
> >
> > It is also important to recognize there are other models in our tool box
> > (besides map/reduce) for parallelizing an algorithm, such as ...
> > a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace)
> > b) Dependency graph / Data flow programming (e.g. Dryad)
> > c) MPI
> > d) Multi-thread / Shared memory model
> > e) ... anything else ...
> >
> > Rgds,
> > Ricky
> >
> > -----Original Message-----
> > From: Lukáš Vlček [mailto:lukas.vlcek@gmail.com]
> > Sent: Sunday, February 01, 2009 11:37 AM
> > To: core-user@hadoop.apache.org
> > Subject: Re: Finding longest path in a graph
> >
> > Hi,
> > just my 2 cents.
> > You are right that MapReduce does not fit to all problems. Expecially,
> when
> > it comes to processing graphs. On the other hand I think even in Google
> > they
> > stick with MapReduce for complex graph processing (and they do it in
> Yahoo!
> > as well). I had a chance to talk to one Google engineer and I asked him
> > exactly this question ("Do you use MapReduce even if it is known that
> > specific problems can be solved with different architecture in a more
> > efficient way?"). And the answer was "yes". I don't know if Google
> > engineers
> > are using different architectures - and I would be surprised if not - but
> > they probably don't use it at the same scale as they do with MapReduce.
> > There are several good reasons for this: MapReduce is easy to learn
> (which
> > means that even fresh interns can use it very quickly - also in not
> > efficient way), they probably have very nice visual tools for managing
> > MapReduce clusters and finally they have a lot of HW resources.
> >
> > BTW: Andrzej, do you consider contributing your graph processing utilis
> > into
> > Hadoop or Mahout?
> >
> > Regards,
> > Lukas
> >
> > On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <ma...@gmail.com>
> > wrote:
> >
> > > Andrzej,
> > > without deeper understanding of exactly what you are doing, I have a
> gut
> > > feeling that a different distributed system might be a better fit for
> > this
> > > specific task. I assume, you are dealing with very large graphs if you
> > are
> > > using Hadoop, and you want grid processing. But the linear nature of
> > > Map/Reduce may make it hard to fit. As the MapReduce paper said, not
> > every
> > > task can be easily expressed this way.
> > >
> > > The other technology I mean is JavaSpaces, of which I usually use the
> > > GigaSpaces implementation. This allows more complex algorithms. You
> will
> > > store your complete graph as an appropriate structure in a JavaSpace,
> and
> > > will also restructure it for parallel processing, as outlined in some
> > > JavaSpaces books. Then you can have as many workers as you want,
> working
> > on
> > > individual nodes.
> > >
> > > Mark
> > >
> > > On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <ab...@getopt.org>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm looking for an advice. I need to process a directed graph encoded
> > as
> > > a
> > > > list of <from, to> pairs. The goal is to compute a list of longest
> > paths
> > > in
> > > > the graph. There is no guarantee that the graph is acyclic, so there
> > > should
> > > > be some mechanism to detect cycles.
> > > >
> > > > Currently I'm using a simple approach consisting of the following: I
> > > encode
> > > > the graph as <vertex, <neighbor, direction, distance>>, and extending
> > the
> > > > paths by one degree at a time. This means that in order to find the
> > > longest
> > > > path of degree N it takes N + 1 map-reduce jobs.
> > > >
> > > > Are you perhaps aware of a smarter way to do it? I would appreciate
> any
> > > > pointers.
> > > >
> > > > --
> > > > Best regards,
> > > > Andrzej Bialecki     <><
> > > >  ___. ___ ___ ___ _ _   __________________________________
> > > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > > http://www.sigram.com  Contact: info at sigram dot com
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > http://blog.lukas-vlcek.com/
> >
>
>
>
> --
> http://blog.lukas-vlcek.com/
>



-- 
http://blog.lukas-vlcek.com/

RE: Finding longest path in a graph

Posted by Ricky Ho <rh...@adobe.com>.

I heard about Mahout address the machine learning portion that but haven't look at any detail yet.

I am looking more from the theory side (how to transform the algorithm into the Map/Reduce form) and not necessary an implementation.

Rgds,
Ricky
-----Original Message-----
From: Lukáš Vlček [mailto:lukas.vlcek@gmail.com] 
Sent: Sunday, February 01, 2009 10:29 PM
To: core-user@hadoop.apache.org
Subject: Re: Finding longest path in a graph

Ricky,
Are you aware of Mahout project?
http://lucene.apache.org/mahout/

I think this project tries to addess some of the algorithms you have
mentioned.

Regards,
Lukas

2009/2/2 Ricky Ho <rh...@adobe.com>

> Yes, Map/Reduce model itself is simple.  But transforming a non-trivial
> algorithm into Map/Reduce is not simple.
>
> I wonder if there is any effort from the academia to look into build a
> catalog of how each of our familiar algorithms can be transformed into
> Map/Reduce, such as ...
> 1) Sorting
> 2) Searching (index tree, hash, geo/spacial search)
> 3) Network/Graph processing (min spanning tree, shortest path, network
> diameter)
> 4) Computational Geometry (Ray tracing, Convex Hull)
> 5) Optimization problem (Maximum flow, Linear programming, Hill climbing)
> 6) Machine learning (logistic regression, nearest neighbor, cluster,
> Bayesian classification, decision tree, neural network)
>
> It is also important to recognize there are other models in our tool box
> (besides map/reduce) for parallelizing an algorithm, such as ...
> a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace)
> b) Dependency graph / Data flow programming (e.g. Dryad)
> c) MPI
> d) Multi-thread / Shared memory model
> e) ... anything else ...
>
> Rgds,
> Ricky
>
> -----Original Message-----
> From: Lukáš Vlček [mailto:lukas.vlcek@gmail.com]
> Sent: Sunday, February 01, 2009 11:37 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Finding longest path in a graph
>
> Hi,
> just my 2 cents.
> You are right that MapReduce does not fit to all problems. Expecially, when
> it comes to processing graphs. On the other hand I think even in Google
> they
> stick with MapReduce for complex graph processing (and they do it in Yahoo!
> as well). I had a chance to talk to one Google engineer and I asked him
> exactly this question ("Do you use MapReduce even if it is known that
> specific problems can be solved with different architecture in a more
> efficient way?"). And the answer was "yes". I don't know if Google
> engineers
> are using different architectures - and I would be surprised if not - but
> they probably don't use it at the same scale as they do with MapReduce.
> There are several good reasons for this: MapReduce is easy to learn (which
> means that even fresh interns can use it very quickly - also in not
> efficient way), they probably have very nice visual tools for managing
> MapReduce clusters and finally they have a lot of HW resources.
>
> BTW: Andrzej, do you consider contributing your graph processing utilis
> into
> Hadoop or Mahout?
>
> Regards,
> Lukas
>
> On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <ma...@gmail.com>
> wrote:
>
> > Andrzej,
> > without deeper understanding of exactly what you are doing, I have a gut
> > feeling that a different distributed system might be a better fit for
> this
> > specific task. I assume, you are dealing with very large graphs if you
> are
> > using Hadoop, and you want grid processing. But the linear nature of
> > Map/Reduce may make it hard to fit. As the MapReduce paper said, not
> every
> > task can be easily expressed this way.
> >
> > The other technology I mean is JavaSpaces, of which I usually use the
> > GigaSpaces implementation. This allows more complex algorithms. You will
> > store your complete graph as an appropriate structure in a JavaSpace, and
> > will also restructure it for parallel processing, as outlined in some
> > JavaSpaces books. Then you can have as many workers as you want, working
> on
> > individual nodes.
> >
> > Mark
> >
> > On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <ab...@getopt.org>
> wrote:
> >
> > > Hi,
> > >
> > > I'm looking for an advice. I need to process a directed graph encoded
> as
> > a
> > > list of <from, to> pairs. The goal is to compute a list of longest
> paths
> > in
> > > the graph. There is no guarantee that the graph is acyclic, so there
> > should
> > > be some mechanism to detect cycles.
> > >
> > > Currently I'm using a simple approach consisting of the following: I
> > encode
> > > the graph as <vertex, <neighbor, direction, distance>>, and extending
> the
> > > paths by one degree at a time. This means that in order to find the
> > longest
> > > path of degree N it takes N + 1 map-reduce jobs.
> > >
> > > Are you perhaps aware of a smarter way to do it? I would appreciate any
> > > pointers.
> > >
> > > --
> > > Best regards,
> > > Andrzej Bialecki     <><
> > >  ___. ___ ___ ___ _ _   __________________________________
> > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > http://www.sigram.com  Contact: info at sigram dot com
> > >
> > >
> >
>
>
>
> --
> http://blog.lukas-vlcek.com/
>



-- 
http://blog.lukas-vlcek.com/

Re: Finding longest path in a graph

Posted by Lukáš Vlček <lu...@gmail.com>.

Ricky,
Are you aware of Mahout project?
http://lucene.apache.org/mahout/

I think this project tries to addess some of the algorithms you have
mentioned.

Regards,
Lukas

2009/2/2 Ricky Ho <rh...@adobe.com>

> Yes, Map/Reduce model itself is simple.  But transforming a non-trivial
> algorithm into Map/Reduce is not simple.
>
> I wonder if there is any effort from the academia to look into build a
> catalog of how each of our familiar algorithms can be transformed into
> Map/Reduce, such as ...
> 1) Sorting
> 2) Searching (index tree, hash, geo/spacial search)
> 3) Network/Graph processing (min spanning tree, shortest path, network
> diameter)
> 4) Computational Geometry (Ray tracing, Convex Hull)
> 5) Optimization problem (Maximum flow, Linear programming, Hill climbing)
> 6) Machine learning (logistic regression, nearest neighbor, cluster,
> Bayesian classification, decision tree, neural network)
>
> It is also important to recognize there are other models in our tool box
> (besides map/reduce) for parallelizing an algorithm, such as ...
> a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace)
> b) Dependency graph / Data flow programming (e.g. Dryad)
> c) MPI
> d) Multi-thread / Shared memory model
> e) ... anything else ...
>
> Rgds,
> Ricky
>
> -----Original Message-----
> From: Lukáš Vlček [mailto:lukas.vlcek@gmail.com]
> Sent: Sunday, February 01, 2009 11:37 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Finding longest path in a graph
>
> Hi,
> just my 2 cents.
> You are right that MapReduce does not fit to all problems. Expecially, when
> it comes to processing graphs. On the other hand I think even in Google
> they
> stick with MapReduce for complex graph processing (and they do it in Yahoo!
> as well). I had a chance to talk to one Google engineer and I asked him
> exactly this question ("Do you use MapReduce even if it is known that
> specific problems can be solved with different architecture in a more
> efficient way?"). And the answer was "yes". I don't know if Google
> engineers
> are using different architectures - and I would be surprised if not - but
> they probably don't use it at the same scale as they do with MapReduce.
> There are several good reasons for this: MapReduce is easy to learn (which
> means that even fresh interns can use it very quickly - also in not
> efficient way), they probably have very nice visual tools for managing
> MapReduce clusters and finally they have a lot of HW resources.
>
> BTW: Andrzej, do you consider contributing your graph processing utilis
> into
> Hadoop or Mahout?
>
> Regards,
> Lukas
>
> On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <ma...@gmail.com>
> wrote:
>
> > Andrzej,
> > without deeper understanding of exactly what you are doing, I have a gut
> > feeling that a different distributed system might be a better fit for
> this
> > specific task. I assume, you are dealing with very large graphs if you
> are
> > using Hadoop, and you want grid processing. But the linear nature of
> > Map/Reduce may make it hard to fit. As the MapReduce paper said, not
> every
> > task can be easily expressed this way.
> >
> > The other technology I mean is JavaSpaces, of which I usually use the
> > GigaSpaces implementation. This allows more complex algorithms. You will
> > store your complete graph as an appropriate structure in a JavaSpace, and
> > will also restructure it for parallel processing, as outlined in some
> > JavaSpaces books. Then you can have as many workers as you want, working
> on
> > individual nodes.
> >
> > Mark
> >
> > On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <ab...@getopt.org>
> wrote:
> >
> > > Hi,
> > >
> > > I'm looking for an advice. I need to process a directed graph encoded
> as
> > a
> > > list of <from, to> pairs. The goal is to compute a list of longest
> paths
> > in
> > > the graph. There is no guarantee that the graph is acyclic, so there
> > should
> > > be some mechanism to detect cycles.
> > >
> > > Currently I'm using a simple approach consisting of the following: I
> > encode
> > > the graph as <vertex, <neighbor, direction, distance>>, and extending
> the
> > > paths by one degree at a time. This means that in order to find the
> > longest
> > > path of degree N it takes N + 1 map-reduce jobs.
> > >
> > > Are you perhaps aware of a smarter way to do it? I would appreciate any
> > > pointers.
> > >
> > > --
> > > Best regards,
> > > Andrzej Bialecki     <><
> > >  ___. ___ ___ ___ _ _   __________________________________
> > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > http://www.sigram.com  Contact: info at sigram dot com
> > >
> > >
> >
>
>
>
> --
> http://blog.lukas-vlcek.com/
>



-- 
http://blog.lukas-vlcek.com/

RE: Finding longest path in a graph

Posted by Ricky Ho <rh...@adobe.com>.

Yes, Map/Reduce model itself is simple.  But transforming a non-trivial algorithm into Map/Reduce is not simple.

I wonder if there is any effort from the academia to look into build a catalog of how each of our familiar algorithms can be transformed into Map/Reduce, such as ...
1) Sorting
2) Searching (index tree, hash, geo/spacial search)
3) Network/Graph processing (min spanning tree, shortest path, network diameter)
4) Computational Geometry (Ray tracing, Convex Hull)
5) Optimization problem (Maximum flow, Linear programming, Hill climbing)
6) Machine learning (logistic regression, nearest neighbor, cluster, Bayesian classification, decision tree, neural network)

It is also important to recognize there are other models in our tool box (besides map/reduce) for parallelizing an algorithm, such as ...
a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace)
b) Dependency graph / Data flow programming (e.g. Dryad)
c) MPI
d) Multi-thread / Shared memory model
e) ... anything else ...

Rgds,
Ricky

-----Original Message-----
From: Lukáš Vlček [mailto:lukas.vlcek@gmail.com] 
Sent: Sunday, February 01, 2009 11:37 AM
To: core-user@hadoop.apache.org
Subject: Re: Finding longest path in a graph

Hi,
just my 2 cents.
You are right that MapReduce does not fit to all problems. Expecially, when
it comes to processing graphs. On the other hand I think even in Google they
stick with MapReduce for complex graph processing (and they do it in Yahoo!
as well). I had a chance to talk to one Google engineer and I asked him
exactly this question ("Do you use MapReduce even if it is known that
specific problems can be solved with different architecture in a more
efficient way?"). And the answer was "yes". I don't know if Google engineers
are using different architectures - and I would be surprised if not - but
they probably don't use it at the same scale as they do with MapReduce.
There are several good reasons for this: MapReduce is easy to learn (which
means that even fresh interns can use it very quickly - also in not
efficient way), they probably have very nice visual tools for managing
MapReduce clusters and finally they have a lot of HW resources.

BTW: Andrzej, do you consider contributing your graph processing utilis into
Hadoop or Mahout?

Regards,
Lukas

On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <ma...@gmail.com> wrote:

> Andrzej,
> without deeper understanding of exactly what you are doing, I have a gut
> feeling that a different distributed system might be a better fit for this
> specific task. I assume, you are dealing with very large graphs if you are
> using Hadoop, and you want grid processing. But the linear nature of
> Map/Reduce may make it hard to fit. As the MapReduce paper said, not every
> task can be easily expressed this way.
>
> The other technology I mean is JavaSpaces, of which I usually use the
> GigaSpaces implementation. This allows more complex algorithms. You will
> store your complete graph as an appropriate structure in a JavaSpace, and
> will also restructure it for parallel processing, as outlined in some
> JavaSpaces books. Then you can have as many workers as you want, working on
> individual nodes.
>
> Mark
>
> On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
>
> > Hi,
> >
> > I'm looking for an advice. I need to process a directed graph encoded as
> a
> > list of <from, to> pairs. The goal is to compute a list of longest paths
> in
> > the graph. There is no guarantee that the graph is acyclic, so there
> should
> > be some mechanism to detect cycles.
> >
> > Currently I'm using a simple approach consisting of the following: I
> encode
> > the graph as <vertex, <neighbor, direction, distance>>, and extending the
> > paths by one degree at a time. This means that in order to find the
> longest
> > path of degree N it takes N + 1 map-reduce jobs.
> >
> > Are you perhaps aware of a smarter way to do it? I would appreciate any
> > pointers.
> >
> > --
> > Best regards,
> > Andrzej Bialecki     <><
> >  ___. ___ ___ ___ _ _   __________________________________
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> >
> >
>

-- 
http://blog.lukas-vlcek.com/

Re: Finding longest path in a graph

Posted by Andrzej Bialecki <ab...@getopt.org>.

Lukáš Vlček wrote:

> BTW: Andrzej, do you consider contributing your graph processing utilis into
> Hadoop or Mahout?

Sorry, this was done under a contract, and the client is not willing to 
contribute the code (I already asked) ... It should be easy to 
reimplement this from scratch, based on the general approach I described.

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com