Posted to dev@nutch.apache.org by Apurv Verma <da...@gmail.com> on 2012/03/24 13:55:43 UTC
GSoC2012 Idea: Integrating Nutch With Hama
Hi,
Would the Nutch community be interested in integrating Nutch and Hama?
Apache Hama is a Bulk Synchronous Parallel (BSP) computing framework built on
top of HDFS, well suited for graph algorithms.
Nutch currently runs on the MapReduce paradigm. If the community is
interested, I would like to take this up as a GSoC project.
--
thanks and regards,
Apurv Verma
B. Tech.(CSE)
IIT- Ropar
Re: GSoC2012 Idea: Integrating Nutch With Hama
Posted by Apurv Verma <da...@gmail.com>.
Hi,
Here is what I think; please correct me if I am wrong.
1. At its core Nutch is a web crawler, so there must be a breadth-first
search (BFS) over the link graph going on. In local mode we could use a
simple sequential BFS, but in deploy mode we need a distributed version of it.
In the current version of Nutch this is presumably implemented as a
MapReduce program. My suggestion is to implement it as a BSP program using
Hama.
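To make point 1 concrete, here is a minimal sketch in plain Java. It does not use the Hama API and is not Nutch code. It computes crawl depths over a link graph twice: once with an ordinary queue-based BFS (the local-mode case) and once as a simulated level-synchronous BSP job, where each superstep visits one depth level, peers exchange discovered outlinks as messages, and a barrier separates supersteps. The class and method names, the tiny link graph, and the hash partitioning of URLs across peers are all assumptions for illustration; fetching, politeness and scoring are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch, not Nutch or Hama code: crawl-order BFS over a link graph,
// once sequentially (local mode) and once simulated as BSP supersteps.
class BspBfsSketch {

    // Sequential BFS from a seed URL; returns URL -> crawl depth.
    static Map<String, Integer> sequentialBfs(Map<String, List<String>> links, String seed) {
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(seed, 0);
        queue.add(seed);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            for (String out : links.getOrDefault(url, List.of())) {
                if (!depth.containsKey(out)) {
                    depth.put(out, depth.get(url) + 1);
                    queue.add(out);
                }
            }
        }
        return depth;
    }

    // Assumption: URLs are hash-partitioned across peers.
    static int owner(String url, int numPeers) {
        return Math.floorMod(url.hashCode(), numPeers);
    }

    // BSP-style BFS: superstep s visits exactly the URLs at depth s.
    static Map<String, Integer> bspBfs(Map<String, List<String>> links, String seed, int numPeers) {
        List<Map<String, Integer>> visited = new ArrayList<>(); // per-peer local state
        List<Set<String>> inbox = new ArrayList<>();            // per-peer incoming messages
        for (int p = 0; p < numPeers; p++) {
            visited.add(new HashMap<>());
            inbox.add(new HashSet<>());
        }
        inbox.get(owner(seed, numPeers)).add(seed);

        int superstep = 0;
        while (inbox.stream().anyMatch(msgs -> !msgs.isEmpty())) {
            List<Set<String>> outbox = new ArrayList<>();
            for (int p = 0; p < numPeers; p++) {
                outbox.add(new HashSet<>());
            }
            // Local computation: each peer only touches URLs it owns.
            for (int p = 0; p < numPeers; p++) {
                for (String url : inbox.get(p)) {
                    if (visited.get(p).containsKey(url)) {
                        continue; // already visited in an earlier superstep
                    }
                    visited.get(p).put(url, superstep);
                    // Send each outlink to the peer that owns it.
                    for (String out : links.getOrDefault(url, List.of())) {
                        outbox.get(owner(out, numPeers)).add(out);
                    }
                }
            }
            // Barrier: messages are delivered only at the start of the next superstep.
            inbox = outbox;
            superstep++;
        }

        Map<String, Integer> depth = new HashMap<>();
        visited.forEach(depth::putAll);
        return depth;
    }

    public static void main(String[] args) {
        // Tiny hypothetical link graph: page -> outlinks.
        Map<String, List<String>> links = new HashMap<>();
        links.put("a", List.of("b", "c"));
        links.put("b", List.of("d"));
        links.put("c", List.of("d", "e"));
        links.put("d", List.of("a"));

        Map<String, Integer> local = sequentialBfs(links, "a");
        Map<String, Integer> bsp = bspBfs(links, "a", 3);
        System.out.println(new TreeMap<>(bsp)); // {a=0, b=1, c=1, d=2, e=2}
        System.out.println(local.equals(bsp));  // true
    }
}
```

Each pass of the `while` loop stands for one superstep: peers compute only on their own messages, then the outboxes become the next inboxes at the barrier. In a real BSP runtime such as Hama the peers would run in parallel and the messages would travel over the network; here they run one after another in a single JVM so the control flow is easy to follow.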
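Advantages:
BSP is a natural model for graph algorithms; please see [0] and [1]. Because a
BSP job can keep the partitioned graph in memory across supersteps instead of
writing it back to HDFS between chained MapReduce jobs, IMO we should see a
performance improvement with Hama.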
[0] http://www.slideshare.net/chodakowski/processing-graphrelational-data-with-mapreduce-and-bulk-synchronous-parallel
[1] http://www.slideshare.net/udanax/apache-hama-an-introduction-tobulk-synchronization-parallel-on-hadoop-2699426
--
thanks and regards,
Apurv Verma
B. Tech.(CSE)
IIT- Ropar
On Sun, Mar 25, 2012 at 2:50 AM, Mathijs Homminga <mathijs.homminga@kalooga.com> wrote:
> This is interesting; can you elaborate a bit more on this? In what way do
> you think Nutch could benefit from an implementation in Hama?
Re: GSoC2012 Idea: Integrating Nutch With Hama
Posted by Mathijs Homminga <ma...@kalooga.com>.
This is interesting; can you elaborate a bit more on this? In what way do you think Nutch could benefit from an implementation in Hama?
Mathijs Homminga