Posted to common-user@hadoop.apache.org by rishi pathak <ma...@gmail.com> on 2011/01/14 06:59:00 UTC

Nutch hadoop and Torque integration

Hello,
Sorry for cross-posting. We have a compute cluster running the Torque
resource manager and the Maui scheduler.
The compute cluster is almost full, but at times (early mornings, late
nights, holidays) resources are available in pockets (2-10 nodes for
2-5 hrs).
Our idea is to set up Nutch (Hadoop) in a way that utilizes these pockets,
i.e. an automated system wherein a long crawling job is broken down into
smaller map/reduce jobs. The system would constantly monitor the
availability of resources and would request, execute and finalize these
smaller tasks using the resource manager interface. We had a look at HOD,
but to the extent of my knowledge of it, it does not serve the purpose.
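
To make the idea concrete, here is a rough sketch of the decision step only:
given a free pocket, decide whether the next crawl step fits and build the
Torque submission for it. Everything named here (Pocket, Step, the 0.8
safety margin, run_step.sh) is a hypothetical placeholder, not something we
have running. The sketch does not query Torque or Maui; the pocket size is
assumed to come from whatever monitoring is used. The only real interface
assumed is qsub with -N and -l nodes=...,walltime=....

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pocket:
    """A free slot on the cluster (hypothetical type)."""
    nodes: int
    hours: float


@dataclass
class Step:
    """One small piece of the crawl (hypothetical type)."""
    name: str
    est_node_hours: float  # rough estimate, supplied by the operator


def fits(step: Step, pocket: Pocket, safety: float = 0.8) -> bool:
    """True if the step's estimated cost fits in the pocket with some margin."""
    return step.est_node_hours <= pocket.nodes * pocket.hours * safety


def next_step(pending: List[Step], pocket: Pocket) -> Optional[Step]:
    """Crawl steps run in order, so only the head of the queue is a candidate."""
    if pending and fits(pending[0], pocket):
        return pending[0]
    return None


def walltime(hours: float) -> str:
    """Format a duration in hours as HH:MM:SS for the walltime resource."""
    total = int(hours * 3600)
    return "%02d:%02d:%02d" % (total // 3600, (total % 3600) // 60, total % 60)


def qsub_command(step: Step, pocket: Pocket, script: str) -> List[str]:
    """Build (but do not run) a qsub command line for this step."""
    resources = "nodes=%d,walltime=%s" % (pocket.nodes, walltime(pocket.hours))
    return ["qsub", "-N", step.name, "-l", resources, script]
```

A driver loop would call next_step() whenever a pocket appears, submit the
returned command, and drop the step from the queue once the job finishes.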

In a way this is too much to ask, and maybe a complete solution is not
available, but any pointers/links are more than welcome.

We are also looking at JobStream.py available at
http://wiki.apache.org/nutch/Automating_Fetches_with_Python

Thanks

-- 
Rishi Pathak
National PARAM Supercomputing Facility
C-DAC