Posted to common-user@hadoop.apache.org by Brian Ferris <bd...@cs.washington.edu> on 2009/05/13 19:51:53 UTC

HOD 0.20.0 - deallocate fails to terminate cluster

I've been running HOD with hadoop-0.20.0 successfully on our Torque  
cluster and it runs map-reduce jobs as advertised.  The only hiccup  
comes when I try to deallocate the cluster.  I run the following  
command:

 > hod deallocate path/to/my/cluster_config

The HOD executable cleans up the cluster_config directory, but it
doesn't seem to terminate the HOD job itself in the Torque queue.  The
job still shows up when I run "qstat" and it doesn't go away, even
after waiting a few minutes.  Eventually, I have to kill the job with
"qdel".
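
The manual check described above (look at "qstat", see whether the job is
still listed) could be scripted roughly as follows. This is only a sketch,
assuming Torque's default six-column "qstat" listing (job id, name, user,
time use, state, queue); the helper names job_state, still_active and
current_qstat_text are hypothetical and not part of HOD or Torque.

```python
import subprocess


def job_state(qstat_text, job_id):
    """Return the one-letter state of job_id in a default qstat listing, or None.

    Assumes the default Torque columns: job id, name, user, time use,
    state, queue. job_id may be given with or without the server suffix.
    """
    for line in qstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or unrelated line
        listed_id = fields[0]
        if listed_id == job_id or listed_id.startswith(job_id + "."):
            return fields[-2]  # state is the second-to-last column
    return None


def still_active(state):
    """True if the job is listed and not in the completed ("C") state."""
    return state is not None and state != "C"


def current_qstat_text():
    """Run qstat and return its output (hypothetical wrapper; needs Torque installed)."""
    result = subprocess.run(["qstat"], capture_output=True, text=True, check=False)
    return result.stdout
```

After "hod deallocate", something like
still_active(job_state(current_qstat_text(), "1234")) would report whether
job 1234 (a made-up id) is still queued or running. Depending on the server
configuration, a finished job may stay listed in state "C" for a while, which
is why that state is treated as "gone" here.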

Shouldn't the deallocate operation automatically stop the HOD job?  Is  
this a bug?  Something I haven't configured correctly?  How might I go  
about debugging what's wrong if it is a bug?

This obviously isn't mission critical, since everything else is  
working correctly and qdel seems to do the trick either way.  Mostly  
I'm just curious.

More details and version information:

Fedora 7
hadoop-0.20.0
torque-2.1.10-1.fc7
torque-client-2.1.10-1.fc7
torque-scheduler-2.1.10-1.fc7
libtorque-2.1.10-1.fc7
torque-gui-2.1.10-1.fc7
torque-docs-2.1.10-1.fc7
torque-server-2.1.10-1.fc7

Thanks,
Brian