Posted to common-commits@hadoop.apache.org by yh...@apache.org on 2008/06/20 18:31:41 UTC

svn commit: r669980 [1/3] - in /hadoop/core/trunk: docs/ src/contrib/hod/ src/docs/src/documentation/content/xdocs/

Author: yhemanth
Date: Fri Jun 20 09:31:41 2008
New Revision: 669980

URL: http://svn.apache.org/viewvc?rev=669980&view=rev
Log:
HADOOP-3505. Updated HOD documentation with changes made for Hadoop 0.18. Contributed by Vinod Kumar Vavilapalli.

Modified:
    hadoop/core/trunk/docs/hod_admin_guide.html
    hadoop/core/trunk/docs/hod_admin_guide.pdf
    hadoop/core/trunk/docs/hod_config_guide.html
    hadoop/core/trunk/docs/hod_config_guide.pdf
    hadoop/core/trunk/docs/hod_user_guide.html
    hadoop/core/trunk/docs/hod_user_guide.pdf
    hadoop/core/trunk/src/contrib/hod/CHANGES.txt
    hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hod_admin_guide.xml
    hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hod_config_guide.xml
    hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hod_user_guide.xml
    hadoop/core/trunk/src/docs/src/documentation/content/xdocs/site.xml

Modified: hadoop/core/trunk/docs/hod_admin_guide.html
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/docs/hod_admin_guide.html?rev=669980&r1=669979&r2=669980&view=diff
==============================================================================
--- hadoop/core/trunk/docs/hod_admin_guide.html (original)
+++ hadoop/core/trunk/docs/hod_admin_guide.html Fri Jun 20 09:31:41 2008
@@ -605,22 +605,27 @@
 </p>
 <a name="N10205"></a><a name="checklimits.sh+-+Tool+to+update+torque+comment+field+reflecting+resource+limits"></a>
 <h3 class="h4">checklimits.sh - Tool to update torque comment field reflecting resource limits</h3>
-<p>checklimits is a HOD tool specific to torque/maui environment. It
+<p>checklimits is a HOD tool specific to the Torque/Maui environment
+      (<a href="http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php">Maui Cluster Scheduler</a> is an open-source job
+      scheduler for clusters and supercomputers, from Cluster Resources). The
+      checklimits.sh script
       updates torque comment field when newly submitted job(s) violate/cross
-      over user limits set up in maui scheduler. It uses qstat, does one pass
-      over torque job list to find out queued or unfinished jobs, runs maui
+      over user limits set up in Maui scheduler. It uses qstat, does one pass
+      over torque job list to find out queued or unfinished jobs, runs Maui
       tool checkjob on each job to see if user limits are violated and then
       runs torque's qalter utility to update job attribute 'comment'. Currently
       it updates the comment as <em>User-limits exceeded. Requested:([0-9]*)
       Used:([0-9]*) MaxLimit:([0-9]*)</em> for those jobs that violate limits.
       This comment field is then used by HOD to behave accordingly depending on
       the type of violation.</p>
-<a name="N10211"></a><a name="Running+checklimits.sh"></a>
+<a name="N10215"></a><a name="Running+checklimits.sh"></a>
 <h4>Running checklimits.sh</h4>
 <p>checklimits.sh is available under hod_install_location/support
         folder. This is a shell script and can be run directly as <em>sh
         checklimits.sh </em>or as <em>./checklimits.sh</em> after enabling
-        execute permissions. In order for this tool to be able to update
+        execute permissions. Torque and Maui binaries should be available
+        on the machine where the tool is run and should be in the path
+        of the shell script process. In order for this tool to be able to update
         comment field of jobs from different users, it has to be run with
         torque administrative privileges. This tool has to be run repeatedly
         after specific intervals of time to frequently update jobs violating

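For illustration, the checklimits.sh usage described in the hunk above is typically wired up as a periodic job run under a Torque administrative account, with the Torque and Maui binaries on the PATH. A minimal sketch, assuming HOD is installed under /opt/hod and a five-minute interval (both values are examples, not taken from this patch):

    # crontab entry for a torque administrative user; Torque/Maui binaries must be on PATH
    */5 * * * * cd /opt/hod/support && ./checklimits.sh >> /var/log/hod-checklimits.log 2>&1
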
Modified: hadoop/core/trunk/docs/hod_admin_guide.pdf
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/docs/hod_admin_guide.pdf?rev=669980&r1=669979&r2=669980&view=diff
==============================================================================
--- hadoop/core/trunk/docs/hod_admin_guide.pdf (original)
+++ hadoop/core/trunk/docs/hod_admin_guide.pdf Fri Jun 20 09:31:41 2008
@@ -450,10 +450,10 @@
 >>
 endobj
 62 0 obj
-<< /Length 2306 /Filter [ /ASCII85Decode /FlateDecode ]
+<< /Length 2515 /Filter [ /ASCII85Decode /FlateDecode ]
  >>
 stream
-Gatm=c^1+Z&AJ%Fi0I*pK)'+'[jL>tl"aVZ?0@Hs!J>4<@NYs-lV%9O%LCDeMm]*6H1=81Ai0,(bVpd;G.mZ^^%f=GZlT)AFV1'sWNB2OpiO4C=n`o^#gAQ7&T+lLF7Jt*J--2Fs4ctXFq6bgbCWN<jVkQdh/7gun-=VA"=uM7G&NYXK<X0KR]T2_1TXZ\hJSg+]BS>RgmEQfS2Gr"[cqlP0DG-=i^IL`[GJ1D/(&?urc@KgQ?QnR:F0=hpKI/ogV4@Bn)`acmju^u;gc8aR%;hUr3\B%lE\]V%*-rMHgU,n<p]I3YaX]O9oE7p.XLtqdqq;Ll'mLBf"Pkahkg`c#cGu`lq-%@P!hjg>WgUekuD[E3\B(aYM,^PAVW`_CO==r=,&rB1X.b(bLI*V"8A`1XEN=I(<:4p3a:SfhpPe/Ffrq//"P.\$X0(RF&0dFp:]TTk(Vcqph4FrL5aqG1`<O60cDmCXgdRD[%Da\%^\uhVOl^0/CJ(PhCP#QM_FfHeLJgObR@NR1uD^UL\c#Xi(CR?Q$BQ_Y,qrT\W0@1F"X8H;_$=7pQAQr7fT8#31^A%g3pDV8M@T$IqMKj?"(s=!b"`*?4`'lXll_J/%G:(*gcEgO*eqSpDO/TU&P9eCpQi(#noH^k0]d?:f8cb'\%LGS[3G;+2-dtF:F\q71$jl"/GI-M\rC-a=S_ab/ER1ZOAdOKZ6k`Pape8@1Pb.VpUah<"^e/pg\f+Y)na8&YhGq=Vb)GXOZ=%.nC4R2++Jt+W76kZ[R.KpRkRH3[6+^"&jEa-qPiU<MN?D>+K/_+FK'hjZQ2((tm:X@AkJhEs1MZ+BBL2_f;@Ucm$erC*dIO6Gam2%Uj\uZ>LLM8>*:".?.8:$;qdRX>h;92\<N^ODWKXa;_^]bJ'^86)X!&a7VkBjTX)V8eTT)-&EeZ>GlOG<[+)6#7o,i,MOQ/bu^n96t5gg,\6&Cd<.nSd%&W`Q?=tr#/!:eZlu1
 ::Gb'US][sWU22U_D,?@Cq#"G]]""'IV^P=fJKA7&_8o9^1I[%,n'j>CqrWPURiiWkZnQHY%aT-WAL(:iM%`EBKP]!Im[02/*`Z1^qe/I,09U&l*O0h@[f'N!!V6`Dr+aJ,O^E+ZYaSN9Z`'b:X#f'JK@c0./j9PIdi!drjoF#<-Zd'=#\r:njW2Fqr3eHm;sitqJU<Pk\m46gka8Q#o5M#f.^:,BdD!g1qO(t5IhjVkY9Vmaq#\tO.A[pFA\K_$KV26dQZBA%d/hW,8N!S"O7>'sQpsYEPWGQs0qS-\#Z_`VY84jLq'<:+(k?`gJFUskf@F9]DJ)c6kl@?:O-*)Kf8rX3NsD)-"hB;H3Ym]W`'^$_4c58<`o+"3Jd\oY5PS.uiVo*8ELDb+r-Q,I2/Tmtq/YbqSgZt'`sO[UP\:F3ck>!4)ZB1`b>J%Eqn<YE4fA(0D'3pu'*"NM4ft-g;KRY<>SdQ)&Edt1-5"O18=&]R_9hZ*G6g0f$'%GC[OoBhGu5/S1tB^`_>O?8o=_B+h9K[KIIY!Iq0qN;mfP/r1[s/=ftt1kYamK3=]1oFOqU!R)M4!*Ohd@W6Ue(GOe]A?7h%o*O[%6naBSF9Zp*&>nC\!/.tFuo_]#&8"Qb^XPjLIkA15Q&ZN2ap9F61`S&ZKApf`(:6jkDKZAbi9@FBJ=e][Q-/?0#rN(BP_">i6aLbT09dU]/Bi2,oa/j\IS4kP>2rSA#2Le3arU>6g*>=S509g\pu@:*0>c@ddo&[;4,TAHf,1`1AjA==J@.8)$c@E=M0ak7@f"YamSpBP*e'^dG:EW^A,M;iWs>O3FV#SXe_1`?/c<D"n)C4nmRV<jK'k#$DX_]<<rPa"&MBKa$WFkBg`RiI"(4:+X:7s/+b9nRqFe^rn5Pc*u=We.EnV9p)P4ai/ua&fmQ#1lWGG.<_6q#^?W;qT359M!UfKG8C@eU=78HNL^o317F)]_Q=-iO>F5*Ir2'&"f
 TZhqKr`+(rVH`sn#n)9=0u_\VlY-3lkh/?2LOBUWuj,`55:G8tqU8=eo^*+a2Z3F"oGC4cRPkp'jT8Y[Xq/]Md--."bog*EI>Gp7B1>c:Q[eAi3(MWpt%]*kQu>ffiqREQN.A8?bTShs:d`Xp>])teu+n:<;>oN/JEo/W/'n&`9o@f08Va4n,SNac>P`D(X;DuV=UFaE6iJ_qRK8$jMh=["Thr-oEj)u?^mhUGa`4bP=]$XE2=ii:iMjftPl'YQ)CZoq'`Ct&$qC5q\o5F90IB`1bDl[s:4^FN^)*P`]7n>iqaCin-Z8VpDBU-8UfIfVC5[a5~>
+Gatm=>BcQ+&:XAWi%7]bKimd(X\%e)4j?Q6.o1.+&?M[E&J-R\p!VaSJH^34(1#A.-T\?A4-skt^6urY=#[mqDSG!>D%$An]a5,'cs51V:YTTT(QRVAL>ckNe^c[/hRl!u^k)acs0b"nh18?g#^e70>;r,CG!l!5p]lR$#70ODh&GHn5A53Z7O+)'L3:A`gJ,S%2I3tf)q/YrZ_hrs1osa;jh=JcS94I.lRcRQM2m;NT6&U.cBY)oSiH!&[H?5@G9oAsl0'bgM+3pJ?Ys\>enYA:Lh;c-lPrR,9Pq2h^>:3iFqN3/@VJrL/7$a/]6gY39[ALC[pf)9qkgKE)[iknS/rKNVeL46`o7\*`H/G^SI"XN.n+g[<P....@GRa>t:c_3>^,_PY)ahI7YqK,Ce2N/B<<)bs'UjU_d:M^U9rnXtZ_KJY4/<a7kA;?DOUb/%G:(*gcEgO*eqSpDO/d+o_ZjVU6Z`KZPE%VW"+8:f8cb'\%LGS[3G;+2-dtF:F\a71$jl"/GI-K,CPckoD)r<f5FY,meH+0oNo+.QPTO_X<8bg4a$s;k7TH`Z+anFG6*Z-.VU^3*SiUb:E7*`tfp+XI>g5(=Rq;LflsT^OK2jZ)n$AYb>$?a"WMkmo>!9R9H+L"qWtPq=9*VFbK*p<9WV2VD3Vk)%\;.XMf*e#gb+sNbT_,0L3Ju5Rq4L'PHgKQ]OJD%m(QoO=<_6R$?;"8DCl"NL4V#3^MW,c!9sR\I:MtO#aWrngMbEj>*VFgFsQ3ik*`Y:5URM8YrqHY"Q@*'A7bu3(G5p7n)F]>W1<E80=.`<L>'!N._3b:cb_
 XB`0/IQ:RpdaIZNp)ZJ<da?G!#Y9$tc,4[0Vj&)HSQn'l]'(66k@G!Y!cL\bQJ]K0o7Q`:3>bh7=PU-Wqe3mdA&e?-)('o-.8Mgka-KTY8,MEf&[1A>N4)EZ#0;3oA#%i=lA67hnPgj85Mh,#1;rCZHm]s[pa#b(Z"L%'Kr^CmYD4..Z?ipP^F40V#cC+U=2%3F.mE:*.&i"3Oi\>5:YC6"]]?(DbTJ,7f:FQ&scS5>L=q4np]Q<WIU_ONb'$e[JR3aRV=s0/0Z1=!.n;'iiAQp4.2Y"9bj"/@i-*+!DZ:@BK<EAU(iG4a>15gbc=9gu+s"$6Gc'@[6<dR=A=?fh2Q[1#&Q+S.P)*3=R6E6&K/J_4@ju\X@j-q4%mS%S.2Y;<Fnfur);CO5nZ%IP@aCe4OY$&=Uqm_HI+^7>A0b'@\$OY5nfW5qfnMF](r%U?1pIWCSSo3g!SK3^h:=u.<V-`SO%e7'O18GKEDk4olc<j,.mkr)2n!eM,3/oAI76rjA_b,2Ocn_Y[ahYak!!;b^TVL>oBGGsc]hnN2@/'*lM$XUkZ.Ef#Ro%a<i--_$cp>d1Sl#q_8&Rr+bdsJl;PUa:;-SJVCPIWiI:bp6b4%RMcFitW%9j?J)iQ.*!TWok1H,D[pni=biBtXHp!uQ&5nJ"Ae65@')QhS)X]#h_=G6!3Y"V4hipMjOCZ"Zj2J;MM2JD]@e(Q,m7L/XU6\,(So)aQ"UWD@`8?o["#*\TC,0(t"A9;37MXmi+K3!7VG+hZn;mbg_X'bmlXjJeU.EYuJNsmH6;bX98:(+#q]P]XdL0/!`M3P`ubGWsJNE27d!;?t@Xb!2n--#5RI53gJ5L*#Z*JQ2CLZF9)?%W:qmYq5/dk1?ioY</3NB.TA=U)!f/u'U@EDqA,cuWM8Ra-_A8h,*92sC6DAs=IO#k*h>U_#!MhG!,]LiFcd+">&]&#W`-&Ko>fYTA@8.&8t!=BXU]i;hhilX#Hlp=?
 Uj_8(n3Vf7`_]2>eONYZb@c2f;6H(08G]m)N0@C2MDCNBC:Y^!=Lm?/Yef]LUmkjEYBd3TTL<N=J>8o!$r*gKed$c8^&P-SO'X,cK/N2H]uqlm(-[]Uu2JiF%_f*D"qe^`b3Pc*tb+H3\E"isH)qX.=R%tFFP%=f4UY%tSR9j[o#`>Edp&lBQkD[Q8@%ho!HHAgZCh0q3s_7-&dF?o1X&/kh@pPUkUq7Y09'rIF+9ZMbYileVR!f&eEPiEuS(06VM)X!VYeT_Qba//((74_NEU7@YM-(V6j..Eul$Z2WMn/:,\U*qO$(bTg^`"X,!/5;YEI\ju^N]C9SRk[$E3RNV4R$t55PA/HtH(:8X`;3oP!Pgf=Gcn%%ISQa$Hb+-r'i-Je50oDS:?$BUX<Cr!S%f90Zj4&-H_'^l_C_$pH-#X:%uRr:HFAH<&-??`i&[<FK--m.ZEjC1YjDq62L@2",$(RLG[e(t07JAqJK`G;1##Y4kloUc#5>bOgu0*!VqgQ,Y?YJ$TZ(tVLco_W!f&XEpA~>
 endstream
 endobj
 63 0 obj
@@ -468,6 +468,7 @@
 64 0 obj
 [
 65 0 R
+66 0 R
 ]
 endobj
 65 0 obj
@@ -482,170 +483,181 @@
 >>
 endobj
 66 0 obj
-<< /Length 1031 /Filter [ /ASCII85Decode /FlateDecode ]
+<< /Type /Annot
+/Subtype /Link
+/Rect [ 397.968 250.594 518.952 238.594 ]
+/C [ 0 0 0 ]
+/Border [ 0 0 0 ]
+/A << /URI (http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php)
+/S /URI >>
+/H /I
+>>
+endobj
+67 0 obj
+<< /Length 1142 /Filter [ /ASCII85Decode /FlateDecode ]
  >>
 stream
-Gat=*gN(as&:Ml+pp$B`MCi2r4AqhW?7:mo?oj/s`r]Ca,rmQ,cfXLmEJD>PddcV2B?qX$p3J`GZ^fjtP%Y<>\$JLsii"74pn-[&n#M*o8BPUV7C8M3RU,s6LA;::&di^'Aesk/`UK3^qn6-%MR&2,?1dfA&P<[V)-tYG=Vm\8Xef?b"'$kOT1X:T46"u;k^0]70Eik9,c9Z0;B1lO!0j)UKj\RdMNmPF4S<j8+Plh;2d@iH;IUUaF=lim[\gd&g3KjNSMCrSdce4\qVG==Y%ioI>q#8TMqnLga%t:oV3"h_g%tk#@9a*ME/]_@)A1-bBWP#IG!H_K[%d-V(II'jIc`T.+'d5>-]9/Vh-p^=dl<VX'J`0E*Pk=e4qgFY$eQJ;-M(k3cf10c:iaW`j)KC!L?o-X'p#(-]U2;Y^&W^+&*el9UiRX`L[tEcZ#4BNU^"m,"+>E('Xjl;)X8$?)Z#nL1fp*J',p18$ADGhRdp91=Wo&M2'(7Bdg3QIBnD=",OpnU_$TF6+N5Sg/U)[T0$rp:oI\p/mg%'(h+92h[r<3DBY:fp,F)@=GP3587/o>&.s_4ifA!>__$GrjGt)JM(R8fIKdbBR@)J8!cAS%FR;=f_!$=83cVG8eAp#>X!j[DM%(ZTH!<c/%TM\#+"J<LXA'PrA\^kH$j]2]U&1DBL7,N$G/^1bEB!<!gl+#kkb\'`U*Roag:j@G9"^hc`!@1iYp<R$a@i5bKlpTe$n=?Qd(6!M)Km&#&6a.u(pmPS!/)^U'!-?Um#h0a3!O%fj0aOk_Fc-`/`q86O1BX(h8r+P,6Vb%T7O:1<\C;9KW$%.'OqMFNp8#@]p#0;uaFY-`](i3+o<me*e;S7[[_LJBS&C8ir6PJ[,Ed!&s5s+U-Vjd(i&E=;BT\F?*W4fH@KLa"$6Y09%1Oa8bAu+`,l.0,b9JER#u1+`]e]O>HR+T9>?8Jc4;f^ha)&,KHSOuKQT_OVrP^VmJQjs
 OA"`)=FVY*sgU@B6A*;Si><omXLmj:Xr<B$UBF4~>
+Gat=*gQLSR&:Ml+pi3a"e_'FMO";tiW[JY()c0arh2VZEjH\fMmb_s':<'DD_a7(/b&(;HmW2-Ilcil4c!g?qQUE5!ICn7S/<5U^_#IPdH%)=F3@d`D=br*MT6G5Qa+hX41.jO/*SOd]m>LKfM<;KS`epdR%Xn!(6ch;\fW43nT?_cXA^joKe+07$T`.U+?@'XspF?"a3rH2t:g1/l="b`eJlP*%q+Et]0K`o8b3*1#IGJ@oDL!"1+dG#:=C-uu!P=_iC#2GK]MRma1:?W"kXRQAK<]n$W\k--Oj"=VFVD8i*aK"C&[Sodn5iH']TSXn\%IB3#bXL;`f['*8$^K4$aj%X8JLUXkOGPH)jWV.ftPNo[hQ\Z)V8(_j-Xgc7rXWrT,C.V]^rb[m.%EQB5u'R.QKo#oore0aF57/p]1!N@?$"hNrm':1k=tiF!q8`GrPHq88&:MqGE37YU-]]&ffGS)-+#gkDE.H@G#d7Tbhi!20n`IJVB)E]eM!9l:U&9`o:4^J[0n*r[E<,($m&*;3koT523$$QW'ub_,2Pn-Ou+FC:^5KoKf=.BR$jL7_lq'Wtik$QrQ)h`h=!9CoL.dK;K)G!_&-dn>A:0!*a5b>1d\MeC'OM#M+;hJ1#-'E;uM7_!eV*m.bpM2nd_hH&HP0]O!@=\THoKb-P&d!k6i36i?t2]G:]:Umn%/KdH0lW+@JpW_NEL/<j?d^rmeM_e:VE*d[+Q/%O\RbdJ2sZ?hg++bmh3`5[DE@Vh:R5e=a+N1N=)95TY8CMspn8%`WtK74q*[4&QCb\6=8\S,uWGYThu-2TN%R\dp!h2I5/]9E;5,p8k$)bS7RNS`:djpZnR")'VBF1kom.VY;IB1MHBH4[W0/Hdi!"'u!SWKp#2f*^3XfVmk+*V#n`)8_h!,?]0s?pO7dQXRt1MIjUT/7P]u4=ZPZ-%%o^n==6q#F?akQN`EkIaWfC2=q9idlm)=!M!%Zif]T&onnK
 &5/1@Lg=lc_\ITfQa(^f.qi"P:ec/hf;W+]%#pf%'c*<2t1<[eQo5[4S(O:[$.H/Bre5p8WBQ'Q/eHM5qH(HQ&O5k8c%^X/fj3%Jd-*_qWXa_+P]Ql8W/"EbkN,5!Hd;2Lj*O,Y"n/3.Q@.su/k0)B~>
 endstream
 endobj
-67 0 obj
+68 0 obj
 << /Type /Page
 /Parent 1 0 R
 /MediaBox [ 0 0 612 792 ]
 /Resources 3 0 R
-/Contents 66 0 R
+/Contents 67 0 R
 >>
 endobj
-69 0 obj
+70 0 obj
 <<
  /Title (\376\377\0\61\0\40\0\117\0\166\0\145\0\162\0\166\0\151\0\145\0\167)
- /Parent 68 0 R
- /Next 70 0 R
+ /Parent 69 0 R
+ /Next 71 0 R
  /A 9 0 R
 >> endobj
-70 0 obj
+71 0 obj
 <<
  /Title (\376\377\0\62\0\40\0\120\0\162\0\145\0\55\0\162\0\145\0\161\0\165\0\151\0\163\0\151\0\164\0\145\0\163)
- /Parent 68 0 R
- /Prev 69 0 R
- /Next 71 0 R
+ /Parent 69 0 R
+ /Prev 70 0 R
+ /Next 72 0 R
  /A 11 0 R
 >> endobj
-71 0 obj
+72 0 obj
 <<
  /Title (\376\377\0\63\0\40\0\122\0\145\0\163\0\157\0\165\0\162\0\143\0\145\0\40\0\115\0\141\0\156\0\141\0\147\0\145\0\162)
- /Parent 68 0 R
- /Prev 70 0 R
- /Next 72 0 R
+ /Parent 69 0 R
+ /Prev 71 0 R
+ /Next 73 0 R
  /A 13 0 R
 >> endobj
-72 0 obj
+73 0 obj
 <<
  /Title (\376\377\0\64\0\40\0\111\0\156\0\163\0\164\0\141\0\154\0\154\0\151\0\156\0\147\0\40\0\110\0\117\0\104)
- /Parent 68 0 R
- /Prev 71 0 R
- /Next 73 0 R
+ /Parent 69 0 R
+ /Prev 72 0 R
+ /Next 74 0 R
  /A 15 0 R
 >> endobj
-73 0 obj
+74 0 obj
 <<
  /Title (\376\377\0\65\0\40\0\103\0\157\0\156\0\146\0\151\0\147\0\165\0\162\0\151\0\156\0\147\0\40\0\110\0\117\0\104)
- /Parent 68 0 R
- /First 74 0 R
- /Last 75 0 R
- /Prev 72 0 R
- /Next 76 0 R
+ /Parent 69 0 R
+ /First 75 0 R
+ /Last 76 0 R
+ /Prev 73 0 R
+ /Next 77 0 R
  /Count -2
  /A 17 0 R
 >> endobj
-74 0 obj
+75 0 obj
 <<
  /Title (\376\377\0\65\0\56\0\61\0\40\0\115\0\151\0\156\0\151\0\155\0\141\0\154\0\40\0\103\0\157\0\156\0\146\0\151\0\147\0\165\0\162\0\141\0\164\0\151\0\157\0\156\0\40\0\164\0\157\0\40\0\147\0\145\0\164\0\40\0\163\0\164\0\141\0\162\0\164\0\145\0\144)
- /Parent 73 0 R
- /Next 75 0 R
+ /Parent 74 0 R
+ /Next 76 0 R
  /A 19 0 R
 >> endobj
-75 0 obj
+76 0 obj
 <<
  /Title (\376\377\0\65\0\56\0\62\0\40\0\101\0\144\0\166\0\141\0\156\0\143\0\145\0\144\0\40\0\103\0\157\0\156\0\146\0\151\0\147\0\165\0\162\0\141\0\164\0\151\0\157\0\156)
- /Parent 73 0 R
- /Prev 74 0 R
+ /Parent 74 0 R
+ /Prev 75 0 R
  /A 21 0 R
 >> endobj
-76 0 obj
+77 0 obj
 <<
  /Title (\376\377\0\66\0\40\0\122\0\165\0\156\0\156\0\151\0\156\0\147\0\40\0\110\0\117\0\104)
- /Parent 68 0 R
- /Prev 73 0 R
- /Next 77 0 R
+ /Parent 69 0 R
+ /Prev 74 0 R
+ /Next 78 0 R
  /A 23 0 R
 >> endobj
-77 0 obj
+78 0 obj
 <<
  /Title (\376\377\0\67\0\40\0\123\0\165\0\160\0\160\0\157\0\162\0\164\0\151\0\156\0\147\0\40\0\124\0\157\0\157\0\154\0\163\0\40\0\141\0\156\0\144\0\40\0\125\0\164\0\151\0\154\0\151\0\164\0\151\0\145\0\163)
- /Parent 68 0 R
- /First 78 0 R
- /Last 83 0 R
- /Prev 76 0 R
+ /Parent 69 0 R
+ /First 79 0 R
+ /Last 84 0 R
+ /Prev 77 0 R
  /Count -5
  /A 25 0 R
 >> endobj
-78 0 obj
+79 0 obj
 <<
  /Title (\376\377\0\67\0\56\0\61\0\40\0\154\0\157\0\147\0\143\0\157\0\156\0\144\0\145\0\156\0\163\0\145\0\56\0\160\0\171\0\40\0\55\0\40\0\124\0\157\0\157\0\154\0\40\0\146\0\157\0\162\0\40\0\162\0\145\0\155\0\157\0\166\0\151\0\156\0\147\0\40\0\154\0\157\0\147\0\40\0\146\0\151\0\154\0\145\0\163\0\40\0\165\0\160\0\154\0\157\0\141\0\144\0\145\0\144\0\40\0\164\0\157\0\40\0\104\0\106\0\123)
- /Parent 77 0 R
- /First 80 0 R
- /Last 82 0 R
- /Next 83 0 R
+ /Parent 78 0 R
+ /First 81 0 R
+ /Last 83 0 R
+ /Next 84 0 R
  /Count -2
  /A 27 0 R
 >> endobj
-80 0 obj
+81 0 obj
 <<
  /Title (\376\377\0\67\0\56\0\61\0\56\0\61\0\40\0\122\0\165\0\156\0\156\0\151\0\156\0\147\0\40\0\154\0\157\0\147\0\143\0\157\0\156\0\144\0\145\0\156\0\163\0\145\0\56\0\160\0\171)
- /Parent 78 0 R
- /Next 82 0 R
- /A 79 0 R
+ /Parent 79 0 R
+ /Next 83 0 R
+ /A 80 0 R
 >> endobj
-82 0 obj
+83 0 obj
 <<
  /Title (\376\377\0\67\0\56\0\61\0\56\0\62\0\40\0\103\0\157\0\155\0\155\0\141\0\156\0\144\0\40\0\114\0\151\0\156\0\145\0\40\0\117\0\160\0\164\0\151\0\157\0\156\0\163\0\40\0\146\0\157\0\162\0\40\0\154\0\157\0\147\0\143\0\157\0\156\0\144\0\145\0\156\0\163\0\145\0\56\0\160\0\171)
- /Parent 78 0 R
- /Prev 80 0 R
- /A 81 0 R
+ /Parent 79 0 R
+ /Prev 81 0 R
+ /A 82 0 R
 >> endobj
-83 0 obj
+84 0 obj
 <<
  /Title (\376\377\0\67\0\56\0\62\0\40\0\143\0\150\0\145\0\143\0\153\0\154\0\151\0\155\0\151\0\164\0\163\0\56\0\163\0\150\0\40\0\55\0\40\0\124\0\157\0\157\0\154\0\40\0\164\0\157\0\40\0\165\0\160\0\144\0\141\0\164\0\145\0\40\0\164\0\157\0\162\0\161\0\165\0\145\0\40\0\143\0\157\0\155\0\155\0\145\0\156\0\164\0\40\0\146\0\151\0\145\0\154\0\144\0\40\0\162\0\145\0\146\0\154\0\145\0\143\0\164\0\151\0\156\0\147\0\40\0\162\0\145\0\163\0\157\0\165\0\162\0\143\0\145\0\40\0\154\0\151\0\155\0\151\0\164\0\163)
- /Parent 77 0 R
- /First 85 0 R
- /Last 85 0 R
- /Prev 78 0 R
+ /Parent 78 0 R
+ /First 86 0 R
+ /Last 86 0 R
+ /Prev 79 0 R
  /Count -1
  /A 29 0 R
 >> endobj
-85 0 obj
+86 0 obj
 <<
  /Title (\376\377\0\67\0\56\0\62\0\56\0\61\0\40\0\122\0\165\0\156\0\156\0\151\0\156\0\147\0\40\0\143\0\150\0\145\0\143\0\153\0\154\0\151\0\155\0\151\0\164\0\163\0\56\0\163\0\150)
- /Parent 83 0 R
- /A 84 0 R
+ /Parent 84 0 R
+ /A 85 0 R
 >> endobj
-86 0 obj
+87 0 obj
 << /Type /Font
 /Subtype /Type1
 /Name /F3
 /BaseFont /Helvetica-Bold
 /Encoding /WinAnsiEncoding >>
 endobj
-87 0 obj
+88 0 obj
 << /Type /Font
 /Subtype /Type1
 /Name /F5
 /BaseFont /Times-Roman
 /Encoding /WinAnsiEncoding >>
 endobj
-88 0 obj
+89 0 obj
 << /Type /Font
 /Subtype /Type1
 /Name /F6
 /BaseFont /Times-Italic
 /Encoding /WinAnsiEncoding >>
 endobj
-89 0 obj
+90 0 obj
 << /Type /Font
 /Subtype /Type1
 /Name /F1
 /BaseFont /Helvetica
 /Encoding /WinAnsiEncoding >>
 endobj
-90 0 obj
+91 0 obj
 << /Type /Font
 /Subtype /Type1
 /Name /F2
 /BaseFont /Helvetica-Oblique
 /Encoding /WinAnsiEncoding >>
 endobj
-91 0 obj
+92 0 obj
 << /Type /Font
 /Subtype /Type1
 /Name /F7
@@ -655,18 +667,18 @@
 1 0 obj
 << /Type /Pages
 /Count 8
-/Kids [6 0 R 31 0 R 35 0 R 47 0 R 52 0 R 59 0 R 63 0 R 67 0 R ] >>
+/Kids [6 0 R 31 0 R 35 0 R 47 0 R 52 0 R 59 0 R 63 0 R 68 0 R ] >>
 endobj
 2 0 obj
 << /Type /Catalog
 /Pages 1 0 R
- /Outlines 68 0 R
+ /Outlines 69 0 R
  /PageMode /UseOutlines
  >>
 endobj
 3 0 obj
 << 
-/Font << /F3 86 0 R /F5 87 0 R /F1 89 0 R /F6 88 0 R /F2 90 0 R /F7 91 0 R >> 
+/Font << /F3 87 0 R /F5 88 0 R /F1 90 0 R /F6 89 0 R /F2 91 0 R /F7 92 0 R >> 
 /ProcSet [ /PDF /ImageC /Text ] >> 
 endobj
 9 0 obj
@@ -735,61 +747,61 @@
 /D [63 0 R /XYZ 85.0 292.9 null]
 >>
 endobj
-68 0 obj
+69 0 obj
 <<
- /First 69 0 R
- /Last 77 0 R
+ /First 70 0 R
+ /Last 78 0 R
 >> endobj
-79 0 obj
+80 0 obj
 <<
 /S /GoTo
 /D [59 0 R /XYZ 85.0 615.4 null]
 >>
 endobj
-81 0 obj
+82 0 obj
 <<
 /S /GoTo
 /D [59 0 R /XYZ 85.0 472.828 null]
 >>
 endobj
-84 0 obj
+85 0 obj
 <<
 /S /GoTo
-/D [67 0 R /XYZ 85.0 659.0 null]
+/D [68 0 R /XYZ 85.0 659.0 null]
 >>
 endobj
 xref
-0 92
+0 93
 0000000000 65535 f 
-0000027940 00000 n 
-0000028047 00000 n 
-0000028139 00000 n 
+0000028494 00000 n 
+0000028601 00000 n 
+0000028693 00000 n 
 0000000015 00000 n 
 0000000071 00000 n 
 0000001049 00000 n 
 0000001169 00000 n 
 0000001264 00000 n 
-0000028273 00000 n 
+0000028827 00000 n 
 0000001399 00000 n 
-0000028336 00000 n 
+0000028890 00000 n 
 0000001536 00000 n 
-0000028400 00000 n 
+0000028954 00000 n 
 0000001671 00000 n 
-0000028466 00000 n 
+0000029020 00000 n 
 0000001808 00000 n 
-0000028530 00000 n 
+0000029084 00000 n 
 0000001944 00000 n 
-0000028596 00000 n 
+0000029150 00000 n 
 0000002081 00000 n 
-0000028662 00000 n 
+0000029216 00000 n 
 0000002217 00000 n 
-0000028726 00000 n 
+0000029280 00000 n 
 0000002354 00000 n 
-0000028792 00000 n 
+0000029346 00000 n 
 0000002490 00000 n 
-0000028858 00000 n 
+0000029412 00000 n 
 0000002627 00000 n 
-0000028924 00000 n 
+0000029478 00000 n 
 0000002764 00000 n 
 0000005137 00000 n 
 0000005260 00000 n 
@@ -823,41 +835,42 @@
 0000019196 00000 n 
 0000019223 00000 n 
 0000019418 00000 n 
-0000021817 00000 n 
-0000021940 00000 n 
-0000021967 00000 n 
-0000022155 00000 n 
-0000023279 00000 n 
-0000028988 00000 n 
-0000023387 00000 n 
-0000023526 00000 n 
-0000023715 00000 n 
-0000023916 00000 n 
-0000024105 00000 n 
-0000024340 00000 n 
-0000024654 00000 n 
-0000024887 00000 n 
-0000025058 00000 n 
-0000025367 00000 n 
-0000029039 00000 n 
-0000025858 00000 n 
-0000029103 00000 n 
-0000026100 00000 n 
-0000026441 00000 n 
-0000029169 00000 n 
-0000027045 00000 n 
-0000027273 00000 n 
-0000027386 00000 n 
-0000027496 00000 n 
-0000027607 00000 n 
-0000027715 00000 n 
-0000027831 00000 n 
+0000022026 00000 n 
+0000022149 00000 n 
+0000022183 00000 n 
+0000022371 00000 n 
+0000022598 00000 n 
+0000023833 00000 n 
+0000029542 00000 n 
+0000023941 00000 n 
+0000024080 00000 n 
+0000024269 00000 n 
+0000024470 00000 n 
+0000024659 00000 n 
+0000024894 00000 n 
+0000025208 00000 n 
+0000025441 00000 n 
+0000025612 00000 n 
+0000025921 00000 n 
+0000029593 00000 n 
+0000026412 00000 n 
+0000029657 00000 n 
+0000026654 00000 n 
+0000026995 00000 n 
+0000029723 00000 n 
+0000027599 00000 n 
+0000027827 00000 n 
+0000027940 00000 n 
+0000028050 00000 n 
+0000028161 00000 n 
+0000028269 00000 n 
+0000028385 00000 n 
 trailer
 <<
-/Size 92
+/Size 93
 /Root 2 0 R
 /Info 4 0 R
 >>
 startxref
-29233
+29787
 %%EOF

Modified: hadoop/core/trunk/docs/hod_config_guide.html
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/docs/hod_config_guide.html?rev=669980&r1=669979&r2=669980&view=diff
==============================================================================
--- hadoop/core/trunk/docs/hod_config_guide.html (original)
+++ hadoop/core/trunk/docs/hod_config_guide.html Fri Jun 20 09:31:41 2008
@@ -386,9 +386,26 @@
                        as many paths are specified as there are disks available
                        to ensure all disks are being utilized. The restrictions
                        and notes for the temp-dir variable apply here too.</li>
+          
+<li>max-master-failures: The number of times a hadoop master
+                       daemon can fail to launch, beyond which HOD fails the
+                       cluster allocation altogether. In HOD clusters, there
+                       may sometimes be one or a few "bad" nodes, due to
+                       issues like a missing java installation or a
+                       missing/incorrect version of Hadoop. When this
+                       configuration variable is set to a positive integer,
+                       the RingMaster returns an error to the client only
+                       when the number of times a hadoop master (JobTracker
+                       or NameNode) fails to start on such bad nodes exceeds
+                       the specified value. If the number is not exceeded,
+                       the next HodRing that requests a command to launch is
+                       given the same hadoop master again. This way, HOD
+                       tries its best to complete the allocation successfully
+                       even when the cluster contains a few bad nodes.
+                       </li>
         
 </ul>
-<a name="N100A5"></a><a name="3.5+gridservice-hdfs+options"></a>
+<a name="N100A8"></a><a name="3.5+gridservice-hdfs+options"></a>
 <h3 class="h4">3.5 gridservice-hdfs options</h3>
 <ul>
           
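For illustration, the max-master-failures variable added in the hunk above is set in the ringmaster section of the INI-style hodrc file. A minimal sketch, where the value 5 is only an example:

    [ringmaster]
    # fail the allocation only after a hadoop master (JobTracker or NameNode)
    # has failed to launch this many times on bad nodes
    max-master-failures = 5
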
@@ -429,7 +446,7 @@
 <li>final-server-params: Same as above, except they will be marked final.</li>
         
 </ul>
-<a name="N100C4"></a><a name="3.6+gridservice-mapred+options"></a>
+<a name="N100C7"></a><a name="3.6+gridservice-mapred+options"></a>
 <h3 class="h4">3.6 gridservice-mapred options</h3>
 <ul>
           
@@ -462,7 +479,7 @@
 <li>final-server-params: Same as above, except they will be marked final.</li>
         
 </ul>
-<a name="N100E3"></a><a name="3.7+hodring+options"></a>
+<a name="N100E6"></a><a name="3.7+hodring+options"></a>
 <h3 class="h4">3.7 hodring options</h3>
 <ul>
           

Modified: hadoop/core/trunk/docs/hod_config_guide.pdf
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/docs/hod_config_guide.pdf?rev=669980&r1=669979&r2=669980&view=diff
==============================================================================
--- hadoop/core/trunk/docs/hod_config_guide.pdf (original)
+++ hadoop/core/trunk/docs/hod_config_guide.pdf Fri Jun 20 09:31:41 2008
@@ -182,10 +182,10 @@
 >>
 endobj
 34 0 obj
-<< /Length 2182 /Filter [ /ASCII85Decode /FlateDecode ]
+<< /Length 2591 /Filter [ /ASCII85Decode /FlateDecode ]
  >>
 stream
-Gb!Sn>EbO7'Roe[i%^X&CCsW0Db7C,b=n*VEM(r[7t>EV`eYaRYP[X40OToC'JGkKRW`jt#9bA<G:qc58*l/QjJTuXo7U9tpjG#fgVgOP+Fq4u?X6HUf-\3r[gI"AP@#Xi(6e(=d%W/:/X@(I[*FCJC=R)_QcuF=/[j5K:L)nAVq0?Xm8T(V5p<PaQDp#`h*%0m6>#M^onTr"4!^0YPIVL<P(aDD`F;]&3ji)$85;FK95BZ=G9"AC_=ap[DZR7YbmN?H<ZGYTR3nY)cT4W6CT_n@A-F#rQ`3e-K"(*LA%CEi#8PQf_+L;@[rPNP<AuJ\$uL+uBo5t.HRs]U$r%F6U-4gJBGAhr&%6Ubn:7"KV(;a.b4>E-9fH2EN)r9kdk&sE.A^l-d7,[L1@GP%%X(t3Ga/?pd+$>-Vf$J0?2i(=O$)+7Hl02,'7.[B8qu-p(5p#NMI8jYKEY^)f%_ZR[=:,m;AjD;ag;ZK+\g>L(WTe!/5;1,_oCT>ltCL>PKs`;_Oh!BhPrp(4;LS@rNbTlKddECTeJ%+B-LnaC9So.e5"a*/"s]mIPBbH,fi\POc$nr/SQBD<EY+po1T[Dk`W0(YISNt27M/L;[]ppDtAi?\iY7>#TZ"Mo+]kIf4-6U3HJ0?[2Q%?*[NV8jb@DrNG!YROG-4TILn6+Y*1.5E/AF[f-Rc6f<VMG6fL4]DLKPMFuUF,\((h(hQ5XZO:sC8,.;oPLi^[9`NQN.[1LOpLksJQ<[dHDCI&i94$=HuGF6!^*(FtX>,/VKiiM?WE/G<%%f%'?`+cKW,KR)Xk[\;X/s?UnkXNL+U]uc;"bP-%Z5:`hD>S9*;.V_YhFTi5X&ii/R@'Ph2P:)T4L&*"!,n;Khm3kPI);e`m)de^,'q12BY9JKKB^(%IY&FRPD$q[g,;"Xf>7r1A*-),2;5iPQV(7%/)ZdCM]I9Ykd!/djGmf;6)#GJB80C7L*pRd=M,=ui:"U[[G%n-0]J1
 WTLCu-dNT_g/cCSYht7AFG4:iW9W943=c>c`R9"`L!_7X;PndZY[b,/'Dc!b&%RUTmB-$rm?)`H,jOPR\N](0j*#]"\]G$`N0Pp)b3A2(GX?8#:L5c]^3uS(7,`:^!j_lWk1,Wa_o/f:oZZO6C!Ma&F$ON)(Q-BLqmOY>"n@]UII?m!_Z9.8'hVe=78g'0I72ll5kCR3)8q`f%FB#c)J:E=8=(TC[Hs!'\eN>ZXhnaL)+H^'&/@";t(El[qh/O!7`L5VLlH`]N)!(u>%QeDY`O3!qG<UJ...@aaQ.l>4!1Iu-atUT,*u:fqF#k(9R,o7X^=6_66"/800&eB7?7Q^Nj;HG^<L)),@[=<Wp$e7h,,FM/b1D"B
 ?"X=*HmD0Wt!6K+>Umq%]ac4"]aH*n:la7X*<[D(5#*Kncf8l(q,hP'FDG(jR)>/0nZ4EE*QMJ$?Blj'uYg=n`CQJ/acWWWlbNOjTPTY=4aJStPi:oF-5p/:%n68[e1=P4HP^o-#0Z0eb72';JU_=O29Gl1,cb<p>7>Mlun8Gf2&JsOrAX9YYI.l?B97liFhkjBO6_Vtb~>
+Gau`VD/\/e&H88.@I-o#`/<D_pY47O>6h7s=qCM7%Ke"S2A\s781KYolUP*@GpeMm4:E0\VP5>6p[[*\G@\G@gjH`Eq=DU1D"YR3DUtuN0&ekA^Vd"^?:I6?,ggo]kT[_9l/nJ[c/!fRn,iQ);r"n'H*7/:(B*#BE7UVqfMmc9?Na-f]Nd#dj%A.K$j<W?'p4@\F,r8;nQNY3r3k8U0IKp53U8@:-1fqhrs.ZV(7sF,Kk9gb`s]OdVU)su%:?WWO3T(smV8+$7Dk'I(D"]Z$2M`"]%KqPc&`2j0UF)L$KtQYl\nm[GYunD#bl!MDDaiLU`$0Q!DDf_=i/kO&IO&cJHpdJYXZIJ@#0[1g:>%u][iQaM@2nUZ@)]ON>ZPH9fkeZgS@;B8Vrbm?m1\ShFn%l#;*;AS0bRbi6e%k:M+QY/q5.k8"-n)S_)P;*Ufd[;$>hN*f<7SXTK$KJ1g>;^S=i/](O;r/U@%4D6?Ut7+3-'IWK1)7`t:@47E*)e-@LsNgQ@l7C7rCok4]cd#8\Jic9W'@1.^X8Z3DM,ci8*C%3CLAW)k+7[Pe3?VIG2&c`Ta9(j(0&deXhQ!bITRYgl4T_WJ%+$mBu#hB9P.*NC-RZ9)GSOXk@et"Wl$%.BmW&Sur*+^;$.!Dd>[;O!@e11$rm.kj.BZFf@6[F(e_E4Y!X.eN7^:-C%YmNo`Qo'Ze.JN"Y<kJY78ALm^&Ar,*EFE<iX!DPQm;=(@%NINN^LPsB_I5D*-Qe,(cH6o<Ti&uhdX_e_;g_6M8-,**>?eg)/1GN#FUUT(;j(4(@dlrO<c#BHcj$hZOZJ"81a0"0$@MeP9(K/n4QQ"dH#!_'-=mQl/D1,g;+_P=IqY^`B^`QfR>-!2XNn?>EYO<fD*R(_R!>N#L8k[Wjd.!iTfF9mB=okL[M9JUT#WaZ,cR6A$COuC9apnU>Va9+AnSNG>:ONQa3EP;2TUrC^_0VSU.4Aa-nb^40lC,R.iGH
 t,u[2%g/!Y3-J!%!U(XYs<BFH@L78*%M0sdB"&Ck-A.Rmni;+InkW=dYcS'9g>:L8bLd5]"2.8qJ6fT^QjqM*HL87_9&QI=HS<B*]cQ@cF31`<FPifDHD^KBeP-UpGJr44;%)TQ0"_97fc)5NNdlUkRii;5V/0B0:&J;g3RK`XgCT_KA(Vo'5;aZ_'Y3[s]]N/5WA2XQ`NL7lpS5VET%9nrH@5Y&V:pEQ!"c)Z"_Nd'4G"AU&QLf1ihI[XjZ=lN>S'+_EfmqcBdHF2ResQuTo;2dV_eLTrb_=;1&V_9>8gMlHU(Snsf?R!tY!Xi\TjBPWmM@GC=t#]ajXr3:#a[U#d.NpZ3]qP\?K1b+kt$H7^724,.,MNQiHglbj+iM\Pd"U!:+/X^.1/\sV:<%n)O.q-"5*Djodsh<:=Vs0H'.Y5Po_N5RIjnWlOinJn-P8cj,^dpK9i24g2Uor_/Pji$A08ki;hbaAAke?q!o<k!-l3;n2WB[L,F@AOe^[n/C<e*N#)1T\ioWrFF6iMU[4)JR<;^W'o82t^:/N7(+6=8Krg5:41#E6N-f!qX:b7hl;hC-"WYjbZ=Q#CdUt#2`5*Y,Y5`69B;0DV1?f@AYXTOh8!oFDeTtUbo,7jL-QY.:lh\q"$!'!e/NNaF;d23:PU(l*l@P5S"AWZ^G$_neXCuB2)W\9jS25[;)uATs_TE0kfu/EgL;eqAdFB'Bq$*i.+kCbKKm$FtDWdSJQ0=e1T1u%V%[]j]VGq97L/)q?69NhAo*U+XC';l;$toJOi_G[-]rhEi,u-Mlf\=P&D85E<Qqs%qRQN4j@ea^Jm->:XlP_8.l%"6ejd2u:obt:S\^-8_Z!i/@Jb;dIqf+p@Q%n^,.X!UCr)Unr)_ReL+1SP:cm6SiZC<ot5J.3gmrj[B5h,W,R*[IaQ=q7EVja2s+u318r#[GE]Qp[ESu-,ho@naple&nVGjAt3fW0;MU91k-2jn5;.E9DFGKB
 ;X2CS@=*\"/YVVMQ,U04P`RhJ.D@-pV6-<rU...@Xmq-I>ru"BIDpmFRSnpgu.hc]9_OmK"q2(MB9USF+=;cY+r8'@:'l(:S`u],PtKFd&aNF#%9`X/Oa"SJ[V0)7tG9%5LF@-J&UA8OgC(C`@BA`%pO4\H#jPW=]3\a77UA=R??sLM&?a-Y5R;EL1G9$M&0nk1l)%#ql]3F)TMm-HFNR"j0p<Odj&"e8i:It^B&YR8Y?~>
 endstream
 endobj
 35 0 obj
@@ -197,10 +197,10 @@
 >>
 endobj
 36 0 obj
-<< /Length 1610 /Filter [ /ASCII85Decode /FlateDecode ]
+<< /Length 2055 /Filter [ /ASCII85Decode /FlateDecode ]
  >>
 stream
-Gau0DbAu>q']&(*\8sE8;LF@dG>2<il#`B&<i:ka$]"pS9Lsa=5WeuBJe&/('VNOrG!*3T8V9"sFm]:fm8i-5<e%:Sk9,7=XL>k+4.FJarf*-#],US:,DdCpSc.i?e[sXPrK''+,k(-CPs;GCj\Of.CZELj#,/A=S3Pju?L#4Hgi.Yh?n%kF"XED#?(d`2[,l4s^RX*j-;[S!D<(6o.nk^/#V'eFDuuj<$9)9&"V[On__@L&=?e'D$N`P^=!Yo#U-Tg;L!*i5^l;P0jt]QJS\feN(sW:r0Bs0aeTXl!n@oKpb37R]1M&aO,d3I.naeoR8*X8\#M`dC"J-W:J<u[\No52t+!<6,[A^oHRA_^%d36dekfY$Ak`>d`O_rJZWdA.%ddhlH;kUfQd%3XRA+i]a@rV&nVeo!8]aTWH*=.Wr%g@opTHLV?TX7X@JT)ha%ohghF.9H>M(;i@'H]JjHZSV<Cu]An^5[\bC$CM4E-LT-FjDce`HmEl'WQ:^p[XNYU];J66j8<!T`Z3H6[A/2kG0B50S<&1+!3dLR"8o?MY;1d]6V!>_s5eR+W-]b78QZ<!f>/>7`#.s^iEcReI]H!d$l/)*G,c>I%m/F(hQ@b``K6l>I:RgE##hZ8(m#I'ouH&NTmV7%g$]E'k"3.YqBk3et=[("-$CC1(12BJN)@(LcABC;pj`J0gV1a\;Yd+0u+u*hMJcp<1cRa1g`]E8JE[5WTh5N0UKm"q8DY<!BgD:Y<N(Y1!^`jCM?5GhM8K>Wu6DPD;1d'<$rK&leF6L.'j%oAcP*c[(s-:KtcQ6RcXlT8kEdMZ""j5JC9nb_LUp`RNg)u9>ZsFk1Xl\@hd8EK'cBBaVkf+^l\85eEjQ7.aFLHmDZT3(@NE)+?0"+=\adBW[R]!H9FCKAq<])=YgD#5#hS9jDBWg#Xc7a'ns?<S`0AKGCJ-V2hPElW?GVV5QLgq4l'9A<L<#aX?DcZ=eLRcY@4E8L'+F
 l.3:%X?6Y%oaD6ag*l.%XbqXu7iY.+YAVEj?a>q0Ff-`E!cJU@3bmWnjo5W!R.2&+Dd@<C$/.u]d2ZHqn/VqbI8T>AM-[XpB_&Ztc,gUt8GdN9uFlpVh-K&m:BVjs%#6/o3"/Yt*jNhBl!iob'7YI1)j"OCo13;:E$:IOAhTEC+-EP7dUGt7[kZ]565%tXBQ\@gIaGPA1OGJNWED?2h^k$P$eB5%&f8C,#_7[2<DK-qdEaueY6MqS]]bn\dGt],GUmCMNc6]_;.f,7C[kkO^hj/RR:HDJ9?jM52B!J5ITEF,V@`l,Fdht^%i,b,oMNf`7$\SVsb=K`"U"Ri"HFD)CT%cZ15]!XOa)a].e9Dn/$EU`AJRfLk/7GZGN&D&gVPp;I@_=5`X.I6IO_VYXah7DMqpHVXJoEegU9nF!HJ<NT3c@3kRX'r5e/!8^[J.kDG.R:=-O#5XR>*Y&j0lcP''K!^^)&-6gSIf]>r"^2JgsE+Ud(UVf!\2$T7K53L@8\^)COIF=caocG*=nR4Ge`4ADbl(Kon_[Q<?6O?F!B?-nFrg<*Ag/SrE,!PXG>1QK!0BWThP-Prs[oo6jq'<RPE:HZNRf3Pq@W)EA#i+7@)1&H~>
+Gb!#]>EbO7'RnqH_:p:m60=2u[p5A<1/3sES`rMrl,Y!+AcYa8G2n'Z@DFq@_O"sme(ND["I=/DF8=+:r-Ee_e,&4q)qWfdhtK&[#<>g!7Ik@.G1Uoq"_\eM(+7\qk1?*SB.(4F"Rts#]?kR]q.-&P3;Zj=r,F=?=a!`]#XqKrX\8!SB*kUgOu1K:gY8q)XC:B4=6c6%D[<Oc;=)dq3B<khQBP;(qh4:=,]+3hDMaot8;WpBIp20!dQ$u/gh9e#.Ib]!3S="[kCnZtiR*;Z8`6@O7$9@R]MtBWa^Klr?j=dSb+)]Im3CAe3DLV$J"^c.%/Oa`(8SE+nT>t&#C.Q8VVhEWRW/W,M]A0Qpls5QT>@Z!W-_md]E*R.r"2,g6="M3>GYu[\e,:Q6gMb5):?k^@m:Ig'A9>#[guQU5+fuXO8S<_b8JJ8GQ"`\!:kn;1bq.46Q%;nP]Bp:N:0"e9%u1U\jnZc>3%YSXG48]P.&2rJsc'D/AVcU)sC,tFftl%brDq(mr>tZ*1%64.p.mEM/>PicSVjU$S"D.^`oC[m;#mbpY-9_qZ*gu8O>3?2g?H,q[jVr))+d(`3J<p)H/8MR_$e$!02K%7017+T!SRe#d\hM.afIE/2K_.$ec0a]e!R^j6$2d2`nOd%,NC/FCZjO>_*XA=?I"a&8-&T::ZB9_U):nO\n^/p08s#CTRShLuSqk,VNOo,c#<78*hjJAVp3bFQVF$`>#T=qBjSW$G!=k6h!q]P"1NtUfkiS"d55&*?C)64Ve<slQS9D7u?0Ch*3;"k+^eDaS4M+U*+*ei@q%F[Hs<)hstX+;)W%qo(Un$[K,2aPI*E!p>,\DgiiGfm9G`P"W>`K=>!H1)DBg(SK?P;aTsVbi&)9Z`qlcj<n+Y$bmC=C)MmOWB"@:LN<X5GXhH[QMaQQdpdS+:kp-KT1c*M7KLSm68ssrcblQIUQ["7I0D<_3R@19J*cabeHcOejZ^[Hl]<@6
 KUc2&t7IR;6<K!...@R>8XJmiY^i7!kUM\A"#*LcNNWMXAVmuu""0\Q!j=?oAkYXOip!$O]M9H1qNCG!c3J1VtZ0Nknj%9`3Q-q49\X%59:<YM>76Wmhh?EaU&)puZ.R26_U>W!H@b4,Q\*>nLg/03U.ZQXWg#ksH^*P).d[/\$2q+pQGk/SF.=m7Jh=jM<D-\NTX>ZS%3QMT?e"oa//t#C="_\baQ@?YSU2:VHS?C>IL4bdu*/*fCX>!&C.(#9qhn6e3Dcj^5FIVaJ/FF
 K2=X\CBaE*kRNJ?<F%pmdGJ`D*C2,?S3olo888,i;1hhPoo-M<#ih?&YXrYY4q4b7_15Ib1>E<~>
 endstream
 endobj
 37 0 obj
@@ -394,19 +394,19 @@
 23 0 obj
 <<
 /S /GoTo
-/D [35 0 R /XYZ 85.0 556.947 null]
+/D [35 0 R /XYZ 85.0 424.947 null]
 >>
 endobj
 25 0 obj
 <<
 /S /GoTo
-/D [35 0 R /XYZ 85.0 309.694 null]
+/D [35 0 R /XYZ 85.0 177.694 null]
 >>
 endobj
 27 0 obj
 <<
 /S /GoTo
-/D [37 0 R /XYZ 85.0 659.0 null]
+/D [37 0 R /XYZ 85.0 534.2 null]
 >>
 endobj
 38 0 obj
@@ -417,33 +417,33 @@
 xref
 0 55
 0000000000 65535 f 
-0000015216 00000 n 
-0000015302 00000 n 
-0000015394 00000 n 
+0000016070 00000 n 
+0000016156 00000 n 
+0000016248 00000 n 
 0000000015 00000 n 
 0000000071 00000 n 
 0000000918 00000 n 
 0000001038 00000 n 
 0000001126 00000 n 
-0000015528 00000 n 
+0000016382 00000 n 
 0000001261 00000 n 
-0000015591 00000 n 
+0000016445 00000 n 
 0000001398 00000 n 
-0000015657 00000 n 
+0000016511 00000 n 
 0000001535 00000 n 
-0000015723 00000 n 
+0000016577 00000 n 
 0000001672 00000 n 
-0000015789 00000 n 
+0000016643 00000 n 
 0000001808 00000 n 
-0000015853 00000 n 
+0000016707 00000 n 
 0000001943 00000 n 
-0000015919 00000 n 
+0000016773 00000 n 
 0000002080 00000 n 
-0000015983 00000 n 
+0000016837 00000 n 
 0000002217 00000 n 
-0000016049 00000 n 
+0000016903 00000 n 
 0000002353 00000 n 
-0000016115 00000 n 
+0000016969 00000 n 
 0000002490 00000 n 
 0000004654 00000 n 
 0000004762 00000 n 
@@ -451,26 +451,26 @@
 0000007519 00000 n 
 0000007546 00000 n 
 0000007800 00000 n 
-0000010075 00000 n 
-0000010183 00000 n 
-0000011886 00000 n 
-0000016179 00000 n 
-0000011994 00000 n 
-0000012172 00000 n 
-0000012341 00000 n 
-0000012764 00000 n 
-0000013052 00000 n 
-0000013253 00000 n 
-0000013532 00000 n 
-0000013775 00000 n 
-0000014053 00000 n 
-0000014343 00000 n 
-0000014554 00000 n 
-0000014667 00000 n 
-0000014777 00000 n 
-0000014885 00000 n 
-0000014991 00000 n 
-0000015107 00000 n 
+0000010484 00000 n 
+0000010592 00000 n 
+0000012740 00000 n 
+0000017033 00000 n 
+0000012848 00000 n 
+0000013026 00000 n 
+0000013195 00000 n 
+0000013618 00000 n 
+0000013906 00000 n 
+0000014107 00000 n 
+0000014386 00000 n 
+0000014629 00000 n 
+0000014907 00000 n 
+0000015197 00000 n 
+0000015408 00000 n 
+0000015521 00000 n 
+0000015631 00000 n 
+0000015739 00000 n 
+0000015845 00000 n 
+0000015961 00000 n 
 trailer
 <<
 /Size 55
@@ -478,5 +478,5 @@
 /Info 4 0 R
 >>
 startxref
-16230
+17084
 %%EOF

Modified: hadoop/core/trunk/docs/hod_user_guide.html
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/docs/hod_user_guide.html?rev=669980&r1=669979&r2=669980&view=diff
==============================================================================
--- hadoop/core/trunk/docs/hod_user_guide.html (original)
+++ hadoop/core/trunk/docs/hod_user_guide.html Fri Jun 20 09:31:41 2008
@@ -244,7 +244,7 @@
 </ul>
 </li>
 <li>
-<a href="#Troubleshooting-N10576"> Troubleshooting </a>
+<a href="#Troubleshooting-N10579"> Troubleshooting </a>
 <ul class="minitoc">
 <li>
 <a href="#Hangs+During+Allocation">hod Hangs During Allocation </a>
@@ -313,7 +313,7 @@
 <strong> Create a Cluster Directory </strong>
 </p>
 <a name="Create_a_Cluster_Directory" id="Create_a_Cluster_Directory"></a>
-<p>The <em>cluster directory</em> is a directory on the local file system where <span class="codefrag">hod</span> will generate the Hadoop configuration, <em>hadoop-site.xml</em>, corresponding to the cluster it allocates. Create this directory and pass it to the <span class="codefrag">hod</span> operations as stated below. Once a cluster is allocated, a user can utilize it to run Hadoop jobs by specifying the cluster directory as the Hadoop --config option. </p>
+<p>The <em>cluster directory</em> is a directory on the local file system where <span class="codefrag">hod</span> will generate the Hadoop configuration, <em>hadoop-site.xml</em>, corresponding to the cluster it allocates. Pass this directory to the <span class="codefrag">hod</span> operations as stated below. If the specified cluster directory doesn't already exist, HOD will automatically try to create it and use it. Once a cluster is allocated, a user can use it to run Hadoop jobs by specifying the cluster directory as the Hadoop --config option. </p>
 <p>
 <strong> Operation <em>allocate</em></strong>
 </p>
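
For illustration, the cluster directory workflow described in the hunk above usually amounts to commands like the following; the directory path, node count and example jar are placeholders rather than values taken from this patch:

    hod allocate -d ~/hod-clusters/test -n 5
    hadoop --config ~/hod-clusters/test jar <path-to-hadoop-examples-jar> wordcount input output
    hod deallocate -d ~/hod-clusters/test
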
@@ -436,7 +436,7 @@
   
 </tr>
 </table>
-<p>However, the user can add any valid commands as part of the script. HOD will execute this script setting <em>HADOOP_CONF_DIR</em> automatically to point to the allocated cluster. So users do not need to worry about this. The users however need to create a cluster directory just like when using the allocate operation.</p>
+<p>However, the user can add any valid commands as part of the script. HOD will execute this script, automatically setting <em>HADOOP_CONF_DIR</em> to point to the allocated cluster, so users do not need to worry about this. Users, however, need to specify a cluster directory just as when using the allocate operation.</p>
 <p>
 <strong> Running the script </strong>
 </p>
@@ -536,9 +536,11 @@
 <li> For better distribution performance it is recommended that the Hadoop tarball contain only the libraries and binaries, and not the source or documentation.</li>
     
 <li> When you want to run jobs against a cluster allocated using the tarball, you must use a compatible version of hadoop to submit your jobs. The best would be to untar and use the version that is present in the tarball itself.</li>
+    
+<li> You need to make sure that there are no Hadoop configuration files, hadoop-env.sh and hadoop-site.xml, present in the conf directory of the tarred distribution. The presence of these files with incorrect values could cause the cluster allocation to fail.</li>
   
 </ul>
-<a name="N10215"></a><a name="Using+an+external+HDFS"></a>
+<a name="N10218"></a><a name="Using+an+external+HDFS"></a>
 <h3 class="h4"> Using an external HDFS </h3>
 <a name="Using_an_external_HDFS" id="Using_an_external_HDFS"></a>
 <p>In typical Hadoop clusters provisioned by HOD, HDFS is already set up statically (without using HOD). This allows data to persist in HDFS after the HOD provisioned clusters is deallocated. To use a statically configured HDFS, your hodrc must point to an external HDFS. Specifically, set the following options to the correct values in the section <span class="codefrag">gridservice-hdfs</span> of the hodrc:</p>
@@ -575,7 +577,7 @@
 <td colspan="1" rowspan="1">external = false</td>
 </tr>
 </table>
-<a name="N10259"></a><a name="Options+for+Configuring+Hadoop"></a>
+<a name="N1025C"></a><a name="Options+for+Configuring+Hadoop"></a>
 <h3 class="h4"> Options for Configuring Hadoop </h3>
 <a name="Options_for_Configuring_Hadoop" id="Options_for_Configuring_Hadoop"></a>
 <p>HOD provides a very convenient mechanism to configure both the Hadoop daemons that it provisions and also the hadoop-site.xml that it generates on the client side. This is done by specifying Hadoop configuration parameters in either the HOD configuration file, or from the command line when allocating clusters.</p>
@@ -633,7 +635,7 @@
     
 </table>
 <p>In this example, the <em>mapred.userlog.limit.kb</em> and <em>mapred.child.java.opts</em> options will be included into the hadoop-site.xml that is generated by HOD.</p>
-<a name="N102EB"></a><a name="Viewing+Hadoop+Web-UIs"></a>
+<a name="N102EE"></a><a name="Viewing+Hadoop+Web-UIs"></a>
 <h3 class="h4"> Viewing Hadoop Web-UIs </h3>
 <a name="Viewing_Hadoop_Web_UIs" id="Viewing_Hadoop_Web_UIs"></a>
 <p>The HOD allocation operation prints the JobTracker and NameNode web UI URLs. For example:</p>
@@ -650,7 +652,7 @@
 </tr>
 </table>
 <p>The same information is also available via the <em>info</em> operation described above.</p>
-<a name="N1030D"></a><a name="Collecting+and+Viewing+Hadoop+Logs"></a>
+<a name="N10310"></a><a name="Collecting+and+Viewing+Hadoop+Logs"></a>
 <h3 class="h4"> Collecting and Viewing Hadoop Logs </h3>
 <a name="Collecting_and_Viewing_Hadoop_Lo" id="Collecting_and_Viewing_Hadoop_Lo"></a>
 <p>To get the Hadoop logs of the daemons running on one of the allocated nodes: </p>
@@ -678,13 +680,13 @@
 </table>
 <p>Under the root directory specified above in the path, HOD will create a create a path user_name/torque_jobid and store gzipped log files for each node that was part of the job.</p>
 <p>Note that to store the files to HDFS, you may need to configure the <span class="codefrag">hodring.pkgs</span> option with the Hadoop version that matches the HDFS mentioned. If not, HOD will try to use the Hadoop version that it is using to provision the Hadoop cluster itself.</p>
-<a name="N10356"></a><a name="Auto-deallocation+of+Idle+Clusters"></a>
+<a name="N10359"></a><a name="Auto-deallocation+of+Idle+Clusters"></a>
 <h3 class="h4"> Auto-deallocation of Idle Clusters </h3>
 <a name="Auto_deallocation_of_Idle_Cluste" id="Auto_deallocation_of_Idle_Cluste"></a>
 <p>HOD automatically deallocates clusters that are not running Hadoop jobs for a given period of time. Each HOD allocation includes a monitoring facility that constantly checks for running Hadoop jobs. If it detects no running Hadoop jobs for a given period, it will automatically deallocate its own cluster and thus free up nodes which are not being used effectively.</p>
 <p>
 <em>Note:</em> While the cluster is deallocated, the <em>cluster directory</em> is not cleaned up automatically. The user must deallocate this cluster through the regular <em>deallocate</em> operation to clean this up.</p>
-<a name="N1036C"></a><a name="Specifying+Additional+Job+Attributes"></a>
+<a name="N1036F"></a><a name="Specifying+Additional+Job+Attributes"></a>
 <h3 class="h4"> Specifying Additional Job Attributes </h3>
 <a name="Specifying_Additional_Job_Attrib" id="Specifying_Additional_Job_Attrib"></a>
 <p>HOD allows the user to specify a wallclock time and a name (or title) for a Torque job. </p>
@@ -712,7 +714,7 @@
 </table>
 <p>
 <em>Note:</em> Due to restriction in the underlying Torque resource manager, names which do not start with a alphabet or contain a 'space' will cause the job to fail. The failure message points to the problem being in the specified job name.</p>
-<a name="N103A3"></a><a name="Capturing+HOD+exit+codes+in+Torque"></a>
+<a name="N103A6"></a><a name="Capturing+HOD+exit+codes+in+Torque"></a>
 <h3 class="h4"> Capturing HOD exit codes in Torque </h3>
 <a name="Capturing_HOD_exit_codes_in_Torq" id="Capturing_HOD_exit_codes_in_Torq"></a>
 <p>HOD exit codes are captured in the Torque exit_status field. This will help users and system administrators to distinguish successful runs from unsuccessful runs of HOD. The exit codes are 0 if allocation succeeded and all hadoop jobs ran on the allocated cluster correctly. They are non-zero if allocation failed or some of the hadoop jobs failed on the allocated cluster. The exit codes that are possible are mentioned in the table below. <em>Note: Hadoop job status is captured only if the version of Hadoop used is 16 or above.</em>
@@ -792,7 +794,7 @@
     
   
 </table>
-<a name="N10435"></a><a name="Command+Line"></a>
+<a name="N10438"></a><a name="Command+Line"></a>
 <h3 class="h4"> Command Line</h3>
 <a name="Command_Line" id="Command_Line"></a>
 <p>HOD command line has the following general syntax:<br>
@@ -854,7 +856,7 @@
 <br>
         All configuration options provided in the hodrc file can be passed on the command line, using the syntax <span class="codefrag">--section_name.option_name[=value]</span>. When provided this way, the value provided on command line overrides the option provided in hodrc. The verbose-help command lists all the available options in the hodrc file. This is also a nice way to see the meaning of the configuration options.</p>
 <p>See the <a href="#Options_Configuring_HOD">next section</a> for a description of most important hod configuration options. For basic options, one can do a <span class="codefrag">hod help options</span> and for all options possible in hod configuration, one can see <span class="codefrag">hod --verbose-help</span>. See <a href="hod_config_guide.html">config guide</a> for a description of all options.</p>
-<a name="N104BC"></a><a name="Options+Configuring+HOD"></a>
+<a name="N104BF"></a><a name="Options+Configuring+HOD"></a>
 <h3 class="h4"> Options Configuring HOD </h3>
 <a name="Options_Configuring_HOD" id="Options_Configuring_HOD"></a>
 <p>As described above, HOD is configured using a configuration file that is usually set up by system administrators. This is a INI style configuration file that is divided into sections, and options inside each section. Each section relates to one of the HOD processes: client, ringmaster, hodring, mapreduce or hdfs. The options inside a section comprise of an option name and value. </p>
@@ -874,7 +876,7 @@
 <p>
 <em>-d cluster_dir</em>
 <br>
-        This is required for most of the hod operations. As described <a href="#Create_a_Cluster_Directory">here</a>, the <em>cluster directory</em> is a directory on the local file system where <span class="codefrag">hod</span> will generate the Hadoop configuration, <em>hadoop-site.xml</em>, corresponding to the cluster it allocates. Create this directory and pass it to the <span class="codefrag">hod</span> operations as an argument to -d or --hod.clusterdir. Once a cluster is allocated, a user can utilize it to run Hadoop jobs by specifying the clusterdirectory as the Hadoop --config option.</p>
+        This is required for most of the hod operations. As described <a href="#Create_a_Cluster_Directory">here</a>, the <em>cluster directory</em> is a directory on the local file system where <span class="codefrag">hod</span> will generate the Hadoop configuration, <em>hadoop-site.xml</em>, corresponding to the cluster it allocates. Pass it to the <span class="codefrag">hod</span> operations as an argument to -d or --hod.clusterdir. If it doesn't already exist, HOD will automatically try to create it and use it. Once a cluster is allocated, a user can utilize it to run Hadoop jobs by specifying the cluster directory as the Hadoop --config option.</p>
 <p>
 <em>-n number_of_nodes</em>
 <br>
@@ -936,12 +938,12 @@
 </p>
 </div>
 	
-<a name="N10576"></a><a name="Troubleshooting-N10576"></a>
+<a name="N10579"></a><a name="Troubleshooting-N10579"></a>
 <h2 class="h3"> Troubleshooting </h2>
 <div class="section">
 <a name="Troubleshooting" id="Troubleshooting"></a>
 <p>The following section identifies some of the most likely error conditions users can run into when using HOD and ways to trouble-shoot them</p>
-<a name="N10581"></a><a name="Hangs+During+Allocation"></a>
+<a name="N10584"></a><a name="Hangs+During+Allocation"></a>
 <h3 class="h4">hod Hangs During Allocation </h3>
 <a name="_hod_Hangs_During_Allocation" id="_hod_Hangs_During_Allocation"></a><a name="hod_Hangs_During_Allocation" id="hod_Hangs_During_Allocation"></a>
 <p>
@@ -950,12 +952,12 @@
 <em>Possible Cause:</em> A large allocation is fired with a tarball. Sometimes due to load in the network, or on the allocated nodes, the tarball distribution might be significantly slow and take a couple of minutes to come back. Wait for completion. Also check that the tarball does not have the Hadoop sources or documentation.</p>
 <p>
 <em>Possible Cause:</em> A Torque related problem. If the cause is Torque related, the <span class="codefrag">hod</span> command will not return for more than 5 minutes. Running <span class="codefrag">hod</span> in debug mode may show the <span class="codefrag">qstat</span> command being executed repeatedly. Executing the <span class="codefrag">qstat</span> command from a separate shell may show that the job is in the <span class="codefrag">Q</span> (Queued) state. This usually indicates a problem with Torque. Possible causes could include some nodes being down, or new nodes added that Torque is not aware of. Generally, system administator help is needed to resolve this problem.</p>
-<a name="N105AE"></a><a name="Hangs+During+Deallocation"></a>
+<a name="N105B1"></a><a name="Hangs+During+Deallocation"></a>
 <h3 class="h4">hod Hangs During Deallocation </h3>
 <a name="_hod_Hangs_During_Deallocation" id="_hod_Hangs_During_Deallocation"></a><a name="hod_Hangs_During_Deallocation" id="hod_Hangs_During_Deallocation"></a>
 <p>
 <em>Possible Cause:</em> A Torque related problem, usually load on the Torque server, or the allocation is very large. Generally, waiting for the command to complete is the only option.</p>
-<a name="N105BF"></a><a name="Fails+With+an+error+code+and+error+message"></a>
+<a name="N105C2"></a><a name="Fails+With+an+error+code+and+error+message"></a>
 <h3 class="h4">hod Fails With an error code and error message </h3>
 <a name="hod_Fails_With_an_error_code_and" id="hod_Fails_With_an_error_code_and"></a><a name="_hod_Fails_With_an_error_code_an" id="_hod_Fails_With_an_error_code_an"></a>
 <p>If the exit code of the <span class="codefrag">hod</span> command is not <span class="codefrag">0</span>, then refer to the following table of error exit codes to determine why the code may have occurred and how to debug the situation.</p>
@@ -1021,11 +1023,14 @@
         
 <td colspan="1" rowspan="1"> 6 </td>
         <td colspan="1" rowspan="1"> Ringmaster failure </td>
-        <td colspan="1" rowspan="1"> 1. Invalid configuration in the <span class="codefrag">ringmaster</span> section,<br>
-          2. invalid <span class="codefrag">pkgs</span> option in <span class="codefrag">gridservice-mapred or gridservice-hdfs</span> section,<br>
-          3. an invalid hadoop tarball,<br>
-          4. mismatched version in Hadoop between the MapReduce and an external HDFS.<br>
-          The Torque <span class="codefrag">qstat</span> command will most likely show a job in the <span class="codefrag">C</span> (Completed) state. Refer to the section <em>Locating Ringmaster Logs</em> below for more information. </td>
+        <td colspan="1" rowspan="1"> HOD prints the message "Cluster could not be allocated because of the following errors on the ringmaster host &lt;hostname&gt;". The actual error message may indicate one of the following:<br>
+          1. Invalid configuration on the node running the ringmaster, specified by the hostname in the error message.<br>
+          2. Invalid configuration in the <span class="codefrag">ringmaster</span> section,<br>
+          3. Invalid <span class="codefrag">pkgs</span> option in <span class="codefrag">gridservice-mapred or gridservice-hdfs</span> section,<br>
+          4. An invalid hadoop tarball, or a tarball which has bundled an invalid configuration file in the conf directory,<br>
+          5. Mismatched version in Hadoop between the MapReduce and an external HDFS.<br>
+          The Torque <span class="codefrag">qstat</span> command will most likely show a job in the <span class="codefrag">C</span> (Completed) state. <br>
+          One can log in to the ringmaster host given in the HOD failure message and debug the problem with the help of the error message. If the error message doesn't give complete information, the ringmaster logs should help in finding out the root cause of the problem. Refer to the section <em>Locating Ringmaster Logs</em> below for more information. </td>
       
 </tr>
       
@@ -1033,10 +1038,14 @@
         
 <td colspan="1" rowspan="1"> 7 </td>
         <td colspan="1" rowspan="1"> DFS failure </td>
-        <td colspan="1" rowspan="1"> 1. Problem in starting Hadoop clusters. Review the Hadoop related configuration. Look at the Hadoop logs using information specified in <em>Getting Hadoop Logs</em> section above. <br>
-          2. Invalid configuration in the <span class="codefrag">hodring</span> section of hodrc. <span class="codefrag">ssh</span> to all allocated nodes (determined by <span class="codefrag">qstat -f torque_job_id</span>) and grep for <span class="codefrag">ERROR</span> or <span class="codefrag">CRITICAL</span> in hodring logs. Refer to the section <em>Locating Hodring Logs</em> below for more information. <br>
-          3. Invalid tarball specified which is not packaged correctly. <br>
-          4. Cannot communicate with an externally configured HDFS. </td>
+        <td colspan="1" rowspan="1"> When HOD fails to allocate due to DFS failures (or Job tracker failures, error code 8, see below), it prints a failure message "Hodring at &lt;hostname&gt; failed with following errors:" and then gives the actual error message, which may indicate one of the following:<br>
+          1. Problem in starting Hadoop clusters. Usually the actual cause in the error message will indicate the problem on the hostname mentioned. Also, review the Hadoop-related configuration in the HOD configuration files. Look at the Hadoop logs using information specified in the <em>Collecting and Viewing Hadoop Logs</em> section above. <br>
+          2. Invalid configuration on the node running the hodring, specified by the hostname in the error message. <br>
+          3. Invalid configuration in the <span class="codefrag">hodring</span> section of hodrc. <span class="codefrag">ssh</span> to the hostname specified in the error message and grep for <span class="codefrag">ERROR</span> or <span class="codefrag">CRITICAL</span> in hodring logs. Refer to the section <em>Locating Hodring Logs</em> below for more information. <br>
+          4. Invalid tarball specified which is not packaged correctly. <br>
+          5. Cannot communicate with an externally configured HDFS.<br>
+          When such a DFS or JobTracker failure occurs, one can log in to the host named in the HOD failure message and debug the problem. While fixing the problem, one should also review other log messages in the ringmaster log to see which other machines might also have had problems bringing up the JobTracker/NameNode, apart from the hostname reported in the failure message. Other machines can also be affected because HOD continues to try to launch hadoop daemons on multiple machines, one after another, depending upon the value of the configuration variable <a href="hod_config_guide.html#3.4+ringmaster+options">ringmaster.max-master-failures</a>. Refer to the section <em>Locating Ringmaster Logs</em> below to find out more about ringmaster logs.
+          </td>
       
 </tr>
       
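For illustration, the debugging steps suggested for error codes 6 and 7 above boil down to something like the following; the hostname and log location are placeholders, and the actual log directories depend on how hodrc is configured at your site:

    ssh <hostname-from-failure-message>
    grep -E 'ERROR|CRITICAL' <hodring-log-dir>/*
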
@@ -1104,7 +1113,7 @@
 </tr>
   
 </table>
-<a name="N10742"></a><a name="Hadoop+Jobs+Not+Running+on+a+Successfully+Allocated+Cluster"></a>
+<a name="N10755"></a><a name="Hadoop+Jobs+Not+Running+on+a+Successfully+Allocated+Cluster"></a>
 <h3 class="h4"> Hadoop Jobs Not Running on a Successfully Allocated Cluster </h3>
 <a name="Hadoop_Jobs_Not_Running_on_a_Suc" id="Hadoop_Jobs_Not_Running_on_a_Suc"></a>
 <p>This scenario generally occurs when a cluster is allocated, and is left inactive for sometime, and then hadoop jobs are attempted to be run on them. Then Hadoop jobs fail with the following exception:</p>
@@ -1123,31 +1132,31 @@
 <em>Possible Cause:</em> There is a version mismatch between the version of the hadoop client being used to submit jobs and the hadoop used in provisioning (typically via the tarball option). Ensure compatible versions are being used.</p>
 <p>
 <em>Possible Cause:</em> You used one of the options for specifying Hadoop configuration <span class="codefrag">-M or -H</span>, which had special characters like space or comma that were not escaped correctly. Refer to the section <em>Options Configuring HOD</em> for checking how to specify such options correctly.</p>
-<a name="N1077D"></a><a name="My+Hadoop+Job+Got+Killed"></a>
+<a name="N10790"></a><a name="My+Hadoop+Job+Got+Killed"></a>
 <h3 class="h4"> My Hadoop Job Got Killed </h3>
 <a name="My_Hadoop_Job_Got_Killed" id="My_Hadoop_Job_Got_Killed"></a>
 <p>
 <em>Possible Cause:</em> The wallclock limit specified by the Torque administrator or the <span class="codefrag">-l</span> option defined in the section <em>Specifying Additional Job Attributes</em> was exceeded since allocation time. Thus the cluster would have got released. Deallocate the cluster and allocate it again, this time with a larger wallclock time.</p>
 <p>
 <em>Possible Cause:</em> Problems with the JobTracker node. Refer to the section in <em>Collecting and Viewing Hadoop Logs</em> to get more information.</p>
-<a name="N10798"></a><a name="Hadoop+Job+Fails+with+Message%3A+%27Job+tracker+still+initializing%27"></a>
+<a name="N107AB"></a><a name="Hadoop+Job+Fails+with+Message%3A+%27Job+tracker+still+initializing%27"></a>
 <h3 class="h4"> Hadoop Job Fails with Message: 'Job tracker still initializing' </h3>
 <a name="Hadoop_Job_Fails_with_Message_Jo" id="Hadoop_Job_Fails_with_Message_Jo"></a>
 <p>
 <em>Possible Cause:</em> The hadoop job was being run as part of the HOD script command, and it started before the JobTracker could come up fully. Allocate the cluster using a large value for the configuration option <span class="codefrag">--hod.script-wait-time</span>. Typically a value of 120 should work, though it is typically unnecessary to be that large.</p>
-<a name="N107A8"></a><a name="The+Exit+Codes+For+HOD+Are+Not+Getting+Into+Torque"></a>
+<a name="N107BB"></a><a name="The+Exit+Codes+For+HOD+Are+Not+Getting+Into+Torque"></a>
 <h3 class="h4"> The Exit Codes For HOD Are Not Getting Into Torque </h3>
 <a name="The_Exit_Codes_For_HOD_Are_Not_G" id="The_Exit_Codes_For_HOD_Are_Not_G"></a>
 <p>
 <em>Possible Cause:</em> Version 0.16 of hadoop is required for this functionality to work. The version of Hadoop used does not match. Use the required version of Hadoop.</p>
 <p>
 <em>Possible Cause:</em> The deallocation was done without using the <span class="codefrag">hod</span> command; for e.g. directly using <span class="codefrag">qdel</span>. When the cluster is deallocated in this manner, the HOD processes are terminated using signals. This results in the exit code to be based on the signal number, rather than the exit code of the program.</p>
-<a name="N107C0"></a><a name="The+Hadoop+Logs+are+Not+Uploaded+to+DFS"></a>
+<a name="N107D3"></a><a name="The+Hadoop+Logs+are+Not+Uploaded+to+DFS"></a>
 <h3 class="h4"> The Hadoop Logs are Not Uploaded to DFS </h3>
 <a name="The_Hadoop_Logs_are_Not_Uploaded" id="The_Hadoop_Logs_are_Not_Uploaded"></a>
 <p>
 <em>Possible Cause:</em> There is a version mismatch between the version of the hadoop being used for uploading the logs and the external HDFS. Ensure that the correct version is specified in the <span class="codefrag">hodring.pkgs</span> option.</p>
-<a name="N107D0"></a><a name="Locating+Ringmaster+Logs"></a>
+<a name="N107E3"></a><a name="Locating+Ringmaster+Logs"></a>
 <h3 class="h4"> Locating Ringmaster Logs </h3>
 <a name="Locating_Ringmaster_Logs" id="Locating_Ringmaster_Logs"></a>
 <p>To locate the ringmaster logs, follow these steps: </p>
@@ -1164,7 +1173,7 @@
 <li> If you don't get enough information, you may want to set the ringmaster debug level to 4. This can be done by passing <span class="codefrag">--ringmaster.debug 4</span> to the hod command line.</li>
   
 </ul>
-<a name="N107FC"></a><a name="Locating+Hodring+Logs"></a>
+<a name="N1080F"></a><a name="Locating+Hodring+Logs"></a>
 <h3 class="h4"> Locating Hodring Logs </h3>
 <a name="Locating_Hodring_Logs" id="Locating_Hodring_Logs"></a>
 <p>To locate hodring logs, follow the steps below: </p>