Posted to user@hadoop.apache.org by Prashant Kommireddi <pr...@gmail.com> on 2013/06/21 06:45:00 UTC

Job end notification does not always work (Hadoop 2.x)

Hello,

I came across an issue with the job end notification callbacks in MR2. The
callback works fine once the ApplicationMaster (AM) has started, but no
callback is sent if AM initialization fails.

Here is the code from MRAppMaster.java

.....
.......

      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

  protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if(appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) does not dispatch JobFinishEventHandler, which is
responsible for sending an HTTP callback (via shutDownJob()). If there is
an exception at this point, the process simply terminates (via
System.exit(1)).

appMaster.start(), however, correctly uses the JobFinishEventHandler, and
things work fine.

Shouldn't a failure on init(..) also send a callback suggesting the job
failed?
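
A minimal sketch of the behaviour I have in mind, assuming hypothetical
Lifecycle and Notifier interfaces that stand in for the AM and for the HTTP
callback (neither is a Hadoop class, and this is not how MRAppMaster is
written). If init() or start() throws, it makes one best-effort notification
attempt before reporting a non-zero exit code. It deliberately ignores AM
retries.

```java
// Hypothetical stand-in for whatever sends the job end HTTP callback.
interface Notifier {
    void notifyJobEnd(String status);
}

// Hypothetical stand-in for the application master's init/start lifecycle.
interface Lifecycle {
    void init() throws Exception;
    void start() throws Exception;
}

final class BestEffortLauncher {
    private BestEffortLauncher() {}

    /**
     * Runs init() and then start(). If either throws, makes a single
     * best-effort attempt to report FAILED and returns exit code 1.
     * Returns 0 when both calls succeed.
     */
    static int launch(Lifecycle app, Notifier notifier) {
        try {
            app.init();
            app.start();
            return 0;
        } catch (Throwable t) {
            try {
                notifier.notifyJobEnd("FAILED");
            } catch (RuntimeException e) {
                // Best effort only: a failed callback must not mask the
                // original startup error.
            }
            return 1;
        }
    }
}
```

The caller would pass the returned value to System.exit(...), so the process
still terminates as it does today, just after one notification attempt.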

Thanks,
Prashant

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Prashant Kommireddi <pr...@gmail.com>.

Thanks everyone. I have opened a JIRA and added a link to this discussion:
https://issues.apache.org/jira/browse/MAPREDUCE-5353


On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k <de...@huawei.com> wrote:

> It is not mandatory to have a running HS in the cluster. The user can still
> submit a job without an HS in the cluster, and may expect the Job/App End
> Notification.
>
> Thanks
> Devaraj k
>
> *From:* Alejandro Abdelnur [mailto:tucu@cloudera.com]
> *Sent:* 24 June 2013 21:42
> *To:* user@hadoop.apache.org
> *Cc:* user@hadoop.apache.org
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> If we ought to do this in a YARN service, it should be the RM or the HS.
> The RM is, IMO, the natural fit. The HS would be a good choice if we are
> concerned about the extra work this would cause in the RM. The problem with
> the current HS is that it is MR-specific; we should generalize it for
> different AM types.
>
> thx
>
> Alejandro
>
> (phone typing)
>
>
> On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:
>
> Even if we handle all the failure cases in the AM for Job End Notification,
> we may miss cases such as an abrupt kill of the AM on its last retry. If we
> choose the NM to give the notification, the RM again needs to identify which
> NM should give the end notification, as we don't have any direct protocol
> between AM and NM.
>
> I feel it would be better to move the End-Notification responsibility to the
> RM as a YARN service, because that ensures 100% notification and is also
> useful for other types of applications.
>
> Thanks
> Devaraj K
>
> *From:* Ravi Prakash [mailto:ravihoo@ymail.com <ra...@ymail.com>]
> *Sent:* 23 June 2013 19:01
> *To:* user@hadoop.apache.org
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> Hi Alejandro,
>
> Thanks for your reply! I was thinking more along the lines Prashant
> suggested, i.e. a failure during init() should still trigger an attempt to
> notify (by the AM). But now that you mention it, maybe we would be better
> off including this as a YARN feature after all (especially with all the new
> AMs being written). We could let the NM of the AM handle the notification
> burden, so that the RM doesn't get unduly taxed. Thoughts?
>
> Thanks
> Ravi
>
> ------------------------------
>
> *From:* Alejandro Abdelnur <tu...@cloudera.com>
> *To:* "common-user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Saturday, June 22, 2013 7:37 PM
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> If the AM fails before doing the job end notification, at any stage of the
> execution and for whatever reason, the job end notification will never be
> delivered. There is no way to fix this unless the notification is done by a
> YARN service. The two candidate services for doing this would be the RM and
> the HS. The job notification URL is in the job conf. The RM never sees the
> job conf, which rules the RM out unless we add, at AM registration time,
> the possibility to specify a callback URL. The HS has access to the job
> conf, but the HS is currently a 'passive' service.
>
> thx
>
> On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com>
> wrote:
>
> Prashant,
>
> Please file a jira.
>
> One thing to be aware of: AMs get restarted a certain number of times
> for fault tolerance, which means we can't just assume that failure of a
> single AM is equivalent to failure of the job.
>
> Only the ResourceManager is in the appropriate position to judge failure
> of the AM vs. failure of the job.
>
> hth,
> Arun
>
> On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com>
> wrote:
>
> Thanks Ravi.
>
> Well, in this case it's a no-effort :) Shouldn't a failure of AM init be
> considered a failure of the job? I looked at the code, and best-effort
> makes sense with respect to retry logic etc. You make a good point that
> there would be no notification in case the AM OOMs, but I do feel an AM init
> failure should send a notification by other means.
>
> On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
>
> Hi Prashant,
>
> I would tend to agree with you. Although job-end notification is only a
> "best-effort" mechanism (i.e. we cannot always guarantee notification, for
> example when the AM OOMs), I agree with you that we can do more. If you
> feel strongly about this, please create a JIRA and possibly upload a patch.
>
> Thanks
> Ravi
>
> ------------------------------
>
> *From:* Prashant Kommireddi <pr...@gmail.com>
> *To:* "user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Thursday, June 20, 2013 9:45 PM
> *Subject:* Job end notification does not always work (Hadoop 2.x)
>
> Hello,
>
> I came across an issue with the job end notification callbacks in MR2. The
> callback works fine once the ApplicationMaster (AM) has started, but no
> callback is sent if AM initialization fails.
>
> Here is the code from MRAppMaster.java
>
> .....
> .......
>
>       // set job classloader if configured
>       MRApps.setJobClassLoader(conf);
>       initAndStartAppMaster(appMaster, conf, jobUserName);
>     } catch (Throwable t) {
>       LOG.fatal("Error starting MRAppMaster", t);
>       System.exit(1);
>     }
>   }
>
>   protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>       final YarnConfiguration conf, String jobUserName) throws IOException,
>       InterruptedException {
>     UserGroupInformation.setConfiguration(conf);
>     UserGroupInformation appMasterUgi = UserGroupInformation
>         .createRemoteUser(jobUserName);
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>       @Override
>       public Object run() throws Exception {
>         appMaster.init(conf);
>         appMaster.start();
>         if(appMaster.errorHappenedShutDown) {
>           throw new IOException("Was asked to shut down.");
>         }
>         return null;
>       }
>     });
>   }
>
> appMaster.init(conf) does not dispatch JobFinishEventHandler, which is
> responsible for sending an HTTP callback (via shutDownJob()). If there is
> an exception at this point, the process simply terminates (via
> System.exit(1)).
>
> appMaster.start(), however, correctly uses the JobFinishEventHandler, and
> things work fine.
>
> Shouldn't a failure on init(..) also send a callback suggesting the job
> failed?
>
> Thanks,
> Prashant
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
> --
> Alejandro
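
Arun's point in the quoted thread above (a failed AM attempt is not
necessarily a failed job) can be illustrated with a small sketch. It assumes
a hypothetical AttemptOutcome helper, which is not a YARN API, that is given
the 1-based attempt number and the configured maximum number of attempts.
Only the failure of the last allowed attempt is treated as a job failure
worth notifying.

```java
// Hypothetical decision helper; the names are illustrative, not YARN classes.
final class AttemptOutcome {
    enum Decision { RETRY_NO_NOTIFICATION, NOTIFY_JOB_FAILED }

    private AttemptOutcome() {}

    /**
     * Decides what a failed AM attempt means for the job as a whole.
     *
     * @param attemptNumber 1-based number of the attempt that just failed
     * @param maxAttempts   configured maximum number of AM attempts
     */
    static Decision onAttemptFailure(int attemptNumber, int maxAttempts) {
        if (attemptNumber < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("attempt numbers start at 1");
        }
        // Earlier attempts may still be retried, so the job has not failed yet.
        return attemptNumber >= maxAttempts
                ? Decision.NOTIFY_JOB_FAILED
                : Decision.RETRY_NO_NOTIFICATION;
    }
}
```

Whoever makes this decision needs a reliable view of the attempt count,
which is why the thread leans towards the RM rather than the AM itself.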

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.

Devaraj,

If a job can finish but you cannot determine its status after it has ended,
then the system is not usable. Thus, the HS is a required component.

thx


On Tue, Jun 25, 2013 at 6:11 AM, Devaraj k <de...@huawei.com> wrote:

> I agree, for getting status/counters we need the HS. I mean a job can
> finish without the HS as well.
>
> Thanks
> Devaraj k
>
> *From:* Alejandro Abdelnur [mailto:tucu@cloudera.com]
> *Sent:* 25 June 2013 18:05
> *To:* common-user@hadoop.apache.org
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> Devaraj,
>
> If you don't run the HS, once your jobs have finished you cannot retrieve
> status/counters from them, from the Java API or the Web UI. So I'd say that
> for any practical usage, you need it.
>
> thx
>


-- 
Alejandro
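
As a rough illustration of the service-side idea discussed earlier in this
thread (the AM supplies a callback URL at registration time, and a long-lived
service sends the notification once the application has finished), here is a
minimal sketch. EndNotificationRegistry, its methods, and the {status}
placeholder are all hypothetical and are not existing YARN interfaces. The
actual HTTP call is injected as a Consumer<String>, so the sketch contains
no network code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical registry a long-lived service could keep; not a YARN class.
final class EndNotificationRegistry {
    private final Map<String, String> callbackUrlByApp = new HashMap<>();
    private final Consumer<String> sender; // performs the callback for a final URL

    EndNotificationRegistry(Consumer<String> sender) {
        this.sender = sender;
    }

    /** Records the callback URL supplied when the application registers. */
    void register(String appId, String callbackUrl) {
        callbackUrlByApp.put(appId, callbackUrl);
    }

    /**
     * Called once the application reaches a terminal state. Sends at most
     * one callback per application and returns true if one was sent.
     */
    boolean onApplicationFinished(String appId, String finalStatus) {
        String url = callbackUrlByApp.remove(appId);
        if (url == null) {
            return false; // never registered, or already notified
        }
        sender.accept(url.replace("{status}", finalStatus));
        return true;
    }
}
```

Because the registry lives outside the AM, the callback can still be sent
when the AM dies during init or is killed on its last retry.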

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.

Devaraj,

if a job can finish but you cannot determine it status after it ended, then
the system is not usable. Thus, HS is a required component.

thx


On Tue, Jun 25, 2013 at 6:11 AM, Devaraj k <de...@huawei.com> wrote:

>  I agree, for getting status/counters we need HS. I mean Job can finish
> without HS also.  ****
>
> ** **
>
> Thanks****
>
> Devaraj k****
>
> ** **
>
> *From:* Alejandro Abdelnur [mailto:tucu@cloudera.com]
> *Sent:* 25 June 2013 18:05
> *To:* common-user@hadoop.apache.org
>
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)****
>
>  ** **
>
> Devaraj,****
>
> ** **
>
> If you don't run the HS, once your jobs finished you cannot retrieve
> status/counters from it, from Java AP or Web UI. So I'd for any practical
> usage, you need it.****
>
> ** **
>
> thx****
>
> ** **
>
> On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k <de...@huawei.com> wrote:**
> **
>
> It is not mandatory to have running HS in the cluster. Still the user can
> submit the job without HS in the cluster, and user may expect the Job/App
> End Notification.****
>
>  ****
>
> Thanks****
>
> Devaraj k****
>
>  ****
>
> *From:* Alejandro Abdelnur [mailto:tucu@cloudera.com]
> *Sent:* 24 June 2013 21:42
> *To:* user@hadoop.apache.org
> *Cc:* user@hadoop.apache.org****
>
>
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)****
>
>  ****
>
> if we ought to do this in a yarn service it
> should be the RM or the HS. the RM is, IMO, the natural fit. the HS, would
> be a good choice if we are concerned about the extra work this would cause
> in the RM. the problem with the current HS is that it is MR specific, we
> should generalize it for diff AM types. ****
>
>  ****
>
> thx****
>
>
> Alejandro****
>
> (phone typing)****
>
>
> On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:****
>
>  Even if we handle all the failure cases in AM for Job End Notification,
> we may miss cases like abrupt kill of AM when it is in last retry. If we
> choose NM to give the notification, again RM needs to identify which NM
> should give the end-notification as we don't have any direct protocol
> between AM and NM.****
>
>  ****
>
> I feel it would be better to move End-Notification responsibility to RM as
> Yarn Service because it ensures 100% notification and also useful for other
> types of applications as well. ****
>
>  ****
>
>  ****
>
> Thanks****
>
> Devaraj K****
>
>  ****
>
> *From:* Ravi Prakash [mailto:ravihoo@ymail.com <ra...@ymail.com>]
> *Sent:* 23 June 2013 19:01
> *To:* user@hadoop.apache.org
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)****
>
>  ****
>
> Hi Alejandro,
>
> Thanks for your reply! I was thinking more along the lines Prashant
> suggested i.e. a failure during init() should still trigger an attempt to
> notify (by the AM). But now that you mention it, maybe we would be better
> of including this as a YARN feature after all (specially with all the new
> AMs being written). We could let the NM of the AM handle the notification
> burden, so that the RM doesn't get unduly taxed. Thoughts?
>
> Thanks
> Ravi****
>
>  ****
>
>  ****
>    ------------------------------
>
> *From:* Alejandro Abdelnur <tu...@cloudera.com>
> *To:* "common-user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Saturday, June 22, 2013 7:37 PM
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)****
>
>  ****
>
> If the AM fails before doing the job end notification, at any stage of the
> execution for whatever reason, the job end notification will never be
> deliver. There is not way to fix this unless the notification is done by a
> Yarn service. The 2 'candidate' services for doing this would be the RM and
> the HS. The job notification URL is in the job conf. The RM never sees the
> job conf, that rules out the RM out unless we add, at AM registration time
> the possibility to specify a callback URL. The HS has access to the job
> conf, but the HS is currently a 'passive' service.****
>
>
> thx****
>
>  ****
>
> On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com>
> wrote:****
>
> Prashanth, ****
>
>  ****
>
>  Please file a jira.****
>
>  ****
>
>  One thing to be aware of - AMs get restarted a certain number of times
> for fault-tolerance - which means we can't just assume that failure of a
> single AM is equivalent to failure of the job.****
>
>  ****
>
>  Only the ResourceManager is in the appropriate position to judge failure
> of AM v/s failure-of-job.****
>
>  ****
>
> hth,****
>
> Arun****
>
>  ****
>
> On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com>
> wrote:****
>
>
>
> ****
>
> Thanks Ravi.
>
> Well, in this case its a no-effort :) A failure of AM init should be
> considered as failure of the job? I looked at the code and best-effort
> makes sense with respect to retry logic etc. You make a good point that
> there would be no notification in case AM OOMs, but I do feel AM init
> failure should send a notification by other means.****
>
>  ****
>
> On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:**
> **
>
> Hi Prashant,
>
> I would tend to agree with you. Although job-end notification is only a
> "best-effort" mechanism (i.e. we cannot always guarantee notification for
> example when the AM OOMs), I agree with you that we can do more. If you
> feel strongly about this, please create a JIRA and possibly upload a patch.
>
> Thanks
> Ravi****
>
>  ****
>
>  ****
>    ------------------------------
>
> *From:* Prashant Kommireddi <pr...@gmail.com>
> *To:* "user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Thursday, June 20, 2013 9:45 PM
> *Subject:* Job end notification does not always work (Hadoop 2.x)****
>
>  ****
>
> Hello,****
>
> I came across an issue that occurs with the job notification callbacks in
> MR2. It works fine if the Application master has started, but does not send
> a callback if the initializing of AM fails.****
>
> Here is the code from MRAppMaster.java
>
> .....
> .......
>
>       // set job classloader if configured
>       MRApps.setJobClassLoader(conf);
>       initAndStartAppMaster(appMaster, conf, jobUserName);
>     } catch (Throwable t) {
>       LOG.fatal("Error starting MRAppMaster", t);
>       System.exit(1);
>     }
>   }
>
> protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>       final YarnConfiguration conf, String jobUserName) throws IOException,
>       InterruptedException {
>     UserGroupInformation.setConfiguration(conf);
>     UserGroupInformation appMasterUgi = UserGroupInformation
>         .createRemoteUser(jobUserName);
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>       @Override
>       public Object run() throws Exception {
>         appMaster.init(conf);
>         appMaster.start();
>         if(appMaster.errorHappenedShutDown) {
>           throw new IOException("Was asked to shut down.");
>         }
>         return null;
>       }
>     });
>   }
>
> appMaster.init(conf) does not dispatch JobFinishEventHandler, which is
> responsible for sending an HTTP callback (via shutDownJob()). If there was
> an exception at this time, the process would simply terminate (via
> System.exit(1)).
>
> appMaster.start() however rightly uses the JobFinishEventHandler and
> things work fine.
>
> Shouldn't a failure on init(..) also send a callback suggesting the job
> failed?
>
> Thanks,
> Prashant
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
> --
> Alejandro
>
> --
> Alejandro
>



-- 
Alejandro

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.

Devaraj,

If a job can finish but you cannot determine its status after it has ended,
then the system is not usable. Thus, the HS is a required component.

thx


On Tue, Jun 25, 2013 at 6:11 AM, Devaraj k <de...@huawei.com> wrote:

> I agree, for getting status/counters we need HS. I mean Job can finish
> without HS also.
>
> Thanks
> Devaraj k
>
> *From:* Alejandro Abdelnur [mailto:tucu@cloudera.com]
> *Sent:* 25 June 2013 18:05
> *To:* common-user@hadoop.apache.org
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> Devaraj,
>
> If you don't run the HS, once your jobs have finished you cannot retrieve
> status/counters from it, from the Java API or Web UI. So I'd say, for any
> practical usage, you need it.
>
> thx
>
> On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k <de...@huawei.com> wrote:
>
> It is not mandatory to have running HS in the cluster. Still the user can
> submit the job without HS in the cluster, and user may expect the Job/App
> End Notification.
>
>
> Thanks
> Devaraj k
>
> *From:* Alejandro Abdelnur [mailto:tucu@cloudera.com]
> *Sent:* 24 June 2013 21:42
> *To:* user@hadoop.apache.org
> *Cc:* user@hadoop.apache.org
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> if we ought to do this in a yarn service it should be the RM or the HS.
> the RM is, IMO, the natural fit. the HS, would be a good choice if we are
> concerned about the extra work this would cause in the RM. the problem
> with the current HS is that it is MR specific, we should generalize it
> for diff AM types.
>
> thx
>
> Alejandro
> (phone typing)
>
> On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:
>
> Even if we handle all the failure cases in AM for Job End Notification,
> we may miss cases like abrupt kill of AM when it is in last retry. If we
> choose NM to give the notification, again RM needs to identify which NM
> should give the end-notification as we don't have any direct protocol
> between AM and NM.
>
> I feel it would be better to move End-Notification responsibility to RM as
> Yarn Service because it ensures 100% notification and also useful for other
> types of applications as well.
>
>
> Thanks
> Devaraj K
>
> *From:* Ravi Prakash [mailto:ravihoo@ymail.com]
> *Sent:* 23 June 2013 19:01
> *To:* user@hadoop.apache.org
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> Hi Alejandro,
>
> Thanks for your reply! I was thinking more along the lines Prashant
> suggested, i.e. a failure during init() should still trigger an attempt to
> notify (by the AM). But now that you mention it, maybe we would be better
> off including this as a YARN feature after all (especially with all the new
> AMs being written). We could let the NM of the AM handle the notification
> burden, so that the RM doesn't get unduly taxed. Thoughts?
>
> Thanks
> Ravi
>
> ------------------------------
>
> *From:* Alejandro Abdelnur <tu...@cloudera.com>
> *To:* "common-user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Saturday, June 22, 2013 7:37 PM
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> If the AM fails before doing the job end notification, at any stage of the
> execution for whatever reason, the job end notification will never be
> delivered. There is no way to fix this unless the notification is done by a
> Yarn service. The 2 'candidate' services for doing this would be the RM and
> the HS. The job notification URL is in the job conf. The RM never sees the
> job conf, which rules the RM out unless we add, at AM registration time,
> the possibility to specify a callback URL. The HS has access to the job
> conf, but the HS is currently a 'passive' service.
>
> thx
>
> On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com>
> wrote:
>
> Prashanth,
>
> Please file a jira.
>
> One thing to be aware of - AMs get restarted a certain number of times
> for fault-tolerance - which means we can't just assume that failure of a
> single AM is equivalent to failure of the job.
>
> Only the ResourceManager is in the appropriate position to judge failure
> of AM v/s failure-of-job.
>
> hth,
> Arun
>
> On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com>
> wrote:
>
> Thanks Ravi.
>
> Well, in this case it's a no-effort :) A failure of AM init should be
> considered as failure of the job? I looked at the code and best-effort
> makes sense with respect to retry logic etc. You make a good point that
> there would be no notification in case AM OOMs, but I do feel AM init
> failure should send a notification by other means.
>
> On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
>
> Hi Prashant,
>
> I would tend to agree with you. Although job-end notification is only a
> "best-effort" mechanism (i.e. we cannot always guarantee notification for
> example when the AM OOMs), I agree with you that we can do more. If you
> feel strongly about this, please create a JIRA and possibly upload a patch.
>
> Thanks
> Ravi
>
> ------------------------------
>
> *From:* Prashant Kommireddi <pr...@gmail.com>
> *To:* "user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Thursday, June 20, 2013 9:45 PM
> *Subject:* Job end notification does not always work (Hadoop 2.x)
>
> Hello,
>
> I came across an issue that occurs with the job notification callbacks in
> MR2. It works fine if the Application master has started, but does not send
> a callback if the initializing of AM fails.
>
> Here is the code from MRAppMaster.java
>
> .....
> .......
>
>       // set job classloader if configured
>       MRApps.setJobClassLoader(conf);
>       initAndStartAppMaster(appMaster, conf, jobUserName);
>     } catch (Throwable t) {
>       LOG.fatal("Error starting MRAppMaster", t);
>       System.exit(1);
>     }
>   }
>
> protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>       final YarnConfiguration conf, String jobUserName) throws IOException,
>       InterruptedException {
>     UserGroupInformation.setConfiguration(conf);
>     UserGroupInformation appMasterUgi = UserGroupInformation
>         .createRemoteUser(jobUserName);
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>       @Override
>       public Object run() throws Exception {
>         appMaster.init(conf);
>         appMaster.start();
>         if(appMaster.errorHappenedShutDown) {
>           throw new IOException("Was asked to shut down.");
>         }
>         return null;
>       }
>     });
>   }
>
> appMaster.init(conf) does not dispatch JobFinishEventHandler, which is
> responsible for sending an HTTP callback (via shutDownJob()). If there was
> an exception at this time, the process would simply terminate (via
> System.exit(1)).
>
> appMaster.start() however rightly uses the JobFinishEventHandler and
> things work fine.
>
> Shouldn't a failure on init(..) also send a callback suggesting the job
> failed?
>
> Thanks,
> Prashant
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
> --
> Alejandro
>
> --
> Alejandro
>



-- 
Alejandro

RE: Job end notification does not always work (Hadoop 2.x)

Posted by Devaraj k <de...@huawei.com>.

I agree, for getting status/counters we need HS. I mean Job can finish without HS also.

Thanks
Devaraj k

From: Alejandro Abdelnur [mailto:tucu@cloudera.com]
Sent: 25 June 2013 18:05
To: common-user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Devaraj,

If you don't run the HS, once your jobs have finished you cannot retrieve status/counters from it, from the Java API or Web UI. So I'd say, for any practical usage, you need it.

thx

On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k <de...@huawei.com> wrote:
It is not mandatory to have running HS in the cluster. Still the user can submit the job without HS in the cluster, and user may expect the Job/App End Notification.

Thanks
Devaraj k

From: Alejandro Abdelnur [mailto:tucu@cloudera.com]
Sent: 24 June 2013 21:42
To: user@hadoop.apache.org
Cc: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

if we ought to do this in a yarn service it should be the RM or the HS. the RM is, IMO, the natural fit. the HS, would be a good choice if we are concerned about the extra work this would cause in the RM. the problem with the current HS is that it is MR specific, we should generalize it for diff AM types.

thx

Alejandro
(phone typing)

On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:
Even if we handle all the failure cases in AM for Job End Notification, we may miss cases like abrupt kill of AM when it is in last retry. If we choose NM to give the notification, again RM needs to identify which NM should give the end-notification as we don't have any direct protocol between AM and NM.

I feel it would be better to move End-Notification responsibility to RM as Yarn Service because it ensures 100% notification and also useful for other types of applications as well.

Thanks
Devaraj K

From: Ravi Prakash [mailto:ravihoo@ymail.com]
Sent: 23 June 2013 19:01
To: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested, i.e. a failure during init() should still trigger an attempt to notify (by the AM). But now that you mention it, maybe we would be better off including this as a YARN feature after all (especially with all the new AMs being written). We could let the NM of the AM handle the notification burden, so that the RM doesn't get unduly taxed. Thoughts?

Thanks
Ravi

________________________________
From: Alejandro Abdelnur <tu...@cloudera.com>
To: "common-user@hadoop.apache.org" <us...@hadoop.apache.org>
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If the AM fails before doing the job end notification, at any stage of the execution for whatever reason, the job end notification will never be delivered. There is no way to fix this unless the notification is done by a Yarn service. The 2 'candidate' services for doing this would be the RM and the HS. The job notification URL is in the job conf. The RM never sees the job conf, which rules the RM out unless we add, at AM registration time, the possibility to specify a callback URL. The HS has access to the job conf, but the HS is currently a 'passive' service.

thx
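
To illustrate the point that the callback URL lives in the job conf, here is a minimal sketch of how such a URL template might be expanded. It assumes the `$jobId` and `$jobStatus` placeholders documented for the `mapreduce.job.end-notification.url` property. The class name, method name, example URL and job ID are hypothetical, and this is not Hadoop's own notifier code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class NotificationUrlSketch {

  /** Property under which a MapReduce job configures its callback URL. */
  static final String URL_KEY = "mapreduce.job.end-notification.url";

  /**
   * Expands the $jobId and $jobStatus placeholders in the configured URL.
   * Returns null when no URL is configured, i.e. no callback was requested.
   */
  static String expand(Map<String, String> jobConf, String jobId, String status) {
    String template = jobConf.get(URL_KEY);
    if (template == null || template.isEmpty()) {
      return null;
    }
    // String.replace substitutes literally (no regex), so "$" needs no escaping.
    return template.replace("$jobId", jobId).replace("$jobStatus", status);
  }

  public static void main(String[] args) {
    // A plain map stands in for the job configuration (assumption).
    Map<String, String> conf = new HashMap<>();
    conf.put(URL_KEY, "http://example.com/notify?id=$jobId&status=$jobStatus");
    System.out.println(expand(conf, "job_0001", "FAILED"));
  }
}
```

Whichever component sends the callback needs this template, which is why a service that never sees the job conf would have to be handed the URL some other way.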

On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
Prashanth,

 Please file a jira.

 One thing to be aware of - AMs get restarted a certain number of times for fault-tolerance - which means we can't just assume that failure of a single AM is equivalent to failure of the job.

 Only the ResourceManager is in the appropriate position to judge failure of AM v/s failure-of-job.

hth,
Arun
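
Arun's caveat can be sketched as a simple gate: a failed AM attempt should be reported as a failed job only when no further attempts remain. The names below are hypothetical, and the attempt limit is assumed to be passed in by the caller rather than read from any real YARN API.

```java
/** Hypothetical sketch for illustration; not Hadoop code. */
final class LastAttemptGateSketch {

  /**
   * @param attemptNumber 1-based number of the AM attempt that just failed
   * @param maxAttempts   maximum number of AM attempts allowed (assumed input)
   * @return true when no retry is left, so the job as a whole has failed
   */
  static boolean isLastAttempt(int attemptNumber, int maxAttempts) {
    if (attemptNumber < 1 || maxAttempts < 1) {
      throw new IllegalArgumentException("attempt counts must be >= 1");
    }
    return attemptNumber >= maxAttempts;
  }

  public static void main(String[] args) {
    int maxAttempts = 2; // illustrative limit only
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      System.out.println("attempt " + attempt + " failed, notify="
          + isLastAttempt(attempt, maxAttempts));
    }
  }
}
```

A check like this only helps when the failing AM still gets a chance to run it. An AM that is killed abruptly on its last attempt never reaches the gate.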

On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com> wrote:

Thanks Ravi.

Well, in this case it's a no-effort :) A failure of AM init should be considered as failure of the job? I looked at the code and best-effort makes sense with respect to retry logic etc. You make a good point that there would be no notification in case AM OOMs, but I do feel AM init failure should send a notification by other means.

On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
Hi Prashant,

I would tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee notification for example when the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi

________________________________
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)

Hello,
I came across an issue that occurs with the job notification callbacks in MR2. It works fine if the Application master has started, but does not send a callback if the initializing of AM fails.
Here is the code from MRAppMaster.java

.....
.......

      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if(appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) does not dispatch JobFinishEventHandler, which is responsible for sending an HTTP callback (via shutDownJob()). If there was an exception at this time, the process would simply terminate (via System.exit(1)).
appMaster.start() however rightly uses the JobFinishEventHandler and things work fine.
Shouldn't a failure on init(..) also send a callback suggesting the job failed?
Thanks,
Prashant
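
The behaviour asked for above could look roughly like the sketch below: if init() or start() throws, make one best-effort notification attempt before exiting. Every type and name in it is a hypothetical stand-in, not an MRAppMaster class, and the `lastAttempt` flag is assumed to be known to the caller so that a retried AM does not report the whole job as failed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch for illustration; none of these types are Hadoop classes. */
final class InitFailureNotificationSketch {

  /** Stand-in for whatever performs the HTTP callback. */
  interface Notifier {
    void notifyJobEnd(String jobId, String status);
  }

  /** Stand-in for the AM lifecycle: init first, then start. */
  interface AppMaster {
    void init() throws Exception;
    void start() throws Exception;
  }

  /**
   * Runs init() and start(). On failure, makes a single best-effort attempt
   * to notify, and only when this is the last AM attempt.
   * Returns the exit code the process would use.
   */
  static int run(AppMaster am, Notifier notifier, String jobId, boolean lastAttempt) {
    try {
      am.init();
      am.start();
      return 0;
    } catch (Throwable t) {
      if (lastAttempt) {
        try {
          notifier.notifyJobEnd(jobId, "FAILED");
        } catch (RuntimeException e) {
          // Best effort: a failed callback must not mask the original error.
        }
      }
      return 1;
    }
  }

  public static void main(String[] args) {
    List<String> sent = new ArrayList<>();
    AppMaster failing = new AppMaster() {
      @Override public void init() { throw new IllegalStateException("init failed"); }
      @Override public void start() { }
    };
    int code = run(failing, (id, status) -> sent.add(id + ":" + status), "job_0001", true);
    System.out.println("exit=" + code + " notifications=" + sent);
  }
}
```

This stays a best-effort mechanism: it covers exceptions thrown during init, but not an AM that dies without running its catch block.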

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

--
Alejandro

--
Alejandro

RE: Job end notification does not always work (Hadoop 2.x)

Posted by Devaraj k <de...@huawei.com>.

I agree, for getting status/counters we need HS. I mean Job can finish without HS also.

Thanks
Devaraj k

From: Alejandro Abdelnur [mailto:tucu@cloudera.com]
Sent: 25 June 2013 18:05
To: common-user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Devaraj,

If you don't run the HS, once your jobs have finished you cannot retrieve status/counters from it, from the Java API or Web UI. So I'd say, for any practical usage, you need it.

thx

On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k <de...@huawei.com> wrote:
It is not mandatory to have running HS in the cluster. Still the user can submit the job without HS in the cluster, and user may expect the Job/App End Notification.

Thanks
Devaraj k

From: Alejandro Abdelnur [mailto:tucu@cloudera.com]
Sent: 24 June 2013 21:42
To: user@hadoop.apache.org
Cc: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

if we ought to do this in a yarn service it should be the RM or the HS. the RM is, IMO, the natural fit. the HS, would be a good choice if we are concerned about the extra work this would cause in the RM. the problem with the current HS is that it is MR specific, we should generalize it for diff AM types.

thx

Alejandro
(phone typing)

On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:
Even if we handle all the failure cases in AM for Job End Notification, we may miss cases like abrupt kill of AM when it is in last retry. If we choose NM to give the notification, again RM needs to identify which NM should give the end-notification as we don't have any direct protocol between AM and NM.

I feel it would be better to move End-Notification responsibility to RM as Yarn Service because it ensures 100% notification and also useful for other types of applications as well.

Thanks
Devaraj K

From: Ravi Prakash [mailto:ravihoo@ymail.com]
Sent: 23 June 2013 19:01
To: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested, i.e. a failure during init() should still trigger an attempt to notify (by the AM). But now that you mention it, maybe we would be better off including this as a YARN feature after all (especially with all the new AMs being written). We could let the NM of the AM handle the notification burden, so that the RM doesn't get unduly taxed. Thoughts?

Thanks
Ravi

________________________________
From: Alejandro Abdelnur <tu...@cloudera.com>>
To: "common-user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If the AM fails before doing the job end notification, at any stage of the execution and for whatever reason, the job end notification will never be delivered. There is no way to fix this unless the notification is done by a YARN service. The two candidate services for doing this would be the RM and the HS. The job notification URL is in the job conf. The RM never sees the job conf, which rules the RM out unless we add, at AM registration time, the possibility to specify a callback URL. The HS has access to the job conf, but the HS is currently a 'passive' service.
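
A minimal sketch of the idea in the paragraph above, assuming a long-lived service records a callback URL when an AM registers and sends the notification itself once the application reaches a final state. Everything here (CallbackRegistry, Sender, register, applicationFinished, the example URL) is hypothetical and only illustrates the shape of the proposal; none of it is an existing RM or HS API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a service-side registry of job-end callback URLs.
class CallbackRegistry {

  // Stand-in for whatever would actually perform the HTTP call.
  interface Sender {
    void send(String url, String finalStatus);
  }

  private final Map<String, String> urlsByApp = new HashMap<>();
  private final Sender sender;

  CallbackRegistry(Sender sender) {
    this.sender = sender;
  }

  // Would be called at AM registration time; the URL comes from the job conf.
  void register(String appId, String callbackUrl) {
    urlsByApp.put(appId, callbackUrl);
  }

  // Called by the service when the application reaches a final state.
  // Returns true if a callback was sent; each application is notified at most once.
  boolean applicationFinished(String appId, String finalStatus) {
    String url = urlsByApp.remove(appId);
    if (url == null) {
      return false;
    }
    sender.send(url, finalStatus);
    return true;
  }

  public static void main(String[] args) {
    List<String> sent = new ArrayList<>();
    CallbackRegistry registry =
        new CallbackRegistry((url, status) -> sent.add(url + " " + status));
    registry.register("app_1", "http://example.invalid/notify");
    registry.applicationFinished("app_1", "FAILED");
    System.out.println(sent);
  }
}
```

Because the registry lives outside the AM, the callback no longer depends on the AM surviving long enough to send it; the cost is that the service must be told the URL up front.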

thx

On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com>> wrote:
Prashanth,

 Please file a jira.

 One thing to be aware of - AMs get restarted a certain number of times for fault-tolerance - which means we can't just assume that failure of a single AM is equivalent to failure of the job.

 Only the ResourceManager is in the appropriate position to judge failure of AM v/s failure-of-job.

hth,
Arun

On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com>> wrote:

Thanks Ravi.

Well, in this case it's a no-effort :) Shouldn't a failure of AM init be considered a failure of the job? I looked at the code, and best-effort makes sense with respect to retry logic etc. You make a good point that there would be no notification in case the AM OOMs, but I do feel an AM init failure should send a notification by other means.

On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com>> wrote:
Hi Prashant,

I would tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee notification, for example when the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi

________________________________
From: Prashant Kommireddi <pr...@gmail.com>>
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)

Hello,
I came across an issue that occurs with the job notification callbacks in MR2. It works fine if the Application Master has started, but does not send a callback if initialization of the AM fails.
Here is the code from MRAppMaster.java

.....
.......

      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if(appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) does not dispatch JobFinishEventHandler, which is responsible for sending an HTTP callback (via shutDownJob()). If there is an exception at this point, the process simply terminates (via System.exit(1)).
appMaster.start() however rightly uses the JobFinishEventHandler and things work fine.
Shouldn't a failure on init(..) also send a callback suggesting the job failed?
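
A minimal sketch of the behaviour being asked for, assuming the AM makes one best-effort attempt to notify when init() or start() throws, and only on its last attempt (an earlier attempt failing does not mean the job has failed). All names here (InitFailureNotification, Notifier, AppMaster, run, lastAttempt) are hypothetical stand-ins, not the MRAppMaster code or any Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: notify on startup failure instead of exiting silently.
class InitFailureNotification {

  // Stand-in for whatever sends the HTTP callback.
  interface Notifier {
    void notifyJobEnd(String status);
  }

  // Stand-in for the two lifecycle steps of the application master.
  interface AppMaster {
    void init() throws Exception;
    void start() throws Exception;
  }

  // Runs init() and start(). If either throws and this is the last attempt,
  // tries once to send a FAILED notification. Returns the process exit code.
  static int run(AppMaster am, Notifier notifier, boolean lastAttempt) {
    try {
      am.init();
      am.start();
      return 0;
    } catch (Throwable t) {
      if (lastAttempt) {
        try {
          notifier.notifyJobEnd("FAILED");
        } catch (Throwable ignored) {
          // Best effort only: a failed notification must not mask the original error.
        }
      }
      return 1;
    }
  }

  public static void main(String[] args) {
    List<String> sent = new ArrayList<>();
    AppMaster failing = new AppMaster() {
      public void init() throws Exception {
        throw new Exception("init failed");
      }
      public void start() {
      }
    };
    int exitCode = run(failing, sent::add, true);
    System.out.println(exitCode + " " + sent);
  }
}
```

This remains best-effort: it cannot help when the process is killed abruptly or runs out of memory before the catch block executes.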
Thanks,
Prashant

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

--
Alejandro


Re: Job end notification does not always work (Hadoop 2.x)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.

Devaraj,

If you don't run the HS, once your jobs have finished you cannot retrieve
their status/counters, from the Java API or the Web UI. So I'd say that for
any practical usage, you need it.

thx


On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k <de...@huawei.com> wrote:

>  It is not mandatory to have a running HS in the cluster. The user can
> still submit a job without an HS in the cluster, and may expect the
> Job/App End Notification.
>
> Thanks
>
> Devaraj k
>


-- 
Alejandro

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Prashant Kommireddi <pr...@gmail.com>.

Thanks everyone. I have opened a JIRA and added a link to this discussion
https://issues.apache.org/jira/browse/MAPREDUCE-5353



Re: Job end notification does not always work (Hadoop 2.x)

Posted by Prashant Kommireddi <pr...@gmail.com>.

Thanks everyone. I have opened a JIRA and added a link to this discussion
https://issues.apache.org/jira/browse/MAPREDUCE-5353


On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k <de...@huawei.com> wrote:

>  It is not mandatory to have running HS in the cluster. Still the user
> can submit the job without HS in the cluster, and user may expect the
> Job/App End Notification.****
>
> ** **
>
> Thanks****
>
> Devaraj k****
>
> ** **
>
> *From:* Alejandro Abdelnur [mailto:tucu@cloudera.com]
> *Sent:* 24 June 2013 21:42
> *To:* user@hadoop.apache.org
> *Cc:* user@hadoop.apache.org
>
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)****
>
>  ** **
>
> if we ought to do this in a yarn service it
> should be the RM or the HS. the RM is, IMO, the natural fit. the HS, would
> be a good choice if we are concerned about the extra work this would cause
> in the RM. the problem with the current HS is that it is MR specific, we
> should generalize it for diff AM types. ****
>
> ** **
>
> thx****
>
>
> Alejandro****
>
> (phone typing)****
>
>
> On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:****
>
>  Even if we handle all the failure cases in AM for Job End Notification,
> we may miss cases like abrupt kill of AM when it is in last retry. If we
> choose NM to give the notification, again RM needs to identify which NM
> should give the end-notification as we don't have any direct protocol
> between AM and NM.****
>
>  ****
>
> I feel it would be better to move End-Notification responsibility to RM as
> Yarn Service because it ensures 100% notification and also useful for other
> types of applications as well. ****
>
>  ****
>
>  ****
>
> Thanks****
>
> Devaraj K****
>
>  ****
>
> *From:* Ravi Prakash [mailto:ravihoo@ymail.com <ra...@ymail.com>]
> *Sent:* 23 June 2013 19:01
> *To:* user@hadoop.apache.org
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)****
>
>  ****
>
> Hi Alejandro,
>
> Thanks for your reply! I was thinking more along the lines Prashant
> suggested i.e. a failure during init() should still trigger an attempt to
> notify (by the AM). But now that you mention it, maybe we would be better
> of including this as a YARN feature after all (specially with all the new
> AMs being written). We could let the NM of the AM handle the notification
> burden, so that the RM doesn't get unduly taxed. Thoughts?
>
> Thanks
> Ravi****
>
>  ****
>
>  ****
>    ------------------------------
>
> *From:* Alejandro Abdelnur <tu...@cloudera.com>
> *To:* "common-user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Saturday, June 22, 2013 7:37 PM
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)****
>
>  ****
>
> If the AM fails before doing the job end notification, at any stage of the
> execution for whatever reason, the job end notification will never be
> deliver. There is not way to fix this unless the notification is done by a
> Yarn service. The 2 'candidate' services for doing this would be the RM and
> the HS. The job notification URL is in the job conf. The RM never sees the
> job conf, that rules out the RM out unless we add, at AM registration time
> the possibility to specify a callback URL. The HS has access to the job
> conf, but the HS is currently a 'passive' service.****
>
>
> thx****
>
>  ****
>
> On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com>
> wrote:****
>
> Prashanth, ****
>
>  ****
>
>  Please file a jira.****
>
>  ****
>
>  One thing to be aware of - AMs get restarted a certain number of times
> for fault-tolerance - which means we can't just assume that failure of a
> single AM is equivalent to failure of the job.****
>
>  ****
>
>  Only the ResourceManager is in the appropriate position to judge failure
> of AM v/s failure-of-job.****
>
>  ****
>
> hth,****
>
> Arun****
>
>  ****
>
> On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com>
> wrote:****
>
>
>
>
> ****
>
> Thanks Ravi.
>
> Well, in this case its a no-effort :) A failure of AM init should be
> considered as failure of the job? I looked at the code and best-effort
> makes sense with respect to retry logic etc. You make a good point that
> there would be no notification in case AM OOMs, but I do feel AM init
> failure should send a notification by other means.****
>
>  ****
>
> On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:**
> **
>
> Hi Prashant,
>
> I would tend to agree with you. Although job-end notification is only a
> "best-effort" mechanism (i.e. we cannot always guarantee notification for
> example when the AM OOMs), I agree with you that we can do more. If you
> feel strongly about this, please create a JIRA and possibly upload a patch.
>
> Thanks
> Ravi****
>
>  ****
>
>  ****
>    ------------------------------
>
> *From:* Prashant Kommireddi <pr...@gmail.com>
> *To:* "user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Thursday, June 20, 2013 9:45 PM
> *Subject:* Job end notification does not always work (Hadoop 2.x)****
>
>  ****
>
> Hello,****
>
> I came across an issue that occurs with the job notification callbacks in
> MR2. It works fine if the Application master has started, but does not send
> a callback if the initializing of AM fails.****
>
> Here is the code from MRAppMaster.java
>
> .....
> .......****
>
>       // set job classloader if configured****
>
>       MRApps.setJobClassLoader(conf);****
>
>       initAndStartAppMaster(appMaster, conf, jobUserName);****
>
>     } catch (Throwable t) {****
>
>       LOG.fatal("Error starting MRAppMaster", t);****
>
>       System.exit(1);****
>
>     }****
>
>   }
>
> protected static void initAndStartAppMaster(final MRAppMaster appMaster,****
>
>       final YarnConfiguration conf, String jobUserName) throws IOException,****
>
>       InterruptedException {****
>
>     UserGroupInformation.setConfiguration(conf);****
>
>     UserGroupInformation appMasterUgi = UserGroupInformation****
>
>         .createRemoteUser(jobUserName);****
>
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {****
>
>       @Override****
>
>       public Object run() throws Exception {****
>
>         appMaster.init(conf);****
>
>         appMaster.start();****
>
>         if(appMaster.errorHappenedShutDown) {****
>
>           throw new IOException("Was asked to shut down.");****
>
>         }****
>
>         return null;****
>
>       }****
>
>     });****
>
>   }****
>
> appMaster.init(conf) does not dispatch JobFinishEventHandler which is
> responsible for sending a HTTP callback (via shutDownJob()). If there was
> an exception at this time, the process would simply terminate (via
> System.exit(1) )****
>
> appMaster.start() however rightly uses the JobFinishEventHandler and
> things work fine.****
>
> Shouldn't a failure on init(..) also send a callback suggesting the job
> failed?****
>
> Thanks,****
>
> Prashant****
>
>  ****
>
>  ****
>
>  ****
>
>  ****
>
> --****
>
> Arun C. Murthy****
>
> Hortonworks Inc.
> http://hortonworks.com/****
>
>  ****
>
>
>
> ****
>
>  ****
>
> --
> Alejandro ****
>
>  ****
>
>

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Prashant Kommireddi <pr...@gmail.com>.

Thanks everyone. I have opened a JIRA and added a link to this discussion
https://issues.apache.org/jira/browse/MAPREDUCE-5353


On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k <de...@huawei.com> wrote:

>  It is not mandatory to have running HS in the cluster. Still the user
> can submit the job without HS in the cluster, and user may expect the
> Job/App End Notification.****
>
> ** **
>
> Thanks****
>
> Devaraj k****
>
> ** **
>
> *From:* Alejandro Abdelnur [mailto:tucu@cloudera.com]
> *Sent:* 24 June 2013 21:42
> *To:* user@hadoop.apache.org
> *Cc:* user@hadoop.apache.org
>
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)****
>
>  ** **
>
> if we ought to do this in a yarn service it
> should be the RM or the HS. the RM is, IMO, the natural fit. the HS, would
> be a good choice if we are concerned about the extra work this would cause
> in the RM. the problem with the current HS is that it is MR specific, we
> should generalize it for diff AM types. ****
>
> ** **
>
> thx****
>
>
> Alejandro****
>
> (phone typing)****
>
>
> On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:****
>
>  Even if we handle all the failure cases in AM for Job End Notification,
> we may miss cases like abrupt kill of AM when it is in last retry. If we
> choose NM to give the notification, again RM needs to identify which NM
> should give the end-notification as we don't have any direct protocol
> between AM and NM.****
>
>  ****
>
> I feel it would be better to move End-Notification responsibility to RM as
> Yarn Service because it ensures 100% notification and also useful for other
> types of applications as well. ****
>
>  ****
>
>  ****
>
> Thanks****
>
> Devaraj K****
>
>  ****
>
> *From:* Ravi Prakash [mailto:ravihoo@ymail.com <ra...@ymail.com>]
> *Sent:* 23 June 2013 19:01
> *To:* user@hadoop.apache.org
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)****
>
>  ****
>
> Hi Alejandro,
>
> Thanks for your reply! I was thinking more along the lines Prashant
> suggested i.e. a failure during init() should still trigger an attempt to
> notify (by the AM). But now that you mention it, maybe we would be better
> of including this as a YARN feature after all (specially with all the new
> AMs being written). We could let the NM of the AM handle the notification
> burden, so that the RM doesn't get unduly taxed. Thoughts?
>
> Thanks
> Ravi****
>
>  ****
>
>  ****
>    ------------------------------
>
> *From:* Alejandro Abdelnur <tu...@cloudera.com>
> *To:* "common-user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Saturday, June 22, 2013 7:37 PM
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)****
>
>  ****
>
> If the AM fails before doing the job end notification, at any stage of the
> execution for whatever reason, the job end notification will never be
> deliver. There is not way to fix this unless the notification is done by a
> Yarn service. The 2 'candidate' services for doing this would be the RM and
> the HS. The job notification URL is in the job conf. The RM never sees the
> job conf, that rules out the RM out unless we add, at AM registration time
> the possibility to specify a callback URL. The HS has access to the job
> conf, but the HS is currently a 'passive' service.****
>
>
> thx****
>
>  ****
>
> On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com>
> wrote:****
>
> Prashanth, ****
>
>  ****
>
>  Please file a jira.****
>
>  ****
>
>  One thing to be aware of - AMs get restarted a certain number of times
> for fault-tolerance - which means we can't just assume that failure of a
> single AM is equivalent to failure of the job.****
>
>  ****
>
>  Only the ResourceManager is in the appropriate position to judge failure
> of AM v/s failure-of-job.****
>
>  ****
>
> hth,****
>
> Arun****
>
>  ****
>
> On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com>
> wrote:****
>
>
>
>
> ****
>
> Thanks Ravi.
>
> Well, in this case its a no-effort :) A failure of AM init should be
> considered as failure of the job? I looked at the code and best-effort
> makes sense with respect to retry logic etc. You make a good point that
> there would be no notification in case AM OOMs, but I do feel AM init
> failure should send a notification by other means.****
>
>  ****
>
> On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:**
> **
>
> Hi Prashant,
>
> I would tend to agree with you. Although job-end notification is only a
> "best-effort" mechanism (i.e. we cannot always guarantee notification for
> example when the AM OOMs), I agree with you that we can do more. If you
> feel strongly about this, please create a JIRA and possibly upload a patch.
>
> Thanks
> Ravi****
>
>  ****
>
>  ****
>    ------------------------------
>
> *From:* Prashant Kommireddi <pr...@gmail.com>
> *To:* "user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Thursday, June 20, 2013 9:45 PM
> *Subject:* Job end notification does not always work (Hadoop 2.x)****
>
>  ****
>
> Hello,
>
> I came across an issue that occurs with the job notification callbacks in
> MR2. It works fine if the Application master has started, but does not send
> a callback if the initializing of AM fails.
>
> Here is the code from MRAppMaster.java
>
> .....
> .......
>
>       // set job classloader if configured
>       MRApps.setJobClassLoader(conf);
>       initAndStartAppMaster(appMaster, conf, jobUserName);
>     } catch (Throwable t) {
>       LOG.fatal("Error starting MRAppMaster", t);
>       System.exit(1);
>     }
>   }
>
> protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>       final YarnConfiguration conf, String jobUserName) throws IOException,
>       InterruptedException {
>     UserGroupInformation.setConfiguration(conf);
>     UserGroupInformation appMasterUgi = UserGroupInformation
>         .createRemoteUser(jobUserName);
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>       @Override
>       public Object run() throws Exception {
>         appMaster.init(conf);
>         appMaster.start();
>         if(appMaster.errorHappenedShutDown) {
>           throw new IOException("Was asked to shut down.");
>         }
>         return null;
>       }
>     });
>   }
>
> appMaster.init(conf) does not dispatch JobFinishEventHandler which is
> responsible for sending a HTTP callback (via shutDownJob()). If there was
> an exception at this time, the process would simply terminate (via
> System.exit(1) )
>
> appMaster.start() however rightly uses the JobFinishEventHandler and
> things work fine.
>
> Shouldn't a failure on init(..) also send a callback suggesting the job
> failed?
>
> Thanks,
> Prashant
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
> --
> Alejandro

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.

Devaraj,

If you don't run the HS, once your jobs have finished you cannot retrieve
their status/counters, either from the Java API or from the Web UI. So I'd
say that for any practical usage, you need it.

thx


On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k <de...@huawei.com> wrote:

> It is not mandatory to have a running HS in the cluster. The user can
> still submit a job without an HS in the cluster, and may expect the
> Job/App End Notification.
>
> Thanks
> Devaraj k
>
> *From:* Alejandro Abdelnur [mailto:tucu@cloudera.com]
> *Sent:* 24 June 2013 21:42
> *To:* user@hadoop.apache.org
> *Cc:* user@hadoop.apache.org
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> If we ought to do this in a YARN service, it should be the RM or the HS.
> The RM is, IMO, the natural fit. The HS would be a good choice if we are
> concerned about the extra work this would cause in the RM. The problem
> with the current HS is that it is MR-specific; we should generalize it
> for different AM types.
>
> thx
>
> Alejandro
> (phone typing)
>
> On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:
>
> Even if we handle all the failure cases in the AM for Job End Notification,
> we may miss cases like an abrupt kill of the AM when it is on its last
> retry. If we choose the NM to give the notification, the RM again needs to
> identify which NM should give the end-notification, as we don't have any
> direct protocol between AM and NM.
>
> I feel it would be better to move the End-Notification responsibility to
> the RM as a YARN service, because it ensures 100% notification and is also
> useful for other types of applications.
>
> Thanks
> Devaraj K
>
> *From:* Ravi Prakash [mailto:ravihoo@ymail.com <ra...@ymail.com>]
> *Sent:* 23 June 2013 19:01
> *To:* user@hadoop.apache.org
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> Hi Alejandro,
>
> Thanks for your reply! I was thinking more along the lines Prashant
> suggested, i.e. a failure during init() should still trigger an attempt to
> notify (by the AM). But now that you mention it, maybe we would be better
> off including this as a YARN feature after all (especially with all the new
> AMs being written). We could let the NM of the AM handle the notification
> burden, so that the RM doesn't get unduly taxed. Thoughts?
>
> Thanks
> Ravi
>
>    ------------------------------
>
> *From:* Alejandro Abdelnur <tu...@cloudera.com>
> *To:* "common-user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Saturday, June 22, 2013 7:37 PM
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> If the AM fails before doing the job end notification, at any stage of the
> execution and for whatever reason, the job end notification will never be
> delivered. There is no way to fix this unless the notification is done by a
> YARN service. The two candidate services for doing this would be the RM and
> the HS. The job notification URL is in the job conf. The RM never sees the
> job conf, which rules out the RM unless we add, at AM registration time,
> the possibility to specify a callback URL. The HS has access to the job
> conf, but the HS is currently a 'passive' service.
>
> thx
>
> On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com>
> wrote:
>
> Prashant,
>
> Please file a JIRA.
>
> One thing to be aware of: AMs get restarted a certain number of times
> for fault-tolerance, which means we can't just assume that failure of a
> single AM is equivalent to failure of the job.
>
> Only the ResourceManager is in the appropriate position to judge failure
> of the AM vs. failure of the job.
>
> hth,
> Arun
>
> On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com>
> wrote:
>
> Thanks Ravi.
>
> Well, in this case it's a no-effort :) Shouldn't a failure of AM init be
> considered a failure of the job? I looked at the code, and best-effort
> makes sense with respect to retry logic etc. You make a good point that
> there would be no notification in case the AM OOMs, but I do feel an AM
> init failure should send a notification by other means.
>
> On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
>
> Hi Prashant,
>
> I would tend to agree with you. Although job-end notification is only a
> "best-effort" mechanism (i.e. we cannot always guarantee notification for
> example when the AM OOMs), I agree with you that we can do more. If you
> feel strongly about this, please create a JIRA and possibly upload a patch.
>
> Thanks
> Ravi
>
>    ------------------------------
>
> *From:* Prashant Kommireddi <pr...@gmail.com>
> *To:* "user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Thursday, June 20, 2013 9:45 PM
> *Subject:* Job end notification does not always work (Hadoop 2.x)
>
> Hello,
>
> I came across an issue that occurs with the job notification callbacks in
> MR2. It works fine if the Application master has started, but does not send
> a callback if the initializing of AM fails.
>
> Here is the code from MRAppMaster.java
>
> .....
> .......
>
>       // set job classloader if configured
>       MRApps.setJobClassLoader(conf);
>       initAndStartAppMaster(appMaster, conf, jobUserName);
>     } catch (Throwable t) {
>       LOG.fatal("Error starting MRAppMaster", t);
>       System.exit(1);
>     }
>   }
>
> protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>       final YarnConfiguration conf, String jobUserName) throws IOException,
>       InterruptedException {
>     UserGroupInformation.setConfiguration(conf);
>     UserGroupInformation appMasterUgi = UserGroupInformation
>         .createRemoteUser(jobUserName);
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>       @Override
>       public Object run() throws Exception {
>         appMaster.init(conf);
>         appMaster.start();
>         if(appMaster.errorHappenedShutDown) {
>           throw new IOException("Was asked to shut down.");
>         }
>         return null;
>       }
>     });
>   }
>
> appMaster.init(conf) does not dispatch JobFinishEventHandler which is
> responsible for sending a HTTP callback (via shutDownJob()). If there was
> an exception at this time, the process would simply terminate (via
> System.exit(1) )
>
> appMaster.start() however rightly uses the JobFinishEventHandler and
> things work fine.
>
> Shouldn't a failure on init(..) also send a callback suggesting the job
> failed?
>
> Thanks,
> Prashant
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
> --
> Alejandro


-- 
Alejandro

RE: Job end notification does not always work (Hadoop 2.x)

Posted by Devaraj k <de...@huawei.com>.

It is not mandatory to have a running HS in the cluster. The user can still submit a job without an HS in the cluster, and may expect the Job/App End Notification.

Thanks
Devaraj k

From: Alejandro Abdelnur [mailto:tucu@cloudera.com]
Sent: 24 June 2013 21:42
To: user@hadoop.apache.org
Cc: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If we ought to do this in a YARN service, it should be the RM or the HS. The RM is, IMO, the natural fit. The HS would be a good choice if we are concerned about the extra work this would cause in the RM. The problem with the current HS is that it is MR-specific; we should generalize it for different AM types.

thx

Alejandro
(phone typing)

On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:
Even if we handle all the failure cases in the AM for Job End Notification, we may miss cases like an abrupt kill of the AM when it is on its last retry. If we choose the NM to give the notification, the RM again needs to identify which NM should give the end-notification, as we don't have any direct protocol between AM and NM.

I feel it would be better to move the End-Notification responsibility to the RM as a YARN service, because it ensures 100% notification and is also useful for other types of applications.

Thanks
Devaraj K

From: Ravi Prakash [mailto:ravihoo@ymail.com]
Sent: 23 June 2013 19:01
To: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested, i.e. a failure during init() should still trigger an attempt to notify (by the AM). But now that you mention it, maybe we would be better off including this as a YARN feature after all (especially with all the new AMs being written). We could let the NM of the AM handle the notification burden, so that the RM doesn't get unduly taxed. Thoughts?

Thanks
Ravi

________________________________
From: Alejandro Abdelnur <tu...@cloudera.com>
To: "common-user@hadoop.apache.org" <us...@hadoop.apache.org>
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If the AM fails before doing the job end notification, at any stage of the execution and for whatever reason, the job end notification will never be delivered. There is no way to fix this unless the notification is done by a YARN service. The two candidate services for doing this would be the RM and the HS. The job notification URL is in the job conf. The RM never sees the job conf, which rules out the RM unless we add, at AM registration time, the possibility to specify a callback URL. The HS has access to the job conf, but the HS is currently a 'passive' service.

thx
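
Whichever component ends up sending the callback first has to read the URL out of the job conf, which is the point made above. A minimal sketch of that step follows. It assumes the MR2 property name mapreduce.job.end-notification.url and the $jobId / $jobStatus placeholders; both should be checked against the Hadoop version in use. java.util.Properties stands in for the Hadoop Configuration, and the class and method names are hypothetical.

```java
import java.util.Properties;

public class NotificationUrlSketch {

    // Assumed property name for the job-end callback URL; verify it against
    // the Hadoop release you run.
    public static final String URL_KEY = "mapreduce.job.end-notification.url";

    /**
     * Returns the callback URL with the $jobId and $jobStatus placeholders
     * filled in, or null when no URL is configured.
     */
    public static String buildCallbackUrl(Properties jobConf, String jobId, String jobStatus) {
        String template = jobConf.getProperty(URL_KEY);
        if (template == null || template.isEmpty()) {
            return null;
        }
        // String.replace is a literal replacement, so '$' needs no escaping.
        return template.replace("$jobId", jobId).replace("$jobStatus", jobStatus);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(URL_KEY, "http://example.invalid/notify?id=$jobId&status=$jobStatus");
        System.out.println(buildCallbackUrl(conf, "job_0001", "FAILED"));
    }
}
```

A service that never sees the job conf (the RM in this discussion) would need the already-resolved template handed to it some other way, for example at AM registration time as suggested above.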

On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
Prashant,

Please file a JIRA.

One thing to be aware of: AMs get restarted a certain number of times for fault-tolerance, which means we can't just assume that failure of a single AM is equivalent to failure of the job.

Only the ResourceManager is in the appropriate position to judge failure of the AM vs. failure of the job.

hth,
Arun
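
The distinction drawn above between a failed AM attempt and a failed job can be sketched as a small decision function. This is only an illustration: the class, enum and parameter names are hypothetical, and the real decision lives in the ResourceManager, which knows the attempt count and the configured limit.

```java
public class AttemptOutcomeSketch {

    /** What a failed AM attempt means for the job as a whole. */
    public enum Outcome { RETRY_AM, JOB_FAILED }

    /**
     * attemptNumber is 1-based; maxAttempts is the configured limit on AM
     * attempts. Only when the last allowed attempt fails is the job failed.
     */
    public static Outcome onAttemptFailed(int attemptNumber, int maxAttempts) {
        if (attemptNumber < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("attempt counts must be >= 1");
        }
        return attemptNumber >= maxAttempts ? Outcome.JOB_FAILED : Outcome.RETRY_AM;
    }

    public static void main(String[] args) {
        System.out.println(onAttemptFailed(1, 2)); // first of two attempts failed
        System.out.println(onAttemptFailed(2, 2)); // last attempt failed
    }
}
```

Under this reading, a FAILED job-end notification would only be appropriate in the JOB_FAILED case; an AM that notifies on its own failure cannot tell the two cases apart unless it is told which attempt it is.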

On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com> wrote:

Thanks Ravi.

Well, in this case it's a no-effort :) Shouldn't a failure of AM init be considered a failure of the job? I looked at the code, and best-effort makes sense with respect to retry logic etc. You make a good point that there would be no notification in case the AM OOMs, but I do feel an AM init failure should send a notification by other means.

On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
Hi Prashant,

I would tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee notification for example when the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi

________________________________
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)

Hello,

I came across an issue that occurs with the job notification callbacks in MR2. It works fine if the Application master has started, but does not send a callback if the initializing of AM fails.

Here is the code from MRAppMaster.java

.....
.......

      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if(appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) does not dispatch JobFinishEventHandler which is responsible for sending a HTTP callback (via shutDownJob()). If there was an exception at this time, the process would simply terminate (via System.exit(1) )

appMaster.start() however rightly uses the JobFinishEventHandler and things work fine.

Shouldn't a failure on init(..) also send a callback suggesting the job failed?

Thanks,
Prashant
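
The behaviour asked for in the question above (a callback even when init() throws) could look roughly like the sketch below. It is not the MRAppMaster code: AppMaster and Notifier are hypothetical stand-ins for the AM lifecycle and for whatever sends the HTTP callback, and the method returns an exit code instead of calling System.exit so that it can be exercised in isolation.

```java
import java.util.ArrayList;
import java.util.List;

public class InitFailureNotifySketch {

    /** Hypothetical stand-in for the AM lifecycle (not the real MRAppMaster API). */
    public interface AppMaster {
        void init() throws Exception;
        void start() throws Exception;
    }

    /** Hypothetical stand-in for the component that sends the HTTP callback. */
    public interface Notifier {
        void notifyJobEnd(String status);
    }

    /**
     * Runs init() and start(). If either throws, attempts a best-effort
     * FAILED notification and returns a non-zero exit code.
     */
    public static int runAppMaster(AppMaster am, Notifier notifier) {
        try {
            am.init();
            am.start();
            return 0;
        } catch (Throwable t) {
            System.err.println("Error starting app master: " + t);
            try {
                notifier.notifyJobEnd("FAILED");
            } catch (RuntimeException e) {
                // Best effort only: a failing callback must not mask the original error.
            }
            return 1; // the caller would pass this to System.exit(...)
        }
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        AppMaster failsOnInit = new AppMaster() {
            public void init() throws Exception { throw new Exception("init failed"); }
            public void start() { }
        };
        int code = runAppMaster(failsOnInit, sent::add);
        System.out.println(code + " " + sent);
    }
}
```

This sketch deliberately ignores AM retries: it reports FAILED whenever a single attempt fails to start, which is exactly the concern raised in the replies about a failed attempt not necessarily being a failed job.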

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

--
Alejandro

RE: Job end notification does not always work (Hadoop 2.x)

Posted by Devaraj k <de...@huawei.com>.

It is not mandatory to have a running HS in the cluster. The user can still submit a job without an HS in the cluster, and may expect the Job/App End Notification.

Thanks
Devaraj k

From: Alejandro Abdelnur [mailto:tucu@cloudera.com]
Sent: 24 June 2013 21:42
To: user@hadoop.apache.org
Cc: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If we ought to do this in a YARN service, it should be the RM or the HS. The RM is, IMO, the natural fit. The HS would be a good choice if we are concerned about the extra work this would cause in the RM. The problem with the current HS is that it is MR-specific; we should generalize it for different AM types.

thx

Alejandro
(phone typing)

On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:
Even if we handle all the failure cases in the AM for Job End Notification, we may miss cases like an abrupt kill of the AM when it is on its last retry. If we choose the NM to give the notification, the RM again needs to identify which NM should give the end-notification, as we don't have any direct protocol between AM and NM.

I feel it would be better to move the End-Notification responsibility to the RM as a YARN service, because it ensures 100% notification and is also useful for other types of applications.

Thanks
Devaraj K

From: Ravi Prakash [mailto:ravihoo@ymail.com]
Sent: 23 June 2013 19:01
To: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested, i.e. a failure during init() should still trigger an attempt to notify (by the AM). But now that you mention it, maybe we would be better off including this as a YARN feature after all (especially with all the new AMs being written). We could let the NM of the AM handle the notification burden, so that the RM doesn't get unduly taxed. Thoughts?

Thanks
Ravi

________________________________
From: Alejandro Abdelnur <tu...@cloudera.com>
To: "common-user@hadoop.apache.org" <us...@hadoop.apache.org>
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If the AM fails before doing the job end notification, at any stage of the execution and for whatever reason, the job end notification will never be delivered. There is no way to fix this unless the notification is done by a YARN service. The two candidate services for doing this would be the RM and the HS. The job notification URL is in the job conf. The RM never sees the job conf, which rules out the RM unless we add, at AM registration time, the possibility to specify a callback URL. The HS has access to the job conf, but the HS is currently a 'passive' service.

thx

On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
Prashant,

Please file a JIRA.

One thing to be aware of: AMs get restarted a certain number of times for fault-tolerance, which means we can't just assume that failure of a single AM is equivalent to failure of the job.

Only the ResourceManager is in the appropriate position to judge failure of the AM vs. failure of the job.

hth,
Arun

On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com> wrote:

Thanks Ravi.

Well, in this case it's a no-effort :) Shouldn't a failure of AM init be considered a failure of the job? I looked at the code, and best-effort makes sense with respect to retry logic etc. You make a good point that there would be no notification in case the AM OOMs, but I do feel an AM init failure should send a notification by other means.

On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
Hi Prashant,

I would tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee notification, for example when the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi

________________________________
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)

Hello,
I came across an issue with the job notification callbacks in MR2. The callback works fine if the ApplicationMaster has started, but no callback is sent if initialization of the AM fails.
Here is the code from MRAppMaster.java

.....
.......

      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if(appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }
appMaster.init(conf) does not dispatch JobFinishEventHandler, which is responsible for sending an HTTP callback (via shutDownJob()). If there is an exception at this point, the process simply terminates (via System.exit(1)).
appMaster.start(), however, rightly uses the JobFinishEventHandler, and things work fine.
Shouldn't a failure on init(..) also send a callback suggesting the job failed?
Thanks,
Prashant

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

--
Alejandro
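
A rough illustration of the idea discussed in the thread above: make a best-effort attempt to notify when init() or start() throws, but only on the last allowed AM attempt, since (as Arun notes) one failed AM is not necessarily a failed job. This is only a sketch under stated assumptions. The names InitFailureNotifySketch, AppMasterLike, Notifier and runAttempt are hypothetical and are not part of MRAppMaster or any Hadoop API, and the attempt counters are assumed to be known to the process.

```java
/** Illustrative only: none of these names exist in Hadoop. */
class InitFailureNotifySketch {

  /** Hypothetical stand-in for the HTTP job-end callback. */
  interface Notifier {
    void notifyJobFailed(String reason);
  }

  /** Hypothetical stand-in for the init()/start() lifecycle of an AM. */
  interface AppMasterLike {
    void init() throws Exception;
    void start() throws Exception;
  }

  /**
   * Runs init() and start(). On any failure, makes a best-effort attempt to
   * notify, but only when this is the last allowed attempt, because an
   * earlier attempt failing does not mean the job has failed.
   * Returns the exit code the process would use.
   */
  static int runAttempt(AppMasterLike am, Notifier notifier,
                        int attempt, int maxAttempts) {
    try {
      am.init();
      am.start();
      return 0;
    } catch (Throwable t) {
      if (attempt >= maxAttempts) {
        try {
          notifier.notifyJobFailed("AM failed to start: " + t.getMessage());
        } catch (RuntimeException e) {
          // Best effort: a failed callback must not mask the original error.
        }
      }
      return 1;
    }
  }
}
```

The exit code is returned rather than passed to System.exit() so the behaviour is easy to exercise. As Ravi points out, a case like the AM running out of memory would still go unnotified with any in-process approach.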

RE: Job end notification does not always work (Hadoop 2.x)

Posted by Devaraj k <de...@huawei.com>.

It is not mandatory to have a running HS in the cluster. A user can still submit a job without an HS in the cluster, and may still expect the Job/App End Notification.

Thanks
Devaraj k

From: Alejandro Abdelnur [mailto:tucu@cloudera.com]
Sent: 24 June 2013 21:42
To: user@hadoop.apache.org
Cc: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

if we ought to do this in a yarn service it should be the RM or the HS. the RM is, IMO, the natural fit. the HS would be a good choice if we are concerned about the extra work this would cause in the RM. the problem with the current HS is that it is MR specific; we should generalize it for diff AM types.

thx

Alejandro
(phone typing)

On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:
Even if we handle all the failure cases in the AM for Job End Notification, we may miss cases like an abrupt kill of the AM when it is on its last retry. If we choose the NM to give the notification, the RM again needs to identify which NM should give the end-notification, as we don't have any direct protocol between AM and NM.

I feel it would be better to move the End-Notification responsibility to the RM as a YARN service, because it ensures 100% notification and is also useful for other types of applications.

Thanks
Devaraj K

From: Ravi Prakash [mailto:ravihoo@ymail.com]
Sent: 23 June 2013 19:01
To: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested, i.e. a failure during init() should still trigger an attempt to notify (by the AM). But now that you mention it, maybe we would be better off including this as a YARN feature after all (especially with all the new AMs being written). We could let the NM of the AM handle the notification burden, so that the RM doesn't get unduly taxed. Thoughts?

Thanks
Ravi

________________________________
From: Alejandro Abdelnur <tu...@cloudera.com>
To: "common-user@hadoop.apache.org" <us...@hadoop.apache.org>
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If the AM fails before doing the job end notification, at any stage of the execution and for whatever reason, the job end notification will never be delivered. There is no way to fix this unless the notification is done by a YARN service. The 2 'candidate' services for doing this would be the RM and the HS. The job notification URL is in the job conf. The RM never sees the job conf, which rules the RM out unless we add, at AM registration time, the possibility to specify a callback URL. The HS has access to the job conf, but the HS is currently a 'passive' service.

thx

On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
Prashanth,

 Please file a jira.

 One thing to be aware of - AMs get restarted a certain number of times for fault-tolerance - which means we can't just assume that failure of a single AM is equivalent to failure of the job.

 Only the ResourceManager is in the appropriate position to judge failure of AM v/s failure-of-job.

hth,
Arun

On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com> wrote:

Thanks Ravi.

Well, in this case it's a no-effort :) Shouldn't a failure of AM init be considered a failure of the job? I looked at the code, and best-effort makes sense with respect to retry logic etc. You make a good point that there would be no notification in case the AM OOMs, but I do feel an AM init failure should send a notification by other means.

On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
Hi Prashant,

I would tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee notification, for example when the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi

________________________________
From: Prashant Kommireddi <pr...@gmail.com>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)

Hello,
I came across an issue with the job notification callbacks in MR2. The callback works fine if the ApplicationMaster has started, but no callback is sent if initialization of the AM fails.
Here is the code from MRAppMaster.java

.....
.......

      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if(appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }
appMaster.init(conf) does not dispatch JobFinishEventHandler, which is responsible for sending an HTTP callback (via shutDownJob()). If there is an exception at this point, the process simply terminates (via System.exit(1)).
appMaster.start(), however, rightly uses the JobFinishEventHandler, and things work fine.
Shouldn't a failure on init(..) also send a callback suggesting the job failed?
Thanks,
Prashant

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

--
Alejandro
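
For context on what a user configures when expecting this notification: the callback URL lives in the job configuration under mapreduce.job.end-notification.url, and the $jobId and $jobStatus placeholders in it are substituted before the request is made. The sketch below shows that substitution in isolation. java.util.Properties stands in for the Hadoop Configuration object, and EndNotificationUrlSketch and resolve are hypothetical names, not Hadoop classes.

```java
import java.util.Properties;

/** Illustrative only: a stand-alone model of the URL placeholder handling. */
class EndNotificationUrlSketch {

  /** Job configuration key that holds the callback URL template. */
  static final String URL_KEY = "mapreduce.job.end-notification.url";

  /**
   * Returns the callback URL with $jobId and $jobStatus filled in,
   * or null when no notification URL is configured.
   */
  static String resolve(Properties conf, String jobId, String jobStatus) {
    String template = conf.getProperty(URL_KEY);
    if (template == null || template.isEmpty()) {
      return null; // nothing to notify
    }
    // String.replace does literal (non-regex) replacement, so "$" is safe.
    return template.replace("$jobId", jobId)
                   .replace("$jobStatus", jobStatus);
  }
}
```

Whichever component ends up sending the notification (AM, RM or HS) needs both this template and the final job status, which is why the thread keeps returning to the question of who can see the job conf.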

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.

if we ought to do this in a yarn service it should be the RM or the HS. the RM is, IMO, the natural fit. the HS would be a good choice if we are concerned about the extra work this would cause in the RM. the problem with the current HS is that it is MR specific; we should generalize it for diff AM types.

thx

Alejandro
(phone typing)
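
A minimal sketch of what RM-side notification could look like, combining the points made in this thread: the RM holds the callback URL, counts failed AM attempts, and only treats the application as failed once the attempts are exhausted. Everything here is hypothetical (RmSideNotificationSketch, registerCallback, onAttemptFailed and the "?status=FAILED" suffix are invented for illustration and are not YARN APIs). The sketch also assumes the URL reaches the RM before the AM can fail, for example at submission; if it were only supplied at AM registration, an AM that dies during init() would presumably still leave the RM with nothing to call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a toy model of RM-side end notification. */
class RmSideNotificationSketch {
  private final int maxAttempts;
  private final Map<String, String> callbackUrls = new HashMap<>();
  private final Map<String, Integer> failedAttempts = new HashMap<>();
  private final List<String> pendingCallbacks = new ArrayList<>();

  RmSideNotificationSketch(int maxAttempts) {
    this.maxAttempts = maxAttempts;
  }

  /** Records the callback URL for an application. */
  void registerCallback(String appId, String url) {
    callbackUrls.put(appId, url);
  }

  /** Called when one AM attempt fails; queues a callback on final failure. */
  void onAttemptFailed(String appId) {
    int failures = failedAttempts.merge(appId, 1, Integer::sum);
    String url = callbackUrls.get(appId);
    if (failures >= maxAttempts && url != null) {
      // A real service would issue the HTTP request asynchronously;
      // here the URL is only queued so the decision logic stays visible.
      pendingCallbacks.add(url + "?status=FAILED");
      callbackUrls.remove(appId); // notify at most once per application
    }
  }

  List<String> pendingCallbacks() {
    return pendingCallbacks;
  }
}
```

Queueing the callback instead of sending it inline reflects the concern about not taxing the RM: the actual HTTP work could be handed to a separate thread or service.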

On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:

> Even if we handle all the failure cases in the AM for Job End Notification, we may miss cases like an abrupt kill of the AM when it is on its last retry. If we choose the NM to give the notification, the RM again needs to identify which NM should give the end-notification, as we don't have any direct protocol between AM and NM.
> 
> I feel it would be better to move the End-Notification responsibility to the RM as a YARN service, because it ensures 100% notification and is also useful for other types of applications.
>  
>  
> Thanks
> Devaraj K
>  
> From: Ravi Prakash [mailto:ravihoo@ymail.com] 
> Sent: 23 June 2013 19:01
> To: user@hadoop.apache.org
> Subject: Re: Job end notification does not always work (Hadoop 2.x)
>  
> Hi Alejandro,
> 
> Thanks for your reply! I was thinking more along the lines Prashant suggested, i.e. a failure during init() should still trigger an attempt to notify (by the AM). But now that you mention it, maybe we would be better off including this as a YARN feature after all (especially with all the new AMs being written). We could let the NM of the AM handle the notification burden, so that the RM doesn't get unduly taxed. Thoughts?
> 
> Thanks
> Ravi
>  
>  
> From: Alejandro Abdelnur <tu...@cloudera.com>
> To: "common-user@hadoop.apache.org" <us...@hadoop.apache.org> 
> Sent: Saturday, June 22, 2013 7:37 PM
> Subject: Re: Job end notification does not always work (Hadoop 2.x)
>  
> If the AM fails before doing the job end notification, at any stage of the execution and for whatever reason, the job end notification will never be delivered. There is no way to fix this unless the notification is done by a YARN service. The 2 'candidate' services for doing this would be the RM and the HS. The job notification URL is in the job conf. The RM never sees the job conf, which rules the RM out unless we add, at AM registration time, the possibility to specify a callback URL. The HS has access to the job conf, but the HS is currently a 'passive' service.
> 
> thx
>  
> On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
> Prashanth, 
>  
>  Please file a jira.
>  
>  One thing to be aware of - AMs get restarted a certain number of times for fault-tolerance - which means we can't just assume that failure of a single AM is equivalent to failure of the job.
>  
>  Only the ResourceManager is in the appropriate position to judge failure of AM v/s failure-of-job.
>  
> hth,
> Arun
>  
> On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com> wrote:
> 
> 
> Thanks Ravi.
> 
> Well, in this case it's a no-effort :) Shouldn't a failure of AM init be considered a failure of the job? I looked at the code, and best-effort makes sense with respect to retry logic etc. You make a good point that there would be no notification in case the AM OOMs, but I do feel an AM init failure should send a notification by other means.
> 
>  
> 
> On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
> Hi Prashant,
> 
> I would tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee notification, for example when the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.
> 
> Thanks
> Ravi
>  
>  
> From: Prashant Kommireddi <pr...@gmail.com>
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org> 
> Sent: Thursday, June 20, 2013 9:45 PM
> Subject: Job end notification does not always work (Hadoop 2.x)
>  
> Hello,
> 
> I came across an issue with the job notification callbacks in MR2. The callback works fine if the ApplicationMaster has started, but no callback is sent if initialization of the AM fails.
> 
> Here is the code from MRAppMaster.java
> 
> .....
> .......
>       // set job classloader if configured
>       MRApps.setJobClassLoader(conf);
>       initAndStartAppMaster(appMaster, conf, jobUserName);
>     } catch (Throwable t) {
>       LOG.fatal("Error starting MRAppMaster", t);
>       System.exit(1);
>     }
>   }
> 
> protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>       final YarnConfiguration conf, String jobUserName) throws IOException,
>       InterruptedException {
>     UserGroupInformation.setConfiguration(conf);
>     UserGroupInformation appMasterUgi = UserGroupInformation
>         .createRemoteUser(jobUserName);
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>       @Override
>       public Object run() throws Exception {
>         appMaster.init(conf);
>         appMaster.start();
>         if(appMaster.errorHappenedShutDown) {
>           throw new IOException("Was asked to shut down.");
>         }
>         return null;
>       }
>     });
>   }
> appMaster.init(conf) does not dispatch JobFinishEventHandler, which is responsible for sending an HTTP callback (via shutDownJob()). If there is an exception at this point, the process simply terminates (via System.exit(1)).
> 
> appMaster.start(), however, rightly uses the JobFinishEventHandler, and things work fine.
> 
> Shouldn't a failure on init(..) also send a callback suggesting the job failed?
> 
> Thanks,
> Prashant
>  
>  
> 
>  
>  
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
>  
> 
> 
>  
> -- 
> Alejandro
>

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.

If we ought to do this in a YARN service, it should be the RM or the HS. The RM is, IMO, the natural fit. The HS would be a good choice if we are concerned about the extra work this would cause in the RM. The problem with the current HS is that it is MR-specific; we should generalize it for different AM types.

thx

Alejandro
(phone typing)

On Jun 23, 2013, at 23:28, Devaraj k <de...@huawei.com> wrote:

> Even if we handle all the failure cases in the AM for Job End Notification, we may miss cases like an abrupt kill of the AM when it is on its last retry. If we choose the NM to give the notification, the RM again needs to identify which NM should give the end-notification, as we don't have any direct protocol between AM and NM.
>
> I feel it would be better to move the End-Notification responsibility to the RM as a Yarn Service, because it ensures 100% notification and is also useful for other types of applications.
>  
>  
> Thanks
> Devaraj K
>  
> From: Ravi Prakash [mailto:ravihoo@ymail.com] 
> Sent: 23 June 2013 19:01
> To: user@hadoop.apache.org
> Subject: Re: Job end notification does not always work (Hadoop 2.x)
>  
> Hi Alejandro,
> 
> Thanks for your reply! I was thinking more along the lines Prashant suggested, i.e. a failure during init() should still trigger an attempt to notify (by the AM). But now that you mention it, maybe we would be better off including this as a YARN feature after all (especially with all the new AMs being written). We could let the NM of the AM handle the notification burden, so that the RM doesn't get unduly taxed. Thoughts?
> 
> Thanks
> Ravi
>  
>  
> From: Alejandro Abdelnur <tu...@cloudera.com>
> To: "common-user@hadoop.apache.org" <us...@hadoop.apache.org> 
> Sent: Saturday, June 22, 2013 7:37 PM
> Subject: Re: Job end notification does not always work (Hadoop 2.x)
>  
> If the AM fails before doing the job end notification, at any stage of the execution for whatever reason, the job end notification will never be delivered. There is no way to fix this unless the notification is done by a Yarn service. The 2 'candidate' services for doing this would be the RM and the HS. The job notification URL is in the job conf. The RM never sees the job conf, which rules the RM out unless we add, at AM registration time, the possibility to specify a callback URL. The HS has access to the job conf, but the HS is currently a 'passive' service.
> 
> thx
>  
> On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
> Prashanth, 
>  
>  Please file a jira.
>  
>  One thing to be aware of - AMs get restarted a certain number of times for fault-tolerance - which means we can't just assume that failure of a single AM is equivalent to failure of the job.
>  
>  Only the ResourceManager is in the appropriate position to judge failure of AM v/s failure-of-job.
>  
> hth,
> Arun
>  
> On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com> wrote:
> 
> 
> Thanks Ravi.
> 
> Well, in this case it's a no-effort :) Shouldn't a failure of AM init be considered a failure of the job? I looked at the code and best-effort makes sense with respect to retry logic etc. You make a good point that there would be no notification in case the AM OOMs, but I do feel AM init failure should send a notification by other means.
> 
>  
> 
> On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
> Hi Prashant,
> 
> I would tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee notification for example when the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.
> 
> Thanks
> Ravi
>  
>  
> From: Prashant Kommireddi <pr...@gmail.com>
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org> 
> Sent: Thursday, June 20, 2013 9:45 PM
> Subject: Job end notification does not always work (Hadoop 2.x)
>  
> Hello,
> 
> I came across an issue that occurs with the job notification callbacks in MR2. It works fine if the Application master has started, but does not send a callback if the initializing of AM fails.
> 
> Here is the code from MRAppMaster.java
> 
> .....
> .......
>       // set job classloader if configured
>       MRApps.setJobClassLoader(conf);
>       initAndStartAppMaster(appMaster, conf, jobUserName);
>     } catch (Throwable t) {
>       LOG.fatal("Error starting MRAppMaster", t);
>       System.exit(1);
>     }
>   }
> 
> protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>       final YarnConfiguration conf, String jobUserName) throws IOException,
>       InterruptedException {
>     UserGroupInformation.setConfiguration(conf);
>     UserGroupInformation appMasterUgi = UserGroupInformation
>         .createRemoteUser(jobUserName);
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>       @Override
>       public Object run() throws Exception {
>         appMaster.init(conf);
>         appMaster.start();
>         if(appMaster.errorHappenedShutDown) {
>           throw new IOException("Was asked to shut down.");
>         }
>         return null;
>       }
>     });
>   }
> appMaster.init(conf) does not dispatch JobFinishEventHandler which is responsible for sending a HTTP callback (via shutDownJob()). If there was an exception at this time, the process would simply terminate (via System.exit(1) )
> 
> appMaster.start() however rightly uses the JobFinishEventHandler and things work fine.
> 
> Shouldn't a failure on init(..) also send a callback suggesting the job failed?
> 
> Thanks,
> Prashant
>  
>  
> 
>  
>  
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
>  
> 
> 
>  
> -- 
> Alejandro
>

RE: Job end notification does not always work (Hadoop 2.x)

Posted by Devaraj k <de...@huawei.com>.

Even if we handle all the failure cases in the AM for Job End Notification, we may miss cases like an abrupt kill of the AM when it is on its last retry. If we choose the NM to give the notification, the RM again needs to identify which NM should give the end-notification, as we don't have any direct protocol between AM and NM.

I feel it would be better to move the End-Notification responsibility to the RM as a Yarn Service, because it ensures 100% notification and is also useful for other types of applications.
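
A minimal Java sketch of the RM-side idea, under stated assumptions. None of the names here (RmSideNotificationSketch, registerCallback, attemptFailed, the outbox list, the example URL) are YARN API; they are made up for illustration. The sketch assumes the callback URL is handed to the RM up front and that the RM knows the maximum number of AM attempts. It only shows the decision logic: a single failed AM attempt sends nothing, and a notification is queued once the application reaches a final state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: not the ResourceManager, not YARN API.
final class RmSideNotificationSketch {
  private final int maxAttempts;                          // assumed known to the RM
  private final Map<String, String> callbackUrls = new HashMap<>();
  private final Map<String, Integer> failedAttempts = new HashMap<>();
  final List<String> outbox = new ArrayList<>();          // stands in for real HTTP calls

  RmSideNotificationSketch(int maxAttempts) {
    this.maxAttempts = maxAttempts;
  }

  // Assumption: the client or AM gives the RM a callback URL for the application.
  void registerCallback(String appId, String url) {
    callbackUrls.put(appId, url);
  }

  // One AM attempt died. Only the last allowed attempt makes the job a failure.
  void attemptFailed(String appId) {
    int failures = failedAttempts.merge(appId, 1, Integer::sum);
    if (failures >= maxAttempts) {
      finish(appId, "FAILED");
    }
  }

  void attemptSucceeded(String appId) {
    finish(appId, "SUCCEEDED");
  }

  // Queue at most one notification per application, then forget its state.
  private void finish(String appId, String status) {
    String url = callbackUrls.remove(appId);
    if (url != null) {
      outbox.add(url + "?status=" + status);
    }
    failedAttempts.remove(appId);
  }

  public static void main(String[] args) {
    RmSideNotificationSketch rm = new RmSideNotificationSketch(2);
    rm.registerCallback("app_1", "http://example.invalid/cb");
    rm.attemptFailed("app_1");       // first attempt dies: may still be retried
    System.out.println(rm.outbox);
    rm.attemptFailed("app_1");       // last attempt dies: the application failed
    System.out.println(rm.outbox);
  }
}
```

The notification is queued in a list rather than sent over HTTP so the sketch stays self-contained; a real service would also need retries and timeouts for the callback itself.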


Thanks
Devaraj K

From: Ravi Prakash [mailto:ravihoo@ymail.com]
Sent: 23 June 2013 19:01
To: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested, i.e. a failure during init() should still trigger an attempt to notify (by the AM). But now that you mention it, maybe we would be better off including this as a YARN feature after all (especially with all the new AMs being written). We could let the NM of the AM handle the notification burden, so that the RM doesn't get unduly taxed. Thoughts?

Thanks
Ravi


________________________________
From: Alejandro Abdelnur <tu...@cloudera.com>>
To: "common-user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If the AM fails before doing the job end notification, at any stage of the execution for whatever reason, the job end notification will never be delivered. There is no way to fix this unless the notification is done by a Yarn service. The 2 'candidate' services for doing this would be the RM and the HS. The job notification URL is in the job conf. The RM never sees the job conf, which rules the RM out unless we add, at AM registration time, the possibility to specify a callback URL. The HS has access to the job conf, but the HS is currently a 'passive' service.

thx

On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com>> wrote:
Prashanth,

 Please file a jira.

 One thing to be aware of - AMs get restarted a certain number of times for fault-tolerance - which means we can't just assume that failure of a single AM is equivalent to failure of the job.

 Only the ResourceManager is in the appropriate position to judge failure of AM v/s failure-of-job.

hth,
Arun

On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com>> wrote:


Thanks Ravi.

Well, in this case it's a no-effort :) Shouldn't a failure of AM init be considered a failure of the job? I looked at the code and best-effort makes sense with respect to retry logic etc. You make a good point that there would be no notification in case the AM OOMs, but I do feel AM init failure should send a notification by other means.

On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com>> wrote:
Hi Prashant,

I would tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee notification for example when the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi


________________________________
From: Prashant Kommireddi <pr...@gmail.com>>
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)

Hello,

I came across an issue that occurs with the job notification callbacks in MR2. It works fine if the Application master has started, but does not send a callback if the initializing of AM fails.

Here is the code from MRAppMaster.java

.....
.......
      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if(appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) does not dispatch JobFinishEventHandler which is responsible for sending a HTTP callback (via shutDownJob()). If there was an exception at this time, the process would simply terminate (via System.exit(1) )

appMaster.start() however rightly uses the JobFinishEventHandler and things work fine.

Shouldn't a failure on init(..) also send a callback suggesting the job failed?

Thanks,
Prashant




--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




--
Alejandro

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Ravi Prakash <ra...@ymail.com>.

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested, i.e. a failure during init() should still trigger an attempt to notify (by the AM). But now that you mention it, maybe we would be better off including this as a YARN feature after all (especially with all the new AMs being written). We could let the NM of the AM handle the notification burden, so that the RM doesn't get unduly taxed. Thoughts?
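
The AM-side variant (notify even when init() fails) could look roughly like the Java sketch below. It is an illustration under assumptions, not the MRAppMaster code: Service, Notifier, launch and the isLastAttempt flag are hypothetical stand-ins. The flag reflects the point raised in this thread that AMs get restarted, so a failed attempt should only be reported as a failed job when no retry is left.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; none of these are the real Hadoop types.
interface Service {
  void init() throws Exception;
  void start() throws Exception;
}

interface Notifier {
  void notifyJobEnd(String status);
}

final class InitFailureNotification {

  // Runs init() then start(). On any failure it makes one best-effort attempt
  // to notify, but only on the last attempt, because earlier attempts may be
  // retried. Returns the exit code the process would use.
  static int launch(Service appMaster, Notifier notifier, boolean isLastAttempt) {
    try {
      appMaster.init();
      appMaster.start();
      return 0;
    } catch (Throwable t) {
      if (isLastAttempt) {
        try {
          notifier.notifyJobEnd("FAILED");
        } catch (RuntimeException e) {
          // Best effort only: a failed callback must not mask the original error.
        }
      }
      return 1;
    }
  }

  public static void main(String[] args) {
    List<String> sent = new ArrayList<>();
    Service failing = new Service() {
      public void init() throws Exception {
        throw new IllegalStateException("init failed");
      }
      public void start() {
      }
    };
    int code = launch(failing, sent::add, true);
    System.out.println("exit=" + code + " notifications=" + sent);
  }
}
```

This still cannot cover an AM that is killed abruptly or runs out of memory before the catch block executes, which is the gap a notification sent by a YARN service would close.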

Thanks
Ravi




________________________________
 From: Alejandro Abdelnur <tu...@cloudera.com>
To: "common-user@hadoop.apache.org" <us...@hadoop.apache.org> 
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)
 


If the AM fails before doing the job end notification, at any stage of the execution for whatever reason, the job end notification will never be delivered. There is no way to fix this unless the notification is done by a Yarn service. The 2 'candidate' services for doing this would be the RM and the HS. The job notification URL is in the job conf. The RM never sees the job conf, which rules the RM out unless we add, at AM registration time, the possibility to specify a callback URL. The HS has access to the job conf, but the HS is currently a 'passive' service.
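
For context on "the job notification URL is in the job conf": in MapReduce the URL is set with the job property mapreduce.job.end-notification.url, and its value may contain the placeholders $jobId and $jobStatus. The Java sketch below only illustrates that substitution; the class and method names and the example URL are hypothetical, and it is not the Hadoop implementation.

```java
// Hypothetical helper: shows how a callback URL template could be expanded.
final class NotificationUrlSketch {

  // String.replace substitutes literal text, so "$" needs no escaping here.
  static String expand(String template, String jobId, String jobStatus) {
    return template.replace("$jobId", jobId).replace("$jobStatus", jobStatus);
  }

  public static void main(String[] args) {
    String template = "http://example.invalid/notify?id=$jobId&status=$jobStatus";
    System.out.println(expand(template, "job_1", "FAILED"));
  }
}
```

Whichever service ends up sending the callback would need both the template and the final job status, which is why access to the job conf matters in this discussion.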

thx


On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

Prashanth, 
>
>
> Please file a jira.
>
>
> One thing to be aware of - AMs get restarted a certain number of times for fault-tolerance - which means we can't just assume that failure of a single AM is equivalent to failure of the job.
>
>
> Only the ResourceManager is in the appropriate position to judge failure of AM v/s failure-of-job.
>
>
>hth,
>Arun
>
>
>On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com> wrote:
>
>Thanks Ravi.
>>
>>Well, in this case it's a no-effort :) Shouldn't a failure of AM init be considered a failure of the job? I looked at the code and best-effort makes sense with respect to retry logic etc. You make a good point that there would be no notification in case the AM OOMs, but I do feel AM init failure should send a notification by other means.
>>
>>
>>
>>
>>
>>On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
>>
>>Hi Prashant,
>>>
>>>I would tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee notification for example when the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.
>>>
>>>Thanks
>>>Ravi
>>>
>>>
>>>
>>>
>>>
>>>
>>>________________________________
>>> From: Prashant Kommireddi <pr...@gmail.com>
>>>To: "user@hadoop.apache.org" <us...@hadoop.apache.org> 
>>>Sent: Thursday, June 20, 2013 9:45 PM
>>>Subject: Job end notification does not always work (Hadoop 2.x)
>>> 
>>>
>>>
>>>Hello,
>>>
>>>I came across an issue that occurs with the job notification callbacks in MR2. It works fine if the Application master has started, but does not send a callback if the initializing of AM fails.
>>>
>>>Here is the code from MRAppMaster.java
>>>
>>>.....
>>>.......
>>>
>>>      // set job classloader if configured
>>>      MRApps.setJobClassLoader(conf);
>>>      initAndStartAppMaster(appMaster, conf, jobUserName);
>>>    } catch (Throwable t) {
>>>      LOG.fatal("Error starting MRAppMaster", t);
>>>      System.exit(1);
>>>    }
>>>  }
>>>
>>>protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>>>      final YarnConfiguration conf, String jobUserName) throws IOException,
>>>      InterruptedException {
>>>    UserGroupInformation.setConfiguration(conf);
>>>    UserGroupInformation appMasterUgi = UserGroupInformation
>>>        .createRemoteUser(jobUserName);
>>>    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>>>      @Override
>>>      public Object run() throws Exception {
>>>        appMaster.init(conf);
>>>        appMaster.start();
>>>        if(appMaster.errorHappenedShutDown) {
>>>          throw new IOException("Was asked to shut down.");
>>>        }
>>>        return null;
>>>      }
>>>    });
>>>  }
>>>
>>>appMaster.init(conf) does not dispatch JobFinishEventHandler which is responsible for sending a HTTP callback (via shutDownJob()). If there was an exception at this time, the process would simply terminate (via System.exit(1) )
>>>
>>>appMaster.start() however rightly uses the JobFinishEventHandler and things work fine.
>>>
>>>Shouldn't a failure on init(..) also send a callback suggesting the job failed?
>>>
>>>Thanks,
>>>Prashant
>>>
>>>
>>>
>>>
>>>
>>
>
>--
>Arun C. Murthy
>Hortonworks Inc.
>http://hortonworks.com/
>
> 
>


-- 
Alejandro

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Ravi Prakash <ra...@ymail.com>.

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested i.e. a failure during init() should still trigger an attempt to notify (by the AM). But now that you mention it, maybe we would be better of including this as a YARN feature after all (specially with all the new AMs being written). We could let the NM of the AM handle the notification burden, so that the RM doesn't get unduly taxed. Thoughts?

Thanks
Ravi




________________________________
 From: Alejandro Abdelnur <tu...@cloudera.com>
To: "common-user@hadoop.apache.org" <us...@hadoop.apache.org> 
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)
 


If the AM fails before doing the job end notification, at any stage of the execution for whatever reason, the job end notification will never be delivered. There is no way to fix this unless the notification is done by a YARN service. The two candidate services for doing this would be the RM and the HS. The job notification URL is in the job conf. The RM never sees the job conf, which rules out the RM unless we add, at AM registration time, the possibility to specify a callback URL. The HS has access to the job conf, but the HS is currently a 'passive' service.

thx


On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

Prashanth, 
>
>
> Please file a jira.
>
>
> One thing to be aware of - AMs get restarted a certain number of times for fault-tolerance - which means we can't just assume that failure of a single AM is equivalent to failure of the job.
>
>
> Only the ResourceManager is in the appropriate position to judge failure of AM v/s failure-of-job.
>
>
>hth,
>Arun
>
>
>On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com> wrote:
>
>Thanks Ravi.
>>
>>Well, in this case it's a no-effort :) A failure of AM init should be considered as failure of the job? I looked at the code and best-effort makes sense with respect to retry logic etc. You make a good point that there would be no notification in case AM OOMs, but I do feel AM init failure should send a notification by other means.
>>
>>
>>
>>
>>
>>On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
>>
>>Hi Prashant,
>>>
>>>I would tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee notification for example when the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.
>>>
>>>Thanks
>>>Ravi
>>>
>>>
>>>
>>>
>>>
>>>
>>>________________________________
>>> From: Prashant Kommireddi <pr...@gmail.com>
>>>To: "user@hadoop.apache.org" <us...@hadoop.apache.org> 
>>>Sent: Thursday, June 20, 2013 9:45 PM
>>>Subject: Job end notification does not always work (Hadoop 2.x)
>>> 
>>>
>>>
>>>Hello,
>>>
>>>I came across an issue that occurs with the job notification callbacks in MR2. It works fine if the Application Master has started, but does not send a callback if initialization of the AM fails.
>>>
>>>Here is the code from MRAppMaster.java
>>>
>>>.....
>>>.......
>>>
>>>      // set job classloader if configured
>>>      MRApps.setJobClassLoader(conf);
>>>      initAndStartAppMaster(appMaster, conf, jobUserName);
>>>    } catch (Throwable t) {
>>>      LOG.fatal("Error starting MRAppMaster", t);
>>>      System.exit(1);
>>>    }
>>>  }
>>>
>>>protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>>>      final YarnConfiguration conf, String jobUserName) throws IOException,
>>>      InterruptedException {
>>>    UserGroupInformation.setConfiguration(conf);
>>>    UserGroupInformation appMasterUgi = UserGroupInformation
>>>        .createRemoteUser(jobUserName);
>>>    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>>>      @Override
>>>      public Object run() throws Exception {
>>>        appMaster.init(conf);
>>>        appMaster.start();
>>>        if(appMaster.errorHappenedShutDown) {
>>>          throw new IOException("Was asked to shut down.");
>>>        }
>>>        return null;
>>>      }
>>>    });
>>>  }
>>>
>>>appMaster.init(conf) does not dispatch JobFinishEventHandler, which is responsible for sending an HTTP callback (via shutDownJob()). If there was an exception at this time, the process would simply terminate (via System.exit(1)).
>>>
>>>appMaster.start() however rightly uses the JobFinishEventHandler and things work fine.
>>>
>>>Shouldn't a failure on init(..) also send a callback suggesting the job failed?
>>>
>>>Thanks,
>>>Prashant
>>>
>>>
>>>
>>>
>>>
>>
>
>--
>Arun C. Murthy
>Hortonworks Inc.
>http://hortonworks.com/
>
> 
>


-- 
Alejandro

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.

If the AM fails before doing the job end notification, at any stage of the
execution for whatever reason, the job end notification will never be
delivered. There is no way to fix this unless the notification is done by a
YARN service. The two candidate services for doing this would be the RM and
the HS. The job notification URL is in the job conf. The RM never sees the
job conf, which rules out the RM unless we add, at AM registration time,
the possibility to specify a callback URL. The HS has access to the job
conf, but the HS is currently a 'passive' service.
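
To make the mechanics concrete, here is a minimal sketch of what whichever component ends up owning the notification would have to do with the URL from the job conf. It assumes the $jobId / $jobStatus placeholder convention of the mapreduce.job.end-notification.url property; JobEndCallback and its method names are hypothetical, and the retry count and interval are plain parameters rather than values read from any real configuration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class JobEndCallback {
    /** Substitutes the $jobId and $jobStatus placeholders in a callback URL template. */
    static String expand(String template, String jobId, String jobStatus) {
        return template.replace("$jobId", jobId).replace("$jobStatus", jobStatus);
    }

    /**
     * Issues an HTTP GET against an http(s) URL, up to maxAttempts times,
     * sleeping intervalMillis between attempts. Returns true as soon as the
     * server answers 200, false if every attempt fails.
     */
    static boolean notifyWithRetries(String url, int maxAttempts, long intervalMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
                conn.setConnectTimeout(5000);
                conn.setReadTimeout(5000);
                int code = conn.getResponseCode();
                conn.disconnect();
                if (code == HttpURLConnection.HTTP_OK) {
                    return true;
                }
            } catch (IOException e) {
                // Unreachable or malformed URL: fall through and retry.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

For example, expand("http://example.com/cb?id=$jobId&status=$jobStatus", "job_1", "FAILED") yields "http://example.com/cb?id=job_1&status=FAILED". The open question in this thread is not this logic but which service is still alive, and has the URL, when the AM is not.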

thx

On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Prashanth,
>
>  Please file a jira.
>
>  One thing to be aware of - AMs get restarted a certain number of times
> for fault-tolerance - which means we can't just assume that failure of a
> single AM is equivalent to failure of the job.
>
>  Only the ResourceManager is in the appropriate position to judge failure
> of AM v/s failure-of-job.
>
> hth,
> Arun
>
> On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com>
> wrote:
>
> Thanks Ravi.
>
> Well, in this case it's a no-effort :) A failure of AM init should be
> considered as failure of the job? I looked at the code and best-effort
> makes sense with respect to retry logic etc. You make a good point that
> there would be no notification in case AM OOMs, but I do feel AM init
> failure should send a notification by other means.
>
>
>
> On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
>
>> Hi Prashant,
>>
>> I would tend to agree with you. Although job-end notification is only a
>> "best-effort" mechanism (i.e. we cannot always guarantee notification for
>> example when the AM OOMs), I agree with you that we can do more. If you
>> feel strongly about this, please create a JIRA and possibly upload a patch.
>>
>> Thanks
>> Ravi
>>
>>
>>   ------------------------------
>>  *From:* Prashant Kommireddi <pr...@gmail.com>
>> *To:* "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> *Sent:* Thursday, June 20, 2013 9:45 PM
>> *Subject:* Job end notification does not always work (Hadoop 2.x)
>>
>> Hello,
>>
>> I came across an issue that occurs with the job notification callbacks in
>> MR2. It works fine if the Application Master has started, but does not send
>> a callback if initialization of the AM fails.
>>
>> Here is the code from MRAppMaster.java
>>
>> .....
>> .......
>>
>>       // set job classloader if configured
>>       MRApps.setJobClassLoader(conf);
>>       initAndStartAppMaster(appMaster, conf, jobUserName);
>>     } catch (Throwable t) {
>>       LOG.fatal("Error starting MRAppMaster", t);
>>       System.exit(1);
>>     }
>>   }
>>
>> protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>>       final YarnConfiguration conf, String jobUserName) throws IOException,
>>       InterruptedException {
>>     UserGroupInformation.setConfiguration(conf);
>>     UserGroupInformation appMasterUgi = UserGroupInformation
>>         .createRemoteUser(jobUserName);
>>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>>       @Override
>>       public Object run() throws Exception {
>>         appMaster.init(conf);
>>         appMaster.start();
>>         if(appMaster.errorHappenedShutDown) {
>>           throw new IOException("Was asked to shut down.");
>>         }
>>         return null;
>>       }
>>     });
>>   }
>>
>> appMaster.init(conf) does not dispatch JobFinishEventHandler, which is
>> responsible for sending an HTTP callback (via shutDownJob()). If there was
>> an exception at this time, the process would simply terminate (via
>> System.exit(1)).
>>
>> appMaster.start() however rightly uses the JobFinishEventHandler and
>> things work fine.
>>
>> Shouldn't a failure on init(..) also send a callback suggesting the job
>> failed?
>>
>> Thanks,
>> Prashant
>>
>>
>>
>>
>
>  --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>


-- 
Alejandro

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Arun C Murthy <ac...@hortonworks.com>.

Prashanth, 

 Please file a jira.

 One thing to be aware of - AMs get restarted a certain number of times for fault-tolerance - which means we can't just assume that failure of a single AM is equivalent to failure of the job.

 Only the ResourceManager is in the appropriate position to judge failure of AM v/s failure-of-job.
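
As a rough illustration of the distinction above: a failed attempt only amounts to a failed job once no further attempts will be made. The sketch below is a local approximation with hypothetical names (AttemptPolicy, attemptNumber, maxAttempts). It assumes the caller somehow knows the 1-based attempt number and the configured maximum; an AM that crashes hard cannot evaluate this at all, which is why the ResourceManager is the better judge.

```java
final class AttemptPolicy {
    /** True when no further AM attempts would follow this one (attemptNumber is 1-based). */
    static boolean isLastAttempt(int attemptNumber, int maxAttempts) {
        return attemptNumber >= maxAttempts;
    }

    /** A failed attempt is treated as a failed job only if it was the last allowed attempt. */
    static boolean shouldNotifyJobFailure(boolean attemptFailed, int attemptNumber, int maxAttempts) {
        return attemptFailed && isLastAttempt(attemptNumber, maxAttempts);
    }
}
```

With a maximum of two attempts, a failure of attempt 1 would not trigger a job-failed callback, while a failure of attempt 2 would.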

hth,
Arun

On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <pr...@gmail.com> wrote:

> Thanks Ravi.
> 
> Well, in this case it's a no-effort :) A failure of AM init should be considered as failure of the job? I looked at the code and best-effort makes sense with respect to retry logic etc. You make a good point that there would be no notification in case AM OOMs, but I do feel AM init failure should send a notification by other means.
> 
> 
> 
> On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:
> Hi Prashant,
> 
> I would tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee notification for example when the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.
> 
> Thanks
> Ravi
> 
> 
> From: Prashant Kommireddi <pr...@gmail.com>
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org> 
> Sent: Thursday, June 20, 2013 9:45 PM
> Subject: Job end notification does not always work (Hadoop 2.x)
> 
> Hello,
> 
> I came across an issue that occurs with the job notification callbacks in MR2. It works fine if the Application Master has started, but does not send a callback if initialization of the AM fails.
> 
> Here is the code from MRAppMaster.java
> 
> .....
> .......
>       // set job classloader if configured
>       MRApps.setJobClassLoader(conf);
>       initAndStartAppMaster(appMaster, conf, jobUserName);
>     } catch (Throwable t) {
>       LOG.fatal("Error starting MRAppMaster", t);
>       System.exit(1);
>     }
>   }
> 
> protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>       final YarnConfiguration conf, String jobUserName) throws IOException,
>       InterruptedException {
>     UserGroupInformation.setConfiguration(conf);
>     UserGroupInformation appMasterUgi = UserGroupInformation
>         .createRemoteUser(jobUserName);
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>       @Override
>       public Object run() throws Exception {
>         appMaster.init(conf);
>         appMaster.start();
>         if(appMaster.errorHappenedShutDown) {
>           throw new IOException("Was asked to shut down.");
>         }
>         return null;
>       }
>     });
>   }
> 
> appMaster.init(conf) does not dispatch JobFinishEventHandler, which is responsible for sending an HTTP callback (via shutDownJob()). If an exception is thrown at this point, the process simply terminates (via System.exit(1)).
> 
> appMaster.start() however rightly uses the JobFinishEventHandler and things work fine.
> 
> Shouldn't a failure on init(..) also send a callback suggesting the job failed?
> 
> Thanks,
> Prashant
> 
> 
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Prashant Kommireddi <pr...@gmail.com>.

Thanks Ravi.

Well, in this case it's a no-effort :) Shouldn't a failure of AM init be
considered a failure of the job? I looked at the code, and best-effort
makes sense with respect to retry logic etc. You make a good point that
there would be no notification if the AM OOMs, but I do feel an AM init
failure should send a notification by some other means.
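
As a rough illustration of what I mean, here is a minimal sketch. The Notifier and Master interfaces, the run() method, and the lastAttempt flag are hypothetical stand-ins, not the actual MRAppMaster API. It assumes the caller already knows whether this is the final AM attempt.

```java
import java.util.ArrayList;
import java.util.List;

public class InitFailureNotification {

    /** Hypothetical callback sink; a real one would issue the HTTP request. */
    interface Notifier {
        void notifyJobEnd(String jobId, String status);
    }

    /** Hypothetical stand-in for the application master lifecycle. */
    interface Master {
        void init() throws Exception;
        void start() throws Exception;
    }

    /**
     * Runs init() then start(). If either throws and this is the last
     * attempt, sends a best-effort FAILED notification, then returns a
     * non-zero code (the point where the quoted code calls System.exit(1)).
     */
    static int run(Master master, Notifier notifier, String jobId, boolean lastAttempt) {
        try {
            master.init();
            master.start();
            return 0;
        } catch (Throwable t) {
            if (lastAttempt) {
                try {
                    notifier.notifyJobEnd(jobId, "FAILED");
                } catch (RuntimeException e) {
                    // Best effort: a failed callback must not mask the original error.
                }
            }
            return 1;
        }
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        Master failing = new Master() {
            public void init() throws Exception { throw new Exception("init failed"); }
            public void start() { }
        };
        int code = run(failing, (id, status) -> sent.add(id + ":" + status), "job_1", true);
        System.out.println(code + " " + sent); // prints "1 [job_1:FAILED]"
    }
}
```

With lastAttempt set to false, the same failure returns 1 without sending anything, so an AM that is about to be retried stays silent.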



On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ra...@ymail.com> wrote:

> Hi Prashant,
>
> I would tend to agree with you. Although job-end notification is only a
> "best-effort" mechanism (i.e. we cannot always guarantee notification for
> example when the AM OOMs), I agree with you that we can do more. If you
> feel strongly about this, please create a JIRA and possibly upload a patch.
>
> Thanks
> Ravi
>
>
>   ------------------------------
>  *From:* Prashant Kommireddi <pr...@gmail.com>
> *To:* "user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Thursday, June 20, 2013 9:45 PM
> *Subject:* Job end notification does not always work (Hadoop 2.x)
>
> Hello,
>
> I came across an issue that occurs with the job notification callbacks in
> MR2. It works fine if the Application master has started, but does not send
> a callback if the initializing of AM fails.
>
> Here is the code from MRAppMaster.java
>
> .....
> .......
>
>       // set job classloader if configured
>       MRApps.setJobClassLoader(conf);
>       initAndStartAppMaster(appMaster, conf, jobUserName);
>     } catch (Throwable t) {
>       LOG.fatal("Error starting MRAppMaster", t);
>       System.exit(1);
>     }
>   }
>
> protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>       final YarnConfiguration conf, String jobUserName) throws IOException,
>       InterruptedException {
>     UserGroupInformation.setConfiguration(conf);
>     UserGroupInformation appMasterUgi = UserGroupInformation
>         .createRemoteUser(jobUserName);
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>       @Override
>       public Object run() throws Exception {
>         appMaster.init(conf);
>         appMaster.start();
>         if(appMaster.errorHappenedShutDown) {
>           throw new IOException("Was asked to shut down.");
>         }
>         return null;
>       }
>     });
>   }
>
> appMaster.init(conf) does not dispatch JobFinishEventHandler, which is
> responsible for sending an HTTP callback (via shutDownJob()). If an
> exception is thrown at this point, the process simply terminates (via
> System.exit(1)).
>
> appMaster.start() however rightly uses the JobFinishEventHandler and
> things work fine.
>
> Shouldn't a failure on init(..) also send a callback suggesting the job
> failed?
>
> Thanks,
> Prashant
>
>
>
>

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Ravi Prakash <ra...@ymail.com>.

Hi Prashant,

I would tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee notification, for example when the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.
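
By "best-effort" I mean roughly the following. This is a minimal sketch with hypothetical names (notifyBestEffort, the attempt supplier, maxAttempts), not the actual notifier code: the callback is tried a bounded number of times and then abandoned without failing anything else.

```java
import java.util.function.BooleanSupplier;

public class BestEffortNotify {

    /**
     * Tries the callback up to maxAttempts times and then gives up silently.
     * Returns true if any attempt reported success.
     */
    static boolean notifyBestEffort(BooleanSupplier attempt, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            try {
                if (attempt.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Swallow and retry: delivery is never guaranteed.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated endpoint that only answers on the third call.
        boolean ok = notifyBestEffort(() -> ++calls[0] == 3, 5);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

Nothing in a loop like this can run if the process dies first (for example on an OOM), which is the case where no notification is sent at all.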

Thanks
Ravi




________________________________
 From: Prashant Kommireddi <pr...@gmail.com>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org> 
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)
 


Hello,

I came across an issue that occurs with the job notification callbacks in MR2. It works fine if the Application master has started, but does not send a callback if the initializing of AM fails.

Here is the code from MRAppMaster.java

.....
.......

      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if(appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) does not dispatch JobFinishEventHandler, which is responsible for sending an HTTP callback (via shutDownJob()). If an exception is thrown at this point, the process simply terminates (via System.exit(1)).

appMaster.start() however rightly uses the JobFinishEventHandler and things work fine.

Shouldn't a failure on init(..) also send a callback suggesting the job failed?

Thanks,

Prashant

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Prashant Kommireddi <pr...@gmail.com>.

Following up on this. Please let me know whether this is expected behavior
or a bug, and whether you would like me to file a JIRA.


On Thu, Jun 20, 2013 at 9:45 PM, Prashant Kommireddi <pr...@gmail.com> wrote:

> Hello,
>
> I came across an issue that occurs with the job notification callbacks in
> MR2. It works fine if the Application master has started, but does not send
> a callback if the initializing of AM fails.
>
> Here is the code from MRAppMaster.java
>
> .....
> .......
>
>       // set job classloader if configured
>       MRApps.setJobClassLoader(conf);
>       initAndStartAppMaster(appMaster, conf, jobUserName);
>     } catch (Throwable t) {
>       LOG.fatal("Error starting MRAppMaster", t);
>       System.exit(1);
>     }
>   }
>
> protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>       final YarnConfiguration conf, String jobUserName) throws IOException,
>       InterruptedException {
>     UserGroupInformation.setConfiguration(conf);
>     UserGroupInformation appMasterUgi = UserGroupInformation
>         .createRemoteUser(jobUserName);
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>       @Override
>       public Object run() throws Exception {
>         appMaster.init(conf);
>         appMaster.start();
>         if(appMaster.errorHappenedShutDown) {
>           throw new IOException("Was asked to shut down.");
>         }
>         return null;
>       }
>     });
>   }
>
> appMaster.init(conf) does not dispatch JobFinishEventHandler, which is
> responsible for sending an HTTP callback (via shutDownJob()). If an
> exception is thrown at this point, the process simply terminates (via
> System.exit(1)).
>
> appMaster.start() however rightly uses the JobFinishEventHandler and
> things work fine.
>
> Shouldn't a failure on init(..) also send a callback suggesting the job
> failed?
>
> Thanks,
> Prashant
>
>

Re: Job end notification does not always work (Hadoop 2.x)

Posted by Ravi Prakash <ra...@ymail.com>.

Hi Prashant,

I would tend to agree with you. Job-end notification is only a "best-effort" mechanism (we cannot always guarantee notification, for example when the AM OOMs), but I agree that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi




________________________________
 From: Prashant Kommireddi <pr...@gmail.com>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org> 
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)
 


Hello,

I came across an issue with the job notification callbacks in MR2. The callback works fine once the ApplicationMaster has started, but no callback is sent if initialization of the AM fails.

Here is the code from MRAppMaster.java

.....
.......

      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if(appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) does not dispatch JobFinishEventHandler, which is responsible for sending an HTTP callback (via shutDownJob()). If an exception is thrown at this point, the process simply terminates (via System.exit(1)) without sending any notification.

appMaster.start() however rightly uses the JobFinishEventHandler and things work fine.

Shouldn't a failure on init(..) also send a callback suggesting the job failed?
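
A minimal sketch of what a callback on init failure could look like, in plain Java with no Hadoop dependencies. The names AppMaster, JobEndNotifier and initAndStart are hypothetical stand-ins for illustration and are not part of the MRAppMaster API. It also assumes the notifier can be constructed before init() completes, which may not hold in the real code:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class InitFailureNotificationSketch {

  /** Hypothetical stand-in for the HTTP job-end callback. */
  interface JobEndNotifier {
    void notifyJobEnd(String jobStatus);
  }

  /** Hypothetical stand-in for the init()/start() lifecycle of the AM. */
  interface AppMaster {
    void init() throws Exception;
    void start() throws Exception;
  }

  /**
   * Runs init() and start() and returns a process exit code.
   * If init() fails, a best-effort FAILED notification is sent first.
   * Failures in start() are not notified here, because (per the thread)
   * that path already goes through its own job-finish handling.
   */
  static int initAndStart(AppMaster appMaster, JobEndNotifier notifier) {
    try {
      appMaster.init();
    } catch (Throwable initError) {
      try {
        notifier.notifyJobEnd("FAILED");
      } catch (Throwable notifyError) {
        // Best effort only: a failed notification must not mask the
        // original init error or prevent the process from exiting.
      }
      return 1;
    }
    try {
      appMaster.start();
      return 0;
    } catch (Throwable startError) {
      return 1;
    }
  }

  public static void main(String[] args) {
    // Record notifications in memory instead of issuing an HTTP request.
    List<String> sent = new ArrayList<>();

    AppMaster failingInit = new AppMaster() {
      @Override
      public void init() throws Exception {
        throw new IOException("simulated init failure");
      }

      @Override
      public void start() {
        // never reached in this example
      }
    };

    int exitCode = initAndStart(failingInit, sent::add);
    System.out.println(exitCode + " " + sent);
  }
}
```

Running main prints "1 [FAILED]": the simulated init failure still produces one notification before the non-zero exit code is returned. The caller would then pass that code to System.exit(), as the existing catch block does.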

Thanks,

Prashant
