Posted to user@oozie.apache.org by Maxime Petazzoni <ma...@turn.com> on 2012/03/16 18:30:15 UTC

Coordinators start and LAST_ONLY

Hello everyone,

I'm trying to use LAST_ONLY in the execution control of a coordinator
configuration. My understanding is that only the most recent
materialization of the coordinator would be triggered, executing the
desired workflow at the configured frequency starting from whenever the
coordinator is submitted to Oozie, even if the configured start date for
the coordinator is in the past.

Unfortunately, my experiment shows that the coordinator triggers the
workflow every 5 minutes, catching up on the "nominal" times of
execution. Once the backlog is processed, it correctly follows the
configured frequency.

Given this, I fail to understand what would be the difference between
LAST_ONLY and, well, nothing (FIFO by default, right?).
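To make the difference being asked about concrete, here is a minimal Python sketch. It assumes a fixed frequency, ignores how quickly Oozie works through a backlog, and does not model Oozie's actual LAST_ONLY implementation. `catch_up_times` lists the past nominal times a catch-up run works through; `next_nominal_time` is the single first run expected above. Both names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def catch_up_times(start: datetime, freq: timedelta, submitted: datetime) -> List[datetime]:
    """Nominal times start, start + freq, ... up to and including `submitted`.

    These are the past runs a catch-up (backlog-processing) coordinator works through.
    """
    times = []
    t = start
    while t <= submitted:
        times.append(t)
        t += freq
    return times


def next_nominal_time(start: datetime, freq: timedelta, submitted: datetime) -> datetime:
    """First nominal time strictly after `submitted`: the single first run expected here."""
    if submitted < start:
        return start
    # timedelta // timedelta gives the number of whole periods already elapsed.
    steps = (submitted - start) // freq + 1
    return start + steps * freq


if __name__ == "__main__":
    start = datetime(2012, 3, 16, 0, 0, tzinfo=timezone.utc)
    submitted = datetime(2012, 3, 16, 17, 30, tzinfo=timezone.utc)
    hourly = timedelta(hours=1)
    print(len(catch_up_times(start, hourly, submitted)))  # past runs in the backlog
    print(next_nominal_time(start, hourly, submitted))
```

With a start of 2012-03-16T00:00Z, an hourly frequency and a submission at 17:30Z, `catch_up_times` returns 18 nominal times (00:00 through 17:00), while `next_nominal_time` returns 18:00.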

I'm still very new to Oozie so I'm probably missing something here. How
do other people achieve this? It seems very, very cumbersome to modify
the start date of a coordinator every time you (re)submit it to Oozie.

Also, and I'm not sure whether it's a bug or not: when looking at the
details of my coordinator in the Oozie console, the actions are shown as
created 5 minutes *early*, even though the corresponding workflow is
created at the correct time. Is this normal? See attached screenshot for
details.

Thanks in advance,
/Maxime
-- 
Maxime Petazzoni, Platform Engineer at Turn, Inc (www.turn.com)

Re: Coordinators start and LAST_ONLY

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Maxime,
Would you please update the respective JIRAs with your comments? It will help the developers get the full context in one place.

Thanks,
Mohammad


________________________________
From: Maxime Petazzoni <ma...@turn.com>
To: oozie-users@incubator.apache.org; Mohammad Islam <mi...@yahoo.com> 
Sent: Thursday, March 22, 2012 11:01 AM
Subject: Re: Coordinators start and LAST_ONLY

Hi,

* Mohammad Islam <mi...@yahoo.com> [2012-03-22 01:17:41]:

> Currently there is a known bug in LAST_ONLY; we need to fix it.
> The outstanding JIRA is: https://issues.apache.org/jira/browse/OOZIE-614
> 
> For the start time, we should support NOW as a default value. I created a new JIRA regarding this.
> https://issues.apache.org/jira/browse/OOZIE-778
> As a workaround, you could write a shell command to get the current date in UTC and pass it as a variable (-D option) to the coordinator.

Thanks for all the pointers. I am indeed going to use a dynamic start
time passed with -D to the coordinator from the command-line when
submitting the job to Oozie. That's the only way for me to get the
behavior I desire.
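The -D workaround can be sketched in Python as follows. This assumes the coordinator's start attribute references a property named `startTime` (e.g. `start="${startTime}"`; the name is hypothetical) and that the coordinator takes UTC dates in the `yyyy-MM-ddTHH:mmZ` form. The helper only builds the command line; the exact spelling of the `-D` flag should be checked against `oozie help` for your Oozie version.

```python
import shlex
from datetime import datetime, timezone
from typing import List, Optional

# Oozie coordinator dates in UTC, e.g. 2012-03-22T18:01Z (seconds are dropped).
OOZIE_UTC_FORMAT = "%Y-%m-%dT%H:%MZ"


def utc_start_time(now: Optional[datetime] = None) -> str:
    """Format `now` (default: the current time) as an Oozie UTC date string."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime(OOZIE_UTC_FORMAT)


def submit_command(oozie_url: str, config_path: str, start: str) -> List[str]:
    """Build (but do not run) an `oozie job -run` invocation overriding startTime.

    `startTime` is a hypothetical property name referenced by the coordinator XML.
    """
    return [
        "oozie", "job",
        "-oozie", oozie_url,
        "-config", config_path,
        "-run",
        f"-DstartTime={start}",
    ]


if __name__ == "__main__":
    cmd = submit_command("http://localhost:11000/oozie", "job.properties", utc_start_time())
    print(shlex.join(cmd))
```

Because seconds are truncated, the computed start is the current minute, so the first materialization corresponds to the submission time itself rather than to a backlog of past nominal times.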

What would be ideal is the following:

  - be able to support a "NOW" start time, adjusted to a configurable,
    "nearest" time. Something like:

    ${coord:now(15)} would have the first materialization of the
    coordinator at the next 15min increment of the hour: submitting at
    4:23pm for example would see the first materialization at 4:30pm.
    ${coord:now(1)} would be at the end of the current minute
    (submitting at 4:23:xx pm -> 4:24:00).

  - have LAST_ONLY work as it is understood in the documentation, i.e.
    ignore all previous materializations of the coordinator and have the
    first materialization at the coordinator's next nominal time.

    From what I understand of OOZIE-614, LAST_ONLY currently seems more
    complex than it needs to be, IMHO.
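The proposed `${coord:now(N)}` rounding (a suggestion in this thread, not an existing Oozie EL function) can be modelled in a few lines of Python. This is a minimal sketch under two assumptions: N must divide 60 so the increments line up with the hour, and a time already exactly on a boundary is kept rather than pushed to the following one. `next_increment` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def next_increment(now: datetime, step_minutes: int) -> datetime:
    """Round `now` up to the next `step_minutes` boundary within the hour.

    Hypothetical model of the proposed ${coord:now(N)}; a time already on a
    boundary (seconds and microseconds zero) is returned unchanged.
    """
    if step_minutes <= 0 or 60 % step_minutes != 0:
        raise ValueError("step_minutes must divide 60 evenly")
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    elapsed = now - top_of_hour
    step = timedelta(minutes=step_minutes)
    # Ceiling division on timedeltas: -(-a // b) rounds up instead of down.
    steps = -(-elapsed // step)
    return top_of_hour + steps * step


if __name__ == "__main__":
    utc = timezone.utc
    print(next_increment(datetime(2012, 3, 22, 16, 23, tzinfo=utc), 15))
    print(next_increment(datetime(2012, 3, 22, 16, 23, 42, tzinfo=utc), 1))
```

Submitting at 16:23 with N=15 gives 16:30, and submitting at 16:23:42 with N=1 gives 16:24:00, matching the examples in the proposal.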


Hope this helps!

/Max
-- 
Maxime Petazzoni, Platform Engineer at Turn, Inc (www.turn.com) 

Re: Coordinators start and LAST_ONLY

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Maxime,
Currently there is a known bug in LAST_ONLY; we need to fix it.
The outstanding JIRA is: https://issues.apache.org/jira/browse/OOZIE-614

For the start time, we should support NOW as a default value. I created a new JIRA regarding this.
https://issues.apache.org/jira/browse/OOZIE-778
As a workaround, you could write a shell command to get the current date in UTC and pass it as a variable (-D option) to the coordinator.

Regards,
Mohammad

----- Original Message -----
From: Maxime Petazzoni <ma...@turn.com>
To: oozie-users@incubator.apache.org; Mohammad Islam <mi...@yahoo.com>
Cc: 
Sent: Wednesday, March 21, 2012 10:06 AM
Subject: Re: Coordinators start and LAST_ONLY

* Mohammad Islam <mi...@yahoo.com> [2012-03-20 23:38:13]:

> Hi Maxime,
> Sorry for the late reply.
> Would you please explain the LAST_ONLY issue?
> I didn't find your original post with LAST_ONLY.

Thanks for your reply. If you're interested in my original question,
it's available in the archives:
http://mail-archives.apache.org/mod_mbox/incubator-oozie-users/201203.mbox/%3C20120316173015.GF21666%40turn.com%3E

/Maxime
-- 
Maxime Petazzoni, Platform Engineer at Turn, Inc (www.turn.com)


Re: Coordinators start and LAST_ONLY

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Maxime,
Sorry for the late reply.
Would you please explain the LAST_ONLY issue?
I didn't find your original post with LAST_ONLY.

Regards,
Mohammad



----- Original Message -----
From: Maxime Petazzoni <ma...@turn.com>
To: oozie-users@incubator.apache.org
Cc: 
Sent: Monday, March 19, 2012 10:27 AM
Subject: Re: Coordinators start and LAST_ONLY

* Mohammad Islam <mi...@yahoo.com> [2012-03-17 01:21:08]:

> Yes, we did it intentionally. The reason was to perform the
> housekeeping task a little early (default is 5 minutes) so that the
> actual job can be started on time. Coordinator action creation doesn't
> mean any execution. The workflow start is the actual execution, which
> should happen on time.
> Is there any issue with this 5-minute early creation?

Thanks for explaining the 5min early creation of the coordinator actions.
It makes a lot of sense as a way to ensure the workflow's first action
starts right on time.

What about my other question about LAST_ONLY (that's really my most
important question here ;-p) ?

Thanks,
/Maxime
-- 
Maxime Petazzoni, Platform Engineer at Turn, Inc (www.turn.com)


Re: Coordinators start and LAST_ONLY

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Maxime,
Yes, we did it intentionally. The reason was to perform the housekeeping task a little early (default is 5 minutes) so that the actual job can be started on time. Coordinator action creation doesn't mean any execution. The workflow start is the actual execution, which should happen on time.
Is there any issue with this 5-minute early creation?

Regards,
Mohammad
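The timing described here can be pictured with a small Python model. It assumes each coordinator action is created exactly `lookahead` before its nominal time (5 minutes, the default mentioned above), while the workflow starts at the nominal time itself; in a real deployment the creation moment may also depend on when Oozie's materialization actually runs. `action_timeline` is a hypothetical helper, not an Oozie API.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def action_timeline(
    start: datetime,
    freq: timedelta,
    count: int,
    lookahead: timedelta = timedelta(minutes=5),
) -> List[Tuple[datetime, datetime]]:
    """Return (action_created, workflow_started) pairs for the first `count` nominal times.

    Simplified model: creation happens exactly `lookahead` before each nominal time.
    """
    timeline = []
    for i in range(count):
        nominal = start + i * freq
        timeline.append((nominal - lookahead, nominal))
    return timeline


if __name__ == "__main__":
    start = datetime(2012, 3, 16, 16, 0, tzinfo=timezone.utc)
    for created, started in action_timeline(start, timedelta(hours=1), 3):
        print(f"action created {created:%H:%M}Z, workflow starts {started:%H:%M}Z")
```

For an hourly coordinator starting at 16:00Z, this prints actions created at 15:55, 16:55 and 17:55, each followed by its workflow starting on the hour, which is the 5-minute offset visible in the console.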



________________________________
From: Maxime Petazzoni <ma...@turn.com>
To: oozie-users@incubator.apache.org 
Sent: Friday, March 16, 2012 10:36 AM
Subject: Re: Coordinators start and LAST_ONLY

* Maxime Petazzoni <ma...@turn.com> [2012-03-16 10:30:15]:

> Also, and I'm not sure whether it's a bug or not: when looking at the
> details of my coordinator in the Oozie console, the actions are shown as
> created 5 minutes *early*, even though the corresponding workflow is
> created at the correct time. Is this normal? See attached screenshot for
> details.

Looks like said attachment was removed by the mailing list. See
http://dl.bulix.org/4573f5be.jpg.

I should also add that I'm running Oozie 3.1.0 with Hadoop 0.20.2.

/Maxime
-- 
Maxime Petazzoni, Platform Engineer at Turn, Inc (www.turn.com)
