Posted to user@oozie.apache.org by Max Hansmire <ha...@gmail.com> on 2012/03/06 15:25:45 UTC

Question about Dates

I am having problems understanding dates in Oozie. The nominal time of my coordinator does not always match the date in its output directory.

Here is some data taken from the runtime properties of my workflow. runDate is the nominal time of the workflow, and outputDir is taken from the output event, which uses ${coord:current(0)}.

  <property>
    <name>runDate</name>
    <value>2012-03-04</value>
  </property>
  <property>
    <name>outputDir</name>
    <value>hdfs://prodhpmaster01n:56310/user/hive/stamps/stamp_in_question/ds=2012-03-03</value>
  </property>

Here is the dataset definition.

    <dataset name="example" frequency="${coord:days(1)}"
        initial-instance="2011-05-01T05:00Z" timezone="America/New_York">
        <uri-template>${nameNode}/user/hive/stamps/stamp_in_question/ds=${YEAR}-${MONTH}-${DAY}</uri-template>
        <done-flag></done-flag>
    </dataset>

The start time of the coordinator is 07:00Z and its frequency is ${coord:days(1)}.

I want the date in outputDir to match runDate. What is the best way to achieve that? In particular, I want to know how Oozie chooses the date to use for an output event. 07:00Z (the coordinator start time) is well past the 05:00Z initial instance of the dataset, so it seems like they should match up. I suspect I am thinking about this all wrong, though.
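For readers following along, the wiring described above (an output event using ${coord:current(0)} and a workflow receiving runDate and outputDir) might look roughly like the fragment below. This is a sketch, not the poster's actual coordinator: the event name output and the property wfAppPath are assumed, and runDate is built with coord:formatTime, which only exists in newer Oozie releases; the thread does not say how runDate is really computed. The comment on current(0) reflects our reading of the Oozie coordinator specification.

```xml
<!-- Fragment of a hypothetical coordinator-app; "example" is the dataset defined above. -->
<output-events>
    <data-out name="output" dataset="example">
        <!-- current(0) is taken from the dataset's own time grid
             (initial-instance plus whole multiples of the dataset frequency,
             in the dataset's timezone): it is the latest dataset instance at or
             before the action's nominal time, not the nominal time itself.
             ${YEAR}-${MONTH}-${DAY} in the uri-template come from that instance. -->
        <instance>${coord:current(0)}</instance>
    </data-out>
</output-events>
<action>
    <workflow>
        <app-path>${wfAppPath}</app-path>
        <configuration>
            <property>
                <!-- Assumed derivation of runDate from the nominal time. -->
                <name>runDate</name>
                <value>${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}</value>
            </property>
            <property>
                <name>outputDir</name>
                <value>${coord:dataOut('output')}</value>
            </property>
        </configuration>
    </workflow>
</action>
```

If the coordinator's grid and the dataset's grid are not aligned, the date in outputDir and the date in runDate can differ, which is why aligning the coordinator start with the dataset initial-instance is the usual remedy.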

Max



Re: Question about Dates

Posted by Max Hansmire <ha...@gmail.com>.
I have a follow-up to this question.

Many of my coordinators have their data outputs set up like this:

    <output-events>
        <data-out name="output" dataset="output">
            <instance>${coord:current(-1)}</instance>
        </data-out>
    </output-events>

and then pass the output to a workflow like this:

    <property>
        <name>outputDir</name>
        <value>${coord:dataOut('output')}</value>
    </property>

When I use the trick you described (a single MyStartTime variable for both the coordinator start and the dataset initial-instance), I get the error "variable [outputDir] cannot be resolved". One solution I can think of is to replace ${coord:current(-1)} with ${coord:current(0)}, but that does not really make sense: I am processing yesterday's data, so I feel the output directory should be labeled with yesterday's date.

Any tips you have would be great. For now, I will start each coordinator one day earlier than I actually want it to run.
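One alternative to starting each coordinator a day early, sketched under an assumption the thread does not confirm: that the error appears because, with start and initial-instance equal, ${coord:current(-1)} on the very first action points one day before the dataset's initial-instance. Keeping start where you want it and anchoring the dataset exactly one period earlier should then leave a valid previous instance. MyDatasetStart, MyEndTime, wfAppPath and the app name are hypothetical, and the coordinator timezone is set to the dataset's timezone as an assumption.

```xml
<!-- Hypothetical values, e.g. in job.properties:
       MyStartTime=2011-05-02T05:00Z
       MyDatasetStart=2011-05-01T05:00Z   (exactly one day earlier)
     nameNode, wfAppPath and MyEndTime are assumed to be defined there too.
     The coordinator uses the dataset's timezone (assumption) so both grids
     keep the same local wall-clock time across DST changes. -->
<coordinator-app name="example-coord" frequency="${coord:days(1)}"
                 start="${MyStartTime}" end="${MyEndTime}"
                 timezone="America/New_York" xmlns="uri:oozie:coordinator:0.2">
    <datasets>
        <!-- The dataset grid starts one period before the coordinator, so on the
             first action current(-1) still points at an instance at or after
             initial-instance. -->
        <dataset name="output" frequency="${coord:days(1)}"
                 initial-instance="${MyDatasetStart}" timezone="America/New_York">
            <uri-template>${nameNode}/user/hive/stamps/stamp_in_question/ds=${YEAR}-${MONTH}-${DAY}</uri-template>
            <done-flag></done-flag>
        </dataset>
    </datasets>
    <output-events>
        <data-out name="output" dataset="output">
            <!-- One dataset instance before current(0), i.e. yesterday's
                 instance when the two grids are aligned. -->
            <instance>${coord:current(-1)}</instance>
        </data-out>
    </output-events>
    <action>
        <workflow>
            <app-path>${wfAppPath}</app-path>
            <configuration>
                <property>
                    <name>outputDir</name>
                    <value>${coord:dataOut('output')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```

With a daily frequency the two start values should differ by exactly one day at the same wall-clock time; otherwise the grids drift apart again.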

Max

On Mar 7, 2012, at 8:33 AM, Max Hansmire wrote:

> Thanks Mohammad.


Re: Question about Dates

Posted by Max Hansmire <ha...@gmail.com>.
Thanks Mohammad.

On Mar 7, 2012, at 2:53 AM, Mohammad Islam wrote:

> The better option is to define a variable such as MyStartTime at job submission and use it as the value of both the coordinator start and the dataset initial-instance.
> 
> For example:
> <coordinator-app start="${MyStartTime}" ...>
> 
> <dataset initial-instance="${MyStartTime}" ...>
> 
> This will give you a lot of flexibility.
> 
> You can define MyStartTime in either of the following ways:
> 1. In the job.properties file, add the line MyStartTime=2011-05-01T05:00Z
> 2. Or on the oozie command line: oozie job -run -config ??.properties -DMyStartTime=2011-05-01T05:00Z
> 
> Regards,
> Mohammad


Re: Question about Dates

Posted by Mohammad Islam <mi...@yahoo.com>.
The better option is to define a variable such as MyStartTime at job submission and use it as the value of both the coordinator start and the dataset initial-instance.

For example:
<coordinator-app start="${MyStartTime}" ...>

<dataset initial-instance="${MyStartTime}" ...>

This will give you a lot of flexibility.

You can define MyStartTime in either of the following ways:
1. In the job.properties file, add the line MyStartTime=2011-05-01T05:00Z
2. Or on the oozie command line: oozie job -run -config ??.properties -DMyStartTime=2011-05-01T05:00Z
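Option 1 can also be written as an XML job configuration, since the oozie CLI's -config option accepts either a .properties or a .xml file. A minimal sketch; the file name job.xml and the application path are placeholders, not values from this thread:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical job.xml, submitted with: oozie job -run -config job.xml
     Each property is available to the coordinator as ${name}. -->
<configuration>
    <property>
        <!-- Used for both the coordinator start and the dataset initial-instance. -->
        <name>MyStartTime</name>
        <value>2011-05-01T05:00Z</value>
    </property>
    <property>
        <!-- Placeholder path to the directory holding coordinator.xml. -->
        <name>oozie.coord.application.path</name>
        <value>hdfs://namenode:8020/user/example/coordinator</value>
    </property>
</configuration>
```

Defining the value once and referencing it in both places is what keeps the coordinator and dataset grids in sync when the start date changes.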

Regards,
Mohammad



----- Original Message -----
From: Max Hansmire <ha...@gmail.com>
To: Mohammad Islam <mi...@yahoo.com>
Cc: "oozie-users@incubator.apache.org" <oo...@incubator.apache.org>
Sent: Tuesday, March 6, 2012 8:23 PM
Subject: Re: Question about Dates

No, they are not. Thanks for the help. Is there a mechanism for keeping these in sync, or is it just a matter of doing it manually?

My datasets are defined in a separate file from the coordinator.

Max

Re: Question about Dates

Posted by Max Hansmire <ha...@gmail.com>.
No, they are not. Thanks for the help. Is there a mechanism for keeping these in sync, or is it just a matter of doing it manually?

My datasets are defined in a separate file from the coordinator.
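Because the datasets live in their own file, one way to keep a single MyStartTime in play is to pull that file into the coordinator with an <include> element under <datasets>. A sketch only: the HDFS path is hypothetical, and whether job properties such as ${MyStartTime} are substituted inside the included file may depend on your Oozie version, so it is worth testing before relying on it.

```xml
<!-- Hypothetical coordinator fragment. The included file is assumed to contain a
     <datasets> root whose <dataset> entries use initial-instance="${MyStartTime}". -->
<datasets>
    <include>hdfs://namenode:8020/user/example/shared/datasets.xml</include>
</datasets>
```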

Max
On Mar 6, 2012, at 11:09 PM, Mohammad Islam wrote:

> Hi Max,
> The "start" attribute of the coordinator and the "initial-instance" of the output dataset definition should be the same. Are they the same?
> 
> Regards,
> Mohammad


Re: Question about Dates

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Max,
The "start" attribute of the coordinator and the "initial-instance" of the output dataset definition should be the same. Are they the same?

Regards,
Mohammad



----- Original Message -----
From: Max Hansmire <ha...@gmail.com>
To: oozie-users@incubator.apache.org
Cc: 
Sent: Tuesday, March 6, 2012 6:25 AM
Subject: Question about Dates

I am having problems understanding dates in Oozie. The nominal time of my coordinator does not always match the date in its output directory.

Here is some data taken from the runtime properties of my workflow. runDate is the nominal time of the workflow, and outputDir is taken from the output event, which uses ${coord:current(0)}.

  <property>
    <name>runDate</name>
    <value>2012-03-04</value>
  </property>
  <property>
    <name>outputDir</name>
    <value>hdfs://prodhpmaster01n:56310/user/hive/stamps/stamp_in_question/ds=2012-03-03</value>
  </property>

Here is the dataset definition.

    <dataset name="example" frequency="${coord:days(1)}"
        initial-instance="2011-05-01T05:00Z" timezone="America/New_York">
        <uri-template>${nameNode}/user/hive/stamps/stamp_in_question/ds=${YEAR}-${MONTH}-${DAY}</uri-template>
        <done-flag></done-flag>
    </dataset>

The start time of the coordinator is 07:00Z and its frequency is ${coord:days(1)}.

I want the date in outputDir to match runDate. What is the best way to achieve that? In particular, I want to know how Oozie chooses the date to use for an output event. 07:00Z (the coordinator start time) is well past the 05:00Z initial instance of the dataset, so it seems like they should match up. I suspect I am thinking about this all wrong, though.

Max