Posted to user@oozie.apache.org by Deepika Khera <dk...@lyris.com> on 2012/03/19 23:08:25 UTC

coordinator xml

Hi,

I have a coordinator.xml that looks like the one below. This is a daily job
that I want to run once every day at 1:15AM GMT. It should process the
previous day's data for hours 00-23.
So the job that runs on March 19th at 1:15AM should process input data for
{/2012/03/18/00 - /2012/03/18/23}.

The job starts fine and uses the right datasets, but then somehow changes
the materialization time to 00:15AM. After that, it processes data
for {/2012/03/17/23} + {/2012/03/18/00 - /2012/03/18/22}.


<coordinator-app name="daily-web-coord"
	frequency="${coord:days(1)}" start="2012-02-29T01:15Z"
	end="2013-01-01T01:30Z"
	timezone="America/Los_Angeles" xmlns="uri:oozie:coordinator:0.1">
	<datasets>		
		<dataset name="daily-web-logs" frequency="${coord:hours(1)}"
			initial-instance="2012-02-28T00:00Z" timezone="America/Los_Angeles">
			<uri-template>
				${baseFsURI}/parsed/daily-web/${YEAR}/${MONTH}/${DAY}/${HOUR}
			</uri-template>
			<done-flag></done-flag>
		</dataset>
	</datasets>
	<input-events>
		<data-in name="daily-web-input" dataset="daily-web-logs">
			<start-instance>${coord:current(-25)}</start-instance>
			<end-instance>${coord:current(-2)}</end-instance>
		</data-in>		
	</input-events>

	<action>
		<workflow>
			<app-path>${wf_app_path}</app-path>
			<configuration>
				<property>
					<name>nameNode</name>
					<value>${nameNode}</value>
				</property>
				<property>
					<name>jobTracker</name>
					<value>${jobTracker}</value>
				</property>
				<property>
					<name>wfDailyWebInput</name>
					<value>${coord:dataIn('daily-web-input')}</value>
				</property>		
			</configuration>
		</workflow>
	</action>
</coordinator-app>

Do you see any issues with the coordinator.xml?

Thanks,
Deepika


Re: coordinator xml

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Deepika,
Your assessment looks reasonable. I would suggest using the default timezone (UTC).
Regards,
Mohammad
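
A minimal sketch of what that suggestion could look like, assuming the only change needed is the two timezone attributes (coordinator and dataset) and that timezone="UTC" is written explicitly rather than left out. This is an illustrative fragment based on the coordinator.xml from the original post, not a tested configuration; the input-events and action elements are left out here and would stay as they were.

```xml
<!-- Sketch only: same coordinator, with both timezone attributes set to UTC
     so that the 01:15Z nominal time has no daylight saving offset to follow. -->
<coordinator-app name="daily-web-coord"
    frequency="${coord:days(1)}" start="2012-02-29T01:15Z"
    end="2013-01-01T01:30Z"
    timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
    <datasets>
        <dataset name="daily-web-logs" frequency="${coord:hours(1)}"
            initial-instance="2012-02-28T00:00Z" timezone="UTC">
            <uri-template>
                ${baseFsURI}/parsed/daily-web/${YEAR}/${MONTH}/${DAY}/${HOUR}
            </uri-template>
            <done-flag></done-flag>
        </dataset>
    </datasets>
    <!-- input-events and action: unchanged from the original post -->
</coordinator-app>
```

With UTC there is no daylight saving transition, so each nominal time should stay at 01:15Z and current(-25) through current(-2) should keep covering hours 00-23 of the previous day.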



Re: coordinator xml

Posted by Deepika Khera <dk...@lyris.com>.
Thanks for your response, Mohammad.

Let me put my question differently: when I run the coordinator.xml from
my original post, should it ever show the next materialization time as
00:15? (I am expecting 01:15, as in the coordinator.xml.) 00:15 then
becomes the nominal time of the job, and it ends up working on the wrong
datasets, as I explained.
When I run this job, it tries to catch up by running actions for some
older data. I have seen in a few of those runs that it picks up the right
datasets.

About the timezone, there is not really a need to have LA time. Initially I
thought that the datasets could vary based on daylight saving, but since
I always pick up 24 datasets for a day, it's really not helping me in any
way.
---------------------------

Here is what I think is happening: the job schedule says that it should
run the job at 01:15AM every day starting Feb 29th. Daylight saving
time was not in effect back then (unfortunately, due to some other issues,
this job did not run and the data stayed there). Then on March 11th,
daylight saving begins. Since I have the timezone set to LA time, it thinks
that it should offset the hour and run the job at 00:15 instead. Does this
make sense? Please correct me if this is wrong.
This theory matches the fact that the runs for my jobs before
March 11th are working on the right data and the ones after that are
not.

Thanks,
Deepika
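
The arithmetic behind this theory can be traced on the input event itself. The annotated fragment below is only a sketch: it assumes that coord:days(1) is evaluated in the coordinator's timezone (so the local wall-clock time of the run stays fixed across the daylight saving change) and that coord:current(0) resolves to the latest hourly dataset instance at or before the nominal time. The comments and the two sample nominal times are illustrative and were not part of the original coordinator.xml.

```xml
<!-- Annotated copy of the data-in element from the coordinator in the
     original post.

     Before March 11 (PST, UTC-8), 17:15 local time is 01:15Z:
       nominal time   2012-03-10T01:15Z
       current(0)     2012-03-10T01:00Z
       current(-25)   2012-03-09T00:00Z   (hour 00 of the previous day)
       current(-2)    2012-03-09T23:00Z   (hour 23 of the previous day)

     After March 11 (PDT, UTC-7), 17:15 local time is 00:15Z:
       nominal time   2012-03-19T00:15Z
       current(0)     2012-03-19T00:00Z
       current(-25)   2012-03-17T23:00Z   (hour 23, two days earlier)
       current(-2)    2012-03-18T22:00Z   (hour 22 of the previous day) -->
<data-in name="daily-web-input" dataset="daily-web-logs">
    <start-instance>${coord:current(-25)}</start-instance>
    <end-instance>${coord:current(-2)}</end-instance>
</data-in>
```

Under those assumptions the one-hour shift in nominal time moves the whole 24-instance window back by one hour, which matches the {/2012/03/17/23} + {/2012/03/18/00 - /2012/03/18/22} range seen for the March 19th run.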





Re: coordinator xml

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Deepika,

What do you mean by "changes materialization time to 00:15AM"? What is the nominal time for the problematic action?
Did it ever run correctly for any other action?

Regards,
Mohammad


Also, do you need timezone="America/Los_Angeles"? Can't you use the default, which is UTC? There is no problem with using LA time; I just want to make sure that you have a requirement for non-UTC time.
Regards,
Mohammad
 

