Posted to user@oozie.apache.org by 朱健 <zh...@jd.com> on 2015/06/15 12:51:19 UTC

Questions about oozie timezone

Hi,

Thanks for reading this email.

I have been using Oozie for about two years, and I have now run into a problem with time zones.

Because we are located in the GMT+08:00 timezone, our Hadoop system follows the convention that all data paths on HDFS are named in GMT+08:00 time. That means:
At UTC 2015-01-01T00:00Z, the hourly output data is located under $root/2015010108, not $root/2015010100.
At UTC 2015-01-01T01:00Z, the hourly output data is located under $root/2015010109, not $root/2015010101.

So if I set the coordinator timezone to UTC, the Oozie job reads the data for hour 00, but I want it to read hour 08. For me in Beijing, China, it is natural to expect the Oozie job to read the 08 data at local 08:00.
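To make the mismatch concrete, here is a sketch of an hourly dataset following this naming scheme; the job property ${root}, the dataset name and the initial instance are hypothetical. With the coordinator and dataset in UTC, the behaviour described above means the 2015-01-01T00:00Z instance resolves to ${root}/2015010100, while the data actually lands in ${root}/2015010108.

```xml
<!-- Hypothetical hourly dataset; ${root} is an assumed job property holding the HDFS base path -->
<dataset name="hourly-data"
         frequency="${coord:hours(1)}"
         initial-instance="2015-01-01T00:00Z"
         timezone="UTC">
    <!-- Folder names use the yyyyMMddHH pattern, e.g. 2015010108 -->
    <uri-template>${root}/${YEAR}${MONTH}${DAY}${HOUR}</uri-template>
</dataset>
```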

I also tried setting the timezone to GMT+08:00, but it didn't work. It seems the timezone only affects Daylight Saving Time handling.

For now I work around it by adding 8 to the instance number in the coordinator, changing <instance>0</instance> to <instance>8</instance>.
This may be acceptable for hourly jobs, but it is really ugly for minute-level or daily jobs, and almost unreadable for humans.
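In XML, the workaround might look like the sketch below, assuming the instances are written with coord:current() (the dataset names are hypothetical and their definitions are omitted). The fixed shift has to be recomputed for every dataset frequency: 8 instances for an hourly dataset, but 8 * 60 / 5 = 96 for a 5-minute one, which is where readability suffers.

```xml
<input-events>
    <!-- Hourly dataset: shift by 8 instances (8 hours) to reach the GMT+08:00-named folder -->
    <data-in name="in-hourly" dataset="hourly-data">
        <instance>${coord:current(8)}</instance>
    </data-in>
    <!-- Hypothetical 5-minute dataset: the same 8-hour shift is 8 * 60 / 5 = 96 instances -->
    <data-in name="in-5min" dataset="five-minute-data">
        <instance>${coord:current(96)}</instance>
    </data-in>
</input-events>
```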

So how can I solve this problem?

Thanks,
Jian

Re: Questions about oozie timezone

Posted by David Morel <da...@amakuru.net>.
On 16 Jun 2015, at 2:00, Laurent H wrote:

> I've got the same issue, Jian. It could be great to have an answer from Oozie experts! ;)
>
> --
> Laurent HATIER - Big Data & Business Intelligence Consultant at CapGemini
> fr.linkedin.com/pub/laurent-hatier/25/36b/a86/
> <http://fr.linkedin.com/pub/laurent-h/25/36b/a86/>

Hi,

the timezone spec in the coordinator node only serves to figure out whether there are 23, 24 or 25 hours on a given day (DST switches); the timezone calculations and anything related to time offsets are done in the datasets sections. Try something like:

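<coordinator-app xmlns="uri:oozie:coordinator:0.1" timezone="UTC"
     name="${appName}"
     frequency="${coord:hours(1)}"
     start="${startTime}"
     end="${endTime}">
...
     <datasets>
         <!-- The dataset timezone is the zone the folder names are written in -->
         <dataset
             name="hourly-partition"
             frequency="${coord:hours(1)}"
             initial-instance="${startTime}"
             timezone="Asia/Shanghai">
             <uri-template><!--whatever path-->/yyyymmddhh=${YEAR}${MONTH}${DAY}${HOUR}</uri-template>
         </dataset>
     </datasets>

     <input-events>
         <data-in name="in" dataset="hourly-partition">
             <!-- coord:tzOffset() is the dataset timezone offset relative to the
                  coordinator timezone, in minutes; for Asia/Shanghai vs UTC that
                  should be 480, so this becomes coord:current(8) -->
             <instance>${coord:current(coord:tzOffset()/60)}</instance>
         </data-in>
     </input-events>
...
</coordinator-app>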
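The same approach should carry over to finer-grained datasets by dividing coord:tzOffset() by the dataset frequency in minutes instead of 60. This is an untested sketch for a hypothetical 5-minute dataset (the name and path are placeholders); if coord:tzOffset() evaluates to 480, the instance below resolves to coord:current(96).

```xml
<datasets>
    <!-- Hypothetical 5-minute dataset whose folder names follow Asia/Shanghai time -->
    <dataset
        name="five-minute-partition"
        frequency="${coord:minutes(5)}"
        initial-instance="${startTime}"
        timezone="Asia/Shanghai">
        <uri-template><!--whatever path-->/yyyymmddhhmi=${YEAR}${MONTH}${DAY}${HOUR}${MINUTE}</uri-template>
    </dataset>
</datasets>

<input-events>
    <data-in name="in" dataset="five-minute-partition">
        <!-- Offset in minutes divided by the dataset frequency (5 minutes) gives the instance shift -->
        <instance>${coord:current(coord:tzOffset()/5)}</instance>
    </data-in>
</input-events>
```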

David

Re: Questions about oozie timezone

Posted by Laurent H <la...@gmail.com>.
*it could be, sorry!


Re: Questions about oozie timezone

Posted by Laurent H <la...@gmail.com>.
I've got the same issue, Jian. It could be great to have an answer from Oozie experts! ;)

