Posted to user@oozie.apache.org by "J. McConnell" <j...@ubermenschconsulting.com> on 2014/12/15 22:46:44 UTC

Output event instance resolution skipping a month

I wonder if anyone could help point me in the best direction to diagnose
this issue. I'm working on a new job that will take as input a month's
worth of hourly data and output a result for that month. The input data is
being resolved correctly, but I can't get the output instance to be
resolved correctly. My output dataset is defined as:

  <datasets>
    ...
    <dataset name="output" frequency="${coord:months(1)}"
             initial-instance="${initialOutputInstance}" timezone="US/Mountain">
      <uri-template>${nameNode}/user/${runtimeUser}/${outputBaseDir}/${YEAR}/${MONTH}</uri-template>
    </dataset>
  </datasets>

My output event is defined as:

  <output-events>
    <data-out name="output" dataset="output">
      <instance>${coord:current(-1)}</instance>
    </data-out>
  </output-events>

With a start date of:

start=2014-10-01T00:00Z

This results in an output path of:

hdfs://namenode:8020/output/path/2014/08

I expected 2014/09 (I am correctly getting all of September's data as
inputs). The truly confusing thing is that, if I instead ask for
${coord:current(0)} for the output instance, the result is 2014/10. So, it
jumps from August to October.

Has anyone seen behavior like this? If not, does anyone have any
suggestions where I can look to determine what is going on?

Thank you in advance for whatever help you can provide,

- J.

-- 
J. McConnell
Founder, Übermensch Consulting

Oozie and secure cluster

Posted by "Kilaru, Sambaiah" <Sa...@intuit.com>.
Hi All,
  I don't know whether this is a known issue, but with Oozie 4.0, if I run any
Oozie command without a TGT, Oozie still tries to contact the KDC for a session
key with an empty password. The authentication server records this as a failed
password attempt, and repeated attempts lock the account.
Why is Oozie trying to reach the AS for a session key if there is no TGT?
For hdfs commands, the client exits without contacting the AS.

Thanks,
Sam


Re: Output event instance resolution skipping a month

Posted by "J. McConnell" <j...@ubermenschconsulting.com>.
Thanks, Shwetha. That indeed was the issue and I eventually got there, but
I appreciate the confirmation.

- J.

On Fri, Dec 19, 2014 at 6:22 AM, Shwetha GS <sh...@inmobi.com> wrote:

> The issue is with using timezone MST:
>
> As per oozie docs
>
> http://oozie.apache.org/docs/4.1.0/CoordinatorFunctionalSpec.html#a4.2._Timezone_Representation
> ,
> The baseline datetime for datasets and coordinator applications are
> expressed in UTC. The baseline datetime is the time of the first
> occurrence.
> The timezone indicator enables Oozie coordinator engine to properly compute
> frequencies that are daylight-saving sensitive.
>
> So, the dataset start time 2013-10-01T00:00Z UTC maps to previous day in
> MST and hence the confusion. Use start time as 2013-10-01T06:00Z

-- 
J. McConnell
Founder, Übermensch Consulting

Re: Output event instance resolution skipping a month

Posted by Shwetha GS <sh...@inmobi.com>.
The issue is with using the US/Mountain timezone.

As per the Oozie docs
http://oozie.apache.org/docs/4.1.0/CoordinatorFunctionalSpec.html#a4.2._Timezone_Representation
:
The baseline datetime for datasets and coordinator applications are
expressed in UTC. The baseline datetime is the time of the first occurrence.
The timezone indicator enables Oozie coordinator engine to properly compute
frequencies that are daylight-saving sensitive.

So the dataset start time 2013-10-01T00:00Z (UTC) falls on the previous day in
US/Mountain (2013-09-30 18:00 local, UTC-6 on that date), hence the confusion.
Use 2013-10-01T06:00Z as the dataset start time instead.
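
To make the arithmetic concrete, here is a minimal Python sketch of one model
that reproduces the paths reported in this thread. It rests on assumptions that
are not confirmed here: that monthly instances are stepped by calendar months on
the US/Mountain wall clock starting from the initial-instance, and that
${YEAR}/${MONTH} are rendered in UTC. MOUNTAIN, add_months and rendered_path are
hypothetical helper names, and US/Mountain is approximated by a fixed UTC-6
offset because every instant involved falls within daylight saving time.

```python
import calendar
from datetime import datetime, timedelta, timezone

# Assumption: all instants below fall in daylight saving time, so US/Mountain
# is modelled as a fixed UTC-6 offset (MDT) rather than a full tz database zone.
MOUNTAIN = timezone(timedelta(hours=-6), "MDT")


def add_months(local, months):
    """Shift a wall-clock datetime by whole calendar months, clamping the day."""
    index = local.year * 12 + (local.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(local.day, calendar.monthrange(year, month)[1])
    return local.replace(year=year, month=month, day=day)


def rendered_path(initial_instance_utc, months):
    """Return YEAR/MONTH (in UTC) of the instance `months` after the initial one."""
    initial = datetime.strptime(initial_instance_utc, "%Y-%m-%dT%H:%MZ")
    local = initial.replace(tzinfo=timezone.utc).astimezone(MOUNTAIN)
    instance = add_months(local, months).astimezone(timezone.utc)
    return instance.strftime("%Y/%m")


# Original initial-instance: local anchor is the 30th at 18:00, so consecutive
# instances land on 2014-08-31T00:00Z and 2014-10-01T00:00Z.
print(rendered_path("2013-10-01T00:00Z", 11))  # 2014/08
print(rendered_path("2013-10-01T00:00Z", 12))  # 2014/10

# Shifted initial-instance: local anchor is the 1st at 00:00.
print(rendered_path("2013-10-01T06:00Z", 11))  # 2014/09
print(rendered_path("2013-10-01T06:00Z", 12))  # 2014/10
```

Under these assumptions, no instance of the original dataset renders as
2014/09, which matches the jump from August to October described in the
original post.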

On Thu, Dec 18, 2014 at 9:46 PM, J. McConnell <j...@ubermenschconsulting.com>
wrote:
>
> Does anyone have any suggestions on what I might do to debug this? Are
> there any flags I can enable either client- or server-side to have some
> diagnostic information logged?
>
> Thank you,
>
> - J.

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: Output event instance resolution skipping a month

Posted by "J. McConnell" <j...@ubermenschconsulting.com>.
Does anyone have any suggestions on what I might do to debug this? Are
there any flags I can enable either client- or server-side to have some
diagnostic information logged?

Thank you,

- J.

On Tue, Dec 16, 2014 at 11:23 AM, J. McConnell <j...@ubermenschconsulting.com>
wrote:
>
> The given start date is the coord start date. So, currently I have:
>
> coord start: 2014-10-01T00:00Z
> coord frequency: ${coord:months(1)}
> output dataset initial-instance: 2013-10-01T00:00Z
>
> I'm not crystal clear on how coord timezones play with dataset timezones,
> but the values I'm using have worked for me on all of my other jobs. The
> coord timezone is UTC and the dataset timezone is US/Mountain.
>
> Thanks,
>
> - J.

-- 
J. McConnell
Founder, Übermensch Consulting

Re: Output event instance resolution skipping a month

Posted by "J. McConnell" <j...@ubermenschconsulting.com>.
The given start date is the coord start date. So, currently I have:

coord start: 2014-10-01T00:00Z
coord frequency: ${coord:months(1)}
output dataset initial-instance: 2013-10-01T00:00Z

I'm not crystal clear on how coord timezones play with dataset timezones,
but the values I'm using have worked for me on all of my other jobs. The
coord timezone is UTC and the dataset timezone is US/Mountain.

Thanks,

- J.

On Mon, Dec 15, 2014 at 11:40 PM, Shwetha GS <sh...@inmobi.com> wrote:
>
> Is start=2014-10-01T00:00Z the coord start or output dataset start? Whats
> the value of coord start, coord frequency and output dataset start?

-- 
J. McConnell
Founder, Übermensch Consulting

Re: Output event instance resolution skipping a month

Posted by Shwetha GS <sh...@inmobi.com>.
Is start=2014-10-01T00:00Z the coord start or the output dataset start? What
are the values of the coord start, coord frequency, and output dataset start?

On Tue, Dec 16, 2014 at 3:16 AM, J. McConnell <j...@ubermenschconsulting.com>
wrote:
>
> I wonder if anyone could help point me in the best direction to diagnose
> this issue. I'm working on a new job that will take as input a month's
> worth of hourly data and output a result for that month. The input data is
> being resolved correctly, but I can't get the output instance to be
> resolved correctly. My output dataset is defined as:
>
>   <datasets>
>     ...
>     <dataset name="output" frequency="${coord:months(1)}"
> initial-instance="${initialOutputInstance}" timezone="US/Mountain">
>
>
> <uri-template>${nameNode}/user/${runtimeUser}/${outputBaseDir}/${YEAR}/${MONTH}</uri-template>
>     </dataset>
>   </datasets>
>
> My output event is defined as:
>
>   <output-events>
>     <data-out name="output" dataset="output">
>       <instance>${coord:current(-1)}</instance>
>     </data-out>
>   </output-events>
>
> With a start date of:
>
> start=2014-10-01T00:00Z
>
> This results in an output path of:
>
> hdfs://namenode:8020/output/path/2014/08
>
> I expected 2014/09 (I am correctly getting all of September's data as
> inputs). The truly confusing thing is that, if I instead ask for
> ${coord:current(0)} for the output instance, the result is 2014/10. So, it
> jumps from August to October.
>
> Has anyone seen behavior like this? If not, does anyone have any
> suggestions where I can look to determine what is going on?
>
> Thank you in advance for whatever help you can provide,
>
> - J.
>
> --
> J. McConnell
> Founder, Übermensch Consulting
>
